A Web data extraction description language and its implementation

I-Chen Wu*, Jui Yuan Su, Loon Been Chen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

5 Scopus citations

Abstract

A data extraction model, named the browser-oriented data extraction (BODE) model, was proposed in [14] to extract web contents with script functions. In this model, the system, built on top of browsers, accesses pages by simulating users' operations on browsers. Based on this model, this paper defines a scripting language, named the BODED (Browser-Oriented Data Extraction Description) language, which instructs the system how to perform data extraction. This paper also proposes a technique, called indirect browser replication, to implement a BODE system, and optimizes the performance of this technique.

Original language: English
Title of host publication: Proceedings of the 29th Annual International Computer Software and Applications Conference - Workshops and Fast Abstracts, COMPSAC 2005
Pages: 293-298
Number of pages: 6
DOI: https://doi.org/10.1109/COMPSAC.2005.38
State: Published - 1 Dec 2005
Event: 29th Annual International Computer Software and Applications Conference, COMPSAC 2005 - Edinburgh, Scotland, United Kingdom
Duration: 26 Jul 2005 to 28 Jul 2005

Publication series

Name: Proceedings - International Computer Software and Applications Conference
Volume: 1
ISSN (Print): 0730-3157

Conference

Conference: 29th Annual International Computer Software and Applications Conference, COMPSAC 2005
Country: United Kingdom
City: Edinburgh, Scotland
Period: 26/07/05 to 28/07/05

Cite this

Wu, I-C., Su, J. Y., & Chen, L. B. (2005). A Web data extraction description language and its implementation. In Proceedings of the 29th Annual International Computer Software and Applications Conference - Workshops and Fast Abstracts, COMPSAC 2005 (pp. 293-298). [1510035] (Proceedings - International Computer Software and Applications Conference; Vol. 1). https://doi.org/10.1109/COMPSAC.2005.38