A loosely coupled interactive web data extraction system

Jui Yuan Su*, Lung Pin Chen, I-Chen Wu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

With the rapid growth of the Internet, Web data extraction (DE) systems have become a convenient tool for application programs to collect useful data. A DE system takes as its input a wrapper, a script that describes how to navigate Web pages and extract data from them. Integrating application programs, wrappers, and DE systems is a nontrivial task because of the close coordination and interaction required between the application programs and the DE systems. Furthermore, the asynchronous update technologies used in many Web pages, such as AJAX, make the integration more complex. This paper proposes a loosely coupled interactive DE system based on Browser-Oriented Data Extraction (BODE) systems and the Web Services Resource Framework (WSRF), a web service specification that standardizes access to and manipulation of state for web services. The proposed system exposes the states of wrappers to users through web service states, called WS-Resources. It can also accept parameters from application programs during extraction through WSRF's notification design pattern. With these interactive capabilities, application programs and the wrappers of DE systems can be easily shared and controlled.
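The interaction described in the abstract can be illustrated with a small in-process sketch. It is not the paper's implementation and does not use any real WSRF or BODE API: `WrapperResource`, `run_wrapper`, `make_application`, the `status` and `keyword` properties, and the step names are all hypothetical. The sketch only shows the general idea of a wrapper publishing its state as resource properties, and an application reacting to a state-change notification by supplying a parameter while extraction is in progress.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Signature of a notification callback: (resource_id, property_name, new_value).
Listener = Callable[[str, str, str], None]


@dataclass
class WrapperResource:
    """Toy stand-in for a WS-Resource: the observable state of one running wrapper."""
    resource_id: str
    properties: Dict[str, str] = field(default_factory=dict)
    _subscribers: List[Listener] = field(default_factory=list)

    def subscribe(self, callback: Listener) -> None:
        """Register a listener that is called on every property change."""
        self._subscribers.append(callback)

    def set_property(self, name: str, value: str) -> None:
        """Update one property, then notify every subscriber of the change."""
        self.properties[name] = value
        for callback in list(self._subscribers):
            callback(self.resource_id, name, value)

    def get_property(self, name: str) -> Optional[str]:
        return self.properties.get(name)


def run_wrapper(resource: WrapperResource, steps: List[str]) -> List[str]:
    """Hypothetical wrapper: publish state per step and read a parameter mid-run."""
    results: List[str] = []
    for step in steps:
        resource.set_property("status", f"running:{step}")
        if step == "search":
            # Announce that a parameter is needed; a subscriber may supply it.
            resource.set_property("status", "waiting:keyword")
            keyword = resource.get_property("keyword")
            if keyword is None:
                resource.set_property("status", "failed:no keyword")
                return results
            results.append(f"results for {keyword}")
    resource.set_property("status", "done")
    return results


def make_application(resource: WrapperResource, keyword: str, log: list) -> Listener:
    """Hypothetical application: log every change, answer the keyword request."""
    def on_change(resource_id: str, name: str, value: str) -> None:
        log.append((name, value))
        if name == "status" and value == "waiting:keyword":
            resource.set_property("keyword", keyword)
    return on_change


log: list = []
resource = WrapperResource("wrapper-1")
resource.subscribe(make_application(resource, "WSRF", log))
results = run_wrapper(resource, ["open", "search", "extract"])
```

In this sketch the application and the wrapper never call each other directly; they share only the resource's properties and its notifications, which is one way to read the "loosely coupled" design the abstract describes. A real deployment would place the resource behind a web service rather than in the same process.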

Original language: English
Pages (from-to): 237-249
Number of pages: 13
Journal: Journal of Internet Technology
Volume: 11
Issue number: 2
State: Published - 10 Jun 2010

Keywords

  • Browser-oriented data extraction and WSRF
  • Data extraction
  • Notification
  • WS-resource
  • Wrapper
