With the rapid growth of the Internet, Web data extraction (DE) systems have become a convenient tool for application programs to collect useful data. A DE system takes a wrapper as its input, which is a script that describes how to navigate Web pages and extract data. Integrating application programs, wrappers, and DE systems is a nontrivial task because of the full coordination and interaction required between application programs and DE systems. Furthermore, the asynchronous update technologies used in many Web pages, such as AJAX, make the integration more complex. This paper proposes a loosely coupled interactive DE system based on Browser-Oriented Data Extraction (BODE) systems and the Web Services Resource Framework (WSRF), a web service specification that standardizes access to and manipulation of the state of web services. The proposed system exposes the states of wrappers to users through web service states, called WS-Resources. It can also accept parameters from application programs during extraction through WSRF's notification design pattern. With these interactive capabilities, the application programs and wrappers of DE systems can be easily shared and controlled.
- Browser-oriented data extraction and WSRF
- Data extraction