TY - JOUR
T1 - A loosely coupled interactive web data extraction system
AU - Su, Jui Yuan
AU - Chen, Lung Pin
AU - Wu, I-Chen
PY - 2010/6/10
Y1 - 2010/6/10
N2 - With the rapid growth of the Internet, Web data extraction (DE) systems have become convenient tools for application programs to collect useful data. A DE system takes as input a wrapper, a script that describes how to navigate Web pages and extract data. Integrating application programs, wrappers, and DE systems is a nontrivial task because it requires complete coordination and interaction between application programs and DE systems. Furthermore, the asynchronous update technologies used in many Web pages, such as AJAX, make the integration more complex. This paper proposes a loosely coupled interactive DE system based on Browser-Oriented Data Extraction (BODE) systems and the Web Service Resource Framework (WSRF), a web service specification that standardizes access to and manipulation of web service state. The loosely coupled interactive DE system exposes the states of wrappers to users through web service states, called WS-Resources. The interactive DE system can also accept parameters from application programs during extraction through the WSRF notification design pattern. With these interactive capabilities, application programs and DE system wrappers can be easily shared and controlled.
AB - With the rapid growth of the Internet, Web data extraction (DE) systems have become convenient tools for application programs to collect useful data. A DE system takes as input a wrapper, a script that describes how to navigate Web pages and extract data. Integrating application programs, wrappers, and DE systems is a nontrivial task because it requires complete coordination and interaction between application programs and DE systems. Furthermore, the asynchronous update technologies used in many Web pages, such as AJAX, make the integration more complex. This paper proposes a loosely coupled interactive DE system based on Browser-Oriented Data Extraction (BODE) systems and the Web Service Resource Framework (WSRF), a web service specification that standardizes access to and manipulation of web service state. The loosely coupled interactive DE system exposes the states of wrappers to users through web service states, called WS-Resources. The interactive DE system can also accept parameters from application programs during extraction through the WSRF notification design pattern. With these interactive capabilities, application programs and DE system wrappers can be easily shared and controlled.
KW - Browser-oriented data extraction and WSRF
KW - Data extraction
KW - Notification
KW - WS-resource
KW - Wrapper
UR - http://www.scopus.com/inward/record.url?scp=77953151196&partnerID=8YFLogxK
U2 - 10.6138/JIT.2010.11.2.10
DO - 10.6138/JIT.2010.11.2.10
M3 - Article
AN - SCOPUS:77953151196
VL - 11
SP - 237
EP - 249
JO - Journal of Internet Technology
JF - Journal of Internet Technology
SN - 1607-9264
IS - 2
ER -