Tracking the changes of dynamic web pages in the existence of URL rewriting

Ping Jer Yeh*, Jie Tsung Li, Shyan-Ming Yuan

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

Crawlers in a knowledge management system need to collect and archive documents from websites and also track whether those documents have changed. However, the URL rewriting mechanism creates a page-tracking problem: the URLs of two instances of the same dynamic page, obtained during different sessions, are no longer identical. This paper proposes a series of bottom-up algorithms that first find the corresponding pairs of dynamic page instances and then determine the change status of each pair. In experiments, the algorithms performed very well and the results were 100% accurate.
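To make the tracking problem concrete, the Python sketch below shows one simple way to pair page instances from two crawl sessions and classify their change status. It is not the authors' bottom-up algorithm, which this abstract does not describe. It assumes that session identifiers appear under a known, hypothetical set of parameter names (`SESSION_PARAMS`), either as a `;name=value` path parameter or as a query parameter. The helper names `normalize_url`, `fingerprint`, and `track_changes` are illustrative only.

```python
import hashlib
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of session-ID parameter names; real sites may use others.
SESSION_PARAMS = {"jsessionid", "phpsessid", "sessionid", "sid"}

# Matches "name=value" for any session parameter, e.g. inside rewritten links.
# Longer names are tried first so "jsessionid" is never cut short by "sid".
_TOKEN_RE = re.compile(
    r"\b(" + "|".join(sorted(SESSION_PARAMS, key=len, reverse=True)) + r")=[^&\"'\s;#?]+",
    re.IGNORECASE,
)


def normalize_url(url: str) -> str:
    """Return a session-independent key for a URL produced by URL rewriting."""
    parts = urlsplit(url)
    # Drop ';jsessionid=...'-style path parameters (assumed to trail the path).
    segments = parts.path.split(";")
    path = ";".join(
        [segments[0]]
        + [s for s in segments[1:] if s.split("=", 1)[0].lower() not in SESSION_PARAMS]
    )
    # Drop session query parameters; sort the rest so parameter order is ignored.
    query = urlencode(sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in SESSION_PARAMS
    ))
    # Lower-case the host and discard the fragment.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, query, ""))


def fingerprint(html: str) -> str:
    """Hash page content after blanking session tokens in embedded links."""
    masked = _TOKEN_RE.sub(r"\1=", html)
    return hashlib.sha256(masked.encode("utf-8")).hexdigest()


def track_changes(old: dict[str, str], new: dict[str, str]) -> dict[str, str]:
    """Map each normalized URL to 'unchanged', 'modified', 'added' or 'removed'.

    `old` and `new` map raw URLs to page HTML from two different crawl sessions.
    """
    old_fp = {normalize_url(u): fingerprint(h) for u, h in old.items()}
    new_fp = {normalize_url(u): fingerprint(h) for u, h in new.items()}
    status = {}
    for key in sorted(old_fp.keys() | new_fp.keys()):
        if key not in new_fp:
            status[key] = "removed"
        elif key not in old_fp:
            status[key] = "added"
        else:
            status[key] = "unchanged" if old_fp[key] == new_fp[key] else "modified"
    return status


if __name__ == "__main__":
    old = {
        "http://example.com/shop/list;jsessionid=AAA111?cat=2":
            '<a href="item;jsessionid=AAA111?id=7">Item</a>',
        "http://example.com/news?sid=X1&page=1": "<p>Headline A</p>",
    }
    new = {
        "http://example.com/shop/list;jsessionid=BBB222?cat=2":
            '<a href="item;jsessionid=BBB222?id=7">Item</a>',
        "http://example.com/news?page=1&sid=Y9": "<p>Headline B</p>",
    }
    print(track_changes(old, new))
    # {'http://example.com/news?page=1': 'modified', 'http://example.com/shop/list?cat=2': 'unchanged'}
```

Masking tokens inside the page body matters because URL rewriting also rewrites the links within each page, so hashing the raw HTML would report every page as modified. The fixed list of parameter names is the main limitation of this sketch: it cannot pair pages from a site that names its session parameter differently.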

Original language: English
Pages (from-to): 169-176
Number of pages: 8
Journal: Conferences in Research and Practice in Information Technology Series
Volume: 61
State: Published - 1 Dec 2006
Event: 5th Australasian Data Mining Conference, AusDM 2006 - Sydney, NSW, Australia
Duration: 29 Nov 2006 to 30 Nov 2006

Keywords

  • Crawler
  • HTTP session
  • String matching
  • URL rewriting

