Web information hierarchy and importance mining based on DOM information distillation

Tseng Yi Feng, Hung-Yu Kao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The Web gives people a convenient way to disseminate and search for information. With the growth of dynamic page generation techniques, the number and complexity of Web pages have increased explosively, as has the amount of information they contain. Redundant information is distributed throughout a page, which makes it difficult to automatically identify the useful information in that page. In this paper, we propose and implement a simple Web importance extraction and labeling system based on analysis of the content information and visual information of a Web page. We apply information theory to the document object model (DOM) trees of pages and extract the visual information of each block to evaluate its importance. Results show that our system effectively extracts and labels the important content of a page and provides a powerful surfing interface for browsing on small-display devices. Experiments on several Web sites show that the system performs well in matching users' information focus.
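To make the idea of applying information theory to page blocks concrete, the following is a minimal sketch and not the authors' implementation. It assumes each block has already been extracted from a DOM tree as a list of tokens, and it scores a block by how unevenly its terms are spread across the pages of a site: terms that appear uniformly on every page (typical of navigation bars) have high entropy and contribute little importance. The function names `term_entropy` and `block_importance`, the scoring formula, and the sample data are all assumptions made for illustration; the visual features mentioned in the abstract are not modeled here.

```python
import math

def term_entropy(term, pages):
    """Normalized Shannon entropy of a term's frequency distribution over pages.

    `pages` is a list of token lists (an assumed, simplified page model).
    Returns a value in [0, 1]: about 1.0 when the term is spread evenly over
    all pages, 0.0 when it is concentrated on a single page or never occurs.
    """
    counts = [page.count(term) for page in pages]
    total = sum(counts)
    if total == 0 or len(pages) < 2:
        return 0.0
    entropy = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    # Divide by the maximum possible entropy so the result is comparable
    # across page sets of different sizes.
    return entropy / math.log(len(pages))

def block_importance(block_tokens, pages):
    """Hypothetical importance score: 1 minus the mean term entropy of a block."""
    if not block_tokens:
        return 0.0
    mean_entropy = sum(term_entropy(t, pages) for t in block_tokens) / len(block_tokens)
    return 1.0 - mean_entropy

# Illustrative data: three pages sharing a navigation bar, each with its own content.
pages = [
    ["home", "news", "contact", "earthquake", "taipei"],
    ["home", "news", "contact", "election", "results"],
    ["home", "news", "contact", "typhoon", "warning"],
]
nav_score = block_importance(["home", "news", "contact"], pages)
content_score = block_importance(["earthquake", "taipei"], pages)
```

In this toy data the shared navigation block scores near zero and the page-specific block scores near one, which is the kind of separation a small-display browser could use to decide which blocks to show first.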

Original language: English
Title of host publication: Emerging Information Technology Conference 2005
Pages: 64-67
Number of pages: 4
Publication status: Published - 2005 Dec 1
Event: Emerging Information Technology Conference 2005 - Taipei, Taiwan
Duration: 2005 Aug 15 to 2005 Aug 16

Publication series

Name: Emerging Information Technology Conference 2005
Volume: 2005

Other

Other: Emerging Information Technology Conference 2005
Country: Taiwan
City: Taipei
Period: 05-08-15 to 05-08-16

All Science Journal Classification (ASJC) codes

  • Engineering(all)
