Web information hierarchy and importance mining based on DOM information distillation

Tseng Yi Feng, Hung Yu Kao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


The Web provides people with a convenient way to disseminate and search for information. With the growth of dynamic page generation techniques, the number and complexity of Web pages have increased explosively, as has the amount of information they contain. Redundant information is distributed throughout a page, which makes it difficult to identify the useful information in that page automatically. In this paper, we propose and implement a simple Web importance extraction and labeling system based on analysis of the content information and the visual information of a Web page. We apply information theory to the document object model (DOM) trees of pages and extract the visual information of each block to evaluate its importance. Results show that our system effectively extracts and labels the important content of a page and provides a powerful browsing interface for small-display devices. Experiments on several Web sites show that the system performs well at matching users' information focus.
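The abstract does not give the scoring formula, so the following is only a minimal sketch of one way information theory could be applied to page blocks, not the authors' method. It assumes that the DOM blocks have already been extracted as plain text, that several pages from the same site are available, and that a term spread evenly across all pages (for example navigation or footer text) is redundant. The function names `term_entropy` and `block_importance` and the sample data are hypothetical, and the visual features mentioned in the abstract are not modeled.

```python
import math
from collections import Counter, defaultdict


def term_entropy(pages):
    """Normalized Shannon entropy of each term's distribution across pages.

    `pages` is a list of dicts mapping a block name to its text
    (an assumed input format). A value of 1.0 means the term is spread
    evenly over all pages; 0.0 means it occurs on a single page only.
    """
    counts = defaultdict(Counter)  # term -> {page index: occurrences}
    for index, blocks in enumerate(pages):
        for text in blocks.values():
            for term in text.lower().split():
                counts[term][index] += 1

    n_pages = len(pages)
    entropy = {}
    for term, per_page in counts.items():
        total = sum(per_page.values())
        h = -sum((c / total) * math.log(c / total) for c in per_page.values())
        # Divide by log(n_pages) so scores are comparable across site sizes.
        entropy[term] = h / math.log(n_pages) if n_pages > 1 else 0.0
    return entropy


def block_importance(pages):
    """Score each block as 1 minus the mean entropy of its terms."""
    entropy = term_entropy(pages)
    scores = []
    for blocks in pages:
        page_scores = {}
        for name, text in blocks.items():
            terms = text.lower().split()
            if not terms:
                page_scores[name] = 0.0  # empty blocks carry no information
                continue
            mean_h = sum(entropy[t] for t in terms) / len(terms)
            page_scores[name] = 1.0 - mean_h
        scores.append(page_scores)
    return scores


if __name__ == "__main__":
    # Hypothetical two-page site: the "nav" block repeats, "main" differs.
    sample = [
        {"nav": "home news contact", "main": "entropy of dom blocks"},
        {"nav": "home news contact", "main": "small display browsing"},
    ]
    for page_scores in block_importance(sample):
        print(page_scores)
```

Under these assumptions, blocks that repeat verbatim across pages score near 0 and page-specific blocks score near 1, so a small-display browser could show the high-scoring blocks first.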

Original language: English
Title of host publication: Emerging Information Technology Conference 2005
Number of pages: 4
Publication status: Published - 2005
Event: Emerging Information Technology Conference 2005 - Taipei, Taiwan
Duration: 2005 Aug 15 → 2005 Aug 16

Publication series

Name: Emerging Information Technology Conference 2005

Other: Emerging Information Technology Conference 2005

All Science Journal Classification (ASJC) codes

  • Engineering (all)

