Entropy-based visual tree evaluation on block extraction

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

More and more people use Cascading Style Sheets (CSS) to manage their Web pages, because CSS makes typesetting easy and convenient. However, CSS causes a Web page to be displayed with an ambiguous structure. Data extraction systems that are based on mining the Web page structure can therefore generate false judgments for these CSS-rich pages. To solve this issue, we propose a system that applies properties of CSS Web pages to extract data blocks. In this system, Web pages are converted into a visual tree, and the entropy attributes of each node in the visual tree are calculated. The experimental results show that the node attributes and the visual tree are useful for extracting blocks on CSS Web pages. Our system also outperforms other systems on container block extraction.
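
The abstract does not define the entropy attributes, so the following is only a minimal sketch under stated assumptions: it treats one possible attribute as the Shannon entropy of the labels of a node's direct children. The `Node` class, the `label` field, and the function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical visual-tree node: a label plus its child nodes."""
    label: str
    children: list = field(default_factory=list)


def shannon_entropy(items):
    """Shannon entropy, in bits, of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1/p), summed over the distinct values
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def child_label_entropy(node):
    """Assumed attribute: entropy of the labels of a node's direct children."""
    return shannon_entropy(child.label for child in node.children)


# A container of repeated records versus a page-layout container.
records = Node("list", [Node("row") for _ in range(4)])
layout = Node("page", [Node(name) for name in ("header", "nav", "content", "footer")])

print(child_label_entropy(records))
print(child_label_entropy(layout))
```

Under this assumed definition, a node whose children all share one label scores zero, while a node with varied children scores higher, which is one way such an attribute could help separate regular data blocks from layout containers.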

Original language: English
Title of host publication: Proceedings - 2009 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2009
Pages: 580-583
Number of pages: 4
Publication status: Published - 2009 Dec 1
Event: 2009 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2009 - Milano, Italy
Duration: 2009 Sep 15 to 2009 Sep 18

Publication series

Name: Proceedings - 2009 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2009
Volume: 1

Other

Other: 2009 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2009
Country: Italy
City: Milano
Period: 09-09-15 to 09-09-18

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Networks and Communications
