Entropy-based visual tree evaluation on block extraction

Wei Ting Cho, Yu Min Lin, Hung-Yu Kao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

More and more people use Cascading Style Sheets (CSS) to manage their Web pages, because CSS makes typesetting easy and convenient. However, CSS can give a Web page an ambiguous structure. Data extraction systems that are based on mining the Web page structure can therefore make false judgments on these CSS-rich pages. To solve this issue, we propose a system that exploits properties of CSS Web pages to extract data blocks. In this system, Web pages are converted into a visual tree, and entropy attributes are calculated for each node in the tree. The experimental results show that the node attributes and the visual tree are useful for extracting blocks from CSS Web pages. Our system also outperforms other systems on container block extraction.
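The abstract does not spell out which entropy attributes are computed for each visual-tree node. As a minimal sketch only, assuming each node is summarized by a hypothetical "signature" (its tag name plus CSS class) and that a node's entropy is the Shannon entropy H = -Σ p_i log2 p_i of its children's signatures, the idea can be illustrated as follows. Under this assumption, a node whose children all look alike (for example, a container of repeated records) scores 0, while a node with heterogeneous children scores higher. The names `VisualNode`, `signature`, `child_entropy`, and `annotate` are illustrative and not taken from the paper, and the tree here is built by hand rather than from a rendered page.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class VisualNode:
    """Hypothetical visual-tree node: a rendered element with its tag and CSS class."""
    tag: str
    css_class: str = ""
    children: list[VisualNode] = field(default_factory=list)

    def signature(self) -> str:
        # Assumed visual signature: tag name plus CSS class (not the paper's definition).
        return f"{self.tag}.{self.css_class}"


def shannon_entropy(labels: list[str]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of labels."""
    if not labels:
        return 0.0
    counts = Counter(labels)
    total = len(labels)
    # p * log2(1/p) form avoids returning -0.0 when every label is identical.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def child_entropy(node: VisualNode) -> float:
    """Entropy of the signatures of a node's direct children."""
    return shannon_entropy([child.signature() for child in node.children])


def annotate(node: VisualNode, path: str = "0") -> list[tuple[str, float]]:
    """Pre-order walk returning (path, child-signature entropy) for every internal node."""
    results: list[tuple[str, float]] = []
    if node.children:
        results.append((path, child_entropy(node)))
    for i, child in enumerate(node.children):
        results.extend(annotate(child, f"{path}.{i}"))
    return results


if __name__ == "__main__":
    # Toy page: a container of four identically styled items next to a mixed sidebar.
    items = [VisualNode("div", "item") for _ in range(4)]
    container = VisualNode("div", "list", items)
    sidebar = VisualNode("div", "side", [VisualNode("h2"), VisualNode("p"), VisualNode("img")])
    page = VisualNode("body", "", [container, sidebar])
    for path, h in annotate(page):
        print(path, round(h, 3))
```

Run as a script, this prints `0 1.0`, `0.0 0.0`, and `0.1 1.585`: the repeated-item container has zero entropy, which is the kind of signal a container-block detector could look for. The choice of signature and the use of child-level entropy are assumptions made for illustration; the paper's own attributes may differ.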

Original language: English
Title of host publication: Proceedings - 2009 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2009
Pages: 580-583
Number of pages: 4
DOI: 10.1109/WI-IAT.2009.98
Publication status: Published - 2009 Dec 1
Event: 2009 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2009 - Milano, Italy
Duration: 2009 Sep 15 to 2009 Sep 18

Publication series

Name: Proceedings - 2009 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2009
Volume: 1

Other

Other: 2009 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2009
Country: Italy
City: Milano
Period: 09-09-15 to 09-09-18

Fingerprint

Websites
Entropy
Typesetting
Containers
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Networks and Communications

Cite this

Cho, W. T., Lin, Y. M., & Kao, H-Y. (2009). Entropy-based visual tree evaluation on block extraction. In Proceedings - 2009 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2009 (pp. 580-583). [5286011] (Proceedings - 2009 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2009; Vol. 1). https://doi.org/10.1109/WI-IAT.2009.98
@inproceedings{a464cfa45a314e7d9d25e2860e486b6d,
title = "Entropy-based visual tree evaluation on block extraction",
author = "Cho, {Wei Ting} and Lin, {Yu Min} and Hung-Yu Kao",
year = "2009",
month = "12",
day = "1",
doi = "10.1109/WI-IAT.2009.98",
language = "English",
isbn = "9780769538013",
series = "Proceedings - 2009 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2009",
pages = "580--583",
booktitle = "Proceedings - 2009 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2009",

}