Web information hierarchy and importance mining based on DOM information distillation

Tseng Yi Feng, Hung-Yu Kao

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The Web gives people a convenient way to disseminate and search for information. With the growth of dynamic page generation techniques, the number and complexity of Web pages have increased explosively, as has the information contained within them. Redundant information is distributed throughout a page, making it difficult to automatically identify the useful information in that page. In this paper, we propose and implement a simple Web importance extraction and labeling system based on analysis of the content information and vision information of a Web page. We apply information theory to the document object model (DOM) trees of pages and extract the vision information of each block to evaluate its importance. Results show that our system effectively extracts and labels the importance of a page and provides a powerful surfing interface for browsing on small display devices. Experiments on several Web sites show that the system performs well at matching users' information focus.
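As a rough illustration of what applying an information-theoretic measure to DOM blocks can look like, the following minimal sketch scores each block by the Shannon entropy of its term distribution. It is not the authors' method: the abstract does not specify how blocks are segmented, which measure is used, or how vision information is combined. Treating each outermost `<div>` as a block, the tokenizer, and the names `BlockCollector` and `block_entropies` are all assumptions made for this example.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


def shannon_entropy(tokens):
    """Shannon entropy, in bits, of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over terms of p * log2(1 / p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


class BlockCollector(HTMLParser):
    """Collects the text of each outermost <div> as one block.

    Assumption: an outermost <div> stands in for a DOM block; the
    paper's actual segmentation is not described in the abstract.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0      # current <div> nesting depth
        self.blocks = []    # one list of text fragments per block

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.depth == 0:
                self.blocks.append([])
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.blocks[-1].append(data)


def block_entropies(html):
    """Return one entropy score per outermost <div>, in document order."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    scores = []
    for parts in parser.blocks:
        tokens = re.findall(r"[a-z0-9]+", " ".join(parts).lower())
        scores.append(shannon_entropy(tokens))
    return scores


page = "<div>home home home home</div><div>entropy ranks dom blocks</div>"
scores = block_entropies(page)
```

In this toy page, the first block repeats a single term and the second contains four distinct terms, so the second block receives the higher score. How such scores would map to importance labels is left open here, since the abstract does not say.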

Original language: English
Title of host publication: Emerging Information Technology Conference 2005
Pages: 64-67
Number of pages: 4
DOIs: https://doi.org/10.1109/EITC.2005.1544346
Publication status: Published - 2005 Dec 1
Event: Emerging Information Technology Conference 2005 - Taipei, Taiwan
Duration: 2005 Aug 15 to 2005 Aug 16

Publication series

Name: Emerging Information Technology Conference 2005
Volume: 2005

Other

Other: Emerging Information Technology Conference 2005
Country: Taiwan
City: Taipei
Period: 05-08-15 to 05-08-16

Fingerprint

Distillation
Websites
Labeling
Information theory
Display devices
Experiments

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Feng, T. Y., & Kao, H-Y. (2005). Web information hierarchy and importance mining based on DOM information distillation. In Emerging Information Technology Conference 2005 (pp. 64-67). [1544346] (Emerging Information Technology Conference 2005; Vol. 2005). https://doi.org/10.1109/EITC.2005.1544346
@inproceedings{7b38d107a86943d69a25e702c07749d3,
title = "Web information hierarchy and importance mining based on DOM information distillation",
author = "Feng, {Tseng Yi} and Hung-Yu Kao",
year = "2005",
month = "12",
day = "1",
doi = "10.1109/EITC.2005.1544346",
language = "English",
isbn = "0780393295",
series = "Emerging Information Technology Conference 2005",
pages = "64--67",
booktitle = "Emerging Information Technology Conference 2005",

}

TY - GEN

T1 - Web information hierarchy and importance mining based on DOM information distillation

AU - Feng, Tseng Yi

AU - Kao, Hung-Yu

PY - 2005/12/1

Y1 - 2005/12/1

UR - http://www.scopus.com/inward/record.url?scp=33751172119&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33751172119&partnerID=8YFLogxK

U2 - 10.1109/EITC.2005.1544346

DO - 10.1109/EITC.2005.1544346

M3 - Conference contribution

AN - SCOPUS:33751172119

SN - 0780393295

SN - 9780780393295

T3 - Emerging Information Technology Conference 2005

SP - 64

EP - 67

BT - Emerging Information Technology Conference 2005

ER -
