DOMISA: DOM-based information space adsorption for web information hierarchy mining

Hung Yu Kao, Jan Ming Ho, Ming Syan Chen

Research output: Paper

2 Citations (Scopus)

Abstract

Due to the growth of dynamic page generation techniques, the amount and the complexity of Web pages have been increasing explosively, as has the information contained within Web pages. Redundant and irrelevant information is distributed and mixed throughout a page, making it difficult to automatically identify the useful information in that page. Consequently, we propose an information hierarchy in this paper, and, from that hierarchy, we can extract the significance and the relationship value of information contained within a Web page. We can then use this hierarchical structure to create a new browsing process. Our DOM-based Information Space Adsorption (DOMISA) system applies information theory to map information in a page into an information space, and our gradient tree adsorption (GTA) process uses the document object model (DOM) trees of pages to build information hierarchies. Experiments on several commercial news Web sites show high precision and recall rates achieved by DOMISA in determining information clusters of pages, which validates its practical applicability to Web sites.
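
The abstract does not spell out the DOMISA mapping or the GTA process, so the following is only an illustrative sketch of the general idea of scoring DOM subtrees with an information-theoretic measure. It assumes Shannon entropy over term frequencies as that measure, which is a stand-in and not necessarily what the paper uses. All names (`Node`, `TreeBuilder`, `entropy_by_node`, `parse`) are hypothetical.

```python
import math
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag; treated as leaves (partial list).
VOID = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """A minimal DOM-like node: tag name, children, and directly owned terms."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.terms = []


class TreeBuilder(HTMLParser):
    """Builds a simplified tree; assumes reasonably well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node  # descend only into elements that can nest

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: add a leaf, stay at the same level.
        self.current.children.append(Node(tag, self.current))

    def handle_endtag(self, tag):
        if tag not in VOID and self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.terms.extend(data.lower().split())


def parse(html):
    """Parse an HTML string and return the root of the simplified tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def subtree_terms(node):
    """Collect every term found in the subtree rooted at node."""
    terms = list(node.terms)
    for child in node.children:
        terms.extend(subtree_terms(child))
    return terms


def shannon_entropy(terms):
    """Shannon entropy, in bits, of the empirical term distribution."""
    counts = Counter(terms)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_by_node(node, depth=0, out=None):
    """Return (depth, tag, entropy) for each node in pre-order."""
    out = [] if out is None else out
    out.append((depth, node.tag, shannon_entropy(subtree_terms(node))))
    for child in node.children:
        entropy_by_node(child, depth + 1, out)
    return out


if __name__ == "__main__":
    page = "<div><p>a a a a</p><p>a b c d</p></div>"
    for depth, tag, h in entropy_by_node(parse(page)):
        print("  " * depth + f"{tag}: {h:.3f} bits")
```

In this toy page, a block that repeats one term scores zero bits while a block of four distinct terms scores two bits, which hints at how a per-subtree measure could separate repetitive template content from varied content. How DOMISA actually defines its information space and how GTA merges nodes would have to come from the paper itself.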

Original language: English
Pages: 312-320
Number of pages: 9
Publication status: Published - 1 Jan 2004
Event: Proceedings of the Fourth SIAM International Conference on Data Mining - Lake Buena Vista, FL, United States
Duration: 22 Apr 2004 to 24 Apr 2004

Other

Other: Proceedings of the Fourth SIAM International Conference on Data Mining
Country: United States
City: Lake Buena Vista, FL
Period: 22 Apr 2004 to 24 Apr 2004

Fingerprint

Object Model
Adsorption
Mining
Model-based
Hierarchy
Value of Information
Browsing
Information Theory
Hierarchical Structure
Systems Theory
Gradient

All Science Journal Classification (ASJC) codes

  • Mathematics(all)

Cite this

Kao, H. Y., Ho, J. M., & Chen, M. S. (2004). DOMISA: DOM-based information space adsorption for web information hierarchy mining. 312-320. Paper presented at Proceedings of the Fourth SIAM International Conference on Data Mining, Lake Buena Vista, FL, United States.
@conference{0dc1bcb1c28045088d913e221b7ff255,
title = "DOMISA: DOM-based information space adsorption for web information hierarchy mining",
author = "Kao, {Hung Yu} and Ho, {Jan Ming} and Chen, {Ming Syan}",
year = "2004",
month = "1",
day = "1",
language = "English",
pages = "312--320",
note = "Proceedings of the Fourth SIAM International Conference on Data Mining ; Conference date: 22-04-2004 Through 24-04-2004",

}

TY - CONF

T1 - DOMISA

T2 - DOM-based information space adsorption for web information hierarchy mining

AU - Kao, Hung Yu

AU - Ho, Jan Ming

AU - Chen, Ming Syan

PY - 2004/1/1

Y1 - 2004/1/1

UR - http://www.scopus.com/inward/record.url?scp=2942525697&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=2942525697&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:2942525697

SP - 312

EP - 320

ER -
