Clustering for Web information hierarchy mining

Hung-Yu Kao, Ming Syan Chen, Jan Ming Ho

Research output: Conference contribution

Abstract

Driven by the growth of dynamic page generation techniques, the number and complexity of Web pages have increased explosively. Pages that are dynamically generated from the same templates are thus structurally similar to one another and are usually assembled from a set of fundamental information clusters. Neighboring information clusters usually carry similar semantics and form a larger cluster with more generalized information. The hierarchical structure generated by information clusters in a bottom-up manner is called the information hierarchy of a page. We study the problem of mining the information hierarchies of pages in Web sites to recognize the information distribution of pages within multilevel, multigranularity configurations. Specifically, we propose an information clustering system that applies a top-down information centroid searching algorithm and a multigranularity centroid converging process to the document object model (DOM) trees of pages to build their information hierarchies. Experiments on several real news Web sites show that the proposed method achieves high precision and recall in determining the information clusters of pages and validate its practical applicability to real Web sites.
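The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that the "information" of a DOM node can be approximated by the amount of text in its subtree, aggregates that weight bottom-up, and then walks top-down toward the child that holds most of the text. The names `Node`, `TreeBuilder`, `aggregate`, `find_centroid`, and the `threshold` parameter are all hypothetical.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """A simplified DOM node (hypothetical helper, not from the paper)."""

    def __init__(self, tag, parent=None, attrs=None):
        self.tag = tag
        self.parent = parent
        self.attrs = dict(attrs or [])
        self.children = []
        self.own_text = 0  # characters of text directly inside this node
        self.weight = 0    # characters of text in the whole subtree


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using the standard-library parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        node = Node(tag, self.current, attrs)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.own_text += len(data.strip())


def aggregate(node):
    """Bottom-up pass: subtree weight = own text + children's weights."""
    node.weight = node.own_text + sum(aggregate(c) for c in node.children)
    return node.weight


def find_centroid(node, threshold=0.6):
    """Top-down pass: follow the heaviest child until no single child holds
    at least `threshold` of the current node's weight (assumed criterion)."""
    while node.children and node.weight > 0:
        heaviest = max(node.children, key=lambda c: c.weight)
        if heaviest.weight < threshold * node.weight:
            break
        node = heaviest
    return node


html = (
    "<html><body>"
    "<div id='nav'><a>Home</a><a>News</a></div>"
    "<div id='main'><p>First story headline text</p>"
    "<p>Second story headline text</p></div>"
    "</body></html>"
)
builder = TreeBuilder()
builder.feed(html)
aggregate(builder.root)
centroid = find_centroid(builder.root)
print(centroid.tag, centroid.attrs, centroid.weight)
```

In this toy page the search stops at the `main` block, because neither of its two paragraphs dominates the other. A real system would need a richer information measure than raw text length and a way to compare clusters across pages that share a template.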

Original language: English
Title of host publication: Proceedings - IEEE/WIC International Conference on Web Intelligence, WI 2003
Editors: Jiming Liu, Nick Cercone, Matthias Klusch, Chunnian Liu, Ning Zhong
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 698-701
Number of pages: 4
ISBN (electronic): 0769519326, 9780769519326
DOI: https://doi.org/10.1109/WI.2003.1241299
Publication status: Published - 1 Jan 2003
Event: IEEE/WIC International Conference on Web Intelligence, WI 2003 - Halifax, Canada
Duration: 13 Oct 2003 to 17 Oct 2003

Publication series

Name: Proceedings - IEEE/WIC International Conference on Web Intelligence, WI 2003

Other

Other: IEEE/WIC International Conference on Web Intelligence, WI 2003
Country: Canada
City: Halifax
Period: 13 Oct 2003 to 17 Oct 2003

Fingerprint

Websites
Semantics
World Wide Web
Clustering
Experiments

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Information Systems
  • Computer Networks and Communications
  • Human-Computer Interaction
  • Information Systems and Management

Cite this

Kao, H-Y., Chen, M. S., & Ho, J. M. (2003). Clustering for Web information hierarchy mining. In J. Liu, N. Cercone, M. Klusch, C. Liu, & N. Zhong (Eds.), Proceedings - IEEE/WIC International Conference on Web Intelligence, WI 2003 (pp. 698-701). [1241299] (Proceedings - IEEE/WIC International Conference on Web Intelligence, WI 2003). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/WI.2003.1241299
Kao, Hung-Yu ; Chen, Ming Syan ; Ho, Jan Ming. / Clustering for Web information hierarchy mining. Proceedings - IEEE/WIC International Conference on Web Intelligence, WI 2003. editor / Jiming Liu ; Nick Cercone ; Matthias Klusch ; Chunnian Liu ; Ning Zhong. Institute of Electrical and Electronics Engineers Inc., 2003. pp. 698-701 (Proceedings - IEEE/WIC International Conference on Web Intelligence, WI 2003).
@inproceedings{2c68c7fa036d4eb7bd069e6ae6376e9a,
title = "Clustering for Web information hierarchy mining",
abstract = "Benefiting from the growth of techniques of dynamic page generation, the amount and the complexity of Web pages increase explosively. The structures of Web pages which are dynamically generated by the same templates are thus similar to one another and are usually assembled by a set of fundamental information clusters. These neighboring information clusters usually represent the similar semantics and form a larger cluster with the more generalized information. The hierarchical structure generated by information clusters in a bottom-up manner is called the information hierarchy of a page. We study the problem of mining the information hierarchies of pages in Web sites to recognize the information distribution of pages within the multilevel, multigranularity configurations. Explicitly, we propose an information clustering system that applies a top-down information centroid searching algorithm and a multigranularity centroid converging process on the document object model (DOM) trees of pages to build the information hierarchies of pages. Experiments on several real news Web sites show the high precision and recall rates of the proposed method on determining information clusters of pages and also validate its practical applicability to real Web sites.",
author = "Hung-Yu Kao and Chen, {Ming Syan} and Ho, {Jan Ming}",
year = "2003",
month = "1",
day = "1",
doi = "10.1109/WI.2003.1241299",
language = "English",
series = "Proceedings - IEEE/WIC International Conference on Web Intelligence, WI 2003",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "698--701",
editor = "Jiming Liu and Nick Cercone and Matthias Klusch and Chunnian Liu and Ning Zhong",
booktitle = "Proceedings - IEEE/WIC International Conference on Web Intelligence, WI 2003",
address = "United States",

}

Kao, H-Y, Chen, MS & Ho, JM 2003, Clustering for Web information hierarchy mining. in J Liu, N Cercone, M Klusch, C Liu & N Zhong (eds), Proceedings - IEEE/WIC International Conference on Web Intelligence, WI 2003., 1241299, Proceedings - IEEE/WIC International Conference on Web Intelligence, WI 2003, Institute of Electrical and Electronics Engineers Inc., pp. 698-701, IEEE/WIC International Conference on Web Intelligence, WI 2003, Halifax, Canada, 13 Oct 2003. https://doi.org/10.1109/WI.2003.1241299

Clustering for Web information hierarchy mining. / Kao, Hung-Yu; Chen, Ming Syan; Ho, Jan Ming.

Proceedings - IEEE/WIC International Conference on Web Intelligence, WI 2003. editor / Jiming Liu; Nick Cercone; Matthias Klusch; Chunnian Liu; Ning Zhong. Institute of Electrical and Electronics Engineers Inc., 2003. p. 698-701 1241299 (Proceedings - IEEE/WIC International Conference on Web Intelligence, WI 2003).

TY - GEN

T1 - Clustering for Web information hierarchy mining

AU - Kao, Hung-Yu

AU - Chen, Ming Syan

AU - Ho, Jan Ming

PY - 2003/1/1

Y1 - 2003/1/1

N2 - Benefiting from the growth of techniques of dynamic page generation, the amount and the complexity of Web pages increase explosively. The structures of Web pages which are dynamically generated by the same templates are thus similar to one another and are usually assembled by a set of fundamental information clusters. These neighboring information clusters usually represent the similar semantics and form a larger cluster with the more generalized information. The hierarchical structure generated by information clusters in a bottom-up manner is called the information hierarchy of a page. We study the problem of mining the information hierarchies of pages in Web sites to recognize the information distribution of pages within the multilevel, multigranularity configurations. Explicitly, we propose an information clustering system that applies a top-down information centroid searching algorithm and a multigranularity centroid converging process on the document object model (DOM) trees of pages to build the information hierarchies of pages. Experiments on several real news Web sites show the high precision and recall rates of the proposed method on determining information clusters of pages and also validate its practical applicability to real Web sites.

UR - http://www.scopus.com/inward/record.url?scp=84945218281&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84945218281&partnerID=8YFLogxK

U2 - 10.1109/WI.2003.1241299

DO - 10.1109/WI.2003.1241299

M3 - Conference contribution

AN - SCOPUS:84945218281

T3 - Proceedings - IEEE/WIC International Conference on Web Intelligence, WI 2003

SP - 698

EP - 701

BT - Proceedings - IEEE/WIC International Conference on Web Intelligence, WI 2003

A2 - Liu, Jiming

A2 - Cercone, Nick

A2 - Klusch, Matthias

A2 - Liu, Chunnian

A2 - Zhong, Ning

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Kao H-Y, Chen MS, Ho JM. Clustering for Web information hierarchy mining. In Liu J, Cercone N, Klusch M, Liu C, Zhong N, editors, Proceedings - IEEE/WIC International Conference on Web Intelligence, WI 2003. Institute of Electrical and Electronics Engineers Inc. 2003. p. 698-701. 1241299. (Proceedings - IEEE/WIC International Conference on Web Intelligence, WI 2003). https://doi.org/10.1109/WI.2003.1241299