TY - GEN
T1 - Web information hierarchy and importance mining based on DOM information distillation
AU - Feng, Tseng Yi
AU - Kao, Hung-Yu
PY - 2005/12/1
Y1 - 2005/12/1
N2 - The Web provides people with a convenient way to disseminate and search for information. Due to the growth of dynamic page generation techniques, the number and complexity of Web pages have been increasing explosively, as has the information contained within them. Redundant information is distributed throughout a page, making it difficult to automatically identify the useful information in that page. In this paper, we propose and implement a simple Web importance extraction and labeling system based on the analysis of the content information and vision information of a Web page. We apply information theory to the document object model (DOM) trees of pages and extract the vision information for each block to evaluate its importance. Results show that our system effectively extracts and labels the importance of a page and provides a powerful surfing interface for browsing on small-display devices. Experiments on several Web sites show high performance in meeting users' information focus.
AB - The Web provides people with a convenient way to disseminate and search for information. Due to the growth of dynamic page generation techniques, the number and complexity of Web pages have been increasing explosively, as has the information contained within them. Redundant information is distributed throughout a page, making it difficult to automatically identify the useful information in that page. In this paper, we propose and implement a simple Web importance extraction and labeling system based on the analysis of the content information and vision information of a Web page. We apply information theory to the document object model (DOM) trees of pages and extract the vision information for each block to evaluate its importance. Results show that our system effectively extracts and labels the importance of a page and provides a powerful surfing interface for browsing on small-display devices. Experiments on several Web sites show high performance in meeting users' information focus.
UR - http://www.scopus.com/inward/record.url?scp=33751172119&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33751172119&partnerID=8YFLogxK
U2 - 10.1109/EITC.2005.1544346
DO - 10.1109/EITC.2005.1544346
M3 - Conference contribution
AN - SCOPUS:33751172119
SN - 0780393295
SN - 9780780393295
T3 - Emerging Information Technology Conference 2005
SP - 64
EP - 67
BT - Emerging Information Technology Conference 2005
T2 - Emerging Information Technology Conference 2005
Y2 - 15 August 2005 through 16 August 2005
ER -