Extracting classification knowledge of Internet documents with mining term associations: A semantic approach

Shian Hua Lin, Chi Sheng Shih, Meng Chang Chen, Jan Ming Ho, Ming Tat Ko, Yueh-Min Huang

Research output: Contribution to journal › Conference article

55 Citations (Scopus)

Abstract

In this paper, we present a system that extracts and generalizes terms from Internet documents to represent the classification knowledge of a given class hierarchy. We propose a measure, called support, that evaluates the importance of a term with respect to a class in the class hierarchy. Given a threshold, terms with high support are selected as keywords of a class, and terms with low support are filtered out. To further enhance the recall of this approach, the Mining Association Rules technique is applied to mine the associations between terms. An inference model is composed of these association relations and the previously computed supports of the terms in the class. To increase the recall rate of the keyword selection process, we then present a polynomial-time inference algorithm that promotes a term strongly associated with a known keyword to a keyword. Our experimental results on Internet documents collected from the Yam search engine show that the proposed methods help refine the classification knowledge and increase the recall of keyword selection.
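
The pipeline outlined in the abstract (score terms by support, keep those above a threshold, then promote strongly associated terms) can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give the formulas, so everything specific here is an assumption: support is taken to be the fraction of a class's documents containing the term, association strength is the standard association-rule confidence, and the promotion rule is a simple fixed-point loop. The function names (`term_support`, `association_confidence`, `select_keywords`), the thresholds, and the toy documents are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def term_support(docs):
    """Assumed definition: fraction of the class's documents containing each term."""
    counts = defaultdict(int)
    for doc in docs:
        for term in set(doc):
            counts[term] += 1
    return {term: n / len(docs) for term, n in counts.items()}


def association_confidence(docs):
    """Standard rule confidence: conf(a -> b) = P(b in doc | a in doc)."""
    single = defaultdict(int)
    pair = defaultdict(int)
    for doc in docs:
        terms = set(doc)
        for term in terms:
            single[term] += 1
        for a, b in permutations(terms, 2):
            pair[(a, b)] += 1
    return {(a, b): n / single[a] for (a, b), n in pair.items()}


def select_keywords(docs, min_support, min_conf):
    """Threshold on support, then promote terms strongly associated with keywords."""
    support = term_support(docs)
    conf = association_confidence(docs)
    keywords = {t for t, s in support.items() if s >= min_support}
    # Assumed promotion rule: repeat until no new term can be promoted.
    # Each pass adds at least one term or stops, so the loop is polynomial
    # in the vocabulary size.
    changed = True
    while changed:
        changed = False
        for (a, b), c in conf.items():
            if a in keywords and b not in keywords and c >= min_conf:
                keywords.add(b)
                changed = True
    return keywords


# Toy documents for a hypothetical "finance" class, each a list of terms.
docs = [
    ["stock", "market", "fund"],
    ["stock", "market"],
    ["stock", "bond"],
    ["market", "fund", "stock"],
]

print(sorted(select_keywords(docs, min_support=0.7, min_conf=0.6)))
# prints ['fund', 'market', 'stock']
```

In this toy run, "fund" has support 0.5 and fails the 0.7 threshold, but it is promoted because it co-occurs with the keyword "market" in two of that keyword's three documents. This is the recall gain the abstract describes. The paper's inference model also combines the computed supports with the association relations, which this sketch omits.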

Original language: English
Pages (from-to): 241-249
Number of pages: 9
Journal: SIGIR Forum (ACM Special Interest Group on Information Retrieval)
Publication status: Published - 1998 Dec 1
Event: Proceedings of the 1998 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'98) - Melbourne, Vic., Aust
Duration: 1998 Aug 24 to 1998 Aug 28

All Science Journal Classification (ASJC) codes

  • Management Information Systems
  • Hardware and Architecture
