TY - JOUR
T1 - A novel topic selection algorithm based on word distribution
AU - Tsai, Chun Wei
AU - Huang, Ko Wei
AU - Hsu, Heng Yao
AU - Chiang, Ming Chao
AU - Yang, Chu Sing
PY - 2010/4
Y1 - 2010/4
N2 - Over the past decade, more and more Internet users have come to rely on search engines to find the information they need. However, the information they find depends, to a large extent, on the ranking mechanism of the search engine they use. Not surprisingly, the results generally contain a large amount of completely irrelevant information. To help Internet users quickly find the information they are looking for, an efficient algorithm called DiSco (Distribution Scoring) is proposed for building summaries of a collection of documents returned by a search engine in response to a user query. The main idea in the design of DiSco is to balance the rate of coverage and overlap in the selection of topic words for document summarization, especially when the datasets are large. Moreover, several measures, such as coverage, overlap, and computation time, are employed to evaluate the performance of the proposed algorithm. All our simulation results indicate that the proposed algorithm outperforms all the state-of-the-art algorithms evaluated in terms of not only the quality of the summarizations but also the computation time.
AB - Over the past decade, more and more Internet users have come to rely on search engines to find the information they need. However, the information they find depends, to a large extent, on the ranking mechanism of the search engine they use. Not surprisingly, the results generally contain a large amount of completely irrelevant information. To help Internet users quickly find the information they are looking for, an efficient algorithm called DiSco (Distribution Scoring) is proposed for building summaries of a collection of documents returned by a search engine in response to a user query. The main idea in the design of DiSco is to balance the rate of coverage and overlap in the selection of topic words for document summarization, especially when the datasets are large. Moreover, several measures, such as coverage, overlap, and computation time, are employed to evaluate the performance of the proposed algorithm. All our simulation results indicate that the proposed algorithm outperforms all the state-of-the-art algorithms evaluated in terms of not only the quality of the summarizations but also the computation time.
UR - https://www.scopus.com/pages/publications/77951789458
UR - https://www.scopus.com/pages/publications/77951789458#tab=citedBy
M3 - Article
AN - SCOPUS:77951789458
SN - 1349-4198
VL - 6
SP - 1843
EP - 1864
JO - International Journal of Innovative Computing, Information and Control
JF - International Journal of Innovative Computing, Information and Control
IS - 4
ER -