A GA-based document clustering method for search engines

Chun Wei Tsai, Ming Chao Chiang, Chu Sing Yang

Research output: Article › peer-review

2 Citations (Scopus)


In this paper, we present a novel genetic algorithm, called Multiple Search Genetic Algorithm (MSGA), for clustering the web pages returned by a search engine and providing the user with a taxonomy of those pages. MSGA uses two kinds of chromosomes, conservative and explorer, to improve its search capability and the quality of the resulting clusters. The conservative chromosomes keep the best solutions found at each generation, while the explorer chromosomes add search directions to avoid getting trapped in local minima. Through this multiple search strategy, the proposed method can find optimal solutions quickly. Our simulation results show that the proposed algorithm outperforms the other algorithms compared.

We also present a clustering search engine system, called Document Clustering Search Engine (DCSE). DCSE is responsible for spawning agents that collect web pages from a meta-search engine and for computing the similarity between those pages. Users of the system receive computed, sorted information together with web links ranked by relevance. As a result, the time required to filter out irrelevant information is greatly reduced.
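The abstract does not specify MSGA's encoding, operators, or fitness function, so the following is only a rough sketch of the conservative/explorer idea under stated assumptions, not the published algorithm. It assumes: documents are term-frequency vectors; page similarity is cosine similarity (a common choice for the similarity step DCSE performs, but an assumption here); a chromosome assigns each document to one of `k` clusters; and fitness is the mean cosine similarity of each document to its cluster centroid. Conservative chromosomes are modeled as an elitist pool refined by crossover and light mutation; explorer chromosomes are modeled as heavily mutated and partly re-randomized every generation. All names and rates (`msga`, `fitness`, `n_cons`, `n_expl`, `0.3`, `0.02`) are hypothetical.

```python
import math
import random

# Hypothetical sketch: NOT the published MSGA, only an illustration of the
# conservative/explorer idea described in the abstract.


def cosine(a, b):
    """Cosine similarity of two term-frequency vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def fitness(assign, docs, k):
    """Assumed objective: mean cosine similarity of each document to its cluster centroid."""
    dim = len(docs[0])
    sums = [[0.0] * dim for _ in range(k)]
    for doc, c in zip(docs, assign):
        for j, x in enumerate(doc):
            sums[c][j] += x
    # Cosine is scale-invariant, so the summed vector points the same way as the mean centroid.
    return sum(cosine(doc, sums[c]) for doc, c in zip(docs, assign)) / len(docs)


def mutate(chrom, rate, k, rng):
    """Reassign each gene (document) to a random cluster with probability `rate`."""
    return [rng.randrange(k) if rng.random() < rate else g for g in chrom]


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def msga(docs, k, n_cons=10, n_expl=10, gens=100, seed=0):
    rng = random.Random(seed)
    n = len(docs)

    def fit(chrom):
        return fitness(chrom, docs, k)

    def rand_chrom():
        return [rng.randrange(k) for _ in range(n)]

    cons = [rand_chrom() for _ in range(n_cons)]  # conservative: keep good solutions
    expl = [rand_chrom() for _ in range(n_expl)]  # explorer: widen the search
    best = max(cons + expl, key=fit)

    for _ in range(gens):
        # Explorers take large random steps; the worst one is re-randomized
        # each generation to keep opening new search directions.
        expl = sorted((mutate(c, 0.3, k, rng) for c in expl), key=fit)
        expl[0] = rand_chrom()

        # Conservatives are refined by crossover with either group,
        # followed by light mutation (a local search around good solutions).
        pool = cons + expl
        children = []
        for _ in range(n_cons):
            a, b = rng.sample(pool, 2)
            children.append(mutate(crossover(a, b, rng), 0.02, k, rng))

        # Elitist survival: the best n_cons chromosomes of this generation
        # (old conservatives, children, explorers) become the new conservatives.
        cons = sorted(cons + children + expl, key=fit, reverse=True)[:n_cons]
        if fit(cons[0]) > fit(best):
            best = cons[0]

    return best, fit(best)


if __name__ == "__main__":
    # Four toy documents over a 3-term vocabulary.
    docs = [[3, 0, 0], [2, 1, 0], [0, 0, 4], [0, 1, 3]]
    labels, score = msga(docs, k=2)
    print(labels, round(score, 4))
```

In this sketch, explorers feed the conservative pool through elitist selection, so a good solution found by an explorer is never lost; how the published MSGA actually exchanges information between the two groups is not stated in the abstract.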

Pages (from-to): 375-383
Journal: Journal of Internet Technology
Publication status: Published - Oct 2008

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Networks and Communications

