A GA-based document clustering method for search engines

Chun Wei Tsai, Ming Chao Chiang, Chu Sing Yang

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

In this paper, we present a novel genetic algorithm, called Multiple Search Genetic Algorithm (MSGA), for clustering the web pages returned by a search engine and providing a taxonomy of those web pages to the user. MSGA uses two different kinds of chromosomes (conservative and explorer) to improve the search capability and enhance the clustering result. The conservative chromosomes keep the better solutions found at each generation, while the explorer chromosomes broaden the search directions to avoid falling into local minima. The proposed method can find the optimal solutions quickly via a multiple search strategy. Our simulation results show that the proposed algorithm outperforms the other algorithms tested. We also present a clustering search engine system, called Document Clustering Search Engine (DCSE). DCSE is responsible for spawning agents that collect the web pages from the meta-search engine and for computing the similarity between the web pages. The user of the system receives information that has been computed and sorted, along with web links ranked according to their relevance. As a result, the time required to filter out irrelevant information is greatly reduced.
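
The division of labor between conservative and explorer chromosomes can be illustrated with a minimal Python sketch. This is not the authors' MSGA, whose encoding, fitness function, and operators are not given in the abstract. Everything here is an assumption made for illustration: documents are small numeric vectors, a chromosome assigns one cluster label per document, fitness is the within-cluster sum of squared distances, and the function names, the half-and-half population split, and the explorer mutation rate of 0.2 are hypothetical.

```python
import random

def sse(points, labels, k):
    """Within-cluster sum of squared distances (assumed fitness; lower is better)."""
    dims = len(points[0])
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dims))
    return total

def crossover(a, b, rng):
    """One-point crossover of two label vectors (requires at least two documents)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(labels, k, rate, rng):
    """Reassign each document to a random cluster with probability `rate`."""
    return [rng.randrange(k) if rng.random() < rate else lab for lab in labels]

def two_role_ga(points, k, pop_size=20, generations=60, seed=0):
    """Hypothetical GA with a conservative half and an explorer half."""
    rng = random.Random(seed)
    n = len(points)
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    half = pop_size // 2
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda ch: sse(points, ch, k))
        history.append(sse(points, pop[0], k))
        conservative = pop[:half]  # keep the better solutions unchanged
        explorers = []             # heavily mutated offspring widen the search
        for _ in range(pop_size - half):
            child = crossover(rng.choice(conservative), rng.choice(pop), rng)
            explorers.append(mutate(child, k, 0.2, rng))
        pop = conservative + explorers
    best = min(pop, key=lambda ch: sse(points, ch, k))
    history.append(sse(points, best, k))
    return best, history

if __name__ == "__main__":
    # Toy "documents": two loose groups of 2-D vectors (made-up data).
    docs = [(5.0, 0.0), (4.0, 1.0), (5.0, 1.0), (0.0, 5.0), (1.0, 4.0), (0.0, 4.0)]
    best, history = two_role_ga(docs, k=2)
    print(best, history[-1])
```

Because the conservative half is carried over unchanged, the best fitness recorded in `history` can never get worse from one generation to the next, while the explorer half keeps injecting new label assignments that may escape a poor local solution.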

Original language: English
Pages (from-to): 375-383
Number of pages: 9
Journal: Journal of Internet Technology
Volume: 9
Issue number: 4
Publication status: Published - 2008 Oct 1

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Networks and Communications
