Haplotypes incorporate more information about the underlying polymorphisms than do genotypes for individual SNPs, and are considered a more informative data format for association analysis. Modeling haplotypes, however, requires many degrees of freedom, which can decrease power and limit a model's capacity to incorporate other complex effects, such as gene-gene interactions. Even within haplotype blocks, high degrees of freedom remain a concern unless one chooses to discard rare haplotypes. To increase the efficiency and power of haplotype analysis, we adapt the evolutionary concepts of cladistic analysis and propose a grouping algorithm that clusters rare haplotypes with their corresponding ancestral haplotypes. The algorithm determines the cluster bases by preserving common haplotypes, using a criterion built on the Shannon information content. Each haplotype is then assigned probabilistically to its appropriate clusters according to the cladistic relationship. Using this algorithm, we perform association analysis based on groups of haplotypes. Simulation results indicate that tests on the haplotype clusters have higher power than tests using the original haplotypes or the truncated haplotype distribution.
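The grouping idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the information criterion or the cladistic assignment rule, so both are replaced here by labeled assumptions. The cluster bases are chosen with a hypothetical surprisal threshold (`max_bits`), and each rare haplotype is split among its nearest cluster bases (by Hamming distance) in proportion to their frequencies. The function names and the toy frequencies are illustrative only.

```python
import math
from collections import defaultdict


def hamming(a, b):
    """Number of sites at which two equal-length haplotype strings differ."""
    return sum(x != y for x, y in zip(a, b))


def choose_core(freqs, max_bits):
    """Pick cluster bases: haplotypes whose Shannon information content
    (surprisal, -log2 p) is at most max_bits.
    ASSUMPTION: this threshold rule is a stand-in for the paper's criterion."""
    return {h for h, p in freqs.items() if -math.log2(p) <= max_bits}


def assign(freqs, core):
    """Return, for each haplotype, its probabilistic weights over cluster bases.
    ASSUMPTION: a rare haplotype is shared among the nearest bases (minimum
    Hamming distance) in proportion to the frequencies of those bases."""
    if not core:
        raise ValueError("no cluster bases selected; raise max_bits")
    weights = {}
    for h in freqs:
        if h in core:
            weights[h] = {h: 1.0}  # common haplotypes are preserved as-is
            continue
        d_min = min(hamming(h, c) for c in core)
        nearest = [c for c in core if hamming(h, c) == d_min]
        total = sum(freqs[c] for c in nearest)
        weights[h] = {c: freqs[c] / total for c in nearest}
    return weights


def clustered_freqs(freqs, weights):
    """Fold haplotype frequencies into cluster frequencies using the weights."""
    out = defaultdict(float)
    for h, p in freqs.items():
        for c, w in weights[h].items():
            out[c] += p * w
    return dict(out)


# Toy haplotype distribution over three biallelic sites (made-up numbers).
freqs = {"000": 0.50, "011": 0.30, "001": 0.15, "111": 0.05}
core = choose_core(freqs, max_bits=2.0)
weights = assign(freqs, core)
grouped = clustered_freqs(freqs, weights)
```

In this toy example the two common haplotypes become the cluster bases, "001" is shared between them because it is one mutation away from each, and "111" is assigned entirely to "011". The grouped frequencies then have one entry per cluster, so a downstream association test would use fewer degrees of freedom than one on the full haplotype distribution.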