Although the haplotype data can be used to analyze the function of DNA, due to the significant efforts required in collecting the haplotype data, usually the genotype data is collected and then the population haplotype inference (PHI) problem is solved to infer haplotype data from genotype data for a population. This paper investigates the PHI problem based on the pure parsimony criterion (HIPP), which seeks the minimum number of distinct haplotypes to infer a given genotype data. We analyze the mathematical structure and properties for the HIPP problem, propose techniques to reduce the given genotype data into an equivalent one of much smaller size, and analyze the relations of genotype data using a compatible graph. Based on the mathematical properties in the compatible graph, we propose a maximal clique heuristic to obtain an upper bound, and a new polynomial-sized integer linear programming formulation to obtain a lower bound for the HIPP problem.
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Modelling and Simulation
- Biochemistry, Genetics and Molecular Biology(all)
- Immunology and Microbiology(all)
- Agricultural and Biological Sciences(all)
- Applied Mathematics