TY - JOUR

T1 - Bayesian variable selection for finite mixture model of linear regressions

AU - Lee, Kuo Jung

AU - Chen, Ray Bing

AU - Wu, Ying Nian

N1 - Funding Information:
The research of Chen and Lee is supported in part by the National Science Council under grant NSC 99-2118-M-006-006-MY2 and MOST 103-2118-M-006-002-MY2 (Chen) and NSC 101-2118-M-006-007- (Lee), and the Mathematics Division of the National Center for Theoretical Sciences in Taiwan. The research of Wu is supported by NSF DMS 1310391, ONR MURI N00014-10-1-0933, and DARPA MSEE FA8650-11-1-7149.
Publisher Copyright:
© 2015 Elsevier B.V.

PY - 2016/3/1

Y1 - 2016/3/1

N2 - We propose a Bayesian variable selection method for fitting the finite mixture model of linear regressions. The model assumes that the observations come from a heterogeneous population which is a mixture of a finite number of sub-populations. Within each sub-population, the response variable can be explained by a linear regression on the predictor variables. If the number of predictor variables is large, it is assumed that only a small subset of variables are important for explaining the response variable. It is further assumed that for different sub-populations, different subsets of variables may be needed to explain the response variable. This gives rise to a complex variable selection problem. We propose to solve this problem within the Bayesian framework where we introduce two sets of latent variables. The first set of latent variables are membership indicators of the observations, indicating which sub-population each observation comes from. The second set of latent variables are inclusion/exclusion indicators for the predictor variables, indicating whether or not a variable is included in the regression model of a sub-population. Variable selection can then be accomplished by sampling from the posterior distributions of the indicators as well as the coefficients of the selected variables. We conduct simulation studies to demonstrate that the proposed method performs well in comparison with existing methods. We also analyze a real data set to further illustrate the usefulness of the proposed method.

UR - http://www.scopus.com/inward/record.url?scp=84947929113&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84947929113&partnerID=8YFLogxK

U2 - 10.1016/j.csda.2015.09.005

DO - 10.1016/j.csda.2015.09.005

M3 - Article

AN - SCOPUS:84947929113

VL - 95

SP - 1

EP - 16

JO - Computational Statistics and Data Analysis

JF - Computational Statistics and Data Analysis

SN - 0167-9473

ER -