TY - JOUR
T1 - Bayesian variable selection for finite mixture model of linear regressions
AU - Lee, Kuo Jung
AU - Chen, Ray Bing
AU - Wu, Ying Nian
N1 - Publisher Copyright: © 2015 Elsevier B.V.
PY - 2016/3/1
Y1 - 2016/3/1
N2 - We propose a Bayesian variable selection method for fitting the finite mixture model of linear regressions. The model assumes that the observations come from a heterogeneous population which is a mixture of a finite number of sub-populations. Within each sub-population, the response variable can be explained by a linear regression on the predictor variables. If the number of predictor variables is large, it is assumed that only a small subset of variables are important for explaining the response variable. It is further assumed that for different sub-populations, different subsets of variables may be needed to explain the response variable. This gives rise to a complex variable selection problem. We propose to solve this problem within the Bayesian framework where we introduce two sets of latent variables. The first set of latent variables are membership indicators of the observations, indicating which sub-population each observation comes from. The second set of latent variables are inclusion/exclusion indicators for the predictor variables, indicating whether or not a variable is included in the regression model of a sub-population. Variable selection can then be accomplished by sampling from the posterior distributions of the indicators as well as the coefficients of the selected variables. We conduct simulation studies to demonstrate that the proposed method performs well in comparison with existing methods. We also analyze a real data set to further illustrate the usefulness of the proposed method.
UR - http://www.scopus.com/inward/record.url?scp=84947929113&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84947929113&partnerID=8YFLogxK
U2 - 10.1016/j.csda.2015.09.005
DO - 10.1016/j.csda.2015.09.005
M3 - Article
AN - SCOPUS:84947929113
SN - 0167-9473
VL - 95
SP - 1
EP - 16
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
ER -