TY - JOUR
T1 - Model-based clustering via mixtures of unrestricted skew normal factor analyzers with complete and incomplete data
AU - Wang, Wan Lun
AU - Lin, Tsung I.
N1 - Publisher Copyright: © 2022, Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2023/9
Y1 - 2023/9
N2 - Mixtures of factor analyzers (MFA) based on the restricted skew normal distribution (rMSN) have emerged as a flexible tool to handle asymmetrical high-dimensional data with heterogeneity. However, the rMSN distribution is often criticized for lacking sufficient ability to accommodate potential skewness arising from more than one feature space. This paper presents an alternative extension of MFA by assuming the unrestricted skew normal (uMSN) distribution for the component factors. In particular, the proposed mixtures of unrestricted skew normal factor analyzers (MuSNFA) can simultaneously capture multiple directions of skewness and deal with the occurrence of missing values or nonresponses. Under the missing at random (MAR) mechanism, we develop a computationally feasible expectation conditional maximization (ECM) algorithm for computing the maximum likelihood estimates of model parameters. Practical aspects related to model-based clustering, prediction of factor scores and imputation of missing values are also discussed. The utility of the proposed methodology is illustrated with the analysis of simulated and real datasets.
AB - Mixtures of factor analyzers (MFA) based on the restricted skew normal distribution (rMSN) have emerged as a flexible tool to handle asymmetrical high-dimensional data with heterogeneity. However, the rMSN distribution is often criticized for lacking sufficient ability to accommodate potential skewness arising from more than one feature space. This paper presents an alternative extension of MFA by assuming the unrestricted skew normal (uMSN) distribution for the component factors. In particular, the proposed mixtures of unrestricted skew normal factor analyzers (MuSNFA) can simultaneously capture multiple directions of skewness and deal with the occurrence of missing values or nonresponses. Under the missing at random (MAR) mechanism, we develop a computationally feasible expectation conditional maximization (ECM) algorithm for computing the maximum likelihood estimates of model parameters. Practical aspects related to model-based clustering, prediction of factor scores and imputation of missing values are also discussed. The utility of the proposed methodology is illustrated with the analysis of simulated and real datasets.
UR - http://www.scopus.com/inward/record.url?scp=85143441263&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85143441263&partnerID=8YFLogxK
U2 - 10.1007/s10260-022-00674-x
DO - 10.1007/s10260-022-00674-x
M3 - Article
AN - SCOPUS:85143441263
SN - 1618-2510
VL - 32
SP - 787
EP - 817
JO - Statistical Methods and Applications
JF - Statistical Methods and Applications
IS - 3
ER -