Postselection Inference in Structural Equation Modeling

Research output: Contribution to journal › Article

Abstract

Most statistical inference methods were established under the assumption that the fitted model is known in advance. In practice, however, researchers often obtain their final model through some data-driven selection process. This selection makes the final fitted model random and alters the sampling distribution of the estimator, so applying naive inference methods to the selected model may lead to wrong conclusions, which is probably a prime source of the reproducibility crisis in psychological science. The present study adapts three valid, state-of-the-art postselection inference methods from the statistical literature to structural equation modeling (SEM): data splitting (DS), postselection inference (PoSI), and the polyhedral (PH) method. A simulation study compares the three methods with the commonly used naive procedure under selection events produced by L1-penalized SEM. The results show that the naive method often yields incorrect inference, whereas the valid methods control the coverage rate in most cases, each with its own pros and cons. Real-world data examples illustrate the practical use of the valid inference methods.
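
Of the three valid methods, data splitting is the simplest to illustrate. The sketch below is a minimal Python illustration of the idea only, not of the paper's SEM procedure: an ordinary linear regression stands in for the structural model, and a marginal-covariance threshold stands in for L1-penalized SEM as the selection rule; all function and variable names (ols_ci, selected, and so on) are introduced here purely for illustration. It contrasts naive intervals, computed on the same data that drove the selection, with DS intervals computed on a held-out half that the selection step never saw.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def ols_ci(X, y, level=0.95):
    """OLS point estimates with classical t-based confidence intervals,
    which are valid when the predictor set is fixed in advance."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    tcrit = stats.t.ppf((1 + level) / 2, df=n - p)
    return beta, beta - tcrit * se, beta + tcrit * se

# Toy data: only the first 2 of 10 predictors have nonzero effects.
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = 0.5
y = X @ beta_true + rng.standard_normal(n)

# Data splitting: one half of the sample is used only for selection,
# the other half only for inference.
half = n // 2
X_sel, y_sel = X[:half], y[:half]
X_inf, y_inf = X[half:], y[half:]

# Toy selection rule (a stand-in for L1-penalized SEM): keep predictors
# whose absolute marginal covariance with y exceeds a threshold.
selected = np.where(np.abs(X_sel.T @ y_sel) / half > 0.15)[0]

# Naive inference: refit and test on the same half that chose the model,
# ignoring the selection step.
naive_b, naive_lo, naive_hi = ols_ci(X_sel[:, selected], y_sel)

# DS inference: refit on the held-out half, which never saw the
# selection step, so the classical intervals keep their interpretation.
ds_b, ds_lo, ds_hi = ols_ci(X_inf[:, selected], y_inf)

for j, k in enumerate(selected):
    print(f"x{k}: naive CI [{naive_lo[j]:.2f}, {naive_hi[j]:.2f}]  "
          f"DS CI [{ds_lo[j]:.2f}, {ds_hi[j]:.2f}]")

In repeated replications of a setup like this, the naive intervals for noise predictors that happen to be selected tend to undercover, while the DS intervals stay close to the nominal 95% level; the obvious cost of DS is that only half of the sample remains for estimation and inference.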

Original language: English
Journal: Multivariate Behavioral Research
DOI: 10.1080/00273171.2019.1634996
Publication status: Published - 2019 Jan 1

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Experimental and Cognitive Psychology
  • Arts and Humanities (miscellaneous)

Cite this

@article{ba592a2f37d841f19d749fc0e78920a7,
title = "Postselection Inference in Structural Equation Modeling",
abstract = "Most statistical inference methods were established under the assumption that the fitted model is known in advance. In practice, however, researchers often obtain their final model by some data-driven selection process. The selection process makes the finally fitted model random, and it also influences the sampling distribution of the estimator. Therefore, implementing naive inference methods may result in wrong conclusions—which is probably a prime source of the reproducibility crisis in psychological science. The present study accommodates three valid state-of-the-art postselection inference methods for structural equation modeling (SEM) from the statistical literature: data splitting (DS), postselection inference (PoSI), and the polyhedral (PH) method. A simulation is conducted to compare the three methods with the commonly used naive procedure under selection events made by L1-penalized SEM. The results show that the naive method often yields incorrect inference, and that the valid methods control the coverage rate in most cases with their own pros and cons. Real world data examples show the practical use of the valid inference methods.",
author = "Po-Hsien Huang",
year = "2019",
month = "1",
day = "1",
doi = "10.1080/00273171.2019.1634996",
language = "English",
journal = "Multivariate Behavioral Research",
issn = "0027-3171",
publisher = "Psychology Press Ltd",

}

TY - JOUR

T1 - Postselection Inference in Structural Equation Modeling

AU - Huang, Po-Hsien

PY - 2019/1/1

Y1 - 2019/1/1

UR - http://www.scopus.com/inward/record.url?scp=85069041104&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85069041104&partnerID=8YFLogxK

U2 - 10.1080/00273171.2019.1634996

DO - 10.1080/00273171.2019.1634996

M3 - Article

JO - Multivariate Behavioral Research

JF - Multivariate Behavioral Research

SN - 0027-3171

ER -