LASSO variable selection in data envelopment analysis with small datasets

Chia-Yen Lee, Jia Ying Cai

Research output: Article

2 Citations (Scopus)

Abstract

The curse of dimensionality arises when a limited number of observations is used to estimate a high-dimensional frontier, in particular by data envelopment analysis (DEA). Using a data generating process (DGP), this study argues that the typical “rule of thumb” used in DEA, e.g. that the number of observations should be at least twice the number of inputs and outputs, is ambiguous and produces large deviations in the estimated technical efficiency. To address this issue, we propose combining the Least Absolute Shrinkage and Selection Operator (LASSO), a variable selection technique commonly used in data science to extract significant factors, with sign-constrained convex nonparametric least squares (SCNLS), which can be regarded as a DEA estimator. Simulation results demonstrate that the proposed LASSO-SCNLS method and its variants provide useful guidelines for DEA with small datasets.
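The LASSO selection step named in the abstract can be illustrated with a minimal sketch. This is not the authors' LASSO-SCNLS: their method attaches the penalty to a shape- and sign-constrained least-squares problem, whereas the sketch below fits a plain linear LASSO, by cyclic coordinate descent, to a log-transformed production function. The data generating process (a Cobb-Douglas frontier in which only inputs 0 and 1 matter, four irrelevant inputs, half-normal inefficiency), the penalty `lam=0.08`, the seed, and all function names (`soft_threshold`, `standardize`, `lasso_cd`, `simulate`) are hypothetical choices made for illustration.

```python
import math
import random


def soft_threshold(rho: float, lam: float) -> float:
    """Proximal operator of lam*|b|: shrink rho toward zero by lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def standardize(columns):
    """Center each column and scale it to unit (population) variance."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        out.append([(v - mean) / sd for v in col])
    return out


def lasso_cd(columns, y, lam, tol=1e-10, max_sweeps=10_000):
    """Cyclic coordinate descent for min (1/2n)||y - Xb||^2 + lam*||b||_1.

    Assumes every column is standardized (mean 0, variance 1) and y is
    centered, so no intercept is fitted and each update has denominator 1.
    """
    n, p = len(y), len(columns)
    b = [0.0] * p
    r = list(y)  # residual y - Xb, starting from b = 0
    for _ in range(max_sweeps):
        max_step = 0.0
        for j in range(p):
            xj = columns[j]
            # Correlation of x_j with the partial residual that excludes x_j.
            rho = sum(xj[i] * r[i] for i in range(n)) / n + b[j]
            new = soft_threshold(rho, lam)
            step = new - b[j]
            if step != 0.0:
                for i in range(n):
                    r[i] -= step * xj[i]  # keep r = y - Xb up to date
                b[j] = new
            max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b


def simulate(n=100, p=6, seed=7):
    """Hypothetical DGP: only inputs 0 and 1 enter a Cobb-Douglas frontier."""
    rng = random.Random(seed)
    log_x = [[math.log(rng.uniform(1.0, 10.0)) for _ in range(n)] for _ in range(p)]
    log_y = []
    for i in range(n):
        u = abs(rng.gauss(0.0, 0.1))  # half-normal inefficiency, u >= 0
        log_y.append(0.6 * log_x[0][i] + 0.3 * log_x[1][i] - u)
    return log_x, log_y


log_x, log_y = simulate()
X = standardize(log_x)
y_mean = sum(log_y) / len(log_y)
y = [v - y_mean for v in log_y]  # centering replaces the intercept
coef = lasso_cd(X, y, lam=0.08)
selected = [j for j, b in enumerate(coef) if b != 0.0]
print("selected inputs:", selected)
print("coefficients:", [round(b, 3) for b in coef])
```

The log-inputs are standardized because the L1 penalty is scale-dependent; otherwise the choice of measurement units would change which inputs survive. In this sketch, inputs whose coefficients are shrunk exactly to zero are the ones a LASSO screen would flag as candidates to drop before a DEA run; how the paper embeds the penalty in SCNLS is not reproduced here.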

Original language: English
Journal: Omega (United Kingdom)
DOI: 10.1016/j.omega.2018.12.008
Publication status: Accepted/In press - 1 Jan 2018

Fingerprint

Data envelopment analysis
Variable selection
Shrinkage
Operator
Technical efficiency
Curse of dimensionality
Rules of thumb
Estimator
Simulation
Factors
Least squares
Large deviations
Data generating process
Least square method

All Science Journal Classification (ASJC) codes

  • Strategy and Management
  • Management Science and Operations Research
  • Information Systems and Management

Cite this

@article{e94e94b74d2f4e95bf0a37436c459e69,
title = "{LASSO} variable selection in data envelopment analysis with small datasets",
abstract = "The curse of dimensionality arises when a limited number of observations is used to estimate a high-dimensional frontier, in particular by data envelopment analysis (DEA). Using a data generating process (DGP), this study argues that the typical “rule of thumb” used in DEA, e.g. that the number of observations should be at least twice the number of inputs and outputs, is ambiguous and produces large deviations in the estimated technical efficiency. To address this issue, we propose combining the Least Absolute Shrinkage and Selection Operator (LASSO), a variable selection technique commonly used in data science to extract significant factors, with sign-constrained convex nonparametric least squares (SCNLS), which can be regarded as a DEA estimator. Simulation results demonstrate that the proposed LASSO-SCNLS method and its variants provide useful guidelines for DEA with small datasets.",
author = "Lee, Chia-Yen and Cai, {Jia Ying}",
year = "2018",
month = "1",
day = "1",
doi = "10.1016/j.omega.2018.12.008",
language = "English",
journal = "Omega",
issn = "0305-0483",
publisher = "Elsevier BV",

}

TY - JOUR

T1 - LASSO variable selection in data envelopment analysis with small datasets

AU - Lee, Chia-Yen

AU - Cai, Jia Ying

PY - 2018/1/1

Y1 - 2018/1/1

N2 - The curse of dimensionality arises when a limited number of observations is used to estimate a high-dimensional frontier, in particular by data envelopment analysis (DEA). Using a data generating process (DGP), this study argues that the typical “rule of thumb” used in DEA, e.g. that the number of observations should be at least twice the number of inputs and outputs, is ambiguous and produces large deviations in the estimated technical efficiency. To address this issue, we propose combining the Least Absolute Shrinkage and Selection Operator (LASSO), a variable selection technique commonly used in data science to extract significant factors, with sign-constrained convex nonparametric least squares (SCNLS), which can be regarded as a DEA estimator. Simulation results demonstrate that the proposed LASSO-SCNLS method and its variants provide useful guidelines for DEA with small datasets.

UR - http://www.scopus.com/inward/record.url?scp=85058780930&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85058780930&partnerID=8YFLogxK

U2 - 10.1016/j.omega.2018.12.008

DO - 10.1016/j.omega.2018.12.008

M3 - Article

AN - SCOPUS:85058780930

JO - Omega

JF - Omega

SN - 0305-0483

ER -