TY - GEN
T1 - SMITH: A Self-supervised Downstream-aware Framework for Missing Testing Data Handling
T2 - 26th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2022
AU - Yang, Chih-Chun
AU - Li, Cheng-Te
AU - Lin, Shou-De
N1 - Funding Information:
Acknowledgements. This material is based upon work supported by Taiwan Ministry of Science and Technology (MOST) under grant numbers 110-2634-F-002-050, 110-2221-E-006-136-MY3, 110-2221-E-006-001, and 110-2634-F-002-051.
Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Missing values in testing data have been a notorious problem in the machine learning community, since they can heavily degrade the performance of a downstream model learned from complete data without any precaution. To perform the prediction task with this kind of downstream model, we must impute the missing values first. Therefore, the imputation quality and how to utilize the knowledge provided by the pre-trained and fixed downstream model are the keys to addressing this problem. In this paper, we aim to address this problem, focusing on models learned from tabular data. We present a novel Self-supervised downstream-aware framework for MIssing Testing data Handling (SMITH), which consists of a transformer-based imputation model and a downstream label estimation algorithm. The former can be replaced by any existing imputation model of interest, with an additional performance gain over that model's original design. By leveraging two self-supervised tasks and the knowledge from the downstream model's predictions to guide the learning of our transformer-based imputation model, our SMITH framework performs favorably against state-of-the-art methods on several benchmark datasets.
AB - Missing values in testing data have been a notorious problem in the machine learning community, since they can heavily degrade the performance of a downstream model learned from complete data without any precaution. To perform the prediction task with this kind of downstream model, we must impute the missing values first. Therefore, the imputation quality and how to utilize the knowledge provided by the pre-trained and fixed downstream model are the keys to addressing this problem. In this paper, we aim to address this problem, focusing on models learned from tabular data. We present a novel Self-supervised downstream-aware framework for MIssing Testing data Handling (SMITH), which consists of a transformer-based imputation model and a downstream label estimation algorithm. The former can be replaced by any existing imputation model of interest, with an additional performance gain over that model's original design. By leveraging two self-supervised tasks and the knowledge from the downstream model's predictions to guide the learning of our transformer-based imputation model, our SMITH framework performs favorably against state-of-the-art methods on several benchmark datasets.
UR - http://www.scopus.com/inward/record.url?scp=85130260540&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85130260540&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-05936-0_39
DO - 10.1007/978-3-031-05936-0_39
M3 - Conference contribution
AN - SCOPUS:85130260540
SN - 9783031059353
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 499
EP - 510
BT - Advances in Knowledge Discovery and Data Mining - 26th Pacific-Asia Conference, PAKDD 2022, Proceedings
A2 - Gama, João
A2 - Li, Tianrui
A2 - Yu, Yang
A2 - Chen, Enhong
A2 - Zheng, Yu
A2 - Teng, Fei
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 16 May 2022 through 19 May 2022
ER -