SMITH: A Self-supervised Downstream-Aware Framework for Missing Testing Data Handling

Chih Chun Yang, Cheng Te Li, Shou De Lin

Research output: Conference contribution

Abstract

Missing values in testing data have long been a notorious problem in the machine learning community, since they can severely degrade the performance of a downstream model that was learned from complete data without any precaution. To perform the prediction task well with such a downstream model, the missing values must first be imputed. The imputation quality, and how the knowledge provided by the pre-trained, fixed downstream model is utilized, are therefore the keys to addressing this problem. In this paper, we focus on models learned from tabular data. We present a novel Self-supervised downstream-aware framework for MIssing Testing data Handling (SMITH), which consists of a transformer-based imputation model and a downstream label estimation algorithm. The former can be replaced by any existing imputation model of interest, which then gains additional performance over its original design. By introducing two self-supervised tasks and using the knowledge from the predictions of the downstream model to guide the learning of the transformer-based imputation model, the SMITH framework performs favorably against state-of-the-art methods on several benchmark datasets.
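
The problem setting in the abstract (a fixed downstream model that only accepts complete rows, and test rows with missing entries) can be illustrated with a minimal sketch. This is not the SMITH method: it swaps in plain column-mean imputation as a baseline, and every name in it (`column_means`, `impute_with_means`, `downstream_predict`, the weights and the toy data) is a hypothetical stand-in chosen for illustration.

```python
import math


def column_means(rows):
    """Per-column mean of the complete training rows."""
    n_cols = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]


def impute_with_means(row, means):
    """Replace NaN entries of a test row with the training column mean."""
    return [means[j] if math.isnan(v) else v for j, v in enumerate(row)]


def downstream_predict(row, weights, bias):
    """Hypothetical fixed, pre-trained linear model (stand-in only)."""
    return sum(w * v for w, v in zip(weights, row)) + bias


# Toy complete training data and an assumed pre-trained model.
train_rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights, bias = [0.5, -1.0], 2.0

means = column_means(train_rows)

# A test row whose second feature is missing.
test_row = [2.0, math.nan]

# Without imputation the NaN propagates into the prediction.
raw_prediction = downstream_predict(test_row, weights, bias)

# Impute first, then query the unchanged downstream model.
imputed_row = impute_with_means(test_row, means)
prediction = downstream_predict(imputed_row, weights, bias)
```

The downstream model is never retrained here; only the input is repaired. Per the abstract, SMITH differs from this baseline by learning the imputation with a transformer guided by self-supervised tasks and by the downstream model's predictions.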

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 26th Pacific-Asia Conference, PAKDD 2022, Proceedings
Editors: João Gama, Tianrui Li, Yang Yu, Enhong Chen, Yu Zheng, Fei Teng
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 499-510
Number of pages: 12
ISBN (Print): 9783031059353
Publication status: Published - 2022
Event: 26th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2022 - Chengdu, China
Duration: 16 May 2022 to 19 May 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13281 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 26th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2022
Country/Territory: China
City: Chengdu
Period: 16 May 2022 to 19 May 2022

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
