TY - GEN
T1 - Pseudo Triplet Networks for Classification Tasks with Cross-Source Feature Incompleteness
AU - Liow, Cayon
AU - Li, Cheng Te
AU - Yang, Chun Pai
AU - Lin, Shou De
N1 - Publisher Copyright:
© 2023 Copyright held by the owner/author(s).
PY - 2023/10/21
Y1 - 2023/10/21
N2 - Cross-source feature incompleteness - a scenario where certain features are only available in one data source but missing in another - is a common and significant challenge in machine learning. It typically arises in situations where the training data and testing data are collected from different sources with distinct feature sets. Addressing this challenge has the potential to greatly improve the utility of valuable datasets that might otherwise be considered incomplete and enhance model performance. This paper introduces the novel Pseudo Triplet Network (PTN) to address cross-source feature incompleteness. PTN fuses two Siamese network architectures - Triplet Networks and Pseudo Networks. By segregating data into instance, positive, and negative subsets, PTN facilitates effective contrastive learning through a hybrid loss function design. The model was rigorously evaluated on six benchmark datasets from the UCI Repository, in comparison with five other methods for managing missing data, under a range of feature overlap and missing data scenarios. The PTN consistently exhibited superior performance, displaying resilience in high missing ratio situations and maintaining robust stability across various data scenarios.
AB - Cross-source feature incompleteness - a scenario where certain features are only available in one data source but missing in another - is a common and significant challenge in machine learning. It typically arises in situations where the training data and testing data are collected from different sources with distinct feature sets. Addressing this challenge has the potential to greatly improve the utility of valuable datasets that might otherwise be considered incomplete and enhance model performance. This paper introduces the novel Pseudo Triplet Network (PTN) to address cross-source feature incompleteness. PTN fuses two Siamese network architectures - Triplet Networks and Pseudo Networks. By segregating data into instance, positive, and negative subsets, PTN facilitates effective contrastive learning through a hybrid loss function design. The model was rigorously evaluated on six benchmark datasets from the UCI Repository, in comparison with five other methods for managing missing data, under a range of feature overlap and missing data scenarios. The PTN consistently exhibited superior performance, displaying resilience in high missing ratio situations and maintaining robust stability across various data scenarios.
UR - http://www.scopus.com/inward/record.url?scp=85178145851&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85178145851&partnerID=8YFLogxK
U2 - 10.1145/3583780.3615154
DO - 10.1145/3583780.3615154
M3 - Conference contribution
AN - SCOPUS:85178145851
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 4079
EP - 4083
BT - CIKM 2023 - Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 32nd ACM International Conference on Information and Knowledge Management, CIKM 2023
Y2 - 21 October 2023 through 25 October 2023
ER -