TY - GEN
T1 - Natural Language Inference by Integrating Deep and Shallow Representations with Knowledge Distillation
AU - Chen, Pei Chang
AU - Ma, Hao Shang
AU - Huang, Jen Wei
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Natural language understanding models often exploit surface patterns or idiosyncratic biases in a given dataset to make predictions in natural language inference (NLI) tasks. Unfortunately, this renders the resulting model vulnerable to out-of-distribution datasets to which the identified features are inapplicable, thereby leading to erroneous results. Many of the methods developed for out-of-distribution datasets have proven effective; however, they also tend to impose a tradeoff in performance when applied to in-distribution datasets. In this paper, we use a teacher model to provide knowledge to a student ensemble model as basic information for training. The student ensemble model then integrates information from deep and shallow representations to extend learning performance to a wide range of examples. The evaluation demonstrates that the proposed model outperformed state-of-the-art models when applied to in-distribution as well as out-of-distribution datasets.
AB - Natural language understanding models often exploit surface patterns or idiosyncratic biases in a given dataset to make predictions in natural language inference (NLI) tasks. Unfortunately, this renders the resulting model vulnerable to out-of-distribution datasets to which the identified features are inapplicable, thereby leading to erroneous results. Many of the methods developed for out-of-distribution datasets have proven effective; however, they also tend to impose a tradeoff in performance when applied to in-distribution datasets. In this paper, we use a teacher model to provide knowledge to a student ensemble model as basic information for training. The student ensemble model then integrates information from deep and shallow representations to extend learning performance to a wide range of examples. The evaluation demonstrates that the proposed model outperformed state-of-the-art models when applied to in-distribution as well as out-of-distribution datasets.
UR - http://www.scopus.com/inward/record.url?scp=85179005503&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85179005503&partnerID=8YFLogxK
U2 - 10.1109/DSAA60987.2023.10302536
DO - 10.1109/DSAA60987.2023.10302536
M3 - Conference contribution
AN - SCOPUS:85179005503
T3 - 2023 IEEE 10th International Conference on Data Science and Advanced Analytics, DSAA 2023 - Proceedings
BT - 2023 IEEE 10th International Conference on Data Science and Advanced Analytics, DSAA 2023 - Proceedings
A2 - Manolopoulos, Yannis
A2 - Zhou, Zhi-Hua
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2023
Y2 - 9 October 2023 through 12 October 2023
ER -