TY - GEN
T1 - True or False
T2 - 26th International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2021
AU - Ni, Shiwen
AU - Li, Jiawen
AU - Kao, Hung Yu
N1 - Funding Information:
This work was funded in part by Qualcomm through a Taiwan University Research Collaboration Project and in part by the Ministry of Science and Technology, Taiwan, under grants MOST 110-2221-E-006-001 and NCKU B109-K027D. We thank the National Center for High-performance Computing (NCHC) for providing computational and storage resources.
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - It is difficult for humans to distinguish true rumors from false ones, but current deep learning models can surpass humans and achieve excellent accuracy on many rumor datasets. In this paper, we investigate whether deep learning models that seem to perform well actually learn to detect rumors. We evaluate the models' ability to generalize to out-of-domain examples by fine-tuning BERT-based models on five real-world datasets and evaluating them against all test sets. The experimental results indicate that the generalization ability of the models on unseen datasets is unsatisfactory; even common-sense rumors cannot be detected. Moreover, we found through experiments that models take shortcuts and learn absurd knowledge when the rumor datasets have serious data pitfalls. This means that simple, rule-based modifications to the rumor text lead to inconsistent model predictions. To evaluate rumor detection models more realistically, we propose a new evaluation method called the paired test (PairT), which requires models to correctly predict a pair of test samples at the same time. Furthermore, we make recommendations on how to better create rumor datasets and evaluate rumor detection models at the end of this paper.
AB - It is difficult for humans to distinguish true rumors from false ones, but current deep learning models can surpass humans and achieve excellent accuracy on many rumor datasets. In this paper, we investigate whether deep learning models that seem to perform well actually learn to detect rumors. We evaluate the models' ability to generalize to out-of-domain examples by fine-tuning BERT-based models on five real-world datasets and evaluating them against all test sets. The experimental results indicate that the generalization ability of the models on unseen datasets is unsatisfactory; even common-sense rumors cannot be detected. Moreover, we found through experiments that models take shortcuts and learn absurd knowledge when the rumor datasets have serious data pitfalls. This means that simple, rule-based modifications to the rumor text lead to inconsistent model predictions. To evaluate rumor detection models more realistically, we propose a new evaluation method called the paired test (PairT), which requires models to correctly predict a pair of test samples at the same time. Furthermore, we make recommendations on how to better create rumor datasets and evaluate rumor detection models at the end of this paper.
UR - https://www.scopus.com/pages/publications/85131936733
UR - https://www.scopus.com/pages/publications/85131936733#tab=citedBy
U2 - 10.1109/TAAI54685.2021.00030
DO - 10.1109/TAAI54685.2021.00030
M3 - Conference contribution
AN - SCOPUS:85131936733
T3 - Proceedings - 2021 International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2021
SP - 119
EP - 124
BT - Proceedings - 2021 International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 November 2021 through 20 November 2021
ER -