TY - JOUR
T1 - CA-CD
T2 - context-aware clickbait detection using new Chinese clickbait dataset with transfer learning method
AU - Wang, Hei Chia
AU - Maslim, Martinus
AU - Liu, Hung Yu
N1 - Publisher Copyright:
© 2023, Emerald Publishing Limited.
PY - 2024/4/15
Y1 - 2024/4/15
N2 - Purpose: Clickbait is a deceptive headline designed to boost ad revenue without presenting closely relevant content. Clickbait has numerous negative repercussions: it leaves viewers feeling tricked and unhappy, causes long-term confusion and can even attract cybercriminals. Automatic clickbait detection algorithms have been developed to address this issue. However, existing detection technologies are limited by two factors: each term has only one semantic representation regardless of its context, and Chinese datasets are scarce. This study aims to overcome these limitations of automated clickbait detection in Chinese. Design/methodology/approach: This study combines a newly compiled Chinese clickbait dataset with transfer learning to train a model that captures the probable relationship between clickbait news headlines and news content. In addition, part-of-speech elements are used to generate the most appropriate semantic representation for clickbait detection, improving detection performance. Findings: This research compiled a dataset of 20,896 Chinese clickbait news articles, containing news headlines, articles, categories and supplementary metadata. The proposed context-aware clickbait detection (CA-CD) model outperforms existing clickbait detection approaches on multiple criteria, demonstrating the efficacy of the proposed strategy. Originality/value: The originality of this study lies in the newly compiled Chinese clickbait dataset and in a clickbait detection approach based on contextual semantic representations and transfer learning. This method can adjust the semantic representation of each word according to its context, helping the model interpret the original meaning of news articles more precisely.
AB - Purpose: Clickbait is a deceptive headline designed to boost ad revenue without presenting closely relevant content. Clickbait has numerous negative repercussions: it leaves viewers feeling tricked and unhappy, causes long-term confusion and can even attract cybercriminals. Automatic clickbait detection algorithms have been developed to address this issue. However, existing detection technologies are limited by two factors: each term has only one semantic representation regardless of its context, and Chinese datasets are scarce. This study aims to overcome these limitations of automated clickbait detection in Chinese. Design/methodology/approach: This study combines a newly compiled Chinese clickbait dataset with transfer learning to train a model that captures the probable relationship between clickbait news headlines and news content. In addition, part-of-speech elements are used to generate the most appropriate semantic representation for clickbait detection, improving detection performance. Findings: This research compiled a dataset of 20,896 Chinese clickbait news articles, containing news headlines, articles, categories and supplementary metadata. The proposed context-aware clickbait detection (CA-CD) model outperforms existing clickbait detection approaches on multiple criteria, demonstrating the efficacy of the proposed strategy. Originality/value: The originality of this study lies in the newly compiled Chinese clickbait dataset and in a clickbait detection approach based on contextual semantic representations and transfer learning. This method can adjust the semantic representation of each word according to its context, helping the model interpret the original meaning of news articles more precisely.
UR - http://www.scopus.com/inward/record.url?scp=85168918334&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85168918334&partnerID=8YFLogxK
U2 - 10.1108/DTA-03-2023-0072
DO - 10.1108/DTA-03-2023-0072
M3 - Article
AN - SCOPUS:85168918334
SN - 2514-9288
VL - 58
SP - 243
EP - 266
JO - Data Technologies and Applications
JF - Data Technologies and Applications
IS - 2
ER -