TY - JOUR
T1 - Semi-meta-supervised hate speech detection
AU - Putra, Cendra Devayana
AU - Wang, Hei Chia
N1 - Publisher Copyright:
© 2024 Elsevier B.V.
PY - 2024/3/5
Y1 - 2024/3/5
N2 - On social media, hate speech is a daily occurrence but has physical and psychological implications. Utilizing a deep learning strategy to combat hate speech is one method for preventing it. Deep learning techniques may require massive datasets to generate accurate models, but hate speech samples (such as misogyny and cyber samples) are frequently insufficient and diverse. We offer methods for leveraging these diverse datasets and enhancing deep learning models through knowledge sharing. We analyzed the existing Bidirectional Encoder Representations from Transformers (BERT) technique and built a BERT-3CNN method to generate a single-task classifier that optimally absorbs the target dataset's features. Second, we proposed a shared BERT layer to gain a general understanding of hate speech. Third, we proposed a method for adapting another dataset to the desired dataset. We conducted several quantitative experimental investigations on five datasets, including Hatebase, Supremacist, Cybertroll, TRAC, and TRAC 2020, and assessed the achieved performance using the accuracy and F1 metrics. The first experiment demonstrated that our BERT-3CNN model improved the average accuracy by 5% and the F1 score by 18%. The second experiment demonstrated that BERT-SP improved the average accuracy by 0.2% and the F1 score by 2%. TRAC, Supremacist, Hatebase, and Cybertroll all showed improvements in accuracy, with Semi BERT-SP enhancing accuracy by 6% and F1 score by 5%, while TRAC2020 showed 10% and 9% improvements.
AB - On social media, hate speech is a daily occurrence but has physical and psychological implications. Utilizing a deep learning strategy to combat hate speech is one method for preventing it. Deep learning techniques may require massive datasets to generate accurate models, but hate speech samples (such as misogyny and cyber samples) are frequently insufficient and diverse. We offer methods for leveraging these diverse datasets and enhancing deep learning models through knowledge sharing. We analyzed the existing Bidirectional Encoder Representations from Transformers (BERT) technique and built a BERT-3CNN method to generate a single-task classifier that optimally absorbs the target dataset's features. Second, we proposed a shared BERT layer to gain a general understanding of hate speech. Third, we proposed a method for adapting another dataset to the desired dataset. We conducted several quantitative experimental investigations on five datasets, including Hatebase, Supremacist, Cybertroll, TRAC, and TRAC 2020, and assessed the achieved performance using the accuracy and F1 metrics. The first experiment demonstrated that our BERT-3CNN model improved the average accuracy by 5% and the F1 score by 18%. The second experiment demonstrated that BERT-SP improved the average accuracy by 0.2% and the F1 score by 2%. TRAC, Supremacist, Hatebase, and Cybertroll all showed improvements in accuracy, with Semi BERT-SP enhancing accuracy by 6% and F1 score by 5%, while TRAC2020 showed 10% and 9% improvements.
UR - http://www.scopus.com/inward/record.url?scp=85183956994&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85183956994&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2024.111386
DO - 10.1016/j.knosys.2024.111386
M3 - Article
AN - SCOPUS:85183956994
SN - 0950-7051
VL - 287
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 111386
ER -