TY - JOUR
T1 - Email security level classification of imbalanced data using artificial neural network
T2 - The real case in a world-leading enterprise
AU - Huang, Jen Wei
AU - Chiang, Chia Wen
AU - Chang, Jia Wei
N1 - Funding Information:
This work was supported by the Ministry of Science and Technology, Taiwan, R.O.C., under contract no. MOST 105-2221-E-006-212-MY2.
PY - 2018/10
Y1 - 2018/10
N2 - Email is far more convenient than traditional mail in the delivery of messages. However, it is susceptible to information leakage in business. This problem can be alleviated by classifying emails into different security levels using text mining and machine learning technology. In this research, we developed a scheme in which a neural network is used to extract information from emails to enable its transformation into a multidimensional vector. Email text data is processed using bi-gram to train the document vector, which then undergoes under-sampling to deal with the problem of data imbalance. Finally, the security label of emails is classified using an artificial neural network. The proposed system was evaluated in an actual corporate setting. The results show that the proposed feature extraction approach is more effective than existing methods for the representations of email data in true positive rates and F1-scores.
AB - Email is far more convenient than traditional mail in the delivery of messages. However, it is susceptible to information leakage in business. This problem can be alleviated by classifying emails into different security levels using text mining and machine learning technology. In this research, we developed a scheme in which a neural network is used to extract information from emails to enable its transformation into a multidimensional vector. Email text data is processed using bi-gram to train the document vector, which then undergoes under-sampling to deal with the problem of data imbalance. Finally, the security label of emails is classified using an artificial neural network. The proposed system was evaluated in an actual corporate setting. The results show that the proposed feature extraction approach is more effective than existing methods for the representations of email data in true positive rates and F1-scores.
UR - http://www.scopus.com/inward/record.url?scp=85051026263&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85051026263&partnerID=8YFLogxK
U2 - 10.1016/j.engappai.2018.07.010
DO - 10.1016/j.engappai.2018.07.010
M3 - Article
AN - SCOPUS:85051026263
VL - 75
SP - 11
EP - 21
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
SN - 0952-1976
ER -