Email delivers messages far more conveniently than traditional mail. In business settings, however, it is susceptible to information leakage. This problem can be alleviated by classifying emails into security levels using text mining and machine learning. In this research, we developed a scheme in which a neural network extracts information from emails and transforms it into a multidimensional vector. Email text is processed as bi-grams to train the document vectors, which are then under-sampled to address data imbalance. Finally, an artificial neural network classifies the security label of each email. The proposed system was evaluated in an actual corporate setting. The results show that the proposed feature extraction approach represents email data more effectively than existing methods, as measured by true positive rate and F1-score.
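As a rough illustration of two of the preprocessing steps named in the abstract, the following minimal sketch shows bi-gram extraction and random under-sampling using only the Python standard library. It is not the authors' implementation. The abstract does not say whether the bi-grams are word-level or character-level, nor which under-sampling strategy was used, so word-level bi-grams and random under-sampling to the size of the rarest class are assumptions here. The function names `word_bigrams` and `undersample`, the example texts, and the labels `public` and `confidential` are hypothetical. Document-vector training and the neural network classifier are omitted.

```python
import random
from collections import Counter


def word_bigrams(text):
    """Return adjacent word pairs from a lowercased, whitespace-split text.

    Assumption: word-level bi-grams; the abstract does not specify the unit.
    """
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def undersample(samples, labels, seed=0):
    """Randomly drop samples so each class keeps as many items as the rarest class.

    Assumption: plain random under-sampling; the abstract does not name a strategy.
    """
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for label, group in by_label.items():
        # rng.sample picks `smallest` distinct items without replacement
        for sample in rng.sample(group, smallest):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: three "public" emails and one "confidential" email.
emails = [
    "Lunch menu for Friday",
    "Office closed on Monday",
    "Team outing next week",
    "Draft merger terms attached",
]
labels = ["public", "public", "public", "confidential"]

features = [word_bigrams(text) for text in emails]
balanced = undersample(features, labels)
label_counts = Counter(label for _, label in balanced)
```

After under-sampling, `label_counts` holds one example per class, so a classifier trained on `balanced` would not be dominated by the majority label. The cost is that majority-class examples are discarded.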
|Number of pages||11|
|Journal||Engineering Applications of Artificial Intelligence|
|Publication status||Published - 2018 Oct|
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Artificial Intelligence
- Electrical and Electronic Engineering