A Study of MapReduce-Based Distributed Monotonic SVM Model

  • 陳 泰霖

Student thesis: Master's Thesis


Support Vector Machine (SVM) is a computationally expensive algorithm. Traditional SVM uses quadratic programming to solve the classification problem, which incurs a high computational cost. To address this problem, this study proposes the use of MapReduce in Hadoop. To increase classification accuracy, we incorporate monotonic prior knowledge from experts during the training phase.

Due to the rapid development of the Internet and storage infrastructure, cloud computing has matured in recent years. Some cloud operating systems harness the high computing capacity of cloud infrastructure to break through past limitations of data processing. Data has been produced at a growing rate in recent years, and its volume has become too large to be processed by a single machine. By combining cloud computing and machine learning, we can obtain more valuable information from large-scale data.

This study uses Hadoop, an open-source framework that provides the MapReduce distributed computing environment and a distributed file system. MapReduce automatically allocates computing resources across the cluster and allows developers to focus on data processing. This study proposes an SVM model called MCSVM that considers the monotonic property of data. Prior knowledge of this monotonic property is given to the model to increase the accuracy of classification prediction. MCSVM uses quadratic programming to find the optimal solution, which results in high complexity and long training times. This study therefore proposes a MapReduce MCSVM that significantly reduces the required training time and increases the feasibility of MCSVM in real-world applications.
Date of Award: 2014 Jun 25
Original language: English
Supervisor: Sheng-Tun Li
