Efficient Multi-training Framework of Image Deep Learning on GPU Cluster

Chun Fu Richard Chen, Gwo Giun Chris Lee, Yinglong Xia, W. Sabrina Lin, Toyotaro Suzumura, Ching Yung Lin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

In this paper, we develop a pipelining schema for image deep learning on a GPU cluster to handle the heavy workload of the training procedure. In addition, it is usually necessary to train multiple models to obtain a good deep learning model because of limited a priori knowledge of deep neural network structure. Adopting parallel and distributed computing therefore appears to be an obvious path forward, but the mileage varies depending on how amenable a deep network is to parallelization and on the availability of rapid prototyping capabilities with a low cost of entry. In this work, we propose a framework that organizes the training procedures of multiple deep learning models into a pipeline on a GPU cluster, where each stage is handled by a particular GPU with a partition of the training dataset. Instead of frequently migrating data among the disks, CPUs, and GPUs, our framework moves only partially trained models, which reduces bandwidth consumption and leverages the full computation capability of the cluster. We deploy the proposed framework on popular image recognition tasks using deep learning, and the experiments show that the proposed method reduces overall training time by up to dozens of hours compared to the baseline method.
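
The abstract does not spell out how models are scheduled across GPUs, so the following is only a minimal sketch under an assumed staggered schedule: GPU g permanently holds data partition g, model m enters GPU 0 at step m, and then advances one GPU per step. The function names `pipeline_schedule` and `model_transfers` are hypothetical and are not taken from the paper.

```python
def pipeline_schedule(num_models, num_gpus):
    """Assumed staggered pipeline: at step t, GPU g trains model (t - g).

    Each GPU keeps its own data partition in place; only the partially
    trained model moves from GPU g to GPU g + 1 between steps.
    Returns a list of dicts mapping gpu index -> model index (or None if idle).
    """
    total_steps = num_models + num_gpus - 1  # fill + drain of the pipeline
    steps = []
    for t in range(total_steps):
        assignment = {}
        for gpu in range(num_gpus):
            model = t - gpu
            assignment[gpu] = model if 0 <= model < num_models else None
        steps.append(assignment)
    return steps


def model_transfers(num_models, num_gpus):
    """Number of GPU-to-GPU model hand-offs for one pass over all partitions."""
    return num_models * (num_gpus - 1)


if __name__ == "__main__":
    for t, assignment in enumerate(pipeline_schedule(num_models=3, num_gpus=2)):
        print(t, assignment)
```

Under this assumption, every model visits every partition exactly once per pass, while the training data itself never leaves the GPU that holds it; only model parameters cross the interconnect.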

Original language: English
Title of host publication: Proceedings - 2015 IEEE International Symposium on Multimedia, ISM 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 489-494
Number of pages: 6
ISBN (Electronic): 9781509003792
Publication status: Published - 2016 Mar 25
Event: 17th IEEE International Symposium on Multimedia, ISM 2015 - Miami, United States
Duration: 2015 Dec 14 to 2015 Dec 16

Publication series

Name: Proceedings - 2015 IEEE International Symposium on Multimedia, ISM 2015

Other

Other: 17th IEEE International Symposium on Multimedia, ISM 2015
Country/Territory: United States
City: Miami
Period: 15-12-14 to 15-12-16

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Hardware and Architecture
  • Software
  • Computer Networks and Communications
