GPU Warp Scheduling Using Memory Stall Sampling on CASLAB-GPUSIM

  • 邱 健鳴

Student thesis: Master's Thesis


In recent years, Graphics Processing Units (GPUs), well known for parallel computing, have been widely adopted to accelerate non-graphics workloads such as data mining, machine learning, and image recognition. Modern GPUs utilize a huge number of concurrent threads and fine-grained multithreading to overlap operation latencies. However, recent studies have shown that memory contention is one of the most important bottlenecks preventing modern GPUs from achieving peak performance. The memory contention problem can become even more serious as the degree of multithreading rises, because the memory system becomes overloaded, while latency-hiding ability is poor at a low degree of multithreading. We propose Memory-Contention Aware Warp Scheduling (MAWS) to strike a balance between memory workload and memory resources. This scheme uses dynamic sampling to accurately recognize the severity level of the memory contention problem and provides an appropriate degree of thread concurrency accordingly. Our experiments show that MAWS achieves a geometric-mean speedup of 96.4% over the baseline Loose Round-Robin scheduler for cache-sensitive workloads on GPGPU-Sim. MAWS also achieves an overall speedup of 17.4% on CASLAB-GPUSIM.
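The core idea of the scheme can be illustrated as a mapping from a sampled memory-stall ratio to an active-warp limit. The sketch below is illustrative only: the thresholds, concurrency levels, and function names are hypothetical assumptions, not the actual values or interfaces used by MAWS or CASLAB-GPUSIM.

```python
def warp_limit_from_stalls(stall_fraction, max_warps=48, min_warps=4):
    """Map a sampled memory-stall fraction to an active-warp limit.

    stall_fraction: fraction of sampled cycles spent stalled on memory
    (0.0 = no contention, 1.0 = fully stalled). Thresholds below are
    hypothetical examples, not the thesis's actual values.
    """
    if not 0.0 <= stall_fraction <= 1.0:
        raise ValueError("stall_fraction must be in [0, 1]")
    # Severe contention: throttle concurrency hard to unload the memory system.
    if stall_fraction > 0.75:
        return min_warps
    # Moderate contention: run at roughly half concurrency.
    if stall_fraction > 0.40:
        return max(max_warps // 2, min_warps)
    # Light or no contention: keep full multithreading for latency hiding.
    return max_warps
```

In a scheduler loop, the scheduler would periodically re-sample stall statistics and re-evaluate this limit, so that concurrency tracks the current severity of contention rather than a fixed setting.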
Date of Award: 29 Aug 2017
Original language: English
Supervisor: Chung-Ho Chen