TY - GEN
T1 - Prefetch optimizations on large-scale applications via parameter value prediction
AU - Liao, Shih Wei
AU - Hung, Tzu Han
AU - Nguyen, Donald
AU - Zhou, Hucheng
AU - Chou, Chinyen
AU - Tu, Chiaheng
PY - 2009
Y1 - 2009
AB - A typical data center application requires the processor cycles of thousands of machines. Even a single-digit performance improvement can significantly reduce the cost and power consumption of a data center. Unfortunately, achieving sustained improvement, even if modest, is difficult. Data centers are dynamic environments where applications are frequently released and servers are continually upgraded. For maintainability and fault tolerance, the physical capabilities and configuration of the servers are abstracted from the application programmer. We study application performance under different processor prefetch configurations. These configurations are largely transparent to the programmer, yet we observe a wide range of performance when comparing the worst and best configurations, with relative performance improvement ranging from 1.4% to 75.1%. Alarmingly, one application that consumes many processor cycles has a 23.6% improvement. Default prefetch configurations favor aggressively prefetching memory, which benefits most applications, but some data center applications have highly tuned memory behavior and aggressive prefetching severely decreases performance. We develop a tuning framework which attempts to predict the optimal configuration based on hardware performance counters. It applies to a large number of performance-critical data center applications without modifying the source code or binaries. The framework achieves performance within 1% of the best performance of a suite of important data center applications.
UR - https://www.scopus.com/pages/publications/70449727094
UR - https://www.scopus.com/pages/publications/70449727094#tab=citedBy
U2 - 10.1145/1542275.1542359
DO - 10.1145/1542275.1542359
M3 - Conference contribution
AN - SCOPUS:70449727094
SN - 9781605584980
T3 - Proceedings of the International Conference on Supercomputing
SP - 519
EP - 520
BT - ICS'09 - Proceedings of the 23rd International Conference on Supercomputing
T2 - 23rd International Conference on Supercomputing, ICS'09
Y2 - 8 June 2009 through 12 June 2009
ER -