TY - GEN
T1 - A buffering approach to manage I/O in a normalized cross-correlation earthquake detection code for large seismic datasets
AU - Mu, Dawei
AU - Cicotti, Pietro
AU - Cui, Yifeng
AU - Lee, Enjui
AU - Chen, Po
N1 - Publisher Copyright: © 2017 Association for Computing Machinery.
PY - 2017/7/9
Y1 - 2017/7/9
N2 - Continued advances in high-performance computing architectures keep pushing computational performance forward, widening the performance gap with I/O. As a result, I/O plays an increasingly critical role in modern data-intensive scientific applications. We have developed high-performance GPU-based software called cuNCC, which is designed to calculate seismic waveform similarity for applications such as hypocenter estimation and small earthquake detection. GPU acceleration greatly reduced the compute time, and we are currently investigating I/O optimizations to tackle this new performance bottleneck. To find an optimal I/O solution for our cuNCC code, we performed a series of I/O benchmark tests and implemented buffering in CPU memory to manage the output transfers. With this preliminary work, we were able to establish that buffering improves the I/O bandwidth achieved, but is only beneficial when I/O bandwidth is limited, since the cost of the additional memory copy may exceed the improvement in I/O. However, in a realistic environment where I/O bandwidth per node is limited and small I/O transfers are penalized, this technique will improve overall performance. In addition, by using a large memory system, the point at which computing has to stop to wait for I/O is delayed, enabling fast computations on larger data sets.
AB - Continued advances in high-performance computing architectures keep pushing computational performance forward, widening the performance gap with I/O. As a result, I/O plays an increasingly critical role in modern data-intensive scientific applications. We have developed high-performance GPU-based software called cuNCC, which is designed to calculate seismic waveform similarity for applications such as hypocenter estimation and small earthquake detection. GPU acceleration greatly reduced the compute time, and we are currently investigating I/O optimizations to tackle this new performance bottleneck. To find an optimal I/O solution for our cuNCC code, we performed a series of I/O benchmark tests and implemented buffering in CPU memory to manage the output transfers. With this preliminary work, we were able to establish that buffering improves the I/O bandwidth achieved, but is only beneficial when I/O bandwidth is limited, since the cost of the additional memory copy may exceed the improvement in I/O. However, in a realistic environment where I/O bandwidth per node is limited and small I/O transfers are penalized, this technique will improve overall performance. In addition, by using a large memory system, the point at which computing has to stop to wait for I/O is delayed, enabling fast computations on larger data sets.
UR - http://www.scopus.com/inward/record.url?scp=85025807278&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85025807278&partnerID=8YFLogxK
U2 - 10.1145/3093338.3093382
DO - 10.1145/3093338.3093382
M3 - Conference contribution
AN - SCOPUS:85025807278
T3 - ACM International Conference Proceeding Series
BT - PEARC 2017 - Practice and Experience in Advanced Research Computing 2017
PB - Association for Computing Machinery
T2 - 2017 Practice and Experience in Advanced Research Computing, PEARC 2017
Y2 - 9 July 2017 through 13 July 2017
ER -