Continued advances in high-performance computing architectures keep raising computational performance, widening the performance gap with I/O. As a result, I/O plays an increasingly critical role in modern data-intensive scientific applications. We have developed high-performance GPU-based software called cuNCC, designed to calculate seismic waveform similarity for tasks such as hypocenter estimation and small-earthquake detection. GPU acceleration greatly reduced the compute time, and we are currently investigating I/O optimizations to tackle this new performance bottleneck. To find an optimal I/O solution for our cuNCC code, we performed a series of I/O benchmark tests and implemented buffering in CPU memory to manage the output transfers. With this preliminary work, we established that buffering improves the achieved I/O bandwidth, but it pays off only when I/O bandwidth is the limiting factor, since the cost of the additional memory copy may otherwise exceed the gain in I/O. However, in realistic environments, where per-node I/O bandwidth is limited and small I/O transfers are penalized, this technique improves overall performance. In addition, using a large-memory system delays the point at which computation must stall to wait for I/O, enabling fast computation on larger data sets.