TY - JOUR
T1 - A hardware/software framework for instruction and data scratchpad memory allocation
AU - Chen, Zhong Ho
AU - Su, Alvin W.Y.
N1 - Copyright 2010 Elsevier B.V., All rights reserved.
PY - 2010/4/1
Y1 - 2010/4/1
N2 - Previous research shows that a scratchpad memory (SPM) device consumes less energy than a cache device of the same capacity. In this article, we place the SPM at the top level of the memory hierarchy to reduce energy consumption. To take advantage of an SPM, we address two issues: first, the program's locality should be improved; second, the SPM must be managed. To tackle these two issues, we present a hardware/software framework for dynamically allocating both instructions and data in SPM. The software flow is divided into three phases: locality improvement, locality extraction, and runtime SPM management. We improve the locality of a program without modifying the original compiler or the source code. An optimization algorithm is proposed to extract the SPM allocations. At runtime, an SPM management program is employed. In hardware, an address translation logic (ATL) is proposed to reduce the overhead of SPM management. The results show that the proposed framework reduces the energy-delay product (EDP) by 63%, on average, compared with a traditional cache architecture. This reduction comes from properly allocating both instructions and data in SPM: allocating only instructions in SPM reduces EDP by 45%, on average, and allocating only data in SPM reduces EDP by 14%, on average.
AB - Previous research shows that a scratchpad memory (SPM) device consumes less energy than a cache device of the same capacity. In this article, we place the SPM at the top level of the memory hierarchy to reduce energy consumption. To take advantage of an SPM, we address two issues: first, the program's locality should be improved; second, the SPM must be managed. To tackle these two issues, we present a hardware/software framework for dynamically allocating both instructions and data in SPM. The software flow is divided into three phases: locality improvement, locality extraction, and runtime SPM management. We improve the locality of a program without modifying the original compiler or the source code. An optimization algorithm is proposed to extract the SPM allocations. At runtime, an SPM management program is employed. In hardware, an address translation logic (ATL) is proposed to reduce the overhead of SPM management. The results show that the proposed framework reduces the energy-delay product (EDP) by 63%, on average, compared with a traditional cache architecture. This reduction comes from properly allocating both instructions and data in SPM: allocating only instructions in SPM reduces EDP by 45%, on average, and allocating only data in SPM reduces EDP by 14%, on average.
UR - http://www.scopus.com/inward/record.url?scp=77952031363&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77952031363&partnerID=8YFLogxK
U2 - 10.1145/1736065.1736067
DO - 10.1145/1736065.1736067
M3 - Article
AN - SCOPUS:77952031363
SN - 1544-3566
VL - 7
JO - Transactions on Architecture and Code Optimization
JF - Transactions on Architecture and Code Optimization
IS - 1
M1 - 2
ER -