Scalable Montgomery modular multiplication architecture with low-latency and low-memory bandwidth requirement

Wen Ching Lin, Jheng Hao Ye, Ming Der Shieh

Research output: Contribution to journal › Article › peer-review

13 Citations (Scopus)


Montgomery modular multiplication is widely used in public-key cryptosystems. This work shows how to relax the data dependency in conventional word-based algorithms so that the current words of the variables can be reused as much as possible. Building on this relaxed data dependency, we then propose a novel scheduling scheme that reduces the number of memory accesses in the developed scalable architecture. Analytical results show that the memory bandwidth requirement of the proposed scalable architecture is almost 1/(w-1) times that of conventional scalable architectures, where w denotes the word size. The proposed architecture also retains a latency of exactly one cycle between the operations on the same words in two consecutive iterations of the Montgomery modular multiplication algorithm when enough processing elements are employed. Compared with the design in the related work, experimental results demonstrate that the proposed architecture achieves an almost 54 percent reduction in power consumption with no degradation in throughput. The reduced number of memory accesses not only lowers power consumption but also facilitates the design of scalable architectures for operands of any precision.
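For readers unfamiliar with the underlying operation, the following is a minimal functional sketch of word-serial Montgomery modular multiplication in radix 2^w. It is not the scheduling scheme or hardware architecture described in the abstract; it only illustrates the arithmetic that such architectures compute, consuming one w-bit word of the multiplier per iteration. The function names `montgomery_radix` and `montgomery_multiply` and the default `w=8` are assumptions made for illustration.

```python
def montgomery_radix(m: int, w: int = 8) -> int:
    """Return R = 2**(w * e), where e is the number of w-bit words in m."""
    e = -(-m.bit_length() // w)  # ceiling division
    return 1 << (w * e)


def montgomery_multiply(x: int, y: int, m: int, w: int = 8) -> int:
    """Return x * y * R^-1 mod m for odd m and 0 <= x, y < m.

    Illustrative sketch only: one w-bit word of x is consumed per
    iteration, and the running sum s stays below 2 * m throughout.
    """
    if m % 2 == 0:
        raise ValueError("Montgomery multiplication requires an odd modulus")
    if not (0 <= x < m and 0 <= y < m):
        raise ValueError("operands must be reduced modulo m")

    base = 1 << w
    mask = base - 1
    e = -(-m.bit_length() // w)            # number of w-bit words in m
    m_prime = (-pow(m, -1, base)) % base   # -m^-1 mod 2^w (Python 3.8+)

    s = 0
    for i in range(e):
        x_i = (x >> (w * i)) & mask        # i-th word of x
        s += x_i * y
        q = (s * m_prime) & mask           # makes s + q * m divisible by 2^w
        s = (s + q * m) >> w               # exact division by 2^w

    # A single conditional subtraction brings s into [0, m).
    return s - m if s >= m else s
```

The result carries an extra factor of R^-1, so callers typically convert operands into Montgomery form (multiplying by R mod m) beforehand and convert back at the end.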

Original language: English
Article number: 6296657
Pages (from-to): 475-483
Number of pages: 9
Journal: IEEE Transactions on Computers
Issue number: 2
Publication status: Published - 2014 Feb

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computational Theory and Mathematics

