We propose a radix-4 modular multiplication algorithm based on Montgomery's algorithm, and a radix-4 cellular-array modular multiplier based on Booth's multiplication algorithm. The radix-4 modular multiplier can be used to implement a fast RSA cryptosystem. Owing to the reduced number of iterations and to pipelining, our modular multiplier is four times faster than the cellular-array modular multiplier based on Montgomery's original algorithm. A modular exponentiation takes about n^2 clock cycles, where n is the word length and the clock cycle is roughly equal to the delay of a full adder. Interleaving consecutive exponentiations keeps the multiplier 100% utilized. Locality, regularity, and modularity make the proposed architecture suitable for VLSI implementation.
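As an illustration of the underlying arithmetic only, the following is a minimal software sketch of a radix-4 Montgomery multiplication. It is an assumption-laden model, not the proposed hardware: the function name `montgomery_mul_radix4` and its parameters are hypothetical, the multiplier digits are taken as plain values 0 to 3 rather than Booth-recoded signed digits, and no cellular-array pipelining is modeled. It shows why the iteration count is halved: each pass consumes two bits of one operand and shifts the partial result right by two bits.

```python
def montgomery_mul_radix4(a, b, n, num_digits):
    """Return a * b * 4**(-num_digits) mod n.

    Assumes n is odd, 0 <= a, b < n, and a < 4**num_digits.
    Hypothetical helper for illustration; unsigned digits, no Booth recoding.
    """
    if n % 2 == 0:
        raise ValueError("modulus must be odd")
    # -n^{-1} mod 4. For odd n, n is its own inverse mod 4, so this is (-n) mod 4.
    n_prime = (-n) % 4
    s = 0
    for i in range(num_digits):
        a_i = (a >> (2 * i)) & 3        # i-th radix-4 digit of a, least significant first
        s += a_i * b                    # accumulate the partial product
        q = ((s & 3) * n_prime) & 3     # quotient digit that makes s + q*n divisible by 4
        s = (s + q * n) >> 2            # exact division by the radix
    # The loop keeps s < 2n, so one conditional subtraction suffices.
    return s - n if s >= n else s


# Usage: n = 97 needs 7 bits, i.e. 4 radix-4 digits, so R = 4**4 = 256.
n, num_digits = 97, 4
a, b = 45, 76
result = montgomery_mul_radix4(a, b, n, num_digits)
# The result carries the Montgomery factor R^{-1}; multiplying back by R recovers a*b mod n.
assert (result * 4**num_digits) % n == (a * b) % n
```

A bit-serial (radix-2) version of the same loop would need twice as many iterations for the same word length; the further speedup claimed above comes from pipelining, which this sketch does not capture.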