TY - JOUR
T1 - A Low-Cost Pipelined Architecture Based on a Hybrid Sorting Algorithm
AU - Chen, You Rong
AU - Ho, Chien Chia
AU - Chen, Wei Ting
AU - Chen, Pei Yin
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2024/2/1
Y1 - 2024/2/1
AB - In this paper, a low-cost pipelined architecture based on a hybrid sorting algorithm is proposed. The proposed architecture is constructed with a bitonic sorter and several cascaded bidirectional insertion sorting units. Each bidirectional insertion sorting unit takes the segmented sorted subsequences generated by the bitonic sorter as input and records the maximum and minimum values of each subsequence. After all segmented subsequences have been processed through the cascaded bidirectional insertion sorting units, a sorted sequence is obtained. The proposed architecture is implemented in the Verilog hardware description language (HDL) and synthesized using the Synopsys Design Compiler with a TSMC 90-nm cell library. The experimental results indicate that the proposed architecture can not only shorten sorting cycles but also reduce hardware area costs. Moreover, sorting cycles can be further shortened by increasing the parallelism of the proposed architecture. In a configuration where 2048 32-bit data items are to be sorted and 16 data items are processed simultaneously, the proposed architecture improves the throughput-to-gate-count ratio by 16% and the throughput-to-power-consumption ratio by 25% compared to the existing sorting design. The proposed architecture makes the most efficient use of hardware resources.
UR - http://www.scopus.com/inward/record.url?scp=85181557033&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85181557033&partnerID=8YFLogxK
U2 - 10.1109/TCSI.2023.3342929
DO - 10.1109/TCSI.2023.3342929
M3 - Article
AN - SCOPUS:85181557033
SN - 1549-8328
VL - 71
SP - 717
EP - 730
JO - IEEE Transactions on Circuits and Systems I: Regular Papers
JF - IEEE Transactions on Circuits and Systems I: Regular Papers
IS - 2
ER -