TY - JOUR
T1 - Cross-Scale Fusion Transformer for Histopathological Image Classification
AU - Huang, Sheng Kai
AU - Yu, Yu Ting
AU - Huang, Chun Rong
AU - Cheng, Hsiu Chi
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2024/1/1
Y1 - 2024/1/1
N2 - Histopathological images provide medical evidence that supports disease diagnosis. However, pathologists are not always available or are overloaded with work. Moreover, variations in pathological images across organs, cell sizes and magnification factors make it difficult to develop a general method for histopathological image classification. To address these issues, we propose a novel cross-scale fusion (CSF) transformer, which consists of a multiple field-of-view patch embedding module, transformer encoders and cross-fusion modules. With these modules, the CSF transformer effectively integrates patch embeddings of different fields of view to learn cross-scale contextual correlations, which represent tissues and cells of different sizes and magnification factors, while using less memory and computation than state-of-the-art transformers. To verify the generalization ability of the CSF transformer, experiments are performed on four public datasets covering different organs and magnification factors. The CSF transformer outperforms state-of-the-art task-specific methods, convolutional neural network-based methods and transformer-based methods.
AB - Histopathological images provide medical evidence that supports disease diagnosis. However, pathologists are not always available or are overloaded with work. Moreover, variations in pathological images across organs, cell sizes and magnification factors make it difficult to develop a general method for histopathological image classification. To address these issues, we propose a novel cross-scale fusion (CSF) transformer, which consists of a multiple field-of-view patch embedding module, transformer encoders and cross-fusion modules. With these modules, the CSF transformer effectively integrates patch embeddings of different fields of view to learn cross-scale contextual correlations, which represent tissues and cells of different sizes and magnification factors, while using less memory and computation than state-of-the-art transformers. To verify the generalization ability of the CSF transformer, experiments are performed on four public datasets covering different organs and magnification factors. The CSF transformer outperforms state-of-the-art task-specific methods, convolutional neural network-based methods and transformer-based methods.
UR - http://www.scopus.com/inward/record.url?scp=85174798838&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85174798838&partnerID=8YFLogxK
U2 - 10.1109/JBHI.2023.3322387
DO - 10.1109/JBHI.2023.3322387
M3 - Article
C2 - 37801390
AN - SCOPUS:85174798838
SN - 2168-2194
VL - 28
SP - 297
EP - 308
JO - IEEE Journal of Biomedical and Health Informatics
JF - IEEE Journal of Biomedical and Health Informatics
IS - 1
ER -