[1] WANG Kongyuan, BI Ying, GUO Weifeng, et al. A Review of Multimodal Medical Image Classification and Cancer Diagnosis[J]. Journal of Zhengzhou University (Engineering Science), 2027, 48(XX): 1-10. [doi:10.13705/j.issn.1671-6833.2026.06.005]

A Review of Multimodal Medical Image Classification and Cancer Diagnosis

Journal of Zhengzhou University (Engineering Science) [ISSN: 1671-6833 / CN: 41-1339/T]

Volume:
48
Issue:
2027, No. XX
Pages:
1-10
Publication date:
2027-12-10

Article Info

Title:
A Review of Multimodal Medical Image Classification and Cancer Diagnosis
Author(s):
WANG Kongyuan1, BI Ying1, GUO Weifeng1, LIANG Jing2, WU Fangxiang3
1. School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China; 2. School of Electrical Engineering and Automation, Henan Institute of Technology, Xinxiang 453000, China; 3. Division of Biomedical Engineering, University of Saskatchewan, Saskatoon S7N 5A9, Canada
Keywords:
medical image classification; feature extraction; multimodal fusion; cancer diagnosis; clinical application
CLC number:
TP391.4
DOI:
10.13705/j.issn.1671-6833.2026.06.005
Document code:
A
Abstract:
Multimodal medical image classification integrates data from different imaging modalities to build more comprehensive and complementary feature representations at the structural, functional, and metabolic levels. This markedly improves disease classification performance and the accuracy and reliability of clinical diagnosis, and the approach has therefore attracted substantial attention from researchers. This review first introduces the basic principles and overall workflow of multimodal medical image classification, covering data preprocessing, feature extraction, multimodal information fusion, and final classification and model evaluation, and summarizes the core ideas and mainstream paradigms of multimodal information fusion. It then systematically analyzes and compares four fusion methods for multimodal medical images at the methodological level and, in the context of clinical applications such as cancer diagnosis, discusses the performance and characteristics of these methods in thyroid cancer prediction, early gastric cancer screening, immune response prediction, breast cancer diagnosis, and skin disease detection. Finally, it summarizes the open problems in the field, namely the high cost of data acquisition and annotation, the strong heterogeneity between modalities that makes fusion difficult, the poor interpretability of deep learning models, and insufficient generalization and robustness, and gives an outlook on future research trends.



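Memo

Received: 2026-04-05; Revised: 2026-05-28
Funding: National Natural Science Foundation of China (62376253; 62476253); Henan Province Biology and New Medicine Industry R&D Joint Fund Project (245101610001)
About the author: WANG Kongyuan (b. 1998), male, from Jiaozuo, Henan, is a doctoral student at Zhengzhou University; his research covers evolutionary computation, image classification, and deep learning. Email: wangkongyuan@gs.zzu.edu.cn.
Corresponding author: BI Ying (b. 1992), female, from Huanggang, Hubei, is a professor and doctoral supervisor at Zhengzhou University; her research covers evolutionary computation, genetic programming, and image analysis. Email: yingbi@zzu.edu.cn.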
Last Update: 2026-06-03