
Imbalanced Data Evidential Classification with Composite Reliability


Journal of Zhengzhou University (Engineering Science) [ISSN:1671-6833/CN:41-1339/T]

Volume:: 44
Issue:: 2023, No. 04

Pages:: 22-28

Publication date:: 2023-06-01

Article Info

Title:: Imbalanced Data Evidential Classification with Composite Reliability

Author(s):: TIAN Hongpeng (田鸿朋); ZHANG Zhen (张震); ZHANG Siyuan (张思源); XIAO Zongrong (肖宗荣); DONG Jiabing (董佳兵); School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, Henan, China

Keywords:: imbalanced data; classification; global reliability; local reliability; evidential reasoning

CLC number:: TP181

DOI:: 10.13705/j.issn.1671-6833.2023.04.012

Document code:: A

Abstract:: Traditional classification models tend to favor the majority class and neglect the minority class when handling imbalanced data. To address this problem, an evidential classification method for imbalanced data based on composite reliability analysis is proposed. The method improves the classification of each imbalanced test sample by evaluating both the global reliability and the local reliability of the classification models. First, the majority class is under-sampled multiple times, and each sampled subset is combined with the minority class to form a training subset; one classification model is trained on each training subset. The maximum mean discrepancy (MMD) measures the difference between the data distributions before and after sampling, which yields the global reliability of each classification model. Second, the local reliability of a test sample's classification result is evaluated using the sample's nearest neighbors in the training set. Because the test sample and its neighbors have similar data distributions and spatial structure, the smaller the deviation between a model's classification results for the neighbors and their true classes, the greater the model's local reliability. Finally, within the evidential reasoning framework, global reliability and local reliability are combined into a composite reliability factor that is used to discount the classification results produced by the different classification models; part of the probability mass is assigned to the completely unknown class to represent the uncertainty of the class label. The Dempster-Shafer (DS) rule then fuses the discounted classification results for decision analysis. Experiments on 12 imbalanced datasets from the KEEL and UCI repositories show that the proposed method achieves an average FM of 80.18% and an average GM of 87.24%, which are 8.1% and 4.99% higher, respectively, than the best results among the other imbalanced data classification methods compared. These results confirm the effectiveness of the proposed method for imbalanced data classification.
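The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`mmd2_rbf`, `discount`, `dempster`), the RBF kernel and its `gamma`, the mapping `exp(-MMD^2)` from discrepancy to global reliability, the fixed local reliability value, and the product used to form the composite factor are all assumptions made for illustration, since the abstract does not specify them. Only Shafer's discounting and Dempster's rule follow their standard textbook definitions.

```python
import numpy as np


def mmd2_rbf(x, y, gamma=1.0):
    """Biased estimate of the squared maximum mean discrepancy (RBF kernel)."""
    def kernel(a, b):
        # Pairwise squared Euclidean distances via broadcasting.
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * sq)
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()


def discount(probs, alpha):
    """Shafer discounting of a probabilistic classifier output.

    Class masses are scaled by the reliability alpha in [0, 1]; the remaining
    mass 1 - alpha goes to the whole frame Omega (the completely unknown class).
    Returns an array of length n_classes + 1 whose last entry is m(Omega).
    """
    probs = np.asarray(probs, dtype=float)
    return np.append(alpha * probs, 1.0 - alpha)


def dempster(m1, m2):
    """Dempster's rule for masses on singleton classes plus Omega (last entry)."""
    s1, o1 = m1[:-1], m1[-1]
    s2, o2 = m2[:-1], m2[-1]
    # A singleton survives if both sources agree on it, or one source says Omega.
    singles = s1 * s2 + s1 * o2 + o1 * s2
    omega = o1 * o2
    total = singles.sum() + omega  # equals 1 minus the conflicting mass
    if total <= 0.0:
        raise ValueError("sources are in total conflict")
    return np.append(singles, omega) / total


# Synthetic stand-in for the majority class and one under-sampled subset.
rng = np.random.default_rng(0)
majority = rng.normal(size=(200, 2))
subset = majority[rng.choice(200, size=40, replace=False)]

# Assumed mapping: a smaller distribution shift gives a higher global reliability.
global_rel = float(np.exp(-mmd2_rbf(majority, subset)))
# Hypothetical local reliability, e.g. the share of nearest neighbours
# that the model classifies correctly.
local_rel = 0.8
# Assumed combination of the two into a composite reliability factor.
alpha = global_rel * local_rel

# Two hypothetical classifier outputs over (majority, minority).
m1 = discount([0.9, 0.1], alpha)
m2 = discount([0.2, 0.8], 0.9)
fused = dempster(m1, m2)
decision = int(np.argmax(fused[:-1]))  # index of the class with the largest mass
print(fused, decision)
```

The less reliable a model is judged to be, the more of its mass moves to Omega, so its output has less influence on the fused result.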



Last Update: 2023-06-30
