Dong Jianhua, Wang Guoyin, Yong Xi, et al. Normalized PCA Algorithm Based on Spark[J]. Journal of Zhengzhou University (Engineering Science), 2017, 38(05): 7-12. [doi:10.13705/j.issn.1671-6833.2017.05.001]

Normalized PCA Algorithm Based on Spark

Journal of Zhengzhou University (Engineering Science) [ISSN: 1671-6833 / CN: 41-1339/T]

Volume:
38
Issue:
2017, No. 05
Pages:
7-12
Publication date:
2017-09-26

Article Information

Title:
Normalized PCA Algorithm Based on Spark
Author(s):
Dong Jianhua¹, Wang Guoyin², Yong Xi³, Shi Xiaoyu², Li Qingliang⁴
1. Institute of Electronic Information Technology, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714; 2. University of Chinese Academy of Sciences, Beijing 100049; 3. Institute of Electronic Information Technology, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714; 4. Ministry of Water Resources Information Center, Beijing 100053; 5. Xichang Satellite Launch Center, Wenchang, Hainan 571300
Keywords:
principal component analysis (PCA); Spark; distributed; normalization
DOI:
10.13705/j.issn.1671-6833.2017.05.001
Document code:
A
Abstract:
Principal component analysis (PCA) is a well-known dimensionality reduction technique in data mining: it transforms many original variables into a few composite indices. In this paper, we study the principle of PCA, the distributed architecture of Spark, and the distributed-matrix PCA algorithm in Spark's MLlib. We then improve the design and present a new algorithm, SNPCA (Spark's Normalized Principal Component Analysis), which computes the principal components together with a data normalization step. We benchmarked SNPCA on multicore CPUs, and the results demonstrate its effectiveness.
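The abstract does not spell out the individual steps of SNPCA, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that "normalized PCA" means z-score standardization of every column followed by PCA, which is equivalent to PCA on the correlation matrix. Spark's partitioned execution is imitated with plain Python lists: each inner list stands for one partition, the hypothetical `partition_stats` plays the role of a per-partition sequence operation and `merge_stats` that of a combine operation (roughly what `RDD.treeAggregate` does on a cluster). All function names here are illustrative and do not come from the paper or from MLlib. Because the resulting d × d matrix is small, it is decomposed locally with the classical Jacobi eigenvalue method. In MLlib itself, a comparable pipeline could presumably be assembled from `StandardScaler(withMean = true, withStd = true)` and `RowMatrix.computePrincipalComponents(k)`; this sketch does not call Spark.

```python
import math
from functools import reduce
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Stats = Tuple[int, Vector, Matrix]  # (row count, column sums, sums of x_i * x_j)

def zero_stats(d: int) -> Stats:
    return 0, [0.0] * d, [[0.0] * d for _ in range(d)]

def partition_stats(rows: Sequence[Sequence[float]], d: int) -> Stats:
    """'Map' side (hypothetical helper): sufficient statistics of one partition."""
    n, s, g = zero_stats(d)
    for r in rows:
        n += 1
        for i in range(d):
            s[i] += r[i]
            for j in range(d):
                g[i][j] += r[i] * r[j]
    return n, s, g

def merge_stats(a: Stats, b: Stats) -> Stats:
    """'Reduce' side: the statistics are additive, so merge order does not matter."""
    (na, sa, ga), (nb, sb, gb) = a, b
    return (na + nb, [x + y for x, y in zip(sa, sb)],
            [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ga, gb)])

def correlation_from_stats(stats: Stats) -> Tuple[Matrix, Vector, Vector]:
    """Covariance of the z-scored columns, i.e. the correlation matrix of the raw data."""
    n, s, g = stats
    if n < 2:
        raise ValueError("need at least two rows")
    d = len(s)
    mean = [x / n for x in s]
    cov = [[(g[i][j] - n * mean[i] * mean[j]) / (n - 1) for j in range(d)] for i in range(d)]
    std = [math.sqrt(max(cov[i][i], 0.0)) for i in range(d)]
    if any(sd == 0.0 for sd in std):
        raise ValueError("constant column cannot be standardized")
    corr = [[cov[i][j] / (std[i] * std[j]) for j in range(d)] for i in range(d)]
    return corr, mean, std

def jacobi_eigen(a: Matrix, tol: float = 1e-20, max_sweeps: int = 100) -> Tuple[Vector, Matrix]:
    """Cyclic Jacobi method for a small symmetric matrix; eigenvectors are the COLUMNS of v."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def normalized_pca(partitions, d: int, k: int):
    """One aggregation pass over all partitions, then a d x d eigenproblem on the 'driver'."""
    stats = reduce(merge_stats, (partition_stats(p, d) for p in partitions), zero_stats(d))
    corr, mean, std = correlation_from_stats(stats)
    eigvals, eigvecs = jacobi_eigen(corr)
    order = sorted(range(d), key=lambda i: eigvals[i], reverse=True)[:k]
    total = sum(eigvals)  # equals trace(corr) = d up to rounding
    components = [[eigvecs[r][i] for r in range(d)] for i in order]
    explained = [eigvals[i] / total for i in order]
    return mean, std, components, explained

def project(row, mean, std, components):
    """Standardize one row with the global mean/std, then project onto the components."""
    z = [(x - m) / sd for x, m, sd in zip(row, mean, std)]
    return [sum(zi * ci for zi, ci in zip(z, comp)) for comp in components]

if __name__ == "__main__":
    # Four "partitions" (one of them empty) standing in for an RDD of 3-column rows.
    parts = [[[2.0, 10.0, 1.0], [4.0, 30.0, 0.0]], [[6.0, 20.0, 3.0]], [], [[8.0, 50.0, 2.0]]]
    mean, std, comps, explained = normalized_pca(parts, d=3, k=2)
    print("explained variance ratio:", explained)
    print("scores of first row:", project(parts[0][0], mean, std, comps))
```

Because the sketch works on the correlation matrix, rescaling a column (for example, converting meters to millimeters) leaves the explained-variance ratios unchanged, which is the usual motivation for normalizing before PCA. The one-pass covariance formula used here can lose precision when column means are large relative to their spread; a production version would likely center the data first or use a numerically stabler update.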
