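The Design and Implementation of a Text Mining Algorithm Based on the MapReduce Framework - Journal of Zhengzhou University (Engineering Science)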

Article Info

Title:: The Design and Implementation of a Text Mining Algorithm Based on MapReduce Framework

Authors (Chinese names):: 朱蔷蔷, 张桂芸, 刘文龙; all three at the College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387

Author(s):: ZHU Qiangqiang, ZHANG Guiyun, LIU Wenlong; College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China

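Abstract (from the Chinese):: As text mining finds ever wider application in active information services, analyzing the intrinsic characteristics of data on the basis of text data has become a current research trend. This paper designs and implements a text mining algorithm on the Hadoop platform. The algorithm uses the MapReduce framework to output the adjacent word groups found in a natural-language corpus in descending order of their frequency of occurrence, which helps users mine the relationships among itemsets in large volumes of data. The experimental results demonstrate the algorithm's effectiveness and good speedup.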

Abstract:: With the expanding application of text mining in active information services, analyzing the inherent characteristics of data based on text data is becoming a current research trend. This paper designs and implements a text mining algorithm on the Hadoop platform that outputs the adjacent phrases of a natural-language corpus in descending order of frequency, thus helping users mine the links between itemsets in large quantities of data. Given the distributed nature of the Hadoop platform, the experimental results show the algorithm's efficiency and good speedup.
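
The counting scheme the abstract describes can be illustrated with a minimal, single-process sketch. It is not the authors' Hadoop implementation: it assumes that "adjacent phrases" means pairs of neighbouring words, that tokens are split on whitespace, and that the map, shuffle, and reduce phases can be simulated in memory. The function names (`map_phase`, `shuffle`, `reduce_phase`, `adjacent_pair_counts`) and the sample corpus are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit ((word, next_word), 1) for every adjacent pair in one line."""
    words = line.lower().split()  # assumption: whitespace tokenization
    return [((a, b), 1) for a, b in zip(words, words[1:])]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one adjacent pair."""
    return key, sum(values)


def adjacent_pair_counts(lines):
    """Run the three phases and return pairs sorted by descending frequency."""
    emitted = chain.from_iterable(map_phase(line) for line in lines)
    reduced = [reduce_phase(k, v) for k, v in shuffle(emitted).items()]
    # Sort by count descending, then by pair so ties have a deterministic order.
    return sorted(reduced, key=lambda kv: (-kv[1], kv[0]))


# Hypothetical two-line corpus, for illustration only.
corpus = ["text mining on hadoop", "text mining with mapreduce"]
result = adjacent_pair_counts(corpus)
```

On a real cluster the map and reduce steps would run as distributed tasks and the descending sort would typically need its own pass; here a single `sorted` call stands in for that step.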