A Content-Based Document Image Retrieval Method - Journal of Zhengzhou University (Engineering Science)

Article Info

Abstract: This paper presents a content-based document image retrieval method that takes an image as the query and retrieves database images according to how similar their layout analysis features, statistical features, and texture features are to those of the query. The method first applies mathematical morphology to segment the document image into paragraphs and lines, which serve as its layout structure features. It then gives a feature extraction algorithm for document images based on statistical features, including character count and statistical count features, together with texture features. Finally, it gives the retrieval algorithm model. Experimental results show that the algorithm achieves good precision and recall and has practical value for content-based document image retrieval.
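The pipeline outlined in the abstract (morphological layout segmentation, feature extraction, similarity-based retrieval) can be illustrated with a minimal sketch. The abstract does not give the structuring elements, the exact feature definitions, or the similarity measure, so everything below is an assumption: the function names `dilate_vertical`, `row_runs`, `extract_features`, and `retrieve` are hypothetical, ink density stands in for the statistical features, texture features are omitted, and Euclidean distance is used as a placeholder similarity measure. Images are binary, given as lists of rows with 1 for ink.

```python
from math import sqrt

def dilate_vertical(img, radius):
    """Binary dilation with a vertical (2*radius+1) x 1 structuring element.
    Closes small vertical gaps so neighbouring text lines merge into blocks."""
    h, w = len(img), len(img[0])
    return [
        [1 if any(img[yy][x] for yy in range(max(0, y - radius),
                                             min(h, y + radius + 1))) else 0
         for x in range(w)]
        for y in range(h)
    ]

def row_runs(img):
    """Return (top, bottom) ranges of consecutive rows containing ink,
    with bottom exclusive (a horizontal projection profile)."""
    runs, start = [], None
    for y, row in enumerate(img):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            runs.append((start, y))
            start = None
    if start is not None:
        runs.append((start, len(img)))
    return runs

def extract_features(img, para_gap=1):
    """Assumed feature vector: (paragraph count, line count, ink density)."""
    lines = row_runs(img)
    # Dilation bridges line gaps of up to 2*para_gap rows, so only the
    # larger gaps between paragraphs survive as separators.
    paragraphs = row_runs(dilate_vertical(img, para_gap))
    ink = sum(sum(row) for row in img)
    density = ink / (len(img) * len(img[0]))
    return (len(paragraphs), len(lines), density)

def retrieve(query_img, database, top_k=3):
    """Rank database images (name -> image) by Euclidean distance between
    their feature vectors and the query's; features are left unscaled here."""
    q = extract_features(query_img)

    def distance(name):
        f = extract_features(database[name])
        return sqrt(sum((a - b) ** 2 for a, b in zip(q, f)))

    return sorted(database, key=distance)[:top_k]
```

In this sketch `para_gap` controls which blank gaps count as line spacing rather than paragraph spacing. A practical system would likely normalise each feature before computing distances, since counts and densities live on different scales.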