[1] Sun Ning, Wang Longyu, Liu Jixin, et al. Scene Recognition Based on Privilege Information and Attention Mechanism[J]. Journal of Zhengzhou University (Engineering Science), 2021, 42(01): 42-49. [doi:10.13705/j.issn.1671-6833.2021.01.007]

Scene Recognition Based on Privilege Information and Attention Mechanism

Journal of Zhengzhou University (Engineering Science) [ISSN: 1671-6833 / CN: 41-1339/T]

Volume:
42
Issue:
2021, No. 01
Pages:
42-49
Publication date:
2021-03-14

Article Information

Title:
Scene Recognition Based on Privilege Information and Attention Mechanism
Author(s):
Sun Ning (孙宁); Wang Longyu (王龙玉); Liu Jixin (刘佶鑫); Han Guang (韩光)
Engineering Research Center of Broadband Wireless Communication Technology, Ministry of Education, Nanjing University of Posts and Telecommunications; School of Communication and Information Engineering, Nanjing University of Posts and Telecommunications

DOI:
10.13705/j.issn.1671-6833.2021.01.007
Document code:
A
Abstract:
In scene recognition, RGB images present appearance information and depth images contain geometric information, and the two are complementary. To exploit this complementary information in a test phase where only RGB images are available, this paper treats the depth image as privileged information and proposes an end-to-end trainable deep neural network that combines privileged information with an attention mechanism. The model follows an image encoding, feature decoding, image encoding architecture, which establishes a mapping from RGB images to depth images and then to high-level semantic features of the depth images. An attention mechanism fuses the high-level semantic features of the RGB image with the corresponding high-level semantic features of the depth image, and the fused features are fed into a classification network to produce the final prediction. At test time, only RGB images are required as input, and the depth-image privileged information obtained by the model helps improve scene recognition performance. Extensive experiments on two RGB-D scene recognition benchmarks, SUN RGB-D and NYUD2, verify the effectiveness of the proposed method.
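The test-time data flow described in the abstract (RGB image, to RGB features, to a generated depth image, to depth features, to attention fusion, to a classifier) can be illustrated with a toy sketch. This is not the authors' implementation: the class name `PrivilegedSceneModel`, the layer sizes, the random untrained weights, the use of plain linear maps in place of convolutional encoders and decoders, and the specific attention form (one softmax-normalised scalar weight per modality) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [dot(row, x) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class PrivilegedSceneModel:
    """Hypothetical toy stand-in for the RGB -> depth -> depth-feature pipeline."""

    def __init__(self, pixels=16, feat=8, classes=3, seed=0):
        rng = random.Random(seed)
        self.rgb_encoder = random_matrix(feat, pixels, rng)    # image encoding
        self.depth_decoder = random_matrix(pixels, feat, rng)  # feature decoding
        self.depth_encoder = random_matrix(feat, pixels, rng)  # image encoding again
        # Assumed attention parameters: one scoring vector per modality.
        self.attn_rgb = [rng.uniform(-0.5, 0.5) for _ in range(feat)]
        self.attn_depth = [rng.uniform(-0.5, 0.5) for _ in range(feat)]
        self.classifier = random_matrix(classes, feat, rng)

    def forward(self, rgb_pixels):
        # Only the RGB image is needed as input, as at test time in the abstract.
        rgb_feat = relu(linear(self.rgb_encoder, rgb_pixels))
        pseudo_depth = linear(self.depth_decoder, rgb_feat)
        depth_feat = relu(linear(self.depth_encoder, pseudo_depth))
        # Assumed attention: score each modality, normalise, take a weighted sum.
        scores = [dot(self.attn_rgb, rgb_feat), dot(self.attn_depth, depth_feat)]
        a_rgb, a_depth = softmax(scores)
        fused = [a_rgb * r + a_depth * d for r, d in zip(rgb_feat, depth_feat)]
        probs = softmax(linear(self.classifier, fused))
        return probs, (a_rgb, a_depth)


model = PrivilegedSceneModel()
probs, attention = model.forward([0.1 * i for i in range(16)])
```

In this sketch `probs` is a distribution over three made-up scene classes and `attention` holds the two modality weights. During training, a real model of this kind would additionally supervise the generated depth image with ground-truth depth, which is what makes the depth modality privileged information; that step is omitted here.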
Last Update: 2021-03-15