[1]孙 宁,王龙玉,刘佶鑫,等.结合特权信息与注意力机制的场景识别[J].郑州大学学报(工学版),2021,42(01):42-49.[doi:10.13705/j.issn.1671-6833.2021.01.007]
 SUN Ning,WANG Longyu,LIU Jixin,et al.Scene Recognition Based on Privilege Information and Attention Mechanism[J].Journal of Zhengzhou University (Engineering Science),2021,42(01):42-49.[doi:10.13705/j.issn.1671-6833.2021.01.007]


Journal of Zhengzhou University (Engineering Science) [ISSN:1671-6833/CN:41-1339/T]

Volume:
42
Issue:
2021, No. 01
Pages:
42-49
Column:
Publication Date:
2021-03-14

Article Info

Title:
Scene Recognition Based on Privilege Information and Attention Mechanism
Authors:
孙 宁1, 王龙玉1,2, 刘佶鑫1, 韩 光1
1. Engineering Research Center of Wideband Wireless Communication Technology, Ministry of Education, Nanjing University of Posts and Telecommunications; 2. School of Communication and Information Engineering, Nanjing University of Posts and Telecommunications

Author(s):
SUN Ning1, WANG Longyu1,2, LIU Jixin1, HAN Guang1
1.Engineering Research Center of Wideband Wireless Communication Technology of Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; 2.School of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Keywords:
scene recognition; privilege information; attention; convolutional neural network
Keywords:
scene recognition; privilege information; attention mechanism; convolutional neural network
CLC Number:
TN911.73
DOI:
10.13705/j.issn.1671-6833.2021.01.007
Document Code:
A
Abstract:
In scene recognition, RGB images present appearance information while depth images contain geometric information; the two complement each other. To exploit the complementary information of RGB and depth images even in the test phase, where only RGB images are available, this paper treats the depth image as privileged information and proposes an end-to-end trainable deep neural network model that combines privileged information with an attention mechanism. The model adopts an image-encoding, feature-decoding, image-encoding architecture that establishes a mapping from the RGB image to the depth image and then to the high-level semantic features of the depth image. Through the attention mechanism, the high-level semantic features of the RGB image are fused with the corresponding high-level semantic features of the depth image and fed into a classification network to produce the prediction. At test time, only an RGB image needs to be input, and the depth-image privileged information captured by the model helps improve scene recognition performance. Extensive experiments on two RGB-D scene recognition datasets, SUN RGB-D and NYUD2, verify the effectiveness of the proposed method.
Abstract:
In scene recognition, RGB images present appearance information while depth images contain geometric information; the two complement each other. In order to use the complementary information contained in depth and RGB images in a test phase where only RGB images are available, this paper uses the depth image as privileged information and proposes an end-to-end trainable deep neural network model that combines privileged information with an attention mechanism. In the proposed method, an image-encoding, feature-decoding, and image-encoding framework establishes a mapping from RGB images to depth images and then to the high-level semantic features of the depth images. Using the attention mechanism, the high-level semantic features of the RGB image are fused with the corresponding high-level semantic features of the depth image, and the fused features are fed into the classification network to make the final prediction. In the test phase, only RGB images need to be provided, and the performance of scene recognition can be improved with the help of the privileged information extracted from depth images. Extensive experiments are conducted on two RGB-D scene recognition benchmarks, SUN RGB-D and NYUD2, and the validity of the proposed method is verified.
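The attention-based fusion described in the abstract can be illustrated with SE-style channel recalibration in the spirit of squeeze-and-excitation networks (ref. [12]). The sketch below is purely illustrative, not the authors' implementation: the channel count, reduction ratio, random weights, and the sum-based combination of the two modalities are all assumptions made for the example.

```python
import numpy as np

def channel_attention(feat, w_reduce, w_expand):
    """SE-style channel attention: squeeze (global average pool over
    spatial dims), excite (bottleneck MLP with sigmoid), then rescale."""
    squeezed = feat.mean(axis=(1, 2))                    # (C,)
    hidden = np.maximum(0.0, w_reduce @ squeezed)        # ReLU, (C//r,)
    gate = 1.0 / (1.0 + np.exp(-(w_expand @ hidden)))    # sigmoid, (C,)
    return feat * gate[:, None, None]                    # recalibrated (C, H, W)

def fuse(rgb_feat, depth_feat, params):
    """Recalibrate each modality's high-level features, then combine.
    Summation here is an assumption for illustration."""
    a = channel_attention(rgb_feat, *params["rgb"])
    b = channel_attention(depth_feat, *params["depth"])
    return a + b

# Toy shapes and random weights, standing in for real feature maps.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
params = {
    "rgb": (rng.standard_normal((C // r, C)), rng.standard_normal((C, C // r))),
    "depth": (rng.standard_normal((C // r, C)), rng.standard_normal((C, C // r))),
}
fused = fuse(rng.standard_normal((C, H, W)), rng.standard_normal((C, H, W)), params)
print(fused.shape)  # (8, 4, 4) — same shape as either input feature map
```

The fused map would then feed a classification head; at test time, the depth branch's features would come from the decoded (hallucinated) depth rather than a real depth sensor.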

References:

[1] DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2009: 248-255.

[2] ZHOU B L, LAPEDRIZA A, XIAO J X, et al. Learning deep features for scene recognition using places database[J]. Advances in neural information processing systems, 2015, 1: 487-495.
[3] ZHOU B, LAPEDRIZA A, KHOSLA A, et al. Places: a 10 million image database for scene recognition[J]. IEEE transactions on pattern analysis and machine intelligence, 2018, 40(6): 1452-1464.
[4] HOFFMAN J, GUPTA S, DARRELL T. Learning with side information through modality hallucination[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 826-834.
[5] WANG A, CAI J, LU J, et al. Modality and component aware feature fusion for RGB-D scene classification [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 5995-6004.
[6] VAPNIK V, VASHIST A. A new learning paradigm: learning using privileged information[J]. Neural networks, 2009, 22(5/6): 544-557.
[7] XIONG Z T, YUAN Y, WANG Q. MSN: modality separation networks for RGB-D scene recognition[J]. Neurocomputing, 2020, 373: 81-89.
[8] DU D, WANG L, WANG H, et al. Translate-to-recognize networks for RGB-D scene recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2019: 11836-11845.
[9] SHARMANSKA V, QUADRIANTO N, LAMPERT C H. Learning to rank using privileged information[C]//2013 IEEE International Conference on Computer Vision. New York: IEEE, 2013: 825-832.
[10] GARCIA N C, MORERIO P, MURINO V. Learning with privileged information via adversarial discriminative modality distillation[J]. IEEE transactions on pattern analysis and machine intelligence, 2020,42(10): 2581-2593.
[11] WANG F, JIANG M, QIAN C, et al. Residual attention network for image classification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2017: 3156-3164.
[12] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 7132-7141.
[13] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 770-778.
[14] GUPTA S, GIRSHICK R, ARBELÁEZ P, et al. Learning rich features from RGB-D images for object detection and segmentation[C]//European Conference on Computer Vision. Berlin: Springer, 2014: 345-360.
[15] KINGMA D P, BA J. Adam: a method for stochastic optimization[EB/OL]. (2014-12-22)[2020-05-15].https://arxiv.org/abs/1412.6980.
[16] SONG S, LICHTENBERG S P, XIAO J. SUN RGB-D: a RGB-D scene understanding benchmark suite[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2015: 567-576.
[17] SILBERMAN N, HOIEM D, KOHLI P, et al. Indoor segmentation and support inference from RGBD images[C]//European Conference on Computer Vision. Berlin: Springer, 2012: 746-760.
[18] LIAO Y, KODAGODA S, WANG Y, et al. Understand scene categories by objects: A semantic regularized scene classifier using convolutional neural networks[C]//2016 IEEE International Conference on Robotics and Automation (ICRA). New York: IEEE, 2016: 2318-2325.
[19] SONG X, HERRANZ L, JIANG S Q. Depth CNNs for RGB-D scene recognition: learning from scratch better than transferring from RGB-CNNs[EB/OL]. (2018-01-21)[2020-05-15]. https://arxiv.org/abs/1801.06797.
[20] SONG X H, JIANG S Q, HERRANZ L. Combining models from multiple sources for RGB-D scene recognition[C]// International Joint Conference on Artificial Intelligence. Melbourne,Australia:IJCAI,2017: 4523-4529.
[21] DU D, XU X, REN T, et al. Depth images could tell us more: enhancing depth discriminability for RGB-D scene recognition[C]//2018 IEEE International Conference on Multimedia and Expo (ICME). New York: IEEE, 2018: 1-6.
[22] LI Y B, ZHANG J G, CHENG Y H, et al. DF2Net: discriminative feature learning and fusion network for RGB-D indoor scene classification[C]//The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI). New Orleans: AAAI, 2018:7041-7048.

Last Update: 2021-03-15