Multimodal Attention in Recurrent Neural Networks for Visual Question Answering

Keywords

visual question answering (VQA)
multimodal attention mechanism
convolutional neural networks (CNN)
recurrent neural networks (RNN)
long short-term memory (LSTM)

How to Cite

Lorena Kodra, & Elinda Kajo Mece. (2018). Multimodal Attention in Recurrent Neural Networks for Visual Question Answering. Global Journal of Computer Science and Technology, 17(D1), 1–8. Retrieved from https://gjcst.com/index.php/gjcst/article/view/639

Abstract

Visual Question Answering (VQA) is a task for evaluating image scene understanding abilities and shortcomings, and for measuring machine intelligence in the visual domain. Given an image and a natural-language question about the image, the system must ground the question in the image and return an accurate answer in natural language. Much progress has been made toward the challenges of this task by combining the latest advances in image representation and natural language processing. Several recently proposed solutions include attention mechanisms designed to support reasoning. These mechanisms allow models to focus on specific parts of the input in order to generate the answer and improve its accuracy. In this paper, we present a novel LSTM architecture for VQA that uses multimodal attention to focus on specific parts of the image as well as on specific question words when generating the answer. We evaluate our model on the VQA dataset and demonstrate that it performs better than the state of the art. We also present a qualitative analysis of the results, showing the abilities and shortcomings of our model.
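To make the idea of multimodal attention concrete, the following is a minimal sketch of an LSTM-based VQA model that attends both over image regions and over question words. It is not the authors' exact architecture; the layer sizes, the multiplicative fusion scheme, and all names below are illustrative assumptions.

```python
# Minimal sketch of multimodal (image + question-word) attention for VQA.
# NOT the paper's exact architecture: dimensions, the fusion scheme, and
# the classifier head are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalAttentionVQA(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512,
                 img_feat_dim=2048, num_answers=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Project CNN region features into the LSTM hidden-state space.
        self.img_proj = nn.Linear(img_feat_dim, hidden_dim)
        # One attention scoring head per modality.
        self.img_att = nn.Linear(hidden_dim, 1)
        self.txt_att = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(2 * hidden_dim, num_answers)

    def forward(self, question_tokens, img_feats):
        # question_tokens: (B, T) word indices
        # img_feats: (B, R, img_feat_dim) CNN features for R image regions
        h, _ = self.lstm(self.embed(question_tokens))     # (B, T, H)
        q = h[:, -1]                                      # question summary (B, H)

        v = torch.tanh(self.img_proj(img_feats))          # (B, R, H)
        # Question-guided attention over image regions.
        img_alpha = F.softmax(self.img_att(v * q.unsqueeze(1)), dim=1)
        v_att = (img_alpha * v).sum(dim=1)                # attended image (B, H)

        # Image-guided attention over question words.
        txt_alpha = F.softmax(self.txt_att(h * v_att.unsqueeze(1)), dim=1)
        q_att = (txt_alpha * h).sum(dim=1)                # attended question (B, H)

        # Fuse both attended representations and classify over the answer set.
        return self.classifier(torch.cat([v_att, q_att], dim=-1))
```

The key point the sketch illustrates is the coupling between modalities: the question summary weights the image regions, and the attended image representation in turn weights the question words, so each modality guides where the model looks in the other.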

This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright (c) 2017 Authors and Global Journals Private Limited