Title: DEVELOPING VISUAL QUESTIONS ANSWERING MODELS USING NATURAL LANGUAGE PROCESSING AND OPTIMIZED LEARNING MODEL
Journal of Artificial Intelligence and Data Science Techniques
© 2024 by jaidst - PROVINCE Publications
ISSN: 3029-2794
Volume 01, Issue 04
Year of Publication: 2024
Page: [1 - 14]
Ali Fahim and Ahmed Rashid
Senior Researcher, Department of Information Systems, American University of Sharjah, UAE
Assistant Professor, Department of Computer Science, Khalifa University, UAE
Visual Question Answering (VQA) is an area of AI that integrates computer vision and Natural Language Processing (NLP). It involves designing systems that automatically answer natural-language questions about images. VQA has recently received increasing attention owing to its potential to narrow the gap between image understanding and NLP. However, traditional VQA models struggle to interpret meaning from complex visual data, and the answers they produce are not always contextually relevant, which limits their scalability and precision. This paper proposes a new approach, VQA-NLPFA, which seeks to overcome these limitations by embedding NLP techniques into a learning model optimized with the Firefly Algorithm. The approach uses multimodal data fusion to combine visual and textual information effectively. The model applies a deep-learning attention mechanism that focuses on the salient image regions needed to understand the question. Hybrid algorithms optimize the learning model to improve training speed and accuracy by reducing overfitting and enhancing feature selection. Preliminary experiments show that VQA-NLPFA outperforms traditional models, with markedly higher accuracy on difficult visual questions, and that it understands context better and generates more accurate, context-aware answers. The optimized learning model also substantially reduces computational cost, making the system more scalable for real-world applications.
Visual questions and answers, Natural language processing, Firefly algorithm, Multimodal Data Fusion, Feature Selection, Image Understanding.
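
The abstract names the Firefly Algorithm as the optimizer but does not say how it is wired into the VQA pipeline. As a rough illustration only, the sketch below shows the standard Firefly update rule (a dimmer firefly moves toward a brighter one, with attractiveness decaying with squared distance) applied to a toy objective. The function name `firefly_minimize`, the `sphere` objective, and all parameter values are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def firefly_minimize(objective, dim, bounds, n_fireflies=15, n_iters=60,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimize `objective` over a box with a basic Firefly Algorithm (illustrative)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions inside the search box.
    swarm = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    best_cost = min(cost)
    best_x = list(swarm[cost.index(best_cost)])

    for _ in range(n_iters):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                # Lower cost means a brighter firefly; i moves toward a brighter j.
                if cost[j] < cost[i]:
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness
                    swarm[i] = [
                        min(hi, max(lo, a + beta * (b - a) + alpha * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    cost[i] = objective(swarm[i])
                    if cost[i] < best_cost:
                        best_cost, best_x = cost[i], list(swarm[i])
        alpha *= 0.97  # gradually shrink the random step
    return best_x, best_cost


def sphere(x):
    """Toy objective (assumed for the demo): sum of squares, minimum at the origin."""
    return sum(v * v for v in x)


best_x, best_cost = firefly_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_x, best_cost)
```

In a VQA setting, the objective could plausibly be a validation loss evaluated over candidate hyperparameters or a feature-selection mask, but that mapping is a guess on our part and not something the abstract specifies.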