Chandan Rani S R, Kavya N P
DOI: 10.5110/77.1103 Page: 165-182 Vol: 19 Issue: 01 Year: 2024
In the past, visual information was conveyed primarily through text-based explanations or visual cues. These approaches, however, are not accessible to individuals with visual impairments or to those who do not understand the language in which the information is presented. Speech-to-image translation, which leverages machine learning and deep learning techniques, was introduced to address this communication barrier. This study presents a speech-to-image translation method based on machine learning and deep learning. A dataset of speech samples paired with corresponding images was collected, and the speech data was converted into text using Automatic Speech Recognition (ASR) techniques. A comprehensive analysis is also conducted of the classification-based deep learning models for image translation developed by researchers worldwide, covering their performance, the datasets used, and their feature extraction methods. The challenges of applying deep learning to speech-to-image translation are discussed. Finally, potential areas for future research in this field are identified and presented alongside these challenges.
Automatic Speech Recognition, Deep learning, Machine learning, Speech-to-image translation, Visual information.
Received: 06 January 2024
Accepted: 22 January 2024
Published: 29 January 2024