TITLE:               

A Comprehensive Analysis on Speech to Image Translation using Deep Learning Models

AUTHORS:     

Chandan Rani S R, Kavya N P

DOI: 10.5110/77.1103          Page: 165-182          Vol: 19    Issue: 01    Year: 2024


ABSTRACT

In the past, visual information was conveyed primarily through text-based explanations or visual cues. However, these approaches are not accessible to individuals with visual impairments or to those who do not understand the language in which the information is presented. To address this communication barrier, the concept of speech-to-image translation was introduced, leveraging machine learning and deep learning techniques. This study introduces a speech-to-image translation method based on machine learning and deep learning. A dataset containing speech samples paired with corresponding images was collected, and the speech data was converted into text using Automatic Speech Recognition (ASR) techniques. A comprehensive analysis of classification-based deep learning models for image translation, developed by researchers worldwide, is then conducted, covering their performance, the datasets used, and the feature extraction methods employed. The challenges posed by applying deep learning to speech-to-image translation are also discussed, and potential directions for future research in this field are identified alongside those challenges. In summary, this abstract highlights the introduction of a speech-to-image translation method using machine learning and deep learning; it outlines the dataset collection process, the conversion of speech to text, and the analysis of existing deep learning models for image translation. The challenges associated with deep learning-based speech-to-image translation are acknowledged, and potential future research directions are identified.
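The abstract describes a two-stage pipeline: speech is first converted to text with ASR, and the resulting text then conditions an image-generation model. The paper does not specify an implementation here, so the following is only a minimal illustrative sketch, assuming the Hugging Face transformers and diffusers libraries and publicly available pretrained checkpoints (openai/whisper-small for ASR, runwayml/stable-diffusion-v1-5 for text-to-image); these library and model choices are assumptions, not the authors' method.

    # Minimal sketch of a speech-to-image pipeline: speech -> text (ASR) -> image.
    # The model checkpoints below are illustrative assumptions, not those used in the paper.
    import torch
    from transformers import pipeline
    from diffusers import StableDiffusionPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Stage 1: Automatic Speech Recognition converts the spoken description into a text prompt.
    asr = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-small",
        device=0 if device == "cuda" else -1,
    )
    prompt = asr("spoken_description.wav")["text"]

    # Stage 2: A text-conditioned generative model renders an image from the prompt.
    text_to_image = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
    text_to_image = text_to_image.to(device)
    image = text_to_image(prompt).images[0]
    image.save("generated_image.png")

In this sketch the ASR stage and the image-generation stage are fully decoupled, which mirrors the cascaded (speech-to-text, then text-to-image) formulation discussed in the abstract, as opposed to end-to-end approaches that map speech directly to images.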

Keywords:

Automatic Speech Recognition, Deep learning, Machine learning, Speech-to-image translation, Visual information.

Received: 06 January 2024

Accepted: 22 January 2024

Published: 29 January 2024