Keywords: Spoken Language Identification, Ethiopian Languages, Deep Learning, DNN, CNN, LSTM, BLSTM, Language Diversity, Speech Recognition.
Abstract: This thesis investigates the application of Deep Neural Networks (DNN), Convolutional Neural
Networks (CNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BLSTM) algorithms
for Spoken Language Identification (SLI) in the context of Ethiopian languages. With Ethiopia's
rich linguistic diversity presenting a unique challenge, this research endeavors to develop robust
models capable of accurately identifying spoken utterances across a spectrum of Ethiopian
languages. The study involves the collection and preprocessing of a comprehensive dataset
encompassing diverse linguistic variations and dialectal nuances prevalent within Ethiopian
speech. Through rigorous experimentation and evaluation, the efficacy of DNNs, CNNs, LSTMs,
and BLSTMs in classifying spoken language samples is assessed, considering factors such as model
accuracy, computational efficiency, and generalization capability. Particular attention is given to
scenarios with limited labeled data. The outcomes of this research not only contribute to the
advancement of SLI technologies but also hold significant implications for communication systems,
language preservation efforts, and cultural heritage preservation in Ethiopia and beyond. Our
experimental results indicate that the BLSTM algorithm, using MFCC features, performed best on the
Ethiopian language identification dataset: it achieved 87.5% accuracy on 30-second utterances, 95% on
10-second utterances, and its highest accuracy of 95% on 3-second utterances, particularly for the
Amharic, Tigrigna, and Wolaytigna languages, surpassing the other algorithms tested. The DNN model
followed, reaching a maximum accuracy of 92.5% at a speech duration of 10 seconds across all
languages. All experiments were run in Python using the Librosa library on a CPU-only machine
(HP Pro, 1 TB storage) with 8 GB of RAM.
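The abstract names MFCC features extracted with the Librosa library and a BLSTM classifier as the best-performing combination. The sketch below illustrates one plausible version of that pipeline; the sampling rate, number of MFCC coefficients, layer sizes, and three-language label set are illustrative assumptions rather than values reported in the thesis.

```python
import librosa
import numpy as np
from tensorflow.keras import layers, models

# Illustrative assumptions (not reported in the thesis):
SAMPLE_RATE = 16000   # assumed sampling rate in Hz
N_MFCC = 13           # assumed number of MFCC coefficients
LANGUAGES = ["amharic", "tigrigna", "wolaytigna"]  # assumed label set

def extract_mfcc(wav_path, duration=10.0):
    """Load a fixed-duration clip and return a time-major (frames, n_mfcc) MFCC matrix."""
    signal, sr = librosa.load(wav_path, sr=SAMPLE_RATE, duration=duration)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC)
    return mfcc.T

def build_blstm(n_frames, n_mfcc=N_MFCC, n_classes=len(LANGUAGES)):
    """Bidirectional LSTM classifier over MFCC frame sequences (illustrative layer sizes)."""
    model = models.Sequential([
        layers.Input(shape=(n_frames, n_mfcc)),
        layers.Bidirectional(layers.LSTM(128)),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Example usage: a 10-second clip at 16 kHz with Librosa's default hop length (512)
# yields roughly 313 MFCC frames.
# features = extract_mfcc("clip.wav", duration=10.0)
# model = build_blstm(n_frames=features.shape[0])
# model.fit(features[np.newaxis, ...], np.array([0]), epochs=1)
```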