Keywords: Spoken Language Identification, Ethiopian Languages, Deep Learning, DNN, CNN, LSTM, BLSTM, Language Diversity, Speech Recognition.
Abstract: This thesis investigates the application of Deep Neural Networks (DNN), Convolutional Neural
Networks (CNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BLSTM) algorithms
for Spoken Language Identification (SLI) in the context of Ethiopian languages. With Ethiopia's
rich linguistic diversity presenting a unique challenge, this research endeavors to develop robust
models capable of accurately identifying spoken utterances across a spectrum of Ethiopian
languages. The study involves the collection and preprocessing of a comprehensive dataset
encompassing diverse linguistic variations and dialectal nuances prevalent within Ethiopian
speech. Through rigorous experimentation and evaluation, the efficacy of DNNs, CNNs, LSTMs,
and BLSTMs in classifying spoken language samples is assessed, considering factors such as model
accuracy, computational efficiency, and generalization capability. Particular attention is given to
scenarios with limited labeled data. The outcomes of this research not only contribute to the
advancement of SLI technologies but also hold significant implications for communication systems,
language preservation efforts, and cultural heritage preservation in Ethiopia and beyond. Our
experimental results indicate that the BLSTM algorithm, using MFCC features, performed best on the
Ethiopian language identification dataset: it achieved 87.5% accuracy on 30-second utterances, 95% on
10-second utterances, and its highest accuracy of 95% on 3-second utterances, particularly for the
Amharic, Tigrigna, and Wolaytigna languages, surpassing the other algorithms tested. The DNN model
followed, reaching a maximum accuracy of 92.5% at a speech duration of 10 seconds across all
languages. All experiments were run in Python using the Librosa library on a CPU-only machine
(HP Pro, 1 TB storage) with 8 GB of RAM.
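The abstract names MFCC features extracted with the Librosa library and a BLSTM classifier as the best-performing combination. The sketch below illustrates one plausible version of that pipeline; the sampling rate, number of MFCC coefficients, layer sizes, and three-language label set are illustrative assumptions rather than values reported in the thesis.

```python
import librosa
import numpy as np
from tensorflow.keras import layers, models

# Illustrative assumptions (not reported in the thesis):
SAMPLE_RATE = 16000   # assumed sampling rate in Hz
N_MFCC = 13           # assumed number of MFCC coefficients
LANGUAGES = ["amharic", "tigrigna", "wolaytigna"]  # assumed label set

def extract_mfcc(wav_path, duration=10.0):
    """Load a fixed-duration clip and return a time-major (frames, n_mfcc) MFCC matrix."""
    signal, sr = librosa.load(wav_path, sr=SAMPLE_RATE, duration=duration)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC)
    return mfcc.T

def build_blstm(n_frames, n_mfcc=N_MFCC, n_classes=len(LANGUAGES)):
    """Bidirectional LSTM classifier over MFCC frame sequences (illustrative layer sizes)."""
    model = models.Sequential([
        layers.Input(shape=(n_frames, n_mfcc)),
        layers.Bidirectional(layers.LSTM(128)),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Example usage: a 10-second clip at 16 kHz with Librosa's default hop length (512)
# yields roughly 313 MFCC frames.
# features = extract_mfcc("clip.wav", duration=10.0)
# model = build_blstm(n_frames=features.shape[0])
# model.fit(features[np.newaxis, ...], np.array([0]), epochs=1)
```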