
Volume 6, Issue 5, May – 2021 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Malayalam Speech Recognition


Abhirami K
Dept. of Computer Science and Engineering
Sahrdaya College of Engineering and Technology
Kodakara, India

Aiswarya P S
Dept. of Computer Science and Engineering
Sahrdaya College of Engineering and Technology
Kodakara, India

Aishwariya D P
Dept. of Computer Science and Engineering
Sahrdaya College of Engineering and Technology
Kodakara, India

Abstract:- The project develops a state-of-the-art large vocabulary continuous speech recognition (LVCSR) system for the Malayalam language. Problems of existing speech recognition systems include lack of accuracy and misinterpretation, time cost and lost productivity, difficulty with accents, and background noise interference. The simulation of human intelligence in computers is called artificial intelligence (AI), which includes Machine Learning, Natural Language Processing, Computer Vision and Robotics. Large audio or video files, often minutes in length, contain a variety of sounds. In this project, a transfer learning technique is used. The aim of the proposed speech recognition system is to collect thousands of samples for each category, irrespective of the speakers' gender or age group, and to train on them according to the speakers' native dialect so as to increase the accuracy.

Keywords:- Artificial Intelligence; Machine Learning; Large Vocabulary Continuous Speech Recognition; Support Vector Machine; TensorFlow.

I. INTRODUCTION

Speech is a simple and natural means of communication between humans, but nowadays humans are not limited to communicating with each other; they also communicate with the machines in their lives, the most important being the computer. This communication technique can therefore be used between computers and humans. The interaction happens through interfaces, and this area is called Human-Computer Interaction (HCI). Computers have already replaced a tremendous number of humans in many professions, and speech recognition can be performed by a computer. Our project focuses on the development of a state-of-the-art large vocabulary continuous speech recognition (LVCSR) system for the Malayalam language. We choose to listen for the desired sound in a large file. Here machine learning is used to classify the speech. Machine learning aims to make computers able to learn and solve problems on their own. In our project the model is created using TensorFlow, an open-source machine learning framework. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state of the art in machine learning and lets developers easily build and deploy machine-learning-powered applications. We have also used a transfer learning approach, in which knowledge learned on one dataset is transferred to another.

A. Motivation
The state of Kerala has 14 revenue districts, and each district has its own way of speaking Malayalam, the mother tongue of the state. The state is also popularly called the land of backwaters, which makes it a spot for tourists. Even though Malayalam is spoken all over the state, it is occasionally hard for people to recognise a regional variant of the language. With our software, one could recognise the slang being used.

B. Proposed system
Malayalam is a language that people speak in different slangs; people from each part of Kerala use a different one. For a common word, people from each region may pronounce it differently or say something completely different. The Google Assistant, which can be invoked by saying "OK Google", can translate words and sentences between languages, but it cannot understand or differentiate the slangs or regional words used by people throughout these regions. Words are often misunderstood by Google Assistant, and its accuracy is therefore low. The problems of existing speech recognition are lack of accuracy and misinterpretation, time cost and lost productivity, difficulty with accents, and background noise interference. Here the transfer learning approach is used.
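The transfer learning idea used here can be sketched as follows: a pretrained feature extractor is kept frozen, and only a small classifier head is trained on the new dataset. This is an illustrative NumPy sketch, not the authors' implementation; the random "pretrained" weights and the toy three-class dataset are stand-ins for a real model and a real dialect corpus.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen, pretrained feature extractor:
# maps 64-dim raw inputs to 16-dim features (weights never updated).
W_pretrained = rng.normal(size=(64, 16))

def extract_features(x):
    """Frozen feature extractor: no gradient updates happen here."""
    return np.tanh(x @ W_pretrained)

# Toy "new domain" dataset: 200 samples, 3 classes.
X = rng.normal(size=(200, 64))
y = rng.integers(0, 3, size=200)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Trainable classifier head: softmax regression on the frozen features.
W_head = np.zeros((16, 3))
feats = extract_features(X)
onehot = np.eye(3)[y]
for _ in range(200):                     # gradient descent on the head only
    probs = softmax(feats @ W_head)
    grad = feats.T @ (probs - onehot) / len(X)
    W_head -= 0.5 * grad

pred = softmax(feats @ W_head).argmax(axis=1)
print("train accuracy:", (pred == y).mean())
```

Because only the small head is trained, far less labelled data is needed than when training a full model from scratch, which is the appeal of transfer learning for a low-resource language.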

IJISRT21MAY1093 www.ijisrt.com 1323



FIG 1

II. METHODOLOGY

The project was implemented using the Google Cloud Platform (GCP), a suite of cloud computing services provided by Google. It runs on the same infrastructure that Google uses internally for its end-user products such as Google Search, File Manager, Gmail and Classroom. It provides modular cloud services including computing, data storage, analytics and machine learning, offered as infrastructure as a service, a serverless computing environment, and platform as a service.

The whole project was also built on Teachable Machine as a trial, both to check the credibility of the project and to get a wider picture of how it works. Teachable Machine is also a Google product that helps anyone interested in learning about or experimenting with AI; it lets one train a computer to recognise images and audio, and even to create simple games such as Snake.

Audio classification is used here to classify the slangs. Also called acoustic event detection, it is the process of listening to and analysing audio recordings. For classification, the recordings are converted into spectrograms, which are then input to a CNN plus a linear classifier that predicts the class to which they belong. The simplest way to describe the process is this: first the audio is collected; it is then converted into a spectrogram; the spectrogram is fed into the CNN architecture; and after further processing the output is passed to a linear classifier, which assigns the audio to its label or class. This requires understanding the underlying frequency structure of acoustic signals: we need to build a model that knows the characteristic features of each audio class, so that during evaluation it can classify a given audio segment into the corresponding class.

BERT is a natural language processing model proposed by researchers at Google Research in 2018. The main reason for its good performance is self-supervised pre-training on large text corpora; the pre-trained model is then fine-tuned for a specific task, enabling it to understand the patterns of the language. Its language processing capabilities can be used to empower other models. Architecturally, BERT is the encoder stack of the Transformer, an encoder-decoder network built on self-attention; BERT uses only the encoder side. Long short-term memory (LSTM) is an artificial recurrent neural network architecture used in deep learning.
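The audio-to-spectrogram step described above can be sketched in plain NumPy via a short-time Fourier transform. This is an illustrative sketch, not the authors' code; the synthetic one-second sine tone stands in for a real Malayalam utterance, and the resulting 2-D magnitude array is what would be fed, image-like, into the CNN plus linear classifier.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform:
    split the signal into overlapping Hann-windowed frames, FFT each
    frame, and keep the magnitudes of the non-negative frequencies."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rows = frequency bins, columns = time frames
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Synthetic 1-second "recording": a 440 Hz tone at a 16 kHz sample rate.
sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(audio)
print(spec.shape)   # → (129, 124): frame_len // 2 + 1 bins, 124 frames
```

In a real pipeline a library routine (for example TensorFlow's `tf.signal.stft`) would be used instead, but the transform is the same: the spectrogram makes the frequency structure of the signal explicit, which is exactly what the classifier needs.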



III. RESULT

FIG 2: OUTPUT AFTER TRAINING THE MODEL

FIG 3: AFTER DEPLOYING THE MODEL

IV. CONCLUSION

In this work we proposed a new method for Malayalam speech recognition. Our system automatically detects the speaker's district. It uses the BERT model, trained on GCP. The application is most valuable to people who cannot understand a particular regional slang. After training, the model achieved an accuracy of 70%.

FUTURE WORK

We have created our model using the BERT model. In the future we will build it using Google's MUM model. The dataset was also small; we will collect more voice recordings so that we can increase the accuracy.
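The 70% figure reported above is a standard classification accuracy: the fraction of test utterances whose predicted district label matches the true one. A minimal sketch of that computation, using hypothetical labels (real Kerala district names, but invented predictions, not the authors' data):

```python
import numpy as np

# Hypothetical true and predicted district labels for 10 test utterances.
y_true = np.array(["Thrissur", "Kollam", "Idukki", "Thrissur", "Kannur",
                   "Kollam", "Idukki", "Kannur", "Thrissur", "Kollam"])
y_pred = np.array(["Thrissur", "Kollam", "Kannur", "Thrissur", "Kannur",
                   "Idukki", "Idukki", "Kannur", "Thrissur", "Kollam"])

accuracy = (y_true == y_pred).mean()   # correct predictions / total
print(f"accuracy: {accuracy:.0%}")     # → accuracy: 80%
```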

