
Development of GUI for Text-to-Speech Recognition using Natural Language Processing

Partha Mukherjee, Soumen Santra, Subhajit Bhowmick, Ananya Paul, Pubali Chatterjee
Dept of Computer Application, Techno India College of Technology, Kolkata, INDIA

Arpan Deyasi
Dept of Electronics and Comm. Engg., RCC Institute of Information Technology, Kolkata, INDIA

Abstract—Natural language processing is a widely used technique by which systems can understand instructions for manipulating text or speech. In the present paper, a text-to-speech (TTS) synthesizer is developed that converts text into spoken words by analysing and processing it with Natural Language Processing (NLP) and then using Digital Signal Processing (DSP) technology to convert the processed text into a synthesized speech representation of the text. The synthesizer takes the form of a simple application that converts input text into synthesized speech, reads it out to the user, and can then save it as an mp3 file.

Keywords—Text-to-speech; Natural language processing; Speech synthesizer; Speech recognition; Signal transformation

978-1-5386-5550-4/18/$31.00 ©2018 IEEE

I. INTRODUCTION

Speech is the primary mode of communication in the Human Intelligent System (HIS), and NLP plays a role in it, with many aspects of the field dealing with the linguistic nature of computation. NLP is a field of research and application that explains how a system (mainly a computer) can be used to understand, identify and manipulate natural language. TTS is an automatic conversion process that builds on concepts such as speech recognition, speech analysis, speech synthesis, speech tuning and speech alteration. TTS is used to convert text into speech that resembles, as closely as possible, a native speaker of the language reading that text. It is the technology by which a computer can speak to the user and deliver computed information. A TTS system acquires text as input; a computer algorithm called the TTS engine then analyses and pre-processes the text and synthesizes speech using mathematical models. The TTS engine usually generates sound data in an audio format as its output. This TTS system also works with a Natural Language Generator (NLG).

Kanisha [1] explained an innovative approach to speech-text-speech (STS) conversion for visually impaired people through voice signals. Optical character recognition was introduced into a TTS system by Thu [2] with the intention of developing an image-to-speech conversion system. The pros and cons of interactive voice response systems, which are used in various day-to-day applications, are reviewed in [3]. Different methods for online systems are compared in [4] in terms of user efficiency. The concatenation method was further developed for a local language (Kannada) [5] with better accuracy, extending the work of Kanisha [1]. Different speech synthesis processes are also discussed by Htun and his co-workers [6]. More recently, Patil et al. [7] used a camera to capture the text inside an image and convert it to speech in multiple languages. This approach has been shown to be very useful for helping blind persons detect currency notes [8], and a finger-mounted camera provides great relief in such cases [9].

TTS synthesis is a procedure in which the text is first analysed and a speech waveform is then generated. The phonetic and prosodic information is converted into a waveform based on an approximation formula, and the amplitude of each signal that makes up the speech waves is measured to create the proper speech. These speech waves are a linguistic rendering of the text, although the forms involved may be linguistic or non-linguistic in nature. In the present paper, a synthesizer is developed which, apart from TTS conversion, saves the output as an mp3 file.

II. OVERVIEW OF SPEECH SYNTHESIS

Speech synthesis is the artificial, computational production of human voice. A TTS system converts any grammatically formed text into speech. Synthesized speech can be built from a collection of small pieces of recorded speech stored in a knowledge base (KB). The size of this KB varies with the speech units it stores. The system also maintains speech quality through the algorithm it uses to analyse the tree of speech units for better clarity. Alternatively, a synthesizer can be designed so that it controls the vocal pitch and distinguishes its output from other human voices, creating a completely different, synthetic voice. These are significant features of a good-quality speech synthesizer.
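To make the idea of building speech from stored units concrete, the following is a minimal Python sketch of concatenation with a linear crossfade at each joint. It is not the authors' implementation: the knowledge base is assumed to be a plain dictionary from hypothetical unit names to lists of samples, and `crossfade_concat` and `synthesize` are illustrative names. Real unit-selection systems also choose among many recorded candidates for each unit using target and join costs, which this sketch omits.

```python
from typing import Dict, List


def crossfade_concat(units: List[List[float]], overlap: int) -> List[float]:
    """Concatenate sample lists, linearly crossfading `overlap` samples at each joint."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        # Blend the tail of what we have so far with the head of the next unit.
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit rises towards 1
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[n:])  # append the rest of the unit unchanged
    return out


def synthesize(unit_names: List[str], kb: Dict[str, List[float]], overlap: int = 2) -> List[float]:
    """Look up each (hypothetical) unit in the knowledge base and join them."""
    missing = [u for u in unit_names if u not in kb]
    if missing:
        raise KeyError(f"units not in knowledge base: {missing}")
    return crossfade_concat([kb[u] for u in unit_names], overlap)
```

For example, `synthesize(["h-e", "e-l"], kb)` joins two stored units; each joint shortens the output by `overlap` samples because those samples are shared between neighbouring units.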
A TTS system (or "voice generation engine") is composed of two parts: a front-end and a back-end. The front-end converts raw text containing symbols such as numbers and abbreviations into the equivalent written-out words. This text analysis includes recognition of text units, normalization of text-unit patterns, pre-processing, etc. The front-end then assigns a phonetic transcription to each unit and divides and marks the text into a speech tree (or pattern tree) of prosodic units, such as phrases, clauses and sentences, which determine the tune and rhythm. The process of assigning phonetic transcriptions is known as text-to-phoneme (TTP) or grapheme-to-phoneme (GTP) conversion [6]. The phonetic transcription and the prosodic information together make up the symbolic linguistic representation, which is the output of the front-end. The back-end then converts this symbolic linguistic representation into sound. In some systems, the back-end also performs pitch, contour and rhythm analysis for the output speech.
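The front-end steps described above (normalizing numbers and abbreviations, then text-to-phoneme conversion) can be sketched in Python as follows. The abbreviation table and lexicon are hypothetical toy resources written with ARPAbet-style phoneme symbols; a practical system would use a full pronunciation dictionary plus letter-to-sound rules, whereas here unknown words simply fall back to their letters.

```python
from typing import Dict, List, Tuple

# Hypothetical toy resources, for illustration only.
ABBREVIATIONS: Dict[str, str] = {"dr.": "doctor", "st.": "street", "etc.": "et cetera"}
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
LEXICON: Dict[str, List[str]] = {
    "doctor": ["D", "AA1", "K", "T", "ER0"],
    "smith": ["S", "M", "IH1", "TH"],
    "two": ["T", "UW1"],
}


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0-999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0-999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    hundreds, rest = divmod(n, 100)
    head = ONES[hundreds] + " hundred"
    return head if rest == 0 else head + " " + number_to_words(rest)


def normalize(text: str) -> List[str]:
    """Text normalization: expand abbreviations and small numbers, drop punctuation."""
    words: List[str] = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:  # check before stripping the trailing period
            words.extend(ABBREVIATIONS[token].split())
            continue
        token = token.strip(".,;:!?\"'")
        if token.isdecimal() and int(token) <= 999:
            words.extend(number_to_words(int(token)).split())
        elif token:
            words.append(token)
    return words


def to_phonemes(words: List[str], lexicon: Dict[str, List[str]]) -> List[Tuple[str, List[str]]]:
    """Text-to-phoneme conversion by lexicon lookup; unknown words fall back to letters."""
    return [(w, lexicon.get(w, list(w.upper()))) for w in words]
```

For instance, `normalize("Dr. Smith lives at 42 Main St.")` yields `["doctor", "smith", "lives", "at", "forty", "two", "main", "street"]`, which `to_phonemes` then maps to phoneme lists.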
Speech synthesis can be performed in many ways, such as concatenative synthesis (unit-selection, diphone and domain-specific synthesis), formant synthesis, articulatory synthesis, HMM-based synthesis and sinewave synthesis [6-8].

III. TTS SYNTHESIS MODEL


TTS synthesis takes place in several steps. The TTS system receives a phrase or a collection of phrases as input, which it must first analyse and then convert into a phonetic description. In a further step, it generates the tune, pitch and rhythm (together known as prosody). From these data, which are produced in the form of a speech tree or speech space, it forms a speech signal.

Fig. 1. TTS synthesis model
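The prosody step can be illustrated with a deliberately simple rule-based Python sketch that assigns each phoneme of one phrase a duration and a pitch (F0) target. The numeric defaults, the linear pitch declination and the phrase-final lengthening factor are assumptions chosen for illustration, not values from this paper; the vowel test relies on the ARPAbet convention that vowel symbols end in a stress digit.

```python
from typing import List, NamedTuple


class Target(NamedTuple):
    """Prosodic target for one phoneme."""
    phoneme: str
    duration_ms: float
    f0_hz: float


def add_prosody(phonemes: List[str],
                f0_start: float = 140.0,
                f0_end: float = 100.0,
                consonant_ms: float = 70.0,
                vowel_ms: float = 110.0,
                final_lengthening: float = 1.5) -> List[Target]:
    """Assign a duration and a pitch (F0) target to each phoneme of a single phrase.

    - Pitch declines linearly from f0_start to f0_end across the phrase.
    - Vowels (ARPAbet symbols ending in a stress digit) are longer than consonants.
    - The phrase-final phoneme is lengthened by final_lengthening.
    """
    n = len(phonemes)
    targets: List[Target] = []
    for i, ph in enumerate(phonemes):
        duration = vowel_ms if ph[-1].isdigit() else consonant_ms
        if i == n - 1:
            duration *= final_lengthening
        frac = i / (n - 1) if n > 1 else 0.0  # position within the phrase, 0 to 1
        targets.append(Target(ph, duration, f0_start + (f0_end - f0_start) * frac))
    return targets
```

With the defaults, `add_prosody(["T", "UW1"])` returns `[Target('T', 70.0, 140.0), Target('UW1', 165.0, 100.0)]`: the vowel is longer, lengthened at the phrase end, and lower in pitch.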
The structure of the TTS synthesizer can be broken down into two major modules:
NLP module: It produces a phonetic transcription of the input text, together with its prosody.
DSP module: It transforms the symbolic representation received from the NLP module into audible and intelligible speech.
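As a stand-in for the DSP module, the Python sketch below turns (phoneme, duration, F0) targets into samples and writes them as a 16-bit mono WAV file with the standard `wave` module. Rendering each target as a pure sine tone at its F0 is only a placeholder, so the result is an audible pitch contour rather than intelligible speech; a real back-end would use one of the synthesis methods listed in Section II. The 16 kHz sample rate and the names `render` and `write_wav` are assumptions, and producing an mp3 file, as the application in this paper does, would additionally require an mp3 encoder.

```python
import math
import struct
import wave
from typing import List, Tuple

SAMPLE_RATE = 16000  # assumed output sample rate in Hz


def render(targets: List[Tuple[str, float, float]], amplitude: float = 0.3) -> List[float]:
    """Placeholder DSP stage: one sine tone per (phoneme, duration_ms, f0_hz) target.

    The phase is carried across segments so that the joins do not click.
    """
    samples: List[float] = []
    phase = 0.0
    for _phoneme, duration_ms, f0_hz in targets:
        n = int(SAMPLE_RATE * duration_ms / 1000)
        step = 2 * math.pi * f0_hz / SAMPLE_RATE
        for _ in range(n):
            samples.append(amplitude * math.sin(phase))
            phase += step
    return samples


def write_wav(path: str, samples: List[float]) -> None:
    """Write samples in [-1, 1] as mono 16-bit PCM (WAV, not mp3)."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    frames = struct.pack(f"<{len(clipped)}h", *(int(s * 32767) for s in clipped))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16-bit samples
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(frames)
```

For example, `write_wav("out.wav", render([("T", 70.0, 140.0), ("UW1", 165.0, 100.0)]))` produces a 235 ms file at 16 kHz.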

Fig. 2. Speech Synthesis Model


IV. DESIGN AND IMPLEMENTATION

Our software, called TTS Gramaty (TTSG), is a simple application with text-to-speech functionality. It was developed in C# on .NET Framework 3.5.

The application is divided into two main modules. The first is the main application module, which contains the basic GUI components and handles the basic operations of the application, such as input of the text to be converted, either from a file or directly from the keyboard. The second is the main conversion engine, which is integrated into the main module; it accepts the input data and performs the conversion.

TTS Gramaty converts text to speech either by typing the text into the text field provided or by copying text from an external document on the local machine and pasting it into that field. It also allows the user to browse for and open a text document on the machine; TTSG then loads the document's text into the text area of the application and the reading procedure starts automatically.

TTSG also provides a function that lets the user save the converted text to any location on the local machine in an audio format. This allows the user to copy the audio file to any of his/her audio devices and treat it as an audio book.

The following figure depicts TTS Gramaty as it appears when launched.

Fig. 3. Screenshot of the TTS Gramaty Interface

This is the default view of the application, which appears in full-screen mode when the application is launched. There are several options and buttons in the application window, each with a different function. In the middle is the text area, where the user can type any text manually as input. Alternatively, the BROWSE button can be used to import the contents of a text file. In either case, the user must click the SPEAK NOW button to start the reading process.

The import through the BROWSE button works as follows.

Fig. 4. Import of file using TTS Gramaty

The user then opens the .txt file.

Fig. 5. Opening of file using TTS Gramaty

When opened, the contents of the .txt file are automatically loaded into the text area of the application.

The controls are as follows. There is a volume controller for the audio output volume and a Speech Speed Controller for the speech rate, together with Speak Now, Pause, Resume and Stop buttons that perform the actions their names indicate. Below these is a microphone icon that generates an audio copy of the whole text document, and below that is a dropdown list for selecting the quality of the output audio file. Finally, the Synthesizer Status field displays whether the synthesizer is idle, paused or running.

V. CONCLUSION

TTS synthesis is a flexible, robust and growing aspect of the modern computer era, and it plays an increasingly significant role in the way we interact with systems and interfaces, independent of platform. We have identified the various operations and processes involved in text-to-speech synthesis. We have also developed a very simple and attractive graphical user interface that allows the user to type text into the text field provided in the application. Our system interfaces with a text-to-speech engine developed for American English. In future, we plan to create engines for conversion from one language to another, to make text-to-speech technology accessible to a wider range of users. The accuracy of the software is excellent with respect to its ability to work in real-life environments. We also plan to make it a web-based, real-time synthesis system so that its uses can be expanded.

References
[1] J. Kanisha, G. Balakrishanan, "Speech Transaction for Blinds Using Speech-Text-Speech Conversions", Communications in Computer and Information Science book series (CCIS), vol. 131, part I, pp. 43-48, 2011.
[2] C. S. T. Thu, T. Zin, "Implementation of Text to Speech Conversion", International Journal of Engineering Research & Technology, vol. 3(3), pp. 911-915, 2014.
[3] P. S. Shetake, S. A. Patil, P. M. Jadhav, "Review of Text To Speech Conversion Methods", International Journal of Industrial Electronics and Electrical Engineering, vol. 2(8), pp. 29-35, 2014.
[4] P. Khilari, V. P. Bhope, "A Review on Speech To Text Conversion Methods", International Journal of Advanced Research in Computer Engineering & Technology, vol. 4(7), pp. 3067-3072, 2015.
[5] A. Joshi, D. Chabbi, M. Suman, S. Kulkarni, "Text To Speech System for Kannada Language", International Conference on Communications and Signal Processing, 2015.
[6] H. M. Htun, T. Zin, H. M. Tun, "Text To Speech Conversion using Different Speech Synthesis", International Journal of Scientific & Technology Research, vol. 4(7), pp. 104-108, 2015.
[7] S. Patil, M. Phonde, S. Prajapati, S. Rane, A. Lahane, "Multilingual Speech and Text Recognition and Translation using Image", International Journal of Engineering Research & Technology, vol. 5(4), pp. 85-87, 2016.
[8] D. B. K. Kamesh, S. Nazma, J. K. R. Sastry, S. Venkateswarlu, "Camera based Text to Speech Conversion, Obstacle and Currency Detection for Blind Persons", Indian Journal of Science and Technology, vol. 9(30), pp. 1-5, 2016.
[9] B. Sanjana, J. R. Parvin, "Voice Assisted Text Reading System for Visually Impaired Persons Using TTS Method", IOSR Journal of VLSI and Signal Processing, vol. 6(3), pp. 15-23, 2016.