
KINGS ENGINEERING COLLEGE

(Approved by AICTE – New Delhi & Affiliated to Anna University - Chennai)

Irungattukottai, Sriperumbudur, Chennai - 602117

A MINI-PROJECT REPORT
ON
“Handwritten Digits Recognition”

Submitted by

210818104007 T. Anusha
210818104015 L. Jayalakshmi

Department of Computer Science & Engineering


Kings Engineering College
APRIL 2021

KINGS ENGINEERING COLLEGE
Irungattukottai, Sriperumbudur, Chennai – 602117

Department of Computer Science & Engineering

CERTIFICATE

Certified that the mini-project work entitled “Handwritten Digits Recognition”
is a bonafide work carried out by

210818104007 T. Anusha

210818104015 L. Jayalakshmi

The report has been approved as it satisfies the academic requirements in
respect of mini-project work prescribed for the course.

……………………… …………………………

Faculty Guide Mini-Project Coordinator

……………...…………………………
Head of the Department
ACKNOWLEDGEMENT

We thank God for all the blessings showered on us and for giving us the
knowledge and strength to finish our project. Our deep gratitude goes to our
founder (Late) D. SELVARAJ M.A., M.L.A. for his patronage in the completion of
our project.

We take this opportunity to thank our honorable chairperson, our beloved
madam Dr. S. NALINI SELVARAJ M.Com., M.Phil., and our honorable director
Mr. S. AMIRTHARAJ, B.E., M.B.A., for the support they gave us to finish our
project successfully. We wish to express our sincere thanks to our principal
Dr. T. JOHN ORAL BHASKAR, M.E., Ph.D. for his kind encouragement and his
interest in us.

We are extremely grateful to the Head of the Computer Science and Engineering
department, Dr. D. C. JULLIE JOSEPHINE, MTech., Ph.D., for her valuable
suggestions, guidance and encouragement. We wish to express our deep sense of
gratitude and sincere thanks to our guide, Ms. S. SUHASINI, B.E., M.E.,
Associate Professor, Department of Computer Science and Engineering, who helped
us complete our project successfully.

We express our sincere thanks to our parents, friends and staff members who
have helped and encouraged us during the entire course of completing this
project work.

ABSTRACT

Nowadays, more and more people use images to represent and transmit
information. It is also popular to extract important information from images.
Image recognition is an important research area because of its wide range of applications. In the
relatively young field of computer pattern recognition, one of the challenging
tasks is the accurate automated recognition of human handwriting. Optical
Character Recognition (OCR) is a subfield of Image Processing which is
concerned with extracting text from images or scanned documents. In this project,
we have chosen to focus on recognizing handwritten digits available in the
MNIST database. The challenge in this project is to use basic Image Correlation,
also known as Matrix Matching, techniques in order to maximize the accuracy of
the handwritten digits recognizer without going through sophisticated techniques
like machine learning.

Key Words: Image Processing, Optical Character Recognition, Handwritten
Digits, Image Correlation, Matrix Matching, Machine Learning

TABLE OF CONTENTS

CHAPTER NO TITLE

ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS

1. INTRODUCTION
   1.1 OBJECTIVES
   1.2 SCOPE

2. LITERATURE SURVEY

3. DESIGN AND IMPLEMENTATION
   3.1 Image Processing
   3.2 Optical Character Recognition
   3.3 Methods Used in OCR
       3.3.1 Machine Learning
       3.3.2 Artificial Neural Network
       3.3.3 Support Vector Machine
       3.3.4 Image Correlation
       3.3.5 Feature Extraction
   3.4 Tools
       3.4.1 Octave
       3.4.2 MNIST Database
   3.5 Feasibility Study

4. METHODOLOGY
   4.1 Getting Familiar with the Tools
   4.2 Creating the Reference and Test Set

5. DATA
   5.1 Data Set Format
   5.2 Reference Set
   5.3 Test Set

6. CONCLUSION AND FUTURE WORK

REFERENCES

LIST OF FIGURES

FIGURE NO TITLE

3.1 Image Processing
3.3.2 Artificial Neural Network
3.3.3 Support Vector Machine
3.4 Work Flow Diagram
5.1 MNIST Data Set
5.2 Reference Set

LIST OF ABBREVIATIONS

OCR Optical Character Recognition
ANN Artificial Neural Network
SVM Support Vector Machine
MNIST Modified National Institute of Standards and Technology Database
NIST National Institute of Standards and Technology

CHAPTER 1
INTRODUCTION
The human brain processes and analyzes images with ease. When the eye sees an image,
the brain segments it and recognizes its different elements. This process involves not only
analyzing the image but also comparing its characteristics with what the brain already
knows in order to recognize those elements. Image Processing is the field of computer
science that tries to give machines the same ability: it is concerned with analyzing images
so as to extract useful information from them. An image is converted into a digital form
readable by computers, certain algorithms are applied to it, and the result is either a
better-quality image or a set of characteristics from which important information can be
extracted.

Image processing is applied in many areas today, and a great deal of software relies on
it. Self-driving cars detect other cars and pedestrians to avoid accidents, and social media
applications such as Facebook perform facial recognition thanks to this technique.

One of the narrower fields of image processing is recognizing characters in an image,
which is referred to as Optical Character Recognition (OCR) and is the subject of this
project. OCR reads an image containing one or more characters, or a scanned text of typed
or handwritten characters, and recognizes them. A lot of research has been done in this
field to find techniques with high accuracy. The algorithms that have shown the highest
performance are machine learning algorithms such as Neural Networks and Support Vector
Machines.

One of the main applications of OCR is recognizing handwritten characters. In this
project, we focus on building a mechanism that recognizes handwritten digits. We read
images containing handwritten digits extracted from the MNIST database and try to
recognize which digit each image represents. For that we use basic Image Correlation
techniques, also referred to as Matrix Matching. This approach is based on matrix
manipulation, as it reads the images as matrices in which each element is a pixel.

1.1 OBJECTIVES
• To provide an easy user interface to input the object image.
• User should be able to upload the image.
• System should be able to upload the image.
• System should be able to preprocess the given input to suppress the background.
• System should be able to detect the digit regions present in the image.
• System should retrieve the digits present in the image and display them to the user.

1.2 SCOPE
• Improve the human-computer interface for people unfamiliar with computers by
providing various computing services on handwritten inputs.
• Can be implemented on smartphones and tablets as a virtual keyboard.
• The system can create a paperless environment by digitizing handwritten characters.

CHAPTER 2
LITERATURE SURVEY

Handwriting recognition: the last frontiers, Proceedings 15th
International Conference on Pattern Recognition, Barcelona, ICPR-2000,
Vol.4, pp.1-10.

The last frontiers of handwriting recognition are considered to have started in the
last decade of the second millennium. This paper summarizes the nature of the problem of
handwriting recognition, the state of the art of handwriting recognition at the turn of the
new millennium, the results of CENPARMI researchers in automatic recognition of
handwritten digits, touching numerals, cursive scripts, and dates formed by a mixture of
the former 3 categories. Wherever possible, comparable results have been tabulated
according to techniques used, databases, and performance. Aspects related to human
generation and perception of handwriting are discussed. The extraction and usage of human
knowledge, and their incorporation into handwriting recognition systems are presented.
Challenges, aims, trends, efforts and possible rewards, and suggestions for future
investigations are also included.

Central Research Laboratory, Performance evaluation of pattern
classifiers for handwritten character recognition. International Journal
on Document Analysis and Recognition, Tokyo 185-8601, Japan.

This paper describes a performance evaluation study in which some efficient
classifiers are tested in handwritten digit recognition. The evaluated classifiers include a
statistical classifier (modified quadratic discriminant function, MQDF), three neural
classifiers, and an LVQ (learning vector quantization) classifier. They are efficient in that
high accuracies can be achieved at moderate memory space and computation cost. The
performance is measured in terms of classification accuracy, sensitivity to training sample
size, ambiguity rejection, and outlier resistance. The outlier resistance of neural classifiers
is enhanced by training with synthesized outlier data. The classifiers are tested on a large
data set extracted from NIST SD19. As a result, the test accuracies of the evaluated
classifiers are comparable to or higher than those of the nearest neighbor (1-NN) rule and
regularized discriminant analysis (RDA). It is shown that neural classifiers are more
susceptible to small sample size than MQDF, although they yield higher accuracies on large
sample size. As a neural classifier, the polynomial classifier (PC) gives the highest accuracy
and performs best in ambiguity rejection. On the other hand, MQDF is superior in outlier
rejection even though it is not trained with outlier data. The results indicate that pattern
classifiers have complementary advantages and they should be appropriately combined to
achieve higher performance.

“A Shallow Convolutional Neural Network for Accurate Handwritten
Digits Classification” 13th international conference, PRIP, Minsk,
Belarus, pp. 77-85.

At present the deep neural network is the hottest topic in the domain of machine
learning and can accomplish a deep hierarchical representation of the input data. Due to
deep architecture the large convolutional neural networks can reach very small test error
rates below 0.4% using the MNIST database. In this work we have shown, that high
accuracy can be achieved using reduced shallow convolutional neural network without
adding distortions for digits. The main contribution of this paper is to show how a
simplified convolutional neural network can obtain a test error rate of 0.71% on the MNIST
handwritten digit benchmark. This reduces the computational resources needed to
model a convolutional neural network.

Handwritten Digit String Recognition using Convolutional Neural
Network. 2018 24th International Conference on Pattern Recognition
(ICPR).

String recognition is one of the most important tasks in computer vision
applications. Recently the combinations of convolutional neural network (CNN) and
recurrent neural network (RNN) have been widely applied to deal with the issue of string
recognition. However, RNNs are not only hard to train but also time-consuming. In this
paper, we propose a new architecture which is based on CNN only, and apply it to
handwritten digit string recognition (HDSR). This network is composed of three parts from
bottom to top: feature extraction layers, feature dimension transposition layers and an
output layer. Motivated by the superior performance of DenseNet, we utilize dense blocks to
conduct feature extraction. At the top of the network, a CTC (connectionist temporal
classification) output layer is used to calculate the loss and decode the feature sequence,
while some feature dimension transposition layers are applied to connect feature extraction
and output layer. The experiments have demonstrated that, compared to other methods, the
proposed method obtains significant improvements on the ORAND-CAR-A and ORAND-CAR-B
datasets, with recognition rates of 92.2% and 94.02%.

Improved handwritten digit recognition using convolutional neural
networks (CNN). Sensors, 20(12), 3344.

Traditional systems of handwriting recognition have relied on handcrafted features
and a large amount of prior knowledge. Training an Optical character recognition (OCR)
system based on these prerequisites is a challenging task. Research in the handwriting
recognition field is focused around deep learning techniques and has achieved breakthrough
performance in the last few years. Still, the rapid growth in the amount of handwritten data
and the availability of massive processing power demands improvement in recognition
accuracy and deserves further investigation. Convolutional neural networks (CNNs) are
very effective in perceiving the structure of handwritten characters/words in ways that help
in automatic extraction of distinct features and make CNN the most suitable approach for
solving handwriting recognition problems. Our aim in the proposed work is to explore the
various design options like number of layers, stride size, receptive field, kernel size,
padding and dilation for CNN-based handwritten digit recognition. In addition, we aim to
evaluate various SGD optimization algorithms in improving the performance of
handwritten digit recognition. A network’s recognition accuracy increases by incorporating
ensemble architecture. Here, our objective is to achieve comparable accuracy by using a
pure CNN architecture without ensemble architecture, as ensemble architectures introduce
increased computational cost and high testing complexity. Thus, a CNN architecture is
proposed in order to achieve accuracy even better than that of ensemble architectures, along
with reduced operational complexity and cost. Moreover, we also present an appropriate
combination of learning parameters in designing a CNN that leads us to reach a new
absolute record in classifying MNIST handwritten digits. We carried out extensive
experiments and achieved a recognition accuracy of 99.87% for a MNIST dataset.

CHAPTER 3

DESIGN AND IMPLEMENTATION


3.1 Image Processing
Image processing is a very wide field within computer science which deals mainly
with analysing images and extracting information from them. The image to be
processed is imported and then analysed using some computations, which result
either in an image of better quality or in some of the characteristics of this image,
depending on the purpose of the analysis. The field has several subfields, among them
Optical Character Recognition, which we will be mainly dealing with throughout this
project.

There are two ways to provide input to the system. The user can either upload an
image of the digit to be detected or use data from the MNIST dataset. The input images
are pre-processed. The digits are then recognized using the different classifiers, and the
accuracy of each classifier is compared. The recognized digits are displayed along with
the accuracy.

Figure: 3.1 Image Processing

3.2 Optical Character Recognition (OCR)
The naked eye easily recognizes a character in a document; a computer, however,
cannot identify the characters in an image or scanned document without dedicated
algorithms. A lot of research has gone into developing such algorithms. The field that
specializes in character recognition within Image Processing is Optical Character
Recognition (OCR). In OCR, a scanned document or an image is read and segmented in
order to decipher the characters it contains. The images are first preprocessed to remove
noise and to unify colors and shades; the characters are then segmented and recognized
one by one, producing a file of encoded text that computers can easily read.

Optical Character Recognition dates back to the early 1900s, as it was
developed in the United States in some reading aids for the blind. In 1914, Emanuel
Goldberg implemented a machine able to convert characters into “standard
telegraph code”. In the 1950s, David Shepard, who was at that time an engineer at the
Department of Defense, developed a machine that he named Gismo, which was able to read
characters and translate them into machine language. In 1974, Ray Kurzweil decided to
develop a machine that would read text for blind and visually impaired people under his
company, Kurzweil Computer Products. In 1996, the United States Postal Service
developed a mechanism, HWAI, which recognizes handwritten mail addresses. Nowadays,
many programs use OCR in a variety of applications.

3.3 Methods Used in OCR

A lot of research has been done in the field of OCR, and is still being done. It has
resulted in the development of several algorithms which enable computers to recognize
characters from images or scanned texts. Many of these techniques have attained very high
efficiency and a low error rate. However, these algorithms are still being investigated and
improved for better performance.

3.3.1 Machine Learning
Machine learning is a field that concerns making programs learn and know how to
behave in different situations using data. One of its applications is Optical Character
Recognition.

3.3.2 Artificial Neural Network


An Artificial Neural Network (ANN) is a system that mimics the biological neural
network of the human brain. It is an algorithm used for machine learning, which
means it uses data to learn how to respond to different inputs. The ANN can be seen as a
box which takes one or more inputs and gives one output. Inside the box, there are several
interconnected nodes arranged in layers. The input is fed into the network, passes through
its layers and nodes, and an output is produced using a transfer function.
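
The forward pass described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sigmoid transfer function, the two-layer layout and every weight and bias value are hypothetical, and no network was trained or used in this project.

```python
import math


def sigmoid(x):
    """A common transfer function; squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Each node computes a weighted sum of the inputs plus a bias,
    then applies the transfer function."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, network):
    """Feed the inputs through every layer in turn."""
    activations = inputs
    for weights, biases in network:
        activations = layer(activations, weights, biases)
    return activations


# Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output node.
network = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                    # output layer
]
output = forward([1.0, 1.0], network)
print(output)  # a single value between 0 and 1
```

In a trained network the weights and biases would be learned from the training set rather than written by hand.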

Artificial Neural Networks are used for OCR and have shown a very high accuracy
rate. In this case, the ANN would “recognize a character based on its topological features
such as shape, symmetry, closed or open areas, and number of pixels”. The high accuracy
of this kind of algorithm is mainly due to its ability to learn from the training set,
which would contain characters with similar features.

Some Neural Networks have shown very high performance. An implementation
of the ANN by Simard, Steinkraus, and Platt reduced the error rate of recognizing
handwritten digits from the MNIST dataset to a percentage as low as 0.7%.

Figure: 3.3.2 Artificial Neural Network

3.3.3 Support Vector Machine
The Support Vector Machine (SVM) is another machine learning algorithm. SVMs
are known as high-performance pattern classifiers. While Neural Networks
aim at minimizing the training error, SVMs aim to minimize the “upper bound of
the generalization error”. The technique is used for both classification
and regression analysis.

Figure: 3.3.3 Support Vector Machine

This kind of classifier has been used in the recognition of very complex characters,
such as those of the Khmer script, and has shown very high performance.

3.3.4 Image Correlation


Image Correlation is a technique used to recognize characters from images. This
approach, also referred to as Matrix Matching, uses mathematical computations to
analyse the images. The images are read as matrices in which each
element represents a pixel, which makes it easier to manipulate them mathematically.
The image to be identified is loaded as a matrix and compared to the images
in the reference set. The test image is overlapped with each image in the reference set to
see how well it matches each one of them and to tell which one resembles it the
most. The decision is based on the pixels that match and the ones left out from
either one of the two images. This technique has many challenges and limitations, as it only
overlaps the images and measures how much they look alike. In particular,
problems arise when the characters have different sizes, or when one of them is rotated by
a certain angle.
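
The overlap comparison can be sketched as follows. This is a minimal Python illustration, not the project's Octave code. The scoring rule (matched ink pixels minus pixels left out of either image) is an assumption, since the text above does not fix an exact formula, and the function names and the 3 by 3 stand-in images are hypothetical.

```python
# Images are binary matrices (lists of rows): 1 = ink, 0 = background.

def overlap_score(test, ref):
    """Matched ink pixels minus ink pixels present in only one image."""
    matched = left_out = 0
    for test_row, ref_row in zip(test, ref):
        for t, r in zip(test_row, ref_row):
            if t == 1 and r == 1:
                matched += 1
            elif t != r:
                left_out += 1
    return matched - left_out


def classify(test, reference_set):
    """reference_set maps each digit to a list of binary reference images.
    Returns the digit whose reference image overlaps the test image best."""
    best_digit, best_score = None, None
    for digit, refs in reference_set.items():
        for ref in refs:
            score = overlap_score(test, ref)
            if best_score is None or score > best_score:
                best_digit, best_score = digit, score
    return best_digit


# Tiny 3x3 stand-ins for the 28x28 images.
ONE = [[0, 1, 0],
       [0, 1, 0],
       [0, 1, 0]]
SEVEN = [[1, 1, 1],
         [0, 0, 1],
         [0, 0, 1]]
reference_set = {1: [ONE], 7: [SEVEN]}

sample = [[1, 1, 1],
          [0, 0, 1],
          [0, 1, 0]]  # a slightly distorted seven
print(classify(sample, reference_set))  # 7
```

Because the score depends on pixel positions only, a shifted, resized or rotated digit overlaps poorly with its own reference image, which is the limitation noted above.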

3.3.5 Feature Extraction


Feature extraction is a technique based on pattern recognition. The main idea of
feature extraction is to analyse the images and derive from them characteristics
that identify each specific element. Examples of these characteristics are
curvatures, holes, edges, etc. In the case of digit recognition, these features
could be the holes inside the digits (for example in the eight, the six, and sometimes the
two) as well as the angles between straight lines (for example in the one, the four,
and the seven). Whenever an unknown image is to be recognized, its features are
compared to these so that it can be classified.
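
As one illustration of such a feature, the number of holes in a binary digit image can be counted with a flood fill. This Python sketch is an assumption about how the feature could be computed; it is not part of the project's implementation, and `count_holes` and the sample images are hypothetical.

```python
def count_holes(image):
    """Count background regions fully enclosed by ink.
    image is a list of rows with 1 = ink and 0 = background;
    background pixels are connected horizontally and vertically."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]

    def flood(start_r, start_c):
        # Mark every background pixel reachable from the start pixel.
        stack = [(start_r, start_c)]
        seen[start_r][start_c] = True
        while stack:
            r, c = stack.pop()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and image[nr][nc] == 0 and not seen[nr][nc]:
                    seen[nr][nc] = True
                    stack.append((nr, nc))

    # Background reachable from the border is the outside, not a hole.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and image[r][c] == 0 and not seen[r][c]:
                flood(r, c)

    # Each remaining unvisited background region is one hole.
    holes = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 0 and not seen[r][c]:
                holes += 1
                flood(r, c)
    return holes


ZERO = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
EIGHT = [[0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 0, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 0, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0]]
ONE = [[0, 0, 1, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 1, 0, 0]]
print(count_holes(ZERO), count_holes(EIGHT), count_holes(ONE))  # 1 2 0
```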

3.4 Tools

This project’s main objective is to read images containing handwritten
digits and identify those digits using basic image correlation techniques. These
images are represented and read as matrices, in which every element portrays a
pixel. The image correlation technique takes these matrices and compares them
so as to identify the match that represents the digit we are trying to figure out.
Since this project relies mainly on matrices and heavy numerical computations,
it is very important to choose tools that provide a suitable environment
for performing these computations.

Figure: 3.4 Work Flow Diagram

3.4.1 Octave
Octave is free and open-source software built around a high-level programming
language. It offers much of the functionality of MATLAB and is largely compatible with it.
It offers a simple and suitable interface for performing mathematical computations and
provides tools to solve mathematical problems such as common linear algebra problems.
It is also efficient in its use of resources, i.e., time and memory, for
these operations. It is very easy to use when dealing with matrices, as it
provides many functions and operations that make it less costly to manipulate them.
In this project, we deal with images as matrices in which each element represents a
pixel, so we need a tool that makes our computations
easier and more efficient in terms of time and memory. Both MATLAB and
Octave are easy to learn and work with and provide a suitable environment for this
kind of project. We have opted for Octave as it is free and open source.

3.4.2 MNIST Database
The MNIST database, which stands for the Modified National Institute of Standards
and Technology database, is a very large dataset containing tens of thousands of
handwritten digits. This dataset was created by mixing different sets inside the original
National Institute of Standards and Technology (NIST) sets, so as to have a training set
containing several types and shapes of handwritten digits, as the NIST set was divided into
digits written by high school students and digits written by Census Bureau workers.
The MNIST dataset has been the target of a great deal of research on recognizing
handwritten digits. This has allowed the development and improvement of many different
algorithms with very high performance, such as machine learning classifiers. In order to
implement our recognizer and test its performance, we need a
suitable dataset which contains a large number of handwritten digits. This dataset should
allow us to discover the challenges and limitations of the image correlation
technique and push us to look for ways and rules to enhance it and assess its accuracy. We
have opted to use this dataset for testing our program since it has proved its
reliability and importance in the field.

3.5 Feasibility Study


From a technical perspective, since this project makes heavy use of numerical
computations, using Octave is a wise choice as it will make the program more efficient.
This software also provides libraries to read and manipulate images,
which will make the implementation process easier.

As for the dataset to use in the testing of the project, we have chosen the MNIST
Database. This database contains thousands of handwritten digits that have been used in
the development of programs with a similar aim. The dataset is open for public use at
no charge. It is also very convenient for our project and saves us time, since it can be
used directly as a test set without our having to make one ourselves.

Since all the tools to be used in this project are free of charge and very easy to use,
we can conclude that this project is very feasible in terms of financial resources as well as
effort and time.

CHAPTER 4
METHODOLOGY
4.1 Getting Familiar with the Tools
The first step we had to go through while working on this project was getting
familiar with the tools used, i.e., Octave and the MNIST dataset. After setting up the
Octave environment and downloading the dataset, we started
experimenting with both in order to get familiar with them and know how to use them easily
later on. Since all the programming is done in Octave, we had to install it
along with its Graphical User Interface on the computer, and learn a little about its
functions and how to use it. Octave is free software which makes it very easy to work
with matrices and vectors and is very efficient in performing calculations on them. We
started learning how to use it and looking for the main functions that we would use in the
implementation of the project. For that, we used some random images of digits to see
how they can be read and modified as well as how to apply computations to them.
Moreover, we had to investigate the format of the MNIST dataset and get familiar with its
representation. The MNIST dataset, which was used to create our test set, contains
thousands of handwritten digits, represented as matrices. It has been used in the
development of several programs and projects with the same aim as ours. After
downloading the file which contains the handwritten digits, we loaded it into Octave in
order to visualize the images and figure out how to use and manipulate them.

4.2 Creating the Reference and Test Set


One of the main steps in the project is creating the reference and the test set that
will both be used in the implementation phase. The test set is to be used in order to assess
the performance of the program and evaluate its success or error rate. It is to be taken from
the MNIST dataset, since it contains the handwritten digits that we intend to recognize and
identify. As for the reference set, it is used to compare the test images and be able to identify
the digit they represent. It is to be created using different fonts.
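
The success rate mentioned above is simply the fraction of test images whose predicted digit equals their label. A minimal Python sketch follows; the predictions and labels are made-up values for illustration, not results from this project.

```python
def accuracy(predictions, labels):
    """Fraction of test images whose predicted digit equals the label."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Made-up example: four test images, three recognized correctly.
labels = [3, 7, 1, 8]
predictions = [3, 7, 1, 3]
rate = accuracy(predictions, labels)
print(rate)      # 0.75 (success rate)
print(1 - rate)  # 0.25 (error rate)
```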

CHAPTER 5
DATA
Data

It is very necessary to know the kind of data we are using before we start the design
and the implementation of the program. That is why we had to have a look at its format to
understand how it is represented before creating the reference and the test set.

5.1. Dataset Format


The dataset that we downloaded from the MNIST database contains 60,000 images
of handwritten digits, from zero to nine, all grouped in one file. Each image is
28 by 28 pixels in size and represents a digit. We noticed that there is no pattern or order
to the way the images are organized in the file. The images are represented as matrices
whose elements represent the pixels. Each image also has a label that indicates the
digit represented. This label was very helpful later on when creating the test
set.

Furthermore, the data did not contain noise or any major problems to deal with, which is
why it was used without preprocessing.

Figure: 5.1 MNIST Dataset


5.2. Reference Set
To recognize the digit represented by a certain image, it is necessary to
compare it with other images containing known digits. For
that we need a reference set which contains all these images. That is to
say, each image we want to recognize is compared to the images in the
reference set. The image with the highest match is the one that represents the right number.
Since handwritten digits differ from one person to another, the reference set needs to have
digits in different fonts. That is why we created six images of each digit using the
online image editor pixlr.com, each one with a different font. The reference set contains
images with the same dimensions as the ones in the MNIST dataset, i.e., 28 by 28 pixels.
These images have a black background and a white font, which made it easier
to use and manipulate them later on in Octave. To make the comparison
easier, we grouped the six images representing the same digit in one file. So
the resulting reference set consisted of ten files, each one representing a digit from zero to
nine and containing six images of that digit in different fonts. The pixels of these images
are then changed into zeros and ones, which makes the overlapping of the images easier. The
black background was initially represented as zeros, so it is left the same. As for the pixels
of the white font, each one of them was represented by a different non-zero value depending
on the shade of white. These non-zero values are all converted into ones. The following image
displays the reference set for the digit “2”. The rest of the reference sets are in Appendix A.

Figure: 5.2 Reference Set
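
The conversion of pixel values into zeros and ones can be sketched as follows. This is a Python illustration of the rule described in this section (zero stays zero, every non-zero value becomes one); the project itself performed this step in Octave, and the function name and the 3 by 3 sample matrix are hypothetical.

```python
def binarize(image):
    """Map every non-zero pixel to 1 and leave zeros unchanged."""
    return [[1 if pixel != 0 else 0 for pixel in row] for row in image]


# Stand-in for a 28x28 grayscale image: 0 = black background,
# any other value = some shade of white.
gray = [[0, 128, 255],
        [0, 64, 0],
        [12, 0, 200]]
binary = binarize(gray)
print(binary)  # [[0, 1, 1], [0, 1, 0], [1, 0, 1]]
```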

5.3. Test Set
The program to be developed needs to be tested against images that contain
handwritten digits so as to assess its performance and calculate its success rate.
That is why it is necessary to create a test set. The test set is a sample of
images containing handwritten digits which are compared to the images
in the reference set so as to identify them. This set was formed using the file from the
MNIST database. The original file contained 60,000 images representing different digits,
which made it difficult to look up each number by its label when testing the program.
To make it easier to access each digit we want, we decided to store a number
of images of each digit in a separate file. We stored 20 images of each
digit in ten different files. That is to say, the resulting test set was in the form of ten files,
each of which represents a digit and contains 20 images of it. These images were
extracted from the initial file by reading them and their labels using Octave. To
make the manipulation of the matrices/images easier, we had to make some modifications
to the elements of all the matrices representing the test set as well. The black pixels were
originally represented as zeros, so they were left the same. As for the white ones, each of
them had a different non-zero value, so we turned them all into ones.
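
The grouping step can be sketched as follows. This Python illustration assumes that the first 20 images of each digit, in file order, are kept; the text above does not say how the 20 images were chosen, and the project did this step in Octave. The function name and the stand-in data are hypothetical.

```python
def build_test_set(images, labels, per_digit=20):
    """Group images by label, keeping at most per_digit images of each
    digit from 0 to 9 (assumption: the first ones found in file order)."""
    test_set = {digit: [] for digit in range(10)}
    for image, label in zip(images, labels):
        bucket = test_set[label]
        if len(bucket) < per_digit:
            bucket.append(image)
    return test_set


# Stand-in data: 300 one-pixel "images" whose labels cycle 0..9.
labels = [i % 10 for i in range(300)]
images = [[[i]] for i in range(300)]
test_set = build_test_set(images, labels)
print(len(test_set))                           # 10 groups, one per digit
print(sum(len(v) for v in test_set.values()))  # 200 images in total
```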

CHAPTER 6
CONCLUSION AND FUTURE WORK

Conclusion
Optical Character Recognition is a very broad field concerned with turning an
image or a scanned document containing a set of characters into an encoded text that could
be read by machines. In this project, we have attempted to build a recognizer for
handwritten digits using the MNIST dataset. The challenge of this project was to be able to
come up with some basic image correlation techniques, instead of some sophisticated
algorithms, and see to what extent we can make this mechanism accurate. We have tried
several versions and kept trying to improve each one in order to reach a higher performance
rate. The last version has reached a rate of 57% accuracy. Unfortunately, we could not
compare the performance of the mechanism we have built to some others that have already
been designed and/or implemented before because we did not find any academic paper that
tackles this method. The performance we have reached is far below that of machine
learning approaches, which reach an accuracy of 99.3%; however, it could be further
improved. The goal of this project was to explore the field of OCR and try
to come up with some techniques that could be used without going into deep computations,
and even if the final result is not very reliable, it still provides an accuracy well above
the 10% expected from random guessing among ten digits.

Future work

The next steps would be to take a closer look at the results of all the
versions in order to find new rules. By extracting and implementing them, we will be able
to enhance the performance of these versions. Moreover, it would be good if we could make
some modifications to both the reference set and the rules in order to make our program
more general and able to identify both typed and handwritten digits. Furthermore, in the
future, we could make good use of the matrices that indicate the first maximum overlap
of each test image with the reference images, along with the number of pixels left out from
both. These matrices could be used with some clustering algorithms to build a program able
to recognize handwritten digits with a very high efficiency. Last but not least, we thought
about using linear or high-level regression in the versions we have developed in order to
create more rules. As regression could be used for binary classification and is not very
suitable to classify a digit out of ten, this technique could be used in order to tell which
digit is the most suitable, the first maximum or second maximum, which will enable us to
generate more rules and thus reach a higher efficiency.
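
The overlap quantities mentioned above (the overlap of a test image with a reference image, the pixels left out from both, and the first and second maximum) can be sketched as follows. This is a hedged Python illustration, not the project's Octave code: it assumes that overlap means the number of positions where both binary images hold a one, and the names `overlap_stats` and `rank_digits` are hypothetical.

```python
def overlap_stats(test_image, reference_image):
    """Compare two binary images of the same size.

    Returns (overlap, left_out_test, left_out_reference):
    pixels set in both, set only in the test image, set only in the reference.
    """
    overlap = left_out_test = left_out_reference = 0
    for test_row, ref_row in zip(test_image, reference_image):
        for t, r in zip(test_row, ref_row):
            if t and r:
                overlap += 1
            elif t:
                left_out_test += 1
            elif r:
                left_out_reference += 1
    return overlap, left_out_test, left_out_reference


def rank_digits(test_image, reference_sets):
    """Order digits by their best overlap with the test image, highest first.

    `reference_sets` maps each digit to a list of binary reference images.
    The first two entries of the result are the first and second maximum.
    """
    best = {
        digit: max(overlap_stats(test_image, ref)[0] for ref in refs)
        for digit, refs in reference_sets.items()
    }
    return sorted(best, key=lambda digit: best[digit], reverse=True)


# Toy 2x2 example with one reference image per digit.
test_image = [[1, 1], [0, 1]]
reference_sets = {
    1: [[[0, 1], [0, 1]]],
    7: [[[1, 1], [0, 1]]],
}

print(overlap_stats(test_image, reference_sets[1][0]))
print(rank_digits(test_image, reference_sets))
```

The tuples returned by `overlap_stats` are the kind of per-image features that could be passed to a clustering algorithm or to a binary classifier that chooses between the first and second maximum.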

REFERENCES

[1] C.Y. Suen, J. Kim, K. Kim, Q. Xu, L. Lam, Handwriting recognition the last frontiers,
Proc. 15th ICPR, Barcelona, 2000, Vol.4, pp.1-10.
[2] C.-L. Liu, M. Koga, H. Sako, H. Fujisawa, Aspect ratio adaptive normalization for
handwritten character recognition, Advances in Multimodal Interfaces - ICMI 2000,
LNCS 1948, Springer, 2000, pp.418-425.
[3] Vladimir Golovko, Mikhno Egor, Aliaksandr Brich, and Anatoliy Sachenko (October
2016), “A Shallow Convolutional Neural Network for Accurate Handwritten Digits
Classification” 13th international conference, PRIP, Minsk, Belarus, pp. 77-85.
[4] Hongjian Zhan, Shujing Lyu, Yue Lu Shanghai (August 2018), “Handwritten Digit
String Recognition using Convolutional Neural Network”, 24th International Conference
on Pattern Recognition (ICPR), pp. 3729-3734.
[5] Ahlawat, S., Choudhary, A., Nayyar, A., Singh, S., & Yoon, B. (2020). Improved
handwritten digit recognition using convolutional neural networks (CNN). Sensors,
20(12), 3344. doi:10.3390/s20123344.
[6] N. Hagita, S. Naito, I. Masuda, Handprinted Kanji characters recognition based on
pattern matching method, Proc. ICTP, 1983, pp.169-174.
[7] D.-S. Lee, S.N. Srihari, Handprinted digit recognition: a comparison of algorithms,
Proc. 3rd IWFHR, 1993, pp.153-164.
[8] U. Kreßel, J. Schürmann, Pattern classification techniques based on function
approximation, Handbook of Character Recognition and Document Image Analysis,
World Scientific, 1997, pp.49-78.
[9] Xiaofeng Han and Yan Li (2015), “The Application of Convolution Neural Networks
in Handwritten Numeral Recognition” in International Journal of Database Theory and
Application, Vol. 8, No. 3, pp. 367-376.
[10] T Siva Ajay (July 2017), “Handwritten Digit Recognition Using Convolutional
Neural Networks” International Research Journal of Engineering and Technology
(IRJET), Vol. 04, Issue 07, pp. 2971-2976.
