
A REPORT OF 08 WEEKS INDUSTRIAL TRAINING

At

ASPEXX Health Solution Pvt. Ltd.


SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE

AWARD OF THE DEGREE OF

BACHELOR OF ENGINEERING

IN

COMPUTER SCIENCE ENGINEERING

1 Feb 2021 – 1 April 2021

SUBMITTED BY:

NAME: Mohan G

USN: 1VI17CS057

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

VEMANA INSTITUTE OF TECHNOLOGY


BENGALURU – 560 034
2020 - 21
CONTENTS

S. NO.  TOPICS                                                        PAGE NO.

1.  Certification                                                      6
2.  Candidate Declaration                                              7
3.  Abstract                                                           8
4.  Acknowledgement                                                    9
5.  Chapter 1: Introduction                                            10
6.  Chapter 2: Topics covered in Week 1                                11
     AI, ML, DL
     AI vs ML vs DL vs Data Science
     Learn Python using Jupyter Notebook
7.  Chapter 3: Topics covered in Week 2                                14
     Mind Map: ML Algorithm
     Mind Map: ML Algorithm and its Application
     Mind Map: ML (Types of tasks, applications and approaches)
     Machine Learning Fundamentals explanations with diagrams
     Machine Learning with Python
     Data Science Gains using SAS, R and Python
8.  Chapter 4: Topics covered in Week 3
     Statistics in Machine Learning
     Feature Engineering
     Machine Learning Pipelines
     What is PyTorch
9.  Chapter 5: Topics covered in Week 4                                27
     Data Visualization and Tableau
     Data Science Projects Demo
     Data Science Mind Map & Deep Learning
10. Chapter 6: Topics covered in Week 5                                35
     NLP
     TensorFlow
     Fuzzy Logic & Genetic Algorithms
11. Chapter 7: Topics covered in Week 6                                37
     Industrial Experts Webinar
     Neural Networks
     Big Data Analytics
12. Chapter 8: Topics covered in Week 7                                45
     WordPress Demand
     WordPress Basics
     CMS
     WordPress Website from Scratch
13. Chapter 9: Topics covered in Week 8                                52
     WordPress Basics
     Custom WordPress Theme
     Management of Website
14. Project                                                            54
     Introduction
     Purpose and Scope
     Problem Statement                                                 55
     Project Analysis                                                  56
     Project Timeline and Dataset Used                                 57
     Methodology
     Project Design (Block, Dataflow and Use Case Diagrams)            57-59
     Implementation                                                    60
     Hardware and Software Requirements                                63
     Coding                                                            65
     Snapshot and Analysis of Project                                  69
     Advantages and Disadvantages                                      70
     Conclusion                                                        72
     Future Scope                                                      72


CERTIFICATION

This is to certify that this project report entitled “AI, ML, Data Science & WordPress” by Mohan G, submitted in partial fulfilment of the requirements for the internship at ASPEXX Health Solution Pvt. Ltd. during the eight weeks of internship, is a bona fide record of work carried out under my guidance and supervision. I hereby declare that the work has been carried out under my supervision and has not been submitted elsewhere for any other purpose.

(Signature of CEO)
Ms. Shivani Mishra

(Signature of Director)

Managing Director
Vemana Institute of Technology

CANDIDATE'S DECLARATION

I, Mohan G, hereby declare that I have undertaken 08 weeks of training at ASPEXX Health Solution Pvt. Ltd. during the period from Feb 1, 2021 to April 1, 2021 in partial fulfillment of the requirements for the award of the degree of B.Tech. in Computer Science & Engineering at Vemana Institute of Technology. The work presented in this training report, submitted to the Department of Computer Science & Engineering at the Faculty of Engineering & Technology, Bangalore, is an authentic record of training work.

Name of Student: Mohan G


Roll No: 1VI17CS057

(Signature of Director)

Managing Director
ABSTRACT

Software training is one of the requirements to be fulfilled in order to obtain the Bachelor's degree in technology. Each student needs to undergo software training in a recognized company in their respective domain. Students are required to complete training of eight weeks' duration, which is intended to expose them to the software industry. A well-planned, properly executed and evaluated software training helps a lot in developing a professional attitude. It develops an awareness of the software approach to problem solving, based on a broad understanding of processes. Besides this, software training builds self-confidence among students and strengthens their technical knowledge and professionalism.

During the internship at ASPEXX Health Solution Pvt. Ltd., most of the theoretical knowledge gained during the course of studies was put to the test. The various efforts and processes involved in designing a component were studied and understood during the internship. In my internship, I undertook projects in AI.

The training gave me good experience from the point of view of applying my theoretical knowledge in practical settings. It gave me first-hand experience of working as an engineering professional. It helped me improve my technical, interpersonal and communication skills, both oral and written. Overall, it was a great experience to have industrial training in such a reputed firm, and I believe that it will help me in building a successful career.
ACKNOWLEDGMENT

Any effort becomes successful when there is the effect of synergy: the concept that two and two make more than four. This report also has the effect of synergy, without prejudice to my own contribution. I am appreciative of ASPEXX Health Solutions Pvt. Ltd and the people who cooperated with me. It is my privilege that I had the opportunity to do an internship at ASPEXX Health Solutions Pvt. Ltd, India. I would like to thank all who either directly or indirectly contributed to this project.
• First, I express my deep gratefulness to Bajarang Prasad Mishra and Shivani Mishra of ASPEXX Health Solutions Pvt. Ltd, who gave me the opportunity to work in this organization.
• I thank Shivesh (Mentor), without whose effort it would have been impossible to bring this report to light. I would also like to express my sincere thanks to all members of ASPEXX Health Solutions Pvt. Ltd for their excellent support and proper guidance in completing my internship report.

- Mohan G
CHAPTER 1

1.1 INTRODUCTION
CUREYA (registered as Aspexx Health Solutions Pvt Ltd under the MCA) is a DPIIT-recognized start-up, registered under the 'STARTUP INDIA SCHEME'. CUREYA has collaborated with stakeholders including the World Yoga Associations, Flag Bits Technologies and many more. CUREYA's primary objective is 'HEALTH FOR ALL': reducing medical expenditure, eliminating information asymmetry, promoting health awareness and achieving an inclusive and holistic approach to healthcare treatments. The mission is to achieve the right to "Health for All" and improve healthcare indicators through the dissemination of health education that focuses on health promotion, health prevention and self-medication. The objective is to eliminate information asymmetry and the language barrier, and to achieve global standards of healthcare delivery systems based on access, equity, affordability, quality, efficiency and sustainability.

Social Media Links –


1. Facebook - https://www.facebook.com/cureya7
2. Instagram - https://www.instagram.com/cureya.in/
3. YouTube - https://www.youtube.com/channel/UCjsRwGm--mr1ADln5CB5Siw/videos
4. LinkedIn (Company) - https://www.linkedin.com/company/28749699
5. Website - www.cureya.in
Chapter 2

Difference between AI, ML, Deep Learning


Comparison between AI, ML, Deep Learning and Data Science.

 What is Artificial Intelligence or AI?

Artificial Intelligence describes machines that can perform tasks resembling those of humans. So AI implies machines that artificially model human intelligence. AI systems help us manage, model, and analyze complex systems. AI is the superset, with Machine Learning and Deep Learning as subsets.
 What is Machine Learning or ML?

Machine learning uses algorithms to parse data, learn from that data, and make informed decisions based on what it has learned.

 What is Deep Learning or DL?

Deep learning structures algorithms in layers to create an “artificial neural


network” that can learn and make intelligent decisions on its own. Deep
learning is a subfield of machine learning. While both fall under the broad
category of artificial intelligence, deep learning is what powers the most
human-like artificial intelligence.

 What is Data Science?

Data science is a broad field that spans the collection, management, analysis and interpretation of large amounts of data, with a wide range of applications. It integrates all the terms above: it summarizes or extracts insights from data (exploratory data analysis) and makes predictions from large datasets (predictive analytics).

Learn Python using Jupyter Notebook

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modelling, data visualization, machine learning, and much more.

Jupyter Notebooks are a spin-off from the IPython project, which used to have an IPython Notebook project itself. The name Jupyter comes from the core programming languages it supports: Julia, Python, and R. Jupyter ships with the IPython kernel, which allows you to write your programs in Python, but there are currently over 100 other kernels that you can also use.
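
For instance, a first notebook cell might contain nothing more than plain Python. The snippet below is a minimal, illustrative example (the numbers are made up):

numbers = [12, 7, 3, 25, 8]         # a small list of sample values

total = sum(numbers)                # built-in sum of a list
average = total / len(numbers)      # arithmetic mean
largest = max(numbers)              # largest element

print("Total:", total)
print("Average:", average)
print("Largest:", largest)

Running the cell prints the three computed values directly below it, which is what makes notebooks convenient for step-by-step exploration.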
Chapter 3

Basics of Machine Learning Algorithms

Broadly, there are 3 types of Machine Learning Algorithms

1. Supervised Learning

How it works: This algorithm uses a target/outcome variable (or dependent variable) which is to be predicted from a given set of predictors (independent variables). Using this set of variables, we generate a function that maps inputs to desired outputs. The training process continues until the model achieves a desired level of accuracy on the training data. Examples of Supervised Learning: Regression, Decision Tree, Random Forest, KNN, Logistic Regression, etc. (A short code sketch appears after the three categories below.)

2. Unsupervised Learning

How it works: In this algorithm, we do not have any target or outcome variable to predict or estimate. It is used for clustering a population into different groups, which is widely applied for segmenting customers into groups for specific interventions. Examples of Unsupervised Learning: Apriori algorithm, K-means.

3. Reinforcement Learning:

How it works: Using this algorithm, the machine is trained to make specific decisions. It works this way: the machine is exposed to an environment where it trains itself continually using trial and error. The machine learns from past experience and tries to capture the best possible knowledge to make accurate business decisions. Example of Reinforcement Learning: Markov Decision Process.
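
To make the supervised learning category above concrete, here is a hedged sketch using scikit-learn; the iris dataset and logistic regression model are assumptions chosen purely for illustration:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Predictors (independent variables) and target (outcome variable).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# The model learns a function mapping inputs to the desired outputs.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))

An unsupervised algorithm such as K-means would be used in much the same way, except that no target variable y is passed to fit().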

Machine Learning Application

1. Image Recognition

2. Speech Recognition

3. Traffic prediction

4. Product recommendations

5. Self-driving cars

6. Email Spam and Malware Filtering

7. Virtual Personal Assistant

8. Online Fraud Detection

9. Stock Market trading

10. Medical Diagnosis


Machine Learning Model Approaches

A Taxonomy of Machine Learning Models

There is no simple way to classify machine learning algorithms. In this section, we present a taxonomy of machine learning models adapted from the book Machine Learning by Peter Flach. While the structure for classifying algorithms is based on the book, the explanation presented below is our own.

For a given problem, the collection of all possible outcomes represents the
sample space or instance space.

The basic idea for creating a taxonomy of algorithms is that we divide the
instance space by using one of three ways:

 Using a Logical expression.
 Using the Geometry of the instance space.
 Using Probability to classify the instance space.

The outcome of the transformation of the instance space by a machine learning algorithm using the above techniques should be exhaustive (cover all possible outcomes) and mutually exclusive (non-overlapping).
Machine Learning Fundamentals

1. Machine Learning is an application of artificial intelligence where a computer/machine learns from past experiences (input data) and makes future predictions. The performance of such a system should be at least at human level.

2. Machine Learning Categories. Machine Learning is generally


categorized into three types: Supervised Learning, Unsupervised
Learning, Reinforcement learning.

3. The main aim of training the ML algorithm is to adjust the weights W


to reduce the MAE or MSE.

4. Gradient Descent Algorithm: There are three ways of doing gradient descent. Batch gradient descent uses all of the training instances to update the model parameters in each iteration. Mini-batch gradient descent, instead of using all examples, divides the training set into smaller subsets called batches, denoted by ‘b’; a mini-batch of size ‘b’ is then used to update the model parameters in each iteration. Stochastic gradient descent (SGD) updates the parameters using only a single training instance in each iteration, with the training instance usually selected randomly. Stochastic gradient descent is often preferred for optimizing cost functions when there are hundreds of thousands of training instances or more, as it will converge more quickly than batch gradient descent.
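
As a rough illustration of batch gradient descent from point 4 above, here is a minimal numpy sketch that fits a straight line by minimizing the MSE; all names and values are illustrative only:

import numpy as np

# Synthetic data from a noisy line y = 3x + 2.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=100)
y = 3.0 * X + 2.0 + rng.normal(0, 1, size=100)

w, b = 0.0, 0.0     # weights to be adjusted
lr = 0.02           # learning rate

for epoch in range(2000):
    y_pred = w * X + b
    error = y_pred - y
    grad_w = 2 * np.mean(error * X)   # dMSE/dw over all training instances (batch)
    grad_b = 2 * np.mean(error)       # dMSE/db
    w -= lr * grad_w
    b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")   # should end up close to w=3, b=2

Mini-batch and stochastic variants would compute the same gradients over a subset of the data, or over a single randomly chosen instance, in each iteration.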

Machine Learning with Python

Python is among the most popular languages for machine learning and data science. The following features of Python make it a preferred choice of language for data science −

 Extensive set of packages:

Python has an extensive and powerful set of packages which are ready to be used in various domains. It includes packages like NumPy, SciPy, pandas and scikit-learn, which are required for machine learning and data science.

 Easy prototyping:

Another important feature of Python that makes it the language of choice for data science is easy and fast prototyping. This feature is useful for developing new algorithms.

 Collaboration feature:

The field of data science fundamentally requires good collaboration, and Python provides many useful tools that make this extremely easy.

 One language for many domains:

A typical data science project includes various stages such as data extraction, data manipulation, data analysis, feature extraction, modelling, evaluation, deployment and updating the solution. As Python is a multi-purpose language, it allows the data scientist to address all these stages from a common platform.

Data Science: Gains using - SAS, R and Python

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data mining.

Data science is a "concept to unify statistics, data analysis, machine learning


and their related methods" in order to "understand and analyze actual
phenomena" with data. It employs techniques and theories drawn from many
fields within the context of mathematics, statistics, informationscience and
computer science.

SAS: SAS has been the undisputed market leader in the commercial analytics space. The software offers a huge array of statistical functions, has a good GUI (Enterprise Guide & Miner) that helps people learn quickly, and provides awesome technical support. However, it ends up being the most expensive option and is not always enriched with the latest statistical functions.

R: R is the open-source counterpart of SAS, which has traditionally been used in academics and research. Because of its open-source nature, the latest techniques get released in it quickly. There is a lot of documentation available over the internet and it is a very cost-effective option.

Python: Having originated as an open-source scripting language, Python's usage has grown over time. Today, it sports libraries (NumPy, SciPy and matplotlib) and functions for almost any statistical operation or model building you may want to do. Since the introduction of pandas, it has become very strong in operations on structured data.

Chapter 4

Statistics in Machine Learning

Statistics and machine learning are two very closely related fields. Statistical methods can be used to clean and prepare data ready for modelling, and statistical hypothesis tests and estimation statistics can aid in model selection and in presenting the skill and predictions of final models.

1. Statistics is a collection of tools that you can use to get answers to important questions about data. You can use descriptive statistical methods to transform raw observations into information that you can understand and share. Statistics is generally considered a prerequisite to the field of applied machine learning.

2. Machine Learning is an interdisciplinary field that uses statistics,


probability, algorithms to learn from data and provide insights which
can be used to build intelligent applications.

3. The major difference between machine learning and statistics is their


purpose. Machine learning models are designed to make the most
accurate predictions possible. Statistical models are designed for
inference about the relationships between variables.
4. The field of statistics is the science of learning from data. Statistical
knowledge helps you use the proper methods to collect the data,
employ the correct analyses, and effectively present the results.
Statistics allows you to understand a subject much more deeply.
5. Statistical learning theory was introduced in the late 1960s, but until the 1990s it was viewed simply as a problem of function estimation from a given collection of data. An example of a learning problem is: predict whether a patient, hospitalized due to a heart attack, will have a second heart attack.

6. Linear regression is a technique, while machine learning is a goal that


can be achieved through different means and techniques. So regression
performance is measured by how close it fits an expected line/curve,
while machine learning is measured by how good it can solve a certain
problem, with whatever means necessary.

7. This is caused in part by the fact that machine learning has adopted many of statistics' methods, but was never intended to replace statistics, or even to have a statistical basis originally. Hence the sayings "machine learning is statistics scaled up to big data" and "the short answer is that there is no difference".

8. The most important methods for statistical data analysis include: the mean (the arithmetic mean, more commonly known as "the average," is the sum of a list of numbers divided by the number of items on the list), standard deviation, regression, sample size determination and hypothesis testing. (A short example follows at the end of this list.)
9. Statistics is used behind all medical studies. Statistics helps doctors keep track of where a baby should be in his/her mental development. Physicians also use statistics to examine the effectiveness of treatments. Statistics is very important for observation, analysis and mathematical prediction models.

10. Statistics is mostly used by doctors to explain risk to patients, access evidence summaries, interpret screening test results and read research publications.

11. Statistical learning refers to the process of extracting this structure. A major question in language acquisition in the past few decades has been the extent to which infants use statistical learning mechanisms to acquire their native language.

12. Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis. Statistical learning theory deals with the problem of finding a predictive function based on data.
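
As a brief illustration of the descriptive statistics and hypothesis testing mentioned in point 8 above, the following hedged sketch uses numpy and scipy on made-up sample data:

import numpy as np
from scipy import stats

sample_a = np.array([4.1, 3.8, 4.5, 4.0, 4.2, 3.9])
sample_b = np.array([3.5, 3.6, 3.4, 3.8, 3.3, 3.7])

print("Mean of A:", np.mean(sample_a))
print("Std. deviation of A:", np.std(sample_a, ddof=1))

# Simple linear regression of B on A.
slope, intercept, r_value, p_value, std_err = stats.linregress(sample_a, sample_b)
print("Regression slope:", slope)

# Two-sample t-test of the hypothesis that the two means are equal.
t_stat, p = stats.ttest_ind(sample_a, sample_b)
print("t-statistic:", t_stat, "p-value:", p)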

Feature Engineering

1. Feature engineering is the process of transforming raw data into


features that better represent the underlying problem to the predictive
models, resulting in improved model accuracy on unseen data.

2. Feature engineering involves leveraging data mining techniques to extract features from raw data along with the use of domain knowledge. Feature engineering is useful to improve the performance of machine learning algorithms and is often considered applied machine learning.

3. Feature engineering is the process of using data's domain knowledge


to create features that make machine learning algorithms work
(Wikipedia). It's the act of extracting important features from raw data
and transforming them into formats that are suitable for machine
learning.

4. Feature Selection: select a subset of input features from the dataset. Unsupervised methods do not use the target variable (e.g. removing redundant variables using correlation). Supervised methods use the target variable (e.g. removing irrelevant variables). Wrapper methods search for well-performing subsets of features (e.g. RFE).
5. Feature engineering creates features from the existing raw data in
order to increment the predictive power of the machine learning
algorithms. Generally, the feature engineering process is applied to
generate additional features from the raw data.

6. Engineering and selecting the correct features for a model will not only
significantly improve its predictive power, but will also offer the
flexibility to use less complex models that are faster to run and more
easily understood.

7. The most common techniques of feature scaling are normalization and standardization. Normalization is used when we want to bound our values between two numbers, typically [0,1] or [-1,1], while standardization transforms the data to have zero mean and a variance of 1, making the data unitless.
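
A minimal sketch of the two scaling techniques from point 7, assuming scikit-learn is available; the tiny matrix is made up for illustration:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 500.0]])

# Normalization: bound each feature to [0, 1].
print(MinMaxScaler().fit_transform(X))

# Standardization: zero mean and unit variance per feature (unitless).
print(StandardScaler().fit_transform(X))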
Machine Learning Pipelines

1. Machine learning pipeline: one definition of a machine learning pipeline is a means of automating the machine learning workflow by enabling data to be transformed and correlated into a model that can then be analyzed to achieve outputs. This type of ML pipeline makes the process of inputting data into the ML model fully automated.

2. Create the resources required to run an ML pipeline: set up a datastore used to access the data needed in the pipeline steps; configure a Dataset object to point to persistent data that lives in, or is accessible in, a datastore; and set up the compute targets on which your pipeline steps will run.

3. Data collection. Funnelling incoming data into a data store is the first
step of any ML workflow. The key point is that data is persisted
without undertaking any transformation at all, to allow us to have an
immutable record of the original dataset.


4. A machine learning pipeline is used to help automate machine learning workflows. Pipelines operate by enabling a sequence of data to be transformed and correlated together in a model that can be tested and evaluated to achieve an outcome, whether positive or negative.

5. Scikit-learn's Pipeline class is a useful tool for encapsulating multiple different transformers alongside an estimator into one object, so that you only have to call your important methods once (fit(), predict(), etc.).
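
A small sketch of scikit-learn's Pipeline class as described above; the scaler, estimator and dataset are chosen only for illustration:

from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# A transformer step and a final estimator wrapped into one object.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

pipeline.fit(X, y)               # fit() is called only once on the whole pipeline
print(pipeline.predict(X[:5]))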
What is PyTorch

1. PyTorch is an open-source machine learning library based on the


Torch library, used for applications such as computer vision and
natural language processing.

2. As you might be aware, PyTorch is an open source machine learning


library used primarily for applications such as computer vision and
natural language processing. PyTorch is a strong player in the field of
deep learning and artificial intelligence, and it can be considered
primarily as a research-first library.

3. So, both TensorFlow and PyTorch provide useful abstractions to


reduce amounts of boilerplate code and speed up model development.
The main difference between them is that PyTorch may feel more
“pythonic” and has an object-oriented approach while TensorFlow has
several options from which you may choose.

4. PyTorch is an open-source machine learning library based on the


Torch library, used for applications such as computer vision and
natural language processing, primarily developed by Facebook's AI
Research lab (FAIR).
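
A minimal PyTorch sketch, assuming PyTorch is installed; the network shape, data and hyperparameters are illustrative only:

import torch
import torch.nn as nn

# A tiny feed-forward network: 4 inputs -> 8 hidden units -> 1 output.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
)

x = torch.randn(16, 4)           # a batch of 16 samples
target = torch.randn(16, 1)

loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

prediction = model(x)
loss = loss_fn(prediction, target)
loss.backward()                  # compute gradients
optimizer.step()                 # update the weights
print("loss:", loss.item())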
Chapter 5

Data Visualization

Data visualization is the graphical representation of information and data. By using visual
elements like charts, graphs, and maps, data visualization tools provide an accessible way
to see and understand trends, outliers, and patterns in data.

In the world of Big Data, data visualization tools and technologies are essential to analyze
massive amounts of information and make data-driven decisions.

Data visualization is another form of visual art that grabs our interest and keeps our eyes
on the message. When we see a chart, we quickly see trends and outliers. If we can see
something, we internalize it quickly. It’s storytelling with a purpose. If you’ve ever stared
at a massive spreadsheet of data and couldn’t see a trend, you know how much more
effective a visualization can be.

Common general types of data visualization:

 Charts
 Tables
 Graphs
 Maps
 Infographics
 Dashboards

More specific examples of methods to visualize data (a short plotting sketch follows this list):

 Area Chart
 Bar Chart
 Box-and-whisker Plots
 Bubble Cloud
 Bullet Graph
 Cartogram
 Circle View
 Dot Distribution Map
 Gantt Chart
 Heat Map
 Highlight Table
 Histogram
 Matrix
 Network
 Polar Area
 Radial Tree
 Scatter Plot (2D or 3D)
 Streamgraph
 Text Tables
 Timeline
 Treemap
 Wedge Stack Graph
 Word Cloud
 And any mix-and-match combination in a dashboard!
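
As a small illustration of a few of the chart types listed above, here is a hedged matplotlib sketch with made-up data:

import numpy as np
import matplotlib.pyplot as plt

values = [5, 9, 3, 7]
labels = ["A", "B", "C", "D"]
points = np.random.default_rng(1).normal(size=(100, 2))

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].bar(labels, values)                  # bar chart
axes[1].scatter(points[:, 0], points[:, 1])  # scatter plot
axes[2].hist(points[:, 0], bins=15)          # histogram
plt.tight_layout()
plt.show()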

Interactive data visualization enables direct actions on a graphical plot to change elements and link between multiple plots.[33]
Interactive data visualization has been a pursuit of statisticians since the late 1960s. Examples of the developments can be found in the American Statistical Association video lending library.[34]
Common interactions include:
 Brushing: works by using the mouse to control a paintbrush, directly changing the
color or glyph of elements of a plot. The paintbrush is sometimes a pointer and
sometimes works by drawing an outline of sorts around points; the outline is
sometimes irregularly shaped, like a lasso. Brushing is most commonly used when
multiple plots are visible and some linking mechanism exists between the plots. There
are several different conceptual models for brushing and a number of common linking
mechanisms. Brushing scatterplots can be a transient operation, in which points in the
active plot only retain their new characteristics while they are enclosed or intersected
by the brush, or it can be a persistent operation, so that points retain their new
appearance after the brush has been moved away. Transient brushing is usually chosen
for linked brushing, as we have just described.

 Painting: Persistent brushing is useful when we want to group the points into clusters and then proceed to use other operations, such as the tour, to compare the groups. It is becoming common terminology to call the persistent operation painting.

 Identification: also called labeling or label brushing, this is another plot manipulation that can be linked. Bringing the cursor near a point or edge in a scatterplot, or a bar in a bar chart, causes a label to appear that identifies the plot element. It is widely available in many interactive graphics, and is sometimes called mouseover.

 Scaling: maps the data onto the window; changes in the mapping function help us learn different things from the same plot. Scaling is commonly used to zoom in on crowded regions of a scatterplot, and it can also be used to change the aspect ratio of a plot, to reveal different features of the data.

 Linking: connects elements selected in one plot with elements in another plot. The simplest kind of linking is one-to-one, where both plots show different projections of the same data and a point in one plot corresponds to exactly one point in the other. When using area plots, brushing any part of an area has the same effect as brushing it all and is equivalent to selecting all cases in the corresponding category.
Data Mapping
Data mapping is a way to organize various bits of data into a manageable and easy-to-understand system. This system matches data fields with target fields while in storage. Simply put, not all data goes by the same organizational standards: different sources may refer to a phone number in as many different ways as you can think of. Data mapping recognizes phone numbers for what they are and puts them all in the same field rather than having them drift around under other names.
With this technique, we are able to take the organized data and put a bigger picture together. You can find out where most of your target audience lives, learn what sorts of things they have in common and even figure out a few controversies that you shouldn't touch on. Armed with this information, your business can make smarter decisions and spend less money while tailoring your products and services to your audience.

Data Mapping and Machine Learning

The earlier example of recognizing phone numbers has a lot to do with something called unification and data cleaning. These processes are often powered by machine learning, which should not be confused with artificial intelligence as a whole: machine learning is a subset of AI that uses patterns and inference to offer predictions rather than perform a single fixed task. In the earlier example, machine learning is used to recognize a phone number and assign it to its proper category for organizational purposes.
Machine learning goes a step beyond just recognizing phone numbers though. The
technology can recognize errors like missing values or typos and group information from
the same source together.
That's what data cleaning and unification really mean: cleaning up all of the data without any human input and presenting the information in its most complete and precise form. This process saves time and is also more effective in regard to how correct the information will be.
The data can then be displayed in almost any way a person or company needs to see it.
For instance, geospatial data is one route machine learning can automatically take and
create without input. Geospatial data is basically translating data into a map and plotting
out physical locations and routes that your target audience takes every day. This technique
can provide a unique aid to your next advertising campaign.
Why Machine Learning Is Important to Data Mapping

Machine learning allows data mapping to be more precise. Without that technology, data mapping would be either very rudimentary or would have to be done completely manually.
Going the rudimentary route, a simple spreadsheet would only be able to take information and plug it into its best guess of a proper category: typos wouldn't be fixed, missing values would remain missing and some information would just be scattered in random places.
Trying to complete data mapping manually would be worse. For one, a person would never be able to keep up with the flow of information, not to mention the backlog of information already hiding and in need of sorting in the Internet of Things. Even assuming someone could keep up with the flow, there would still be errors, as the sheer amount of data would leave a human unable to notice connections the way a machine could.

Why Data Mapping Is Important to You

The use of data is an extremely important part of modern-day marketing. Knowing the best possible place and time to reach customers will allow you to target your audience more efficiently. Even large industries that can afford to splash their names across all possible media outlets use data mapping to save money and appear more loyal to their customer base. Big or small, you can use this information to get ahead of everyone else vying for your customers' attention. The competition is dense these days, so getting ahead of the curve and staying ahead is an art everyone is trying to perfect. Data mapping can help you get there as early as possible.

Uses of Data Map

Population Distribution

Using demographic data such as age, gender, income and education level, customers in different regions or communities can be analyzed and classified on the map. This data can help us figure out their lifestyle, interests and shopping habits.

Market Capacity Forecast


Analyze the resource investments, sales revenue, and product sales of each outlet on the
map, and predict the capacity of the entire market, so that the resources can be
scientifically allocated to the region with the greatest market potential.

Basics of Neural Networks

Neural networks, in the world of finance, assist in the development of such processes as time-series forecasting, algorithmic trading, securities classification, credit risk modeling and the construction of proprietary indicators and price derivatives.

A neural network works similarly to the human brain’s neural network. A “neuron” in a
neural network is a mathematical function that collects and classifies information
according to a specific architecture. The network bears a strong resemblance to statistical
methods such as curve fitting and regression analysis.

A neural network contains layers of interconnected nodes. Each node is a perceptron and
is similar to a multiple linear regression. The perceptron feeds the signal produced by a
multiple linear regression into an activation function that may be nonlinear.

In a multi-layered perceptron (MLP), perceptrons are arranged in interconnected layers.


The input layer collects input patterns. The output layer has classifications or output
signals to which input patterns may map. For instance, the patterns may comprise a list of
quantities for technical indicators about a security; potential outputs could be “buy,”
“hold” or “sell.”

Hidden layers fine-tune the input weightings until the neural network’s margin of error is
minimal. It is hypothesized that hidden layers extrapolate salient features in the input data
that have predictive power regarding the outputs. This describes feature extraction, which
accomplishes a utility similar to statistical techniques such as principal component
analysis.
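
As a hedged sketch of the multi-layered perceptron described above, the snippet below uses scikit-learn's MLPClassifier with a single hidden layer; the dataset and hidden-layer size are assumptions chosen only for illustration:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)

# Input layer (30 features) -> one hidden layer of 16 units -> output classes.
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
mlp.fit(scaler.transform(X_train), y_train)

print("Test accuracy:", mlp.score(scaler.transform(X_test), y_test))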

Application of Neural Networks


Neural networks are broadly used, with applications for financial operations, enterprise
planning, trading, business analytics and product maintenance. Neural networks have also
gained widespread adoption in business applications such as forecasting and marketing
research solutions, fraud detection and risk assessment.

A neural network evaluates price data and unearths opportunities for making trade
decisions based on the data analysis. The networks can distinguish subtle nonlinear
interdependencies and patterns other methods of technical analysis cannot. According to
research, the accuracy of neural networks in making price predictions for stocks differs.
Some models predict the correct stock prices 50 to 60 percent of the time while others are
accurate in 70 percent of all instances. Some have posited that a 10 percent improvement
in efficiency is all an investor can ask for from a neural network.

There will always be data sets and task classes that are better analyzed using previously developed algorithms. It is not so much the algorithm that matters; it is the well-prepared input data on the targeted indicator that ultimately determines the level of success of a neural network.

Fuzzy Logic

Fuzzy logic is based on the observation that people make decisions based on imprecise
and non-numerical information. Fuzzy models or sets are mathematical means of
representing vagueness and imprecise information (hence the term fuzzy). These models
have the capability of recognising, representing, manipulating, interpreting, and utilising
data and information that are vague and lack certainty.[5]

Fuzzy logic has been applied to many fields, from control theory to artificial intelligence.
Fuzzification is the process of assigning the numerical input of a system to fuzzy sets with
some degree of membership. This degree of membership may be anywhere within the
interval [0,1]. If it is 0 then the value does not belong to the given fuzzy set, and if it is 1
then the value completely belongs within the fuzzy set. Any value between 0 and 1
represents the degree of uncertainty that the value belongs in the set. These fuzzy sets are
typically described by words, and so by assigning the system input to fuzzy sets, we can
reason with it in a linguistically natural manner.
For example, consider a diagram in which the meanings of the expressions cold, warm, and hot are represented by functions mapping a temperature scale. A point on that scale has three "truth values"—one for each of the three functions. A vertical line on the diagram represents a particular temperature that three arrows (truth values) gauge. If the red arrow points to zero, that temperature may be interpreted as "not hot"; i.e. it has zero membership in the fuzzy set "hot". An orange arrow pointing at 0.2 may describe it as "slightly warm" and a blue arrow pointing at 0.8 as "fairly cold". Therefore, this temperature has 0.2 membership in the fuzzy set "warm" and 0.8 membership in the fuzzy set "cold". The degree of membership assigned for each fuzzy set is the result of fuzzification.
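
A minimal sketch of fuzzification in Python; the membership functions and the temperature breakpoints (10, 20 and 30 degrees) are assumptions for illustration:

import numpy as np

def cold(t):
    return float(np.clip((20 - t) / 10, 0, 1))          # fully cold at or below 10

def warm(t):
    return float(np.clip(1 - abs(t - 20) / 10, 0, 1))   # peaks at 20

def hot(t):
    return float(np.clip((t - 20) / 10, 0, 1))          # fully hot at or above 30

temperature = 22.0
print("cold:", cold(temperature))   # degree of membership in the fuzzy set "cold"
print("warm:", warm(temperature))
print("hot:", hot(temperature))

Each call returns a degree of membership in [0, 1], which is exactly the output of the fuzzification step described above.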
Chapter 6

Natural Language Processing

Natural language processing (NLP) is a branch of artificial intelligence that helps


computers understand, interpret and manipulate human language. NLP draws from many
disciplines, including computer science and computational linguistics, in its pursuit to fill
the gap between human communication and computer understanding.

Natural language processing helps computers communicate with humans in their own
language and scales other language-related tasks. For example, NLP makes it possible for
computers to read text, hear speech, interpret it, measure sentiment and determine which
parts are important.

Today’s machines can analyze more language-based data than humans, without fatigue
and in a consistent, unbiased way. Considering the staggering amount of unstructured data
that’s generated every day, from medical records to social media, automation will be
critical to fully analyze text and speech data efficiently.

Structuring a highly unstructured data source

Human language is astoundingly complex and diverse. We express ourselves in infinite


ways, both verbally and in writing. Not only are there hundreds of languages and dialects,
but within each language is a unique set of grammar and syntax rules, terms and slang.
When we write, we often misspell or abbreviate words, or omit punctuation. When we
speak, we have regional accents, and we mumble, stutter and borrow terms from other
languages.

While supervised and unsupervised learning, and specifically deep learning, are now
widely used for modeling human language, there’s also a need for syntactic and semantic
understanding and domain expertise that are not necessarily present in these machine
learning approaches. NLP is important because it helps resolve ambiguity in language and
adds useful numeric structure to the data for many downstream applications, such
as speech recognition or text analytics.
Genetic Algorithms

Nature has always been a great source of inspiration to mankind. Genetic Algorithms (GAs) are search-based algorithms based on the concepts of natural selection and genetics. GAs are a subset of a much larger branch of computation known as Evolutionary Computation.
GAs were developed by John Holland and his students and colleagues at the University of Michigan, most notably David E. Goldberg, and have since been tried on various optimization problems with a high degree of success.
In GAs, we have a pool or a population of possible solutions to the given problem. These
solutions then undergo recombination and mutation (like in natural genetics), producing
new children, and the process is repeated over various generations. Each individual (or
candidate solution) is assigned a fitness value (based on its objective function value) and
the fitter individuals are given a higher chance to mate and yield more “fitter”
individuals. This is in line with the Darwinian Theory of “Survival of the Fittest”.
In this way we keep “evolving” better individuals or solutions over generations, till we reach a stopping criterion. Genetic algorithms are sufficiently randomized in nature, but they perform much better than random local search (in which we just try various random solutions, keeping track of the best so far), as they also exploit historical information. A small code sketch of this loop appears at the end of this section.

Advantages of GAs
GAs have various advantages which have made them immensely popular. These include −
 They do not require any derivative information (which may not be available for many real-world problems).
 They are faster and more efficient as compared to traditional methods.
 They have very good parallel capabilities.
 They optimize both continuous and discrete functions, and also multi-objective problems.
 They provide a list of “good” solutions and not just a single solution.
 They always produce an answer to the problem, which gets better over time.
 They are useful when the search space is very large and there are a large number of parameters involved.

Limitations of GAs
Like any technique, GAs also suffer from a few limitations. These include −
 GAs are not suited for all problems, especially problems which are simple and for
which derivative information is available.
 Fitness value is calculated repeatedly which might be computationally expensive
for some problems.
 Being stochastic, there are no guarantees on the optimality or the quality of the
solution.
 If not implemented properly, the GA may not converge to the optimal solution.
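
A minimal genetic algorithm sketch in Python, as mentioned above; the toy objective (maximize the number of 1s in a bit string) and all parameters are illustrative assumptions:

import random

random.seed(0)
GENOME_LEN, POP_SIZE, GENERATIONS, MUTATION_RATE = 20, 30, 40, 0.05

def fitness(genome):
    return sum(genome)                       # objective: count of 1 bits

def random_genome():
    return [random.randint(0, 1) for _ in range(GENOME_LEN)]

def select(population):
    # Tournament selection: fitter individuals get a higher chance to mate.
    return max(random.sample(population, 3), key=fitness)

def crossover(parent_a, parent_b):
    point = random.randint(1, GENOME_LEN - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(genome):
    return [1 - g if random.random() < MUTATION_RATE else g for g in genome]

population = [random_genome() for _ in range(POP_SIZE)]
for generation in range(GENERATIONS):
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
print("best fitness:", fitness(best), "out of", GENOME_LEN)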
Chapter 7

Natural Language Processing or NLP is a field of Artificial Intelligence that gives the
machines the ability to read, understand and derive meaning from human languages.

It is a discipline that focuses on the interaction between data science and human language,
and is scaling to lots of industries. Today NLP is booming thanks to the huge
improvements in the access to data and the increase in computational power, which are
allowing practitioners to achieve meaningful results in areas like healthcare, media, finance
and human resources, among others.

Use Cases of NLP


In simple terms, NLP represents the automatic handling of natural human language like
speech or text, and although the concept itself is fascinating, the real value behind this
technology comes from the use cases.

NLP can help you with lots of tasks and the fields of application just seem to increase on a
daily basis. Let’s mention some examples:

 NLP enables the recognition and prediction of diseases based on electronic health
records and patient’s own speech. This capability is being explored in health
conditions that go from cardiovascular diseases to depression and even schizophrenia.
For example, Amazon Comprehend Medical is a service that uses NLP to extract
disease conditions, medications and treatment outcomes from patient notes, clinical
trial reports and other electronic health records.

 Organizations can determine what customers are saying about a service or product by
identifying and extracting information in sources like social media. This sentiment
analysis can provide a lot of information about customers choices and their decision
drivers.

 An inventor at IBM developed a cognitive assistant that works like a personalized search engine by learning all about you and then reminding you of a name, a song, or anything you can’t remember the moment you need it.
 Companies like Yahoo and Google filter and classify your emails with NLP by analyzing text in emails that flow through their servers and stopping spam before it even enters your inbox.

 To help identify fake news, the NLP Group at MIT developed a new system to determine if a source is accurate or politically biased, detecting whether a news source can be trusted or not.

 Amazon’s Alexa and Apple’s Siri are examples of intelligent voice-driven interfaces that use NLP to respond to vocal prompts and do everything from finding a particular shop, to telling us the weather forecast, suggesting the best route to the office or turning on the lights at home.

 Having an insight into what is happening and what people are talking about can be very valuable to financial traders. NLP is being used to track news, reports and comments about possible mergers between companies; everything can then be incorporated into a trading algorithm to generate massive profits. Remember: buy the rumor, sell the news.

 NLP is also being used in both the search and selection phases of talent recruitment,
identifying the skills of potential hires and also spotting prospects before they become
active on the job market.

 Powered by IBM Watson NLP technology, LegalMation developed a platform to


automate routine litigation tasks and help legal teams save time, drive down costs and
shift strategic focus.

NLP is particularly booming in the healthcare industry. This technology is improving


care delivery, disease diagnosis and bringing costs down while healthcare organizations
are going through a growing adoption of electronic health records. The fact that clinical
documentation can be improved means that patients can be better understood and benefited
through better healthcare. The goal should be to optimize their experience, and several
organizations are already working on this.
[Figure: Number of publications containing the phrase “natural language processing” in PubMed, 1978–2018. As of 2018, PubMed comprised more than 29 million citations for biomedical literature.]

Companies like Winterlight Labs are making huge improvements in the treatment of Alzheimer’s disease by monitoring cognitive impairment through speech, and they can also support clinical trials and studies for a wide range of central nervous system disorders. Following a similar approach, Stanford University developed Woebot, a chatbot therapist with the aim of helping people with anxiety and other disorders.

But there is serious controversy around the subject. A couple of years ago Microsoft demonstrated that by analyzing large samples of search engine queries, they could identify internet users who were suffering from pancreatic cancer even before they had received a diagnosis of the disease. How would users react to such a diagnosis? And what would happen if you were tested as a false positive (meaning that you can be diagnosed with the disease even though you don’t have it)? This recalls the case of Google Flu Trends, which in 2009 was announced as being able to predict influenza but later vanished due to its low accuracy and inability to meet its projected rates.
NLP may be the key to an effective clinical support in the future, but there are still many
challenges to face in the short term.

Basic NLP to impress your non-NLP friends

The main drawbacks we face these days with NLP relate to the fact that language is very
tricky. The process of understanding and manipulating language is extremely complex, and
for this reason it is common to use different techniques to handle different challenges
before binding everything together. Programming languages like Python or R are highly
used to perform these techniques, but before diving into code lines (that will be the topic of
a different article), it’s important to understand the concepts beneath them. Let’s
summarize and explain some of the most frequently used algorithms in NLP when defining
the vocabulary of terms:

Bag of Words
Is a commonly used model that allows you to count all words in a piece of text. Basically it
creates an occurrence matrix for the sentence or document, disregarding grammar and
word order. These word frequencies or occurrences are then used as features for training a
classifier.
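
A minimal sketch of a bag-of-words occurrence matrix using scikit-learn; the two sample documents are made up:

from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(documents)   # word counts; grammar and order ignored

print(vectorizer.get_feature_names_out())      # the vocabulary of terms
print(matrix.toarray())                        # one row of counts per document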

Tokenization
Is the process of segmenting running text into sentences and words. In essence, it’s the task of cutting a text into pieces called tokens, and at the same time throwing away certain characters, such as punctuation. For example, tokenizing a short sample sentence could give something like the following:
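
(A minimal sketch, assuming NLTK is installed with its 'punkt' tokenizer data and using an invented sample sentence.)

from nltk.tokenize import word_tokenize   # requires: nltk.download('punkt')

sentence = "Machine learning helps computers learn from data."
print(word_tokenize(sentence))
# ['Machine', 'learning', 'helps', 'computers', 'learn', 'from', 'data', '.']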

Pretty simple, right? Well, although it may seem quite basic in this case, and also in languages like English that separate words by a blank space (called segmented languages), not all languages behave the same, and if you think about it, blank spaces alone are not sufficient even for English to perform proper tokenization. Splitting on blank spaces may break up what should be considered as one token, as in the case of certain names (e.g. San Francisco or New York) or borrowed foreign phrases (e.g. laissez faire).
Tokenization can remove punctuation too, easing the path to a proper word segmentation but also triggering possible complications. In the case of periods that follow an abbreviation (e.g. Dr.), the period following that abbreviation should be considered as part of the same token and not be removed.

The tokenization process can be particularly problematic when dealing with biomedical
text domains which contain lots of hyphens, parentheses, and other punctuation marks.

For deeper details on tokenization, you can find a great explanation in this article.

Stop Words Removal


Includes getting rid of common language articles, pronouns and prepositions such as
“and”, “the” or “to” in English. In this process some very common words that appear to
provide little or no value to the NLP objective are filtered and excluded from the text to be
processed, hence removing widespread and frequent terms that are not informative about
the corresponding text.

Stop words can be safely ignored by carrying out a lookup in a pre-defined list of
keywords, freeing up database space and improving processing time.

There is no universal list of stop words. These can be pre-selected or built from scratch.
A potential approach is to begin by adopting pre-defined stop words and add words to the
list later on. Nevertheless it seems that the general trend over the past time has been to go
from the use of large standard stop word lists to the use of no lists at all.

The thing is, stop word removal can wipe out relevant information and modify the context in a given sentence. For example, if we are performing a sentiment analysis we might throw our algorithm off track if we remove a stop word like “not”. Under these conditions, you might select a minimal stop word list and add additional terms depending on your specific objective.
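
A small sketch of stop word removal against NLTK's pre-defined English list; the token list is made up:

from nltk.corpus import stopwords          # requires: nltk.download('stopwords')

tokens = ["the", "movie", "was", "not", "good", "at", "all"]
english_stops = set(stopwords.words("english"))

filtered = [t for t in tokens if t not in english_stops]
print(filtered)   # note that "not" is removed too, which can hurt sentiment analysis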

Stemming

Refers to the process of slicing the end or the beginning of words with the intention of removing affixes (lexical additions to the root of the word). Affixes that are attached at the beginning of the word are called prefixes (e.g. “astro” in the word “astrobiology”) and the ones attached at the end of the word are called suffixes (e.g. “ful” in the word “helpful”). The problem is that affixes can create or expand new forms of the same word, called inflectional affixes.

A possible approach is to consider a list of common affixes and rules (Python and R have different libraries containing affixes and methods) and perform stemming based on them, but of course this approach presents limitations. Since stemmers use algorithmic approaches, the result of the stemming process may not be an actual word or may even change the word (and sentence) meaning. To offset this effect you can edit those predefined methods by adding or removing affixes and rules, but you must consider that you might be improving the performance in one area while producing a degradation in another one. Always look at the whole picture and test your model’s performance.

So if stemming has serious limitations, why do we use it? First of all, it can be used to
correct spelling errors from the tokens. Stemmers are simple to use and run very
fast (they perform simple operations on a string), and if speed and performance are
important in the NLP model, then stemming is certainly the way to go. Remember, we use
it with the objective of improving our performance, not as a grammar exercise.
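
A small sketch of stemming with NLTK's Porter stemmer; the word list is illustrative:

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["helpful", "running", "studies"]:
    print(word, "->", stemmer.stem(word))
# The results are stems, not necessarily real words (e.g. "studies" -> "studi").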

Lemmatization
Has the objective of reducing a word to its base form and grouping together different forms
of the same word. For example, verbs in past tense are changed into present (e.g. “went” is
changed to “go”) and synonyms are unified (e.g. “best” is changed to “good”), hence
standardizing words with similar meaning to their root. Although it seems closely related
to the stemming process, lemmatization uses a different approach to reach the root forms
of words.
Lemmatization resolves words to their dictionary form (known as lemma) for which it
requires detailed dictionaries in which the algorithm can look into and link words to their
corresponding lemmas.

For example, the words “running”, “runs” and “ran” are all forms of the word “run”, so
“run” is the lemma of all the previous words.

Lemmatization also takes into consideration the context of the word in order to solve other
problems like disambiguation, which means it can discriminate between identical words
that have different meanings depending on the specific context. Think about words like
“bat” (which can correspond to the animal or to the metal/wooden club used in baseball) or
“bank” (corresponding to the financial institution or to the land alongside a body of water).
By providing a part-of-speech parameter to a word ( whether it is a noun, a verb, and so
on) it’s possible to define a role for that word in the sentence and remove disambiguation.

As you might have already pictured, lemmatization is a much more resource-intensive task than stemming. At the same time, since it requires more knowledge about the language structure than a stemming approach, it demands more computational power than setting up or adapting a stemming algorithm.
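
A small sketch of lemmatization with NLTK's WordNet lemmatizer, using the example words from the text above; passing the part-of-speech parameter ("v" for verb) helps resolve the correct lemma:

from nltk.stem import WordNetLemmatizer    # requires: nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("running", pos="v"))   # -> run
print(lemmatizer.lemmatize("ran", pos="v"))       # -> run
print(lemmatizer.lemmatize("runs", pos="v"))      # -> run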

Topic Modeling

Is a method for uncovering hidden structures in sets of texts or documents. In essence it clusters texts to discover latent topics based on their contents, processing individual words and assigning them values based on their distribution. This technique is based on the assumptions that each document consists of a mixture of topics and that each topic consists of a set of words, which means that if we can spot these hidden topics we can unlock the meaning of our texts.

From the universe of topic modelling techniques, Latent Dirichlet Allocation (LDA) is
probably the most commonly used. This relatively new algorithm (invented less than 20
years ago) works as an unsupervised learning method that discovers different topics
underlying a collection of documents. In unsupervised learning methods like this one,
there is no output variable to guide the learning process and data is explored by algorithms
to find patterns. To be more specific, LDA finds groups of related words by:

1. Assigning each word to a random topic, where the user defines the number of topics it
wishes to uncover. You don’t define the topics themselves (you define just the number
of topics) and the algorithm will map all documents to the topics in a way that words
in each document are mostly captured by those imaginary topics.

2. The algorithm goes through each word iteratively and reassigns the word to a topic,
taking into consideration the probability that the word belongs to a topic, and the
probability that the document will be generated by a topic. These probabilities are
calculated multiple times, until the algorithm converges.

Unlike clustering algorithms such as K-means, which perform hard clustering (where topics
are disjoint), LDA assigns each document to a mixture of topics. This means that each
document can be described by one or more topics (e.g. Document 1 is described by 70% of
topic A, 20% of topic B and 10% of topic C), which reflects more realistic results.

Topic modeling is extremely useful for classifying texts, building recommender systems
(e.g. to recommend you books based on your past readings) or even detecting trends in
online publications.
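
As an illustration only (again, an addition for this report rather than part of the training material), the sketch below fits an LDA model with scikit-learn on a tiny made-up corpus; the documents, the choice of two topics and the vectorizer settings are all assumptions made for the example.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# A tiny, made-up corpus just to show the API
docs = [
    "the patient reported fever and headache",
    "doctors prescribed medicine for the fever",
    "the stock market fell as investors sold shares",
    "share prices rose after the market rally",
]

# LDA works on word counts (bag of words)
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

# Ask the model to uncover 2 latent topics
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)   # each row is a document's topic mixture

# Print the top words of each discovered topic
words = vectorizer.get_feature_names_out()
for topic_idx, topic in enumerate(lda.components_):
    top_words = [words[i] for i in topic.argsort()[::-1][:4]]
    print("Topic", topic_idx, ":", top_words)

print(doc_topics.round(2))   # soft assignment: each document belongs to a mixture of topics

The last print illustrates the point made above about soft clustering: every document receives a proportion for each topic rather than a single hard label.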
Chapter 8
WordPress Demand:

WordPress has long been the most popular content management system (CMS), powering millions
and millions of websites. Although WordPress has had a particularly bad track record in terms of
security, in recent years many of the well-known security risks have shifted from the WordPress
core to the numerous plugins and themes written for the CMS.

A demand-side viewpoint motivates this observation: the basic hypothesis is that plugins with
large installation bases have been affected by multiple vulnerabilities, and the hypothesis also
holds according to empirical results.

WordPress is web publishing software you can use to create a beautiful website or blog, and it
may well be the easiest and most flexible blogging and website content management system (CMS)
for beginners. Since it was released in 2003, WordPress has become one of the most popular web
publishing platforms, and today it powers more than 35% of the entire web — everything from
hobby blogs to some of the most popular websites online.

It enables you to build and manage your own full-featured website using just your web
browser, without having to learn how to code.


Reasons why WordPress is a great choice for building your
website:

 WordPress is released under an Open-Source license -- which means you can download
and use the WordPress software in any way you like, for FREE. It also means that hundreds of
volunteers from all around the world are constantly working to improve the WordPress software.

 WordPress is easy to learn and use -- You don’t need to hire a web designer every
time you want to make a small change to your website. Instead, you can easily
update and create your own content, without having to learn how to code.

 WordPress is completely customizable -- There are thousands of themes and
plugins that enable you to change the entire look of your website, or even add
features like an online store, a photo gallery, or a mailing list.

 It is designed for everyone, not just developers -- Before WordPress became a
popular CMS for website development, it was built for non-tech-savvy bloggers,
so most of the user-interface components are easy to use.

 It has lower setup and maintenance costs -- WordPress incurs fewer setup,
customization, and maintenance costs in comparison to other open-source CMSs. It
is also relatively easy to find WordPress designers and developers if more
customization or development is needed in the future.


WordPress Basics:

There are a few basics of WordPress to know before building up the necessary requirements:

 Difference between Posts vs. Pages in WordPress

 Difference between Categories vs. Tags

 How to add images in WordPress

Difference Between Posts vs. Pages in WordPress:

WordPress comes with two content types: posts and pages.

Posts are blog content listed in reverse chronological order; you will see posts listed on
your blog page. If you are using WordPress as a blog, then you will end up using posts for
the majority of your website’s content. You can add and edit your WordPress posts from
the ‘Posts’ menu in your dashboard. Due to their reverse chronological order, your posts are
meant to be timely, and older posts are archived by month and year.

Difference Between Categories vs Tags:

Categories are meant for broad grouping of your posts. Think of these as general topics or
the table of contents for your WordPress site. Categories are hierarchical, which means you
can create sub-categories.

Tags are meant to describe specific details of your posts. Think of these as your site’s
index words; they let you micro-categorize your content. Tags are not hierarchical.
Content Management System (CMS):

A Content Management System (CMS) is software which stores all of your data, such as text,
photos, music and documents, and makes it available on your website. It helps in editing,
publishing and modifying the content of the website.

WordPress is an open-source Content Management System (CMS) which allows users to build
dynamic websites and blogs. WordPress is the most popular blogging system on the web and
allows updating, customizing and managing the website from its back-end CMS and components.

Features:

 User Management − It allows managing user information, such as changing a user's role
(subscriber, contributor, author, editor or administrator), creating or deleting users, and
changing passwords and user information. The main role of the user manager is authentication.

 Media Management − It is the tool for managing media files and folders, with which
you can easily upload, organize and manage the media files on your website.

 Theme System − It allows modifying the site's look and functionality. It includes
images, stylesheets, template files and custom pages.

 Extend with Plugins − Several plugins are available which provide custom
functions and features according to the user's needs.

 Search Engine Optimization − It provides several search engine optimization
(SEO) tools which make on-site SEO simple.

 Multilingual − It allows translating the entire content into the language preferred by
the user.

 Importers − It allows importing data in the form of posts. It imports custom files,
comments, post pages and tags.

Advantages:

 It is an open-source platform and available for free.

 CSS files can be modified to match the design the user needs.

 There are many plugins and templates available for free, and users can customize them
as per their needs.

 Media files can be uploaded easily and quickly.

 Customization is easy according to the user's needs.

Disadvantages:

 Using several plugins can make the website heavy to load and run.

 PHP knowledge is required to make modifications or changes to a WordPress website.

 Modifying and formatting graphic images and tables is difficult.

WordPress Website from Scratch:

Here is an overview, from start to finish, of all the steps:

 How to find and register a domain name for free.

 Choosing the best web hosting.

 How to install WordPress.

 Installing a template to change your site’s design.

 Creating pages in WordPress.

 Customizing WordPress with add-ons and extensions.

 Resources to learn WordPress and get support.

 Taking it further: building websites with more features.


Chapter 9

WordPress Basics:

WordPress was originally a tool for creating blogs rather than more traditional websites. That
hasn’t been true for a long time, though: nowadays, thanks to changes to the core code, as
well as WordPress's massive ecosystem of plugins and themes, you can create any type of
website with WordPress.

WordPress powers a huge number of business sites and blogs, and it is also the most popular
way to create an eCommerce website. WordPress is used by individuals, big businesses, and
everyone in between, and many well-known entities run their sites on it.

WordPress Is Extensible

Even if you aren’t a developer, you can easily modify your website thanks to WordPress's
huge ecosystem of themes and plugins:

Themes – these primarily change how your website looks.

Plugins – these primarily change how your website functions.


Custom WordPress Theme:

Unlike static HTML sites, WordPress themes are a set of template files written in PHP,
HTML, CSS, and JavaScript. Typically, you would need a decent understanding of all these
web design languages, or have to hire a web developer, to create a custom WordPress theme.

WordPress page builder plugins, on the other hand, made it super easy to create custom
page layouts using a drag-and-drop interface, but they were limited to layouts only; you
couldn’t build custom themes with them.

That was the case until Beaver Builder, one of the best WordPress page builder plugins,
decided to solve this problem with its add-on called Beaver Themer.

Beaver Themer is a site builder add-on that allows you to create custom theme layouts
using a drag-and-drop interface, without learning to code.

Beaver Themer allows you to create a custom theme, but you will still need a theme to
start with. We recommend using a lightweight theme that includes a full-width page
template to act as your starter theme.


Project
1. Introduction:
Through chat bots one can communicate via a text or voice interface and get a reply generated by artificial
intelligence. Typically, a chat bot communicates with a real person. Chat bots are used in applications
such as e-commerce customer service, call centers and Internet gaming. Chat bots are programs built to
automatically engage with received messages.

Chat bots can be programmed to respond the same way each time, to respond differently to messages
containing certain keywords, and even to use machine learning to adapt their responses to fit the situation.
A growing number of hospitals, nursing homes, and even private centers now use online chat
bots for human services on their sites. These bots connect with potential patients visiting the site, helping
them discover specialists, booking their appointments, and getting them access to the correct treatment.

An ML model has to be created wherein we can give any text input and, on the basis of the training data, it
must analyze the symptoms. A supervised logistic regression machine learning algorithm can be
implemented to train the model with datasets containing CSV files of various diseases. The goal is to
compare the outputs of various models and suggest the best model for handling symptoms in real-
world inputs. The dataset contains a CSV file with all diseases compiled together. The logistic regression
algorithm allows us to process the data efficiently. The goal here is to model the underlying
structure or distribution of the data in order to learn more from the training set.

In any case, the utilization of artificial intelligence in an industry where individuals’ lives could be at
stake still raises misgivings in people. It brings up the question of whether the task mentioned above
ought instead to be assigned to human staff. This healthcare chat bot system will help hospitals to provide
healthcare support online 24 x 7; it answers detailed as well as general questions. It also helps to generate
leads and automatically delivers the information of leads to sales. By asking questions in series, it
helps patients by guiding them towards exactly what they are looking for.
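
As a hedged illustration of the modelling step described above (this is a sketch, not the project's delivered code, which appears in the Coding section), the snippet below trains a logistic regression baseline on the symptoms dataset; the file name Training.csv, the 132 symptom columns and the prognosis label column are taken from the dataset format used later in this report.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Assumed layout: 132 binary symptom columns followed by a 'prognosis' label column
df = pd.read_csv('Training.csv')
X = df.iloc[:, 0:132].values
y = df['prognosis'].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Logistic regression baseline for predicting the disease from symptoms
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print("Hold-out accuracy:", accuracy_score(y_test, model.predict(X_test)))

Comparing such a baseline against the other models mentioned above is one way to decide which classifier to use for real-world inputs.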

2. Purpose and Scope:


Almost everyone has been kept on hold while operators connect them to a customer care executive; on average,
people spend around 7 minutes until they are assigned to a person. Gone are the frustrating days of
waiting in a queue for the next available operative. Chat bots are replacing live chat and other slower
contact methods such as emails and phone calls. Since chat bots are basically virtual robots, they never get
tired and continue to obey your commands. They will continue to operate every day throughout the year
without needing to take a break.
3. Problem Statement:
Patients currently have no instant way to get answers to basic health questions: they must wait on hold for
an operator, wait a long time for an expert to respond on a forum, or browse through lists of potentially
relevant documents on the web, and some services charge for live chat or telephone consultations. The
problem addressed here is to build a healthcare chat bot that accepts free-text symptom descriptions and,
using a model trained on a CSV dataset of diseases and their symptoms, analyzes the symptoms, suggests
the most likely condition and guides the patient towards the right specialist and treatment. The system
should be available online 24 x 7, answer both general and detailed questions, ask questions in series to
narrow down what the patient is looking for, and allow the outputs of several candidate models to be
compared so that the best-performing one can be used for real-world inputs.
2. Project Analysis:
1. Review of Literature:
The main purpose of the scheme is to bridge the gap between the user and health providers by
giving immediate replies to the questions asked by the user. People today are increasingly attached to
the internet but are often not concerned about their personal health; they avoid going to hospital for small
problems, which may grow into a major disease in the future. Establishing question-and-answer forums is
becoming a simple way to answer such queries, rather than browsing through lists of potentially relevant
documents on the web. Many of the existing systems have limitations: there is no instant response, so
patients have to wait a long time for experts to acknowledge them, and some services charge a fee for live
chat or telephone communication with doctors online.
The aim of this system is to replicate a person’s discussion.

2. Project Timeline:
Timeline provided was from Feb 1 2021-April 1 2021

3. Dataset Details:
The dataset contains descriptions of different types of diseases. There are separate sets for the
different diseases; each set consists of descriptions of a single disease along with the associated
doctors, hospitals, etc.
The dataset has been created by recording entries for over 133 diseases, together with the
corresponding doctors and hospitals.
2.5 Methodology Used:
The Health-Care Chat Bot System should be written in Python, with GUI links and a simple, accessible
network API. The system must provide capacity for parallel operation, and the system design should not
introduce scalability issues with regard to the number of computers, tablets or displays connected
at any one time. The end system should also allow for seamless recovery, without data loss, from
individual device failure. There must be a strong audit chain, with all system actions logged. As for
interfaces, it is worth noting that this system is likely to conform to whatever is available. With that in mind,
the most adaptable and portable technologies should be used for the implementation. The system is
critical in so far as it is a live system: if the system goes down, customers must either not notice, or notice
that it recovers quickly (within seconds). The system must be reliable enough to run crash- and glitch-free
more or less indefinitely, or facilitate error recovery strong enough that glitches are never
revealed to its end-users.
3. Project Design:
1. Block Diagram:
2. Data Flow Diagram:
3. Use Case Diagram:
4. Sequence Diagram:
4. Implementation:

Work Division

1. Abhishek P R – Dataset and Basic Preprocessing and ML.


2. Kiran Kumar M B – Dataset cleaning and Choosing Best algorithm to implement.
3. Mahesh Y V – Coding and Report Preparation.
4. Mohan G – Implementation of ML and UI for project.
5. Kunal Sawant – Enhancement of Report and PPT.
1. Project Implementation Technology:
In machine learning, support-vector machines (SVMs, also support-vector networks)
are supervised learning models with associated learning algorithms that analyze data used
for classification and regression analysis. Given a set of training examples, each marked as
belonging to one or the other of two categories, an SVM training algorithm builds a model that
assigns new examples to one category or the other, making it a non-probabilistic binary linear
classifier (although methods such as Platt scaling exist to use SVM in a probabilistic
classification setting). An SVM model is a representation of the examples as points in space,
mapped so that the examples of the separate categories are divided by a clear gap that is as wide
as possible. New examples are then mapped into that same space and predicted to belong to a
category based on the side of the gap on which they fall.
In addition to performing linear classification, SVMs can efficiently perform a non-linear
classification using what is called the kernel trick, implicitly mapping their inputs into high-
dimensional feature spaces.
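
Purely as an illustrative sketch (the report's own implementation in the Coding section uses a decision tree), the snippet below shows how a scikit-learn SVM with an RBF kernel could be fitted to the same symptom matrix; the dataset layout is the assumption described in the Dataset Details section.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Assumed layout mirroring the coding section: 132 symptom columns + a 'prognosis' label
df = pd.read_csv('Training.csv')
X = df.iloc[:, 0:132].values
y = df['prognosis'].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# The RBF kernel lets the SVM perform a non-linear separation via the kernel trick
svm_classifier = SVC(kernel='rbf', C=1.0, gamma='scale')
svm_classifier.fit(X_train, y_train)

print("SVM accuracy:", accuracy_score(y_test, svm_classifier.predict(X_test)))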

4.1.1 Hardware Requirement:


In recent years, a great variety of hardware solutions for real-time TSR has been proposed. These
include conventional (general-purpose) computers, custom ASIC (application-specific integrated
circuit) chips, field-programmable gate arrays (FPGAs), digital signal processors (DSPs) and also
graphics processing units.

4.1.2 Software Requirements:


A software-based solution running on a Linux or Windows system with a 2.4-GHz dual-core
CPU is presented.
2. Experimental Setup:
The experimental setup follows the aims described in the review of literature: the system should bridge
the gap between the user and health providers by giving immediate replies to the questions asked, without
the long waits or charges of existing expert forums and live-chat services, and it should replicate a
person’s discussion.
3. Coding:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
training_dataset = pd.read_csv('Training.csv')
test_dataset = pd.read_csv('Testing.csv')

X = training_dataset.iloc[:, 0:132].values
y = training_dataset.iloc[:, -1].values

# Per-disease maximum of every symptom column, used later to look up the
# symptoms associated with a predicted disease
dimensionality_reduction = training_dataset.groupby(training_dataset['prognosis']).max()

# Encode the disease labels as integers
from sklearn.preprocessing import LabelEncoder
labelencoder = LabelEncoder()
y = labelencoder.fit_transform(y)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Train the decision tree classifier on the symptom data
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier()
classifier.fit(X_train, y_train)

cols = training_dataset.columns
cols = cols[:-1]

importances = classifier.feature_importances_
indices = np.argsort(importances)[::-1]
features = cols

from sklearn.tree import _tree


def execute_bot():
    print("Please reply with yes/Yes or no/No for the following symptoms")

    def print_disease(node):
        # Decode the disease name(s) stored at a leaf node of the tree
        node = node[0]
        val = node.nonzero()
        disease = labelencoder.inverse_transform(val[0])
        return disease

    def tree_to_code(tree, feature_names):
        tree_ = tree.tree_
        feature_name = [
            feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
            for i in tree_.feature
        ]

        symptoms_present = []

        def recurse(node, depth):
            if tree_.feature[node] != _tree.TREE_UNDEFINED:
                # Internal node: ask the user about this symptom and follow the branch
                name = feature_name[node]
                threshold = tree_.threshold[node]
                print(name + " ?")
                ans = input()
                ans = ans.lower()
                if ans == 'yes':
                    val = 1
                else:
                    val = 0
                if val <= threshold:
                    recurse(tree_.children_left[node], depth + 1)
                else:
                    symptoms_present.append(name)
                    recurse(tree_.children_right[node], depth + 1)
            else:
                # Leaf node: report the predicted disease and the matching doctor
                present_disease = print_disease(tree_.value[node])
                print("You may have " + str(present_disease))
                print()
                red_cols = dimensionality_reduction.columns
                symptoms_given = red_cols[dimensionality_reduction.loc[present_disease].values[0].nonzero()]
                print("symptoms present " + str(list(symptoms_present)))
                print()
                print("symptoms given " + str(list(symptoms_given)))
                print()
                confidence_level = (1.0 * len(symptoms_present)) / len(symptoms_given)
                print("confidence level is " + str(confidence_level))
                print()
                print('The model suggests:')
                print()
                row = doctors[doctors['disease'] == present_disease[0]]
                print('Consult ', str(row['name'].values))
                print()
                print('Visit ', str(row['link'].values))

        recurse(0, 1)

    tree_to_code(classifier, cols)


# Build the doctor lookup table: one row per disease with a suggested doctor and link
doc_dataset = pd.read_csv('doctors_dataset.csv', names=['Name', 'Description'])

diseases = dimensionality_reduction.index
diseases = pd.DataFrame(diseases)

doctors = pd.DataFrame()
doctors['name'] = np.nan
doctors['link'] = np.nan
doctors['disease'] = np.nan
doctors['disease'] = diseases['prognosis']
doctors['name'] = doc_dataset['Name']
doctors['link'] = doc_dataset['Description']

record = doctors[doctors['disease'] == 'AIDS']
record['name']
record['link']

execute_bot()
4. Testing:
Without a well-thought-out testing effort, the project will undoubtedly fail overall and will impact
the entire operational performance of the solution. With a poorly tested solution, the support and
maintenance cost will escalate exponentially, and the reliability of the solution will be poor.
Therefore, project managers need to realize that the testing effort is a necessity, not merely an ad
hoc task that is the last hurdle before deployment.

The project manager should pay specific attention to developing a complete testing plan and
schedule. At this stage, the project manager should have realized that this effort would have to be
accommodated within the project budget, as many of the testing resources will be designing,
testing, and validating the solution throughout the entire project life cycle—and this consumes
work-hours and resources.

The testing effort begins in the initial project phase (i.e. preparing test plans) and continues
until the closure phase.

5. Result:
1. Snapshot of Result:
 Snapshot
 Analysis of Result

Fig 5.1.1: Snapshot of the working of the ChatBot
6. Advantages and Disadvantages of Model:
1. Advantages:
1. Omni-capable
• The chat bot converses seamlessly across multiple digital channels and retains data and
context for a seamless experience; in the best cases it can even pass that information to a live
agent if needed.

2. Free to Explore
• The chat bot can reach, consume, and process vast amounts of data – both structured and
unstructured – to surface insights from any source and gather the relevant data needed to solve
customer issues quickly.

3. Autonomous Reasoning
• The chat bot can perform complex reasoning without human intervention. For example, a
great Service chatbot should be able to infer solutions based on relevant case histories.

4. Pre-Trained
• The chat bot is pre-trained to understand brand-specific or industry-specific knowledge
and terms. Even better, it’s pre-configured to resolve common customer requests of a
particular industry.

5. Register/Log-in
• To access this chat bot, an individual needs to register and then use the registration ID to
log in and access the features.

6. User Interface
• A user-friendly interface which is engaging and easy to access.
2. Disadvantages:
• Complex Interface – Chatbots are often seen as complicated and can require a lot of time
to understand the user’s requirement. Poor processing that fails to filter results in time can
also annoy people.
• Inability to Understand – Because of their fixed programming, chatbots can get stuck if a
query that is not saved in their program is presented to them. This can lead to customer
dissatisfaction and result in loss. The repeated back-and-forth messaging can also be taxing
for users and deteriorate the overall experience on the website.
• Time-Consuming – Chatbots are installed with the motive of speeding up responses and
improving customer interaction. However, due to limited data availability and the time
required for self-updating, the process can end up more time-consuming and expensive,
and instead of attending to several customers at a time, the chatbot can appear
confused about how to communicate with people.
• Zero Decision-Making – Chat bots are infamous for their inability to make decisions. A
similar situation landed big companies like Microsoft in trouble when their chat bot went
on a racist rant. Therefore, it is critical to ensure proper programming of your chat bot to
prevent any such incident which could hamper your brand.

• Poor Memory – Chat bots are not able to memorize the past conversation, which forces
the user to type the same thing again and again. This can be cumbersome for the customer
and annoy them because of the effort required. Thus, it is important to be
careful while designing chat bots and to make sure that the program is able to comprehend
user queries and respond accordingly.
7. Conclusion & Future Scope:
7.1 Conclusion:
Thus, we can conclude that this system gives accurate results. Since we are using a large dataset,
better performance is ensured. We have thus built a system which is useful for people to detect a
disease by typing in their symptoms.

7.2 Future Scope:


Chat bots are a technology of the future that is yet to uncover its full potential, but with their rising
popularity among companies, they are bound to stay for a long time. Machine learning
has changed the way companies communicate with their customers. With new platforms
for building various types of chat bots being introduced, it is exciting to witness the
growth of a new domain in technology as it surpasses previous thresholds.

8. References:
 https://en.wikipedia.org/wiki/Chatbot
 https://en.wikipedia.org/wiki/Disease
 https://data-flair.training/blogs/python-chatbot-project/
 https://www.youtube.com/playlist?list=PLQVvvaa0QuDdc2k5dwtDTyT9aCja0on8j
Team Members

Name: Abhishek P.R

Branch : Computer Science

College : Vemana Institute Of Technology

Semester: 7th

Name: Kiran Kumar MB

Branch : Computer Science

College : Vemana Institute Of Technology

Semester: 7th

Name: Kunal Sawant

Branch :

College :

Semester:

Name: Mahesh YV

Branch : Computer Science

College : Vemana Institute Of Technology

Semester: 7th

Name: Mohan G

Branch : Computer Science

College : Vemana Institute Of Technology

Semester: 7th
