An Enlightenment To Machine Learning
Preamble
The concepts of artificial intelligence and machine learning often evoke the ancient
Greek myth of Pandora’s box. In the fairytale version of the story, Pandora is
portrayed as a curious woman who opened a sealed urn and inadvertently released
eternal misery on humankind.
In the original telling, Pandora was not an innocent girl who succumbed to the
temptation to open a forbidden jar. Instead, as the poet Hesiod tells us, Pandora was
made, not born.
Like the genie that escaped the lamp and the horse that fled the barn, the myth has
become a cliche. Now, let us explore Machine Learning and see what makes it fascinating!
Data Everywhere!
We are drowning in information and starving for knowledge.
Google
24 petabytes of data are processed per day.
Facebook
10 million photos are uploaded every hour.
YouTube
1 hour of video is uploaded every second.
Twitter
400 million tweets are posted per day.
With data increasing every day, we can believe that smart data analysis will become
more prevalent as a fundamental ingredient for technological progress.
Definition
Machine Learning is the field of study that gives computers the ability to learn
without being explicitly programmed.
We are DATAFIED! Wherever we go, we leave a data trail. Data remains fruitless
unless we discover its hidden patterns. Wondering how? Yes! Machine Learning is
a magic wand that turns information into knowledge, which will do wonders for
humankind.
Traditional Learning
Machine Learning
Uses data and answers to uncover the rules that build a problem.
Example
Predict if the stock price will increase or decrease.
Example
Predict the age of a person based on their height, weight, and health factors.
Example
Money withdrawal anomalies can be discovered.
Example
Finding a group of customers with similar behavior based on their buying data
history.
This section of the course will aid in answering this question. Keep reading to know
more!
Big Picture
The big picture of the Machine Learning process lies in the following steps:
Gathering Data
Gathering data is considered the primary step of the Machine Learning process.
The quality and quantity of data you gather in this step will determine how efficient
your model will be.
Choosing a Model
There are numerous models that researchers and data scientists have created over
the years.
Some are well suited to image data, while others are suited to sequences,
text-based data, and more.
Choosing the right model for the problem will impact the efficiency of the model.
Training
Training the model is the next step of the Machine Learning process and is often
considered the bulk of ML.
This step is very similar to a person learning to drive for the first time. At first
they do not know any of the pedals, switches, or brakes, but after lots of practice
and feedback a licensed driver emerges.
The data is split into Training Data and Testing Data.
The model is trained with the training data using different ML algorithms by adjusting
the parameters over multiple iterations.
Testing Data is put aside as unseen data to evaluate your models.
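The split-and-train idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy dataset, the 80/20 split ratio, the linear model, and the use of gradient descent are all choices made for the example, not something the course prescribes.

```python
import random

# Hypothetical toy dataset: y is roughly 2*x + 1 with a little noise.
random.seed(0)
data = [(x, 2.0 * x + 1.0 + random.uniform(-0.1, 0.1))
        for x in (i / 10 for i in range(50))]

# Shuffle, then hold out 20% as unseen Testing Data (assumed ratio).
random.shuffle(data)
split = int(0.8 * len(data))
train_data, test_data = data[:split], data[split:]

# Model: y_hat = w * x + b. The parameters w and b start at zero
# and are adjusted a little in every iteration.
w, b = 0.0, 0.0
learning_rate = 0.02

for _ in range(5000):
    # Gradient of the mean squared error, computed on the training data only.
    n = len(train_data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in train_data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in train_data) / n
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")
```

The testing data is never touched inside the loop, so it stays available as unseen data for a later check of the model.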
Evaluation
Once training is complete, it’s time to see if the model is any good, using Evaluation.
This is where the dataset that we set aside earlier, the Testing Data, comes into play.
Evaluation allows us to test our model against data that has never been used for
training.
This metric allows us to see how the model might perform against data that it has not
yet seen.
This is meant to be representative of how the model might perform in the real world.
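As a rough sketch of evaluation, the snippet below scores a model on held-out examples using accuracy. The data, the stand-in `model` function, and the choice of accuracy as the metric are assumptions made for illustration.

```python
# Hypothetical held-out Testing Data: (hours_studied, passed) pairs
# that were never used during training.
test_data = [(1.0, 0), (2.0, 0), (3.5, 0), (4.5, 1),
             (6.0, 1), (7.5, 1), (3.0, 1), (8.0, 1)]

def model(hours):
    """Stand-in for an already trained model: predicts a pass at 4+ hours."""
    return 1 if hours >= 4.0 else 0

def accuracy(predict, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for x, label in examples if predict(x) == label)
    return correct / len(examples)

test_accuracy = accuracy(model, test_data)
print(f"accuracy on unseen data: {test_accuracy:.3f}")
```

Here the model gets 7 of the 8 unseen examples right, which gives an estimate of how it might behave on new data.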
Hyperparameter tuning
After the evaluation step, it's time to see if we can further improve our training
by tuning parameters that were implicitly assumed in the training process.
This process is called Hyperparameter Tuning.
The tuned model is once again evaluated for model performance, and this cycle
continues until the final best performing model is chosen.
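A minimal sketch of this tune-and-evaluate cycle is shown below, assuming a k-nearest-neighbours classifier whose hyperparameter is `k`. The data, the candidate values, and the use of a separate validation list are all hypothetical choices for the example.

```python
from collections import Counter

# Hypothetical 1-D data: (feature, label). One training point (2.9, "a") is noisy.
train = [(0.5, "a"), (1.0, "a"), (1.5, "a"), (2.0, "a"), (2.9, "a"),
         (2.7, "b"), (3.2, "b"), (3.6, "b"), (4.0, "b"), (4.5, "b")]
validation = [(1.2, "a"), (1.8, "a"), (2.95, "b"), (3.8, "b"), (4.2, "b")]

def knn_predict(x, k):
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(k):
    """Share of validation examples classified correctly for a given k."""
    return sum(knn_predict(x, k) == y for x, y in validation) / len(validation)

# k is a hyperparameter: it is fixed before training rather than learned.
candidates = [1, 3, 5]
scores = {k: accuracy(k) for k in candidates}
best_k = max(candidates, key=lambda k: scores[k])
print(scores, "best k:", best_k)
```

With `k = 1` the noisy training point misleads the model on one validation example, while larger values of `k` smooth it out, so the cycle settles on the first candidate with the best score.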
Interpret and Communicate
The most challenging task of an ML project is explaining the model's output.
In earlier days, Machine Learning was considered a black box because it was hard to
interpret a model's insights and values.
The more interpretable your model is, the easier it is to communicate its
importance to the stakeholders.
Machine Learning is an umbrella term that covers 3 learning techniques:
Supervised Learning
Unsupervised Learning
Reinforcement Learning
In this section, let us look at each of them.
Supervised Learning
Supervised learning is the machine learning task of learning a function that maps an
input to an output based on example input-output pairs.
It infers a function from labeled training data.
Each training example is a pair consisting of an input object and a desired output
value.
A supervised learning algorithm analyzes the training data and produces an inferred
function, which can be used for mapping new examples.
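To make "inferring a function from labeled pairs" concrete, here is a small sketch. It assumes the 1-nearest-neighbour rule as the learning algorithm and uses made-up spam features. Both are illustrative choices, not the only way to do supervised learning.

```python
def fit(examples):
    """Infer a function from labeled (input, output) pairs.

    This sketch uses the 1-nearest-neighbour rule: a new input receives
    the output of the closest training input.
    """
    def predict(x):
        _, nearest_label = min(
            examples,
            key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
        )
        return nearest_label
    return predict

# Hypothetical labeled training data:
# (number of links, number of "!" marks) -> desired output.
training_pairs = [
    ((0, 0), "ham"),
    ((1, 0), "ham"),
    ((0, 1), "ham"),
    ((5, 4), "spam"),
    ((7, 6), "spam"),
    ((6, 3), "spam"),
]

classify = fit(training_pairs)   # the inferred function
print(classify((6, 5)))          # a new, unseen example
print(classify((1, 1)))
```

The returned `classify` function is the inferred mapping: it can be applied to inputs that never appeared in the training data.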
Applications
1. Spam Detection
2. Pattern Recognition
3. Speech Recognition
Unsupervised Learning
Unsupervised Learning helps in uncovering hidden patterns from unlabeled data.
Applications
1. Recommender Systems
2. Targeted Marketing
3. Customer Segmentation
4. Structure Discovery
Reinforcement Learning
Reinforcement Learning is a type of machine learning concerned with how software
agents ought to take actions in an environment so as to maximize some notion of
cumulative reward.
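The agent, environment, and reward can be sketched with tabular Q-learning, one common reinforcement learning algorithm. Everything specific here is an assumption for illustration: the five-cell corridor environment, the reward of 1 at the last cell, and the values of `alpha`, `gamma`, and `epsilon`.

```python
import random

random.seed(0)

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the episode
GOAL = N_STATES - 1
ACTIONS = [-1, +1]    # step left or step right

def step(state, action):
    """Environment: move within the corridor; reward 1 only at the goal."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

# Q[state][action_index] estimates the cumulative discounted reward.
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for _ in range(200):
    state = 0
    while state != GOAL:
        # Mostly act greedily, sometimes explore; ties are broken at random.
        if random.random() < epsilon or Q[state][0] == Q[state][1]:
            a = random.randrange(2)
        else:
            a = 0 if Q[state][0] > Q[state][1] else 1
        next_state, reward = step(state, ACTIONS[a])
        # Q-learning update: move toward reward plus discounted best future value.
        target = reward + gamma * max(Q[next_state])
        Q[state][a] += alpha * (target - Q[state][a])
        state = next_state

policy = ["right" if q[1] > q[0] else "left" for q in Q[:GOAL]]
print(policy)
```

After training, the greedy policy moves right in every cell, which is the behaviour that maximizes cumulative reward in this toy environment.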
Applications
1. Genetics
2. Economics
3. Robot Navigation
Machine Learning in SDLC
The image depicted above illustrates how to integrate the process of Machine
Learning into the traditional Software Development Life Cycle (SDLC).
1. Planning
2. Data Engineering
3. Modeling
Machine learning algorithms are programs (math and logic) that adjust themselves to
perform better as they are exposed to more data.
The learning part of machine learning means that those programs change how they
process data over time, much as humans change how they process data by learning.
So a machine-learning algorithm is a program with a specific way of adjusting
its own parameters, given feedback on its previous performance in making
predictions about a dataset.
Examples
Linear regression
Decision trees
Support vector machines
Neural networks
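The idea of a program adjusting its own parameters from feedback can be sketched with a perceptron, a simple building block of neural networks. The AND dataset, the learning rate of 1, and the 20 epochs are assumptions chosen to keep the example small.

```python
# Hypothetical training data: the logical AND of two binary inputs.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 1.0

def predict(x):
    """Output 1 when the weighted sum of the inputs plus bias is positive."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

for epoch in range(20):
    for x, target in examples:
        error = target - predict(x)   # feedback on the current prediction
        # The parameters change only when the prediction was wrong (error != 0).
        for i, xi in enumerate(x):
            weights[i] += learning_rate * error * xi
        bias += learning_rate * error

print([predict(x) for x, _ in examples])
```

Each wrong prediction nudges the weights and bias, and after a few passes over the data the program reproduces the AND function.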
Machine Learning Terminologies
Classification
Classification aids in predicting the categorical output.
Clustering
Clustering is the unsupervised grouping of data into buckets.
Feature
For a dataset, a feature represents the combination of attribute and value.
Feature Selection
Feature selection is the process of selecting relevant features from a dataset for
creating a Machine Learning model.
Hyperparameters
Hyperparameters are higher-level properties of a model, such as how fast it can
learn or the complexity of a model.
Instance
An instance is a data point, row, or sample in a dataset.
Label
The label is the answer part of the observation in supervised learning.
Regression
Regression predicts a continuous output (for example, price or sales).
Validation Set
The validation set is a set of observations used during model training to provide
feedback on how well the current parameters generalize beyond the training set.
Prelude
Let us now explore the following popular Machine Learning techniques:
Classification
Clustering
Association Rule Mining
Outlier Detection
Regression
Classification
Definition
Classification is the process of identifying a category to which a new observation
belongs, based on a training set of data containing observations whose categories
are already known.
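One way to picture this is a nearest-centroid classifier, sketched below. The choice of this particular classifier, the two numeric features, and the category names are assumptions for illustration only.

```python
from collections import defaultdict

def train_centroids(training_set):
    """Average the feature vectors of each known category."""
    grouped = defaultdict(list)
    for features, category in training_set:
        grouped[category].append(features)
    return {
        category: tuple(sum(col) / len(rows) for col in zip(*rows))
        for category, rows in grouped.items()
    }

def classify(centroids, features):
    """Assign the category whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], features)),
    )

# Hypothetical training set: observations whose categories are already known.
training_set = [
    ((1.0, 1.0), "small"), ((1.5, 2.0), "small"), ((2.0, 1.5), "small"),
    ((6.0, 6.5), "large"), ((7.0, 6.0), "large"), ((6.5, 7.0), "large"),
]

centroids = train_centroids(training_set)
print(classify(centroids, (2.0, 2.0)))   # a new observation
print(classify(centroids, (5.0, 6.0)))
```

New observations are assigned to whichever known category they sit closest to, based only on what the training set revealed about each category.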
Classification Concept
Clustering
Clustering is the task of grouping a set of objects such that objects in the same
cluster are more similar to each other than to objects in other clusters.
Distance measure plays a significant role in clustering.
Clustering is an unsupervised learning method.
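As a sketch of how a distance measure drives clustering, the snippet below runs k-means (Lloyd's algorithm) on one-dimensional points. The algorithm choice, the sample points, and the naive "first k points" initialisation are assumptions made for the example, and real implementations usually initialise more carefully.

```python
def kmeans(points, k, iterations=10):
    """Group 1-D points into k clusters with Lloyd's k-means algorithm."""
    centers = points[:k]  # naive initialisation: the first k points (assumption)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical unlabeled data with three visible groups.
points = [1.0, 5.0, 9.0, 1.2, 5.2, 9.1, 0.8, 4.8, 8.9]
centers, clusters = kmeans(points, k=3)
print(centers)
print(clusters)
```

No labels are supplied anywhere: the groups emerge purely from the distances between points.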
The common distance measures used in various datasets are as follows.
Numeric Dataset
- Manhattan distance
- Minkowski distance
- Hamming distance
Non-Numeric Dataset
- Jaccard index
- Cosine Similarity
- Dice Coefficient
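The measures listed above can be written out directly. The following is a minimal sketch using their standard definitions. The function names and sample inputs are hypothetical, and representing text as a set of words is an assumption made for the set-based measures.

```python
import math

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p):
    """Generalises Manhattan (p=1) and Euclidean (p=2) distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def jaccard_index(s, t):
    """Size of the intersection divided by the size of the union of two sets."""
    return len(s & t) / len(s | t)

def dice_coefficient(s, t):
    """Twice the intersection size divided by the sum of both set sizes."""
    return 2 * len(s & t) / (len(s) + len(t))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

doc1 = {"machine", "learning", "is", "fun"}
doc2 = {"machine", "learning", "is", "hard"}

print(manhattan((1, 2), (4, 6)))
print(minkowski((1, 2), (4, 6), p=2))
print(hamming("karolin", "kathrin"))
print(jaccard_index(doc1, doc2))
print(dice_coefficient(doc1, doc2))
print(cosine_similarity((1, 2), (2, 4)))
```

Jaccard, Dice, and cosine are similarity scores (higher means more alike), whereas Manhattan, Minkowski, and Hamming are distances (lower means more alike).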
Association Rule Mining
Regression
Regression analysis is a statistical method that aids in examining the relationship
between two or more variables of interest.
It examines the influence of one or more independent variables on a dependent
variable.
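For a single independent variable, this relationship can be estimated with ordinary least squares. The sketch below assumes a straight-line model and uses made-up spend and sales figures purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one independent variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (independent) and sales (dependent).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(spend, sales)
print(f"sales = {slope:.1f} * spend + {intercept:.1f}")
print("predicted sales at spend 6:", slope * 6.0 + intercept)
```

The slope quantifies the influence of the independent variable: in this toy data each extra unit of spend goes with two extra units of sales.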
There are a variety of algorithms available in the Machine Learning world.
This section will guide you through the commonly used Machine Learning
Algorithms.
Decision Tree
A Decision Tree (DT) is a tree-like model of decisions and their possible
consequences, including chance event outcomes, resource costs, and utility.
Decision Trees are a non-parametric supervised learning method used for
classification and regression.
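A single decision node of such a tree can be sketched as a search for the best threshold on one feature. Scoring splits by Gini impurity, as done here, is one common criterion and an assumption of this example, as are the data and function names.

```python
def gini(labels):
    """Gini impurity: 0 for a pure group, higher for mixed groups."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows):
    """Try every observed value as a threshold; keep the lowest weighted impurity."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x < threshold]
        right = [y for x, y in rows if x >= threshold]
        if not left or not right:
            continue  # a split must send rows to both branches
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: (age, bought_product).
rows = [(22, "no"), (25, "no"), (30, "no"), (35, "yes"), (40, "yes"), (52, "yes")]

threshold, impurity = best_split(rows)
print(f"split at age >= {threshold}, weighted impurity {impurity:.2f}")
```

A full tree repeats this search inside each branch until the groups are pure enough or a depth limit is reached.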
Naive Bayes
A Naive Bayes classifier is a probabilistic Machine Learning model that is used for
classification tasks. The classifier is based on Bayes' theorem:
P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}
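A short worked example of Bayes' theorem follows. The probabilities are hypothetical numbers for a spam filter, with A standing for "the message is spam" and B for "the message contains the word offer".

```python
# Hypothetical probabilities for a spam filter.
p_spam = 0.2                 # P(A): prior probability that a message is spam
p_word_given_spam = 0.6      # P(B|A): "offer" appears in a spam message
p_word_given_ham = 0.05      # P(B|not A): "offer" appears in a non-spam message

# P(B) by the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(f"P(spam | 'offer') = {p_spam_given_word:.2f}")
```

With these numbers, seeing the word raises the probability of spam from the prior of 0.2 to 0.75. A Naive Bayes classifier applies the same update across many features, treating them as independent of each other given the class.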
____________ learning uses data and answers to uncover the rules that
build a problem.
Machine
The field of study that gives computers the ability to learn without being
explicitly programmed is ___________.
Machine Learning
Which type of learning would you suggest for grouping customers into
distinct categories (Clustering)?
Unsupervised