
A Summer Internship on

“Business and Data Analytics”

Submitted In Partial Fulfillment of the Requirement of


Master of Business Administration (MBA)

Under the Guidance of:
Dr. Alok Yadav
(Founder of YBI Foundation)

Submitted By:
Parmod
MBA (Business Analytics)
Roll no. 22010108003

HARYANA SCHOOL OF BUSINESS

Guru Jambheshwar University of Science and Technology

Hisar (Haryana)

Session 2022-24
Student Name: Parmod
Industry Supervisor: Dr. Alok Yadav
Verification url: https://workdrive.zohopublic.in/file/8uoud2cca480acfd2419a956988966bfa33ed

ACKNOWLEDGEMENT

The work in this report is the outcome of continuous effort over a period of time and drew on
intellectual and other sources. I would like to express my profound gratitude and indebtedness to
the YBI Foundation Training Associates for teaching and assisting me in making the training
successful. I would like to acknowledge the contributions of the following people, without whose
help and guidance this report would not have been completed. I acknowledge, with respect and
gratitude, the counsel and support of our training coordinator, whose expertise, guidance,
encouragement, and enthusiasm have made this report possible. Their feedback vastly improved the
quality of this report and provided an enthralling experience. I am indeed proud and fortunate to
have had their support. I am also thankful to Dr. Khujan Singh, Program Coordinator of MBA
(Business Analytics), Haryana School of Business (GJUS&T, Hisar), for his constant encouragement,
valuable suggestions, moral support and blessings. Special thanks go to Dr. Vinod Kumar Bishnoi,
Director of Haryana School of Business (GJUS&T, Hisar), for providing a studious environment,
computer lab and library services, and for his regular encouragement to work with full zeal and
zest.

Parmod
220101080031
MBA (Business Analytics)

DECLARATION

I, Parmod, student of MBA (Business Analytics), 3rd semester, Batch 2022-2024, hereby declare
that the project report on “Business & Data Analytics”, which is being submitted in partial
fulfilment of the programme in MBA, is the record of authentic work carried out by me during the
period from 26th June 2023 to 26th August 2023. I hereby certify that the work presented in the
report entitled “Business & Data Analytics”, in fulfilment of the requirement for completion of
8 weeks of industrial training in the Department of Haryana School of Business, is an authentic
record of my own work carried out during training.

Parmod
220101080031
MBA (Business Analytics)

Table of Contents

Chapter 1: Introduction
    1.1 Overview of Industry
    1.2 Profile of the Organization
    1.3 Products of the Organization
    1.4 Competitive Landscape of Industry
    1.5 S.W.O.T Analysis of Organization
Chapter 2: Tasks Assigned and Work Plan
    2.1 Main Tasks Assigned during the Course
    2.2 Duration, geographical area of the tasks
Chapter 3: Conceptual Discussion
Chapter 4: Activity Report
Chapter 5: References

Chapter 1: Introduction

1.1 : Overview of Industry


I completed my online summer training with YBI Foundation, which belongs to the online
internships and training industry.

Online education has become prevalent in today’s era, and it received a boost after the COVID-19
pandemic. The e-learning market size surpassed USD 315 billion in 2021 and is projected to grow at
a 20% CAGR from 2022 to 2028.
Rising internet penetration across the globe will drive the industry's growth. The expanding
telecom and broadband sector has increased access to economical internet connectivity plans. With
the increasing number of internet users, more people will be able to access e-learning platforms
to take courses or complete degrees.
The COVID-19 pandemic had a positive impact on e-learning industry revenue. Growing employee
safety concerns encouraged corporates to implement work-from-home practices to continue daily
operational activities. This created barriers for companies in terms of training, communication,
monitoring progress, and upskilling, which supported the demand for e-learning platforms among
large enterprises and SMEs. To cater to the growing demand, several companies are focusing on
developing customized learning solutions.
There are numerous online learning platforms in the market, such as Udemy, Coursera, Lynda,
Skillshare and Udacity, that serve millions of people. The platforms are also being shaped by
different user verticals. While Skillshare is mostly for creatives, offering courses on animation,
photography and lifestyle, Coursera is mostly academic, giving access to university courses.
Top-tier universities are also democratizing learning by making courses accessible online.
Stanford University and Harvard University give access to online courses in categories such as
computer science, engineering, mathematics, business, art, and personal development.
All of this shows one thing: there is huge demand from people to learn online. The reason for this
demand, and for the rapid growth of a market with a wide variety of platform options for different
groups of people, may be the rapid pace of change in the world. At the time of Udemy's recent
$50 million funding round, president Darren Shimkus said, “The biggest challenge for learners is
to figure out what skills are emerging, what they can do to compete best in the global market.
We’re in a world that’s changing so quickly that skills that were valued just three or four years
ago are no longer relevant. People are confused and don’t know what they should be learning.” At
this moment, online learning is becoming a huge catalyst for helping people and companies adapt to
this rapid change.
1.2 : Profile of the Organization

YBI Foundation
YBI Foundation is an internship and online training platform based in Delhi, India. It was
founded by Dr. Alok Yadav and Arushi Yadav in 2020.

YBI Foundation is a Delhi-based not-for-profit edtech company that aims to enable the youth to
grow in the world of emerging technologies. It offers a mix of online and offline approaches to
bring new skills, education and technologies to students, academicians and practitioners. It
believes in a learn-anywhere, anytime approach to reach out to learners. The platform provides
free online instructor-led classes for students to excel in data science, business analytics,
machine learning, cloud computing and big data. It aims to focus on innovation, creativity and a
technology-led approach, and to stay in sync with present industry requirements. It endeavors to
support learners in achieving the highest possible goals in their academics and professions.

The courses on offer are on a chargeable basis but can be downloaded along with the supporting
materials, such as videos, for offline reference. YBI Foundation's courses are largely divided
into Summer Training and Winter Training. YBI Foundation offers different kinds of internships:
full-time, part-time, and work from home. One can find both paid and unpaid internships on YBI
Foundation.

1.3 : Products of YBI Foundation

YBI Foundation offers the following products and services to its users:

Online Internship and Training:
YBI Foundation offers free programs, scholarships for girls, a dual internship program, a full
stack dual certificate program and a guaranteed placement assistance program for students,
freshers and working professionals.
Certificate Courses:
Learners looking for complete learning along with industry-level projects for their next
internship, placement or job role change can opt for the GUARANTEED program. It also helps them
master fundamental to advanced concepts, real projects, resume building, mock interviews and more.
The ACHIEVERS program is for upskilling with fundamental and advanced concepts along with
internship exposure, whereas the FREE program simply builds the fundamentals and comes with a
certificate of completion.
Live Classes and Doubt Sessions:
YBI Foundation holds daily live doubt sessions to resolve queries. A teaching assistant (TA) helps
learners understand the solution through call, chat or email in almost real time.

1.4 : Competitive Landscape of the Industry or Sector
Letsintern:

Letsintern is a platform where students find meaningful internships with organizations of all
scales. It is a pre-LinkedIn for students to connect to organizations, career services and each
other. The platform is being developed to offer a full range of career services, including access
to employers, talent insights, intelligence drawn from data across the web, and third-party
products like assessments, content and certifications.
Twenty19.com:

Twenty19.com is a platform which gives students access to a gamut of opportunities. The
opportunities range from internships in diverse roles to scholarships, conferences and other
student competitions and events around the world. Twenty19 exists to educate and enable students
to take more initiatives. Twenty19 firmly believes that the real learning for a student lies
beyond the four walls of a classroom. Students can gain meaningful knowledge and skills only when
they take initiatives and get hands-on experience in the real world.
Regarding its culture and impact, Twenty19 creates products and processes which make a real impact
on real people every day. Its team is able to see how their everyday work directly translates into
the tangible impact that they make on the educational system. This is what inspires and drives
them: the potential to bring about a real change.
They also conduct 'Ask Me Anything' sessions for placements in different colleges, where a student
can ask questions related to companies coming for placements.

BYJU’S:

BYJU'S is India's largest ed-tech company and the creator of India's most loved school learning
app. Launched in 2015, BYJU'S offers highly personalised and effective learning programs for
classes 1 - 12 (K-12) and for aspirants of competitive exams like JEE, IAS etc. With 50 million
registered students and 3.5 million paid subscriptions, BYJU'S has become one of the most
preferred education platforms across the globe.
BYJU'S has been backed by strong and prominent investors like Chan-Zuckerberg Initiative, Naspers,
CPPIB, General Atlantic, Tencent, Sequoia Capital, Sofina, Verlinvest, IFC, Aarin Capital, Times
Internet, Lightspeed Ventures, Tiger Global, Owl Ventures & Qatar Investment Authority.
Delivering a world-class learning experience, programs from BYJU'S are making learning contextual
and visual. The apps have been designed to adapt to the unique learning style of every student, as
per the pace, size and style of learning.
1.5 : SWOT Analysis of YBI Foundation

Strengths:
• Easy to use:
Be it the website of YBI Foundation or its mobile app, strong site navigation makes it easy for
users to find internships and courses that interest them.
• Work from home:
For those who want to gain professional experience from the comfort of home, YBI Foundation is
the right place, providing internships and courses in almost every field.
• Genuine Internships:
While searching for internships, one might come across sites with fake internships, but YBI
Foundation follows a stringent authentication process before posting internships.

Weaknesses:
• Stipend:
In many cases the employers (mostly start-ups) do not pay the interns the mentioned stipend and
often negotiate with them. Therefore, YBI Foundation should ensure that this kind of activity
does not occur on its platform.

Opportunities:
• Expanding globally, especially in the emerging economies, can be a huge boost for YBI
Foundation.
• Integration with voice and audio call software can help the platform in increasing professional
connections.
• Partnership with mobile operators to increase reach.

Threats:
• Government regulations for maintaining privacy can hurt YBI Foundation.
• New upcoming professional networking websites.
• Fake accounts and data leaks can be an issue that affects YBI Foundation's business operations.

Chapter 2: Main Tasks Assigned and Work Plan
2.1 : Main Tasks assigned during the course
1) I was given video lectures along with course presentations so that I would be well versed in
all the topics related to Business & Data Analytics.
2) After each topic I was assigned quizzes and exercises to ensure proper comprehension of the
topic. With the help of the exercises, I was able to grasp each topic very well.
3) There was a module test at the end of each module to check my understanding of the topics
covered in that module. These module tests comprised approximately 30 questions with an
approximate time limit of 30 minutes.
4) I was given 5 projects related to different models.

2.2 : Duration, geographical area of the tasks


1) The whole duration of the training was 8 weeks.

2) The training was conducted in virtual mode.

Chapter 3: Conceptual Discussion
Business Analytics:

Business analytics is a professional discipline focused on identifying business needs and determining
solutions to business problems. Solutions may include a software-systems development component,
process improvements, or organizational changes, and may involve extensive analysis, strategic
planning and policy development. A person dedicated to carrying out these tasks within an organization
is called a business analyst or BA.

Business analysts are not found solely within projects for developing software systems. They may also
work across the organization, solving business problems in consultation with business stakeholders.
Whilst most of the work that business analysts do today relates to software development / solutions, this
is due to the ongoing massive changes businesses all over the world are experiencing in their attempts
to digitize.

Data Analytics:
Data analytics is a multidisciplinary field that employs a wide range of analysis techniques, including
math, statistics, and computer science, to draw insights from data sets. Data analytics is a broad term that
includes everything from simply analyzing data to theorizing ways of collecting data and creating the
frameworks needed to store it.
Relationship between Business Analytics & Data Analytics:
Business Analytics (BA) and Data Analytics (DA) are closely related fields, but they serve slightly
different purposes. At its core, Data Analytics focuses on processing and performing statistical analysis
of datasets to identify trends, analyze the relationships among variables, and predict outcomes. It's the
science of dissecting raw data to draw meaningful insights. On the other hand, Business Analytics uses
data and statistical methods to provide actionable insights for businesses. It's more concerned with
understanding past performance to guide future business planning, strategy, and decision-making. While
Data Analytics can be applied in various domains outside of business, Business Analytics is specifically
tailored to the business context.

Challenges:

Data Analytics (DA) primarily grapples with challenges like managing vast and diverse datasets,
ensuring data quality, and selecting appropriate tools and methodologies for analysis. Meanwhile,
Business Analytics (BA) confronts issues like aligning data-driven insights with business goals,
overcoming stakeholder resistance to data-driven approaches, and ensuring regulatory compliance,
especially in industries handling sensitive information. Both domains seek to turn data into actionable
knowledge but face hurdles in terms of data quality, relevance, and application.
Impact and Applications:
Data Analytics (DA) has significantly influenced various industries by enabling organizations to predict
trends, optimize operations, and enhance customer experiences. Its applications are vast; for instance, it
aids in disease prediction in healthcare, fraud detection in finance, and personalization in e-commerce.
In contrast, Business Analytics (BA) has tailored this data-driven approach specifically for businesses,
aligning insights directly with business objectives. The impact of BA is profound; it not only refines
decision-making processes but also helps in unveiling new growth avenues and identifying potential
pitfalls.
Its applications are seen in strategic planning, supply chain management, customer relationship
management, and various other business operations. In essence, while DA offers a broad analysis scope,
BA focuses on translating those insights into actionable business strategies.
Ethical Considerations:
Ethical considerations in the realm of analytics are becoming increasingly important as the magnitude
and implications of data-driven decisions expand. First and foremost is the respect for data privacy. With
the surge in data collection, it's vital to ensure that personal information is handled with care,
safeguarded from breaches, and used only for the purposes for which it was collected.
Informed consent, a pillar of ethical conduct, demands that data subjects be fully aware and have granted
permission for their data to be used in specific ways.

In conclusion, Data Analytics and Business Analytics are pillars in our data-driven world. Data
Analytics offers broad insights, benefiting diverse sectors, while Business Analytics tailors these
insights for business strategies. Both are essential in harnessing data to innovate and drive decisions.
Similarly, data science and machine learning extract insights and automate processes, but come with
their distinct scopes and ethical responsibilities. Both fields are pivotal, reshaping industries and
bringing forth challenges that need careful navigation.

Chapter 4: Activity Report

Date | List of Activities | My Learning
26-06-2023 | About Course Discussion and what I will Learn in this Course, Predictive Analysis | Python Basics, Introduction to predictive analysis
27-06-2023 | Scope of Business Analytics | How it is important to analyze the data for business
28-06-2023 | Quiz and Exercise | MCQ Quiz and Practice
29-06-2023 | Holiday | -
30-06-2023 | Introduction to Python and Analytics | Data Analysis, enabling computers to learn from data, identify patterns, and make predictions without explicit programming
01-07-2023 | Holiday | -
02-07-2023 | Holiday | -
03-07-2023 | Uses of Python in Deep Learning, ML, Data Analytics and Development | Flexibility of Use of Python in Various Fields
04-07-2023 | Introduction to Google Colab Notebook | Created New Notebook Over Google Colab
05-07-2023 | Quiz and Exercise | MCQ Quiz and Practice
06-07-2023 | Introduction to Data Science and types of Data Science | Insights from data using statistical and computational techniques
07-07-2023 | Introduction about Data and Basic Information of Python | Versatile, high-level programming language known for its readability and simplicity, making it ideal for various applications
08-07-2023 | Holiday | -
09-07-2023 | Holiday | -
10-07-2023 | Quiz and Exercise | MCQ Quiz and Practice
11-07-2023 | Installation and Use of Python | Python and working on Google Colab
12-07-2023 | Introduction to Libraries | Pandas, NumPy, Scikit-Learn, Matplotlib, Seaborn
13-07-2023 | Quiz and Exercise | MCQ Quiz and Practice
14-07-2023 | Introduction to Basic Data Exploration in Python | Loading data, checking dimensions, summarizing statistics, handling missing values, and creating visualizations for insights
15-07-2023 | Holiday | -
16-07-2023 | Holiday | -
17-07-2023 | Quiz and Exercise | MCQ Quiz and Practice
18-07-2023 | Plotting Graphs in Python | Import, create figures, add data, customize, and display with plt.show()
19-07-2023 | Introduction to Data Frames | Explore Data Frames
20-07-2023 | Quiz and Exercise | MCQ Quiz and Practice
21-07-2023 | Simple Linear Regression | Linear relationship between one independent variable and a dependent variable, using a straight line equation
22-07-2023 | Holiday | -
23-07-2023 | Holiday | -
24-07-2023 | Quiz and Exercise | MCQ Quiz and Practice
25-07-2023 | Practice of Simple Linear Regression Model | Practice
26-07-2023 | Multiple Linear Regression | Relationships between multiple independent variables and a dependent variable, using a linear equation with multiple coefficients
27-07-2023 | Quiz and Exercise | MCQ Quiz and Practice
28-07-2023 | Practice of Multiple Linear Regression Model | Practice
29-07-2023 | Holiday | -
30-07-2023 | Holiday | -
31-07-2023 | Quiz and Exercise | MCQ Quiz and Practice
01-08-2023 | Cancer Prediction Model | Predicting Growth of Cancer using Multiple Linear Regression Model
02-08-2023 | Introduction to Logistic Regression | Statistical model used for binary classification, estimating the probability of an outcome based on input features
03-08-2023 | Quiz and Exercise | MCQ Quiz and Practice
04-08-2023 | Implementation of Logistic Regression Model | Implementation
05-08-2023 | Holiday | -
06-08-2023 | Holiday | -
07-08-2023 | Quiz and Exercise | MCQ Quiz and Practice
08-08-2023 | Practice of Logistic Regression Model | Practice
09-08-2023 | Introduction to Decision Tree | It is a visual model that makes choices based on data, used for classification and regression tasks
10-08-2023 | Quiz and Exercise | MCQ Quiz and Practice
11-08-2023 | Implementing Decision Tree | Collect data, choose features, split nodes based on criteria, recursively create branches, assign labels for classification, predict values for regression
12-08-2023 | Holiday | -
13-08-2023 | Holiday | -
14-08-2023 | Holiday | -
15-08-2023 | Holiday | -
16-08-2023 | Holiday | -
17-08-2023 | Introduction to Predictive Analysis | Predictive Analysis
18-08-2023 | Ice Cream Sales Prediction Model | Forecasting Future Sales using Predictive Analytics
19-08-2023 | Final Project | Scopes of Analytics in Different Fields Through Projects
26-06-2023 | Quiz and Exercise | MCQ Quiz and Practice
27-06-2023 | Final Project Submission | Project Submitted
28-06-2023 | Final Quiz and Completion of Internship/Training | MCQ Quiz & Complete the Internship/Training and get the Certificate of Completion
29-06-2023 | Holiday | -
30-06-2023 | Quiz and Exercise | MCQ Quiz and Practice
01-07-2023 | Holiday | -
02-07-2023 | Holiday | -

1. About Course

2. Introduction to Predictive Analysis

[Figure: workflow diagram with the stages Import Data, Python, Working on Data, Analysis, Predictions]

Predictive analysis is a branch of advanced analytics that employs a variety of techniques, from
statistical modeling to machine learning, to analyze current and historical facts to make predictions
about future or otherwise unknown events. At its core, predictive analysis transforms data into
valuable, actionable insights.
Historical Context
Historically, businesses and researchers have been looking for ways to make accurate forecasts. With the
evolution of computer technology, especially in the last few decades, the ability to process large datasets
has allowed for sophisticated predictive modeling. Today's predictive analytics is built upon a foundation
of classical statistical methods combined with the modern power of computation.
Key Components
• Data: The foundation of any predictive model. This data can be historical (what has happened)
and transactional (what is happening).
• Statistical Algorithms: From linear regression and decision trees to neural networks,
various algorithms can be employed depending on the data and the nature of the problem.
• Assumptions: Every predictive model is based on certain assumptions. The validity of
predictions often depends on how accurate these assumptions are.
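The three components above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sales figures are hypothetical, the helper `moving_average_forecast` is a name introduced here, and a moving average is just one simple algorithm chosen for brevity. Its built-in assumption is that the near future resembles the recent past, which is checked by holding out the last known value.

```python
from statistics import mean

# Data: hypothetical monthly sales figures (illustrative, not from the training).
history = [120, 132, 128, 141, 150, 147, 158, 163]


def moving_average_forecast(values, window=3):
    """Algorithm: predict the next value as the mean of the last `window` values."""
    if len(values) < window:
        raise ValueError("not enough history for the chosen window")
    return mean(values[-window:])


# Assumption check: hold out the last observation and compare it with the forecast.
train, actual = history[:-1], history[-1]
predicted = moving_average_forecast(train)
error = abs(actual - predicted)

print(f"predicted={predicted:.1f}, actual={actual}, absolute error={error:.1f}")
```

A large error on the held-out value would suggest that the assumption behind the model does not hold for this data, for example because sales follow a trend that an average of recent values lags behind.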
Applications
• Business: Predictive models can forecast inventory demand, predict sales, and even identify
potential high-value customers.
• Healthcare: Predictive analytics can be used to predict disease outbreaks, patient admissions,
and other important metrics.
• Financial Services: It can help in predicting stock market trends and credit risks, and in
identifying potentially fraudulent activities.
• Sports: Teams use predictive analytics to evaluate player performance and strategize game
plans.

Benefits
• Decision Making: By knowing what is likely to happen in the future, businesses can
make informed decisions.
• Risk Reduction: Predictive analytics can identify potential risks and allow for mitigation
strategies.
• Efficiency: By predicting demand, for example, businesses can optimize supply, thus saving
on costs.

Challenges and Limitations
• Data Quality: The accuracy of predictions largely depends on the quality of the data used to
train the models.
• Complexity: Building a good predictive model requires expertise in both the domain (e.g.,
finance, healthcare) and in the analytics techniques.
• Ethical Concerns: Using predictive analytics, especially in areas like hiring or law
enforcement, can lead to concerns about privacy, bias, and fairness.

Conclusion
Predictive analytics has become an integral tool across sectors, helping stakeholders to visualize
potential future scenarios. As with any tool, its effectiveness depends on its usage – understanding its
capabilities, limitations, and the context in which it is applied. As technology and methodologies evolve,
predictive analytics will only grow in its importance and impact.

3. Scopes of Business Analytics

5. Data
“Data is the new oil.” Today data is everywhere, in every field. Whether you are a data scientist,
marketer, businessman, data analyst, researcher, or in any other profession, you need to work or
experiment with raw or structured data. Because this data is so important, it must be handled and
stored properly, without errors. While working with data, it is important to know its type in
order to process it and get the right results. There are two types of data, qualitative and
quantitative, which are further classified into four types: nominal, ordinal, discrete, and
continuous.
Business now runs on data: most companies use data for insights to create and launch campaigns,
design strategies, launch products and services, or try out different things. According to one
report, at least 2.5 quintillion bytes of data are produced per day.

Types of Data

Qualitative or Categorical Data


Qualitative or Categorical Data is data that can’t be measured or counted in the form of numbers.
These types of data are sorted by category, not by number, which is why they are also known as
Categorical Data. These data consist of audio, images, symbols, or text. The gender of a person,
i.e., male, female, or others, is qualitative data.

The Qualitative data are further classified into two parts:

Nominal Data
Nominal Data is used to label variables without any order or quantitative value. The colour of
hair can be considered nominal data, as one colour can’t be ranked above or below another.

Ordinal Data
Ordinal data have a natural ordering: each value has a position on a scale, but the distance
between positions is not defined. These data are used for observations like customer
satisfaction, happiness, etc., but we can’t do arithmetic on them.
Quantitative Data
Quantitative data can be expressed in numerical values, which makes them countable and suitable
for statistical analysis. These kinds of data are also known as Numerical data. They answer
questions like “how much,” “how many,” and “how often.” For example, the price of a phone, a
computer’s RAM, and the height or weight of a person all fall under quantitative data.

The Quantitative data are further classified into two parts:

Discrete Data
The term discrete means distinct or separate. Discrete data contain values that are integers or
whole numbers. The total number of students in a class is an example of discrete data. These data
can’t be broken into decimal or fraction values.
Continuous Data
Continuous data can take fractional values. Examples are the height of a person, the length of an
object, etc. Continuous data represent information that can be divided into ever smaller levels.
A continuous variable can take any value within a range.
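The four data types determine which summaries make sense. The sketch below uses only Python's standard `statistics` module; all values are hypothetical, and the mapping of satisfaction labels to ranks is an assumption made for illustration.

```python
from statistics import mean, median, mode

# Hypothetical survey records (illustrative values only).
hair_colour = ["black", "brown", "black", "blonde", "black"]   # nominal
satisfaction = ["low", "high", "medium", "high", "high"]        # ordinal
students_per_class = [30, 28, 35, 30, 32]                       # discrete
height_cm = [162.5, 170.2, 158.9, 181.0, 175.4]                 # continuous

# Nominal: only counting is meaningful, so report the most frequent label.
most_common_colour = mode(hair_colour)

# Ordinal: map labels to ranks so their order can be used, then take the median.
rank = {"low": 1, "medium": 2, "high": 3}
label = {position: name for name, position in rank.items()}
median_satisfaction = label[median(rank[s] for s in satisfaction)]

# Quantitative (discrete and continuous): arithmetic such as the mean is valid.
mean_students = mean(students_per_class)
mean_height = mean(height_cm)

print(most_common_colour, median_satisfaction, mean_students, round(mean_height, 1))
```

The median is used for the ordinal column because it relies only on order, whereas a mean would assume equal spacing between "low", "medium" and "high".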

6. Introduction to Python

What is Python?
Python is a popular programming language. It was created by Guido van Rossum, and released
in 1991.
It is used for:
• web development (server-side),
• software development,
• mathematics,
• system scripting.

What can Python do?

• Python can be used on a server to create web applications.
• Python can be used alongside software to create workflows.
• Python can connect to database systems. It can also read and modify files.
• Python can be used to handle big data and perform complex mathematics.
• Python can be used for rapid prototyping, or for production-ready software development.

Why Python?
• Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).
• Python has a simple syntax similar to the English language.
• Python has syntax that allows developers to write programs with fewer lines than some other
programming languages.
• Python runs on an interpreter system, meaning that code can be executed as soon as it is written.
This means that prototyping can be very quick.
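As a small illustration of the concise syntax described above, the following sketch counts words in a sentence. The sample text is arbitrary, and `Counter` comes from Python's standard library.

```python
from collections import Counter

text = "data is the new oil and data is everywhere"

# Count how often each word occurs in a single expression.
word_counts = Counter(text.split())

# A list comprehension filters the counts without an explicit loop body.
repeated = [word for word, n in word_counts.items() if n > 1]

print(word_counts.most_common(2))
print(repeated)
```

Because Python is interpreted, these lines can be typed into a notebook cell and run immediately, which is what makes quick prototyping possible.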

Installation
pip, the package manager for Python, can be used to install Jupyter easily. You must have Python
installed on your system (Python 3 is recommended). To install Jupyter using pip, run the
following commands in the terminal or command line:

# upgrade pip to make sure that
# the latest version of pip is installed
pip3 install --upgrade pip

# for Python 3
pip3 install jupyter

# for Python 2 (not recommended)
pip install jupyter
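
After installation, one way to confirm which packages are available is to ask Python whether it can locate them. This is a minimal sketch, not part of the course material: the helper `find_installed` is a name introduced here, and the listed import names (for example `jupyter_core` for Jupyter's core package and `sklearn` for Scikit-learn) are the names assumed to be used at import time.

```python
from importlib.util import find_spec

# Import names assumed for the tools mentioned in this report.
PACKAGES = ["jupyter_core", "numpy", "pandas", "matplotlib", "sklearn"]


def find_installed(names):
    """Return a mapping of import name -> True if Python can locate the package."""
    return {name: find_spec(name) is not None for name in names}


for name, available in find_installed(PACKAGES).items():
    print(f"{name}: {'installed' if available else 'missing'}")
```

Any package reported as missing can then be installed with pip in the same way as Jupyter.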

Python standard library

The Python Standard Library is distributed with Python and provides built-in modules that give
access to basic system functionality such as I/O, along with many other core modules. Some of
these modules are written in the C programming language, while many others are written in Python
itself. The standard library consists of more than 200 core modules, which together make Python a
practical high-level programming language. Without it, programmers would have to write much of
this functionality themselves. Beyond the standard library, there are several other libraries in
Python that make a programmer’s life easier. Let’s have a look at some of the commonly used
libraries:

1. Matplotlib: This library is used for plotting numerical data, which is why it is used in
data analysis. It is an open-source library that plots high-quality figures such as pie
charts, histograms, scatter plots, graphs, etc.
2. Pandas: Pandas is an important library for data scientists. It is an open-source Data
Science & Machine Learning library that provides flexible high-level data structures
and a variety of analysis tools. It eases data analysis, data manipulation, and cleaning
of data. Pandas supports operations like sorting, re-indexing, iteration,
concatenation, conversion of data, visualizations, aggregations, etc.

3. NumPy: The name “NumPy” stands for “Numerical Python”. It is a commonly used and
popular Data Science & Machine Learning library that supports large matrices and
multi-dimensional data. It provides built-in mathematical functions for easy
computations. Even libraries like TensorFlow use NumPy internally to perform several
operations on tensors. The array interface is one of the key features of this library.

4. Scikit-learn: It is a well-known Python library for working with complex data.
Scikit-learn is an open-source library that supports Data Science & Machine Learning.
It supports various supervised and unsupervised algorithms such as linear regression,
classification, clustering, etc. This library works in association with NumPy and
SciPy.

7. Basic Data Exploration

Descriptive Statistics

8. Linear Regression
Introduction to Linear Regression
Linear regression is one of the easiest and most popular Data Science & Machine Learning algorithms.
It is a statistical method used for predictive analysis. Linear regression makes predictions for
continuous (real-valued or numeric) variables such as sales, salary, age, and product price.
The linear regression algorithm models a linear relationship between a dependent variable (y) and one
or more independent variables (x), hence the name linear regression. Because the relationship is
linear, the model shows how the value of the dependent variable changes with the value of the
independent variable.
The linear regression model provides a sloped straight line representing the relationship between the
variables. Consider the image below:

Figure 5.1

Mathematically, we can represent a linear regression as:

y = a0 + a1x + ε

Here,
y = Dependent variable (target variable)
x = Independent variable (predictor variable)
a0 = Intercept of the line (gives an additional degree of freedom)
a1 = Linear regression coefficient (scale factor applied to each input value)
ε = Random error
The values of the x and y variables are the training datasets for the Linear Regression model representation.
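
To make the roles of a0 and a1 concrete, the following minimal sketch estimates them with the standard least-squares formulas. The function name and the data are hypothetical and chosen only for illustration; the data are generated from y = 1 + 2x with no random error, so the estimates should recover those values.

```python
def fit_simple_linear_regression(x, y):
    """Return (a0, a1) minimising the squared error of y = a0 + a1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    a1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a0 = mean_y - a1 * mean_x
    return a0, a1

# Illustrative training data generated from y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
a0, a1 = fit_simple_linear_regression(x, y)
print(a0, a1)
```

With real data the points do not lie exactly on a line, and the difference between each observed y and the fitted value corresponds to the error term ε.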
Types of Linear Regression
Linear regression can be further divided into two types of algorithm:
• Simple Linear Regression: If a single independent variable is used to predict the value of a
numerical dependent variable, the algorithm is called Simple Linear Regression.

• Multiple Linear Regression: If more than one independent variable is used to predict the value
of a numerical dependent variable, the algorithm is called Multiple Linear Regression.
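
The multiple case can be sketched with NumPy's least-squares solver. This is only an illustration under stated assumptions: the two predictors x1 and x2 and the generating rule y = 1 + 2*x1 + 3*x2 are invented for the example and are not taken from the project data.

```python
import numpy as np

# Illustrative data: two predictors, with y generated as 1 + 2*x1 + 3*x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

# Design matrix: a column of ones for the intercept, then one column per predictor.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution of X @ coef ≈ y.
coef, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
intercept, b1, b2 = coef
print(coef)
```

Dropping the x2 column from the design matrix reduces this to Simple Linear Regression, which shows that the two types differ only in the number of independent variables.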

9. Implementing Linear Regression

10. Decision Tree
Introduction to decision tree
A Decision Tree in Data Science & Machine Learning is a classification algorithm that can also
solve regression problems. It applies classification rules from the root to a leaf node, and its
structure resembles a flowchart: each internal node represents a test on a feature (e.g., whether a
number is greater than a given threshold), each leaf node represents a class label (the result
reached after all the decisions have been taken), and the branches represent conjunctions of
features that lead to those class labels.
Decision Trees are widely used in the modern world. Many ML algorithms are used in our day-to-day
life, and one of the important ones is the Decision Tree, used for classification and as a solution
for regression problems. As it is a predictive model, Decision Tree analysis is done via an
algorithmic approach in which a dataset is split into subsets according to conditions. As the name
suggests, it is a tree-like model in the form of if-then-else statements. A deeper tree with more
nodes fits the training data more closely, although a very deep tree can overfit.
Types of Decision Tree in Data Science & Machine Learning
A Decision Tree is a tree-like graph in which sorting proceeds from the root node to a leaf node
until the target is reached. It is one of the most popular supervised algorithms for decision-making
and classification. It is constructed by recursive partitioning, where each node acts as a test case
for some attribute and each edge leaving the node is a possible answer to that test case. The root
node and the leaf nodes are the two key entities of the algorithm.
Let's understand this with the help of a small example:

Here, the root node asks whether you are less than 40 or not. If so, do you eat fast food? If yes,
you are unfit; otherwise, you are fit. If you are more than 40, do you exercise? If so, you
are fit; otherwise, you are unfit. This is a binary classification.
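
The example tree can be written directly as if-then-else statements. The sketch below is a hand-coded version of that tree, not a trained model; the function name and arguments are hypothetical, and treating an age of exactly 40 as the "more than 40" branch is an assumption, since the example does not say which side it falls on.

```python
def classify_fitness(age, eats_fast_food, exercises):
    """Follow the example tree from the root to a leaf and return the class label."""
    if age < 40:                      # root node: test on age
        if eats_fast_food:            # internal node on the younger branch
            return "unfit"
        return "fit"
    # Assumption: an age of exactly 40 follows the "more than 40" branch.
    if exercises:                     # internal node on the older branch
        return "fit"
    return "unfit"

print(classify_fitness(25, eats_fast_food=True, exercises=False))
print(classify_fitness(55, eats_fast_food=False, exercises=True))
```

A learning algorithm builds the same kind of structure automatically by choosing the tests and thresholds from training data.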

There are two types of Decision Trees:

• Classification Trees: The above example is a Classification Tree, in which the outcome is categorical.

• Regression Trees: In this type of algorithm, the decision or result is continuous. It has a
single numerical output with one or more inputs or predictors.

In a Decision Tree, the typical challenge is to identify the attribute to test at each node. This
process is called attribute selection, and several measures are used to identify the attribute.
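
The text above does not name the measures used for attribute selection. One commonly used measure is information gain, based on entropy; the sketch below picks it as an assumption for illustration. The function names and the tiny label lists are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(count / total) * log2(count / total)
               for count in Counter(labels).values())

def information_gain(parent, subsets):
    """Entropy of the parent minus the size-weighted entropy of its subsets."""
    total = len(parent)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent) - weighted

parent = ["fit", "fit", "unfit", "unfit"]
# A split that separates the classes perfectly versus one that does not help.
perfect_split = [["fit", "fit"], ["unfit", "unfit"]]
useless_split = [["fit", "unfit"], ["fit", "unfit"]]

print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

At each node, the attribute whose split yields the highest gain would be selected; other measures, such as the Gini index, are used in the same way.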

11. Implementing Decision Tree

Literature Review on Logistic Regression
Introduction
Logistic regression, a method within the class of generalized linear models, is a widely used
statistical technique. This method predicts the probability of a binary response based on
one or more predictor (or independent) variables.
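
The way a probability is obtained from a predictor can be sketched as follows. The logistic (sigmoid) function maps a linear combination of the predictors into the range 0 to 1. The function name and the coefficients b0 and b1 are hypothetical values chosen only for illustration, not estimates from any dataset.

```python
from math import exp

def predict_probability(x, b0, b1):
    """Probability of the positive class for a single predictor x."""
    z = b0 + b1 * x               # linear combination, as in linear regression
    return 1.0 / (1.0 + exp(-z))  # logistic (sigmoid) function maps z into (0, 1)

# Hypothetical coefficients chosen only for illustration.
b0, b1 = -4.0, 0.1
for x in (20, 40, 60):
    print(x, round(predict_probability(x, b0, b1), 3))
```

A binary prediction is then usually made by comparing the probability with a threshold such as 0.5.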

Historical Context
Since its introduction in the early 20th century, logistic regression has found application in
a variety of fields, including epidemiology, economics, and social sciences (Cramer, 2002).
In essence, it extends the ideas of linear regression to situations where the outcome
variable is categorical (binary in the simplest case).

Applications
Medical Research: In medical research the outcome is often binary, e.g., disease/no disease.
Hosmer et al. (2013) detail its extensive application in epidemiology to determine risk factors for
diseases.

Marketing: Logistic regression aids in predicting customer behavior, such as buying/not buying
(Leeflang et al., 2000).

Financial Sector: It is used for credit scoring by predicting the likelihood of a customer defaulting
on a loan (Thomas, 2000).

Assumptions and Limitations


Like all models, logistic regression operates under certain assumptions. These include the
requirement of a large sample size, no multicollinearity among independent variables, and linearity
between the independent variables and the log odds (Menard, 2002). If these assumptions aren't met,
the reliability of predictions may be compromised.

However, while logistic regression can predict the likelihood of an event, it doesn't provide the
reasons behind that event. It's also worth noting that, as a linear model, it might not capture
more complex relationships in data, which is where techniques like decision trees or neural
networks might be considered.

Extensions and Variations
The standard logistic regression model has been expanded to cater to different needs.
Multinomial logistic regression, for example, is used when the outcome variable has more than
two categories (Agresti, 2003). Ordered logistic regression is employed for ordinal dependent variables.
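
Multinomial logistic regression is commonly formulated with the softmax function, which turns one linear score per category into a set of probabilities. The minimal sketch below shows that step only; the function name and the three scores are hypothetical, and fitting the coefficients that produce such scores is not shown.

```python
from math import exp

def softmax(scores):
    """Convert one linear score per category into probabilities that sum to 1."""
    highest = max(scores)                       # subtract the max for numerical stability
    exps = [exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three categories.
probabilities = softmax([2.0, 1.0, 0.1])
print(probabilities)
```

The category with the highest probability is taken as the prediction.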

Conclusion
Logistic regression continues to be a fundamental tool in the arsenal of researchers and
professionals across various disciplines. Its robustness, ease of interpretation, and wide
applicability make it indispensable. However, understanding its assumptions and limitations
is crucial to ensuring its appropriate use.

12. Implementation of Logistic Regression Model

