

Business Statistics and Analytics (KMB-104)

Unit – I
Meaning, Scope, functions and limitations of statistics, Measures of Central tendency –
Mean, Median, Mode, Quartiles, Measures of Dispersion – Range, Inter quartile range,
Mean deviation, Standard deviation, Variance, Coefficient of Variation, Skewness and
Kurtosis.

Q.1. State the difference between descriptive and inferential statistics.

Ans)

Meaning
  Descriptive Statistics: A set of methods used to describe data that has been collected, i.e. summarization of data.
  Inferential Statistics: A set of methods used to make a generalization, estimate, predict or take a decision when we want to draw conclusions about a distribution.

What it does?
  Descriptive Statistics: Organizes, analyzes and presents data in a meaningful way. A distinction is made between univariate, bivariate and multivariate analysis.
  Inferential Statistics: Compares, tests and predicts data.

Form of final result
  Descriptive Statistics: Charts, graphs and tables. Frequency distribution, measures of central tendency, measures of dispersion and skewness are used.
  Inferential Statistics: Probability.

Usage
  Descriptive Statistics: To summarize the population data by describing what was observed in the sample numerically or graphically.
  Inferential Statistics: To generalize the results obtained from a random sample back to the population from which the sample was drawn.

Function
  Descriptive Statistics: It explains the data, which is already known, to summarize the sample.
  Inferential Statistics: It attempts to reach conclusions about the population that extend beyond the data available.

Questions for Business Statistics and Analytics (KMBN-104) Compiled by Dr. Ritesh Singhal, AKGIM

Q.2. State the objectives and essentials of an ideal average.

Sol. An average is a single value that represents the entire mass of data. Generally, it lies in the central part of the distribution. Characteristics of an ideal average are:

• It should be rigidly defined
• It should be easy to calculate
• It should be based on all the observations
• It should be capable of further algebraic treatment
• It should not be affected by extreme values
• It should be least affected by fluctuations of sampling

Q.3. How do you make a suitable measure of central tendency?


Sol.
"An average is a figure that represents the whole group" (Clark)
"An average is a single value within the range of the data that is used to represent all the values in the series." (Croxton and Cowden)
"An average is a single value selected from a group of values to represent them in some way" (A E Wagh)
"An average is sometimes called a 'measure of central tendency' because individual values of the variable usually cluster around it." (Crum and Smith)
An average is a single value that represents the entire mass of data. Generally, it lies in the central part of the distribution.
Characteristics of a good measure of central tendency:
• It should be rigidly defined
• It should be easy to calculate
• It should be based on all the observations
• It should be capable of further algebraic treatment
• It should not be affected by extreme values
• It should be least affected by fluctuations of sampling (sampling stability)

Averages can be categorized as:

Mathematical Averages
1. Arithmetic Mean
2. Weighted AM
3. Geometric Mean
4. Harmonic Mean

Positional Averages
1. Median
2. Mode
3. Partition Values

Q.4. Define coefficient of variation.

Ans) The coefficient of variation (CV) is a measure of relative dispersion that relates the standard deviation to the mean: the standard deviation is expressed as a percentage of the mean. It is used:

• When two or more distributions having unequal means and equal SDs are to be compared.
• When two or more distributions expressed in different units of measurement are to be compared.

CV is a unitless quantity.

Symbolically, CV (coefficient of variation) = (σ / μ) × 100%

The series having the larger CV is more variable, whereas the series having the smaller CV is more consistent.
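
As an illustration, the CV comparison can be sketched in Python. The two sales series below are hypothetical figures invented for this example, and the population standard deviation (σ) is assumed, matching the symbol used in the formula.

```python
import statistics

def coefficient_of_variation(values):
    """Return the CV in percent: (population SD / mean) * 100."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation (sigma)
    return sd / mean * 100

# Hypothetical weekly sales (units) of two salespersons with the same mean
sales_a = [50, 52, 48, 51, 49]
sales_b = [30, 70, 45, 60, 45]

cv_a = coefficient_of_variation(sales_a)
cv_b = coefficient_of_variation(sales_b)
print(f"CV of A = {cv_a:.2f}%, CV of B = {cv_b:.2f}%")
print("More consistent series:", "A" if cv_a < cv_b else "B")
```

Both series have a mean of 50, yet A has the smaller CV, so A would be judged the more consistent of the two.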


Q.5. What is Statistics? What are the various uses of statistics in the management of an
organization?
OR
Q.5. “Our managers can improve managerial decisions to a great extent, if they are
adequately familiar with the basic tools of statistics and mathematics.” Explain and
illustrate.

Sol. There are several definitions of statistics, such as:

• The systematic and scientific treatment of quantitative measurement is precisely known as statistics. – Horace Secrist
• Statistics may be called the science of counting / averages. – A.L. Bowley
• Statistics is concerned with the collection, classification (or organization), presentation and analysis of data which are measurable in numerical terms. – Croxton & Cowden

Statistics can be applied in the following areas:

1. Marketing: Statistical analysis is frequently used to provide information for decision making in the field of marketing. It is necessary first to find out what can be sold and then to evolve a suitable strategy so that the goods reach the ultimate consumer.

2. Production: In the field of production, statistical data and methods play a very important role. Decisions about what to produce, how to produce, when to produce and for whom to produce are based largely on statistical analysis.

3. Finance: Financial organizations depend very heavily on statistical analysis to discharge their finance function effectively.

4. Investment: Statistics greatly assists investors in making clear and valid judgments in their investment decisions, helping them select securities which are safe and have the best prospects of yielding a good income.

5. Human Resource: Statistics may be used to handle the data generated by human resource activities for planning, organizing and staffing.

Tools in Statistics:

Statistical tools used in management decision making:

• Time Series – Used to analyze the trend in data and make predictions based on that trend.
• Probability – Used to find the chance of success or failure of any project.
• Measure of Central Tendency – Used to find a single value that represents the data, such as the mean, median or mode.
• Measure of Dispersion – Used to understand the variation within data.
• Index Number – Used to understand changes in commodity prices and inflation.
• Correlation – Used to understand the relation between variables.
• Regression – Used for forecasting, prediction and estimation.
• Hypothesis – Used for research in management.
• Decision Theory – Used to help in decision making.

(Students may add a few examples of the above topics)


Q.6. Discuss meaning, scope, functions and limitations of statistics.

Ans) Statistics is concerned with the collection, classification (or organization), presentation and analysis
of data which are measurable in numerical terms.

Scope of statistics
• Statistical data and techniques of statistical analysis are immensely useful in solving economic problems such as wages, prices, time series analysis and demand analysis.
• It can be used in medical and actuarial sciences.
• Business executives are relying more and more on statistical techniques for studying the preferences of customers.
• In production engineering, statistical tools such as inspection plans, control charts etc. are extensively used to find out whether the product is conforming to the specifications or not.
• Statistics are useful to bankers, insurance companies, social workers, labour unions, trade associations, chambers and politicians.

Functions of statistics
• It simplifies complex data
• It provides techniques for comparison
• It studies relationships
• It helps in formulating policies
• It helps in forecasting
• It is helpful for the common man
• Statistical methods combined with the speed of computers can work wonders, e.g. SPSS, STATA, MATLAB, MINITAB, MS-Excel etc.

Limitations of statistics
• It does not consider qualitative phenomena
• It does not study individuals
• It is liable to be misused
• It does not necessarily bring out the cause & effect relationship
• Data collected for a given purpose cannot be indiscriminately applied to any situation

Q.7. Discuss the merits and demerits of measure of central tendency.

Ans)

Mode
  Merits: Used with nominal level data; not influenced by extreme scores.
  Demerits: Not representative of all data; depends on group selection; limited use in statistics.

Median
  Merits: Used with ordinal level data; not influenced by extreme scores.
  Demerits: May not appear in data; restricted statistical uses.

Mean
  Merits: Used with interval level data; useful statistical properties; widely understood.
  Demerits: Influenced by extreme scores; may not appear in data; requires interval level data.


Q.8. What do you mean by the property of shape?
OR
Distinguish between skewness and kurtosis.
Ans) Skewness - measures the degree and direction of asymmetry of a distribution. A normal or symmetrical distribution has a skewness of zero (0); for a symmetric distribution mean = median = mode. An asymmetric distribution may be positively skewed (skewed to the right; longer tail to the right; represented by a positive value) or negatively skewed (skewed to the left; longer tail to the left; represented by a negative value).

Kurtosis - measures how peaked a distribution is and the lightness or heaviness of its tails, in other words, how much of the distribution is actually located in the tails. Measured as excess kurtosis (kurtosis minus 3), a normal distribution has a value of zero (0) and is said to be mesokurtic. A positive value means that the tails are heavier than those of a normal distribution and the distribution is said to be leptokurtic (with a higher, more acute "peak"). A negative value means that the tails are lighter than those of a normal distribution and the distribution is said to be platykurtic (with a smaller, flatter "peak").
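
A minimal sketch of how both shape measures can be computed from central moments is given below. The moment-based formulas (skewness = m3 / m2^1.5, excess kurtosis = m4 / m2^2 − 3) are the standard textbook ones rather than something stated in these notes, and both data sets are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: the average of (x - mean) ** k."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def moment_skewness(values):
    """Zero for symmetric data, positive for a longer right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Zero for a normal curve, negative when flatter (platykurtic)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]          # hypothetical symmetric data
right_tailed = [1, 1, 2, 2, 3, 10]   # hypothetical data with a long right tail

print("skewness (symmetric):   ", moment_skewness(symmetric))
print("skewness (right-tailed):", round(moment_skewness(right_tailed), 3))
print("excess kurtosis (symmetric):", round(excess_kurtosis(symmetric), 3))
```

The symmetric series gives zero skewness and a negative excess kurtosis (platykurtic), while the series with one large value gives positive skewness.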

Q.9. Write a note on Dispersion/variation. How can dispersion be measured?

Ans) An average alone does not give a full picture of a distribution, so a further measure is needed to describe it better. The extent or degree to which data tend to spread around an average is called dispersion or variation. Dispersion can be measured using the following:

Range - It is the difference between the maximum value and the minimum value of the data.
Inter-quartile range = Q3 − Q1. It denotes the difference between the third quartile and the first quartile. Half of it, (Q3 − Q1)/2, is called the semi-inter-quartile range or quartile deviation.
Mean deviation (MD) - It is the average of the absolute amounts by which the individual items deviate from the mean. MD = Σ|x| / n, where x is the deviation of an item from the mean.

Standard deviation (σ) - It is a measure that is used to quantify the amount of variation or dispersion of a set of data values. Deviations are measured from the mean. It has desirable mathematical properties.
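
The measures listed above can be sketched in Python as follows. The observations are hypothetical, the population forms of the standard deviation and variance are assumed, and the quartiles come from the default ("exclusive") method of `statistics.quantiles`; other textbooks' quartile conventions may give slightly different Q1 and Q3.

```python
import statistics

data = [4, 8, 15, 16, 23, 42]  # hypothetical observations

data_range = max(data) - min(data)

q1, q2, q3 = statistics.quantiles(data, n=4)  # three quartile cut points
iqr = q3 - q1
quartile_deviation = iqr / 2                  # semi-inter-quartile range

mean = statistics.fmean(data)
mean_deviation = sum(abs(x - mean) for x in data) / len(data)

sd = statistics.pstdev(data)          # population standard deviation
variance = statistics.pvariance(data) # population variance = sd ** 2

print("Range:", data_range)
print("IQR:", iqr, "Quartile deviation:", quartile_deviation)
print("Mean deviation:", round(mean_deviation, 3))
print("SD:", round(sd, 3), "Variance:", round(variance, 3))
```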

Objectives of studying dispersion

• For judging the reliability of averages.
• Comparison of distributions.
• Useful for controlling variability.
• Useful in further analysis.

Characteristics of a good measure of dispersion

• It should be rigidly defined
• It should be easy to calculate
• It should be based on all the observations


• It should be capable of further algebraic treatment
• It should not be affected by extreme values
• It should be least affected by fluctuations of sampling (sampling stability)

Q.10. Discuss various partition values such as median, quartiles, deciles and percentiles
and their uses.
Ans) Partition values divide the same set of observations in different ways, splitting the ordered observations into several equal parts.

Median – It is that value of the variable which divides the group into two equal parts, one part comprising all values greater than the median and the other all values less than the median.
Deciles are those values that divide a given set of observations into ten equal parts. Therefore, there are nine deciles in total. They are represented as D1, D2, D3, D4, ……… D9.

Percentiles divide a given set of observations into 100 equal parts. They are represented as P1, P2, P3, P4, ……… P99. A quartile is a type of quantile: the quartiles divide the observations into four equal parts. The first quartile (Q1) is defined as the middle number between the smallest number and the median of the data set. The second quartile (Q2) is the median of the data. The third quartile (Q3) is the middle value between the median and the highest value of the data set.

Partition values    Number of equal parts    Notation
Median              2                        Med
Quartiles           4                        Q1 to Q3
Deciles             10                       D1 to D9
Percentiles         100                      P1 to P99
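
The table can be checked with a short Python sketch. The data are hypothetical, and `statistics.quantiles` (default "exclusive" method) is used as one reasonable convention for locating the cut points; it is not necessarily the convention a given textbook follows.

```python
import statistics

data = [12, 15, 11, 20, 18, 25, 30, 22, 17, 14, 28]  # hypothetical observations

median = statistics.median(data)
quartiles = statistics.quantiles(data, n=4)      # Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10)       # D1 ... D9
percentiles = statistics.quantiles(data, n=100)  # P1 ... P99

print("Median:", median)
print("Quartiles:", quartiles)
print("Number of deciles:", len(deciles))
print("Number of percentiles:", len(percentiles))
# Q2, D5 and P50 all coincide with the median
print("Q2, D5, P50:", quartiles[1], deciles[4], percentiles[49])
```

Dividing into k equal parts always needs k − 1 cut points, which is why there are 3 quartiles, 9 deciles and 99 percentiles.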

Q.11. What is meant by Skewness? How is it measured?

Ans) Skewness indicates lack of symmetry in a distribution. When a frequency distribution is elongated to the right, that is, has a longer tail to the right, it is said to be positively skewed. If the distribution has a longer tail to the left, it is said to be negatively skewed.
If mean > median > mode, skewness is positive.
If mean < median < mode, skewness is negative.
A symmetrical distribution is not skewed (mean = median = mode).

Skewness can be measured using four methods:

Karl Pearson's measure: skewness = (Mean − Mode) / Standard deviation
Bowley's measure: skewness = (Q3 + Q1 − 2 Median) / (Q3 − Q1)
Kelly's measure: skewness = (D1 + D9 − 2 Median) / (D9 − D1)
Moment-based measure, built from the central moments, e.g. µ1 = Σ(xi − mean) / N
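
As a rough illustration, Karl Pearson's and Bowley's measures can be computed as below. The data set is hypothetical and chosen to have a longer right tail; the population standard deviation and the default quartile convention of `statistics.quantiles` are assumptions of this sketch.

```python
import statistics

data = [2, 3, 3, 3, 4, 5, 6, 8, 11]  # hypothetical, longer tail to the right

mean = statistics.fmean(data)
median = statistics.median(data)
mode = statistics.mode(data)
sd = statistics.pstdev(data)                 # population standard deviation
q1, _, q3 = statistics.quantiles(data, n=4)  # first and third quartiles

pearson_skewness = (mean - mode) / sd
bowley_skewness = (q3 + q1 - 2 * median) / (q3 - q1)

print("mean, median, mode:", mean, median, mode)
print("Karl Pearson's skewness:", round(pearson_skewness, 3))
print("Bowley's skewness:", round(bowley_skewness, 3))
```

Here mean > median > mode, and both measures come out positive, in line with the rule stated above.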


Q.12. Differentiate between population and sample


Unit – II
Time series analysis: Concept, Additive and Multiplicative models, Components of time
series, Trend analysis: Least Square method - Linear and Non- Linear equations,
Applications in business decision-making. Index Numbers:- Meaning , Types of index
numbers, uses of index numbers, Construction of Price, Quantity and Volume indices:-
Fixed base and Chain base methods.

Q.1. Explain the components of time series.


Ans) A series of observations recorded over time is known as a time series. Time series models use past history to predict the future. The components are the following:

• Secular trend: the tendency of the time series data to increase, decrease or stagnate over a long passage of time. For example: population.
• Seasonal component: the variability in the behavioral pattern during different seasons in a year. For example: sale of ACs and fans.
• Cyclical component: almost synonymous with the business cycle, reflecting the upswing and downswing of the data over extended periods of time. For example: recession.
• Random or irregular component: irregular variations caused by random factors and sporadic causes like strikes, natural disasters and so on.

Q.2. Classify various index numbers and their uses.

Types of Index Numbers

• Simple Index Number: A simple index number is a number that measures a relative change in a single variable with respect to a base. These types of index numbers are constructed from a single item only.
• Composite Index Number: A composite index number is a number that measures an average relative change in a group of related variables with respect to a base. A composite index number is built from changes in a number of different items.


• Price Index Numbers: Price index numbers measure the relative changes in prices of a commodity between two periods. Prices can be either retail or wholesale. A price index number is useful to comprehend and interpret varying economic and business conditions over time.
• Quantity Index Numbers: These types of index numbers measure changes in the physical quantity of goods produced, consumed or sold for an item or a group of items.

Index numbers are indispensable tools of economic and business analysis. Their features and significance can be appreciated from the following points:
• Index numbers help in measuring relative changes in a set of items.
• Index numbers provide a good basis of comparison because they are expressed in abstract units distinct from the units of the elements.
• Index numbers help in framing suitable policies for business and economic activities.
• Index numbers help in measuring the general trend of a phenomenon.
• Index numbers are used in deflating. They are used to adjust the original data for price changes or to adjust wages for cost of living changes.
• Index numbers are specialized averages which are capable of being expressed as percentages.
• Index numbers measure the changes in the level of a given phenomenon.
• Index numbers measure the effect of changes over a period of time.

Q.3. Distinguish between Laspeyres' and Paasche's index number formulae.

Proposed by
  Laspeyres' index number: Etienne Laspeyres
  Paasche's index number: Hermann Paasche

Proposed for
  Laspeyres' index number: Measuring current prices or quantities in relation to those of a selected base period.
  Paasche's index number: Measuring current price or quantity levels relative to those of a selected base period.

Definition
  Laspeyres' index number: A form of index number where prices, quantities or other units of measure over time are weighted according to their values in a specified base period.
  Paasche's index number: A ratio that compares the total purchase cost of a specified bundle of current-period commodities (commodities valued at current prices) with the value of those same commodities at base-period prices; this ratio is multiplied by 100.

Formulae (p = price, q = quantity; subscript 0 = base period, 1 = current period)
  Laspeyres' index number: P01 = (Σp1q0 / Σp0q0) × 100
  Paasche's index number: P01 = (Σp1q1 / Σp0q1) × 100

Tests of Adequacy
  Laspeyres' index number: Unit Test – Satisfied; Time Reversal Test – Not Satisfied; Factor Reversal Test – Not Satisfied
  Paasche's index number: Unit Test – Satisfied; Time Reversal Test – Not Satisfied; Factor Reversal Test – Not Satisfied
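
A small worked sketch of the two price indices follows. It assumes the standard definitions, Laspeyres P01 = (Σp1q0 / Σp0q0) × 100 with base-period quantities as weights and Paasche P01 = (Σp1q1 / Σp0q1) × 100 with current-period quantities as weights. The prices and quantities are hypothetical.

```python
# Hypothetical prices (p) and quantities (q) for three commodities
p0 = [10, 20, 5]   # base-period prices
q0 = [4, 3, 10]    # base-period quantities
p1 = [12, 25, 6]   # current-period prices
q1 = [5, 2, 12]    # current-period quantities

def weighted_sum(prices, quantities):
    """Total value of a bundle: sum of price * quantity."""
    return sum(p * q for p, q in zip(prices, quantities))

# Laspeyres: base-period quantities are the weights
laspeyres = weighted_sum(p1, q0) / weighted_sum(p0, q0) * 100
# Paasche: current-period quantities are the weights
paasche = weighted_sum(p1, q1) / weighted_sum(p0, q1) * 100

print(f"Laspeyres price index = {laspeyres:.2f}")
print(f"Paasche price index   = {paasche:.2f}")
```

The only difference between the two calculations is which period's quantities are used as weights.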

Q.4. Why is Fisher's index popularly known as the 'Ideal Index'?

Ans) Fisher's index is popularly known as the 'Ideal Index' because:

• It is based on variable weights.
• It takes into consideration the prices and quantities of both the base year and the current year.
• It is based on the geometric mean, which is regarded as the best mean for calculating index numbers.
• It satisfies the unit test, the time reversal test and the factor reversal test.
• It considers all possible combinations of p and q.
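
The time reversal and factor reversal properties can be verified numerically. The sketch below assumes the usual definition of Fisher's index as the geometric mean of the Laspeyres and Paasche indices, works with ratios (not multiplied by 100), and uses hypothetical prices and quantities.

```python
import math

# Hypothetical prices (p) and quantities (q) for three commodities
p0, q0 = [10, 20, 5], [4, 3, 10]   # base period
p1, q1 = [12, 25, 6], [5, 2, 12]   # current period

def value(prices, quantities):
    """Total value of a bundle: sum of price * quantity."""
    return sum(p * q for p, q in zip(prices, quantities))

def fisher(p_base, q_base, p_cur, q_cur):
    """Fisher's index as a ratio: sqrt(Laspeyres * Paasche)."""
    laspeyres = value(p_cur, q_base) / value(p_base, q_base)
    paasche = value(p_cur, q_cur) / value(p_base, q_cur)
    return math.sqrt(laspeyres * paasche)

p01 = fisher(p0, q0, p1, q1)   # price index, base 0 -> current 1
p10 = fisher(p1, q1, p0, q0)   # same index with the periods interchanged
q01 = fisher(q0, p0, q1, p1)   # quantity index: roles of p and q swapped
value_ratio = value(p1, q1) / value(p0, q0)

print("Time reversal:   P01 * P10 =", round(p01 * p10, 6))
print("Factor reversal: P01 * Q01 =", round(p01 * q01, 6),
      "vs value ratio", round(value_ratio, 6))
```

The product P01 × P10 equals 1 and P01 × Q01 equals the value ratio Σp1q1 / Σp0q0, which is what the two tests require.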

Q.5. Differentiate between Time reversal and factor reversal tests. (OR)

Q.5. Discuss Tests of adequacy.


Ans)

Unit Test: This test requires that the index number be independent of the units of measurement. Except for the simple (unweighted) aggregate index, all other formulae satisfy this test.

Q.6.Differentiate between Fixed base index and chain based index.


Q.7. Discuss the methods of trend analysis in time series. Also elaborate the method of least
square for linear and non-linear equation.

Sol. The following methods are used for analyzing the trend:

• Free hand method
• Semi average method
• Moving average method
• Method of least squares

Method of least squares (Trend Analysis - Linear)

The trend line is Y = a + bx.

The normal equations are
ΣY = na + bΣx
ΣxY = aΣx + bΣx²

When n is odd
x = (X − middle year) / Interval in X

When n is even
x = 2 (X − average of the two middle years) / Interval in X
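
A minimal sketch of the linear trend fit is shown below for an odd number of years. Because x is coded as deviations from the middle year, Σx = 0 and the normal equations ΣY = na + bΣx and ΣxY = aΣx + bΣx² reduce to a = ΣY / n and b = ΣxY / Σx². The years and sales figures are hypothetical.

```python
years = [2018, 2019, 2020, 2021, 2022]   # hypothetical years (n is odd)
sales = [12, 15, 19, 22, 27]             # hypothetical sales figures

n = len(years)
middle_year = years[n // 2]
# Coded time variable: x = (X - middle year) / interval, interval = 1 year
x = [year - middle_year for year in years]

# With sum(x) == 0 the normal equations simplify to:
a = sum(sales) / n
b = sum(xi * yi for xi, yi in zip(x, sales)) / sum(xi ** 2 for xi in x)

def trend(year):
    """Trend value Y = a + b*x for a given calendar year."""
    return a + b * (year - middle_year)

print(f"Trend line: Y = {a} + {b}x (origin at {middle_year})")
print("Estimated trend value for 2023:", round(trend(2023), 2))
```

For these figures a = 19 and b = 3.7, so the estimated trend value for 2023 (x = 3) is 30.1.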

• Method of semi average

This method divides the data into two parts, preferably equal, and averages the data in each part. In this way we obtain two points on the graph. The line obtained by joining these points is the required trend line.

• Moving average method

It is a series of successive averages of m terms at a time, continued until the whole time series is exhausted.
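
The moving average method can be sketched in a few lines. The helper name `moving_average` and the sales figures are hypothetical; a period of m = 3 is assumed.

```python
def moving_average(series, m):
    """Successive averages of m consecutive terms of the series."""
    return [sum(series[i:i + m]) / m for i in range(len(series) - m + 1)]

sales = [12, 15, 19, 22, 27]        # hypothetical yearly sales
ma3 = moving_average(sales, 3)      # 3-yearly moving averages

print([round(value, 2) for value in ma3])
```

A series of n terms gives n − m + 1 moving averages, so some trend values at the ends of the series are always lost.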

Least squares method for non-linear trend

The least squares method is a form of mathematical regression analysis that finds the line or curve of best fit for a dataset, providing a visual demonstration of the relationship between the data points. Each point of data is representative of the relationship between a known independent variable and an unknown dependent variable. A non-linear trend can be fitted in the following forms:


• Parabolic
• Quadratic
• Hyperbolic
• Exponential
• Logarithmic

Q.8. Write the applications of time series and index numbers in management.

Applications of time series

• It helps in understanding past behavior.
• It helps in planning future operations.
• It facilitates comparison.
• It helps in forecasting and control.
• It helps in formulating strategies.
• It leads to better decision making.
• Methods used include the method of least squares, moving averages etc.

Applications of index numbers

• Index numbers are used for comparison.
• Index numbers act as economic barometers.
• Index numbers help in studying trends and tendencies.
• Index numbers help in formulating decisions and policies.
• Price indices measure the purchasing power of money.
• Index numbers are used for deflation.
• Methods used include Laspeyres', Paasche's and Fisher's formulae etc.


UNIT III
Correlation Analysis: Rank Method & Karl Pearson's Coefficient of Correlation and
Properties of Correlation. Regression Analysis: Fitting of a Regression Line and
Interpretation of Results, Properties of Regression Coefficients and Relationship between
Regression and Correlation.

Q.1. Distinguish between partial and multiple correlation.


Sol. In multiple correlation, all the variables are allowed to vary simultaneously during the study, whereas in partial correlation at least one variable is held constant and the rest are allowed to vary. The relationship between the variables is then measured accordingly.
Partial Correlation
• It measures the correlation between a dependent variable and one particular independent variable when all the remaining variables involved are kept constant, i.e., when the effect of all other variables is removed.
• The notation used to express partial correlation is r12.3, where the primary subscripts 12 show that the relation is between X1 and X2, and the secondary subscript 3 denotes that X3 is kept constant.
• Similarly, r12.34 shows the relation between X1 and X2 where X3 and X4 are kept constant.

Multiple Correlation
• It comes under multivariate analysis.
• Here we establish the relationship between more than two variables simultaneously.
• Here we measure the joint effect of X2 and X3 on X1.
R1.23 – Multiple correlation coefficient of X1 on X2 and X3. Here X1 is the dependent variable and X2 & X3 are independent.
R2.13 – Multiple correlation coefficient of X2 on X1 and X3. Here X2 is the dependent variable and X1 & X3 are independent.
R3.12 – Multiple correlation coefficient of X3 on X1 and X2. Here X3 is the dependent variable and X1 & X2 are independent.
• The multiple correlation coefficient is never negative.
• It always lies between 0 and 1.
• The coefficient of multiple correlation R1.23 is at least as large in magnitude as either of the simple correlations r12 or r13.

Q.2. Define Regression. Give important uses and applications of regression analysis.

Sol. Regression analysis is a statistical process for estimating the relationships among variables. It
includes many techniques for modeling and analyzing several variables, when the focus is on the
relationship between a dependent variable and one or more independent variables. More specifically,
regression analysis helps one understand how the typical value of the dependent variable changes when
any one of the independent variables is varied, while the other independent variables are held fixed. With two variables, regression is of two types: Y on X and X on Y.

Y on X

Here Y is dependent variable and X is independent variable. Its standard equation is:

Y=a+bX

X on Y

Here X is dependent variable and Y is independent variable. Its standard equation is:

X=a+bY


Uses of regression:

1. To model a phenomenon in order to better understand it and possibly use that understanding to affect policy or to make decisions about appropriate actions to take. Basic objective: to measure the extent to which changes in one or more variables jointly affect changes in another.
2. To model a phenomenon in order to predict its values at other places or other times. Basic objective: to build a prediction model that is consistent and accurate. Example: where are real estate values likely to go up next year?
3. To test hypotheses. Suppose you are modeling residential crime in order to better understand it, and hopefully to implement policy to prevent it.
4. Decision making
5. Comparison
6. Prediction and estimation
7. Relationship between variables
8. Estimating a variable based on another variable
9. Business forecasting
10. Linear and non-linear regression
11. Line of best fit

We can use the technique of correlation to test the statistical significance of the association (or relationship) between variables. In other cases we use regression analysis to describe the relationship precisely by means of an equation that has predictive value. We deal separately with these two types of analysis - correlation and regression - because they have different roles.

Q.3. Define the term correlation. Discuss various methods of measuring correlation, types
and their uses.

Sol. Correlation

• Correlation denotes the degree of interdependence between variables or the tendency of simultaneous variation between variables.
• The measure of correlation is called the correlation coefficient.
• The degree of relationship is expressed by a coefficient that ranges from −1 to +1 (−1 ≤ r ≤ +1).
• Correlation is a statistical tool that helps to measure and analyze the degree of relationship between two variables.
• Correlation analysis deals with the association between two or more variables.

Measures of Correlation

• Scatter Diagram Method
• Karl Pearson's Coefficient of Correlation (r)
• Spearman's Coefficient of Rank Correlation
• Concurrent Deviation Method

1. Scatter Diagram Method It is a graphical method of finding the correlation between variables. Here the pairs of observations are plotted as points in a 2-D space. From the pattern formed by these points we can get an idea about the relationship between the variables.


2. Karl Pearson's coefficient of correlation (r) The value of r lies between −1 and +1, i.e., −1 ≤ r ≤ +1. The coefficient of correlation is independent of change of origin and scale. The coefficient 'r' is symmetric: rxy = ryx. The probable error of 'r' is used in interpreting its estimated value. There should be a sufficient number of items in the series. Correlation does not necessarily mean a cause & effect relationship.

Both the correlated variables may be affected by a third variable. The correlation may be due to random or chance factors. There may be a situation of nonsense or spurious correlation between two variables.
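
The properties listed above (range, symmetry, independence of origin and scale) can be checked numerically. The sketch below assumes the standard product-moment formula r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²], which is not written out in these notes, and uses hypothetical height and weight data.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

height = [150, 160, 165, 170, 180]   # hypothetical heights (cm)
weight = [50, 58, 61, 68, 77]        # hypothetical weights (kg)

r = pearson_r(height, weight)
r_swapped = pearson_r(weight, height)   # symmetry: rxy = ryx
# Change of origin and scale: u = (x - 150) / 5, v = 2y + 1
r_coded = pearson_r([(h - 150) / 5 for h in height],
                    [2 * w + 1 for w in weight])

print("r =", round(r, 4))
print("ryx =", round(r_swapped, 4), " r after coding =", round(r_coded, 4))
```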

3. Spearman's Coefficient of Rank Correlation Karl Pearson's method deals with the relationship between quantitative variables, whereas Spearman's coefficient is suitable for qualitative variables, for example when two judges rank the participants in a contest and we want to measure the relationship between the ranks given by these judges.

ρ = 1 − 6 ∑D² / (N (N² − 1))

4. Concurrent Deviation Method This is the simplest method, in which only the direction of change is taken into consideration rather than the magnitude of variation. It quickly gives a general idea about the correlation between variables.

Types of Correlation:

1. Positive & Negative
2. Linear & Non-linear
3. Multiple & Partial

Positive relationship – Variables change in the same direction. E.g., as height increases, so does weight.

Negative relationship – Variables change in opposite directions. E.g., as TV time increases, grades decrease.

Multiple correlation: Under multiple correlation three or more variables are studied.

Partial correlation: The analysis recognizes more than two variables but considers only two of them, keeping the others constant.

Linear correlation: Correlation is said to be linear when the amount of change in one variable tends to bear a constant ratio to the amount of change in the other.


Non Linear correlation: The correlation would be non linear if the amount of change in one
variable does not bear a constant ratio to the amount of change in the other variable.

Uses of correlation

1. It helps in measuring the relationship in one figure only.
2. Existence of a relationship enables us to predict the future.
3. We can estimate the value of one variable based on the value of the other variable.
4. It helps in decision making.

Q.4. Elaborate the relationship between correlation and regression.

OR

Q.4. Compare the properties of correlation and regression coefficient.

1. Correlation: It studies the relationship between two or more variables.
   Regression: It is used to estimate/forecast the value of one variable based on another variable.

2. Correlation: Methods are Scatter Diagram Method, Karl Pearson's Coefficient of Correlation (r), Spearman's Coefficient of Rank Correlation and Concurrent Deviation Method.
   Regression: Methods are the method of least squares and the regression coefficient method.

3. Correlation: The value of r lies between −1 and +1, i.e., −1 ≤ r ≤ +1.
   Regression: Both the regression coefficients cannot be greater than one simultaneously.

4. Correlation: The coefficient 'r' is symmetric: rxy = ryx.
   Regression: The regression coefficient is not symmetric: bxy is not equal to byx.

5. Correlation: The coefficient of correlation is independent of change of origin and scale.
   Regression: Both the regression coefficients are independent of change of origin but not of scale.

6. Correlation: 'r' is a single numerical value which depicts the strength of the relationship.
   Regression: The regression coefficient depicts the rate of change.

7. Correlation: A value of r = ±1 means perfect correlation.
   Regression: byx and bxy may be used to derive the regression equations of y on x and x on y.

The relation between r, byx and bxy is r = ±√(byx · bxy)
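
The relation r = ±√(byx · bxy) can be illustrated numerically. The sketch assumes the usual least-squares expressions byx = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and bxy = Σ(x − x̄)(y − ȳ) / Σ(y − ȳ)², which are not spelled out in these notes, and the data are hypothetical. The sign of r is taken to be the common sign of the two regression coefficients.

```python
import math

x = [1, 2, 3, 4, 5]   # hypothetical values of X
y = [2, 4, 5, 4, 5]   # hypothetical values of Y

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

byx = sxy / sxx   # regression coefficient of Y on X
bxy = sxy / syy   # regression coefficient of X on Y

# r carries the common sign of byx and bxy
r = math.copysign(math.sqrt(byx * bxy), byx)

# Regression line of Y on X, Y = a + bX, passes through (mean_x, mean_y)
a_yx = mean_y - byx * mean_x

print("byx =", byx, " bxy =", bxy)
print("r =", round(r, 4))
print(f"Y on X: Y = {a_yx:.2f} + {byx:.2f} X")
```

For these figures byx = 0.6 and bxy = 1.0, so only one coefficient reaches one, and r ≈ 0.7746 agrees with the value obtained directly from the product-moment formula.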


Q.5. Define Spearman's rank correlation coefficient.

Sol. Karl Pearson's method deals with the relationship between quantitative variables, whereas Spearman's coefficient is suitable for qualitative variables, for example when two judges rank the participants in a contest and we want to measure the relationship between the ranks given by these judges.

R = 1 − (6 ∑D²) / (N (N² − 1))

R = Rank correlation coefficient

D = Difference of ranks between paired items in the two series

N = Total number of observations
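
A minimal sketch of the formula for the untied case is given below. The function name and the ranks awarded by the two judges are hypothetical.

```python
def spearman_rank_correlation(rank_a, rank_b):
    """R = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), valid when there are no ties."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical ranks given to five participants by two judges
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]

rho = spearman_rank_correlation(judge_1, judge_2)
perfect = spearman_rank_correlation(judge_1, judge_1)         # identical ranks
opposite = spearman_rank_correlation(judge_1, judge_1[::-1])  # reversed ranks

print("Rank correlation between the judges:", rho)
print("Identical ranks:", perfect, " Reversed ranks:", opposite)
```

Here ∑D² = 4, so R = 1 − 24/120 = 0.8; identical rankings give +1 and completely reversed rankings give −1.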

Equal ranks or ties in ranks:

In such cases the average rank should be assigned to each of the tied individuals.

R = 1 − 6 (∑D² + AF) / (N (N² − 1))

AF = 1/12 (m1³ − m1) + 1/12 (m2³ − m2) + …. + 1/12 (m3³ − m3)

m = The number of times an item is repeated.

Merits of Spearman's Rank Correlation

• This method is simpler to understand and easier to apply compared to Karl Pearson's correlation method.
• This method is useful where we can give the ranks and not the actual data (qualitative terms).
• This method is to be used where the initial data are in the form of ranks.

Limitations of Spearman's Rank Correlation

• It cannot be used for finding correlation in a grouped frequency distribution.
• This method should not be applied where N exceeds 30, as the calculations become tedious.


UNIT IV
Probability: Theory of Probability, Addition and Multiplication Law, Bayes' Theorem,
Probability Theoretical Distributions: Concept and application of Binomial; Poisson and
Normal distributions.

Q.1. State addition theorem of probability.


Sol. The addition theorem gives the probability that either event 'A' or event 'B' occurs, or
that both occur. The combined event is written with the symbol '∪', read as "union".
In set notation the theorem is written as
P(A ∪ B) = P(A) + P(B) - P(A ∩ B),
where
P(A) = probability of occurrence of event 'A'
P(B) = probability of occurrence of event 'B'
P(A ∪ B) = probability of occurrence of event 'A' or event 'B' (or both)
P(A ∩ B) = probability of occurrence of both event 'A' and event 'B'.
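Illustration: a small worked example of the addition theorem, assuming one card is drawn from a standard 52-card pack (the card example and the helper name `p_union` are chosen for illustration only). Exact fractions are used so that the arithmetic can be checked by hand.

```python
from fractions import Fraction


def p_union(p_a, p_b, p_a_and_b):
    """Addition theorem: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


# A = the card is a king, B = the card is a heart.
p_king = Fraction(4, 52)
p_heart = Fraction(13, 52)
p_king_of_hearts = Fraction(1, 52)  # the only card that is both

p_king_or_heart = p_union(p_king, p_heart, p_king_of_hearts)
print(p_king_or_heart)  # 4/13
```

The king of hearts is counted in both P(A) and P(B), which is why P(A ∩ B) is subtracted once: 4/52 + 13/52 - 1/52 = 16/52 = 4/13.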

Q.2. Explain Multiplication theorem and Bayes' theorem.

Sol. Multiplication theorem

The multiplication law of probability applies to combinations of events. When events have to
occur together, we use the multiplication law. Two cases arise, depending on
whether the events are independent or dependent.

Multiplication or Conditional Probability

• The probability of an event B when it is known that the event A has already occurred is
P(B|A) = P(A∩B) / P(A), if P(A) > 0
i.e. P(A∩B) = P(A) · P(B|A)
• If A and B are independent events:
P(A∩B) = P(A) · P(B)
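Illustration: both cases of the multiplication law, sketched with exact fractions. The card and coin examples are assumptions chosen for the illustration.

```python
from fractions import Fraction

# Dependent events: two aces drawn one after another WITHOUT replacement.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # one ace and one card already removed
p_two_aces = p_first_ace * p_second_ace_given_first
print(p_two_aces)  # 1/221

# Independent events: heads on each of two tosses of a fair coin.
p_head = Fraction(1, 2)
p_two_heads = p_head * p_head
print(p_two_heads)  # 1/4
```

In the first case the second factor is a conditional probability because the first draw changes the pack; in the second case the tosses do not affect each other, so the plain probabilities are multiplied.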

Bayes' Theorem

Suppose an event E can occur only if one of a set of exhaustive and mutually exclusive events E1,
E2, …, En occurs. If the probabilities P(E1), P(E2), …, P(En) and the conditional probabilities P(E|Ei) of
E given each Ei are known, then the conditional probability P(Ei|E) is given by

P(Ei|E) = P(Ei) · P(E|Ei) / Σ P(Ej) · P(E|Ej), where the sum runs over j = 1, 2, …, n.
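Illustration: a minimal sketch of Bayes' theorem for a hypothetical factory with three machines. The machine shares and defect rates are invented for the example, and `bayes_posteriors` is an illustrative helper name.

```python
from fractions import Fraction


def bayes_posteriors(priors, likelihoods):
    """Return P(Ei | E) for every i, given P(Ei) and P(E | Ei)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(Ei) * P(E | Ei)
    total = sum(joint)                                    # P(E), the denominator
    return [j / total for j in joint]


# Hypothetical data: share of output made by machines 1, 2, 3 ...
priors = [Fraction(50, 100), Fraction(30, 100), Fraction(20, 100)]
# ... and the probability that an item from each machine is defective.
likelihoods = [Fraction(2, 100), Fraction(3, 100), Fraction(5, 100)]

posteriors = bayes_posteriors(priors, likelihoods)
for machine, p in enumerate(posteriors, start=1):
    print(f"P(machine {machine} | defective) = {p}")
```

Here P(E) = 0.010 + 0.009 + 0.010 = 0.029, so a defective item came from machines 1, 2 and 3 with probabilities 10/29, 9/29 and 10/29, which add up to 1.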

Q.3. What is meant by theoretical distribution? Define and compare the properties of the
following distribution: a) Binomial Distribution, b) Poisson Distribution, c) Normal
Distribution

a) BINOMIAL DISTRIBUTION


A binomial distribution is very different from a normal distribution, and yet if the sample size is
large enough, the shapes will be quite similar.

The key difference is that a binomial distribution is discrete, not continuous. In other words, it is
NOT possible to find a data value between two adjacent data values: the count of successes can be
2 or 3, but never 2.5.

The requirements for a binomial distribution are


1) The random variable of interest is the count of successes in n trials
2) The number of trials (or sample size), n, is fixed
3) Trials are independent, with fixed value p = P(success on a trial)
4) There are only two possible outcomes on each trial, called "success" and "failure." (This is
where the "bi" prefix in "binomial" comes from).
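Illustration: under the four requirements above, the probability of exactly x successes is P(x) = nCx p^x q^(n-x), with q = 1 - p. A minimal sketch, assuming a fair coin tossed five times (the example values are assumptions):

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * q^(n-x): probability of exactly x successes in n trials."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)


n, p = 5, 0.5  # five tosses of a fair coin, "success" = heads
for x in range(n + 1):
    print(x, binomial_pmf(x, n, p))

print("mean =", n * p, "variance =", n * p * (1 - p))
```

The six probabilities add up to 1, and the mean np = 2.5 is larger than the variance npq = 1.25.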

b) POISSON DISTRIBUTION

The Poisson distribution is another discrete probability distribution with specific uses. It is a
limiting case of the binomial distribution: when 'n' is large and 'p' is small, the binomial distribution
tends towards the Poisson distribution. It is defined as

P(x) = e^(-λ) λ^x / x!,  x = 0, 1, 2, …

Properties
• Statistical independence is assumed.
• Each trial has a constant probability.
• It applies when p, the probability of occurrence, is very small and n is very large.
• The only parameter is λ = np.
• The mean and variance of the Poisson distribution are both λ.
• Each trial has only two possible outcomes (dichotomy).
• It is a limiting case of the binomial distribution.
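Illustration: the limiting-case property can be checked numerically. The sketch below compares binomial probabilities with Poisson probabilities for a large n and a small p; the values n = 1000 and p = 0.002 are assumptions chosen for the example.

```python
from math import comb, exp, factorial


def poisson_pmf(x, lam):
    """P(x) = e^(-lambda) * lambda^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)


def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * q^(n-x)"""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 1000, 0.002  # large n, small p
lam = n * p         # the single Poisson parameter, lambda = np

for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 5), round(poisson_pmf(x, lam), 5))
```

For each x the two columns printed differ only in the fourth decimal place, which is why the Poisson formula is used in place of the binomial when n is large and p is small.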

c) NORMAL DISTRIBUTION

The normal (z) distribution is a continuous distribution that arises in many natural processes.
"Continuous" means that between any two data values we could (at least in theory) find another
data value. For example, men's heights vary continuously and are the result of so many tiny
random influences that the overall distribution of men's heights in America is very close to
normal. Another example is the data values that we would get if we repeatedly measured the
mass of a reference object on a pan balance: the readings would differ slightly because of
random errors, and the readings taken as a whole would have a normal distribution.

The bell-shaped normal curve has probabilities that are found as the area between any
two z values.
Not all natural processes produce normal distributions.

Properties

• The normal curve is symmetrical about the vertical axis through the mean.
• Mean (µ) and SD (σ) are the parameters of the distribution.
• The curve is asymptotic to the X-axis.


• Problems related to the normal distribution can be solved by using the properties of the
normal curve.
• The random variable X should be transformed to the standard normal variable 'Z' using
Z = (X - µ)/σ
• After the transformation, the probability (or area) can be found using the normal distribution
table.
• The total area under the normal curve is 1, which is divided into two equal halves by the
vertical axis through the mean.
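Illustration: the transformation Z = (X - µ)/σ followed by an area lookup can be sketched with Python's standard library, where `NormalDist().cdf` plays the role of the normal distribution table. The values µ = 50 and σ = 10 are assumptions for the example.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Transform X to the standard normal variable Z = (X - mu) / sigma."""
    return (x - mu) / sigma


def prob_between(a, b, mu, sigma):
    """P(a < X < b): area under the standard normal curve between the two z values."""
    standard = NormalDist()  # mean 0, standard deviation 1
    return standard.cdf(z_score(b, mu, sigma)) - standard.cdf(z_score(a, mu, sigma))


mu, sigma = 50, 10  # hypothetical mean and standard deviation
print(z_score(60, mu, sigma))                      # 1.0
print(round(prob_between(40, 60, mu, sigma), 4))   # area within one SD of the mean
print(NormalDist().cdf(0))                         # 0.5, half of the total area
```

The last line confirms that the vertical axis through the mean splits the total area of 1 into two halves of 0.5 each.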

Binomial Distribution | Poisson Distribution | Normal Distribution
P(x) = nCx p^x q^(n-x) | P(x) = e^(-λ) λ^x / x! | P(x) = (1 / (σ√(2π))) e^(-(x-µ)² / (2σ²))
x = 0, 1, 2, …, n | x = 0, 1, 2, … | -∞ < x < ∞; the standard normal variable Z = (x - µ)/σ is used
Discrete distribution | Discrete distribution | Continuous distribution
Independence, dichotomy, constant probability and identical conditions | Independence, dichotomy, constant probability and identical conditions | Properties of the normal curve are used. Total area under the curve is 1, divided into 2 equal halves (0.5 each)
Mean = np, variance = npq; mean > variance | Mean = λ, variance = λ; mean = variance | Mean = µ, variance = σ²
Parameters = n, p | Parameter = λ | Parameters = µ, σ
Used for smaller values of n | Used when n is large and p is small (limiting case of the binomial distribution) | Used for any value of n and x


Q.4. Differentiate between permutation and combination.

Sol.
Basis | Permutation | Combination
Meaning | Permutation refers to the different ways of arranging a set of objects in a sequential order. | Combination refers to the ways of choosing items from a larger set of objects, such that their order does not matter.
Order | Relevant | Irrelevant
Denotes | Arrangement | Selection
What is it? | Ordered elements | Unordered sets
Formula | P(n,r) = n! / (n - r)! | nCr = n! / (r! (n - r)!)
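Illustration: the two formulas can be checked with Python's built-in `math.perm` and `math.comb` (available from Python 3.8). The example of choosing 3 people out of 5 is an assumption made for the illustration.

```python
from math import comb, factorial, perm

n, r = 5, 3  # choose 3 people out of 5

# Permutation: order matters (e.g. president, secretary, treasurer).
arrangements = perm(n, r)
print(arrangements)  # 60

# Combination: order does not matter (e.g. a committee of three).
selections = comb(n, r)
print(selections)  # 10

# Each selection of r items can be arranged in r! ways, so P(n,r) = nCr * r!
print(selections * factorial(r) == arrangements)  # True
```

The last line shows the link between the two: every unordered selection of 3 items corresponds to 3! = 6 ordered arrangements.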

Q.5. What is normal curve? State the properties of normal curve.


Sol. Normal Curve

A normal curve is a bell-shaped curve which shows the probability distribution of a continuous
random variable. Moreover, the normal curve represents a normal distribution. The total area
under the normal curve logically represents the sum of all probabilities for a random variable.

Properties of Normal Curve

The bell-shaped normal curve has probabilities that are found as the area between any
two z values. Not all natural processes produce normal distributions.

• The normal curve is symmetrical about the vertical axis through the mean.

• Mean (µ) and SD (σ) are the parameters of the distribution.

• The curve is asymptotic to the X-axis.

• Problems related to the normal distribution can be solved by using the properties of the normal
curve.


• The random variable X should be transformed to the standard normal variable 'Z' using

Z = (X - µ)/σ

• After the transformation, the probability (or area) can be found using the normal distribution
table.

• The total area under the normal curve is 1, which is divided into two equal halves by the
vertical axis through the mean.

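Q.6. Write a comparison between mutually exclusive and independent events.

Sol. If A and B are mutually exclusive, they cannot occur together: A ∩ B = ∅, so
P(A ∩ B) = 0

If A and B are independent events, the occurrence of one does not change the probability of the other:

P(A ∩ B) = P(A) · P(B)

Hence two events that each have a non-zero probability cannot be both mutually exclusive and independent.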
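Illustration: the two conditions can be tested on the throw of a fair die. This is a minimal sketch; the events and the helper names (`prob`, `mutually_exclusive`, `independent`) are assumptions made for the example.

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}  # one throw of a fair die


def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))


def mutually_exclusive(a, b):
    """True when A and B have no outcome in common, so P(A and B) = 0."""
    return len(a & b) == 0


def independent(a, b):
    """True when P(A and B) = P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)


even = {2, 4, 6}
low = {1, 2}
high = {5, 6}

print(mutually_exclusive(even, low), independent(even, low))  # False True
print(mutually_exclusive(low, high), independent(low, high))  # True False
```

"Even" and "low" share the outcome 2 and satisfy 1/6 = 1/2 × 1/3, so they are independent but not mutually exclusive. "Low" and "high" cannot occur together, so P(A ∩ B) = 0 while P(A) · P(B) = 1/9, and they are mutually exclusive but not independent.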


UNIT V
Hypothesis Testing: Null and Alternative Hypotheses; Type I and Type II errors; Testing of
Hypothesis: Large Sample Tests, Small Sample test, (t, F, Z Test and Chi Square Test)
Concept of Business Analytics - Meaning, types and application of Business Analytics, Use of
Spread Sheet to analyze data - Descriptive analytics and Predictive analytics.

Q.1. Define null hypothesis, critical region and two sided test, used in testing of statistical
hypothesis.

Sol.
Null Hypothesis: In statistical inference of observed data of a scientific experiment, the null hypothesis refers to a
general or default position: that there is no relationship between two measured phenomena, or that a potential
medical treatment has no effect. Rejecting or disproving the null hypothesis – and thus concluding that there are
grounds for believing that there is a relationship between two phenomena or that a potential treatment has a
measurable effect – is a central task in the modern practice of science, and gives a precise sense in which a claim
is capable of being proven false.
Example

Given the test scores of two random samples of men and women, does one group differ from the other? A possible
null hypothesis is that the mean male score is the same as the mean female score:

H0: μ1 = μ2

where:

H0 = the null hypothesis

μ1 = the mean of population 1, and

μ2 = the mean of population 2.

A stronger null hypothesis is that the two samples are drawn from the same population, such that the variance and
shape of the distributions are also equal.

Critical Region

The critical region CR, or rejection region RR, is a set of values of the test statistic for which the null hypothesis is
rejected in a hypothesis test. That is, the sample space for the test statistic is partitioned into two regions; one region
(the critical region) will lead us to reject the null hypothesis H0, the other will not. So, if the observed value of the
test statistic is a member of the critical region, we conclude "Reject H0"; if it is not a member of the critical region
then we conclude "Do not reject H0".

Two-tail Test


In statistical significance testing, a one-tailed test and a two-tailed test are alternative ways of computing the statistical
significance of a data set in terms of a test statistic, depending on whether only one direction is considered extreme
(and unlikely) or both directions are considered extreme. Alternative names are one-sided and two-sided tests; the
term "tail" is used because the extremes of distributions are often small, as in the normal distribution or "bell
curve".

If the test statistic is always positive (or zero), only the one-tailed test is generally applicable, while if the test
statistic can assume positive and negative values, both the one-tailed and two-tailed test are of use.

Figure: A two-tailed test corresponds to both extreme negative and extreme positive directions of the test statistic,
here the normal distribution.
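Illustration: for a test statistic that follows the standard normal distribution, the critical region for a one-tailed or two-tailed test can be sketched as below. The function names and the `tail` labels ("two", "left", "right") are assumptions made for this example.

```python
from statistics import NormalDist


def critical_region(alpha, tail="two"):
    """Return (lower, upper) cut-offs; H0 is rejected below lower or above upper."""
    standard = NormalDist()
    if tail == "two":
        z = standard.inv_cdf(1 - alpha / 2)  # alpha is split between both tails
        return (-z, z)
    if tail == "right":
        return (None, standard.inv_cdf(1 - alpha))
    if tail == "left":
        return (standard.inv_cdf(alpha), None)
    raise ValueError("tail must be 'two', 'left' or 'right'")


def in_critical_region(z_stat, alpha, tail="two"):
    """True means 'Reject H0'; False means 'Do not reject H0'."""
    lower, upper = critical_region(alpha, tail)
    below = lower is not None and z_stat < lower
    above = upper is not None and z_stat > upper
    return below or above


print(critical_region(0.05, "two"))             # about (-1.96, 1.96)
print(in_critical_region(1.8, 0.05, "two"))     # False
print(in_critical_region(1.8, 0.05, "right"))   # True
```

The same observed value z = 1.8 falls outside the two-tailed critical region but inside the right-tailed one, because the one-tailed test places the whole of α in a single tail (cut-off about 1.645).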

Q.2. What are the steps in a test of significance problem?

OR

Explain various steps involved in hypothesis testing

Sol. Steps in Testing for Statistical Significance

1) State the null and alternative hypotheses (H0 and H1)
2) Select a probability of error level (alpha level, or level of significance)
3) Select the appropriate distribution to use
4) Select and compute the test statistic for statistical significance
5) Compare the computed value with the table (critical) value
6) Interpret the results and make a decision
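Illustration: the six steps can be followed through for a large-sample (Z) test of a single mean. This is a sketch under stated assumptions: the population standard deviation is taken as known, and all the figures (µ0 = 100, σ = 15, n = 36, sample mean 105) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed Z test of H0: mu = mu0 against H1: mu != mu0."""
    # Steps 3 and 4: the normal distribution is used; compute the test statistic.
    z_stat = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 5: table (critical) value for a two-tailed test at level alpha.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 6: decision.
    decision = "Reject H0" if abs(z_stat) > z_crit else "Do not reject H0"
    return z_stat, z_crit, decision


# Step 1: H0: mu = 100, H1: mu != 100.  Step 2: alpha = 0.05.
z_stat, z_crit, decision = one_sample_z_test(sample_mean=105, mu0=100, sigma=15, n=36)
print(round(z_stat, 2), round(z_crit, 2), decision)
```

Here the standard error is 15/√36 = 2.5, so Z = 5/2.5 = 2.0, which exceeds the table value of about 1.96 and leads to rejecting H0 at the 5% level.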

Q.3. Distinguish between t-test and Z-test

Z-test and t-test are basically the same in purpose: they compare two means to suggest whether both
samples come from the same population. There are, however, variations on the theme for the t-test. If you
have a sample and wish to compare it with a known mean (e.g. a national average), the single-sample t-test
is available. If your two samples are not independent of each other and have some factor in common,
e.g. geographical location or before/after treatment, the paired-sample t-test can be applied. There are also
two variations on the two-sample t-test: the first uses samples that do not have equal variances and the
second uses samples whose variances are equal.

1. In a Z-test the test statistic follows a normal distribution, while in a t-test it follows a Student's
t-distribution.
2. A t-test is appropriate when you are handling small samples (n < 30), while a Z-test is appropriate when
you are handling moderate to large samples (n > 30).
3. t-tests are more commonly used than Z-tests.
4. Z-tests are preferred to t-tests when the population standard deviations are known.
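Illustration: a single-sample t-test on a small sample, where the standard deviation has to be estimated from the sample itself. This is a minimal sketch; the ten observations and the hypothesised mean of 13 are hypothetical, and the critical value 2.262 is the usual t-table entry for 9 degrees of freedom at the 5% level (two-tailed), typed in by hand rather than computed.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return the t statistic and its degrees of freedom (n - 1)."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 divisor
    return (mean(sample) - mu0) / standard_error, n - 1


sample = [12, 14, 15, 13, 16, 14, 15, 13, 14, 14]  # hypothetical small sample, n = 10
t_stat, df = one_sample_t(sample, mu0=13)

T_TABLE_VALUE = 2.262  # t-table: df = 9, 5% level, two-tailed (looked up, not computed)
decision = "Reject H0" if abs(t_stat) > T_TABLE_VALUE else "Do not reject H0"
print(round(t_stat, 3), df, decision)
```

The sample mean is 14 and the sample variance is 12/9, so t = √7.5 ≈ 2.739. With the same data and a known population standard deviation, a Z-test would use the fixed cut-off 1.96 instead of a table value that depends on the degrees of freedom.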


Q.4. Chi- Square Test

Pearson's chi-squared test (χ²) is the best-known of several chi-squared tests (Yates, likelihood
ratio, portmanteau test in time series, etc.), which are statistical procedures whose results are evaluated by
reference to the chi-squared distribution. Its properties were first investigated by Karl Pearson in 1900. In
contexts where it is important to distinguish between the test statistic and its distribution, names
such as Pearson χ-squared test or statistic are used.

Conditions for applying the chi-square test:

• Observations are collected on a random basis.
• All the observations must be independent.
• No group should contain very few items (fewer than 10).

Properties of the chi-square distribution:

• Chi-square is a non-negative value, ranging from 0 to ∞.
• The curve is not symmetrical.
• The only parameter is the degrees of freedom (df).
• Mean = df
• SD = √(2 df)
• When df > 30, the chi-square distribution can be approximated by the normal distribution.
• The critical (table) value of chi-square increases as df increases.

Uses of chi-square:
• Test of goodness of fit
• Test of independence
• Test of variance/SD of a single population
The procedure of the test includes the following steps:
1) Calculate the chi-squared test statistic, χ², which resembles a normalized sum of squared
deviations between observed (O) and theoretical (E) frequencies: χ² = Σ (O - E)² / E.


2) Determine the degrees of freedom, d.f., of that statistic, which is essentially the number of
frequencies reduced by the number of parameters of the fitted distribution.
3) Compare χ² to the critical value from the chi-squared distribution with d.f. degrees of freedom,
which in many cases gives a good approximation of the distribution of χ².
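Illustration: the three steps applied to a goodness-of-fit test. This is a sketch with hypothetical data: a die thrown 60 times, tested against the hypothesis that all six faces are equally likely. The statistic used is the usual Pearson form, the sum of (O - E)²/E, and the critical value 11.070 is the chi-square table entry for 5 degrees of freedom at the 5% level, entered by hand.

```python
def chi_square_statistic(observed, expected):
    """Step 1: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [8, 12, 9, 11, 10, 10]  # hypothetical counts for faces 1..6
expected = [10] * 6                # 60 throws, each face equally likely

chi2 = chi_square_statistic(observed, expected)

# Step 2: number of frequencies minus 1 (the totals must agree); no parameter
# of the fitted distribution was estimated from the data.
df = len(observed) - 1

# Step 3: compare with the table value (df = 5, 5% level; looked up, not computed).
CHI2_TABLE_VALUE = 11.070
decision = "Reject H0" if chi2 > CHI2_TABLE_VALUE else "Do not reject H0"
print(round(chi2, 2), df, decision)
```

The computed value is (4 + 4 + 1 + 1 + 0 + 0)/10 = 1.0, far below the table value, so the data give no reason to doubt that the die is fair. Each expected frequency is 10, in line with the condition that no group should contain very few items.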

Q.5. Types of Error


Sol. Broadly there are two types of errors, Type I and Type II, defined as follows:

Actual situation | Accept H0 | Reject H0
H0 true | Correct decision | Type I error
H0 false | Type II error | Correct decision

The probability of a Type I error is denoted by α and the probability of a Type II error by β.

(1 - β) is called the power of the test.

Q.6. Differentiate between parametric and non-parametric tests.

Sol. When information about the population (its distribution and parameters) is known, we use parametric tests; if there is no
knowledge about the population or its parameters, we use non-parametric tests.

Advantages of Non-Parametric Tests

• Non-parametric tests are simple and easy to understand.
• They do not involve complicated sampling theory.
• No assumption is made regarding the parent population.
• They are the only methods available for nominal scale data.
• They are easily applicable to attribute data.

Disadvantages of Non-Parametric Tests

• They can be applied only to nominal or ordinal scale data.
• For any problem, if a parametric test exists, it is more powerful than the non-parametric alternative.


• Non-parametric methods are not as efficient as parametric tests.
• No non-parametric test is available for testing the interaction in an analysis of variance model.

Q.7. Write a short note on ANOVA.

Sol. ANOVA (analysis of variance) is a statistical technique that analyses variances to evaluate the
differences between three or more sample means. This helps to judge whether the samples come from
populations having the same mean or not.

• ANOVA is classified into one-way and two-way ANOVA.
• ANOVA is a parametric test as it assumes normality.
• An F-test is conducted for performing ANOVA. The F-distribution has a pair of degrees of freedom.
• The hypotheses tested are
H0 : μA = μB = μC = μD
H1 : At least two means are not equal
• The results are summarized in a table called the one-way ANOVA table (table not reproduced here).

• With two factors, the results are presented in a two-way ANOVA table (table not reproduced here).
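Illustration: the quantities that usually make up a one-way ANOVA table (sums of squares, degrees of freedom, mean squares and the F ratio) can be computed as sketched below. The three small samples are hypothetical, and the dictionary keys are names chosen for this example, not a standard layout taken from the source.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)

    # Variation between sample means and variation within the samples.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1               # first degree of freedom of F
    df_within = len(all_values) - len(groups)  # second degree of freedom of F

    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "df_between": df_between, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": df_within, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


samples = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]  # hypothetical samples A, B, C
table = one_way_anova(samples)
for name, value in table.items():
    print(name, value)
```

For these samples both sums of squares equal 6, with 2 and 6 degrees of freedom, so F = 3/1 = 3. The computed F is then compared with the F-table value for (2, 6) degrees of freedom at the chosen level of significance.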

Q.8. Under what condition do we make use of F-test?

• The F-test is based on the F-distribution, named in honour of the statistician R. A. Fisher.

• This test is suitable for testing the significance of the difference between two sample estimates of variance.

• Since the F-test is based on the ratio of two variances, it is also known as the variance ratio test.

• The test checks the hypothesis that the dispersions of two random variables X and Y,
represented by samples xS and yS, are equal. The test works correctly under the
following conditions:

• both random variables have a normal distribution
• the samples are independent

The test calculates the F-statistic as the ratio of the two sample variances.


If X and Y have a normal distribution, the F-statistic will have an F-distribution with NX - 1 and NY - 1 degrees
of freedom. To determine the significance level corresponding to the value of the F-statistic, a high-precision
approximation of the F-distribution is used.
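Illustration: a minimal sketch of the variance ratio computation for two hypothetical samples. Placing the larger sample variance in the numerator, so that F is at least 1, is a common textbook convention and is an assumption of this sketch rather than something stated above.

```python
from statistics import variance


def variance_ratio_f(x, y):
    """Return (F, numerator df, denominator df), larger variance on top."""
    var_x, var_y = variance(x), variance(y)  # sample variances, n - 1 divisor
    if var_x >= var_y:
        return var_x / var_y, len(x) - 1, len(y) - 1
    return var_y / var_x, len(y) - 1, len(x) - 1


x = [1, 2, 3, 4, 5]    # hypothetical sample from X
y = [2, 4, 6, 8, 10]   # hypothetical sample from Y

f_stat, df_num, df_den = variance_ratio_f(x, y)
print(f_stat, df_num, df_den)  # 4.0 4 4
```

The sample variances are 2.5 and 10, so F = 10/2.5 = 4 with (4, 4) degrees of freedom. This value is then compared with the F-table value for those degrees of freedom at the chosen level of significance.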

Q.9. Discuss the concept of business analytics with its meaning, types and applications.
Sol. Business Analytics is the use of data, information technology, statistical analysis,
quantitative methods, and mathematical or computer-based models to help managers gain
improved insight about their business operations and make better, fact-based decisions.

Types of Business Analytics


Prescriptive analytics is really valuable, but largely not used. Where big data analytics in
general sheds light on a subject, prescriptive analytics gives a laser-like focus on specific questions. For example,
in the health care industry, you can better manage the patient population by using prescriptive
analytics to measure the number of patients who are clinically obese, then add filters for factors
like diabetes and LDL cholesterol. A prescriptive model can be applied to almost any industry, target
group or problem.

Predictive analytics uses big data to identify past patterns in order to predict the future. For example,
some companies use predictive analytics for sales lead scoring. Some companies have gone
one step further and use predictive analytics for the entire sales process, analyzing lead source,
number of communications, types of communications, social media, documents, CRM data, etc.
Predictive analytics can be used to support sales, marketing, or other types of complex
forecasts.

Diagnostic analytics is used for discovery, or to determine why something happened. For
example, for a social media marketing campaign, you can assess the number of posts, mentions,
followers, fans, page views, reviews, pins, etc. There can be thousands of online mentions that can be
distilled into a single view to see what worked in your past campaigns and what didn't.

Descriptive analytics, or data mining, is at the bottom of the big data value chain, but it can
be valuable for uncovering patterns that offer insight. A simple example of descriptive analytics
would be assessing credit risk using past financial performance. Descriptive analytics can be
useful in the sales cycle, for example, to categorize customers by their likely product preferences
and sales cycle.


Applications of Business Analytics

• Finance
Business analytics is of utmost importance to the finance sector. Data scientists are in high demand in
investment banking, portfolio management, financial planning, budgeting, forecasting,
etc.
For example: Companies these days have a large amount of financial data. Intelligent business
analytics tools can help use this data to determine the prices of products. Also, on the basis
of historical information, business analysts can study the trends in the
performance of a particular stock and advise the client on whether to retain it or sell it.
• Marketing
Business analytics helps in studying the buying patterns of consumers, analysing trends, identifying the
target audience, employing advertising techniques that can appeal to the consumers,
forecasting supply requirements, etc.
For example: It is used to gauge the effectiveness and impact of a marketing strategy on
the customers. Data can be used to build loyal customers by giving them exactly what
they want as per their specifications.
• HR
HR professionals can make use of data to find information about the educational background
of high-performing candidates, the employee attrition rate, the number of years of service of
employees, age, gender, etc. This information can play a pivotal role in the selection
procedure of a candidate. For example: An HR manager can predict the employee retention
rate on the basis of data given by business analytics.
• CRM
Business analytics helps one analyse the key performance indicators, which further helps in decision
making and in framing strategies to boost the relationship with the consumers. The
demographics, and data about other socio-economic factors, purchasing patterns,
lifestyle, etc., are of prime importance to the CRM department.
For example: The company wants to improve its service in a particular geographical
segment. With data analytics, one can predict the customers' preferences in that particular
segment and what appeals to them, and accordingly improve relations with customers.
• Manufacturing
Business analytics can help in supply chain management, inventory management, measuring performance
against targets, risk mitigation plans, improving efficiency on the basis of product data, etc.
For example: The manager wants information on the performance of machinery which has
been in use for the past 10 years. The historical data will help evaluate the performance of the
machinery and decide whether the cost of maintaining the machine will exceed the cost of
buying new machinery.
• Credit Card Companies
The credit card transactions of a customer can reveal many
factors: financial health, lifestyle, purchase preferences, behavioral trends, etc.
For example: Credit card companies can help the retail sector by locating the target
audience. From the transaction reports, retail companies can predict the choices
of the consumers, their spending pattern, their preference for competitors'
products, etc. This historical as well as real-time information helps them direct their
marketing strategies so that they hit the mark and reach the right audience.
