Introduction To Statistics
Recommended literature:
Hanke, J. E. - Reitsch, A. G.: Understanding Business
Statistics.
Anderson, D.R. - Sweeney, D.J. - Williams, T.A.:
Statistics for Business and Economics. South-Western
Pub., 2005, 320 p., ISBN 978-032-422-486-3
Jaisingh, L.R.: Statistics for the Utterly Confused.
McGraw Hill, 2005, 352 p., ISBN 978-007-146-193-1
Everitt, B. S.: The Cambridge (explanatory) dictionary
of statistics. Cambridge University Press, 2006, 442 p.,
ISBN 978-052-169-027-0
Illowsky, B. - Dean, S. (2009, August 5). Collaborative
Statistics. Retrieved from the Connexions Web site:
http://cnx.org/content/col10522/1.36
Why study statistics?
Collect data
e.g., Survey
Present data
e.g., Tables and graphs
Summarize data
e.g., Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
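As a small illustration of the "summarize data" step, the sample mean can be computed by adding the observations and dividing by their number. This is a minimal sketch; the function name `sample_mean` and the survey values are made up for the example.

```python
def sample_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("sample mean is undefined for an empty sample")
    # Sum of the observations X_i divided by the sample size n.
    return sum(values) / len(values)


# Hypothetical survey responses: number of children per family.
children = [0, 2, 1, 3, 2]
print(sample_mean(children))
```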
Statistical data
The collection of data that are relevant to the problem
being studied is commonly the most difficult, expensive,
and time-consuming part of the entire research project.
Statistical data are usually obtained by counting or
measuring items.
Primary data are collected specifically for the analysis
desired
Secondary data have already been compiled and are
available for statistical analysis
A variable is an item of interest that can take on many
different numerical values.
A constant has a fixed numerical value.
Data
Most data can be put into the following categories:
Qualitative - data are measurements that each fall into
one of several categories (hair color, ethnic group,
and other attributes of the population).
Quantitative - data are observations that are measured
on a numerical scale (distance traveled to college,
number of children in a family, etc.).
Qualitative data
Qualitative data are generally described by words or
letters. They are not as widely used as quantitative data
because many numerical techniques do not apply to the
qualitative data. For example, it does not make sense to
find an average hair color or blood type.
Qualitative data can be separated into two subgroups:
dichotomous (if the variable takes one of exactly two values,
e.g., gender - male or female)
polytomous (if the variable takes one of more than two
values, e.g., education - primary school, secondary school,
and university).
Quantitative data
Quantitative data are always numbers and are the
result of counting or measuring attributes of a population.
Quantitative data can be separated into two
subgroups:
discrete (if it is the result of counting: the number of
students of a given ethnic group in a class, the number of
books on a shelf, ...)
continuous (if it is the result of measuring: distance
traveled, weight of luggage, ...)
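The distinction matters in practice because it decides which summaries make sense. The sketch below is only an illustration with invented data: qualitative values are summarized by counting categories, while quantitative values (discrete or continuous) can be averaged. The helper names are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample; all values are invented for illustration.
hair_color = ["brown", "black", "brown", "blond", "brown"]  # qualitative
books_on_shelf = [12, 7, 9, 12]          # quantitative, discrete (counted)
luggage_weight_kg = [18.4, 22.1, 15.0]   # quantitative, continuous (measured)


def summarize_qualitative(values):
    """Count how often each category occurs; an average would be meaningless."""
    return dict(Counter(values))


def summarize_quantitative(values):
    """Numerical techniques such as the mean apply to quantitative data."""
    return mean(values)


print(summarize_qualitative(hair_color))
print(summarize_quantitative(books_on_shelf))
print(summarize_quantitative(luggage_weight_kg))
```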
Types of variables
[Diagram: Variables divide into Qualitative and Quantitative]
Understand the basic concepts of assessing model accuracy and the bias-
variance trade-off.
Communication Systems
Speech recognition, image analysis
The Learning Problem
Learning from data is used in situations where we don’t
have any analytic solution, but we do have data that we can
use to construct an empirical solution.
[Figure: plot of y against x]
The Learning Problem: Example (cont.)
Different estimates for the target function f that depend on the
standard deviation of the ε’s
Why do we estimate f?
We use modern machine learning methods to estimate
f by learning from the data.
The target function f is unknown.
We estimate f for two key purposes:
Prediction
Inference
Prediction
Prediction (cont.)
Average of the squared difference between the predicted
and actual value of Y.
Var(ε) represents the variance associated with ε.
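The "average of the squared difference between the predicted and actual value of Y" can be computed directly from paired values. The sketch below assumes two equal-length lists of numbers; the function name `mean_squared_error` and the data are hypothetical and not taken from the slides.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


# Hypothetical actual responses Y and predictions of Y.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

# Squared differences are 1, 0 and 4, so the average is 5/3.
print(mean_squared_error(actual, predicted))
```

Smaller values indicate predictions that are closer to the observed Y on average.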
$\{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$
Second, we use the training data and a machine
learning method to estimate f.
Parametric or non-parametric methods
Parametric Methods
This reduces the learning problem of estimating the
target function f down to a problem of estimating a set
of parameters.
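To make this reduction concrete, here is a minimal sketch that assumes f is linear in a single predictor, f(X) = b0 + b1 X. Under that assumption, estimating f amounts to estimating the two parameters b0 and b1, done here with the ordinary least squares formulas. The function name `fit_simple_ols` and the training data are invented for illustration.

```python
def fit_simple_ols(xs, ys):
    """Estimate intercept b0 and slope b1 of f(X) = b0 + b1*X by least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical training data lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_simple_ols(xs, ys)
print(b0, b1)
```

Once b0 and b1 are estimated, a prediction for a new x is simply b0 + b1 * x. If the true f is far from linear, this estimate will be poor, which is the usual cost of a parametric assumption.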
Reason 1:
A simple method (such as OLS regression) produces a
model that is easier to interpret (especially for inference
purposes).