
OBTAINING DATA
LESSON 1
Prepared by: Engr. Eddie Santillan, Jr.


INTRODUCTION
• Statistics is the practice or science of collecting and analyzing numerical data in large
quantities, especially for the purpose of inferring proportions in a whole from those in a
representative sample.
• The science of statistics deals with the collection, analysis, interpretation, and
presentation of data.
• Descriptive statistics: organizing and summarizing data (using graphs and numerical summaries).
• Inferential statistics: formal methods for drawing conclusions from "good" data.
Statistical models can be used to predict life’s more uncertain situations. These special forms of
mathematical models or functions are based on the idea that one value affects another value. Some
statistical models are more precise mathematical functions: one set of values can predict or
determine another set of values.
Probability is a mathematical tool used to study randomness. It deals with the chance of an event
occurring.
Population is a collection of persons, things, or objects under study.
Sample is a subset or portion of a population.
• Sampling means selecting a portion, or subset, of the larger population and studying that portion (the sample)
to gain information about the population. Data are the result of sampling from a population.
• A statistic is a number that represents a property of the sample.
• A parameter is a numerical characteristic of the whole population that can be estimated by a statistic.
• For a sample to be accurate, it must contain the characteristics of the population; such a sample is
called a representative sample.
• A variable, usually denoted by a capital letter such as X or Y, is a characteristic or measurement that
can be determined for each member of a population.
• Numerical variables take on values with equal units, such as weight in pounds and time in hours.
Categorical variables place the person or thing into a category. If we let X equal the number of points
earned by one math student at the end of a term, then X is a numerical variable. If we let Y be a person's
party affiliation, then some examples of Y include Republican, Democrat, and Independent. Y is a
categorical variable.
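To make the parameter/statistic distinction concrete, here is a small Python sketch. The population of 1,000 exam scores is hypothetical and generated with a fixed seed; the population mean is the parameter, and the mean of a simple random sample of 50 scores is a statistic that estimates it.

```python
import random
import statistics

# Hypothetical population: 1,000 exam scores (illustrative data, not from the lesson)
rng = random.Random(2024)
population = [rng.randint(50, 100) for _ in range(1000)]

# Parameter: a numerical characteristic of the whole population
mu = statistics.mean(population)

# Statistic: the same characteristic computed from a random sample
sample = rng.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

Rerunning with a different seed for the sample changes x_bar but not mu: the statistic varies from sample to sample, while the parameter is fixed.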
• Data are the actual values of the variable. They may be numbers or they may be words. A datum is a
single value.
• Two words that come up often in statistics are mean and proportion. If you were to take three exams
in your math classes and obtain scores of 86, 75, and 92, you would calculate your mean score by
adding the three exam scores and dividing by three. Your mean score would be 84.3 to one decimal
place. If, in your math class, there are 40 students and 22 are men and 18 are women, then the proportion
of men students is 22/40 and the proportion of women students is 18/40.
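The arithmetic in this example can be checked with a few lines of Python; `Fraction` keeps the proportions exact (22/40 reduces to 11/20).

```python
from fractions import Fraction

# Exam scores from the example above
scores = [86, 75, 92]
mean_score = sum(scores) / len(scores)  # 253 / 3
print(round(mean_score, 1))  # 84.3

# Class of 40 students: 22 men, 18 women
men, women = 22, 18
total = men + women
p_men = Fraction(men, total)
p_women = Fraction(women, total)
print(p_men, float(p_men))      # 11/20 0.55
print(p_women, float(p_women))  # 9/20 0.45
```

The two proportions add up to 1 because every student falls into exactly one of the two categories.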
DATA, SAMPLING, AND VARIATION IN DATA AND SAMPLING
 Data may come from a population or from a sample. Lowercase letters like x or y generally are used to
represent data values. Most data can be put into the following categories:
• Qualitative data are the result of categorizing or describing attributes of a population.
• Quantitative data are the result of counting or measuring attributes of a population; they are either discrete or
continuous.
• A sample should have the same characteristics as the population it is representing. Most statisticians
use various methods of random sampling in an attempt to achieve this goal.
• Simple random sample: every group of n members of the population has the same chance of being selected.
• Other well-known random sampling methods are the stratified sample, the cluster sample, and the systematic
sample.
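The four random sampling methods named above can be sketched with Python's standard `random` module. A simple random sample draws n members so that every group of n is equally likely; a stratified sample divides the population into groups (strata) and draws a simple random sample within each; a cluster sample randomly selects whole groups (clusters) and keeps all of their members; a systematic sample picks a random starting point and then takes every kth member of a list. The function names, the roster, and the equal number drawn per stratum (rather than proportional allocation) are illustrative assumptions, not a standard API.

```python
import random

def simple_random_sample(population, n, rng):
    # Every subset of size n has the same chance of being chosen
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    # strata: dict mapping stratum name -> list of members;
    # draw a simple random sample separately within each stratum
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters, then keep every member of each chosen cluster
    chosen = rng.sample(list(clusters.values()), n_clusters)
    return [member for cluster in chosen for member in cluster]

def systematic_sample(population, k, rng):
    # Random start among the first k members, then every kth member after it
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(7)
students = [f"S{i:03d}" for i in range(100)]  # hypothetical roster of 100 students
print(simple_random_sample(students, 5, rng))
print(stratified_sample({"men": students[:55], "women": students[55:]}, 3, rng))
sections = {f"section {j}": students[j * 20:(j + 1) * 20] for j in range(5)}
print(len(cluster_sample(sections, 2, rng)))  # 40 students from 2 whole sections
print(systematic_sample(students, 10, rng))
```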
Convenience sampling is a non-random sampling method that involves using results that are readily available. For
example, a computer software store conducts a marketing study by interviewing potential customers who happen
to be in the store browsing through the available software.
• A sampling bias is created when a sample is collected from a population and some members of the
population are not as likely to be chosen as others.
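A short simulation shows how sampling bias distorts an estimate. The setup is hypothetical: the population values are 1 to 1,000, and in the biased scheme each member's chance of selection is proportional to its value, so larger values are over-represented. `random.choices` with `weights` draws with replacement, which is adequate for illustration.

```python
import random
import statistics

rng = random.Random(42)
population = list(range(1, 1001))          # hypothetical population values 1..1000
true_mean = statistics.mean(population)     # 500.5

# Unbiased: simple random sample, every member equally likely
fair = rng.sample(population, 500)

# Biased: members with larger values are more likely to be chosen
biased = rng.choices(population, weights=population, k=500)

print(f"population mean:    {true_mean}")
print(f"fair sample mean:   {statistics.mean(fair):.1f}")
print(f"biased sample mean: {statistics.mean(biased):.1f}")
```

The fair sample's mean stays close to 500.5, while the biased sample's mean lands well above it (its long-run value under this weighting is 2001/3, about 667). Taking a larger biased sample does not fix the problem; it only estimates the wrong quantity more precisely.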
METHODS OF COLLECTING DATA
Census. A census is a study that obtains data from every member of a population.
- A census is often not practical because of the cost and/or time required.

Sample survey. A sample survey is a study that obtains data from a subset of a
population, in order to estimate population attributes.
Experiment. An experiment is a controlled study in which the researcher attempts to
understand cause-and-effect relationships.
The study is "controlled" in the sense that the researcher controls
(1) how subjects are assigned to groups and
(2) which treatments each group receives. In the analysis phase, the researcher
compares group scores on some dependent variable. Based on the analysis, the
researcher draws a conclusion about whether the treatment (independent variable)
had a causal effect on the dependent variable.
Observational study. Like experiments, observational studies attempt to understand
cause-and-effect relationships.
However, unlike experiments, the researcher is not able to control
(1) how subjects are assigned to groups and/or
(2) which treatments each group receives.
PLANNING AND CONDUCTING SURVEYS
A survey is a series of unbiased, well-constructed questions that the subject must answer.
Advantages of surveys:
- they are an efficient way of collecting information from a large number of people
- they are easy to administer
- a wide variety of information can be collected, and they can be focused (researchers can
stick to just the questions that interest them).
Some disadvantages of surveys:
- they depend on the subjects’ motivation, honesty, memory, and ability to respond.
- answer choices to survey questions could lead to vague data. For example, the choice
“moderately agree” may mean different things to different people or to whoever ends up
interpreting the data.
Various methods for administering a survey:
- face-to-face interview
- phone interview, where the researcher questions the subject
- self-administered survey, where the subject completes a survey on paper and
mails it back, or completes the survey online
The advantages of face-to-face interviews include:
- fewer misunderstood questions
- fewer incomplete responses
- higher response rates
- greater control over the environment in which the survey is administered
- the researcher can collect additional information if any of the respondents’ answers
need clarifying.
The disadvantages of face-to-face interviews are:
- they are expensive
- they are time-consuming
- they require a large staff of trained interviewers
- the response can be biased by the appearance or attitude of the interviewer.
The advantages of self-administered surveys are:
- they are less expensive than interviews
- they do not require a large staff of experienced interviewers
- they can be administered in large numbers
- anonymity and privacy encourage more candid and honest responses, and there is less
pressure on respondents.
The disadvantages of self-administered surveys are:
- respondents are more likely to stop participating mid-way through the survey, and
researchers cannot ask respondents to clarify their answers
- response rates are lower than in personal interviews
- often the respondents who bother to return surveys represent extremes of the population:
those people who care about the issue strongly, whichever way their opinion leans.
Designing a Survey
When designing a survey, the following steps are useful:
1. Determine the goal of your survey: What question do you want to answer?
2. Identify the sample population: Whom will you interview?
3. Choose an interviewing method: face-to-face interview, phone interview, self-administered
paper survey, or internet survey.
4. Decide what questions you will ask in what order, and how to phrase them. (This is
important if there is more than one piece of information you are looking for.)
5. Conduct the interview and collect the information.
6. Analyze the results by making graphs and drawing conclusions.
PLANNING AND CONDUCTING EXPERIMENTS
PURPOSE:
- DOES ASPIRIN REDUCE THE RISK OF HEART ATTACK?

HOW CAN WE ANSWER THIS?
- BY CONDUCTING A WELL-DESIGNED, WELL-CONDUCTED EXPERIMENT
Experiments vs. Observational Studies
Observational Study
• Observe individuals
• Measure variables
• Do NOT influence the response
• Example: Has global warming affected penguin mating behavior?
Experiment
• Do something to your individuals
• Observe/measure the response
• Example: Does housing penguins in warmer environments affect mating behavior?
TERMS:
Treatment – any specific experimental condition applied to the subjects
Control group – a group of individuals as similar as possible to the individuals receiving the
treatment, but who do not receive it
Experimental units – individuals or subjects that are randomly assigned to receive a
treatment or to be in the control group
Random assignment – experimental units are randomly assigned to the different
treatment and control groups
Replication – ensuring that there are enough observations (experimental units) to reduce the variation in
the results
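The effect of replication can be simulated. In this hypothetical setup, each observation is a true response of 10 plus normally distributed noise with standard deviation 2. Repeating the experiment many times shows that the average of more observations varies less from run to run, roughly in proportion to 1/√n.

```python
import random
import statistics

rng = random.Random(0)
TRUE_RESPONSE, NOISE_SD = 10.0, 2.0   # hypothetical values for illustration

def experiment_mean(n):
    # One run of the experiment: average n noisy observations of the response
    return statistics.mean(rng.gauss(TRUE_RESPONSE, NOISE_SD) for _ in range(n))

for n in (5, 20, 80):
    means = [experiment_mean(n) for _ in range(2000)]
    spread = statistics.stdev(means)
    theory = NOISE_SD / n ** 0.5
    print(f"n={n:3d}  spread of means={spread:.3f}  sigma/sqrt(n)={theory:.3f}")
```

Quadrupling the number of observations roughly halves the spread, which is why replication makes a real treatment effect easier to tell apart from chance variation.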
SOURCES OF BIAS AND CONFOUNDING
Confounding: the effects of two or more explanatory variables (variables we think explain
the response) on a response variable (the outcome or result of the study) cannot be distinguished from each other
Placebo effect: subjects show an improved response simply from receiving a treatment, real or fake
Blinding: subjects do not know which treatment they are receiving
Characteristics of a Well-Designed and Well-Conducted Experiment
Control
• Control the effects of lurking variables, most often by comparing treatments
• Example: a "control group" in a drug study to eliminate the "confounding effects" of environment or the placebo
effect

Replicate
• Replicate each treatment on many units to reduce chance variation
• Example: in a mouse study, apply each treatment to many mice

Randomize
• Use probability (chance) to assign experimental units to treatments
• May be the most important principle, because it allows us to say that the different treatment groups start out similar
Completely Randomized Design
If all the experimental units (subjects of the experiment) are randomly assigned to either the
control group or to the treatment group, then the experiment has a completely randomized
design.

Randomize by assigning each subject a number and then using randomly generated numbers to choose the treatment groups.
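One way to carry out this numbering-and-chance procedure is to shuffle the list of subject numbers and split it. The function name, the 20 subjects, and the equal group sizes here are hypothetical.

```python
import random

def completely_randomized(subject_ids, n_treatment, seed=None):
    """Randomly assign subjects to a treatment group and a control group."""
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)  # chance alone decides which subjects land in which group
    return {"treatment": sorted(ids[:n_treatment]), "control": sorted(ids[n_treatment:])}

groups = completely_randomized(range(1, 21), n_treatment=10, seed=5)
print("treatment:", groups["treatment"])
print("control:  ", groups["control"])
```

Fixing the seed makes the assignment reproducible for record-keeping; omitting it gives a fresh random assignment each time.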
Block Randomization
Subjects are first placed into groups (blocks) of similar individuals. The random assignment to treatment
groups is then carried out separately within each block (think stratified random sample).
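Block randomization can be sketched the same way: group the subjects by a blocking variable, then randomize separately inside each block. The blocking variable (sex), the subject list, and the choice to split each block as evenly as possible between the groups are hypothetical.

```python
import random
from collections import defaultdict

def block_randomize(subjects, block_of, treatments=("treatment", "control"), seed=None):
    """Randomly assign subjects to treatments separately within each block.

    subjects: list of subject IDs
    block_of: function mapping a subject ID to its block label
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for s in subjects:
        blocks[block_of(s)].append(s)

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # randomize only within this block
        for i, s in enumerate(members):
            # deal the shuffled members out in turn so each block is balanced
            assignment[s] = treatments[i % len(treatments)]
    return assignment

# Hypothetical subjects, with sex recorded as the blocking variable
sex = {"A": "F", "B": "F", "C": "F", "D": "F", "E": "M", "F": "M", "G": "M", "H": "M"}
plan = block_randomize(list(sex), block_of=sex.get, seed=3)
for s in sorted(plan):
    print(s, sex[s], plan[s])
```

Because each block is split evenly, differences between the blocks cannot pile up in one treatment group.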
Matched Pairs Design
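Subjects are matched into pairs, and the members of each pair receive different treatments.
Matched subjects are more similar to each other than randomly chosen, unmatched subjects.
Randomization is still important: chance should decide, for example, which member of each pair receives which treatment.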
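In a matched pairs design, the randomization happens inside each pair: a coin flip decides which member gets which treatment. A minimal sketch, assuming the pairs have already been formed; the function name and pair labels are hypothetical.

```python
import random

def randomize_pairs(pairs, seed=None):
    """For each (a, b) pair, randomly choose which member receives the treatment."""
    rng = random.Random(seed)
    plan = []
    for a, b in pairs:
        if rng.random() < 0.5:  # fair coin flip for each pair
            plan.append({"treatment": a, "control": b})
        else:
            plan.append({"treatment": b, "control": a})
    return plan

# Hypothetical pairs, e.g. matched on age and weight
pairs = [("P1a", "P1b"), ("P2a", "P2b"), ("P3a", "P3b")]
for row in randomize_pairs(pairs, seed=11):
    print(row)
```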
Experimental Setup
Treatment Imposed = Independent Variable = Factors
Experimental Units = Subjects
Response Variable Observed = Dependent Variable
Double-Blind Experiment
In a double-blind experiment, neither the subjects nor the researchers know to which group (treatment
or control) the subjects have been assigned. If a researcher knows that a subject is in
the control group, they do not expect a treatment effect, and their measurement of a response
might be understated. If a researcher knows that a subject is in the treatment group, they
might overstate a response simply because they expect it.
An experiment might also be single-blind. In this case, only one of the participants, either the
subjects or the researchers, knows to which group the subjects have been assigned.
Blinding avoids unconscious bias.
Generalizability of Results
To determine if our data are "statistically significant"
• that is, is the observed effect so large that it would rarely occur by chance alone?

If we designed and conducted our experiment well, we can generalize these results to the
population!
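One way to ask whether an observed effect "would rarely occur by chance" is a randomization (permutation) test: shuffle the group labels many times and count how often chance alone produces a difference at least as large as the one observed. This is one technique among several (t-tests are another), and the function name and response values below are hypothetical.

```python
import random
import statistics

def permutation_p_value(treatment, control, n_shuffles=10_000, seed=None):
    """Approximate one-sided p-value: share of label shuffles whose difference
    in means is at least the observed mean(treatment) - mean(control)."""
    rng = random.Random(seed)
    observed = statistics.mean(treatment) - statistics.mean(control)
    pooled = list(treatment) + list(control)
    n_t = len(treatment)
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # re-deal the labels as if the treatment had no effect
        diff = statistics.mean(pooled[:n_t]) - statistics.mean(pooled[n_t:])
        if diff >= observed - 1e-9:  # small tolerance so float rounding does not drop ties
            count += 1
    return observed, count / n_shuffles

# Hypothetical responses (e.g., a score where higher is better)
treatment = [12.1, 13.4, 11.8, 14.0, 13.1, 12.7]
control = [10.9, 11.5, 10.2, 11.9, 11.0, 10.7]
obs, p = permutation_p_value(treatment, control, seed=1)
print(f"observed difference = {obs:.2f}, approximate p-value = {p:.4f}")
```

With these made-up numbers, only 2 of the 924 possible ways to split the 12 values into two groups of 6 give a difference this large, so the estimated p-value should come out near 0.002: an effect this big would rarely arise from chance alone.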
The practical steps needed for planning and conducting an experiment include recognizing the goal of
the experiment, choosing the factors, choosing the response, choosing the design, analyzing the data, and then drawing
conclusions. These steps closely mirror the scientific method.
1. Recognition and statement of the problem
2. Choice of factors, levels, and ranges
3. Selection of the response variable(s)
4. Choice of design
5. Conducting the experiment
6. Statistical analysis
7. Drawing conclusions and making recommendations
COLLECTING ENGINEERING DATA

Sometimes the data are all of the observations in the population. This results in a census.

However, in the engineering environment, the data are almost always a sample that has been selected
from the population. Three basic methods of collecting data are:
• A retrospective study using historical data
• An observational study
• A designed experiment
Retrospective Study
• Montgomery, Peck, and Vining (2012) describe an acetone-butyl alcohol distillation column for
which concentration of acetone in the distillate (the output product stream) is an important variable.
Factors that may affect the distillate are the reboil temperature, the condensate temperature, and the
reflux rate. Production personnel obtain and archive the following records:
• The concentration of acetone in an hourly test sample of output product
• The reboil temperature log, which is a record of the reboil temperature over time
• The condenser temperature controller log
• The nominal reflux rate each hour
• The reflux rate should be held constant for this process. Consequently, production personnel
change this very infrequently.
This type of study presents some problems:
1. We may not be able to see the relationship between the reflux rate and acetone concentration
because the reflux rate did not change much over the historical period.
2. The archived data on the two temperatures (which are recorded almost continuously) do not
correspond perfectly to the acetone concentration measurements (which are made hourly).
It may not be obvious how to construct an approximate correspondence.
3. Production maintains the two temperatures as closely as possible to desired targets or set
points. Because the temperatures change so little, it may be difficult to assess their real impact
on acetone concentration.
4. In the narrow ranges within which they do vary, the condensate temperature tends to increase
with the reboil temperature. Consequently, the effects of these two process variables on acetone concentration
may be difficult to separate.
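Problem 2 is a data-alignment task. A minimal sketch of one possible approach, assuming each record carries a timestamp in minutes: for every hourly acetone measurement, take the most recent temperature reading at or before that time (a "last observation carried forward" match). The function name and all readings below are hypothetical.

```python
from bisect import bisect_right

def align_to_samples(sample_times, log_times, log_values):
    """For each sample time, return the latest log value recorded at or before it.

    log_times must be sorted; returns None where no earlier reading exists.
    """
    aligned = []
    for t in sample_times:
        i = bisect_right(log_times, t)  # number of log entries with time <= t
        aligned.append(log_values[i - 1] if i > 0 else None)
    return aligned

# Hypothetical data (times in minutes from the start of a shift)
acetone_times = [60, 120, 180]                         # hourly test samples
reboil_times = [0, 15, 30, 45, 58, 75, 119, 150, 181]  # near-continuous log
reboil_temps = [98.1, 98.3, 98.2, 98.4, 98.5, 98.3, 98.6, 98.4, 98.7]

print(align_to_samples(acetone_times, reboil_times, reboil_temps))  # [98.5, 98.6, 98.4]
```

Other matching rules, such as averaging the readings over the preceding hour or interpolating between readings, may suit the process better; choosing one is an engineering judgment, which is exactly why the correspondence "may not be obvious."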
• A retrospective study would use either all or a sample of the historical process data archived
over some period of time and involve a significant amount of data, but those data may contain relatively
little useful information about the problem.
• Some of the relevant data may be missing, there may be transcription or recording errors resulting in
outliers (or unusual values), or data on other important factors may not have been collected and archived.
• In the distillation column, for example, the specific concentrations of butyl alcohol and acetone in the
input feed stream are very important factors, but they are not archived because the concentrations are too
hard to obtain on a routine basis.
• As a result of these types of issues, statistical analysis of historical data sometimes identifies interesting
phenomena, but solid and reliable explanations of these phenomena are often difficult to obtain.
Observational study

• In an observational study, the engineer observes the process or population, disturbing it as little as
possible, and records the quantities of interest. Because these studies are usually conducted for a
relatively short time period, sometimes variables that are not routinely measured can be included.
• In the distillation column, the engineer would design a form to record the two temperatures and
the reflux rate when acetone concentration measurements are made. It may even be possible to
measure the input feed stream concentrations so that the impact of this factor could be studied.
• Generally, an observational study tends to solve problems 1 and 2 and goes a long way
toward obtaining accurate and reliable data. However, observational studies may not help resolve
problems 3 and 4.
Designed Experiments

• In a designed experiment, the engineer makes deliberate or purposeful changes in the controllable
variables of the system or process, observes the resulting system output data, and then makes an
inference or decision about which variables are responsible for the observed changes in output
performance.
• Experiments designed with basic principles such as randomization are needed to establish
cause-and-effect relationships.
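For the distillation column, a designed experiment could deliberately set the reboil temperature, condensate temperature, and reflux rate to low and high levels. The sketch below builds a two-level full factorial plan (all 2³ = 8 combinations) and randomizes the run order. The factor names and levels are hypothetical, and a full factorial is one common design among many. It speaks to problems 3 and 4: the factors are varied over deliberately chosen ranges, and the coded (-1/+1) factor columns are uncorrelated, so their effects can be separated.

```python
import itertools
import random

# Hypothetical low/high levels for each controllable factor
factors = {
    "reboil_temp_C": (95.0, 105.0),
    "condensate_temp_C": (30.0, 40.0),
    "reflux_rate": (1.5, 2.5),
}

def full_factorial(factors):
    """All combinations of the two levels of each factor, with coded (-1/+1) versions."""
    names = list(factors)
    runs = []
    for coded in itertools.product((-1, 1), repeat=len(names)):
        settings = {n: factors[n][0 if c == -1 else 1] for n, c in zip(names, coded)}
        runs.append({"coded": coded, "settings": settings})
    return runs

runs = full_factorial(factors)
random.Random(8).shuffle(runs)  # randomize run order to guard against time-related effects

for order, run in enumerate(runs, start=1):
    print(order, run["settings"])

# Coded columns are orthogonal: each pairwise sum of products is zero
cols = list(zip(*(r["coded"] for r in runs)))
for a, b in itertools.combinations(range(len(cols)), 2):
    print(a, b, sum(x * y for x, y in zip(cols[a], cols[b])))
```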
THE ENGINEERING METHOD
The field of statistics deals with the collection, presentation, analysis, and use of
data to make decisions, solve problems, and design products and processes.

Figure: The engineering method.
The engineering, or scientific, method is the approach to formulating and solving these problems. The
steps in the engineering method are as follows:
1. Develop a clear and concise description of the problem.
2. Identify, at least tentatively, the important factors that affect this problem or that may play a
role in its solution.
3. Propose a model for the problem, using scientific or engineering knowledge of the phenomenon
being studied. State any limitations or assumptions of the model.
4. Conduct appropriate experiments and collect data to test or validate the tentative model or
conclusions made in steps 2 and 3.
5. Refine the model on the basis of the observed data.
6. Manipulate the model to assist in developing a solution to the problem.
7. Conduct an appropriate experiment to confirm that the proposed solution to the problem is
both effective and efficient.
8. Draw conclusions or make recommendations based on the problem solution.
