1 Collecting and Interpreting Data Edexcel PDF
1 of 3 05/08/19 © MEI
integralmaths.org
Summary sheet: Collecting and interpreting data
Types of data
Categorical data are not numerical in value (e.g. colours of cars)
Discrete data are numerical data that can take only specific values, such as shoe sizes or number
of pets.
Continuous data are numerical data that can take any real values in a range, such as weights or
times.
Statistical diagrams
A wide variety of statistical diagrams can be used to illustrate data. Some diagrams are
only appropriate for certain types of data.
A histogram is used to illustrate grouped data. The vertical axis gives frequency density (found by
dividing the frequency by the class width). The frequency for each group is proportional to the
area of its bar, and there are no gaps between the bars.
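The frequency density calculation can be sketched in Python as follows. The class boundaries and frequencies are made-up illustrative values, and the function name is an assumption, not part of the original sheet.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 4), (10, 20, 12), (20, 40, 16), (40, 80, 8)]


def frequency_density(lower, upper, frequency):
    """Return frequency divided by class width."""
    return frequency / (upper - lower)


# Height of each histogram bar
densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]

# Area of each bar (class width x frequency density) recovers the frequency
areas = [(hi - lo) * d for (lo, hi, _), d in zip(classes, densities)]

for (lo, hi, f), d, a in zip(classes, densities, areas):
    print(f"{lo}-{hi}: frequency {f}, density {d:.2f}, bar area {a:.1f}")
```

Because the classes have unequal widths, the tallest bar (10-20) is not the class with the highest frequency (20-40); it is the area that represents frequency.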
A frequency polygon is a plot of frequency against data values (for grouped data, the midpoint
of each class). The points are joined with straight lines.
A box-and-whisker diagram, or boxplot, summarises numerical data by showing the lowest
value, the lower quartile, the median, the upper quartile and the highest value.
A cumulative frequency curve is a graph illustrating numerical data. For each value, x, the total
frequency of data less than or equal to x is plotted against x. A cumulative frequency curve is
useful for estimating the values of the median, quartiles or other percentiles.
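Reading estimates from a cumulative frequency graph can be imitated in code. The sketch below assumes the plotted points are joined with straight lines (linear interpolation), so it only approximates a reading from a smooth curve. The data and the function name are hypothetical.

```python
def estimate_percentile(upper_bounds, cum_freqs, lowest_bound, p):
    """Estimate the value below which a fraction p of the data lies.

    Assumes straight lines between the plotted points, starting from
    (lowest_bound, 0).
    """
    target = p * cum_freqs[-1]
    prev_x, prev_cf = lowest_bound, 0
    for x, cf in zip(upper_bounds, cum_freqs):
        if target <= cf:
            if cf == prev_cf:
                return prev_x
            # Linear interpolation within this class
            return prev_x + (target - prev_cf) / (cf - prev_cf) * (x - prev_x)
        prev_x, prev_cf = x, cf
    return upper_bounds[-1]


# Hypothetical grouped data: upper class boundaries and cumulative frequencies
upper_bounds = [10, 20, 40, 80]
cum_freqs = [4, 16, 32, 40]

lower_quartile = estimate_percentile(upper_bounds, cum_freqs, 0, 0.25)
median = estimate_percentile(upper_bounds, cum_freqs, 0, 0.5)
upper_quartile = estimate_percentile(upper_bounds, cum_freqs, 0, 0.75)
print(lower_quartile, median, upper_quartile)  # 15.0 25.0 37.5
```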
Measures of variation
The range is the difference between the highest and lowest values from the data.
The interquartile range is the difference between the upper quartile (the value corresponding to
¾ of the data when ranked numerically) and the lower quartile (the value corresponding to ¼ of
the data when ranked numerically).
Other interpercentile ranges can also be found, such as the difference between the 10th
percentile and the 90th percentile.
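These measures can be computed for raw data with Python's standard `statistics` module, as in the sketch below. The sample is invented for illustration. Several conventions exist for locating quartiles and percentiles in a small data set; `method="inclusive"` is only one of them, so results may differ slightly from a calculator or from the method taught in a particular course.

```python
import statistics

# Hypothetical sample, for illustration only
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 21]

# Range: highest value minus lowest value
data_range = max(data) - min(data)

# n=4 gives three cut points: lower quartile, median, upper quartile
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# n=10 gives nine cut points: the 10th, 20th, ..., 90th percentiles
deciles = statistics.quantiles(data, n=10, method="inclusive")
p10, p90 = deciles[0], deciles[-1]
interpercentile_range = p90 - p10

print("range:", data_range)
print("interquartile range:", iqr)
print("10th to 90th percentile range:", interpercentile_range)
```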
The variance is a measure of the spread of a sample of data.
Variance $= \dfrac{\sum x_i^2 - n\bar{x}^2}{n}$, where $\bar{x}$ is the mean of the data.
The standard deviation is the square root of the variance.
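As a check on the arithmetic, the variance formula with divisor n can be applied directly. This is a minimal sketch with an invented data set; `statistics.pvariance` from the standard library uses the same divisor, so it is used here only as a cross-check.

```python
import math
import statistics


def variance(data):
    """Variance with divisor n: (sum of x^2 - n * mean^2) / n."""
    n = len(data)
    mean = sum(data) / n
    return (sum(x * x for x in data) - n * mean ** 2) / n


# Hypothetical data set, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]

var = variance(data)           # (232 - 8 * 25) / 8 = 4.0
std_dev = math.sqrt(var)       # square root of the variance = 2.0

print(var, std_dev)
print(statistics.pvariance(data))  # cross-check with the standard library
```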
Bivariate data
Data which involve two variables, e.g. height and weight, are called bivariate data.
Bivariate data can be illustrated on a scatter diagram in which the axes represent the two variables and
each data item is plotted using coordinates.
If it is believed that one variable is dependent on the other, the independent variable (sometimes called
the explanatory variable) should be plotted on the x-axis, and the dependent variable (sometimes
called the response variable) should be on the y-axis.
If bivariate data plotted on a scatter diagram fall close to a straight line, there is linear
correlation. The closer the data lie to the line, the stronger the correlation. If the line has positive
gradient, there is positive correlation, and if it has negative gradient, there is negative correlation. If all
the data lie on the line, there is perfect linear correlation.
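The strength and direction of linear correlation are often put into a single number, the product moment correlation coefficient r. This sheet describes correlation only qualitatively, so the sketch below is an assumption about how one might quantify it, using invented data: r is close to +1 or −1 when the points lie close to a line, and exactly ±1 for perfect linear correlation.

```python
import math


def pearson_r(xs, ys):
    """Product moment correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired data
xs = [1, 2, 3, 4, 5]
on_rising_line = [2 * x + 1 for x in xs]    # all points on a line, positive gradient
on_falling_line = [10 - 3 * x for x in xs]  # all points on a line, negative gradient
scattered = [2.1, 3.9, 6.2, 7.8, 10.1]      # close to a line, but not exactly on it

print(pearson_r(xs, on_rising_line))
print(pearson_r(xs, on_falling_line))
print(pearson_r(xs, scattered))
```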
It is important to remember that correlation does not imply causation; if there is correlation or
association, there may be other factors involved.
Sometimes a scatter diagram shows data falling in two or more groups. These may represent different
sections of the population (e.g. adults and children) and it may not be appropriate to treat the data as a
single set.
Outliers may be legitimate data items, or they may be errors. Even if they are legitimate, it may
sometimes be appropriate to exclude them from the data.
Cleaning data involves dealing with missing data, errors and outliers. How you deal with these issues
depends on the situation and what you are using the data for.
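A simple cleaning pass might look like the sketch below. The data are invented, missing values are represented by `None`, and possible outliers are flagged using the common rule of thumb "more than 1.5 × interquartile range beyond a quartile". That rule is an assumption, not something this sheet prescribes; a question or context may define outliers differently, and a flagged value should be inspected before it is excluded.

```python
import statistics

# Hypothetical measurements; None marks a missing value
raw = [12.1, None, 11.8, 12.4, 55.0, 12.0, None, 11.9, 12.3]

# Step 1: set aside missing data
present = [x for x in raw if x is not None]
missing_count = len(raw) - len(present)

# Step 2: flag possible outliers (assumed rule: 1.5 x IQR beyond the quartiles)
q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in present if x < low_fence or x > high_fence]

# Step 3: decide what to keep; here flagged values are excluded
cleaned = [x for x in present if low_fence <= x <= high_fence]

print("missing:", missing_count)
print("flagged as outliers:", outliers)
print("cleaned data:", cleaned)
```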