the term spss stands for statistical package for the social sciences

this is a software that we use to

analyze data so spss has the capability

of analyzing data using some basic

statistics and all the way to advanced

statistics

for example you can use spss to

calculate things like averages or to do

frequencies

but you can also use it to do charts and

also other advanced statistics for

example chi squares correlations

mean comparisons regressions and

even structural equation modeling and

the reason why spss is extremely popular

is because it actually uses a very

simple and straightforward user

interface instead of just a language

that might actually be difficult for

other people to start using it so as you

can see here on my screen you can

actually use spss

through the interface itself by just

clicking on the menus and selecting what


kind of analysis you would like to do so

as you can see here with just a few

clicks i should be able to actually run

an output as you can see right here

so in this course i'm going to take you

from the very basics of using spss from

how do you even enter data

up to how you actually analyze the data

but in the end the most important thing

that i will be sharing in this course

is how you can actually interpret the

output that you get from spss

so let's get started with how do you get

to install spss in this lesson i want to

show you how you can download a copy of

the trial version of spss it actually

allows you

to use spss for free for 30 days and

we're going to be using that

for practicing with the analysis that

we're going to be doing in this course

so i'm going to go ahead and

right here i'm actually in google chrome

so i'm just going to say spss


trial

and press enter

and you can see that we have this first

page here that says ibm spss trials and

i'm going to open that

here we go we are on the ibm website

and we are on the spss trials page

so here i can just go ahead and say try

spss 28 for free

and now it's asking me to create an ibm

account if you already have an ibm

account then you can just go ahead and

click login or if you don't have an ibm

account then it means you have to go

ahead and specify some account

information

now i do already have my ibm account so

i'm just going to go ahead and click

login but if you don't have one just fill in

the information here your email your

first name last name your password and

go next you'll provide some additional

information then verify your email and

you're gonna have your own ibm account


that can be used for so many other

things in ibm

but once again i already have an ibm

account so i just click login

and right here it actually recognizes

that this is me i'm just going to go

ahead and continue here and now it's

asking me if i agree to be contacted by

ibm

for special student pricing

well i think that's fine let's find out

how that goes so i'm just going to say

yes click continue and now you can see

that i do have ibm spss statistics

subscription trial

and here i can actually click to

download so just go ahead and click

download

and here i'm on the downloads page it's

basically giving me information about

the trial so the trial has all the

features in that board and including

add-on features that's awesome okay i

have to download and install the


application on my computer

if i have any issues i can go to the

troubleshooting page by clicking this

and after installation i'll have to use

the ibm id so the account that i have

with ibm

that's the one that i'm going to use to

turn on my subscription and actually

start using spss

so i can just go ahead and scroll down

and you can see that now we have all

these options so

the first option is a 64-bit software

and based on my computer this is the one

i should be choosing if you have a

computer that is quite recent there's a high

probability this is the one that you

are supposed to be choosing

but if your computer is running a 32-bit

version of windows then you might want

to use this one but there's a very high

probability that you have a 64-bit

operating system so you have to use this

one if you're running on mac os then you


have the option here at the end to

download spss i'm just going to go ahead

and click the download button here for

microsoft windows 64-bit
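
If you are not sure whether your computer is 64-bit or 32-bit, a small script can give a hint before you pick a download. This is only a sketch: the helper name `is_64bit_machine` is made up for illustration, and it assumes that 64-bit machine types are reported with "64" somewhere in the name (for example `AMD64` or `x86_64`), which is a common convention but not a guarantee.

```python
import platform


def is_64bit_machine(machine_name):
    """Return True when the reported machine type looks like a 64-bit architecture.

    Assumption: 64-bit machine names contain "64" (AMD64, x86_64, arm64, ...).
    """
    return "64" in machine_name


# platform.machine() returns the machine type string, or "" if it cannot be determined.
machine = platform.machine()
label = "64-bit" if is_64bit_machine(machine) else "32-bit or unknown"
print(f"machine type: {machine or 'unknown'} -> {label}")
```

If the script reports 64-bit, the 64-bit installer is most likely the one to choose; if the result is unclear, the system information panel of your operating system is the more reliable source.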

so now here it is i'm just going to go

ahead

and download this

and now we just have to wait for it to

finish downloading and then we're going

to come back

with how to install so we have a copy of

spss 28 downloaded here so i can just go

ahead and click here

now your operating system might actually

ask you if you want to allow this to

make changes on your device

this is perfectly fine we downloaded it

from the ibm website we can trust it so

i'm just gonna go ahead and click yes

and now the installation wizard is

preparing

your ibm spss statistics setup and here

all we have to do is click next here we

have the license agreement we must


accept the license agreement and click

next

and it's just telling us where this

program is going to be installed the

default is fine you click next

and then you click install

and now we're just going to wait a bit

until it finishes installing and now ibm

spss 28 has installed

and here it says start ibm spss

statistics now so if i click finish it's

going to start i think that's fine let's

click that finish

and now we are presented with a screen

where we have to log in with our ibm id

to do this obviously you need to have an

internet connection so it means every

time i'm starting up my spss software i

have to be online

for the sake of practice that's fine i'm

just going to go ahead and click login

with ibm id

and now i have to specify the login

details for my ibm account remember it's


the same ibm account we just created

and those are my details i click login

and of course certain times you have

this windows alert which is fine with me

i actually even check the private

networks and click allow access

and i actually do have spss opened as

you can see right now so this window is

the start window of spss and we're gonna

start talking about this window

and the rest of the windows in the next

section where we'll be getting started

by looking at the spss interface

but at this point bravo and

congratulations you now have the latest

version of spss

and you have your trial that has started

in the next set of days obviously this

trial is going to expire so make sure

that you make the best use of the

software as you are in the trial and

make sure that within the time that

you're using the trial you are able to

complete the course


in this lesson we'll be talking about

the spss interface if you have been

using spss before

from version 15 all the way to the

latest version that i'm using as of this

recording which is this version we've

just installed

spss version 28

the interface is not very different

especially when it comes to the

functionality that we are provided on

the interface there might be a few

small changes on the graphics that they're

using but really behind all of it it's

almost the same so the first thing that

you can see here is that we have this

start window and it says welcome to ibm

spss and it's allowing you to just click

a button to create a new data set or

maybe to create a database query if you

have a database where you want to

capture information from and

specifically telling you what's new in

this version and some links to the help


and support tutorials and community

and you can also switch here to

look at recent files and also look at

sample files i think this is really

important here you can actually just

take a look at the sample files

and use them in your analysis

if you don't want to have this every

time you're starting your spss software

then you can just go ahead and click

don't show this dialog in the future but

you can agree with me that there are

certain things here that may actually be

important so instead of

don't show this dialogue in the future

i'm just going to close this window

now when you do that you notice that

behind that we actually have another

window and this is the ibm spss

statistics data editor i always like to

make sure that i have expanded this to

fill in the whole space i think that

looks much better okay so let's go

through the interface the first thing at


the top is basically the name of the

software and the name of the window so

spss actually does have multiple windows

one of them is the one we just closed

which is the welcome or the start window

but this is the data editor window so

the data editor window

is the main window in spss that's where

you actually see the data itself and the

variables for that data then the next

line is actually the menu and you have

so many different menu items like file

so under this you have

all this information that has to do with

the file that you have opened for

example creating a new file opening a

file importing data we're going to see

how to import data here especially

importing excel files

or you want to open a restore point or

you want to save all data you want to

export this to for example excel and

other formats as well

now you also have the edit menu which


has to do with copying undoing redoing

cutting pasting and so on

then we have the other one which has to

do with just the view or how the window

looks right now on your screen

and you can see that we have for example

we can check to remove the status bar

which is the bar at the end there

and

you want to edit the menu itself and so

on

okay the next is the data menu item

this is a menu item where you are going

to find a lot of functionalities that

have to do with editing your data itself

so for example you want to identify

unusual cases

or maybe you want to identify duplicate

cases you want to transpose

split file or select cases we're going to

see some of these

in later lessons especially on the topic

of data management and we also have the

transform menu here where you find


commands that have to do with editing

your variables so that is transforming

your variables for example maybe you

want to create a new variable or you

want to change a variable into a

different variable or you want to rank

cases and so many of the functionalities

that you have here will create new

variables or they are going to replace

the values that you have in an existing

variable we're going to go deeper into

this and actually see when you can

actually use this then you have the

powerhouse itself which is the analyze tab

and you can see all the analysis that

you have here you have

reports descriptive statistics

you even have bayesian statistics you

have tables compare means and so on

this is where we come when we want to

actually do data analysis in spss i'm

going to talk about that you have graphs

where you have the chart builder we're

going to take a look at the chart builder


in the section of developing charts

and then you also have other utilities

for example if you want to define a

macro

which is a piece of program that

automates a few things that you want to

be doing in spss

and maybe you want to add extensions

and then in the end you have here the

window so basically if you have several

windows you can split the windows and so

on or you can minimize windows it's

basically the same as just clicking

minimize here at the top

and then lastly you have the help menu

item where you can actually go to topics

and learn how to do analysis in spss the

documentation is actually extremely good

and i actually very much recommend that

while you're learning you also take a

look at the

topics here or you go to spss forums and

or you go to spss statistics community

or to download the documentation in pdf


format for you to actually read and

continue learning now below the menu

itself you now have the toolbar so the

toolbar is basically just a list of

functions that are commonly used so the

functions that you have here are also in

the menus so for example the first icon

here is for opening a new data document

and this one is for saving this one is

for printing and so on so all the

icons that you have here you also do

have them in the menu but just because

they're used often

you're gonna find it actually very

convenient if you actually just use them

through

this toolbar instead of having to go

through the menu where you have to click

several times

now the next is basically this bar here

which actually shows you the contents

of the cells that you have opened so far

so right now we don't have any

information on the cells that we have


here on this screen that looks like a

worksheet

but if we have information here when

you click you're gonna see

the information all of the information

in that cell within this box right here

and below that is where we have the main

program now so the spss data editor

window is actually divided into two

views the view that you're looking at

right now is called the data view and

this is where you're actually going to

see the actual data itself so in the

data view the rows actually stand for

the individual records in your data so

for example if you're collecting

information about children's nutrition

so each of these rows might actually be

a specific record or a specific child

that you're collecting the data from or

if it's about households then each of

these rows is actually going to stand for

a single household so one single unit of

your data is what this row is about so


in spss it's called a case but you might

actually just call it a record of data

the columns are variables as you can see

the shortening here it says var that's

just standing for variable right now we

don't have any variable yet but once we

create the variables in the variable

view then the name of the variable is

actually going to appear in the columns

there let's switch to the variable view

you can switch to go to the variable

view by clicking down below here just

click variable view and you can actually

see it actually looks very similar here

it also looks like a worksheet where you

have columns and rows but this time

because we're in the variable view the

columns are the characteristics of each

of the variables so we have the

characteristic of the variable name the

variable type the variable width the

variable decimal and so on so each

variable

is going to have this information


the rows

are now the individual variables

themselves

so now if you have let's say you have a

variable age you're going to specify the

name the type the width the decimals

all the information here up to the end

so we're going to go through all of this

when we're talking about creating

variables and entering your data in spss
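
To make the two views easier to picture, here is a small sketch in python rather than spss itself. It models the variable view as a list of variable definitions and the data view as a list of cases. The class name `VariableDefinition`, the variable names, and the values are all made up for illustration; spss stores this information in its own way.

```python
from dataclasses import dataclass


@dataclass
class VariableDefinition:
    """One row of the variable view: the characteristics of a single variable."""
    name: str
    type: str
    width: int
    decimals: int


# Variable view: each row describes one variable (hypothetical definitions).
variable_view = [
    VariableDefinition(name="age", type="Numeric", width=8, decimals=0),
    VariableDefinition(name="household_size", type="Numeric", width=8, decimals=0),
]

# Data view: each row is one case, each column is one of the variables above.
data_view = [
    {"age": 34, "household_size": 4},
    {"age": 51, "household_size": 2},
]

# The column headers of the data view come from the names in the variable view.
column_names = [variable.name for variable in variable_view]
for case in data_view:
    assert set(case) == set(column_names)
```

The point of the sketch is the relationship between the two views: a variable has to be defined once before any case can hold a value for it.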

so with that i hope you're ready to get

started but in the next section we're

going to take a break from spss a bit to

actually start talking about some basics

of statistics that you need to

understand i'm gonna make them so easy

for you and we're gonna need those

things when we're starting to talk about

how to enter data how to define new

variables and how to analyze your data

so let's go into the next section before

we dive in into the statistics let's

first of all find out about data

as human beings we're very inquisitive


in most cases we ask a lot of questions

for example maybe with the covid

situation you might have noticed that

there's a lot of dropouts in schools so

you're inquisitive you want to find out

what are the effects of covid on

education outcomes for kids now for you

to be able to answer that question you

need to go out and actually collect

facts you cannot just sit down and guess

what the problem is this is where science

comes in we want to go out and collect

some facts so the individual

pieces of facts that we collect for the

sake of analyzing it that's what we call

data we go out so for example in this

case

i would go out and ask people or ask

students to find out what has been the

effect of covid on the education so i

might have to ask certain questions so

the answers to those kinds of questions that i'm asking

the students

are what we are calling data so this is


gonna come in terms of student value so

for example i might be interested to

know the gender of students and i'd be

interested to know the ages the

classrooms that they're in maybe the

income levels and so on i want to find

out if they have had someone in the

family who had covid so what happened

after that so those pieces of

information that we collected those

pieces of facts that we're getting

from our respondents that's what we're

calling

data so whether you call it data or you

call it data well it's all the same

you're actually going to hear me say

that interchangeably so data actually

comes in different formats or different

types so let's take a look at the

different types of data that you're

going to come across

so the first type of data is

qualitative data and this is basically

data in form of words so if you ask a


question for example gender the

responses that you're going to get the

values that you're going to get are

going to be in the form of words for

example male or female if i ask you are

you married you're going to answer in

the form of words as well whether you're

married or you're single or maybe you

are divorced so qualitative data comes

in a way of words but not only that

we usually don't have any specific or

objective tool that we can use to

measure the values so for example if

you're asking are you married or not we

don't necessarily have a measurement we

don't have a tool that we're using

to actually objectively measure whether

this person is married or not or if i

ask you what's your level of

satisfaction of this course then the

response that you're gonna give me could

be numbers or could be in words but i

don't necessarily have an objective way

in which i can measure that i don't have


a scale somewhere a tool that i can use

to actually measure the level of your

satisfaction so that is qualitative data

so in qualitative data we're mostly not

describing things in terms of quantities

we simply have maybe groups of things

we're just mentioning names of things so

those names or groups of things or maybe

levels of things that's what qualitative

data is about so let's take a look at

the second type of data second type of

data is actually quantitative data and

from the word itself you can actually

see quantitative we're talking about

quantities right so quantitative data

describes quantities expressed

numerically so for example if i ask you

how old you are you're going to give me

your age in terms of numbers so you're

going to tell me that you're probably 20

years of age or 30 years of age so that

thing there is a quantity of time but it

can also be for example what's the

distance from your home to your nearest


poho or how many people are in the same

room that you're watching this lesson so

because you're gonna give me in terms of

numbers and these numbers are

objectively defined for example if

you're talking about age somebody who's

20 years old in one country is 20 years

old in another country and the reason

for that is because we have an objective

way of measuring time but also if you

say how many people are in the room

you're gonna actually count the number

of people so we have an objective way of

measuring that so that's quantitative

data and the values come in terms of

numbers now let's take a look at how

data is collected we now know that we

have qualitative data and we have

quantitative data and that for us to

answer questions about the world around

us we have to collect facts that we're

calling data but how are we going to

collect this data so there are two main

ways in which data is collected


so we have what you call primary data so

primary data this is data collected for

the first time by an investigator for a

specific purpose so if i go in the field

or if i go to some respondents some

households or let's say i meet some

students and i'm asking them questions

first hand then that is primary data

or maybe you're getting your data from

some experiments that you're doing

because this is a fresh experiment that

you're conducting you're actually

getting the data from the source itself

that is from the respondents or from the

occurrences that are happening in real

time and that is what we call

primary data so this is data that

doesn't have any statistical operations

done on them this is not summarized data

the data is raw you just collected it

from respondents for example that is

what we call primary data now we also

have secondary data so secondary data is

data sourced from somewhere so for


example if you go to your country's

statistical office they actually collect

some data but now let's say that they

have analyzed that data

then that becomes secondary data and

you're getting information you're

getting facts from someone who has

already summarized that information we

actually deal with a lot of secondary

data on a daily basis maybe you're

reading a newspaper article

that is summarizing certain figures and

facts and you're using that in your

reporting or you're getting that and

actually doing your own extra analysis

so that's what we call secondary data so

that's it about data in the next lesson

let's take a look at what you call

variables
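
The ideas from this lesson can be sketched in a few lines of python. The records below are made up for illustration. The raw list stands for primary data, and the summary built from it stands for the kind of figures you would receive as secondary data. The helper `is_quantitative` is a hypothetical name, and it uses a deliberately simple rule (numbers are quantitative, everything else is qualitative), which ignores cases such as satisfaction levels stored as numbers.

```python
from statistics import mean

# Hypothetical primary data: one raw record per student, collected first hand.
primary_data = [
    {"sex": "female", "age": 14},
    {"sex": "male", "age": 15},
    {"sex": "female", "age": 16},
]


def is_quantitative(value):
    """Simplified rule: a number counts as quantitative, anything else as qualitative."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Sort the fields of the first record into the two types of data.
first_record = primary_data[0]
quantitative_fields = [k for k, v in first_record.items() if is_quantitative(v)]
qualitative_fields = [k for k, v in first_record.items() if not is_quantitative(v)]

# A summary that somebody else has already calculated is what secondary data looks like.
secondary_data = {
    "students": len(primary_data),
    "mean_age": mean(record["age"] for record in primary_data),
}
```

Notice that the summary no longer contains the individual students, which is why secondary data limits the extra analysis you can do with it.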

so in the previous lesson we've talked

about data and we found that data are

pieces of facts that we collect for

analysis

now these pieces of facts are going to


come in the form of variables so in this

lesson let's talk about the concept of

variables so what are variables

so variables are attributes that we are

trying to study so for example let's say

that we give a questionnaire a data

collection form to a number of students

we might want to ask them questions like

what's your gender what's your sex or

what's your age or what's your monthly

income and so on so those individual

points that we are trying to collect are

what we are calling variables and variables

are sometimes known as data items so now

let's talk about different kinds of

variables so variables are categorized

in terms of the data that they contain

remember that when you go for example to

a respondent for example to a student

and we're trying to capture

data from them these data are coming in

different variables so they're coming in

different items or characteristics that

we're trying to collect from these


individuals so the data that is

contained within a specific variable

determines what type of variable it is

so in this regard we actually have

quantitative variables so just like we

have quantitative data we have

quantitative variables so quantitative

variables are simply variables that

contain quantitative data so an example

there is going to be age so if you're

asking how old are you and you are

expecting the responses to come in the

form of numbers or in the form of

number of years then that is called a

quantitative variable on the other hand

we have qualitative variables so these

are variables or data items or the

questions or characteristics that we are

trying to collect on our survey or

whatever data collection that we're

doing that contain

qualitative data we've already talked

about qualitative data and given


examples such as

sex or marital status or race or color

so those data items whose values are

going to be qualitative in nature then those

are qualitative variables okay so let's

take a look at these variables in

greater depth so for example

quantitative variables we actually also

have

different types of quantitative variables

the first one is what we call

discrete variables so discrete variables

these are variables where the values are

quantitative or they're numeric they're

numbers

but the numbers themselves are discrete

or the numbers themselves are counts

of individual items so for example

if we say number of people then that is

a discrete value because you can never

have a middle ground between one and two

so you can never have 1.5 people you

have to count each and every one of the items

individually so number of people number


of trees that is a discrete variable

on the other hand we have continuous

quantitative variables so

these are

measurements of continuous values so for

example age all right or distance or

volume so these are examples of

continuous

variables so these variables can contain

values that are on a continuous scale so

for example age you can have someone who

is 20.5 years old or some let's say

distance we can actually say 2.5

kilometers so we are not necessarily

going to just count each of the values

discretely but rather you can actually

have values in between of whole numbers

in simple terms mathematical terms

discrete values are like integers

while continuous variables can take

other forms of numbers not just integers

so maybe you can have decimals and so on

so qualitative variables these are also

known as categorical

or grouping variables the reason they're

called categorical is because they

mostly put people into categories so for

example if we're saying marital status we

have people who are single and people

who are married so basically you have

those two specific categories there

or we can also call them groups right if

you have sex you have people who are

male and people who are female if you

have race you know the categories

that we put people under in terms of

race so because we put people into

categories in qualitative variables in

most cases these are also known as

categorical or grouping variables
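
As a rough illustration of these three kinds of variables, the python sketch below guesses the kind of variable from the values it holds. The function name `classify_variable` and the example values are made up, and the rule is only a heuristic: age recorded in whole years would be labelled discrete here even though age itself is continuous.

```python
def classify_variable(values):
    """Guess the kind of variable from its values (a heuristic, not a strict rule)."""
    if all(isinstance(v, str) for v in values):
        return "categorical"  # words that put cases into groups
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "discrete"  # whole-number counts of individual items
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "continuous"  # measurements that can fall between whole numbers
    return "unknown"


# Hypothetical example values for each kind of variable.
marital_status = ["single", "married", "single"]
number_of_people = [3, 5, 2]
distance_km = [2.5, 10.0, 0.8]

kinds = {
    "marital_status": classify_variable(marital_status),
    "number_of_people": classify_variable(number_of_people),
    "distance_km": classify_variable(distance_km),
}
```

In practice you decide the kind of variable from what it measures, not only from how the values happen to be stored.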

now the question could be why is it

important for us to understand the

different types of variables when we

start to analyze data you are going to

realize that the type of analysis that

you can perform on a variable actually

depends on the structure of data in that

variable in other words it depends on


the type of variable that you're dealing

with there are certain analyses that are

only applicable or they only work well

with categorical or qualitative

variables and there are others that

actually work well with continuous

variables or quantitative variables

for example if you're dealing with

categories right you cannot do any

mathematical calculation on categorical

values for example you have gender male

and female we cannot add them up we

cannot divide them up so we can never do

any mathematical manipulation on

categorical or qualitative data well on

the other hand for continuous variables

for variables that can take so many

different values in a numerical scale

then you can actually do some

mathematical calculation and because in

most cases for continuous variables you

have a lot of values for example age you

have so many different values in age

it's not appropriate to be counting each


of the values that we have in the

variable while for categorical values

you only have a few

values for example for sex you only have

male and female it's very very

simple and straightforward to count that

if you have race you only have a finite

list of races or if you have colors that

you are trying to collect you may have

just a group of colors while if you go to

let's say you're talking about income

everyone is going to have a different

value if you're collecting so many

values or you're collecting data from so

many points like let's say 100 or 200

you might actually have 200 very

different values in that case it's not

appropriate for you to be counting each

and every one of them but rather you

have to do some mathematical

calculations so this is why it's very

important for you to understand all this
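
Here is a small python sketch of that idea: count the values when the variable is categorical, and calculate when it is numeric. The function name `summarize` and the sample values are made up for illustration, and the choice of mean, minimum, and maximum is just one reasonable summary, not the only one.

```python
from collections import Counter
from statistics import mean


def summarize(values):
    """Count categories for text values; compute simple statistics for numbers."""
    if all(isinstance(v, str) for v in values):
        # Categorical: a frequency count per category is the natural summary.
        return dict(Counter(values))
    # Numeric: nearly every value is different, so calculate instead of counting.
    return {"mean": mean(values), "min": min(values), "max": max(values)}


# Hypothetical responses from five people.
sex = ["male", "female", "female", "male", "female"]
income = [1200.0, 950.5, 3100.0, 1780.25, 2210.0]

sex_summary = summarize(sex)
income_summary = summarize(income)
```

A frequency table of the five incomes would just list five different numbers once each, which is why a calculated summary is more useful there.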

but before we go let's take a look at

also different kinds of different types


of variables but this time

by the role that those variables are

performing

in your research so depending on what

a variable is doing in your research you

can have two different types of

variables you have independent

variables so independent variables these

are variables that we think

are the ones that are affecting the

outcome of another variable so let's take an

example of a scenario where we're trying

to study

what affects the amount of rain that we

get

in an area so

you want to find out the factors that

affect the amount of rainfall that we're

going to receive

so those factors for example you might

have wind right you might have

temperature you might have humidity so

those are variables that we think are

affecting a certain outcome in this case


the outcome is the volume or the amount

of rainfall that we are going to receive

so those variables that we think are

affecting amount of rainfall those are

independent variables

actually they are also called predictor

variables and i always like to call them

predictor variables and not independent

and the reason for that is

um when you say independent variable it

kind of sounds like they can never be

affected they are independent they can

never be affected but that's not the

fact because independent variables can

actually even be affected by other

variables so the better way to call them

is predictor variables because we

think that they are actually going to

predict an outcome of a certain variable in

themselves they could actually be

affected by other variables as well but

at this point we think they are

predicting the outcome of certain values

that's why we're calling them predictor


variables okay so if you're structuring

your research question or you're

structuring whatever you're trying to

find out in the example i've given you

where we're talking about rainfall what

factors affect rainfall so you're

structuring that in terms of cause and

effect

then what we're saying is the cause or

whatever variable or whatever data item

that you think is the one that is

affecting the outcome of the other

that's the predictor now let's take a

look at the other type so the other type

is a dependent variable so this if you

have a cause obviously you have an

effect so this

is the effect this is a variable that we

think the values within this variable

are going to change depending on what's

going to happen with another variable so

the values for example the values of the

amount of rainfall will change depending

on the variable such as humidity the


temperature of the day and so on so the

amount of rainfall is what we are

studying trying to see what would be the

change in this variable and

what variables are causing that change

and that's what we call independent

variables we think that the values in

the dependent variables are determined

by the changes in the independent or the

predictor variable and we also call

dependent variables

outcome variables so when we come to the

topic where we're going to be talking

about cause and effect for example when

you're doing regression analysis or

analysis of variance then we're going to

be talking about predictor variables or

outcome variables just think of them as

independent variables and dependent

variables or in other words cause

on one hand and effect on the other hand
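
To see a predictor and an outcome side by side, here is a minimal python sketch that fits a straight line through a handful of points. The humidity and rainfall numbers are invented so that they fall exactly on a line; real weather data would not behave this neatly. The function name `fit_line` is also made up, and it uses ordinary least squares, which is only one simple way of relating a predictor to an outcome.

```python
def fit_line(predictor, outcome):
    """Ordinary least squares fit of: outcome = slope * predictor + intercept."""
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(outcome) / n
    # How the two variables move together, relative to how much the predictor varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, outcome))
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up values: humidity is the predictor (independent variable),
# rainfall is the outcome (dependent variable).
humidity = [40, 50, 60, 70, 80]
rainfall = [5, 7, 9, 11, 13]

slope, intercept = fit_line(humidity, rainfall)
predicted_rainfall = slope * 65 + intercept  # predicted outcome at a humidity of 65
```

The slope tells you how much the outcome changes for each one-unit change in the predictor, which is the cause and effect reading described above. A fitted line on its own does not prove that the predictor causes the outcome.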

one of the most important things that we

need to understand

in statistics before we can even start


to analyze data is measurement levels

so what are measurement levels so

measurement levels are actually the

relationship between the values within a

variable remember a variable

is basically what's holding the

information or the facts that we're

trying to collect

so a variable can take multiple values or

different possible responses for example

if i ask

what's your sex you could tell me

whether you're male or female if i ask

you are you married you tell me whether

you are single or you are married so

variables can take different values now

those different values what is the

relationship among those values

now obviously that's a book definition

let's actually explain more about these

measurement levels in a way that you

understand very easily but before we do

that it's actually extremely important

to learn this because


the choice of analysis that you're going

to use actually depends on the type of

variable that is the measurement level

of that variable and it's very easy for

us to understand at this point because

we've already covered the types of

variables like qualitative variables and

quantitative variables so the measurement level of

variables actually is a concept that is

taken from types of variables and we're

going to be tying back

into types of variables when we're

discussing measurement levels so let's

take a look at the different measures of

variables or the different measurement

levels of variables now remember we have

already said that measurement levels

actually come from the types of

variables and we know that we have

qualitative variables and quantitative

variables so in qualitative variables we

actually have nominal variables so nominal

is actually a measurement level

apart from nominal variables we


also have ordinal variables and we have

binary variables so binary variables

ordinal variables and nominal

variables are measurement levels under

the qualitative variable type

then remember we have quantitative

variables so quantitative variables can

come in two measurement levels you have

ratio variables and you have interval

variables so let's now dig deep

inside of these different kinds of

variables that we have talked about but

don't forget that we have qualitative

variables then we have quantitative

variables remember qualitative variables

are also known as categorical or grouping

variables and quantitative variables

are also known as continuous variables and

so within the qualitative variable types

we have nominal ordinal and binary

variables while within the quantitative

variables you can either

have ratio variables or you can have

interval variables so let's start with


nominal variables so nominal variables

are variables where the categories do

not have a logical order okay so these

are remember these are categorical

variables or qualitative variables but the

values themselves or the possible

responses that you're going to get on a

nominal variable

do not have a logical order you cannot

put them in a logical order of say

quantity or anything else

i'll give you an example of marital

status you can define marital status in

terms of whether the person is single or

married

or divorced

now these three categories that i've

mentioned cannot be

arranged in some logical order so you

cannot say that there's something more

to single than

to married or something more to divorced

than in single so these are simply

names of values or names of


characteristics that you can

give to people or how you can describe

people

as a matter of fact the word nominal

comes from the latin word for name so

you can think of it as meaning name

only in other words the categories or

the values that we are describing the

values within this variable are just

names of

characteristics just names that we

are giving to certain attributes of the

objects that we are trying to study

so examples of nominal variables can be

sex marital status race

color so every qualitative variable where

the categories cannot be arranged in

some order we're calling that a nominal

variable or name only so let's take a

look at ordinal variables so ordinal

variables are variables where the

categories have a logical order so when

you look at the values of the possible

responses that you're gonna get


you can actually arrange them in terms

of a certain order

okay as a matter of fact

the word ordinal is actually coming

from the word order

all right so each category

or possible value is actually

a level

of the variable so i can give you for

example satisfaction levels or level of

education or level of agreement

or maybe other continuous variables that

we just maybe decided to express

them as categories for example there are

people who actually collect age as age

groups so instead of actually asking how

old are you and just recording the

actual number itself you're actually

putting people into categories of age so

you can actually see for example if i

give you an example of a level of

education let's say that you're

collecting that as no education and then

you have primary school education then


you have secondary education and tertiary

education if you take a look at the

categories here you actually notice that

these are in order you actually notice

that when you say primary education we

have more education in primary education

as compared to no education and if

somebody actually went all the way to

secondary school education

then you can actually say secondary

education

is actually more education than primary

education

and also tertiary education if

somebody identifies as

having gone all the way to tertiary

education

then they have done much more education

than one who has just done secondary

education or primary education so the

values in this variable are actually

ordered and you can actually say that

each category is a higher level of the

variable so for another example let's


say you're talking about satisfaction

and let's say that i ask you how

satisfied are you with this course at

this point and i give you the scale of

one to ten so someone who says that they

are satisfied as nine that's obviously

more satisfaction than someone who'd say

six or five so as you go up the

categories you're actually also going up

in terms of the value itself you're

going up in terms of the quantity of

some sort but remember these are still

qualitative variables these are still

categorical variables we're still

putting people into categories

just that those categories can be

ordered so if you compare that with

nominal variables we are saying for

nominal variables the categories that we

have cannot be ordered
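
The difference between nominal and ordinal categories can be sketched in a few lines of Python. This is only an illustration built from the lesson's own examples (marital status and level of education); the function names are hypothetical and nothing here is part of SPSS.

```python
# Illustrative sketch only: category names are taken from the lesson's examples.

# Ordinal: the position in the list carries meaning (lowest to highest).
EDUCATION_LEVELS = ["no education", "primary", "secondary", "tertiary"]

# Nominal: a plain set of names, no order is implied.
MARITAL_STATUS = {"single", "married", "divorced"}


def education_rank(level: str) -> int:
    """Return the position of a level in the ordered list (0 = lowest)."""
    return EDUCATION_LEVELS.index(level)


def has_more_education(a: str, b: str) -> bool:
    """True when level a sits higher in the order than level b."""
    return education_rank(a) > education_rank(b)


if __name__ == "__main__":
    print(has_more_education("secondary", "primary"))
    # For a nominal variable the only sensible question is membership.
    print("married" in MARITAL_STATUS)
```

Note that the ordered list supports "higher than" comparisons, while the set of marital statuses only supports "is it one of these names".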

all right so let's now talk about binary

variables now binary variables are

simply variables that have only two

categories so if you have a variable


where the answer is yes or no or on

and off or true or false then that is

called a binary variable also called a

dichotomous

variable now let's move on to ratio

variables now if you remember we have

now gone from qualitative variables now

we're talking about quantitative

variables first of all ratio variables

are quantitative variables so these are

variables where you are actually

collecting data that has been measured

using

an objective way or an objective tool

for measuring these are continuous

variables that actually also do contain

an absolute zero point or a meaningful

zero point let's take an example of a

number of people okay so if you're

saying there are zero people you

actually mean that there are

no people

the zero there is actually meaningful

and it actually stands for an absence of


something so that's what we mean by

absolute zero and the values also have

equality of intervals all right let's

explain equality of intervals in a bit

all right so let's say the difference

between

five and ten people right so if you have

five people and ten people if you

subtract that the difference is five

let's move further along the scale so

the difference between 20 and 25 people

is also five people

so these two differences are the same

right so if you are counting from a

certain point to a certain point

if the difference is the same across the

different numbers that you're trying to

subtract then that is equality of

intervals there are certain variables where

you don't have equality of intervals

i'll give an example for ordinal

variables so ordinal variables don't

have equality of intervals so if you say

that for your level of satisfaction


you have numbers like one

and three so someone who says that they

are satisfied at number three and

someone who says they're satisfied at

number one if you subtract that you have

an interval of two but is that the same

satisfaction that difference of two is

it the same as the difference between

maybe eight and ten all right so it

might not be the same because the value

there is very subjective it's not you

don't have an objective way of measuring

it so an ordinal variable doesn't have

equality

of intervals while ratio variables which

are continuous or quantitative

do have equality of intervals

so i've given you an example for example

number of people that is actually a

ratio variable now on the other hand we

have interval variables again these are

quantitative variables but the only

difference with the ratio variables this

time
is that

although you have equality of intervals

just like the ratio variables the ratio

between two numbers is not

meaningful what does that mean let's

take an example of temperature

okay so if you're measuring temperature

in degrees celsius

right if you say that today is 20

degrees

and maybe let's say two days later it's

40 degrees you cannot necessarily say

that it's twice the temperature that was

two days prior so 40 degrees is not

twice the temperature as 20 degrees

and this is also very connected to

something that we call the absolute zero

the thing about a variable like

temperature is that you don't have an

absolute zero let's explain this a bit

first of all let's ask ourselves what

are we measuring when we are collecting

data say temperature

what are we trying to measure so


temperature technically or

scientifically reflects the movement

of atoms so that is basically a

measure of heat so in essence we are

measuring how hot

a certain object is that is what is called

temperature

now if we said zero degrees celsius it

does not necessarily mean that there is

absence of heat in fact when certain

objects are at 0 degrees the atoms are

actually still moving so

0 there is not an absolute zero so as

such the variable temperature as

measured in degrees celsius

is not a ratio variable but rather it's

an interval variable it does have

equality of intervals

however it does not have an absolute

zero the zero there is not an absence of

things and it's not meaningful
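
One way to see why the ratio of two Celsius readings is not meaningful is to convert the same two days to a scale that does have an absolute zero. The sketch below uses the Kelvin scale, which is not mentioned in the lesson and is brought in here only for illustration; the variable names are hypothetical.

```python
# Illustrative sketch: differences survive a change of scale, ratios do not.


def celsius_to_kelvin(celsius: float) -> float:
    """Convert degrees Celsius to kelvin (0 K is an absolute zero)."""
    return celsius + 273.15


day1_c, day2_c = 20.0, 40.0  # the two days from the lesson's example

# Equality of intervals: the gap is the same size on both scales.
diff_c = day2_c - day1_c
diff_k = celsius_to_kelvin(day2_c) - celsius_to_kelvin(day1_c)

# Ratios: "twice as hot" only appears on the scale without an absolute zero.
ratio_c = day2_c / day1_c
ratio_k = celsius_to_kelvin(day2_c) / celsius_to_kelvin(day1_c)

if __name__ == "__main__":
    print(f"difference: {diff_c:.2f} C vs {diff_k:.2f} K")
    print(f"ratio: {ratio_c:.3f} in Celsius vs {ratio_k:.3f} in kelvin")
```

The Celsius ratio comes out as 2, but on the kelvin scale the second day is only about 7 percent higher, which is why 40 degrees Celsius cannot be called twice the temperature of 20 degrees Celsius.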

so i've given an example of temperature

but another example that you can provide

is ph so ph is how we measure the


acidity

of substances so when you say the ph is

zero that's not a meaningful zero the

zero there doesn't really have any

meaning it doesn't stand for absence of

something the ph of 6 is not twice the

ph of 3.

as such that is an interval variable now

in spss we are only going to see nominal

variables ordinal variables and scale

variables so scale variables is just a

blanket term for continuous variables or

quantitative variables so when we get to

spss you need to understand that for

categorical variables we have ordinal

variables and nominal variables while for

continuous variables we only have what

you call scale variables which is

basically a blanket term for all

continuous variables so we're only going

to be dealing with three types of

variables but you need to understand

that the two nominal and ordinal are

categorical or qualitative variables while a


scale variable is basically a continuous

variable or quantitative variable which

could either be ratio or they could be

interval variables
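
As a quick reference, the way the five measurement levels from this lesson collapse into the three options shown in SPSS can be written as a small lookup table. This is a sketch, not anything SPSS itself provides; the dictionary name is hypothetical, and treating binary variables as nominal is an assumption, since the lesson only says that SPSS shows nominal, ordinal and scale.

```python
# Illustrative lookup: statistical measurement level -> SPSS "Measure" option.
SPSS_MEASURE = {
    "nominal": "Nominal",
    "ordinal": "Ordinal",
    "binary": "Nominal",   # assumption: two unordered categories treated as nominal
    "interval": "Scale",   # scale is the blanket term for continuous variables
    "ratio": "Scale",
}


def spss_measure(level: str) -> str:
    """Return the SPSS measure for a measurement level named in the lesson."""
    return SPSS_MEASURE[level.lower()]


if __name__ == "__main__":
    for level in ("nominal", "ordinal", "interval", "ratio"):
        print(level, "->", spss_measure(level))
```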

so in this lesson let's talk about

branches of statistics this is just

going to be a little introduction about

the branches of statistics

but once we start doing analysis we're

actually going to go in depth

talking about the different statistics

that are actually under this branch of

statistics that i'm about to talk about

but i think first of all let's talk

about

what we call populations and samples okay

let's take a scenario here let's say

that you're trying to study the

differences between using soft copy

study materials and hard copy study

materials on educational outcomes of

students at your university right so now

the students in the university become

the subjects that you're trying to study


so you're interested in finding out the

differences between using soft copy

materials and hard copy materials

for all the people in the university but

because of several reasons for example

you may not have enough money to be able

to study all those people but also

because it's gonna take a long time for

you to actually study a lot of people

then you might not be able to actually

study all those people so let's say that

you have 3 000 students in your

university so that is what we call a

population so the population is

basically the whole group of people that

you're actually trying to study so these

people actually in most cases have one

specific attribute that is common across

all of them so in our case these are

actually students in a university so the

attribute is that they are all in that

single university

now this is the population that we have

but like i've said you might not be able


to study all of them

that's why you're actually going to take

a subset of those people you're just

going to pick a few people and we call

that your sample so now your sample is

just a subset of that population that

now you're going to try to study so just

take a look at the definitions a

population is a complete group of people

objects or items that you are trying to

study you wish that you could study all

these people in our case all the 3 000

students in the university

but realistically we're going to get

what we call a sample so a sample is now

a smaller group of people

or objects that have been taken from

that population that you are trying to

study so the population is going to be

big a lot of people who share one

attribute that you're interested in

but because of several reasons that i've

mentioned earlier for example cost and

time you're only going to pick a few


people from that population and that's

what you call the sample statistics is

extremely smart because it actually

gives us the power to use a few people

and study them and then in the end be

able to generalize across that huge

population

so the group that we are calling a sample

are the ones that are actually going to

participate in the study so for example

you might actually come up with some

questionnaires that these people are

going to respond to so the people who are

participating in the study itself that

is your sample
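
The idea of drawing a sample from a population can be sketched with Python's standard library. The population of 3 000 students comes from the lesson; the sample size of 300, the student identifiers and the fixed seed are all assumptions made up for the illustration.

```python
import random

# Population: every student at the university (3 000 in the lesson's scenario).
# The identifiers are hypothetical placeholders.
population = [f"student_{i:04d}" for i in range(1, 3001)]

# A seeded generator so the sketch gives the same draw every time it is run.
rng = random.Random(42)

# Sample: a subset drawn without replacement, each student equally likely.
SAMPLE_SIZE = 300  # assumed for illustration, not a recommendation
sample = rng.sample(population, k=SAMPLE_SIZE)

if __name__ == "__main__":
    print(len(population), "students in the population")
    print(len(sample), "students in the sample")
    print("first three sampled:", sample[:3])
```

Only the students in `sample` would receive the questionnaire; everyone else in `population` is studied indirectly through them.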

but like i've said

statistics gives us the power to

actually use the sample or whatever

we're getting from this sample to

actually generalize to that broader

population so now let's go into the

branches of statistics the first branch

of statistics is called descriptive

statistics and as the name suggests


descriptive we're using descriptive

statistics to describe or summarize your

data and which data are we talking about

the sample data that is the data at hand

so descriptive statistics is only about

summarizing the data that you have

collected

it's not about now talking about the

whole population the goal of descriptive

statistics is to just summarize and

describe the data that you have at hand

so we do that using the mean the median

the mode and standard deviations and even

charts

we're actually going to be looking at

these in greater detail in just a moment
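
To make these summaries concrete before getting to SPSS, here is a small sketch using Python's built-in `statistics` module. The list of ages is invented purely for illustration, and the standard deviation shown is the sample version (dividing by n minus 1).

```python
import statistics

# Hypothetical sample data: ages of seven respondents.
ages = [19, 21, 21, 22, 24, 25, 29]

mean_age = statistics.mean(ages)      # arithmetic average
median_age = statistics.median(ages)  # middle value once sorted
mode_age = statistics.mode(ages)      # most frequent value
sd_age = statistics.stdev(ages)       # sample standard deviation

if __name__ == "__main__":
    print("mean:", mean_age)
    print("median:", median_age)
    print("mode:", mode_age)
    print("standard deviation:", round(sd_age, 2))
```

All four numbers describe only the seven ages at hand; none of them says anything yet about a wider population.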

on the other hand we have what we call

inferential statistics so inferential

statistics now uses the data that we

have collected from the sample in order

to generalize or to make generalizations

to the broader population that we are

trying to study remember we wish to

study
all the students in the university but

because of some reasons we cannot so we

have drawn a sample now this sample is

supposed to be representative enough so

there are ways in which we make sure

that this is representative enough now

when the sample is representative enough

then we should be able to infer or to

generalize whatever is going on in the

sample to the broader population from which

the sample was taken that's what we're

calling inferential statistics and

examples of inferential tests include

correlations t-tests analysis of

variance and regressions and again we

are going to look at these in greater

detail in this course

and remember our goal here is to be able

to choose the correct type of analysis

we have to do

and then to actually do that in spss to

be able to interpret what's going on

within that analysis and in the end to

actually write our analysis or


interpretation using the apa format

now in this lesson we want to get data

into spss so there are two ways in which

you can get data into spss first you can

manually create your variables and enter

data

but you can also import data from other

formats for example from microsoft excel

now in this lesson we're just going to

see how we can manually create some

variables and enter our data directly

into spss suffice to say

it's not really recommended to enter data

straight into spss because there are

other programs that are very specific to

entering data and they actually do

provide a lot of functionality about

data entry so i very much recommend

using other programs for example

kobotoolbox whose course you can actually

also find in the course platform

okay so now how do we create variables

and enter data we're going to start by

creating some variables


so remember here to create variables we

need to switch to the variable view

which is this view here by just clicking

the button here at the bottom

okay so now we have a list of variables

which i have provided along with this

lesson so make sure that you're looking

at those variables as we are going to be

doing this together okay the first

variable is case id

and the code is just q1 so we're going

to start with the name of the variable

so the name of the variable is used in

spss procedures

and there are certain rules that you

must follow when you are recording the

name of the variable so for example it

cannot contain any spaces so here

if i type case id

like that with a space and press enter

it's actually going to say variable name

contains illegal character so we cannot

do that just click ok

the next rule is that you cannot start


with a number so for example if i

say 1

case id and press enter again it says

variable name contains an illegal first

character so the first character can

only be a letter of the alphabet from a

to z so that's okay we just click ok

there

but you can also not include any special

characters except some very few

characters that are allowed for example

the underscore the at symbol the hash

symbol or the dollar sign if you include

any other symbols for example the

comma it's also going to

give you an error it's always

recommended to make sure that the name

is short enough and it's also

readable

but also the name of the variable should

be unique in other words you can never

have two variables that have exactly the

same name
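
The naming rules just listed can be written down as a small checker. This is a simplified sketch of the rules as described in this lesson (no spaces, starts with a letter, only a few special characters, unique); it is not SPSS's own validation and it ignores other rules SPSS applies, such as reserved words. The function name is hypothetical, and treating names as case-insensitive when checking uniqueness is an assumption.

```python
import re

# First character: a letter. Remaining characters: letters, digits,
# underscore, at symbol, hash symbol or dollar sign (per the lesson).
_NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_@#$]*$")


def is_valid_name(name: str, existing=()) -> bool:
    """Check a candidate variable name against the lesson's rules."""
    if not _NAME_PATTERN.match(name):
        return False
    # Assumption: uniqueness is checked without regard to upper/lower case.
    taken = {n.lower() for n in existing}
    return name.lower() not in taken


if __name__ == "__main__":
    for candidate in ("q1", "case id", "1caseid", "case_id", "case,id"):
        print(repr(candidate), is_valid_name(candidate))
```

Names with a space, a leading digit or a comma are rejected, which matches the error messages shown in the lesson.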

okay so remember for our variables we


actually do have codes like q1 so i

think we're just going to use q1 there

as the name of the variable it passes

all the rules that i've just mentioned

what i do in most cases once i have

typed that is i press on my tab key on

the keyboard to jump to the next so when

you jump to the next you'll notice that

spss actually adds some defaults so we

already have the type of variable the

width the decimals and so on of course

we're going to be changing some of these

variable defaults but for now let's take

a look at the next one so the next is

actually the data type and the data type

is basically the type of data that is

going to be typed in in the data view so

when you go to the data view right now

so if i switch to data view you notice

that the variable comes here on the

first column so the data that is going

to be entered here is it going to be

typed as numbers or is it as text

let me switch back to variable view


so

here when you click on the word

numeric you see that there is a button on

the right hand side here

when you click on the button you're

actually going to have this dialog box

that has a list of so many different

variable types

so the first variable type is numeric

and this is if the variable's values are

going to be numbers so this will include

variables for example if you have

household size or household income and

so on but in certain cases we actually

can also make a variable like gender

where you have words like male and

female to be a numeric variable and the

way we do that is that we will assign

this as a numeric variable and then we

have to define the meaning of the

numbers that we are going to assign so

for example if you assign one for male

and two for female then we have to

actually specify what one and two mean


by using the column here that says

values so i'll talk about values in just

a moment so any variable that is purely

numeric like number of people household

size that's going to be numeric but

other variables categorical variables

like gender where you have assigned

numbers to words then you can also make

them numeric and specify the values

under the values column
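
The idea of storing numbers and attaching a meaning to each number can be sketched as a simple lookup. The codes 1 for male and 2 for female come from the lesson; the list of responses and the names used are made up for illustration, and this only mimics what the values column does rather than reproducing SPSS itself.

```python
from collections import Counter

# Value labels: what each stored number means (codes from the lesson).
GENDER_LABELS = {1: "Male", 2: "Female"}

# Hypothetical data as it would be typed in: numbers, not words.
gender_codes = [1, 2, 2, 1, 2]

# Translate codes to labels only when reporting.
gender_counts = Counter(GENDER_LABELS[code] for code in gender_codes)

if __name__ == "__main__":
    for label, count in gender_counts.items():
        print(label, count)
```

The data stay numeric, which keeps typing fast and consistent, while the output can still show readable words.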

the next is the comma and basically this

is also numeric just that for every

three digits from the right hand side

you're going to have a comma

the way that we write numbers for

example currencies

in other countries instead of using the

comma to separate groups of digits they

actually use dots and that's what you

have here the dot type it's also a

numeric type just that for every three

digits from the right hand side you get

a dot instead of a comma and in most

cases the comma will now be used as a


decimal symbol
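
The two display conventions can be compared side by side with a short sketch. It only imitates how the same number would look under the comma and dot conventions; the helper names are hypothetical and the formatting is done by Python, not by SPSS.

```python
def format_comma(value: float, decimals: int = 2) -> str:
    """Comma groups every three digits, dot marks the decimals."""
    return f"{value:,.{decimals}f}"


def format_dot(value: float, decimals: int = 2) -> str:
    """Dot groups every three digits, comma marks the decimals."""
    text = format_comma(value, decimals)
    # Swap the two symbols using a temporary placeholder.
    return text.replace(",", "\0").replace(".", ",").replace("\0", ".")


if __name__ == "__main__":
    amount = 1234567.5
    print(format_comma(amount))
    print(format_dot(amount))
```

Both results describe the same number; only the roles of the comma and the dot are exchanged.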

then we have scientific notation which

is simply a numeric variable whose

values are displayed with an embedded e

and a signed power of 10 exponent so for

example

you can have something like 5.634

e minus 5

which actually means 0.00005634

now you might not really be using this

very often because it's mostly used

for numbers that are extremely small you

might not be dealing with this one

unless you're going to be dealing

with pure sciences
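
Reading scientific notation is just a matter of moving the decimal point by the exponent. A short sketch makes the arithmetic explicit; the number is the one from the example above and everything else is plain Python.

```python
# 5.634E-5 means 5.634 multiplied by 10 to the power of minus 5,
# so the decimal point moves five places to the left.
value = float("5.634E-5")

as_decimal = f"{value:.8f}"     # ordinary decimal display
as_scientific = f"{value:.3E}"  # back to an embedded E and signed exponent

if __name__ == "__main__":
    print(as_decimal)
    print(as_scientific)
```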

next we have the date type which as the

name suggests is for dates and then you

have the dollar type which as the name

suggests is for dollars that is the

currency dollar

and then you have the option custom

currency the custom currency can

actually be set under the edit menu

so if you have certain currencies that


you're going to be using the most you

can actually set them under the edit

menu and then when you select custom

currency you will be able to select the

custom currency that you want it's

actually just as simple as setting the

prefix of the currency for example in my

case in malawi that's mwk or malawi kwacha

the next one is string and string is

literally just text

so if the variable is going to be typed

in as text for example descriptions

then you're going to select string

and finally we have restricted numeric

this is a variable whose values are

going to be integers but you want to

keep the leading zeros if you pull up

your calculator now and type 001 by

default the calculator will ignore the

first two zeros

so if you want to maintain the zeros in

spss when you're typing you're going to

select restricted numeric all right so

those are the different types of


variables that we have
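
The leading-zero behaviour described for restricted numeric can be imitated in a couple of lines. This sketch only shows the display idea (padding an integer with zeros up to a fixed width); the helper name is hypothetical and it is not how SPSS stores the value.

```python
def pad_with_zeros(value: int, width: int) -> str:
    """Show a non-negative integer padded with leading zeros to `width`."""
    return str(value).zfill(width)


if __name__ == "__main__":
    # A plain number drops the zeros, like the calculator in the lesson.
    print(int("001"))
    # A zero-padded display keeps them.
    print(pad_with_zeros(1, 3))
```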

now q1 remember this is case id and

we're actually going to have this as a

number so i'm just going to pick numeric

now you'll notice that on the same

dialog box you actually do have the

width and decimal places which are

characteristics that you also find right

here as you can see so they are actually

the same

so let me just explain them before i go

the width is the maximum number of

characters that you would like to allow

on that column so that is how many

characters maximum can be typed in that

column so for a numeric if we say eight

characters it means that you're going to

go to the tens of millions now if you

want to go further than that then you

have to increase the width here to

include more characters
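
The link between width and the largest number that fits is simple arithmetic, sketched below. The sketch assumes a whole number with no decimals and no minus sign, so that every character is a digit; the function name is hypothetical.

```python
def largest_whole_number(width: int) -> int:
    """Largest non-negative integer that fits in `width` digit characters."""
    return 10 ** width - 1


if __name__ == "__main__":
    # With the default width of eight this is in the tens of millions.
    print(f"{largest_whole_number(8):,}")
```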

the decimal places is as the name

suggests how many decimals do you want

to show by default it's two now if you


don't want any decimals then you're

going to put zero here so in my case for

case ids

we're not going to have any decimals so

i'm just gonna delete this and type zero

and that's it then i can click ok

so in this way it means i'm done with

the width and the decimals and why

haven't we edited the width i think the

width is fine the

maximum is eight characters i don't

necessarily need to limit this to maybe

two or three if this allows the values

that i'm going to be typing that's

perfectly okay

the next thing is the label so now

the label is the display name of the

variable

so when we do analysis when we have the

output we don't want the output to show

q1 because we're going to present this

information to other people by saying q1

people will not understand what q1 is as


a variable so it's better to actually

type the whole text which is the name of

the variable itself now the label

actually allows you to type anything

here so it actually accepts things

like spaces or symbols and so on

so in other words i can actually type

case space id

and that is actually going to be allowed

moving on we have the values the values

are now a list of valid options for the

variable so for example if we have a

multiple choice variable like gender

where you have one for male and two for

female you must specify here what one

and two means so what we do is we click

there and click the button and then

specify here we're going to show an

example when we get to the variable

gender or any other variables that

require the values next we have missing

you may define values as user missing

values for example let's say you want to

distinguish between data that are


missing because a respondent refused to

answer

so you might use something like 88

to show that that's refused to

answer

and maybe you also have data that are

missing because the question did not

apply to that respondent

for example you want to show 99 as not

applicable

so data values that are specified as

user missing are going to be flagged for

special treatment and will be excluded

from most calculations so for example

here let's say that you asked a question

are you married

now some will be married and others

will say they are not married but you

might have respondents who don't have to

respond to that question for example

underage kids

so when you're calculating the

percentage of people who are married you

don't want to include those that did not


respond because the question was not

applicable to them because it might mean

something quite different so you want

the percentage to only be out of those

people who said yes they're married or

no they're not married and not ones that

are not applicable so in this case for

the cases where the question was not

applicable then under missing you have

to click the button and select that you

have discrete missing values

so let's say that you are using 99 as

not applicable then you would type 99

here to show that when spss finds this

value in the data then that value should

not be included in calculations
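
The effect of declaring 99 as a missing value can be shown with a small calculation. The code 99 for not applicable comes from the lesson; the responses, the 1/0 coding for married and not married, and the variable names are assumptions made up for the illustration.

```python
NOT_APPLICABLE = 99  # user-missing code from the lesson

# Hypothetical answers to "are you married": 1 = yes, 0 = no, 99 = not applicable.
responses = [1, 0, 1, 99, 1, 0, 99, 1]

# Drop the user-missing code before calculating anything.
valid = [r for r in responses if r != NOT_APPLICABLE]
married = sum(1 for r in valid if r == 1)

pct_married_valid = 100 * married / len(valid)      # out of valid answers only
pct_married_naive = 100 * married / len(responses)  # wrongly counts the 99s

if __name__ == "__main__":
    print(f"married, valid answers only: {pct_married_valid:.1f}%")
    print(f"married, 99s left in: {pct_married_naive:.1f}%")
```

Leaving the 99s in the denominator understates the share of married respondents, which is the distortion the missing setting is meant to prevent.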

now we're not going to have that for q1

so i'm just going to go ahead and close

this

the next one is number of columns and

number of columns is just how wide the

column for that variable is going to be

so for example if i go to the data view

and then i click


in between two columns like

we do in excel and click and drag out

when i go back to variable view you

notice that number of columns is now 27.

if i go back and make this a bit smaller

in the variable view you'll notice that

now it's 12. so what that means is when

you type here you can actually type

12 characters that will show up here

so this has nothing to do with your data

it's literally just about how the data

is going to be viewed in the data view

so in most cases we don't change

anything there

and that is the same with the

alignment the alignment again is not

very important it's just how the data

will be aligned in the column for that

variable do you want it to be on the

left side on the right side or the

center

it's literally just about aesthetics

and then we have measurement levels if

you remember we talked about measurement


levels and we distinguished between

nominal scale and ordinal variables if

you don't remember this please go back

to the lesson where we're talking about

measurement levels and understand the

measurement levels before you come here

because we're going to be using them

when we are creating variables

in my case here variable q1 is ids it's

not a quantity

so it's basically just names we simply

want to name the cases by giving them

numbers

so that is going to be a nominal

variable

and finally we have the role

and the reason we have the role is

because some dialog boxes in spss

support predefined roles that can be used

to pre-select variables for analysis

so when you're running analysis and you

have defined roles already then spss is

going to automatically move the

variables to their corresponding boxes


depending on the role that you have

specified

so the roles that we have here are for

example input which is the same as an

independent variable an independent

variable is a variable that we think is

affecting another variable

and then we have target which is the

same as a dependent variable

and the dependent variable is the effect

or the variable that we think is being

affected by another variable

and then we have both that is the

variable could either be input or target

and then we have none which means that

the variable is neither an

independent variable nor a dependent

variable then we have partition where we use the

variable to partition the data for other

purposes for example for training for

testing and validation

and then split where you might want to use that variable

to do comparisons across your data

by using the split command


so those are the characteristics of

variables now in the next lesson we're

going to go through the rest of the

variables and defining them

appropriately now let's move on with the

rest of the variables that we have so

the next variable is q2 which is name of

respondent so once again i'm just going

to type here q2

and then i press the tab key

and by default the type says numeric

however this is name of respondent so

we're going to be typing text or the

names of the people

in text so we have to click where it

says numeric to reveal that button and

then we click the button now for text we

need to use string

and you can see here it actually says

characters but it's actually the same as

width so we have to think about the

longest name how many characters would

the longest name be

i have seen that 50 characters is


actually enough so i'm just going to

type 50 like so and click ok

the next thing i would do is now specify

the decimals but already you can see that

decimals is zero and that is because

here the variable is a string variable

so for text we don't have

any decimal places the next thing is a

label of course remember the label is

the display name of the variable so here

i'm going to type name of respondent

like so

and i press the tab key to go to the

values

we don't have the values for this

variable so i'm just going to go

straight away

to the measurement level because we

don't have any missing values for name

of respondent we're going to have names

for every record in the data set and

remember

columns and alignment are not very

important but now let's set the


measurement level

now obviously this is a name so already

this is going to be a nominal variable

now

in terms of the role i'm just going to

leave it as default as a matter of fact

this is what we're going to be using

for every one of the variables we're not

going to be setting the role because it's

not very important at this point

let's go to the next one which is q3

we'll type q3 and press the tab key

and now the type here by default says

numeric

and the variable here is age of

respondent so that's certainly numeric

so i'll leave it as the default

the width is 8 that's fine

and we have two decimal places if you're

going to have decimal places for the

variable age then you can go ahead and

type there

but i'm not going to have any decimals so

i'm just going to reduce this to zero


and then the label is age of respondent

all right the next thing is the values

so this is not a multiple choice

question so we don't need to set the

values

now in the missing if we have a value

that actually stands for missing so for

example if certain people did not tell

us their age

and we had to type something else to

stand for

missing then we'll set it here so let's

say that for every person who said they

would not tell us the age we were

recording negative 9

then what we have to do is click where

there's none for that variable and click

the button

and where we have missing values we

click on discrete missing values and

here we're going to say -9 to mean

missing and then we click okay

so for every case where they were not

giving us the age we would type minus nine


and when we're calculating things like

the averages for that variable that

minus nine is not going to be factored

in because it's a missing value
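
the effect of declaring minus nine as a missing value can be sketched outside spss as well. the short python example below is only an illustration of the idea, not how spss works internally, and the ages and the helper name `mean_excluding_missing` are made up for this sketch

```python
from statistics import mean

MISSING_CODE = -9  # code typed when a respondent did not give an age

def mean_excluding_missing(values, missing_code=MISSING_CODE):
    """Average only the values that are not the missing-value code."""
    valid = [v for v in values if v != missing_code]
    return mean(valid)

ages = [20, 25, -9, 32, -9]  # hypothetical ages, two respondents declined

naive_mean = mean(ages)                    # wrongly treats -9 as a real age
valid_mean = mean_excluding_missing(ages)  # uses only 20, 25 and 32

print(naive_mean, valid_mean)
```

the first average is dragged down by the two minus nines while the second one only uses the three real ages, which is the behaviour described above once the code is declared as missing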

columns and alignment are not necessary

but let's talk about the measurement

level this is age of respondent and it's

obviously a continuous variable or a

quantitative variable as we had already

been discussing so here i'm going to

pick that it's a scale variable the next

thing is a variable q4 so i'll type

there q4 and press tab and by default

the variable type is numeric the

variable we have here is sex of respondent

now we could make it text so that we

are typing male or female but when

you're entering data directly into spss

you can make spelling mistakes

now when you make a spelling mistake and

you analyze your data all those

spellings will actually show up as

separate values

so to avoid that that's why we enter


numeric values instead so it's easy to

type one or two as compared to typing

the word male or the word female so

we're just going to make this numeric

and then we'll set the values

accordingly
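
to see why typed text is risky, here is a small python sketch with made-up entries. the lists and the dictionary name `value_labels` are assumptions for illustration only

```python
from collections import Counter

# hypothetical entries typed as free text, including two typing slips
typed_as_text = ["male", "female", "mael", "Male", "female"]

# the same five respondents entered as numeric codes
typed_as_codes = [1, 2, 1, 1, 2]
value_labels = {1: "male", 2: "female"}  # what each code stands for

text_counts = Counter(typed_as_text)
code_counts = Counter(value_labels[code] for code in typed_as_codes)

print(text_counts)  # every distinct spelling becomes its own category
print(code_counts)  # only the two intended categories appear
```

with text the three males end up spread over three different spellings, while with codes they are counted together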

the width is fine

we're not going to have any decimals

because we just have two values one and

two

and the label is supposed to be sex of

respondent like that

and i press the tab key so on the values here

we now have to click where it says none

and then click on the button like so

then we're going to set the values so

here the value one the label is male and

you click the add button

and then you have two

whose label is female and then you click

the add button again and then you click

ok

so now we have set the values

next up if there are people who say


they're not going to tell you their sex

which i'm very sure that you'll not

really be asking this as a question you

just observe it and actually record it

but if there are then you would set the

values that are standing for the missing

values but we don't have any so we jump

missing columns and alignment and talk

about measurement level the default

here is nominal and if you've

remembered the discussion on measurement

levels it certainly is a nominal

variable

finally we have q5

q5 is education level attained and just

like sex of respondent we actually have

values and their labels

so we're going to make this numeric so

that when you're entering data we're

entering the numbers one two three or

four but we are going to set the values

to show spss what those numbers mean

so here we don't have decimals because

we only have one two three and four as


the values

and the label is going to be education

level

attained

then in the values we click where it

says none and then we click the button

now under value we have one which stands

for none

and then we press

and then we click add

then we have two this time we can use

the tab key

and then type primary

and we can use the enter key to add it

so we can now go back to say three

that's the value

tab key this is standing for secondary

and again we can just press the enter

key

and finally there is four

tab key and this is tertiary

and then we press the enter key again to

add that set and once that is done we

click ok
that's perfect

so now i'm going to jump missing columns

alignment again and we'll go to

measurement level

now if you take a look at this variable

the values are ranging from one all the

way to four

and if you take a look at the education

levels these are levels of education so

then as the numbers are going up

the educational level is actually also

going up so although this is a text

variable a categorical variable we

have some kind of order going on if you

remember from our lesson on measurement

levels we call this an ordinal variable

so now we have set up all the variables

let's go ahead and save our data set you

can do that by just clicking this big

save icon

and then just go ahead and give it a

name i'm just going to call this data

and then we just click save

that's perfect so now in the next lesson


we'll talk about how to enter data in

spss

now that we have created our variables

it's now time to enter data

so what we do here to enter data is to

switch to the data view now you notice

on the data view that the variables are

right here at the top we have q1 all the

way to q5

and when you move your mouse over the

variable it actually shows all the

details of the variable so we have the

name the label

the type

and of course the measurement so let's

go ahead and type a record i have

actually provided you with a table

of the data that we are using here for

practice so just go ahead and practice

with that data

so q1 which is a case id is one and what

i normally do is once i type that i use

the tab key to maneuver to the next

column
so the next is a name which is peter

then tab key

the next is the age 20 tab key the next

is sex of respondent and remember we

have one for male and two for female so

in this case that's one

and the next is educational level

attained which is primary and if you

remember on the list that is two

and when you press the tab key it

actually takes you to the next row in

the first column that is really neat and

fast

however you obviously might make

mistakes here without realizing that you

have entered the wrong data

especially if you don't really see what

the one or what the two means so what

you do now is that on the shortcut

toolbar

you go ahead and click this button here

that says value labels

so once you click that button

instead of showing the values


it's actually showing the labels so

where there's one there's male and where

there's two there's actually primary

in the question q5

so that's going to make it easy for us

when we're typing the data but also it's

going to make it easy for us to

understand the data just in a snapshot

by looking at it
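
what the value labels button does can be pictured as a simple lookup. this python sketch is just an illustration under that assumption, the function name `with_labels` is made up, and the labels are the ones we set up for q4 and q5

```python
# value labels defined in variable view for the two coded variables
value_labels = {
    "q4": {1: "male", 2: "female"},
    "q5": {1: "none", 2: "primary", 3: "secondary", 4: "tertiary"},
}

def with_labels(record, labels):
    """Return a copy of the record showing labels instead of codes."""
    return {
        variable: labels.get(variable, {}).get(value, value)
        for variable, value in record.items()
    }

first_record = {"q1": 1, "q2": "peter", "q3": 20, "q4": 1, "q5": 2}
labelled = with_labels(first_record, value_labels)
print(labelled)
```

the stored data does not change, only the way it is displayed, so variables without value labels such as q1, q2 and q3 are shown exactly as typed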

so the next one is two

and this person is john

tab key

the age is 25

this person is male

and you can actually notice that now

instead of typing

one i'm actually typing male and once i

start typing when i type the letter m it

actually knows that i'm typing male so i

can just press enter to select male and

use my arrows to go to the next one

where i can actually type primary school

but now i just type two to mean primary

school and tab key and this is going to


show me that that's primary

let's just enter the third one

three

the name is martha

the age which is q3 is 32

the gender is female so that is going to

be two and education level attained

which is q5 is tertiary which is four

and i can tab key and that is perfect so

now go ahead and type the rest of the

table just for you to practice

now in many cases you're not necessarily

going to be typing data directly into

spss you might actually have another

program where you're collecting your

data and that data is going to be

exported

into a format for example microsoft

excel so we have a microsoft excel file

right now as you can see and it has some

fictional nutrition data for kids and we

have these variables which have been

described in the key worksheet here so

when i switch to the key worksheet you


can actually see that hmi stands for

household monthly income

s is for sex phe is for parents highest

education and so on and that's the names

of the variables that we have here at

the top so what we want to do is we want

to get this data into spss now the first

thing that you need to do is to close

the data set in microsoft excel because

once you have this open when you try to

import the data you're actually going to

have problems while i'm there i need to

also mention that you need to make sure

that the data set is not in read only

mode this actually happens when you have

downloaded some data set from the

internet what i recommend is if you have

downloaded a microsoft excel dataset

from the internet is you must open it

first

and excel is going to show you a message

asking you if you want to be able to

edit that worksheet once you allow it

then you can save it and close it so


let's go ahead and close this

now we're in spss and let's go ahead and

actually import the data

now to import the data if you're using

the latest spss version like the one i'm

using just go to file

import data

then select the type of data that you

have here if it's an excel file like in

my case i'll select excel

if you're using a csv file then you

select csv

you even have other options down here

so you're just going to go ahead and

click excel and that is going to take

you to this dialog box

if you are not using the latest version

of spss let me actually close this then

what you need to do is go to file

open

and data

that is going to take you to this dialog

box and the only difference with the

last one is that


where it says files of type you just

need to click the drop down

and select excel that's the only

difference

so now the next thing is you have to go

ahead and click the drop down here where

it says look in

and navigate to the folder where you

have the data set i'm already in the

folder where i have the data set which

is this nutritional data fiction here

so i'll go ahead and select it and click

open

so now you're going to end up with this

dialog box that says we are trying to

read the excel file on this path here

and now it says which worksheet are you

trying to import

if i click the drop down you notice that

those two worksheets have been listed

here the one that has data is this

one here and then we have the one with

the key obviously we want to import the

data one so i'll click that


and you have these options here the first

asks if you want to read the variable names

from the first row of data yes in most

cases the first row of data in excel is

going to be the variable names and

that's what we want to use

the second one says percentage of values

that determine data type that's not very

important but the whole idea of that is

if you have mixed types of data in the

column

then what percentage of data is going to

determine what type of data is going to

be assigned by spss
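
the idea behind that percentage can be sketched in python. this is not the actual rule that spss uses, which is not shown in this lesson, it is only a rough illustration, and the function name `infer_column_type` and the threshold of 95 are assumptions made for the sketch

```python
def infer_column_type(cells, threshold_percent=95.0):
    """Call a column numeric if enough of its non-empty cells are numbers."""

    def is_number(cell):
        try:
            float(cell)
            return True
        except (TypeError, ValueError):
            return False

    non_empty = [cell for cell in cells if cell not in ("", None)]
    if not non_empty:
        return "string"
    numeric_share = 100.0 * sum(is_number(c) for c in non_empty) / len(non_empty)
    return "numeric" if numeric_share >= threshold_percent else "string"

mixed_column = ["1", "2", "3", "n/a"]  # hypothetical column, 75 percent numbers
print(infer_column_type(mixed_column))                        # strict threshold
print(infer_column_type(mixed_column, threshold_percent=70))  # relaxed threshold
```

so with a strict threshold the mixed column is treated as text and with a lower threshold it is treated as numeric, which is the kind of decision that setting controls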

the default is okay

the next one is to ignore hidden rows

and columns if you have hidden rows and

columns from microsoft excel and you

don't want to ignore them then you can

uncheck this option

the next one pertains to data cleaning

if you have text variables or string

variables

where you have leading or trailing


spaces then you might want to clean them

up by checking these two options
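
here is a small python sketch of why those spaces matter. the names are made up, and `strip` is simply the python way of removing leading and trailing spaces, used here to mimic what those two options are for

```python
# hypothetical text values, the same name typed with stray spaces
raw_names = ["peter", " peter", "peter "]

distinct_before = set(raw_names)              # spaces make them look different
cleaned_names = [name.strip() for name in raw_names]
distinct_after = set(cleaned_names)           # one real name remains

print(len(distinct_before), len(distinct_after))
```

before cleaning the same name would show up as three separate values in a frequency table, after cleaning it is counted once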

but i am okay here so i can just go

ahead and click ok

now you notice that spss has opened a

new data editor window

if i move this around you can see that

the other one we started with is still

there

so i'll maximize this

now on this we have the variables

but remember we're still using the codes

so it's hmi phe cbw and so on

so the next step you will do after you

have done this is probably you want to

save it so i'm actually going to go

ahead and save this data set and i'm

going to call it nutrition data

then i'll go ahead and click save

the next step is to now switch to the

variable view

and clean up the variables

in most cases what you're looking for

are the labels


the values and the measurement levels

in some cases you might want to look at

the decimals so for example if i switch

to the data view you notice that for child

birth weight

you have a lot of zeros and that is

coming out because of the number of

decimals that we have here which is 15.

so let's go through each variable one by

one

the id is fine

the width is okay the decimals okay but

on the label i'm just going to say

that's case id

we don't have any values or missing or

anything else but we must specify the

correct measurement level from the

previous two lessons you notice that the

id is simply a nominal variable so i'll

click the drop down and select nominal

variable

the next variable is household monthly

income i know this because i looked at

the key worksheet from the data set


so household monthly income and that's

what is going to be here as a label

it's not a multiple choice question so

we don't have any values we don't have

any missing i'll just go right away to

the measurement level as household

monthly income is a quantitative

variable

it's actually going to be a scale

variable here

the next one phe that's parents highest

education

so here we're going to type parents

highest education

and now we have to specify the values

because this is actually a categorical

variable and we have values one up to

four which are standing for the levels

of education

so what i would do is i click where it

says none and then i go ahead and click

the button

so 1

is
none

then i press enter 2 is for primary

press enter

three

is for secondary

and finally we have four

which is for tertiary

press enter and then we click okay

here we have to specify the correct

measurement level so remember parents

highest education these are education

levels

we have some order on the categories

there so we have to click the drop down

and select that that's an ordinal

variable the next one is child birth

weight and we're measuring child birth

weight in kilograms we have 15 decimal

places here which is too much so i'm

actually going to edit that to just two

and in the label i'm actually going to

type child

birth weight

that's fine
that's fine now we don't have values

this is actually a continuous variable

so i'm just going to go ahead straight

away to look at the measurement level

here it says a scale variable which is

correct so i'm just going to leave it as

is

let's go to the next one which is sex

again everything is okay except we have

to start on the label so the label is

sex of child

and here on the values since we have

values one for male and two for female

we have to set those up

so i click where it says none and click

the button

once again one that's for male

and two which is for female and press

enter and click ok

on the measurement level this has

already been specified as nominal which

is okay so we can go to the next one

the next variable is child age in months

so the first thing i need to do is type


the label

that's perfect again that's a continuous

variable

so we don't have any values and when you

go to the measurement level it's scale

and that's perfectly fine and finally we

have birth weight class and we only have

two values for this we have underweight

and we have normal so the first thing on

the label let's type birth weight class

on the values we click where it says

none and we click the button

so 1 is for underweight

and 2 is for normal

then i'll go ahead and click add and

then i'll click ok and this has already

been assigned nicely as a nominal

variable as well

all right so let's go ahead and switch

to the data view and see what this data

looks like now and now you can see that

everything is being nicely displayed

here
everything has been cleaned up so we can

go ahead and click the save button to

save the changes we have made and we are

actually ready for data analysis so in

the next section we are actually going

to start doing analysis using

descriptive statistics a few lessons

back i introduced you to descriptive

statistics which is a branch of

statistics that helps us to summarize

our data

in a more meaningful way and we can also

use it to actually explore some

relationships between variables

in our sample

now sometimes we actually even use

descriptive statistics to explore errors

in our data before we can conduct some

advanced statistical analysis or

inferential statistics now to select the

appropriate type of analysis to conduct

we're going to be using measurement

levels so remember we talked about

measurement levels before


now due to the nature of categorical

variables in that we cannot do any

mathematical computation on them

the best way that we can summarize them

is through using frequencies

frequencies are simply counts of the

different data values that we have on

the variables so for example here we

have

the variable sex where we have males and

females how many females there were and

how many males there were

now these frequencies or counts can also

be represented as percentages now let's

take a look at how we can calculate

frequencies for the variable sex so in

spss what we do is we actually go and

click on the analyze menu

and then we go to descriptive

statistics then we select frequencies

this is a dialog box that is going to

show up and you can see that on the left

hand side we have all the variables that

we have in the data set


and on the right hand side we have this

empty box

and what we need to do now to select the

variable that we want to use for

analysis

is we have to drag and drop the variable

for analysis and put it on the right

hand side

so just click

and drag and drop it over on the box on

the right hand side and believe you me

this is the only thing you need to do

to analyze a categorical variable i'll

just go ahead and click ok

now you notice that you have this window

here which is the viewer window the

output viewer window and it actually

shows us the output of the analysis that

we have just conducted

let's take a look at what this output is

talking about so the first thing is the

statistics box which is basically

showing us the n now the n is basically

the number of cases that we have in the


data set

so the n valid

means the total number of cases that

have a valid value that is the number

of cases where we have non-missing

information on that column

so we have 413 people who actually gave

us the gender

missing is now the number of cases that

have missing values and we have none in

the data set

that's perfect

the next table is going to show us now

the frequencies the frequency column is

showing us the total number of cases who

responded

so now we know that 215 were males

and 198 were females

for a total of 413. the percent column

is the percentage out of the total

number of cases that we have in the data

set

we have 413 total number of cases so we

have 52.1 percent


out of those 413 being males

and 47.9 percent

out of the total of 413

being females now mind you this

percentage is out of all the people in

the data set

so even if you have other people who did

not respond to this question this

percentage is going to be out of the

total number which actually includes

those people who did not respond to the

question valid percent on the other

hand is a percentage out of the total

number of cases who actually responded

to that question that is valid or

non-missing values

so if you do have missing values here

then valid percent is going to be

different because the valid percent does

not count those people who did not

respond to the question and actually

this is the best percentage for you to

report

so because we don't have any missing


values then you can actually see that

the values that we have here are exactly

the same

finally we have the cumulative

percentage

the cumulative percent is the running total

of the valid percent that is the percentage on that row

added to the percentages of all the previous categories in

the table

so on the first row here the percentage

is 52.1 percent for males which is

exactly the percentage we have here

but when we take 52.1 percent and add it

over to 47.9

we get 100
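
the four columns of the frequency table can be reproduced with a few lines of python. this is only a sketch of the arithmetic described above, the function name `frequency_table` is made up, and the second call with ten missing cases is a hypothetical example to show where percent and valid percent start to differ

```python
def frequency_table(counts, n_missing=0):
    """Build rows with frequency, percent, valid percent and cumulative percent.

    counts is a dict of category label to frequency, in display order.
    """
    n_valid = sum(counts.values())
    n_total = n_valid + n_missing
    rows = []
    cumulative = 0.0
    for label, frequency in counts.items():
        percent = 100.0 * frequency / n_total        # out of all cases
        valid_percent = 100.0 * frequency / n_valid  # out of non-missing cases
        cumulative += valid_percent                  # running total
        rows.append({
            "label": label,
            "frequency": frequency,
            "percent": round(percent, 1),
            "valid_percent": round(valid_percent, 1),
            "cumulative_percent": round(cumulative, 1),
        })
    return rows

sex_rows = frequency_table({"male": 215, "female": 198})
for row in sex_rows:
    print(row)

# hypothetical case: the same counts plus ten children with no recorded sex
rows_with_missing = frequency_table({"male": 215, "female": 198}, n_missing=10)
```

with no missing cases percent and valid percent are identical, and once there are missing cases only the percent column drops because its denominator includes them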

now there are certain other things that

you can add on to the frequency table

for example you can even turn on some

charts let's go back and take a look at

those things so i'll minimize this

go back to the main window and go to

analyze

descriptive statistics and go back to

frequencies you'll notice that spss


actually remembers what we did

previously which is we have the variable

sex of child here

but now let's go ahead and take a look

at the statistics tab

in the statistics box you can actually

see that we have several statistics here

for example the mean the median the mode

which are called central tendency we're

going to talk about this when we're

actually talking about how to summarize

continuous variables out of the

statistics that we have here only the

mode applies to categorical variables

but it doesn't necessarily make very

good sense that's why i'm not going to

do anything here and everything that we

have under dispersion and of course

everything here

we're actually going to need those when

we are summarizing continuous variables

so for now let's just close this because

we don't have anything that we can do

here
next

we have charts let's click that

under the charts first we have none that

is the default option that is it's not

going to produce any charts

but you can also produce bar charts

and pie charts or histograms

if you are summarizing categorical

variables like our case here for sex of

child

then we can turn on bar charts or pie charts

if you're summarizing a continuous

variable then you can use a histogram

i'm actually just going to use the bar

chart and now you have the choice for

frequencies or percentages to be

presented on your y-axis

now what i do in most cases is that if

you have a lot of cases in the data set

then using percentages is going to be

much better because when you have bigger

numbers percentages are easier to

compare but if you have cases that are

below 100
then i think using percentages is like

cheating because percent is actually

short for per hundred so if your cases are not

100 then saying percent is actually like

you are cheating

so then i actually recommend that you

should turn on frequencies instead of

percentages

i do have 413 cases i think that's

enough for me to actually use

percentages so i'll click on percentages and

click continue you can also turn on the option to

create apa style tables which is a

feature that has just been added in this

version of spss

for now i'm just going to go ahead and

click ok

and you notice that we still have the

first two tables

but apart from that let me actually go

ahead and maximize this

we also do have this chart which we can

actually use to accompany whatever

discussion we are writing based on the


output that we have here

now let's just do one more example so

i'm just going to go ahead and minimize

this

this time i want to summarize the

parent's highest education

it has several categories more than two

so i think it's going to be a little bit

interesting as well

let's go back to analyze

descriptive statistics then frequencies

so now if you want to summarize a single

variable i can just click this drag it

and drop it back into the box

and then i'll take parents highest education

and drop it on the right hand side here

suffice to say you can actually

summarize many variables at once so if

you want to summarize many variables

just drag the other variable and drop it

on the right hand side for example i

could take birth weight class and drop it on

the right hand side without a problem

now the settings that we had before


actually do apply here so when i go to

charts you actually see the bar charts

is still selected here

i'll click continue if you want to reset

everything you can actually click the

reset button

which will reset everything to default

and i would have to bring the variable

to the right hand side again

and choose the options that i want to

turn on

this is fine for me so i'll just go

ahead and click ok

and i'll maximize this

and once more you can actually see that

we have this table of statistics which now

has two variables

we have the first frequency table which

is showing us parents highest education

and from here we can see that 47 didn't

go to school

161 at least went to primary school

156 went to secondary school and 49 went

to tertiary school
and we have the corresponding

percentages valid percentages and

cumulative percentages you notice that

for cumulative percentages 11.4 percent

say that they did not go to any school

and when you combine that with the next

one which is primary school you get 50.4

so basically we can say that 50.4

is the combined percentage for both none

and primary school and that it actually

keeps increasing up to 100

at the level of tertiary school for an

ordinal variable like this one it's

actually interesting to look at the

cumulative percentages
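
the cumulative percentages for parents highest education can be checked by hand or with a short python sketch like this one. the counts are the ones from the output, and the variable names are just made up for the illustration

```python
from itertools import accumulate

# frequencies from the output, in the order of the education levels
education_counts = {"none": 47, "primary": 161, "secondary": 156, "tertiary": 49}

total = sum(education_counts.values())                          # all valid cases
running_counts = list(accumulate(education_counts.values()))    # running totals

cumulative_percent = {
    label: round(100.0 * running / total, 1)
    for label, running in zip(education_counts, running_counts)
}
print(cumulative_percent)
```

the value for primary is the share of parents with primary education or less, which only makes sense to read this way because the categories are ordered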

now in the end you actually do have two

different charts you have the first one

which is parents highest education

as you can see here

and now when you go to the next one you

have birth weight class where you're looking

at those that are underweight and those

that are normal and you notice right

away that i definitely have made a


mistake here when i was labeling this

because actually i need to have more

normal than underweight based on the

understanding of my sample so because of

that i need to go back to the data set

and look at the birth weight class you

notice here that we have all normal here

but if you take a look at the child birth

weight here 1.89 kilograms is not

supposed to be normal it means that when

i was recording the values i did not

assign them correctly so i have to go

back to variable view

and the birth weight class and take a look

at the values under here

i'll click the button to take a look at

what i did

so here i have one for underweight and

two for normal and it's actually

supposed to be the other way around

whereby one is supposed to be normal and

two is supposed to be underweight

so i'll click on this one and remove it

and click this one again and remove it


so now i have to type that one

is for normal

and i'll click add

and now 2 is supposed to be for

underweight

and then i'll click add and then i'll

click ok so now let's go back and do

that analysis once again so i'll go back

to analyze

descriptive statistics

and frequencies

everything is exactly the way it was

so just go ahead and click ok

so now when i expand that and scroll

down you notice that we have more normal

children than underweight children and

if i minimize this and go to the data

view

you actually notice that yeah that

actually makes sense because all these

kids that are below 3 kilograms of child

birth weight

must be underweight

as compared to the ones that are above 3


kilograms
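
the same sanity check can be written down as a small python sketch. the four records are made up, the 3 kilogram cut-off is the one mentioned above, and the function name `mismatches` is an assumption for this illustration

```python
CUTOFF_KG = 3.0  # cut-off used in this lesson to tell the two classes apart

# hypothetical records: (child birth weight in kg, birth weight class code)
records = [(1.89, 2), (3.40, 1), (2.75, 2), (3.10, 1)]

def mismatches(records, value_labels, cutoff=CUTOFF_KG):
    """Return the records whose label disagrees with the recorded weight."""
    problems = []
    for weight, code in records:
        expected = "underweight" if weight < cutoff else "normal"
        if value_labels[code] != expected:
            problems.append((weight, code))
    return problems

wrong_labels = {1: "underweight", 2: "normal"}  # the labels typed by mistake
fixed_labels = {1: "normal", 2: "underweight"}  # the corrected labels

print(len(mismatches(records, wrong_labels)))  # every record disagrees
print(len(mismatches(records, fixed_labels)))  # no record disagrees
```

the data values never changed here, only the labels attached to the codes, which is exactly the kind of error the frequency table and chart helped us notice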

so you can see how descriptive

statistics is extremely important

not only to summarize your data but also

to look at some errors that you might

have made in the next lesson we are now

going to look at how you can interpret

the data and also how you can present it

in your reports now let's take a look at

how you're going to present this

information in your thesis in a

dissertation or in your reports so you

can either present the frequencies as

statements

and we have here templates of statements

that you can use for example here you

can throw in the percentage then say of

respondents in the sample were male and then you

throw in the percentage who were

female or you can say the sample

consisted of and then you put the

percentage male and percentage female

respondents and then in brackets you can

actually say the total n


or you can say the sample consisted

of and then you put the number here and

the percentage there

male and then another number here and a

percentage there female respondents and

put the n so i think this one has more

information so let's use that as an

example

so you know in our sample here we

actually have if you look at sex of

child we have 215 males

and 198 females so we can say

the

sample consisted of

and then we're going to have the number

215 then in brackets we're going to say

52.1

percent then we're gonna close the

bracket

male

and then we'll go to the female so the

number is 198 so 198

in brackets that's 47.9

percent and close the bracket


female

so instead of respondents we'll just

have children

and in brackets we're going to have n

equals

so the total which is the n valid at the

top is 413

and then we close the bracket and full

stop
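
if you have many variables to report, the same template can be filled in with a few lines of python. this is only a sketch, the function name `describe_sample` is made up, and the wording follows the statement we have just built

```python
def describe_sample(n_male, n_female, noun="children"):
    """Fill in the reporting template with counts and percentages."""
    n_total = n_male + n_female
    male_percent = 100.0 * n_male / n_total
    female_percent = 100.0 * n_female / n_total
    return (
        f"the sample consisted of {n_male} ({male_percent:.1f}%) male "
        f"and {n_female} ({female_percent:.1f}%) female {noun} "
        f"(n = {n_total})."
    )

statement = describe_sample(215, 198)
print(statement)
```

the counts are the only inputs, so the percentages and the total in the sentence always agree with each other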

so this is perfectly fine

now another way you can present this is

by using a table

so in apa you can only use the table if

you're going to have at least two rows

or more than two rows

so in this case we do have two rows so

this is a template of the table that you

can use

so in apa this is a style you have to

use you only need to have borders at the

top

and the border to separate the header of

the table and the rest and another border

at the bottom that's all


so here the frequency for males

we have 215

while the frequency for females

we have 198

and then the percentage

for males

is 52.1 and we actually have to throw in

the percent symbol there

and finally

for the percentage for females is

47.9

percent so this is a perfect table you

can literally just take this and throw

it in your report and it's gonna suffice

but of course you have to label this so

we can say here let's say that this

was table one so we'll say table one

full stop

and there we now have to put what this

table is about so we can say table of

frequencies for gender

so for this statement i'm just going to

select it

and remove the bold and just make it


italics so if you're using apa format

this is what it's going to look like so

unlike categorical variables which we

have just seen in the frequencies lesson

continuous variables normally have

numerous distinct values

so for example if you have 400

respondents to tell you their monthly

income

you might end up having over 100

different values of monthly income like

is the case with our variable household

monthly income right here

creating a table of frequencies with over

100 rows defeats the whole purpose of

data analysis

which is to communicate a story through

your data with as few numbers as

possible actually let me go ahead and

show you what that is going to look like

so i go to analyze

descriptive statistics then i go to

frequencies

okay i have some variables here so i'll


just click the reset button here at the

bottom

and then we'll grab

the household monthly income

and then i'll click ok

now take a look at this we have a really

huge table all the way from here to

there now if we take this and throw it

in our report

obviously it defeats the whole purpose

everyone is going to be like what kind

of summary is this

well so now how best can we summarize

that

so the goal of descriptive statistics

for continuous variables or scale

variables in spss

is to describe the data by a single

meaningful value we want to be able to

provide one value that when everybody

sees it they should at least have an

understanding of

the structure of the data so one of the


best ways in which we can describe

continuous data

is by describing the central point of

the data set

the premise behind this is that all

naturally occurring phenomena usually

have the most scores around the number

at the middle of the values

the number at the middle is the typical

value in the distribution so we assume

that the majority of the data must be

around that figure at the middle

now there are several numbers that can

be used to describe the central point of

your data set

they are all called measures of central

tendency sometimes they're just called

averages

and we have the mean

the median and the mode

the mean is the number that we get when

we add all the values together and then

divide that by the number of individual

values that we have


the median on the other hand is the

value at the exact center point of the data

when we arrange the data from the lowest

to the highest

and the mode is the most frequently

occurring value

in the distribution
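
To make the three definitions concrete, here is a small Python sketch using the standard `statistics` module. The list `incomes` is hypothetical data invented for this illustration; it is not the course data set.

```python
import statistics

# Hypothetical monthly incomes, invented only for this illustration.
incomes = [90, 90, 120, 180, 180, 250, 400]

# Mean: add all the values and divide by how many there are.
mean_value = statistics.mean(incomes)

# Median: the value at the exact center once the data is sorted.
median_value = statistics.median(incomes)

# Mode: the most frequently occurring value; multimode returns every tie.
modes = statistics.multimode(incomes)

print("mean:", round(mean_value, 2))
print("median:", median_value)
print("modes:", modes, "smallest mode:", min(modes))
```

In this made-up list two values tie for most frequent, so `multimode` returns both of them, which is the kind of situation where a single reported mode can be misleading.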

so let's see how we can get all these

values so i'm going to minimize the

output

we go to analyze descriptive statistics

as a matter of fact we can still go back

to frequencies

and remember this actually remembers

what it did so we have household monthly

income here so let's go to the

statistics button because this is where

we can get all those statistics so as

you can see here we have the mean median

and mode so let's just turn this on for

now

and press continue

and the next thing that we need to do is

to remove display frequency tables


remember this is what's giving us a very

big table that doesn't make any sense

so i'll uncheck that and then go ahead

and click ok

so let's maximize this so now you can

see that we have this table of

statistics

which has the same things the valid and

the missing but we now have three more

items

let's take a look at those three more

items the first one is the mean and if

you remember we said that the mean is

the value that we're going to get if we

add all the values that we have for

household monthly income

and then divide by 413 which is the

number of cases that we have in the data

set

so this is our typical value the median

like i said is the value that is at

the exact middle of the data

when we arrange it from the lowest to

the highest
which as you can see is 180 and the mode

is the most frequently occurring value

in that column

but now in certain cases you might

actually end up having more than one

most frequently occurring value as is the

case right now

you can see the note that we have here

it says multiple modes exist

and the lowest value is the one that has

been shown

now what does this mean if we go by the

mean

the typical household in the data set

earns around

203.39 dollars

if we go by the median

the typical household in the data set

earns 180 dollars

but if we want to use the mode then the

typical household in the data set earns

90 dollars per month

now the question is between the three

which one should we report


so the mean is the most inclusive of the

three and the reason for that is it

includes all values of the data set in

the calculation remember we have to add

everything and divide by how many

numbers we have in the data set

this is why the mean is the most

preferred in most cases

however it is easily influenced by

outliers

outliers are values that are extremely

lower

or extremely higher than the majority of

the data set

the mean is best used when the data

doesn't have any outliers but if you

have outliers

then it's going to move the mean a lot

so in our case for example we have the

minimum to be 70

and the maximum is 490

now imagine what would happen if we had

someone who actually gets one million

dollars
that is going to change everything

because when we add one million dollars

to the rest and divide by 413

the value that we are going to get is

going to be higher than most of the

scores that we have here so in other

words

it's going to be way higher than the

majority of the scores

so even though by definition the mean

should be the typical value in

there

it's actually not going to be

representative enough however the median

is hardly influenced by outliers

so the median is best used when the data

has a lot of outliers then you can use

the median

because it's going to be very

representative so whether you have

extremely high values or you have

extremely low values

that's not a problem this number at the

middle is still going to be the same
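
The effect of a single extreme value can be sketched in Python as follows. The seven incomes are hypothetical numbers chosen to sit between the minimum of 70 and the maximum of 490 mentioned above, and the one million dollar household is the imagined outlier from the lesson.

```python
import statistics

# Hypothetical incomes between the minimum (70) and maximum (490) from the lesson.
incomes = [70, 90, 120, 180, 250, 300, 490]

# The same data with one imagined household earning one million dollars.
with_outlier = incomes + [1_000_000]

mean_before = statistics.mean(incomes)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(incomes)
median_after = statistics.median(with_outlier)

print("mean  :", round(mean_before, 2), "->", round(mean_after, 2))
print("median:", median_before, "->", median_after)
```

With these made-up numbers the mean jumps far above every ordinary household, while the median moves only slightly, if at all, because it depends on the middle of the sorted data rather than on the size of the extreme value.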


however

the median is not very inclusive because

the median is based only on the value

at the middle of the data set now the

mode is the least used measure of central

tendency for several reasons

first you can have multiple modes as we

have seen in our example

and if most scores occur with similar

frequency so for example if most of the

scores occur maybe twice

and then you only have one value that

occurs three times

then that value is going to be said to

be the mode even though it occurs only one

more time than the rest of the values

that we have in the data set so it's not

really very dependable

that's why most times people will either

choose the mean

or choose the median

in the previous

lesson we saw how we can describe our

data by pointing at the central point of

the data set called the central tendency


now the central point of the data alone

may not paint the full picture of the

data

we also need to know how the data varies

across a distribution let me give you an

example let's say that i tell you that

the average number of eggs i have eaten

in 10 days is two

it could mean that i ate two eggs every

single day for ten days

or it could also mean that i ate 20 eggs

in a single day

and no egg in nine days but then these

two scenarios are very different if i go

to my doctor the second example where i

ate 20 eggs in a single day might raise

an alarm

but not the one where i ate two eggs per

day for 10 days

so apart from mentioning the central

point of the data we also need to show

how the values actually vary from each

other

there are several measures that can be


used to show the dispersion or the

variability of the data

the most common and useful ones are

the range which is basically just the

maximum value minus the minimum

value

and the standard deviation

which is the average of the distances

between each value and the mean
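
The egg example can be sketched in Python to show two data sets with the same mean but very different spread. The helper name `spread` is made up for this illustration, and `statistics.pstdev` is used here as the simple form that divides by the number of values.

```python
import statistics

steady = [2] * 10        # two eggs every day for ten days
binge = [20] + [0] * 9   # twenty eggs on one day, none on the other nine


def spread(values):
    """Return the mean, the range, and the standard deviation of values."""
    return {
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "sd": statistics.pstdev(values),
    }


print("steady:", spread(steady))
print("binge :", spread(binge))
```

Both lists average two eggs per day, but only the range and the standard deviation reveal that the second pattern is far more extreme.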

let's take a look at an example we're

still going to use the same example for

household monthly income so i'll go

ahead and minimize this

then we go to analyze

descriptive statistics frequencies

we still have the household monthly

income here so i can just go back to

the statistics button

and this time

i want to turn on standard deviation

minimum maximum

and range then i'll go ahead and click

continue

and then i'll click ok


let's go ahead and expand this

so from the output we already know the

mean the median and the mode

but the standard deviation is the

average distance between each value in

the data

and the mean so we have 413 values

when we calculate the mean

we have 203.39

now we want to find out the distance

between each value in the data set

and the mean which is 203.39

remember we think that the mean is the

middle point of the data so now we need

to calculate the average of those

differences

so that we can say that on average

each value

is

104.13 dollars

away from the mean

if the value of the standard deviation

is very high then we can actually say

that we have a lot of variation in the


data

but if it's low then we can say that we

don't have much variation in the

data

the range like i said it's just the

maximum minus the minimum we have the

maximum of 490

if we subtract 70 we get the range of

420

now the question is what does this mean

so remember theoretically we are saying

that the mean is the central point of

the data

now we want to see how far on average

each data point is from the mean

so what happens is that we subtract

the mean from every value

and then get the average of the

resulting values but unfortunately using

this method will always result in zero

because the positive and negative differences cancel out

now to solve this the differences are

going to be squared first to get a

positive number

then an average is going to be


calculated for those squared differences

now finally

we calculate the square root of the

average

so that the answer can be in the

original units of the variable that we

are trying to calculate

the answer is the standard deviation

which is roughly the average of the

differences between each value and the

mean
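
The steps just described can be followed one at a time in a short Python sketch. The `values` list is hypothetical. Dividing by the number of values follows the description in the lesson; SPSS reports the sample version, which divides by one less than the number of values, so both are shown here for comparison.

```python
import math
import statistics

# Hypothetical incomes, invented only for this illustration.
values = [70, 90, 120, 180, 250, 300, 490]
n = len(values)
mean_value = sum(values) / n

# Step 1: the difference between each value and the mean.
deviations = [v - mean_value for v in values]

# Averaging these directly is useless: positives and negatives cancel out.
print("sum of deviations:", round(sum(deviations), 10))

# Step 2: square the differences so they are all positive.
squared = [d ** 2 for d in deviations]

# Step 3: average the squared differences, then take the square root
# to get back to the original units.
sd_divide_by_n = math.sqrt(sum(squared) / n)

# Sample version (divides by n - 1), the form SPSS reports.
sd_sample = math.sqrt(sum(squared) / (n - 1))

print("sd dividing by n    :", round(sd_divide_by_n, 2))
print("sd dividing by n - 1:", round(sd_sample, 2))
```

The two results differ only slightly for a data set of any reasonable size, which is why the "average distance from the mean" reading works well as an intuition even though the sample formula is not a plain average.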

if all values in the data set are

exactly the same for example if we have

a distribution where we have ten numbers

and every number is a two like in the

example that i gave you about eggs

then the standard deviation is zero

now a bigger standard deviation in

relation to the mean

will mean that there are far more

variations in the data

hence when we report a mean we must also

indicate the standard deviation to tell

the reader
whether the values that got us to the

mean are close together or they are

spread further apart using the standard

deviation

you can say that the standard deviation

is going to tell us how representative

the mean is if the standard deviation is zero

then it means that all the values are

the same in other words 2 is indeed

exactly the center of the data but if

the standard deviation is very very high

then we might actually find that the

mean is not necessarily the middle point

of the data

but at least the reader is going to know

that we have a lot of variations in the

data now let's take a look at how you

can present this in your thesis

dissertation or your report as a

statement we have these two templates

here so you can say the average and you

plug in the variable for example the

average age was


then here you put the value of the mean

and then in brackets you put standard

deviation or sd equals then you

mention the standard deviation here

we also have another template here where

you can say the mean and then mention the

variable

of the respondents was and then you

mention the value of the mean

with a standard deviation of and then

you mention the value of the standard

deviation

i'm just going to use the first one

there so we're going to say

the average

household monthly income

that's the name of the variable was so

in our case here

that's

203.39 so we'll type that 203.39

and then we put in brackets sd

equals

so the standard deviation is 104.13

that's to two decimal places and then we


can close the bracket and full stop

now here we need to select this sd and

put it as italics like that and that's

exactly how you do it
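
If you have many variables to report, the first template can be filled in with a small Python helper. The function name `apa_mean_sd` is hypothetical and only for illustration; the two numbers are the mean and standard deviation read from the output above. Plain text cannot show italics, so the SD label still needs to be italicized by hand in the final document.

```python
def apa_mean_sd(variable, mean, sd):
    """Fill in the 'The average <variable> was <mean> (SD = <sd>).' template.

    Both numbers are formatted to two decimal places.
    """
    return f"The average {variable} was {mean:.2f} (SD = {sd:.2f})."


statement = apa_mean_sd("household monthly income", 203.39, 104.13)
print(statement)
```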

now if you have multiple variables

you can actually go ahead and put them in a

table so what we do is we start by

saying here maybe this is table one

full stop

and down here we're going to type in the

title of this table but first of all

let's type in our information here so

here the variable is household monthly

income

and the mean that we have is 203.39

and the standard deviation is 104.13

but now obviously this is just one line

it doesn't make sense to put it in a

table like this

but i have gone ahead and actually

summarized the rest of the variables so

household monthly income

we actually have more variables so let

me just move this somewhere here


so now

the next variable that i have i'm just

going to create another row

so i can say

child birth weight

our average for child birth weight is

2.8

and our standard deviation for child

birth weight is 0.39

then we can actually go ahead and do

child age in months

the mean for that

is 18.67

and our standard deviation is 9.65

but of course for the child birth weight

maybe we need to include the units and

that is in kgs

that is going to help the reader to

understand the units that we are using

and here we need to actually plug in our

title so here we can say

summary statistics

let's just say of key variables

and we need to select that


remove the bold and make it italics and

there we are done we can actually get

this into our dissertation or thesis or

report
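
As a plain-text sketch of the table just built, the Python below lays out the three variables with only a top rule, a rule under the header, and a bottom rule. The function name `apa_table` and the column widths are assumptions for this illustration; the means and standard deviations are the ones read out above, shown to two decimal places.

```python
# Values read out in the lesson: (variable, mean, standard deviation).
rows = [
    ("Household monthly income", 203.39, 104.13),
    ("Child birth weight (kgs)", 2.80, 0.39),
    ("Child age in months", 18.67, 9.65),
]


def apa_table(rows):
    """Return a text table with rules only at the top, under the header, and at the bottom."""
    width = max(len(name) for name, _, _ in rows)
    rule = "-" * (width + 20)
    lines = [rule, f"{'Variable':<{width}}{'M':>10}{'SD':>10}", rule]
    for name, mean, sd in rows:
        lines.append(f"{name:<{width}}{mean:>10.2f}{sd:>10.2f}")
    lines.append(rule)
    return "\n".join(lines)


table = apa_table(rows)
print("Table 1.")
print("Summary statistics of key variables")
print(table)
```

The label and the italic title still have to be styled in the word processor; the sketch only checks that the numbers and the three-rule layout are in place.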
