
Unit 1: Assessment: Concept, Purpose, and Principles

For PGDT
By Cheramlak F.
Q. Define the following terms:

 Test
 Assessment
 Measurement
 Evaluation
Test
A test, in an educational context, is the presentation of a standard
set of questions to be answered by students during a fixed period of
time under reasonably comparable conditions for all students.
 It is one instrument used for collecting information about students’
behaviors or performances.
There are many other ways of collecting information about students’
educational performances besides tests, such as observations,
assignments, project works, portfolios, etc.
Measurement
In education, measurement is the process by which the attributes of a
person are measured and described in numbers.
It is a quantitative description of the behavior or performance of
students.
As educators we frequently measure human attributes such as attitudes,
academic achievement, aptitudes, interests, personality, and so forth.
Thus, the purpose of educational measurement is to represent how much
of ‘something’ is possessed by a person using numbers. (It only
collects information; it involves no value judgment.)
Assessment
Assessment is the planned process of gathering and synthesizing
information relevant to the purposes of (a) discovering and
documenting students’ strengths and weaknesses, (b) planning and
enhancing instruction, or (c) evaluating progress and making decisions
about students.
Rowntree (1974) views assessment as a human
encounter in which one person interacts with another
directly or indirectly with the purpose of obtaining
and interpreting information about the knowledge,
understanding, abilities and attitudes possessed by
that person.
Evaluation
This concept refers to the process of judging the
quality of student learning on the basis of established
performance standards and assigning a value to
represent the worthiness or quality of that learning or
performance.
It is concerned with determining how well students have learned. When
we evaluate, we are saying that something is good, appropriate, valid,
positive, and so forth.
Evaluation cont’d
Evaluation = quantitative description of students’ behavior
(measurement) + qualitative description of students’ behavior
(non-measurement) + value judgment.
Evaluation can also be seen as the comparison of what is measured
against some defined criteria, to determine whether it has been
achieved, whether it is appropriate, whether it is good, whether it is
reasonable, whether it is valid, and so forth.
1.2 Importance and Purposes of Assessment

Assessment serves two broad purposes:
helping LEARNING, and
improving TEACHING.
Assessment is used to inform and guide teaching and learning.
Assessment is used to help students set learning goals.
Assessment is used to assign report card grades.
Assessment is used to motivate students.
The Purposes of Assessment and Evaluation

i. Placement of students: placing students appropriately in the
learning sequence, and classification or streaming of students
according to ability or subjects.
 Finding the right person for the right place.
ii. Selection: selecting students for courses – general, professional,
technical, commercial, etc.
iii. Certification: this helps to certify that a student has achieved
a particular level of performance.
iv. Stimulating learning: this can be motivating the student or
teacher, providing feedback, suggesting suitable practice, etc.
v. Improving teaching: by helping to review the effectiveness of
teaching arrangements.
vi. For guidance and counseling services.
vii. For modification of the curriculum.
viii. For modification of teaching methods.
ix. For the promotion of students.
x. For reporting students’ progress to their parents.
xi. For the award of scholarships and merit.
xii. For the admission of students into educational institutions.
Principles of Assessment
 Miller, Linn, and Gronlund state the following principles:
 Clearly specifying what is to be assessed has priority in the
assessment process.
 An assessment procedure should be selected because of its relevance
to the characteristics or performance to be measured.
 Comprehensive assessment requires a variety of procedures.
 Proper use of assessment procedures requires an awareness of their
limitations.
 Assessment is a means to an end, not an end in itself.
Cont’d
Other principles are:
Assessment should provide relevant and useful information.
Assessment should be appropriate
Assessment should be fair and accurate,
Assessment should be integrated into the
teaching and learning cycle,
Assessment should draw on a wide range of
evidence and
Assessment should be manageable.
Assessment and Some Basic Assumptions
Angelo and Cross (1993) have listed seven basic
assumptions of classroom assessment which are
described as follows:
The quality of student learning is directly, although not exclusively,
related to the quality of teaching. Therefore, one of the most
promising ways to improve learning is to improve teaching.
Assumption cont’d
To improve their effectiveness, teachers need first to make their
goals and objectives explicit and then to get specific, comprehensible
feedback on the extent to which they are achieving those goals and
objectives.
To improve their learning, students need to
receive appropriate and focused feedback early
and often; they also need to learn how to assess
their own learning.
Cont’
The type of assessment most likely to improve
teaching and learning is that conducted by
teachers to answer questions they themselves
have formulated in response to issues or
problems in their own teaching.
Systematic inquiry and intellectual challenge are
powerful sources of motivation, growth, and
renewal for teachers, and classroom assessment
can provide such challenge.
Cont’
Classroom assessment does not require
specialized training; it can be carried out by
dedicated teachers from all disciplines.
By collaborating with colleagues and actively
involving students in classroom assessment
efforts, teachers (and students) enhance learning
and personal satisfaction.
Unit Two: Assessment Strategies, Methods,
and Tools
 Types of assessment
There are different approaches to conducting assessment in the
classroom. Here we are going to see three pairs of assessment
typologies, namely:
formal vs. informal,
criterion referenced vs. norm referenced,
formative vs. summative assessments.
Formative and Summative Assessments

Formative Assessment: Formative assessments are used to shape and
guide classroom instruction.
They can include both informal and formal assessments and help us to
gain a clearer picture of where our students are and what they still
need help with.
They can be given before, during, and even after instruction, as long
as the goal is to improve instruction (an ongoing process).
They serve a diagnostic function for both students and
teachers.
Formative cont’d
Both teachers and students receive feedback.
Formative assessment is also known by the names ‘assessment for
learning’ and ‘continuous assessment’.
Continuous assessment (as opposed to terminal assessment) is based on
the premise that if assessment is to help students improve their
learning, and if a teacher is to determine the progress of students
towards the achievement of the learning goals, it has to be conducted
on a continuous basis.
The following are some strategies of formative assessment you can
employ in your classroom:

Students write their understanding of vocabulary or concepts before
and after instruction.
Ask students to summarize the main ideas they’ve taken away from your
presentation, discussion, or assigned reading.
You can have students complete a few problems or questions at the end
of instruction and check their answers.
You can assign brief, in-class writing assignments (e.g., "Why is this
person or event representative of this time period in history?").
Tests and homework can also be used formatively if
teachers analyze where students are in their learning and
provide specific, focused feedback regarding performance
and ways to improve it.
Summative Assessment (SA):
SA typically comes at the end of a course (or unit) of instruction.
It evaluates the quality of students’ learning and assigns a mark to
each student’s work based on how effectively learners have addressed
the performance standards and criteria.
Examples include teacher-made achievement tests, ratings of various
types of performance, and assessment of products (reports, drawings,
etc.).
A particular assessment task can be both formative and summative
(e.g., if it counts toward the final grade).
Students could also receive extensive feedback on their work.
Maximal vs. Typical Performance Tests
Maximal performance tests are concerned with how well an individual
performs when motivated to obtain a high score:
 what individuals can do when they put forth their best effort.
E.g., aptitude tests, achievement tests, and intelligence tests.
Typical performance tests are those designed to reflect a
person’s typical behavior.
 They fall into the general area of personality appraisal such
as interests, attitudes and various aspects of personal
& social adjustment.
 Self-report and observational techniques such as
interviews, questionnaires, and ratings are sometimes
used.
Formal and Informal Assessment
Formal Assessment: This usually implies a written
document, such as a test, quiz, or paper. A formal
assessment is given a numerical score or grade based
on student performance.
Informal Assessment: "Informal" is used here to
indicate techniques that can easily be incorporated
into classroom routines and learning activities.
Informal assessment techniques can be used at any time without
interfering with instructional time.
 Their results are indicative of the student’s performance on the
skill or subject of interest.
Informal assessment cont’d
An informal assessment usually occurs in a more
casual manner and may include observation,
inventories, checklists, rating scales, rubrics,
performance and portfolio assessments, participation,
peer and self evaluation, and discussion.
Informal assessment seeks to identify the strengths
and needs of individual students without regard to
grade or age norms.
Methods for informal assessment can be divided into
two main types: unstructured (e.g., student work
samples, journals) and structured (e.g., checklists,
observations).
Informal assessment cont’d
The unstructured methods frequently are somewhat
more difficult to score and evaluate, but they can
provide a great deal of valuable information about the
skills of the students.
Structured methods can be reliable and valid
techniques when time is spent creating the "scoring"
procedures.
A strength of informal assessments is that students are actively
involved in the evaluation process; they are not just
paper-and-pencil tests.
Criterion-referenced and Norm-referenced Assessments

Criterion-referenced Assessment: This type of assessment allows us to
quantify the extent to which students have achieved the goals of a
unit of study and a course.
 It is carried out against previously specified criteria and
performance standards.
Criterion referenced classrooms are mastery-oriented,
informing all students of the expected standard and
teaching them to succeed on related outcome measures.
Criterion referenced assessments help to eliminate
competition and may improve cooperation.
Norm-referenced Assessment
 This type of assessment has as its end point the
determination of student performance based on a
position within a cohort of students – the norm group.
 This type of assessment is most appropriate when one
wishes to make comparisons across large numbers of
students or important decisions regarding student
placement and advancement.
The criterion-referenced assessment emphasizes
description of student’s performance, and the norm-
referenced assessment emphasizes discrimination
among individual students in terms of relative level of
learning.
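The two interpretations can be sketched in code. This is an illustrative example, not from the source: the cohort scores, the 60-point cutoff, and the function names are all invented for the sketch.

```python
# Illustrative sketch: the same raw score can be reported
# criterion-referenced (against a preset standard) or norm-referenced
# (by position within a cohort). Cutoff and cohort are hypothetical.

def criterion_referenced(score, cutoff=60):
    """Describe performance against a fixed performance standard."""
    return "mastery" if score >= cutoff else "non-mastery"

def percentile_rank(score, cohort):
    """Percentage of the norm group scoring below the given score."""
    below = sum(1 for s in cohort if s < score)
    return 100 * below / len(cohort)

cohort = [42, 55, 58, 61, 67, 73, 80, 88]

# Criterion-referenced: in principle, every student can reach mastery.
print(criterion_referenced(67))      # mastery
# Norm-referenced: the same score is described relative to peers.
print(percentile_rank(67, cohort))   # 50.0
```

Note how the criterion-referenced description does not change when the cohort changes, whereas the percentile rank does; that is exactly the "description vs. discrimination" contrast drawn above.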
Assessment Strategies

Assessment strategy refers to those assessment tasks
(methods/approaches/activities) in which students are engaged to
ensure that all the learning objectives of a subject, a unit, or a
lesson have been adequately addressed.
Assessment strategies range from informal, almost
unconscious, observation to formal examinations.
There are a variety of methods that can be used in most subjects.
There are many different ways to categorize learning
goals for students.
Categorizing helps us to thoroughly think through what
we want students to know and be able to do.
The learning goals can be categorized as follows:

Knowledge and understanding: What facts do students know outright?
What information can they retrieve? What do they understand?
Reasoning proficiency: Can students analyze,
categorize, and sort into component parts? Can they
generalize and synthesize what they have learned? Can
they evaluate and justify the worth of a process or
decision?
Skills: We have certain skills that we want students to
master such as reading fluently, working productively
in a group, making an oral presentation, speaking a
foreign language, or designing an experiment.
Cont’
Dispositions: We also frequently care about student
attitudes and habits of mind, including attitudes
toward school, persistence, responsibility, flexibility,
and desire to learn.
Ability to create products: Another kind of learning
target is student-created products - tangible evidence
that the student has mastered knowledge, reasoning,
and specific production skills. Examples include a
research paper, a piece of furniture, or artwork.
The following are among the various assessment strategies that can be
used by classroom teachers.
Classroom presentations: A classroom presentation is an
assessment strategy that requires students to verbalize their
knowledge, select and present samples of finished work, and
organize their thoughts about a topic in order to present a summary
of their learning.
 It may provide the basis for assessment upon completion of a
student’s project or essay.
Conferences: A conference is a formal or informal meeting between
the teacher and a student for the purpose of exchanging information
or sharing ideas.
A conference might be held to explore the student’s thinking and
suggest next steps, or to assess the student’s level of understanding
of a particular concept or procedure.
Cont’
Exhibitions/Demonstrations: An exhibition/
demonstration is a performance in a public setting,
during which a student explains and applies a process,
procedure, etc., in concrete ways to show individual
achievement of specific skills and knowledge.
Interviews:
Observation: Observation is a process of
systematically viewing and recording students while
they work, for the purpose of making instruction
decisions.
Cont’
Performance tasks: During a performance task,
students create, produce, perform, or present works on
"real world" issues. The performance task may be used to
assess a skill or proficiency, and provides useful
information on the process as well as the product.
Portfolios: A portfolio is a collection of samples of a
student’s work over time.
It offers a visual demonstration of a student’s
achievement, capabilities, strengths, weaknesses,
knowledge, and specific skills, over time and in a variety
of contexts.
For a portfolio to serve as an effective assessment
instrument, it has to be focused, selective, reflective, and
collaborative.
Cont’
Questions and answers:
Strategies for effective question and answer
assessment include:
Apply a wait time or 'no hands-up rule' to provide
students with time to think after a question before
they are called upon randomly to respond.
Ask a variety of questions, including open-ended
questions and those that require more than a right or
wrong answer.
Cont’
Students’ self-assessments: Self-assessment is a
process by which the student gathers information
about, and reflects on, his or her own learning.
Checklists usually offer a yes/no format in relation to
student demonstration of specific criteria. They may
be used to record observations of an individual, a
group or a whole class.
Rating Scales allow teachers to indicate the degree or
frequency of the behaviors, skills and strategies
displayed by the learner. Rating scales state the criteria
and provide three or four response selections to
describe the quality or frequency of student work.
Cont’
Rubrics use a set of specific criteria to evaluate a student’s
performance. They consist of a fixed measurement scale and a detailed
description of the characteristics for each level of performance.
These descriptions focus on the quality of the product or
performance, not the quantity.
Rubrics may be used to assess individuals or groups and, as with
rating scales, may be compared over time.
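An analytic rubric of this kind can be sketched as a small data structure. Everything below is hypothetical: the three criteria, the level descriptions, and the simple summed score are invented only to illustrate the "fixed scale plus per-level descriptions" idea described above.

```python
# Hypothetical analytic rubric for a short written report: each
# criterion has a fixed 1-3 scale with a description per level.
RUBRIC = {
    "content":      {1: "inaccurate",  2: "partially accurate", 3: "accurate and complete"},
    "organization": {1: "unclear",     2: "mostly logical",     3: "clear and logical"},
    "mechanics":    {1: "many errors", 2: "some errors",        3: "virtually error-free"},
}

def score_with_rubric(ratings):
    """Validate each awarded level against the scale, then sum them."""
    for criterion, level in ratings.items():
        if level not in RUBRIC[criterion]:
            raise ValueError(f"invalid level {level} for {criterion}")
    return sum(ratings.values())

ratings = {"content": 3, "organization": 2, "mechanics": 2}
print(score_with_rubric(ratings))   # 7 out of a possible 9
```

Because the level descriptions are fixed in advance, two markers applying this rubric to the same work should arrive at similar totals, which is the reliability argument for rubrics over unanchored impression marking.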
Cont’
One-minute paper: During the last few minutes of the class period, you
may ask students to answer, on a half-sheet of paper, a question such
as "What was the most important thing you learned today?"
Muddiest Point: This is similar to the one-minute paper but only asks
students to describe what they didn’t understand and what they think
might help.
Student- generated test questions
Tests
Assessment in large classes
The existing educational literature has identified
various assessment issues associated with large classes.
They include:
Surface learning approach: Traditionally, teachers rely on
time-efficient, exam-based assessment methods for assessing large
classes, such as multiple-choice and short-answer examinations.
Higher-level learning outcomes, such as critical thinking and
analysis, are often not fully assessed.
Feedback is often inadequate.
Cont’
Inconsistency in marking
Difficulty in monitoring cheating and plagiarism
Lack of interaction and engagement
There are a number of ways to make the assessment of
large numbers of students more effective whilst still
supporting effective student learning. These include:
Cont’
1. Front ending: The basic idea of this strategy is that by putting in
increased effort at the beginning, in setting up the students for the
work they are going to do, the work submitted can be improved.
Therefore the time needed to mark it is reduced (as well as time being
saved through fewer requests for tutorial guidance).
2. Making use of in-class assignments: In-class
assignments are usually quick and therefore relatively
easy to mark and provide feedback on, but help you to
identify gaps in understanding. Students could be
asked to complete a task within the timeframe of a
scheduled lecture, field exercise or practical class.
Cont’
3. Self- and peer-assessment
Self-assessment can reduce the marking load because it encourages
higher-quality work to be submitted, thereby minimizing the amount of
time expended on marking and feedback.
Peer-assessment can provide useful learning experiences for students.
 This could involve providing students with answer
sheets or model answers to a piece of coursework that
you had set them previously and then requiring
students to undertake the marking of those
assignments in class.
The benefits of peer assessment approach are that:
students can get to see how their peers have tackled a
particular piece of work,
they can see how you would assess the work (e.g. from
the model answers/answer sheets you've provided)
and;
they are put in the position of being an assessor,
thereby giving them an opportunity to internalize the
assessment criteria. 
4. Group assessments
5. Changing the assessment method, or at least shortening it.
The Role of Objectives in Education
Objectives generally indicate the end points of a journey.
They specify where you want to be or what you intend to
achieve at the end of a process.
An educational objective is that achievement which a
specific educational instruction is expected to make or
accomplish.
It is the outcome of any educational instruction.
It is the purpose for which any particular educational
undertaking is carried out.
It is the goal of any educational task.
Educational objectives can be specified at various levels (national,
institutional, and instructional).
Methods of Stating Instructional Objectives
 Lists of objectives for a subject or unit of study should be detailed
enough to clearly communicate the intent of the instruction and to
serve as an effective overall guide in planning for teaching and
evaluation:

1. Stating general instructional objectives as intended learning
outcomes.
2. Listing, under each general objective, a sample of the specific
types of performance that students are expected to demonstrate when
they have achieved the objective.

 To ensure that your instructional objectives are stated in specific
terms, check that they are SMART (Specific, Measurable, Achievable,
Relevant, Time-bound).
The Bloom Taxonomy of Educational Objectives
In this taxonomy, Bloom et al. (1956) divided educational objectives
into three domains: the cognitive domain, the affective domain, and
the psychomotor domain.
Cognitive domain
The cognitive domain involves those objectives that deal with the
development of intellectual abilities and skills.
 These have to do with the mental abilities of the brain.
The cognitive domain is categorized into six hierarchical levels:
knowledge/memory, comprehension, application, analysis, synthesis, and
evaluation.
Bloom taxonomy revised
Affective Domain
Characteristics of the affective domain:
The emphatic characteristic of this domain is acceptance or rejection.
It is concerned with interests, attitudes, appreciation, emotional
biases, and values.
The function of the affective domain in the instructional situation
pertains to emotions, the passions, the dispositions, the moral and
the aesthetic sensibilities, the capacity for feeling, concern,
attachment or detachment, sympathy, empathy, and appreciation.
Its levels are: receiving, responding, valuing, organization, and
characterization.
PSYCHOMOTOR DOMAIN
The psychomotor domain has to do with motor skills or abilities.
It means therefore that the instructional objectives here will make
performance skills more prominent.
The psychomotor domain has to do with muscular activities.
It deals with activities that involve the use of the limbs (hands) or
the whole of the body.
These capacities are inherent in human beings and normally should
develop naturally.
 The psychomotor domain is subdivided into hierarchical levels.
From the lowest, we have (i) Reflex movements (ii) Basic
Fundamental movements (iii) Perceptual abilities (iv)
Physical abilities (v) Skilled movements and (vi) Non-
discursive communication
Selecting and developing assessment
methods and tools
Appropriate tools or combinations of tools must be
selected and used if the assessment process is to
successfully provide information relevant to stated
educational outcomes.
Constructing Tests
There are a wide variety of styles & formats for writing test
items. Miller, Linn, & Gronlund (2009) make distinctions
between classroom tests that consist of objective test
items and performance assessments that require students
to construct responses (e.g. write an essay) or perform a
particular task (e.g., measure air pressure).
Cont’
Objective tests are highly structured and require the test
taker to select the correct answer from several alternatives
or to supply a word or short phrase to answer a question.
They are called objective because they have a single right
or best answer that can be determined in advance.
Performance assessment tasks permit the student to
organize and construct the answer in essay form.
Other types of performance assessment tasks may require the student to
use equipment, generate hypotheses, make observations, construct
something, or perform for an audience.
Cont’
Constructing Objective Test Items
There are various types of objective test items.
These can be classified into those that require the
student to supply the answer (supply type items) and
those that require the student to select the answer
from a given set of alternatives (selection type items).
 Supply type items include completion items and short
answer questions.
Selection type test items include True/False, multiple
choice and matching.
True/False Test Items

The chief advantage of true/false items is that they do not require
much time for answering.
This allows a teacher to cover a wide range of content by using a
large number of such items.
 True/false test items can be scored quickly, reliably, and
objectively by anybody using an answer key.
If carefully constructed, true/false test items also have the
advantage of measuring higher mental processes of understanding,
application, and interpretation.
The major disadvantage of true/false items
is that when they are used exclusively, they tend to promote
memorization of factual information: names, dates, definitions, and so
on.
Some argue that another weakness of true/false items is that they
encourage students to guess.
In addition, true/false items:
Can often lead a teacher to write ambiguous statements, due to the
difficulty of writing statements which are clearly true or false
Do not discriminate between students of varying ability as well as
other item types do
Can often include more irrelevant clues than do other item types
Can often lead a teacher to favour testing of trivial knowledge
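The guessing weakness can be quantified with simple arithmetic: on a true/false item a blind guess succeeds half the time, against a quarter of the time on a four-option multiple-choice item. The 50-item test below is a hypothetical example.

```python
# Expected number of items answered correctly by random guessing alone:
# one success per n_options items, on average.

def expected_guess_score(n_items, n_options):
    """Expected chance score on a selection-type test."""
    return n_items / n_options

print(expected_guess_score(50, 2))  # 25.0 on a 50-item true/false test
print(expected_guess_score(50, 4))  # 12.5 on a 50-item 4-option MC test
```

This is why a score near 50% on a true/false test, taken alone, tells a teacher very little about what a student actually knows.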
The following suggestions help teachers to construct good quality
true/false test items:

Avoid negative statements, and never use double negatives. In
right-wrong or true-false items, negatively phrased statements make it
needlessly difficult for students to decide whether the statement is
accurate or inaccurate.
Restrict single-item statements to single concepts.
Use an approximately equal number of items reflecting the two
categories tested.
Make statements representing both categories equal in length.
Matching Items

A matching item consists of two lists of words or phrases to be
matched according to a particular kind of association indicated in the
item’s directions.
Matching items sometimes can work well if you want your students to
cross-reference and integrate their knowledge regarding the listed
premises and responses.
Matching items can cover a good deal of content in an efficient
fashion.
Merits and Limitations of Matching Items
The major advantage of matching items is their compact form, which
makes it possible to measure a large amount of related factual
material in a relatively short time.
Another advantage is their ease of construction.
The main limitation of matching test items is that they are restricted
to the measurement of factual information based on rote learning.
Another limitation is the difficulty of finding homogeneous material
that is significant from the perspective of the learning outcomes.
As a result, test constructors tend to include in their matching items
material which is less significant.
The following suggestions are guidelines for the construction of good
matching items:
Use fairly brief lists, placing the shorter entries on the right
Employ homogeneous lists
Include more responses than premises
List responses in a logical order
Describe the basis for matching and the number of times a response can
be used ("Each response in the list at the right may be used once,
more than once, or not at all.")
Try to place all premises and responses for any matching item on a
single page.
Short-Answer/Completion Test Items

Short-answer items and completion test items are essentially the same:
both can be answered by a word, phrase, number, or formula.
They differ in the way the problem is presented. The short-answer type
uses a direct question, whereas the completion test item consists of
an incomplete statement that the student must complete.
Short-answer test items are among the easiest to construct.
Because students must supply the answer themselves, the possibility
that they will obtain the correct answer by guessing is reduced.
There are two limitations cited in the use of short-answer test items.
One is that they are unsuitable for assessing complex learning
outcomes.
The other is the difficulty of scoring: this is especially true where
the item is not clearly phrased to require one definitely correct
answer, or where the student’s spelling ability affects the judgment
of correctness.
The following suggestions will help to make short-
answer type test items to function as intended.
Cont’
Word the item so that the required answer is both
brief and specific
Do not take statements directly from textbooks to use
as a basis for short-answer items.
A direct question is generally more desirable than an
incomplete statement.
If the answer is to be expressed in numerical units,
indicate the type of answer wanted
Multiple-Choice Items

The multiple-choice item can effectively measure many of the simple
learning outcomes; in addition, it can measure a variety of complex
cognitive learning outcomes.
A multiple-choice item consists of a problem (the item stem) and a
list of suggested solutions (alternatives, choices, or options).
There are two important variants in a multiple-choice item:
(1) whether the stem consists of a direct question or an incomplete
statement, and
(2) whether the student’s choice among alternatives is supposed to be
a correct answer or a best answer.
A key advantage of the multiple-choice item
is its widespread applicability to the assessment of cognitive skills
and knowledge, as well as to the measurement of students’ affect.
 Another advantage is that it is possible to make multiple-choice
items quite varied in the levels of difficulty they possess.
Cleverly constructed multiple-choice items can present very high-level
cognitive challenges to students.
And, of course, as with all selected-response items, multiple-choice
items are fairly easy to score.
The key weakness of multiple-choice items
is that when students review a set of alternatives for an item,
they may be able to recognize a correct answer that they would
never have been able to generate on their own.
In that sense, multiple-choice items can present an exaggerated
picture of a student’s understanding or competence, which
might lead teachers to invalid inferences.
Another serious weakness, one shared by all selected-response
items, is that multiple-choice items can never measure a
student’s ability to creatively synthesize content of any sort.
 Finally, in an effort to come up with the necessary number of
plausible alternatives, novice item-writers sometimes toss in
some alternatives that are obviously incorrect.
Useful rules to follow when preparing multiple-choice items:

The question or problem in the stem must be self-contained.
Avoid negatively stated stems.
Each alternative must be grammatically consistent with the item’s
stem.
Make all alternatives plausible, but be sure that one of them is
indisputably the correct or best answer.
Randomly use all answer positions in approximately equal numbers.
Never use "all of the above" as an answer choice, but consider using
"none of the above" to make items more demanding.
Constructing Performance Assessments

The distinctive feature of essay questions is that students are free
to construct, relate, and present ideas in their own words.
Learning outcomes concerned with the ability to conceptualize,
construct, organize, relate, and evaluate ideas require the freedom of
response and the originality provided by essay questions.
Essay questions can be classified into two types –
restricted-response essay questions and extended-response essay
questions.
Cont’
Restricted-response essay questions: These types of
questions usually limit both the content and the response.
The content is usually restricted by the scope of the topic to
be discussed. Limitations on the form of response are
generally indicated in the question.
Extended response Essays: these types of questions allow
students:
to select any factual information that they think is relevant,
to organize the answer in accordance with their best
judgment, and;
to integrate and evaluate ideas as they deem appropriate.
Essay questions have some further advantages, which include:
Extended-response essays focus on the integration and application of
thinking and problem-solving skills.
Essay assessments enable the direct evaluation of writing skills.
Essay questions, as compared to objective tests, are easy to
construct.
Essay questions have a positive effect on students’ learning.
The following are suggestions for the construction of good essay
questions:
Restrict the use of essay questions to those learning
outcomes that cannot be measured satisfactorily by
objective items.
Structure items so that the student’s task is explicitly
bounded.
For each question, specify the point value, an
acceptable response-length, and a recommended time
allocation.
Employ more questions requiring shorter answers
rather than fewer questions requiring longer answers.
Don’t employ optional questions.
Test a question’s quality by creating a trial response to the item.
The following guidelines would be helpful in making the scoring of
essay items easier and more reliable.
Ensure that you are emotionally and mentally settled before you begin
scoring
All responses to one item should be scored before moving to the next
item
Write out in advance a model answer to guide yourself in grading the
students’ answers
Shuffle exam papers after scoring every question before moving to the
next
The names of test takers should not be known while scoring to avoid bias
Table of Specification and Arrangement of
Items
If tests are to be valid and reliable they have to be
developed based on carefully designed plans.
Planning a classroom test involves identifying the
instructional objectives stated earlier and the subject
matter (content) covered during the teaching/learning
process.
The following serve as a guide in planning a classroom test:
 Determine the purpose of the test;
 Describe the instructional objectives and content to be measured.
 Determine the relative emphasis to be given to each learning
outcome;
 Select the most appropriate item formats (essay or objective);
 Develop the test blue print to guide the test construction;
 Prepare test items that are relevant to the learning outcomes specified
in the test plan;
 Decide on the pattern of scoring and the interpretation of result;
 Decide on the length and duration of the test, and
 Assemble the items into a test, prepare direction and administer the
test.
Developing a table of specification involves:
1. Preparing a list of learning outcomes, i.e. the type of
performance students are expected to demonstrate
2. Outlining the contents of instruction, i.e. the area in which
each type of performance is to be shown, and
3. Preparing the two way chart that relates the learning outcomes
to the instructional content.
A sample table of specification in the geography subject:

Contents       True/False   Matching   Short Answer   Multiple Choice   Total   Percent
Air pressure       1            1           1                3            6       24%
Wind               1            1           1                1            4       16%
Temperature        1            2           1                3            7       28%
Rainfall           1            1           1                2            5       20%
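As a rough sketch (not part of the source), the two-way chart above can be represented programmatically and its totals checked. The item counts are taken from the chart; the 25-item total test length is inferred from the chart's percentages (e.g. 6/25 = 24%), since only four content rows are shown here.

```python
# Sketch: a table of specification as a dict, with per-content totals and
# percentages derived from it. Counts come from the geography chart above;
# TEST_LENGTH = 25 is inferred from the chart's percentages (6/25 = 24%).
blueprint = {
    # content area: {item type: number of items}
    "Air pressure": {"True/False": 1, "Matching": 1, "Short Answer": 1, "Multiple Choice": 3},
    "Wind":         {"True/False": 1, "Matching": 1, "Short Answer": 1, "Multiple Choice": 1},
    "Temperature":  {"True/False": 1, "Matching": 2, "Short Answer": 1, "Multiple Choice": 3},
    "Rainfall":     {"True/False": 1, "Matching": 1, "Short Answer": 1, "Multiple Choice": 2},
}
TEST_LENGTH = 25  # total items on the full test (inferred; not all rows shown)

for content, row in blueprint.items():
    total = sum(row.values())
    print(f"{content}: {total} items ({100 * total // TEST_LENGTH}%)")
```

Laying the blueprint out as data like this makes it easy to verify that the planned emphasis per content area matches the intended percentages before writing any items.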
Arrangement of test items
There are various methods of grouping items in an
achievement test depending on their purposes. For
most purposes the items can be arranged by a
systematic consideration of:
The type of items used
The learning outcomes measured
The difficulty of the items, and
The subject matter measured
To summarize, the most effective method for
organizing items in the typical classroom test is to:
 Form sections by item type
 Group the items within each section by the learning outcomes
measured, and
 Arrange both the sections and the items within sections in an
ascending order of difficulty.
Administration of Tests
Test administration refers to the procedure of actually
presenting the learning tasks that the examinees are
required to perform in order to ascertain the degree of
learning that has taken place during the teaching-learning
process.
The following practices should be avoided during test
administration:
Threatening students with tests if they do not behave
Warning students to do their best “because the test is
important”
Telling students they must work fast in order to finish on
time.
Threatening dire consequences if they fail
Ensuring Quality in Test Administration

The following are guidelines and steps involved in test administration
aimed at ensuring quality in test administration.
Collect the question papers from the custodian in time to be able to
start the test at the stipulated time.
Ensure compliance with the stipulated sitting arrangements in the test
to prevent collusion between or among the test takers.
Ensure orderly and proper distribution of question papers to the test
takers.
Make it clear that cheating will be penalized.
Avoid giving hints to test takers who ask about particular items, but
make corrections or clarifications to the test takers whenever necessary.
Keep interruptions during the test to a minimum
In test administration, effort should be made to see that
the test takers are given a fair and unaided chance to
demonstrate what they have learnt with respect to:
Instructions: Test should contain a set of instructions which
are usually of two types. One is the instruction to the test
administrator while the other one is to the test taker.
Duration of the Test
Venue and Sitting Arrangement
Other necessary conditions
All these are necessary to enhance test administration
and to keep the assessment orderly and fair.
Scoring Essay Tests
There are two common methods of scoring essay
questions. These are:
A. The Point or Analytic Method
In this method each answer is compared with an already
prepared ideal marking scheme (scoring key), and marks
are assigned according to the adequacy of the answer.
 When used conscientiously, the analytic method
provides a means for maintaining uniformity in scoring
between scorers and between scripts thus improving the
reliability of the scoring
B. The Global/Holistic Rating Method
In this method the examiner first sorts the response
into categories of varying quality based on his general
or global impression on reading the response.
The standard of quality helps to establish a relative
scale, which forms the basis for ranking responses
from those with the poorest quality response to those
that have the highest quality response.
Scoring Objective Tests
i. Manual Scoring
In this method of scoring, answers to test items are
scored by direct comparison of the examinee’s answers
with the marking key.
If the answers are recorded on the test paper for
instance, a scoring key can be made by marking the
correct answers on a blank copy of the test
Scoring is then done by simply comparing the
columns of answers on the master copy with the
columns of answers on each examinee’s test paper.
ii. Stencil Scoring
On the other hand, when separate answer sheets are used
by examinees for recording their answers, it is most convenient to
prepare and use a scoring stencil.
 A scoring stencil is prepared by punching holes in a blank answer
sheet where the correct answers are supposed to appear. Scoring
is then done by laying the stencil over each answer sheet and the
number of answer checks appearing through the holes is counted.
iii. Machine Scoring
Usually, for a large number of examinees, specially prepared
answer sheets are used to answer the questions.
The answers are normally shaded at the appropriate places
assigned to the various items
Unit 3: Item Analysis
It is the process of examining or analyzing testees’
responses to each item on a test with the basic intent of
judging the quality of each item.
Item analysis helps to determine the adequacy of the items
within a test as well as the adequacy of the test itself.
There are several reasons for analyzing questions and tests
that students have completed and that have already been
graded. Some of the reasons that have been cited include
the following:
Identify content that has not been adequately covered and
should be re-taught,
Provide feedback to students,
Determine if any items need to be revised in the event
they are to be used again or become part of an item file
or bank,
Identify items that may not have functioned as they
were intended,
Direct the teacher's attention to individual student
weaknesses.
The results of an item analysis provide information
about the difficulty of the items and the ability of the
items to discriminate between better and poorer
students.
If an item is too easy, too difficult, failing to show a
difference between skilled and unskilled examinees, or
even scored incorrectly, an item analysis will reveal it.
The two most common statistics reported in an item
analysis are the item difficulty and the item
discrimination.
Item difficulty level index
It is one of the most useful, and most frequently
reported, item analysis statistics.
It is a measure of the proportion of examinees who
answered the item correctly; for this reason it is
frequently called the p-value.
If scores from all students in a group are included the
difficulty index is simply the total percent correct.
 When there is a sufficient number of scores available
(i.e., 100 or more) difficulty indexes are calculated using
scores from the top and bottom 27 percent of the group.
Item analysis procedures
Rank the papers in order from the highest to the lowest
score
Select one-third of the papers with the highest total score
and another one-third of the papers with lowest total
scores
For each test item, tabulate the number of students in the
upper & lower groups who selected each option
Compute the difficulty of each item (% of students who
got the item right)
Item difficulty index can be calculated using the following
formula:
P = (successes in the HSG + successes in the LSG) / N in (HSG+LSG)
Where, HSG = High Scoring Group
 LSG = Low Scoring Group
 N = the total number of students in the HSG and LSG
 The difficulty indexes can range between 0.0 and 1.0
and are usually expressed as a percentage.
 A higher value indicates that a greater proportion of
examinees responded to the item correctly,
For maximum discrimination among students, an
average difficulty of .60 is ideal.
For example: If 243 students answered item no. 1
correctly and 9 students answered incorrectly, the
difficulty level of the item would be 243/252 or .96.
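The p-value computation can be sketched in a few lines of Python. The function name is illustrative; the first example is the whole-group case from the slide, and the second uses hypothetical tail-group counts (top and bottom 27% of 252 examinees, i.e. 68 per group).

```python
def item_difficulty(correct_high, correct_low, n_high, n_low):
    """P = (successes in HSG + successes in LSG) / N,
    where N is the combined size of the two groups."""
    return (correct_high + correct_low) / (n_high + n_low)

# Whole-group case from the slide: 243 correct out of 252 examinees.
print(round(243 / 252, 2))  # 0.96

# Tail-group case (hypothetical counts): 50 correct in the top group of 68,
# 30 correct in the bottom group of 68.
print(round(item_difficulty(50, 30, 68, 68), 2))  # 0.59
```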
Item difficulty interpretation

P-Value               Percent Range   Interpretation
>= 0.75               75-100          Easy
<= 0.25               0-25            Difficult
between .25 and .75   26-74           Average
For criterion-referenced tests (CRTs), with their
emphasis on mastery-testing, many items on an exam
form will have p-values of .9 or above.
Norm-referenced tests (NRTs), on the other hand, are
designed to be harder overall and to spread out the
examinees’ scores. Thus, many of the items on an NRT
will have difficulty indexes between .4 and .6.
Item discrimination index
The index of discrimination is a numerical indicator
that enables us to determine whether the question
discriminates appropriately between lower scoring and
higher scoring students.
Item discrimination index can be calculated using the
following formula:
D = (successes in the HSG − successes in the LSG) / [½ N in (HSG+LSG)]
Where, HSG = High Scoring Group
LSG = Low Scoring Group
The item discrimination index can vary from -1.00 to
+1.00.
A negative discrimination index (between -1.00 and
zero) results when more students in the low group
answered correctly than students in the high group.
A discrimination index of zero means equal numbers
of high and low students answered correctly, so the
item did not discriminate between groups.
A positive index occurs when more students in the
high group answer correctly than the low group.
Questions that have an item difficulty index (NOT item
discrimination) of 1.00 or 0.00 need not be included
when calculating item discrimination indices.
An item difficulty of 1.00 indicates that everyone
answered correctly, while 0.00 means no one answered
correctly.
When computing the discrimination index, the scores
are divided into three groups with the top 27% of the
scores in the upper group and the bottom 27% in the
lower group.
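A minimal sketch of the discrimination formula, using hypothetical counts (27 examinees per tail group, i.e. the top and bottom 27% of 100 scores):

```python
def item_discrimination(correct_high, correct_low, n_high, n_low):
    """D = (successes in HSG - successes in LSG) / (half the combined
    group size), i.e. divided by the size of one tail group."""
    return (correct_high - correct_low) / ((n_high + n_low) / 2)

# 24 of the top 27 answered correctly vs. 10 of the bottom 27:
d = item_discrimination(24, 10, 27, 27)
print(round(d, 2))  # 0.52 -> positive discrimination
```

If the counts were reversed (10 in the top group, 24 in the bottom), D would be negative, flagging a possibly mis-keyed item as described above.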
Item discrimination interpretation

D-Value        Direction   Strength
> +.40         positive    strong
+.20 to +.40   positive    moderate
-.20 to +.20   none        ---
< -.20         negative    moderate to strong
For a small group of students, an index of discrimination
for an item that exceeds .20 is considered satisfactory.
 For larger groups, the index should be higher because
more difference between groups would be expected.
The guidelines for an acceptable level of discrimination
depend upon item difficulty.
For items with a difficulty level of about 70 percent, the
discrimination should be at least .30.
When an item is discriminating negatively, overall the most
knowledgeable examinees are getting the item wrong and
the least knowledgeable examinees are getting the item
right.
More often than not, it is a sign that the item has been mis-keyed.
Distractor Analysis
It evaluates the effectiveness of the distractors (incorrect options)
in each item by comparing the number of students in
the upper and lower groups who selected each incorrect
alternative (a good distractor will attract more students
from the lower group than the upper group).
In addition to being clearly incorrect, the distractors
must also be plausible.
That is, the distractors should seem likely or reasonable
to an examinee who is not sufficiently knowledgeable in
the content area
Evaluating the Effectiveness of Distractors
The distraction power of a distractor is its ability to
differentiate between those who do not know and those who
know what the item is measuring. That is, a good distractor
attracts more testees from the lower group than from the
upper group.

Formula:
Option Distractor Power (Do) =
[No. of low scorers who marked the option (L) − No. of high scorers who
marked the option (H)] / Total no. of testees in the upper group (n)
Incorrect options with positive distraction power
are good distractors, while those with negative
distraction power must be changed or revised, and those
with zero should be improved, because they have
failed to distract the low achievers.
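A short sketch of the distraction-power formula, with hypothetical counts for two incorrect options of one item (27 testees per tail group):

```python
def distractor_power(low_marked, high_marked, n_upper):
    """Do = (L - H) / n: positive when the option pulls more low scorers
    than high scorers, which is what a good distractor should do."""
    return (low_marked - high_marked) / n_upper

# Option A: 9 low scorers vs. 3 high scorers chose it.
print(round(distractor_power(9, 3, 27), 2))   # 0.22 -> functioning distractor

# Option B: only 2 low scorers but 8 high scorers chose it.
print(round(distractor_power(2, 8, 27), 2))   # -0.22 -> revise this option
```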
Item Banking
Items and tasks are recorded as they are
constructed; information from the analysis of students’
responses is added after the items and tasks have been
used, and then the effective items and tasks are deposited
in the file.
Such a file is especially valuable in areas of complex
achievement, when the construction of test items and
assessment tasks is difficult and time consuming.
When enough high-quality items and tasks have been
assembled, the burden of preparing tests and
assessments is considerably lightened
The End

Thank You for Your Attention
