人文社会学部紀要 (Bulletin of the Faculty of Humanities and Social Sciences) VOL.1 (2001.3)

Evaluating and Modifying Language Assessment Scales

Richard Stone

An overview

An assessment scale is a set of performance-based proficiency ratings which can be used to evaluate one’s
ability to communicate in a foreign language. Four major language skill areas are commonly included in
assessment scales: Speaking, Listening, Reading, and Writing. Proficiency ratings for each language skill area
are comprised of performance criteria which correspond to increasing degrees of foreign language
competence. Assessment scales are typically used by language programs to place students at their proper
levels as they enter the program and to evaluate students as they exit the program.

Evaluating Assessment Scales

Assessment scales can be created with validity and reliability in mind, but doing so requires that they be
created at a time and in a manner consistent with curriculum development. More frequently, however, they are
created as afterthoughts, since curriculum decisions and classroom instruction are typically implemented much
earlier. Only when students exit programs is any thought given to evaluating and standardizing student
ability levels. Instead of being linked with program goals of instruction and teaching methodologies from the
outset, assessment scales are usually produced to fill perceived voids in administrative areas, rather than to
reflect actual student competence relative to the curriculum and methodologies already in place.

The California Association of Language Schools (CALS) Assessment Scale (see Figure 1, below) is one such
assessment scale that was produced as an afterthought. The CALS Assessment Scale was designed to
evaluate the language proficiency of English as a Second Language students at the time they graduate from
CALS member school programs. It is analyzed in this paper to illustrate several difficulties that typically exist
in assessment scales.

Figure 1. California Association of Language Schools Assessment Scale

Oral Communication

1. Can make him/herself understood in a simple manner in the most common everyday situations.
2. Can communicate likes and dislikes in a simple manner.
3. Can formulate complete sentences and respond accurately to listener’s questions.
4. Can communicate adequately in everyday situations when speaking directly to the listener.
5. Can express him/herself effectively and accurately in areas of personal interest.
6. Can express him/herself accurately over the telephone.
7. Can converse naturally and confidently on unfamiliar topics.
8. Can convey nuances fluently and accurately in language appropriate to the situation.
9. Can discuss abstract ideas with clarity and precision.

Listening Comprehension

1. Can understand simple, clearly spoken words and commands.
2. Can understand simple English phrases if spoken slowly.
3. Can understand simple conversational English when spoken to directly.
4. Can understand spoken English in areas of general and personal interest.
5. Can understand standard conversational English spoken at a normal speed.
6. Can easily understand spoken English on the telephone or a news broadcast.
7. Can comprehend subtle distinctions of usage when speaking directly with a native English speaker.
8. Can understand the conversation of two native speakers when not directly addressed.
9. Has a near-native comprehension of spoken English.

Reading Comprehension

1. Can understand simple signs or directions encountered in the most common everyday situations.
2. Can obtain some information from simple written English.
3. Can understand simple descriptions, explanations and instructions in business or school matters.
4. Can understand written passages on general subjects or topics of personal interest.
5. Can understand journals or technical reports in his/her areas of competence.
6. Can comprehend what is written in newspapers, general interest magazines or novels.
7. Can understand complex written English on unfamiliar topics.
8. Can understand and appreciate subtle distinctions of style and meaning in a wide range of written English.
9. Has near-native competence in understanding written English.

Written Communication

1. Can write simple sentences with few grammatical errors.
2. Can convey simple information in memo form.
3. Can write connected sentences related to everyday situations or personal experience.
4. Can write accurately on subjects of personal interest.
5. Can write accurately in letter form on a variety of topics.
6. Can write business or formal letters appropriate to the occasion.
7. Can write accurate, well-structured English in essay form.
8. Can express abstract ideas and opinions clearly.
9. Can use the written language in all forms with clarity and precision.

An assessment scale provides a consistent framework for evaluating student language ability. However,
assessment scales are most often created by language teachers, rather than by psychometricians or curriculum
developers. Because of this, performance criteria in the assessment scale often lack validity (the degree to
which a test measures what it was designed to measure) and reliability (the degree to which a test gives
consistent results). Although those doing the evaluating have a tool in the assessment scale, the results are
often irrelevant or, at best, vague. As a result, assessment scales lack credibility as a standardized tool,
whether used within a single language program or across CALS member school programs. Several such
shortcomings of the CALS Assessment Scale are pointed out below.

Arbitrariness

The CALS Assessment Scale is flawed in the arbitrariness of its arrangement of performance criteria (see
Figure 1). Below are four questions, offered as non-exhaustive examples, and a discussion.

Questions

• Is formulating complete sentences and responding accurately to a listener’s questions (Oral Communication
#3) easier or harder than expressing oneself effectively and accurately in areas of personal interest (Oral
Communication #5)?

• Is easily understanding spoken English on the telephone or a news broadcast (Listening Comprehension
#6) easier or harder than understanding the conversation of two native speakers when not directly
addressed (Listening Comprehension #8)?

• Is understanding journals or technical reports in one’s area of competence (Reading Comprehension #5)
easier or harder than understanding what is written in newspapers, general interest magazines, or novels
(Reading Comprehension #6)?

• Is writing business or formal letters appropriate to the occasion (Written Communication #6) easier or
harder than writing accurate, well-structured English in essay form (Written Communication #7)?

Discussion

How the example items in the questions above should be ordered is a matter of opinion and case-by-case
analysis. That fact itself illustrates the lack of validity and reliability of these performance criteria,
since the opinion of one teacher or evaluator cannot be shown to measure performance consistently or to be
consistently repeated.

In evaluating Oral Communication, for example, answering questions in a conversation implies less control
over the conversation than speaking about one’s area of interest does. One can control one’s own output, but
one has much less control over the questions another person might ask. Regarding Listening Comprehension,
understanding news broadcasts is often extremely difficult due to the limited broadcast time any one news item
may be allotted on the air, the specialized vocabulary, and the lack of the shared, non-verbal
communication present when two native speakers converse.

In addition, regarding Reading Comprehension, many daily newspapers are geared in grammar and vocabulary to
native speaker junior high school reading levels and may be easier to read than journals or reports in an area
of competence. Such technical writing may be filled with terms and concepts with which students or teachers
may be unfamiliar. Regarding Written Communication, producing business and formal letters appropriate to the
occasion presupposes the ability to write well-constructed essays.
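
The reliability concern raised in this discussion, that one evaluator's opinion may not be repeated by
another, can be made concrete with a small calculation. The sketch below is only an illustration and is not
part of the CALS instrument or of the analysis in this paper: it assumes two hypothetical teachers have each
rated the same eight students on a nine-point scale, and it reports simple percent agreement and Cohen's
kappa, a commonly used chance-corrected agreement statistic. The ratings and function names are invented
for the example.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of students who received the same rating from both raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same rating
    # if each rated independently with their own rating frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical rating throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings (1-9) given by two teachers to the same eight students.
teacher_1 = [3, 5, 5, 6, 4, 7, 5, 3]
teacher_2 = [3, 4, 5, 6, 5, 7, 6, 3]

if __name__ == "__main__":
    print(percent_agreement(teacher_1, teacher_2))
    print(cohens_kappa(teacher_1, teacher_2))
```

A low agreement figure on a given criterion would suggest that the criterion is being interpreted
differently by different evaluators, which is the kind of inconsistency described above.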

Pedagogical scope

Language schools lack the pedagogical scope, resources, and staff necessary to provide the myriad of
opportunities required to match the performance criteria included in the CALS Assessment Scale. Questions
and discussion follow.

Questions

• Do students find it easier to answer questions in a conversation over which they may have no control than
to talk about specific things of interest to them?

• Do language schools include designated telephone training components in their curricula?

• Do instructors consider comprehension of telephone conversations and news broadcasts reflective of an
intermediate level of listening competence?

• Do teachers possess sufficient and varied subject matter expertise or training to accurately evaluate the
level of a student’s technical writing?

Discussion

Questions asked in a conversation may vary greatly and randomly in difficulty and intent. Few language
schools offer courses to fully integrate the use of telephones, TVs, radios, or music into oral communication
curricula. Training in understanding news broadcasts is generally delayed until students have reached an
advanced listening comprehension level, if such training exists at all. In addition, few teachers are trained in
multiple specific areas of interest, nor are they trained in the distinctives of technical writing to the point of
being subject matter experts or critical evaluators of highly specialized writing.

Subjective evaluation

Assessment scales such as the CALS Assessment Scale are replete with subjectivity. Again, recognition of
this fact shows a failure on the part of the developers to account for the validity or reliability of the instrument.
Additionally, teachers who try to fit the abilities of their students into the Assessment Scale have no way of
quantifying the performances upon which the ratings were based. Example criteria and discussion follow.

Criteria

• Can express himself/herself effectively and accurately in areas of personal interest (Oral Communication #5).

• Can easily understand spoken English on the telephone or a news broadcast (Listening Comprehension #6).

• Can understand journals or technical reports in his/her areas of competence (Reading Comprehension #5).

• Can write business or formal letters appropriate to the occasion (Written Communication #6).

Discussion

To accurately assess the example Oral Communication criterion, above, teachers, evaluators, or curriculum
designers would need to quantify the level of difficulty of each oral question, situation, or area of personal
interest. Regarding Listening Comprehension, they would need to design multi-situational evaluation
standards, as well as be involved in the phone calls, if not as participants, then as listeners.

Those evaluating Reading Comprehension would need to possess subject matter competence, themselves, in
journals and technical reports in student areas of competence in order to properly assess student ability.
Evaluators of Written Communication would additionally need to possess competence in the distinctives of
technical writing, as well as the style guide specific to the student’s field.

Modifying Assessment Scales

Producing a set of criteria ratings that accurately reflects student language abilities and dovetails with goals of
instruction and teaching methodology requires more than rewording or reorganizing that which already exists.
It requires going back to the curriculum planning stage in which goals and methodologies were discussed and
decided. Since language programs possess individual differences, except for those of a tightly controlled chain
school with multiple locations, it is not feasible to assume one assessment scale would suffice for more than
one program. Still, a major revision of the CALS Assessment Scale has merit for the purposes of this paper
because a revision demonstrates several possibilities regarding what can be done to raise the level of importance
assigned to assessment scales in general, as well as to improve their functional value and accuracy.

Choosing a framework

Three possible approaches are identified for consideration toward the design of an alternative assessment
scale.

• Revise some or all of the criteria ratings within the current framework.

• Create a modified set of criteria ratings within the current framework.

• Create a new framework.

One way to select the most appropriate approach for a program is to enlist the assistance of teachers and
administrators. In any program, a different selection might be preferred as the result of such feedback, but the
second approach is most frequently chosen because it offers more than mere window-dressing without
discarding a format with which colleagues are already familiar. Therefore, creating a modified set of criteria
ratings is the approach selected in this paper.

Each of the four sections of the CALS Assessment Scale contains nine criteria ratings. Nine is the outer limit
of what psychologists generally consider to be the number of items one’s short-term memory can attend to at
any given time. Thus, an assessment scale with nine items each for Oral Communication, Listening
Comprehension, Reading Comprehension, and Written Communication is potentially cumbersome due to the
pressure it puts on the limits of short-term memory when teachers and other evaluators try to use it to match
the ratings to the level of a student’s ability. In addition to being cognitively cumbersome in having nine items
each, many of the items are vague, ambiguous, or irrelevant for any given language program.

A better approach to an assessment scale with a large number of rating criteria is to present the
criteria as a progressive list of items linked to the goals of instruction or syllabus for each course or level. For
example, a scale would be more user-friendly for evaluators, and more valid and reliable even without
psychometric testing, if it were to reflect the material taught and the performance expectations for each class
or level.

Assuming a language program has a traditional college preparatory curriculum of five to ten graded grammar,
reading, writing, and listening, speaking, or conversation classes, an assessment scale might link criteria
ratings to performance expectations for each class or level. If a program had distinctives such as delayed
entry into one component until a certain competence was achieved, cessation of one component at a certain
level linked to commencement of a different component, or cessation of one component altogether, these
distinctives would be reflected in the assessment scale.

Revised scale

The language program associated with the CALS Assessment Scale within the context of this paper has the
following features:

• Ten levels of grammar and reading instruction.

• Seven levels of writing instruction that begin when a student progresses to the fourth level of both
grammar and reading instruction.

• Ten levels of varied listening, speaking, or conversation electives which are grouped into a ten-level
aural/oral component.
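
The program features listed above can be written down as a small data structure, which makes it easy to
check which components a student at a given level is being assessed on. The following is a minimal sketch
under stated assumptions: the names COMPONENT_LEVELS, levels, and components_at are hypothetical, and
numbering the seven writing levels as 4 through 10 follows the layout of Figure 2 rather than any explicit
statement in the program description.

```python
# Hypothetical encoding of the example program: each component maps to
# its (first level, last level). Writing is assumed to span levels 4-10,
# i.e. seven levels beginning at the fourth grammar/reading level.
COMPONENT_LEVELS = {
    "Grammar": (1, 10),
    "Reading": (1, 10),
    "Writing": (4, 10),
    "Aural/Oral": (1, 10),
}


def levels(component):
    """Return the list of level numbers offered for a component."""
    first, last = COMPONENT_LEVELS[component]
    return list(range(first, last + 1))


def components_at(level):
    """Return the components in which a student at this level is rated."""
    return [
        name
        for name, (first, last) in COMPONENT_LEVELS.items()
        if first <= level <= last
    ]


if __name__ == "__main__":
    print(len(levels("Writing")))  # number of writing levels
    print(components_at(3))        # components assessed at level 3
    print(components_at(4))        # components assessed at level 4
```

Encoding the delayed start of writing instruction in the data, rather than in the wording of individual
criteria, keeps the scale aligned with the program if the entry level for a component later changes.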

Figure 2. Revised Assessment Scale

Grammar

1. Facility with simple present and present progressive verbs, singular and plural nouns, numbers, time
2. Accuracy with simple present, present progressive, simple past (Be), simple future (Be Going To) verbs, subject pronouns
3. Facility with simple future (Will), past progressive, phrasal verbs, object pronouns
4. Facility with present perfect verbs, comparative and superlative adjectives, adverbs
5. Facility with present perfect verbs, first and second conditionals, reflexive and emphatic pronouns, relative and noun clauses, gerunds
6. Facility with past perfect and future perfect verbs, third conditional, passive voice, reported speech
7. Accuracy with all verb tenses, modals, auxiliaries
8. Accuracy with active and passive voices, gerunds, infinitives, singular and plural nouns, expressions of quantity, subject-verb agreement, personal and reflexive pronouns
9. Accuracy with adjective and noun clauses, relational structures
10. Accuracy with conditionals, grammar terminology, questions, negatives, articles, prepositions

Aural/Oral

1. Facility with using simple sentences, survival vocabulary
2. Accuracy with using simple sentences, practical vocabulary
3. Facility with spontaneous and high-interest topics, vocabulary, and group interaction
4. Facility with comprehension and analysis of information, expressing ideas, values, opinions
5. Accuracy with comprehension and analysis of information, expressing ideas, values, opinions
6. Accuracy with improving general listening and conversation skills
7. Facility with taking notes on class lectures, discussions on American culture
8. Accuracy with taking notes on class lectures, discussions on American culture, making inferences
9. Mastery with taking notes on class lectures, discussions on American culture using interactive activities, making inferences
10. Facility with anticipation, inference, differentiation with radio broadcasts, related discussions

Reading

1. Facility with basic reading skills, vocabulary
2. Facility with comprehension skills, vocabulary
3. Accuracy with comprehension skills of main idea and inference, vocabulary expansion
4. Facility with academic reading for meaning, skimming, scanning
5. Accuracy with academic reading for meaning, skimming, scanning
6. Facility with guessing meaning from context, distinguishing fact from opinion
7. Accuracy with guessing meaning from context, distinguishing fact from opinion, increasing reading speed
8. Facility with analytical techniques of passage differentiation, analysis at word, sentence, and paragraph levels
9. Accuracy with analytical techniques of passage differentiation, analysis at word, sentence, and paragraph levels
10. Mastery with analytical techniques of passage differentiation, analysis at word, sentence, and paragraph levels

Writing

1. - 3. Writing instruction begins consistent with criteria 1. - 3. in Grammar and Reading components
4. Facility with gathering information and ideas, forming main ideas, writing, revising, and editing
5. Accuracy with gathering information and ideas, forming main ideas, writing, revising, editing
6. Facility with varied academic styles and structures of academic writing
7. Accuracy with varied academic styles and structures of academic writing
8. Facility with generating and expressing ideas, selecting, organizing, and expressing relevant information
9. Accuracy with generating and expressing ideas, selecting, organizing, and expressing relevant information
10. Facility with journalistic techniques, word processing, desktop publishing

The modified CALS Assessment Scale, above, is based primarily on curriculum goals and teaching
methodologies and possesses both strengths and weaknesses. Several strengths and weaknesses are listed
below.

Strengths

• Proficiency criteria are linked to curriculum design, goals of instruction, teaching methods, and textbooks.

• Criteria ratings are systematic in that each rating presupposes competence in the previous rating.

• Criteria ratings are interconnected and therefore easier to remember as a series of items.

• Criteria ratings are less susceptible to subjectivism by evaluators.

• Criteria ratings are systematic and not arbitrary concerning their arrangement and difficulty.

• Evaluations measure what they claim to measure and can be repeated.

Weaknesses

• Individuals not connected with second language education may not understand the criteria ratings.

• Transference is lacking between proficiency ratings and real-world, functional ability.

• Similarity of ratings between programs may be lacking.

• Grammar is not typically considered a language skill area.

Summary

Assessment scales are commonly used instruments in language education. They are employed as pre-tests in
placing entering students and as post-tests in evaluating departing students. Many, if not most, assessment
scales are created not by testing personnel, but as afterthoughts to curriculum and pedagogical issues.
Therefore, their validity and reliability come into question: Do they assess what they claim to assess, and can
the findings be supported by data and repeated with precision?

Since assessment scales are generally created as afterthoughts, it may be impossible to return to
curriculum-level planning solely to produce a scale. However, prior scales can be revised or new scales
produced to reflect a closer link between curriculum and pedagogy and the evaluation of student ability. Such a
revision has been proposed and discussed in this paper using the California Association of Language Schools
Assessment Scale.

The CALS Assessment Scale was modified to reflect the curricular and pedagogical distinctives of an example
language program. Although the proficiency criteria ratings were greatly modified, the basic structural
framework of the Assessment Scale was retained. Through this kind of undertaking, it is hoped that second
language programs and universities involved in foreign language instruction or study abroad programs will
create assessment tools that more accurately reflect the needs and distinctives of curricula, pedagogy,
faculties, and students.

References

California Association of Language Schools Assessment Scale. California Association of Language Schools. Sacramento,
California. 1998.

Interview. Program Director. Golden Gate Language Schools. Campbell, California. 1998.

Longman dictionary of applied linguistics. Jack Richards, John Platt, and Heidi Weber. Longman House, Essex, England. 1985.

Make your own language tests: a practical guide to writing language performance tests. Brendan J. Carroll and Patrick J. Hall.
Pergamon Press, New York. 1985.

Principles of Language Learning and Teaching. H. Douglas Brown. Prentice Hall, Englewood Cliffs, NJ. 1987.

Techniques in testing. Harold S. Madsen. Oxford University Press, New York. 1983.

Testing Spoken English: a handbook of oral testing techniques. Nic Underhill. Cambridge University Press, New York. 1987.
