Teaching by Principles: Chapter 21
Language Assessment I: Basic Concepts in Test Development
What is a Test?
-A test is first a method.
˙It is a set of techniques, procedures, and items that constitute an instrument of some sort that requires performance or activity on the part of the test-taker (and sometimes on the part of the tester as well).
˙The method may be intuitive and informal, as in the case of a holistic impression of the authenticity of someone’s pronunciation.
˙It may be quite explicit and structured, as in a multiple-choice technique in which correct responses have already been specified by some “objective” means.
-A test has the purpose of measuring.
˙Informal assessment is difficult to quantify, and the judgments are rendered in somewhat global terms.
˙Formal tests, in which carefully planned techniques of assessment are used, rely more on quantification, especially for comparison either within an individual or across individuals.
-A test measures a person’s ability or knowledge.
˙What is the test-takers’ previous experience and background?
˙Is the test appropriate for them?
˙How are scores to be interpreted for individuals?
-What is being measured in a test is ability or competence.
˙A test samples performance but infers certain competence.
˙From the results of the test, the examiner infers a certain level of general reading ability.
-A test measures a given domain.
˙One of the biggest obstacles to overcome in constructing adequate tests is to measure the desired criterion and not inadvertently include other factors.
Practicality
-A good test is practical: it is within the means of financial limitations, time constraints, ease of administration, and ease of scoring and interpretation.
-In norm-referenced tests, each test-taker’s score is interpreted in relation to a mean, median, standard deviation, and/or percentile rank.
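The norm-referenced interpretation described above can be sketched in a few lines of code. The sketch below computes the mean, median, and a percentile rank for one raw score against a norm group; all scores are invented for illustration, not drawn from any real test:

```python
from statistics import mean, median, stdev

def norm_referenced_report(norm_scores, raw_score):
    """Interpret one raw score relative to a norm group (hypothetical data)."""
    # Percentile rank: percentage of the norm group scoring below this score
    pct = 100 * sum(s < raw_score for s in norm_scores) / len(norm_scores)
    # z-score: distance from the group mean in standard-deviation units
    z = (raw_score - mean(norm_scores)) / stdev(norm_scores)
    return {"mean": mean(norm_scores), "median": median(norm_scores),
            "percentile": pct, "z": round(z, 2)}

# Hypothetical norm group of ten test-takers
norms = [55, 60, 62, 65, 68, 70, 72, 75, 80, 85]
print(norm_referenced_report(norms, 75))
```

A score of 75 here falls at the 70th percentile, about 0.63 standard deviations above the group mean, which is exactly the kind of relative interpretation a norm-referenced test provides.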
-Criterion-referenced tests are designed to give test-takers feedback on specific course or lesson objectives, that is, the “criteria.”
Reliability
-A reliable test is consistent and dependable.
-If you give the same test to the same subject or matched subjects on two different occasions, the test itself should yield similar results; it should have test reliability.
-Scorer reliability refers to the consistency of scoring by two or more scorers. If very subjective techniques are employed in the scoring of a test, one would not expect to find high reliability.
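Scorer reliability is commonly quantified as the correlation between two raters' scores on the same set of performances. A minimal sketch, using invented essay ratings and a hand-rolled Pearson correlation (a high coefficient suggests the two scorers are applying the criteria consistently):

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two raters' scores: a common
    index of scorer (inter-rater) reliability."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical 1-5 scores from two raters on the same six essays
rater1 = [4, 3, 5, 2, 4, 5]
rater2 = [4, 3, 4, 2, 5, 5]
print(round(pearson_r(rater1, rater2), 2))
```

With these invented ratings the coefficient is about 0.85; highly subjective scoring rubrics tend to drive this number down.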
Validity
-Validity is the degree to which the test actually measures what it is intended to measure.
-If there is convincing evidence that a test accurately and sufficiently measures the test-taker on the particular objective, or criterion, of the test, then the test may be said to have criterion validity.
Content validity
-If a test actually samples the subject matter about which conclusions are to be drawn, and if it requires the test-taker to perform the behavior that is being measured, it can claim content validity.
-You can usually determine content validity observationally if you can clearly define the achievement that you are measuring.
Face Validity
-To achieve “peak” performance on a test, a learner needs to be convinced that the test is indeed testing what it claims to test.
-If the test samples the actual content of what the learner has achieved or expects to achieve, then face validity will be perceived.
Construct Validity
-“Does this test actually tap into the theoretical construct as it has been defined?”
-Proficiency, communicative competence, self-esteem (tests of these constructs often lack construct validity)
-Standardized tests designed to be given to large numbers of students typically suffer from poor content validity but are redeemed through their construct validation.
Kinds of Tests
1. Proficiency tests
-If your aim in a test is to tap global competence in a language, then you are testing proficiency.
-Proficiency tests have traditionally consisted of standardized multiple-choice items on grammar, vocabulary, reading comprehension, aural comprehension, and sometimes a sample of writing.
-Such tests often have content validity weaknesses, but after several decades of construct validation research, great strides have been made toward constructing communicative proficiency tests.
2. Diagnostic tests
-This test is designed to diagnose a particular aspect of a language.
-These tests offer a checklist of features for the administrator to use in pinpointing difficulties.
-It is not advisable to use a general achievement test as a diagnostic test, since diagnostic tests need to be specifically tailored to offer information on student needs that will be worked on imminently.
-Achievement tests are useful in analyzing the extent to which students have acquired language features that have already been taught.
3. Placement tests
-Certain proficiency tests and diagnostic tests can act in the role of placement tests, whose purpose is to place a student into an appropriate level or section of a language curriculum or school.
4. Achievement tests
-An achievement test is related directly to classroom lessons, units, or even a total curriculum.
-Achievement tests are limited to particular material covered in a curriculum within a particular time frame, and are offered after a course has covered the objectives in question.
-Achievement tests can serve as indicators of features that a student needs to work on in the future, but the primary role of an achievement test is to determine acquisition of course objectives at the end of a period of instruction.
5. Aptitude tests
-This test predicts a person’s future success prior to any exposure to the second language.
-It is designed to measure a person’s capacity or general ability to learn a foreign language and to be successful in that undertaking.
-Aptitude tests are considered to be independent of a particular language.
-Today, the measurement of language aptitude has taken the direction of providing learners with information about their preferred styles and their potential strengths and weaknesses.
-Possible techniques and procedures within each of the five categories of tests:
˙objective to subjective scoring procedures
˙open-ended to structured response options
˙multiple-choice to fill-in-the-blank item design formats
˙written to oral performance modes
-Tests of each of the modes of performance can be focused on a continuum of linguistic units, from smaller to larger: phonology and orthography, words, sentences, and discourse.
-In interpreting a test, it is important to note which linguistic units are being tested.
Historical Development in Language Testing
-Discrete-point testing methods
˙They are constructed on the assumption that language can be broken down into its component parts and those parts adequately tested.
˙Those components are basically the skills of listening, speaking, reading, and writing; the various hierarchical units of language (phonology/graphology, morphology, lexicon, syntax, discourse) within each skill; and subcategories within those units.
-Integrative testing methods
˙Communicative competence is so global and requires such integration that it cannot be captured in additive tests of grammar, reading, vocabulary, and other discrete points of language.
˙cloze tests
˙dictation
˙The unitary trait hypothesis suggested an “indivisible” view of language proficiency: vocabulary, grammar, phonology, the four “skills,” and other discrete points of language cannot, in fact, be distinguished from each other. There is a general factor of language proficiency such that all the discrete points do not add up to that whole.
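The cloze tests listed above are a concrete example of an integrative format: every n-th word of a passage is deleted, and the test-taker must draw on grammar, vocabulary, and discourse knowledge at once to restore it. A minimal sketch of a fixed-ratio cloze generator (the deletion ratio, starting point, and passage are illustrative choices, not a standard):

```python
def make_cloze(text, n=5, start=2):
    """Build a fixed-ratio cloze passage by blanking every n-th word,
    beginning at word index `start`. Returns the gapped passage and
    the answer key in order."""
    words = text.split()
    answers = []
    for i in range(start, len(words), n):
        answers.append(words[i])   # record the deleted word
        words[i] = "_____"         # replace it with a blank
    return " ".join(words), answers

passage = ("The unitary trait hypothesis suggested an indivisible view "
           "of language proficiency in which discrete points of language "
           "cannot be distinguished from each other")
cloze, key = make_cloze(passage, n=5)
print(cloze)
print(key)
```

Real cloze construction also involves choices this sketch ignores, such as leaving the first sentence intact and deciding between exact-word and acceptable-word scoring.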
Large-Scale Language Proficiency Testing
-Along with the components of organizational (phonology, grammar, discourse) competence, language tests of the new millennium are focusing on the pragmatic (sociolinguistic, functional), strategic, and interpersonal/affective components of language ability.
-Bachman (1991): A communicative test has to be pragmatic in that it requires the learner to use language naturally for genuine communication and to relate to thoughts and feelings, in short, to put authentic language to use within a context. It should be direct. It should test the learner in a variety of language functions.
-Four distinguishing characteristics of a communicative test suggested by Bachman:
˙Such tests create an “information gap,” requiring test-takers to process complementary information through the use of multiple sources of input.
˙Task dependency: tasks in one section of the test build upon the content of earlier sections, including the test-taker’s answers to those sections.
˙The integration of test tasks and content within a given domain of discourse.
˙Measurement of a much broader range of language abilities, including knowledge of cohesion, functions, and sociolinguistic appropriateness.
-Merrill Swain’s operationalization of traits in a second language proficiency test: Table 21.1
Oral Proficiency Testing
-One of the toughest challenges of large-scale communicative testing has been to construct practical, reliable, and valid tests of oral production ability.
-The best tests of oral proficiency involve a one-on-one tester/test-taker relationship, “live” performance, a careful specification of tasks to be accomplished during the test, and a scoring rubric that is truly descriptive of ability.
Critical Language Testing: Ethical Issues
-Large-scale testing is not an unbiased process, but rather is the “agent of cultural, social, political, educational, and ideological agendas that shape the lives of individual participants, teachers, and learners.”
-Problems
˙Widespread conviction that standardized tests designed by reputable test manufacturers are infallible in their predictive validity.
˙The agendas of those who design and those who utilize the tests.
-As a language teacher, you might be able to exercise some influence in the ways tests are used and interpreted in your own context. Perhaps, if you are offered a variety of choices in standardized tests, you could choose the test that offers the least degree of cultural bias.