Friday, March 9, 2018

Teaching by Principles: Chapter 21 Language Assessment I Basic Concepts in Test Development

What is a Test?
-A test is first a method.
 ˙It is a set of techniques, procedures, and items that constitute an instrument of some sort that requires performance or activity on the part of the test-taker (and sometimes on the part of the tester as well).
 ˙The method may be intuitive and informal, as in the case of a holistic impression of someone’s authenticity of pronunciation. 
 ˙It may be quite explicit and structured, as in a multiple-choice technique in which correct responses have already been specified by some “objective” means.
-A test has the purpose of measuring.
˙Informal assessment is difficult to quantify, and its judgments are rendered in somewhat global terms.
˙Formal tests, in which carefully planned techniques of assessment are used, rely more on quantification, especially for comparison either within an individual or across individuals.
-A test measures a person’s ability or knowledge.
˙What is the test-takers’ previous experience and background?
˙Is the test appropriate for them?
˙How are scores to be interpreted for individuals?
-What a test measures is ability or competence.
˙A test samples performances but infers a certain competence.
˙From the results of a reading test, for example, the examiner infers a certain level of general reading ability.
-A test measures a given domain.
˙One of the biggest obstacles in constructing adequate tests is to measure the desired criterion without inadvertently including other factors.

Practicality
-A good test is practical. It stays within financial limitations and time constraints, is reasonably easy to administer, and is feasible to score and interpret.
-In norm-referenced tests, each test-taker’s score is interpreted in relation to a mean, median, standard deviation, and/or percentile rank.
-Criterion-referenced tests are designed to give test-takers feedback on specific course or lesson objectives, that is, the “criteria.”
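The norm-referenced interpretation above can be sketched numerically. This is a minimal illustration, not anything from the chapter; the cohort scores and the score of 79 are invented for the example:

```python
from statistics import mean, stdev

# Hypothetical cohort of test scores (illustrative numbers only).
scores = [52, 61, 64, 68, 70, 73, 75, 79, 83, 90]

def norm_referenced_report(score, cohort):
    """Interpret one score relative to the cohort, as a
    norm-referenced test would: mean, standard deviation,
    standard (z) score, and percentile rank."""
    m = mean(cohort)
    sd = stdev(cohort)                       # sample standard deviation
    z = (score - m) / sd                     # distance from the mean, in SDs
    below = sum(1 for s in cohort if s < score)
    percentile = 100 * below / len(cohort)   # percent of cohort scoring lower
    return m, sd, z, percentile

m, sd, z, pct = norm_referenced_report(79, scores)
print(f"mean={m:.1f} sd={sd:.1f} z={z:+.2f} percentile={pct:.0f}")
```

A criterion-referenced interpretation, by contrast, would compare the score only against a fixed objective (e.g., "at least 75 means the lesson objective was met"), ignoring the rest of the cohort entirely.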

Reliability
-A reliable test is consistent and dependable.
-If you give the same test to the same subject or matched subjects on two different occasions, the test itself should yield similar results; it should have test reliability.
-Scorer reliability is the consistency of scoring by two or more scorers. If very subjective techniques are employed in the scoring of a test, one would not expect to find high reliability.
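Scorer reliability is commonly estimated as the correlation between two raters' scores. The sketch below, with invented essay scores, computes a Pearson correlation as one simple inter-rater reliability index (1.0 would mean perfectly consistent scorers):

```python
from math import sqrt

# Hypothetical 1-5 essay scores from two raters (illustrative numbers only).
rater_a = [4, 3, 5, 2, 4, 3, 5, 1]
rater_b = [4, 4, 5, 2, 3, 3, 4, 2]

def pearson_r(x, y):
    """Pearson correlation between two score lists, a common
    estimate of scorer (inter-rater) reliability."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(f"scorer reliability r = {pearson_r(rater_a, rater_b):.2f}")
```

The same function applied to one group's scores on two test administrations would estimate test-retest reliability.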

Validity
-Validity is the degree to which the test actually measures what it is intended to measure.
-If there is convincing evidence that a test accurately and sufficiently measures the test-taker for the particular objective, or criterion, of the test, then the test may be said to have criterion validity.

Content validity
-If a test actually samples the subject matter about which conclusions are to be drawn, and requires the test-taker to perform the behavior that is being measured, it can claim content validity.
-You can usually determine content validity observationally if you can clearly define the achievement that you are measuring.

Face Validity
-To achieve “peak” performance on a test, a learner needs to be convinced that the test is indeed testing what it claims to test.
-If the test samples the actual content of what the learner has achieved or expects to achieve, then face validity will be perceived.

Construct Validity
-“Does this test actually tap into the theoretical construct as it has been defined?”
-Proficiency, communicative competence, self-esteem (tests of these constructs often lack construct validity)
-Standardized tests designed to be given to large numbers of students typically suffer from poor content validity but are redeemed through their construct validation.

Kinds of Tests
1. Proficiency tests
-If your aim in a test is to tap global competence in a language, then you are testing proficiency.
-Proficiency tests have traditionally consisted of standardized multiple-choice items on grammar, vocabulary, reading comprehension, aural comprehension, and sometimes a sample of writing.
-Such tests often have weak content validity, but after several decades of construct validation research, great strides have been made toward constructing communicative proficiency tests.
2. Diagnostic tests
-This test is designed to diagnose a particular aspect of a language.
-These tests offer a checklist of features for the administrator to use in pinpointing difficulties.
-It is not advisable to use a general achievement test as a diagnostic test, since diagnostic tests need to be specifically tailored to offer information on student needs that will be worked on imminently.
-Achievement tests are useful in analyzing the extent to which students have acquired language features that have already been taught.
3. Placement tests
-Certain proficiency tests and diagnostic tests can act in the role of placement tests, whose purpose is to place a student into an appropriate level or section of a language curriculum or school.
4. Achievement tests
-An achievement test is related directly to classroom lessons, units, or even a total curriculum.
-Achievement tests are limited to particular material covered in a curriculum within a particular time frame, and are offered after a course has covered the objectives in question.
-Achievement tests can serve as indicators of features that a student needs to work on in the future, but the primary role of an achievement test is to determine acquisition of course objectives at the end of a period of instruction.
5. Aptitude tests
-This test predicts a person’s future success prior to any exposure to the second language.
-This is designed to measure a person’s capacity or general ability to learn a foreign language and to be successful in that undertaking.
-Aptitude tests are considered to be independent of a particular language.
-Today, the measurement of language aptitude has taken the direction of providing learners with information about their preferred styles and their potential strengths and weaknesses.
-Possible techniques and procedures within each of the five categories of tests range along several continua:
˙objective to subjective scoring procedures
˙open-ended to structured response options
˙multiple-choice to fill-in-the-blank item design formats
˙written to oral performance modes
-Tests in each of the modes of performance can be focused on a continuum of linguistic units, from smaller to larger: phonology and orthography, words, sentences, and discourse.
-In interpreting a test, it is important to note which linguistic units are being tested.

Historical Development in Language Testing
-Discrete point testing methods
˙They are constructed on the assumption that language can be broken down into its component parts and those parts adequately tested.
˙Those components are basically the skills of listening, speaking, reading, writing, the various hierarchical units of language (phonology/graphology, morphology, lexicon, syntax, discourse) within each skill, and subcategories within those units.
-Integrative testing methods
˙Communicative competence is so global that it cannot be captured in additive tests of grammar, reading, vocabulary, and other discrete points of language; it requires integrative testing methods.
˙cloze tests
˙dictation
˙The unitary trait hypothesis suggested an “indivisible” view of language proficiency: vocabulary, grammar, phonology, the four “skills,” and other discrete points of language cannot, in fact, be distinguished from each other. There is a general factor of language proficiency, and the discrete points do not simply add up to that whole.
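A cloze test, one of the integrative methods listed above, deletes words from a passage at fixed intervals and asks the test-taker to restore them. A minimal fixed-ratio cloze generator might look like this (the passage, deletion ratio of 7, and starting offset are all illustrative choices, not prescriptions from the chapter):

```python
import re

def make_cloze(text, n=7, start=3):
    """Fixed-ratio cloze: blank out every n-th word, leaving the
    opening words intact so the reader gets some context."""
    words = text.split()
    answers = []
    for i in range(start, len(words), n):
        # strip trailing punctuation from the answer key,
        # but keep it attached to the blank in the passage
        word = re.sub(r"\W+$", "", words[i])
        answers.append(word)
        words[i] = "_" * 6 + words[i][len(word):]
    return " ".join(words), answers

passage = ("A cloze test deletes words from a passage at fixed "
           "intervals, and the test taker must restore them by "
           "drawing on grammatical and discourse knowledge.")
cloze, answers = make_cloze(passage)
print(cloze)
print("answers:", answers)
```

Restoring the blanks requires grammatical, lexical, and discourse knowledge at once, which is exactly why cloze is classed as an integrative rather than a discrete-point method.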

Large-Scale Language Proficiency Testing
-Along with the components of organizational (phonology, grammar, discourse) competence, language tests of the new millennium are focusing on the pragmatic (sociolinguistic, functional), strategic, and interpersonal/affective components of language ability.
-Bachman (1991): A communicative test has to be pragmatic in that it requires the learner to use language naturally for genuine communication and to relate to thoughts and feelings, in short, to put authentic language to use within a context. It should be direct. It should test the learner in a variety of language functions.
-Four distinguishing characteristics of a communicative test suggested by Bachman.
˙Such tests create an “information gap,” requiring test takers to process complementary information through the use of multiple sources of input.
˙Task dependency: tasks in one section of the test build upon the content of earlier sections, including the test-taker’s answers to those sections.
˙The integration of test tasks and content within a given domain of discourse.
˙Measurement of a much broader range of language abilities—including knowledge of cohesion, functions, and sociolinguistic appropriateness.
-Merrill Swain’s operationalization of traits in a second language proficiency test is shown in Table 21.1.

Oral Proficiency Testing
-One of the toughest challenges of large-scale communicative testing has been to construct practical, reliable, and valid tests of oral production ability.
-The best tests of oral proficiency involve a one-on-one tester/test-taker relationship, “live” performance, a careful specification of tasks to be accomplished during the test, and a scoring rubric that is truly descriptive of ability.

Critical Language Testing: Ethical Issues
-Large-scale testing is not an unbiased process, but rather is the “agent of culture, social, political, educational, and ideological agendas that shape the lives of individual participants, teachers, and learners.”
-Problems
˙The widespread conviction that standardized tests designed by reputable test manufacturers are infallible in their predictive validity.
˙The agendas of those who design and those who utilize the tests.
-As a language teacher, you might be able to exercise some influence over the ways tests are used and interpreted in your own context. Perhaps, if you are offered a variety of choices in standardized tests, you could choose the test with the least degree of cultural bias.

  
