Professional Standards to Ensure the Fair and Appropriate Use of Testing
in High-Stakes Educational Decisions
Psychologists have unique expertise to lend to policy discussions involving educational and psychological testing. Psychological scientists developed the first intelligence tests, educational achievement and ability measures, and personnel selection instruments. They continue to be at the forefront of assessment technology, and remain committed to the fair and appropriate use of tests, particularly in high-stakes situations with significant consequences for individuals.
The Standards for Educational and Psychological Testing (1999) is widely recognized as an authoritative document that addresses test development and use. Co-developed by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME), the Standards are also adopted by the U.S. Department of Education, cited in policy guidance issued by the Equal Employment Opportunity Commission (EEOC), and frequently relied upon in legal cases related to educational and employment testing.
In an effort to help inform some of the legislative and policy issues surrounding high-stakes testing in education, APA's Public Policy Office has selected the following list of individual standards from that authoritative document as having particular relevance to issues of fairness and appropriate test use in high-stakes educational decisions. These standards are presented in three groups according to whether they address test use, administration, or interpretation. Full descriptions of these and the other relevant standards pertaining to high-stakes educational testing are provided in the accompanying appendix.
Standards Addressing How Tests Are Used
Responsibilities of Entities Mandating Testing
When educational testing programs are mandated by school, district, state, or other authorities, the ways in which test results are intended to be used should be clearly described. It is the responsibility of those who mandate the use of tests to monitor their impact and to identify and minimize potential negative consequences. Consequences resulting from the uses of the test, both intended and unintended, should also be examined by the test user. (Standard 13.1)
The Use of Specific Tests to Measure or Implement Policies
When tests or assessments are proposed for use as instruments of social, educational, or public policy, the test developers or users proposing the test should fully and accurately inform policymakers of the characteristics of the tests as well as any relevant and credible information that may be available concerning the likely consequences of test use. (Standard 7.9)
Scope of Information Tested
When test results substantially contribute to making decisions about student promotion or graduation, there should be evidence that the test adequately covers only the specific or generalized content and skills that students have had an opportunity to learn. (Standard 13.5)
The Use of Single Tests to Make Decisions
In educational settings, a decision or characterization that will have major impact on a student should not be made on the basis of a single test score. Other relevant information should be taken into account if it will enhance the overall validity of the decision. (Standard 13.7)
Providing Students the Opportunity to Demonstrate their Abilities
Students who must demonstrate mastery of certain skills or knowledge before being promoted or granted a diploma should have a reasonable number of opportunities to succeed on equivalent forms of the test or be provided with construct-equivalent testing alternatives of equal difficulty to demonstrate the skills or knowledge. In most circumstances, when students are provided with multiple opportunities to demonstrate mastery, the time interval between the opportunities should allow for students to have the opportunity to obtain the relevant instructional experiences. (Standard 13.6)
Standards Addressing How Tests Are Administered
Teaching to the Test
The integrity of test results should be maintained by eliminating practices designed to raise test scores without improving performance on the construct or domain measured by the test. (Standard 15.9)
Language Proficiency Concerns
Testing practice should be designed to reduce threats to the reliability and validity of test score inferences that may arise from language differences. (Standard 9.1)
Testing Students with Disabilities
In testing individuals with disabilities, test developers, test administrators, and test users should take steps to ensure that the test score inferences accurately reflect the intended construct rather than any disabilities and their associated characteristics extraneous to the intent of the measurement. (Standard 10.1)
Test Administration for Special Populations
If a test is mandated for persons of a given age or all students in a particular grade, users should identify individuals whose disabilities or linguistic background indicates the need for special accommodations in test administration and ensure that these accommodations are employed. (Standard 11.23)
Supervising Test Administration
In educational settings, those who supervise others in test selection, administration, and interpretation should have received education and training in testing necessary to ensure familiarity with the evidence for validity and reliability for tests used in the educational setting and to be prepared to articulate or to ensure that others articulate a logical explanation of the relationship among the tests used, the purposes they serve, and the interpretations of the test scores. (Standard 13.12)
Standards Addressing How Tests Are Interpreted
Differences Between Groups on Test Scores
When the use of a test results in outcomes that affect the life chances or educational opportunities of examinees, evidence of mean test score differences between relevant subgroups of examinees should, where feasible, be examined for subgroups for which credible research reports mean differences for similar tests. Where mean differences are found, an investigation should be undertaken to determine that such differences are not attributable to a source of construct underrepresentation or construct-irrelevant variance. While initially the responsibility of the test developer, the test user bears responsibility for uses with groups other than those specified by the developer. (Standard 7.10)
Interpreting Group Differences
In educational settings, reports of group differences in test scores should be accompanied by relevant contextual information, where possible, to enable meaningful interpretation of these differences. Where appropriate contextual information is not available, users should be cautioned against misinterpretation. (Standard 13.15)
Using Information Beyond Test Scores
In educational, clinical, and counseling settings, a test taker's score should not be interpreted in isolation; collateral information that may lead to alternative explanations for the examinee's test performance should be considered. (Standard 11.20)
The Relationship Between Test Scores and Other Factors
When test scores are intended to be used as part of the process for making decisions for educational placement, promotion, or implementation of prescribed educational plans, empirical evidence documenting the relationship among particular test scores, the instructional programs, and desired student outcomes should be provided. When adequate empirical evidence is not available, users should be cautioned to weigh the test results accordingly in light of other relevant information about the student. (Standard 13.9)
Qualifications of Individuals Interpreting the Test Scores
Those who mandate testing programs should ensure that the individuals who interpret the test results to make decisions within the school or program context are qualified to assume this responsibility and proficient in the appropriate methods for interpreting test results. (Standard 15.13)
(2/28/01)