Appropriate Use of High-Stakes Testing in Our Nation's Schools
Measuring what and how well students learn is an important building block in the process of strengthening and improving our nation's schools. Tests, along with student grades and teacher evaluations, can provide critical measures of students' skills, knowledge, and abilities. Therefore, tests should be part of a system in which broad and equitable access to educational opportunity and advancement is provided to all students. Tests, when used properly, are among the most sound and objective ways to measure student performance. But when test results are used inappropriately or as a single measure of performance, they can have unintended adverse consequences.
Today, many school districts are mandating tests to measure student performance and to hold individual schools and school systems accountable for that performance. Knowing if and what students are learning is important. Test results give classroom teachers important information on how well individual students are learning and provide feedback to the teachers themselves on their teaching methods and curriculum materials.
It is important to remember, however, that no test is valid for all purposes. Indeed, tests vary in their intended uses and in their ability to provide meaningful assessments of student learning. Therefore, while the goal of using large-scale testing to measure and improve student and school system performance is laudable, it is also critical that such tests are sound, are scored properly, and are used appropriately.
Some public officials and educational administrators are increasingly calling for the use of tests to make high-stakes decisions, such as whether a student will move on to the next grade level or receive a diploma. School officials using such tests must ensure that students are tested on a curriculum they have had a fair opportunity to learn, so that certain subgroups of students, such as racial and ethnic minority students or students with a disability or limited English proficiency, are not systematically excluded or disadvantaged by the test or the test-taking conditions. Furthermore, high-stakes decisions should not be made on the basis of a single test score, because a single test can only provide a "snapshot" of student achievement and may not accurately reflect an entire year's worth of student progress and achievement.
The potential problem with the current increased emphasis on testing is not necessarily the test, per se, but the instances when tests have unintended and potentially negative consequences for individual students, groups of students, or the educational system more broadly. But it is also critical to remember that, in many instances, without tests, low-performing students and schools could remain invisible and therefore not get the extra resources or remedial help that they need.
The measurement validity of a test is an extremely important concept. Measurement validity refers to whether a test provides useful information for a particular purpose. Said another way: Will the test accurately measure the test taker's knowledge in the content area being tested?
When tests are developed and used appropriately, they are among the most sound and objective knowledge and performance measures available. But appropriate development and use are critical. Fairness in testing begins when tests are being developed. Test developers should provide to those using their tests (school systems, for example) specific information about the potential limitations of the test, including situations in which the use of the test scores would be inappropriate. For example, a test that has been validated only for diagnosing strengths and weaknesses of individual students should not be used to evaluate the educational quality of a school. Furthermore, those using a particular test should have an appreciation for how the test performance of some students (students with a disability or those with limited English-speaking ability, for example) should be interpreted.
The Standards for Educational and Psychological Testing,* created by the American Psychological Association, the American Educational Research Association, and the National Council on Measurement in Education, present a number of principles that are designed to promote fairness in testing and avoid unintended consequences. They include:
Any decision about a student's continued education, such as retention, tracking, or graduation, should not be based on the results of a single test, but should include other relevant and valid information.
When test results substantially contribute to decisions made about student promotion or graduation, there should be evidence that the test addresses only the specific or generalized content and skills that students have had an opportunity to learn. For tests that will determine a student's eligibility for promotion to the next grade or for high school graduation, students should be granted, if needed, multiple opportunities to demonstrate mastery of materials through equivalent testing procedures.
When a school district, state, or some other authority mandates a test, the ways in which the test results are intended to be used should be clearly described. It is also the responsibility of those who mandate the test to monitor its impact, particularly on racial and ethnic minority students or students of lower socioeconomic status, and to identify and minimize potential negative consequences of such testing.
In some cases, special accommodations for students with limited English proficiency may be necessary to obtain valid test scores. If students with limited English skills are to be tested in English, their test scores should be interpreted in light of their limited English skills. For example, when a student lacks proficiency in the language in which the test is given (students for whom English is a second language, for example), the test could become a measure of their ability to communicate in English rather than a measure of other skills.
Likewise, special accommodations may be needed to ensure that test scores are valid for students with disabilities. Not enough is currently known about how particular test modifications may affect the test scores of students with disabilities; more research is needed. As a first step, test developers should include students with disabilities in field testing of pilot tests and document the impact of particular modifications (if any) for test users.
Calls to improve educational outcomes by measuring student and school performance are based on good intentions. And, as previously stated, tests, when used appropriately, can be valid measures of student achievement. However, test users must ensure that results are truly indicative of student achievement rather than a reflection of the quality of school resources or instruction. It is only fair to use test results in high-stakes decisions when students have had a real opportunity to master the materials upon which the test is based.
Therefore, in conjunction with supporting the use of tests to evaluate performance, public policymakers should also support research on the consequences of such testing. Localities should work to provide the resources schools need to offer quality educational opportunities and achieve real student growth and learning, not just the acquisition of "teaching to the test" skills. Test results should also be reported by sex, race/ethnicity, income level, disability status, and degree of English proficiency for evaluation purposes.
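As a rough illustration of what reporting results by student group could look like in practice, the following is a minimal Python sketch. The records, field names, and the report_by function are hypothetical and invented for this example; they do not come from any actual testing program or from the Standards.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records for illustration only; field names and values are assumptions.
RECORDS = [
    {"score": 72, "sex": "F", "english_proficiency": "proficient"},
    {"score": 65, "sex": "M", "english_proficiency": "limited"},
    {"score": 80, "sex": "F", "english_proficiency": "limited"},
    {"score": 58, "sex": "M", "english_proficiency": "proficient"},
]


def report_by(records, field):
    """Return {group: (student count, mean score)} for one reporting category."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[field]].append(rec["score"])
    return {group: (len(scores), mean(scores)) for group, scores in groups.items()}


if __name__ == "__main__":
    # Print one small summary per reporting category.
    for field in ("sex", "english_proficiency"):
        print(field, report_by(RECORDS, field))
```

The group count is returned alongside each mean because an average computed over very few students can be misleading when read on its own.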
In summary, testing is an extremely valuable part of educational assessment, but it is only a part of the formula for quality learning. When tests are used in high-stakes circumstances, a number of safeguards must be in place. Test developers must ensure that certain groups of students are not disadvantaged by a test, and test users must guard against allowing the testing process (the need for students to pass a certain test) to overwhelm the rest of a student's mastery of a wide curriculum. Furthermore, remedial programs should be in place for students who score low on or fail such tests.
Because the stakes are so high for so many students, additional research should begin immediately to learn more about the intended and unintended consequences of testing in educational decision making. If tests are going to be used to determine which students will advance and what subjects schools will teach, it is imperative that we understand how best to measure student learning and how the use of high-stakes testing will affect student dropout rates, graduation rates, course content, levels of student anxiety, and teaching practices. The bottom-line question, as yet unanswered, is: What will be the long-term effect of high-stakes testing on student achievement? Will it enhance or diminish broad-based learning?