
EVA L. BAKER, PH.D.

UCLA Graduate School of Education & Information Studies

Center for the Study of Evaluation (CSE)

National Center for Research on Evaluation, Standards, and Student Testing (CRESST)

On behalf of the

AMERICAN PSYCHOLOGICAL ASSOCIATION

At a Congressional Briefing on

Educational Testing: Policy Considerations for the 107th Congress

2226 Rayburn House Office Building

March 20, 2001

My name is Eva Baker. I am a professor at UCLA and co-director of the National Center for Research on Evaluation, Standards, and Student Testing (CRESST) of the Office of Educational Research and Improvement (OERI). I was also the co-chair for the revision of the Standards for Educational and Psychological Testing, published in 1999, sponsored by the American Psychological Association, the American Educational Research Association, and the National Council on Measurement in Education. I also serve as chair of the Board on Testing and Assessment of the National Research Council.

I will be discussing testing and the way it is used or planned to be used in the pursuit of educational excellence. The particular context is the Elementary and Secondary Education Act. We all know from experience that educational testing serves many different purposes. For instance, test performance is one criterion for obtaining alternative military jobs and is often used as part of a college admission process. Today, I will focus on tests (or assessments) that are intended to measure the achievement of students in our schools. Even achievement tests given in schools may have many different purposes: some are intended to provide diagnostic information to identify strengths and weaknesses of individual students, while others help determine whether students have attained a level of proficiency to allow them to continue to the next level. (For example, think of a student who needs to develop a given level of proficiency in Spanish 1 to proceed and have some success in Spanish 2.) Tests are also used to certify attainment at the end of a period of study or in a particular course. High school exit examinations, Advanced Placement examinations, and the "A" level examinations in the United Kingdom are examples of such certification tests.

As tests have become important policy targets, intended to motivate students, teachers, parents, and educators, the purposes of tests have expanded. No longer are they only "after-the-fact" events to see how well students and schools have performed. Linked to content standards that attempt to describe the goals of an educational sequence, test and assessment results are supposed to provide motivation and clear indicators of levels of attainment and progress for individuals and institutions. So, in addition to assessing individual students, test results are used to evaluate programs, to determine whether groups of students are making progress, to provide the basis for rewards and sanctions in accountability systems, and to guide instructional improvement strategies. Test results, whether viewed by parents or the community at large, have an enormous impact on perceptions of educational progress.

The simple question is this: Can we trust the results of tests as an important, but not exclusive, measure of educational quality? In the world of measurement, the technical quality of tests is judged by the concepts of validity, reliability, and fairness. Validation of a test means developing a "scientifically sound ... argument to support the intended interpretation of test scores and their relevance to the proposed use" (Standards for Educational and Psychological Testing, 1999, p. 9). Reliability is the consistency with which a test measures its domain or construct. Although many educational purposes have long been served by different tests, only very recently have policymakers sought to use a limited set of measures to meet many different purposes. For example, the SAT, together with high school grade point average, has been documented to predict first-year college grades. A validity argument has been made for that use. Yet, when the SAT was used as an outcome measure of high school educational programs, no such validity argument (or empirical data) was available. Thus, it was inappropriate for that application.

Let me present four validity and fairness precepts that we should address: (1) the test is a sample of what is to be learned, not all of what is to be learned; (2) the test adequately represents content standards; (3) the test measures the domain of interest and not extraneous factors; and (4) student effort and quality instruction produce learning and improved test scores.

Stated in a slightly different way, validity arguments should be developed for each purpose intended for a test or indicator. Tests should minimize factors irrelevant to the domain assessed, such as unnecessarily complex syntax when the goal is to determine whether a student understands a science principle. All tests are only estimates, and may sometimes be wrong for individual students; thus, test specialists recommend that high-stakes decisions should not be made using the results of only one measure.

Special concern is often addressed to students with varying educational backgrounds who have not been successful in school in the past. To close the gap, these students will have to catch up, and catching up will mean that special effort is needed to assure that they learn required prior knowledge as well as new goals. We also want to support all students so that they become self-motivated and skilled learners, not just of what is on a particular test, but for the changing content and skills that they will need to master in their future lives.

Proposals for the Elementary and Secondary Education Act include changing the grade levels of testing and redefining the criterion for "adequate progress," and may support a mix of tests from state and local levels. To the extent possible, such decisions should be based on the best available evidence. States will need to develop new tests, and they will need financial support to do so. Every state should also commit to evaluating its accountability system. States and districts can then understand the impact of accountability and testing provisions on their students' academic growth and on the quality of their schools. Continuing topics for research and development include the comparability of different tests, how external and local tests can be balanced, the components of adequate progress, and strategies to help schools both close achievement gaps and attain higher standards.

There are three sets of tools that can help states and districts develop or select high-quality measures of student learning. They are (1) the Standards for Educational and Psychological Testing, (2) a new set of recommendations, in combination with CRESST, CPRE, ECS, and a variety of organizations, on Standards for Educational Accountability Systems, and (3) evaluations of practice and the consistent collection and analyses of data from state and district accountability systems. Such studies will enable thoughtful, comprehensive reviews. Reliance on these tools, as well as others, will help us develop testing systems that can provide trustworthy information and reduce their unintended negative effects on students, schools, and policies.
