Assessment for Learning

The term “formative evaluation” was first used by Michael Scriven (1967) in connection with curriculum and teaching, but it was Bloom, Hastings, and Madaus (1971) who gave the term its generally accepted current meaning. They defined summative evaluation tests as those given at the end of episodes of teaching (units, courses, etc.) for the purpose of grading or certifying students, or for evaluating the effectiveness of a curriculum, in contrast with “another type of evaluation which all who are involved - student, teacher, curriculum maker - would welcome because they find it so useful in helping them improve what they wish to do” (p. 117). They labeled this formative evaluation. The language Bloom et al. used obscures the fact that the distinction between formative and summative applies to how the data from assessments are used, rather than to the assessments themselves (Wiliam and Black, 1996). More recently, the term “assessment for learning” has become increasingly popular (Broadfoot et al., 1999), although this term often describes the purpose of an assessment (which may be just an aspiration) rather than the function it actually fulfills. Black, Harrison, Lee, Marshall, and Wiliam (2004) clarified the distinction between “formative assessment” and “assessment for learning” as follows:

Assessment for learning is any assessment for which the first priority in its design and practice is to serve the purpose of promoting students’ learning. […] An assessment activity can help learning if it provides information to be used as feedback, by teachers, and by their students in assessing themselves and each other, to modify the teaching and learning activities in which they are engaged. Such assessment becomes “formative assessment” when the evidence is actually used to adapt the teaching work to meet learning needs. (p. 8)

In this proposal, the terms “formative assessment,” “assessment for learning,” and “AfL” are used interchangeably; each refers to assessment information that is used to adapt instruction to meet student learning needs.

Reviews (Black and Wiliam, 1998; Crooks, 1988; Natriello, 1987) provide clear evidence that improving the quality of formative assessment increases student achievement. Natriello’s review covers the full range of assessment purposes (which he classified as certification, selection, direction, and motivation), while Crooks’ review covers only what he termed “classroom evaluation.” The review by Black and Wiliam built on four key reviews of research published since those by Natriello and Crooks: three reviews by Bangert-Drowns and the Kuliks of the effects of classroom testing (Bangert-Drowns, Kulik, and Kulik, 1991; Bangert-Drowns, Kulik, Kulik, and Morgan, 1991; Kulik, Kulik, and Bangert-Drowns, 1990) and a review by Black of summative and formative assessment in science education (Black, 1993).

Natriello’s review used a model of the assessment cycle that begins with purposes, moves on to the setting of tasks, criteria, and standards, then to evaluating performance and providing feedback, and ends with the impact of these evaluation processes on students. His most significant point was that weak theorization rendered most of the research in this area largely irrelevant, because it conflated key distinctions, e.g., the quality versus the quantity of feedback. Crooks’ paper had a narrower focus: the impact of evaluation practices on students. He concluded that the summative function of assessment had been too dominant and that more emphasis should be given to the potential of classroom assessments to assist learning. Most importantly, he argued that assessments must emphasize the skills, knowledge, and attitudes regarded as most important, not just those that are easy to assess.

Black and Wiliam’s review, like Crooks’, focused specifically on day-to-day classroom assessment practices, and found that improvements in the quality of formative assessment resulted in effect sizes on the order of 0.4 to 0.7 standard deviations (equivalent to doubling the rate of learning). Equally important, given the achievement gaps that currently exist in the United States, many of the studies reviewed by Black and Wiliam showed greater benefits for lower-achieving students, thus reducing achievement gaps. A more recent review of the literature on the effects of feedback and formative assessment in post-secondary education (Nyquist, 2003) found effects of similar magnitude and, perhaps more significantly, showed that the larger effect sizes were associated with stronger implementations of the principles of assessment for learning.