Alabama
Federation CEC

TIPS FOR TEACHERS
Assessing for Instructional and Behavior Problems: Constructing Teacher-Made Tests
What is a Criterion-Referenced Test? A criterion-referenced test is a measurement tool designed to estimate mastery of an identified unit of a curriculum (e.g., battles of the Civil War, multi-digit addition with regrouping, use of prepositions).  In somewhat different forms, such tests may also be referred to as curriculum-based measures and, more broadly, as curriculum-based assessment. Criterion-referenced tests are standardized instruments, constructed with sufficient precision that different examiners will administer, score, and interpret results in the same way. They contain items designed to represent the unit of instruction adequately.  Each item has a predetermined correct answer that can be scored objectively by the assessor.

A criterion-referenced test may be used for two main purposes. First, it can be used to determine whether or not a student is weak in a given skill and therefore needs further instruction. Second, it can be used following instruction to determine the effectiveness of that instruction. Although criterion-referenced tests are seldom normed nationally, it is very beneficial to collect sufficient data from which to compute local means for appropriate grade groups.

Steps in Constructing a Criterion-Referenced Test

Step 1: Naming the Test. Although it seems trivial, it is important to give the test a name that accurately represents its content.  Over the course of years, teachers construct many tests, and a filing system that allows efficient retrieval for future use demands that test names obviously reflect their content. 

Step 2: Objective(s) Represented by the Test. It is important that tests be created after objectives have been specified.  The test items are then constructed to reflect mastery of the objectives, not the other way around.  Ideally, the objectives will be drawn from a large Taxonomy of Objectives designed to cover the entire domain.  A good objective contains: (a) conditions under which the behavior will occur, (b) a statement of the behavior demanded of the examinee, and (c) criteria that will be used to determine whether or not the examinee has mastered the objective. The behavior should be objectively defined such that two people would agree that the behavior has or has not occurred. Criteria should be in the form of one of a number of recognized scores: percentage correct, behavior rates, duration, response latency, intensity, standard score, percentile rank, etc.

Step 3: Statement of the Purpose of the Test. This is merely a restatement of the objective in more easily communicated form.  That is, it uses everyday language without the technical verbiage.

Step 4: Instructions for Administration. This component tells the user how the test should be given. It helps to standardize data collection so that the test is administered in the same way from occasion to occasion, from child to child, and from examiner to examiner. This makes the results (i.e., the scores) comparable. Typical elements include:  (a) instructions to the child, (b) materials needed (e.g., two sharpened number 2 pencils, a watch for timing purposes), (c) how to deal with interruptions, (d) how to deal with questions from the child, and many more. The test maker must ask herself what elements impinge upon the successful administration of her test.

Step 5: Instructions for Scoring. This section tells the user how to transform the examinee's responses into item scores and total scores.  This often means providing criteria for correct and incorrect responses to individual items in a scoring key.  There may be a formula required to obtain a total score (e.g., the formula for behavior rates) that should be illustrated for the uninformed user.
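To make the scoring step concrete, here is a minimal sketch in Python of how a scoring key might turn responses into item scores and a total score. The function names, the sample key, and the exact-match rule are assumptions made for illustration only. The behavior-rate formula shown (number of responses divided by elapsed minutes) is the common definition and is likewise assumed, since the text does not spell it out.

```python
from typing import List, Sequence


def score_items(responses: Sequence[str], key: Sequence[str]) -> List[int]:
    """Score each item 1 (correct) or 0 (incorrect) against the scoring key.

    Assumption: a response is correct only if it matches the key exactly,
    ignoring surrounding spaces and letter case.
    """
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same number of items")
    return [
        1 if r.strip().lower() == k.strip().lower() else 0
        for r, k in zip(responses, key)
    ]


def percentage_correct(item_scores: Sequence[int]) -> float:
    """Total score expressed as a percentage of items answered correctly."""
    return 100.0 * sum(item_scores) / len(item_scores)


def behavior_rate(count: int, minutes: float) -> float:
    """Behavior rate as responses per minute (assumed formula: count / time)."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return count / minutes


# Hypothetical four-item addition test.
key = ["12", "15", "9", "21"]
responses = ["12", "14", "9", "21"]

item_scores = score_items(responses, key)
print(item_scores)                      # item-by-item scores
print(sum(item_scores))                 # total raw score
print(percentage_correct(item_scores))  # percentage correct
print(behavior_rate(30, 2))             # 30 correct responses in 2 minutes
```

Writing the rule down this explicitly is one way to check that two scorers working from the same key would reach the same item and total scores.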

Step 6: Instructions for Interpretation. Here the user is told how to make decisions on the basis of the score(s) obtained from an administration of the instrument.  Basically, the criteria for minimally acceptable performance laid out in the objective guide this process. For instance, if the criterion mentions 95% accuracy, then the user should compare the examinee's score with 95%.  If the examinee's score equals or exceeds that value, the child has mastered the objective.  If not, then the objective needs more instruction and practice. 
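The decision rule described above can be sketched in a few lines of Python. The function name `interpret` and the wording of the returned messages are illustrative assumptions; the 95% default simply mirrors the example criterion in the text.

```python
def interpret(raw_score: int, n_items: int, criterion_pct: float = 95.0) -> str:
    """Compare the examinee's percentage correct with the criterion.

    The objective is treated as mastered when the score equals or
    exceeds the criterion; otherwise more instruction is indicated.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    pct = 100.0 * raw_score / n_items
    if pct >= criterion_pct:
        return "mastered"
    return "needs more instruction and practice"


# Hypothetical 25-item test with a 95% criterion.
print(interpret(24, 25))  # 96% correct
print(interpret(23, 25))  # 92% correct
```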

Step 7: Specific Items in the Instrument. The key here is for the test maker to ensure that the items in the test are representative of the skills specified in the objectives.  First, there must be enough items to comprise a reliable sample of the skills in question.  It is rarely possible to have a reliable measure of any objective with fewer than 25 items.  Second, the items should adequately represent the various kinds of subskills contained within an objective.  For instance, a test on addition facts is unrepresentative if it does not include items containing 6 or 8.
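As a rough illustration of both checks for an addition-facts test, the Python sketch below counts the items and lists any addends from 0 to 9 that never appear. The function names and the representation of an item as a pair of addends are assumptions for this example. The 25-item floor comes from the text, but treating it as a hard cutoff is a simplification.

```python
from typing import Iterable, List, Set, Tuple

MIN_ITEMS = 25  # the text's rule of thumb for a reliable sample


def missing_addends(
    items: List[Tuple[int, int]], required: Iterable[int] = range(10)
) -> Set[int]:
    """Return the required addends that appear in no item."""
    used = {addend for pair in items for addend in pair}
    return set(required) - used


def review_items(items: List[Tuple[int, int]]) -> List[str]:
    """List the ways the item set falls short; an empty list means none found."""
    problems = []
    if len(items) < MIN_ITEMS:
        problems.append(f"only {len(items)} items; fewer than {MIN_ITEMS}")
    missing = sorted(missing_addends(items))
    if missing:
        problems.append(f"no item contains the addend(s): {missing}")
    return problems


# Hypothetical draft test: every pair of addends from 0 through 5 (36 items).
draft = [(a, b) for a in range(6) for b in range(6)]
for problem in review_items(draft):
    print(problem)
```

Here the draft has enough items but would still be flagged, because the addends 6 through 9 are never sampled.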

Step 8: Standard Error of Measurement. The standard error of measurement (SEM) of a test can be estimated using the number of items contained in the test (Eaves, 1979).  This estimate should be included along with the other components of the test.  As an example of the use of the SEM, consider the student who obtains a raw score of 7 on a test containing 11 items. His percentage correct is 64%.  Because the percentage does not fall into one of the exceptions, the estimated SEM is 2 (the value for tests with fewer than 24 items).  In order to construct a 95% confidence interval, the assessor should double the SEM (i.e., 2 x 2 = 4).  Next, the product is subtracted from the student's raw score (7 - 4 = 3), then the product is added to the student's raw score (7 + 4 = 11).  These values represent the 95% confidence interval in raw-score form (i.e., 3 to 11).  In percentage-correct form, the assessor can say, with the knowledge that he will be correct on 95 out of 100 such judgments, that the student's true score is contained within the interval of 27% to 100%.  Notice that such results on a test with few items provide virtually no useful information for decision making.  The same relative performance on a 100-item test would result in a 95% confidence interval of 54% to 74%.

Estimated Standard Errors of Test Scores

Number of Items     Standard Error*
<24                 2
24-47               3
48-89               4
90-109              5
110-129             6
130-150             7

Exceptions - Regardless of the length of the test, the standard error is:
0 when the score is zero or perfect
1 when 1 or 2 percentage points from 0 or 100%
2 when 3 to 7 percentage points from 0 or 100%
3 when 8 to 15 percentage points from 0 or 100%

* Standard errors are in raw score form.  Items are assumed to be scored dichotomously (i.e., 0 or 1).
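
The table lookup and the confidence-interval arithmetic from Step 8 can be sketched in Python as follows. This is an illustration under stated assumptions, not Eaves's published procedure: the function names are invented, the exceptions are checked before test length (following the table's "regardless of the length of the test"), fractional percentages are rounded to the nearest whole point before being compared with the exception ranges, and the interval is clipped to the possible range of raw scores.

```python
from typing import Tuple


def estimated_sem(n_items: int, raw_score: int) -> int:
    """Estimated standard error (raw-score form) for a dichotomously scored test."""
    if not 0 <= raw_score <= n_items:
        raise ValueError("raw_score must be between 0 and n_items")
    if n_items > 150:
        raise ValueError("the table covers tests of at most 150 items")

    # Exceptions, applied regardless of test length.
    if raw_score == 0 or raw_score == n_items:
        return 0
    pct = 100.0 * raw_score / n_items
    distance = min(pct, 100.0 - pct)  # percentage points from 0 or 100%
    # Assumption: thresholds sit halfway between the table's whole-point ranges.
    if distance < 2.5:
        return 1
    if distance < 7.5:
        return 2
    if distance < 15.5:
        return 3

    # Otherwise the estimate depends only on the number of items.
    if n_items < 24:
        return 2
    if n_items <= 47:
        return 3
    if n_items <= 89:
        return 4
    if n_items <= 109:
        return 5
    if n_items <= 129:
        return 6
    return 7


def confidence_interval_95(n_items: int, raw_score: int) -> Tuple[int, int]:
    """Raw score plus or minus twice the estimated SEM, clipped to 0..n_items."""
    half_width = 2 * estimated_sem(n_items, raw_score)
    low = max(0, raw_score - half_width)
    high = min(n_items, raw_score + half_width)
    return low, high


# The two cases discussed in Step 8.
print(confidence_interval_95(11, 7))    # raw-score interval on the 11-item test
print(confidence_interval_95(100, 64))  # raw-score interval on a 100-item test
```

Dividing each bound by the number of items converts the interval to percentage-correct form, which makes it easy to see how much narrower the interval becomes on the longer test.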
        
Reference

Eaves, R.C.  (1979).  Some simple statistics for classroom use.  Diagnostique, 4, 3-12.

One of a series of documents prepared by Auburn University special education faculty
as contracted by the Alabama State Improvement Grant to promote positive change in the public schools. 
for more information: riceric@auburn.edu