Test Evaluation Outline

OUTLINE FOR THE EVALUATION OF A TEST
John Willis, Ed.D. & Ron P. Dumont, Ed.D., NCSP

1. Manual
Does a manual accompany the test?
Adequacy
Is there a separate technical manual and at what cost?

2. Stated purpose of the test?
Definition of Construct
“Dumbed Down Tests” (tests designed for adults or adolescents redone for children)

3. Does the name of the test reflect the test content?
Do the names of the Individual Subtests (where applicable) reflect the content?

4. Form(s) of the items: (Oral, Hands-on, Multiple-choice, Fill-ins, etc.)
Are there problems with this form or content?
Is scoring ambiguous?

Do the items appear to measure what was intended? (e.g., Do reading items really test memory?)

5. Basis of the arrangement of the items in the test?
Subtests
Scales
Spiral Omnibus
Random
Hierarchical
Homogeneity: Changes within subtests
Distinctness
Sexism and other biases

6. Printing, format and arrangement of test items.
Easels and other hardware
Color use: does it help or hurt?
Readability

7. Protocols
Room to write
Answers to examinee
Report forms
Clarity
Ease of use
Do they encourage use of confidence bands? Do they offer 90% and 95% bands?

8. Directions for administration
Clarity and adequacy?
Location (manual/protocol/both)
Flexibility
Age appropriateness

9. Directions to the examinee
Clarity and adequacy
Natural or Stilted
Boehm’s basic concepts
Alternative directions

10. Time limits and bonuses?
Are they justified?
Are there alternatives?

11. Teaching items?
Scored or unscored
Adequacy of instructions
Can you teach over and over?

12. Test materials
Child safety
Ease of use
Durability

13. Scoring
Is scoring easy? objective? subjective? arbitrary? agreed upon?
Are there adequate samples of correct answers?
Rotation errors: differences on tests
Are printed norms tables also available?
Is a computer program necessary?
Is a computer program provided?

14. Raw score conversions
Interpolation
Which standard scores are reported?
Age scores: Why/why not
Grade scores: Why/why not
Percentiles
Standard scores:
Z
T
Stanines
Deviation quotients (M=100, s.d.=15 or 16)
Others
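
When checking a manual's conversion tables, it can help to recompute a few entries by hand. The sketch below shows the usual linear relationships among the standard scores listed above, assuming normally distributed scores. The function names are illustrative and do not come from any test publisher's software.

```python
from math import floor
from statistics import NormalDist

def z_to_t(z):
    """T score: mean 50, standard deviation 10."""
    return 50 + 10 * z

def z_to_deviation_quotient(z, sd=15):
    """Deviation quotient: mean 100, s.d. 15 (or 16 on some tests)."""
    return 100 + sd * z

def z_to_stanine(z):
    """Stanine: mean 5, s.d. 2, whole numbers limited to 1 through 9.

    Band edges fall at z = +/-0.25, +/-0.75, +/-1.25, +/-1.75;
    a score exactly on an edge is assigned to the higher stanine here
    (an assumption, since manuals differ on edge handling).
    """
    return max(1, min(9, floor(2 * z + 5.5)))

def z_to_percentile(z):
    """Percentile rank under the normal curve (assumes normality)."""
    return 100 * NormalDist().cdf(z)

if __name__ == "__main__":
    for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(z, z_to_t(z), z_to_deviation_quotient(z),
              z_to_stanine(z), round(z_to_percentile(z), 1))
```

Published tables that disagree noticeably with these values may reflect normalized (area) transformations or smoothing rather than errors, so the manual's description of how scores were derived is worth reading alongside.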

15. Standardization groups?
Total
Number per year of age
National representation
Breakdowns

16. For what groups is the test designed?
Recent
Relevant
Representational
Age
Grade
Sex
SES
Education
Geographic regions
Urban vs. rural
Ethnicity
Disabilities

17. Reliability coefficients
Internal (split halves)
Alternate forms
Test-retest
practice effect
inflation of r
Length of test
Test-retest interval
SEm
SEest
Inter-rater reliability
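
Several entries in this list can be checked against a manual with simple arithmetic. The following is a minimal sketch using the classical test theory formulas: the Spearman-Brown correction for split-half coefficients, SEm, and confidence bands around an obtained score. SEest is computed here as the standard error of an estimated true score, which is one common definition; a given manual may use the term differently. All function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def spearman_brown(r_half, length_factor=2.0):
    """Project reliability when the test length is multiplied by length_factor.

    With length_factor=2 this corrects a split-half correlation
    to the reliability of the full-length test.
    """
    return (length_factor * r_half) / (1 + (length_factor - 1) * r_half)

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - r)."""
    return sd * sqrt(1 - reliability)

def se_est(sd, reliability):
    """Standard error of an estimated true score: SD * sqrt(r * (1 - r)).

    Assumption: this is the 'standard error of estimation' sense of SEest.
    """
    return sd * sqrt(reliability * (1 - reliability))

def confidence_band(score, sd, reliability, level=0.95):
    """Band of +/- z * SEm centered on the obtained score."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sem(sd, reliability)
    return score - half_width, score + half_width

if __name__ == "__main__":
    print(spearman_brown(0.80))
    print(sem(15, 0.91))
    print(confidence_band(100, 15, 0.91, level=0.90))
    print(confidence_band(100, 15, 0.91, level=0.95))
```

For example, a reliability of .91 on a scale with a standard deviation of 15 gives an SEm of 4.5. Bands printed in a manual that are centered on the estimated true score rather than the obtained score will not match this sketch exactly.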

18. Validity
For what purpose?
Content
are the questions appropriate?
are there enough questions?
level of mastery being measured?
Criterion
concurrent vs. predictive
Construct
Convergent vs. discriminant (divergent)

19. Factor analysis
Exploratory
Confirmatory
Rotations
Different groups
Variance
Common
Error
Specificity
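
The three variance components named above are conventionally related as follows: common variance is the subtest's communality from the factor analysis, error variance is one minus the reliability, and specificity is what remains of the reliable variance once the common portion is removed. The sketch below assumes that convention. The function name and the sample figures are illustrative only.

```python
def partition_variance(communality, reliability):
    """Split a subtest's total variance (1.0) into three proportions.

    common   = communality (variance shared with the factors)
    error    = 1 - reliability
    specific = reliability - communality (reliable but unshared variance)
    """
    if not 0.0 <= communality <= reliability <= 1.0:
        raise ValueError("expected 0 <= communality <= reliability <= 1")
    return {
        "common": communality,
        "specific": reliability - communality,
        "error": 1.0 - reliability,
    }

if __name__ == "__main__":
    # Hypothetical subtest: communality .50, reliability .85
    parts = partition_variance(0.50, 0.85)
    print(parts)
    # A subtest is easier to interpret on its own when its
    # specific variance is larger than its error variance.
    print(parts["specific"] > parts["error"])
```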

20. User friendliness
Administrator
Client: Take it yourself

21. References
Antiquity
Authors of bibliography
Relevance to current edition

22. Interpretation
Base rate
Definitions for constructs and shared abilities
Multiple comparison tables (critical values)
Significance vs. abnormality (unusualness vs. importance) (scatter)
Testing the metaphysically handicapped (dead)
What a difference a day makes
Table Games
Floors and Ceilings
Descriptive terms
Errors
Cautions
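
For the entries on multiple comparison tables and on significance vs. abnormality, the sketch below shows one common way a critical value for the difference between two scores is derived from their SEm values, with an optional Bonferroni adjustment when several comparisons are made at once. It is an assumption that a particular manual uses this method, and the names are illustrative. A statistically significant difference says only that the gap is unlikely to be measurement error; how unusual the gap is must come from base-rate tables built from the standardization sample, which cannot be computed from SEm alone.

```python
from math import sqrt
from statistics import NormalDist

def critical_difference(sem_a, sem_b, alpha=0.05, comparisons=1):
    """Smallest difference between two scores that is reliable at alpha.

    Uses z * sqrt(SEm_a^2 + SEm_b^2), two-tailed.
    With comparisons > 1, alpha is divided by the number of
    comparisons (Bonferroni), which raises the critical value.
    """
    adjusted_alpha = alpha / comparisons
    z = NormalDist().inv_cdf(1 - adjusted_alpha / 2)
    return z * sqrt(sem_a ** 2 + sem_b ** 2)

if __name__ == "__main__":
    # Hypothetical SEm values of 3 and 4 scaled-score points
    print(critical_difference(3, 4))
    print(critical_difference(3, 4, comparisons=10))
```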