OUTLINE FOR THE EVALUATION OF A TEST
John Willis, Ed.D. & Ron P. Dumont, Ed.D., NCSP
1. Manual
Does a manual accompany the test?
Adequacy
Is there a separate technical manual? If so, at what cost?
2. What is the stated purpose of the test?
Definition of the construct
“Dumbed-down tests” (tests designed for adults or adolescents and adapted for children)
3. Does the name of the test reflect the test content?
Do the names of the individual subtests (where applicable) reflect their content?
4. Form(s) of the items (oral, hands-on, multiple-choice, fill-in, etc.)
Are there problems with this form or content?
Is scoring ambiguous?
Do the items appear to measure what was intended? (e.g., do reading items really test memory?)
5. On what basis are the items in the test arranged?
Subtests
Scales
Spiral Omnibus
Random
Hierarchical
Homogeneity: Changes within subtests
Distinctness
Sexism and other biases
6. Printing, format, and arrangement of test items
Easels and other hardware
Color use: does it help or hurt?
Readability
7. Protocols
Room to write
Answers to examinee
Report forms
Clarity
Ease of use
Do they encourage the use of confidence bands? Do they offer 90% and 95% bands?
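Confidence bands on a protocol rest on the standard error of measurement. A minimal sketch, assuming the classical formula SEm = SD × √(1 − r) and a band centred on the obtained score (some tests centre the band on the estimated true score instead); the function names and the example values (SD 15, reliability .91) are hypothetical and not taken from any particular test:

```python
import math
from statistics import NormalDist


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical SEm = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)


def confidence_band(score: float, sd: float, reliability: float,
                    level: float = 0.95) -> tuple[float, float]:
    """Band of +/- z * SEm centred on the obtained score.

    Assumption: the band is centred on the obtained score; some tests
    centre it on the estimated true score instead.
    """
    # Two-tailed z for the requested level: about 1.645 (90%) or 1.960 (95%).
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * standard_error_of_measurement(sd, reliability)
    return (score - half_width, score + half_width)


if __name__ == "__main__":
    for level in (0.90, 0.95):
        low, high = confidence_band(100, sd=15, reliability=0.91, level=level)
        print(f"{level:.0%} band: {low:.1f} to {high:.1f}")
```

With these hypothetical values the SEm is 4.5 points, so the 95% band extends about 8.8 points on either side of the obtained score and the 90% band about 7.4 points.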
8. Directions for administration
Clarity and adequacy?
Location (manual/protocol/both)
Flexibility
Age appropriateness
9. Directions to the examinee
Clarity and adequacy
Natural or stilted
Boehm’s basic concepts
Alternative directions
10. Time limits and bonuses?
Are they justified?
Are there alternatives?
11. Teaching items?
Scored or unscored
Adequacy of instructions
Can you teach an item more than once?
12. Test materials
Child safety
Ease of use
Durability
13. Scoring
Is scoring easy? objective? subjective? arbitrary? agreed upon?
Are there adequate samples of correct answers?
Rotation errors: how do tests differ in scoring them?
Are printed norms tables also available?
Is a computer program necessary?
Is a computer program provided?
14. Raw score conversions
Interpolation
Which standard scores are reported?
Age scores: why or why not?
Grade scores: why or why not?
Percentiles
Standard scores:
z
T
Stanines
Deviation quotients (M = 100, SD = 15 or 16)
Others
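The standard-score metrics listed above are transformations of one another. The sketch below shows those relationships under the assumption of a normal distribution; published tests read scores from norms tables built on the standardization sample (often normalized), so these formulas illustrate the metrics rather than replace a test's tables. The raw-score mean and SD in the example, and the stanine convention used (half-SD bands, stanine 5 spanning z = -0.25 to +0.25), are assumptions:

```python
import math
from statistics import NormalDist


def z_score(raw: float, mean: float, sd: float) -> float:
    """Distance of a raw score from the norm-group mean, in SD units."""
    return (raw - mean) / sd


def t_score(z: float) -> float:
    """T scores have mean 50 and SD 10."""
    return 50.0 + 10.0 * z


def deviation_quotient(z: float, sd: float = 15.0) -> float:
    """Deviation quotient with mean 100 and SD 15 (or 16 on some tests)."""
    return 100.0 + sd * z


def stanine(z: float) -> int:
    """Stanines: whole numbers 1 to 9, mean 5, SD 2.

    Each stanine is half an SD wide; stanine 5 spans z from -0.25 up to 0.25.
    """
    return max(1, min(9, math.floor(2.0 * z + 5.5)))


def percentile_rank(z: float) -> float:
    """Percentage of a normal distribution falling below z."""
    return 100.0 * NormalDist().cdf(z)


if __name__ == "__main__":
    z = z_score(raw=34, mean=28, sd=6)  # hypothetical norms, giving z = 1.0
    print(t_score(z), deviation_quotient(z), deviation_quotient(z, sd=16),
          stanine(z), round(percentile_rank(z), 1))
```

For z = 1.0 this gives T = 60, a deviation quotient of 115 (116 on an SD-16 scale), stanine 7, and a percentile rank of about 84.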
15. Standardization groups?
Total
Number per year of age
National representation
Breakdowns
16. For what groups is the test designed?
Recent
Relevant
Representational
Age
Grade
Sex
SES
Education
Geographic regions
Urban vs. rural
Ethnicity
Disabilities
17. Reliability coefficients
Internal consistency (split halves)
Alternate forms
Test-retest
practice effect
inflation of r
Length of test
Test-retest interval
SEm
SEest
Inter-rater reliability
18. Validity
For what purpose?
Content
are the questions appropriate?
are there enough questions?
what level of mastery is being measured?
Criterion
concurrent vs. predictive
Construct
Discriminant use vs. divergent use
19. Factor analysis
Exploratory
Confirmatory
Rotations
Different groups
Variance
Common
Error
Specificity
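The variance components listed above can be illustrated for a single subtest whose total variance is scaled to 1.0: error variance is 1 minus the reliability, common variance is the communality, and specificity is the reliable variance left over (reliability minus communality). A minimal sketch, assuming an orthogonal solution (so the communality is the sum of squared loadings) and hypothetical reliability and loading values:

```python
def communality(loadings: list[float]) -> float:
    """Common variance (h^2) as the sum of squared factor loadings.

    Assumes an orthogonal (uncorrelated-factor) solution.
    """
    return sum(loading ** 2 for loading in loadings)


def partition_variance(reliability: float, loadings: list[float]) -> dict[str, float]:
    """Split a subtest's total variance (1.0) into common, specific, and error parts."""
    common = communality(loadings)
    error = 1.0 - reliability            # unreliable variance
    specificity = reliability - common   # reliable variance not shared with other subtests
    return {"common": common, "specificity": specificity, "error": error}


if __name__ == "__main__":
    # Hypothetical subtest: reliability .85, loadings .60 and .30 on two factors.
    print(partition_variance(0.85, [0.60, 0.30]))
```

With oblique rotations the squared loadings no longer sum to the communality, and if the communality exceeds the reliability estimate the computed specificity comes out negative, which is itself worth noticing.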
20. User friendliness
Administrator
Client: Take it yourself
21. References
Antiquity
Authors of bibliography
Relevance to current edition
22. Interpretation
Base rate
Definitions for constructs and shared abilities
Multiple comparison tables (critical values)
Significance vs. abnormality (unusualness vs. importance) (scatter)
Testing the metaphysically handicapped (dead)
What a difference a day makes
Table Games
Floors and ceilings
Descriptive terms
Errors
Cautions
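For multiple comparison tables and the significance question listed above, the usual starting point is the standard error of the difference between two scores. A minimal sketch, assuming both scores share one scale and using a Bonferroni adjustment when several comparisons are made (manuals may use other corrections); the reliabilities in the example are hypothetical. It addresses significance only: whether a difference is unusual still requires base-rate data from the standardization sample.

```python
import math
from statistics import NormalDist


def critical_difference(sd: float, r_a: float, r_b: float,
                        alpha: float = 0.05, comparisons: int = 1) -> float:
    """Smallest difference between two scores that is statistically significant.

    Both scores are assumed to be on the same scale (same SD). The standard
    error of the difference is sqrt(SEm_a**2 + SEm_b**2) = SD * sqrt(2 - r_a - r_b).
    A Bonferroni correction divides alpha by the number of comparisons.
    """
    se_diff = sd * math.sqrt(2.0 - r_a - r_b)
    z = NormalDist().inv_cdf(1.0 - (alpha / comparisons) / 2.0)
    return z * se_diff


if __name__ == "__main__":
    # Hypothetical subtests on a scaled-score metric (mean 10, SD 3).
    print(round(critical_difference(3, 0.85, 0.80), 2))
    print(round(critical_difference(3, 0.85, 0.80, comparisons=10), 2))
```

More comparisons and less reliable scores both widen the difference needed for significance, which is why a table built for one pairwise comparison can overstate how many differences in a profile are meaningful.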
Content on these pages is copyrighted by Dumont/Willis © (2001) unless otherwise noted.