Evaluation and improvement
A test evaluation helps assess and improve the quality of your test.
When grading and assessing tests, you gain insights into which questions students found difficult and on which assessment criteria they scored poorly. In essence, you’re already evaluating the test and your teaching. Below are more structured ways to do this.
How do I evaluate and improve my tests?
- Analyse common mistakes to identify the areas students struggle with, and adjust your teaching, test questions, or assessment criteria in a future course.
- Ask students for feedback, for instance, through a short (online) survey or by adding a few evaluation questions at the end of the test. Ask if the questions were clear, whether the test was a good reflection of the course content, and request suggestions for improvement.
- Peer assessment can also serve as an evaluation tool. Let students assess each other using your grading rubric. This helps you test if the rubric is clear and useful. Together with students, you can refine the model, giving them a sense of ownership over the assessment process.
- Based on the test results, you can determine the pass rate and examine the average score of the top 5% of students. If this group didn’t answer nearly all the questions correctly, the test may have been too difficult. In such cases, you could consider adjusting the cutoff score based on the top 5% of results, as in the sketch after this list.
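As a rough illustration of that last point, here is a minimal Python sketch of one possible rescaling, loosely in the spirit of the Cohen method. The score values, the 55% intended cutoff, and the use of the 95th percentile as the "top 5%" level are illustrative assumptions, not a prescribed procedure.

```python
import numpy as np

# Hypothetical total scores (as percentage of the maximum) for one exam
totals = np.array([45, 52, 58, 60, 63, 67, 70, 72, 75, 78,
                   80, 82, 84, 86, 88])

# Level reached by the top 5% of students (here: the 95th percentile)
top_level = np.percentile(totals, 95)

# If the best students reach ~87% rather than ~100%, the test was probably
# too difficult; one option is to rescale the intended cutoff accordingly.
intended_cutoff = 55.0  # illustrative original cutoff, in percent
adjusted_cutoff = intended_cutoff * top_level / 100
print(round(top_level, 1), round(adjusted_cutoff, 1))
```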
When evaluating different test questions or components, consider:
- The discriminatory power of a question or section: if students with high overall grades score poorly on a specific question or section, there may be an issue with that part of the test.
- Sections that are answered well may be too easy. You could consider giving them less weight next year or testing at a higher level. Sections that are poorly answered may be too difficult or not sufficiently covered in the teaching. However, you might have included these difficult sections intentionally to challenge the best students.
Indicators for test quality
In tests with open and closed questions for large student groups, you can evaluate the quality of the test or individual questions using numerical indicators. Commonly used indicators include:
Test reliability (Cronbach's α)
Cronbach’s α measures the internal consistency of the questions and estimates the test's reliability: how consistently students would score on two equivalent versions of the test. It is computed from the number of items k, the variance of each item, and the variance of the total score: α = k/(k − 1) × (1 − Σ item variances / total-score variance).
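To make the computation concrete, here is a minimal Python sketch, assuming a hypothetical matrix of dichotomous scores (rows = students, columns = questions); in practice a system like Ans reports this value for you.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a students x items score matrix."""
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 5 students, 4 questions (1 = correct, 0 = incorrect)
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
])
print(round(cronbach_alpha(scores), 2))  # ~0.70 for this toy data
```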
Question difficulty (p-value)
The p-value represents the percentage of students who correctly answered a particular question or met the criterion.
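With the same kind of hypothetical score matrix as in the sketch above, the p-values are simply the column means; a very low p-value flags a question most students got wrong.

```python
import numpy as np

# Hypothetical score matrix: rows = students, columns = questions (1 = correct)
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
])

# p-value per question: the proportion of students who answered it correctly
p_values = scores.mean(axis=0)
print(p_values)  # [0.8 0.6 0.6 0.4] -> the last question was the hardest
```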
Discriminatory power (Rit-value)
In addition to analysing the p-value, the discriminatory power of each item can be measured using the Rit value: the correlation between the score on the item and the total test score. A question has high discriminatory power if students with high final scores tend to answer it correctly, while students with low final scores tend to answer it incorrectly.
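Again as a minimal sketch on the same hypothetical score matrix, the Rit value per question can be computed as a Pearson correlation:

```python
import numpy as np

# Hypothetical score matrix: rows = students, columns = questions (1 = correct)
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
])
totals = scores.sum(axis=1)  # total score per student

# Rit per question: correlation between the item score and the total score
rit = np.array([np.corrcoef(scores[:, q], totals)[0, 1]
                for q in range(scores.shape[1])])
print(rit.round(2))  # values near zero (or negative) flag weak items
```

A Rit close to zero or below suggests the question does not separate strong from weak students and deserves a closer look.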
Digital test systems like Ans often generate a statistical analysis of the exam automatically. LLInC can help you interpret this data.
See also:
- More on numerical or psychometric test analysis: Appendix IV of the "Tips for Tests" brochure and Chapter 3 of "Quality Assurance in Testing" (in Dutch).
- Testing and Test Analysis (De Gruijter, 2008, in Dutch).