Thoughts on Teacher Developed Tests

© Robert A. Buckmaster 2022

When teachers think about testing they often have in mind such tests as Cambridge First or Advanced (the old FCE and CAE), or IELTS, or coursebook tests.

These are not the best models to follow when you want to create your own tests, though. They are either industrial scale tests (First, Advanced or IELTS) or imitations of industrial scale tests (coursebook tests).

Industrial scale tests are designed to be given worldwide to large numbers of candidates and to be marked as efficiently (and as cheaply) as possible. Test security and standardization of test taker response – through multiple choice or cloze type items, for example – are key here. The use of live raters (expensive!) for the writing and speaking parts has even been eliminated in Pearson tests, for example.

The only real direct testing of performance is in the writing and speaking parts. The reading paper, listening paper, and use of English paper (if there is one) are indirect and partial tests of candidate performance, with a large element of correlational inference, i.e. if candidates can get this answer correct then they are probably level X, and we will combine this with the evidence from the other questions to give our opinion of their ability. Given enough questions we will be able to infer their level and performance ability (A, B or C) at this level. The partial picture gained from discrete item testing allows the test board to infer the level of the candidates. It is only an inference from partial data, though, because getting more information is more time consuming and more costly.

So, the data gained from such tests is only a partial picture, but this is not reason enough to reject such tests as the model for teacher developed tests. There are two other reasons why such tests are a bad model for teachers. The first is that there are complex statistical procedures which need to be undertaken before you can say with any degree of confidence that your test items are valid and therefore that your inferences about candidate performance are reasonable (and they still might be wrong). Most teachers are not trained to carry out such procedures and do not have the statistical tools anyway. The second reason is that it is very difficult to write good multiple choice items, for example, and the items then need to be trialed and statistically analyzed. This is all beyond the capabilities of most teachers. And if you were to learn to do it, then you’d be an industrial tester. Teachers do not need to do this. There is another way.

The other way is to play to teachers’ strengths. Instead of spending time and effort in an attempt to become industrial testers, we should use our existing skills of setting up communicative tasks and evaluating student performance on these tasks. This is what you do all the time when monitoring tasks. If you ask the students to do a communicative task in pairs and then monitor them, and make a note of errors, then you are acting as a communicative tester-evaluator. And this is what we want.

You should think of communicative tasks, appropriate to the students’ level, which require them to use English to get something done. Ask the students to do these tasks and then evaluate their success. That’s it. That’s all you need to do in terms of testing as a teacher.

The tasks might require some reading or listening first, followed by speaking and/or writing. The success of the reading or listening element will be clear from the speaking or writing part. These will be integrated tests. Or they might just be a speaking or writing task.

Either way, the task should be modeled on a real-life task which the students should be able to do. That way you do not need to do any pre-testing or statistical analysis of items or make inferences. You will have clear, direct evidence of the students’ actual ability to do real-life tasks in English.

Do not aspire to be an industrial tester, unless you want to take on Cambridge or Pearson. Embrace your teaching skills and become a communicative tester-evaluator.