Before deciding to construct a test, one needs to know what information is required, how quickly it is needed, and the likely actions to be taken based on the test results. The crucial question is: What information is needed about student achievement? A second important question is: Can we afford the resources needed to gather this information? According to Alderson in Jabu (2008), the process of test construction should cover test specification; item writing and moderation or editing; pre-testing or trialing and analysis; validation; posttest reports; and developing and improving tests.
1. Test specification
A test specification provides the official statement about what the test tests and how it tests it (Alderson et al., 1995:9). The specifications are a blueprint to be followed by test and item writers, and they are also essential in establishing the test's construct validity. The development of test specifications is, therefore, a central and crucial part of the test construction and evaluation process.
Test specifications vary according to their uses. The specification must, however, provide appropriate information about what a test should cover. Test specifications should include all or some of the following: the test's purpose; description of the test taker; test level; construct; description of a suitable language course or textbook; number of sections/papers; time allowed for each section/paper; target language situation; text types; text length; language skills to be tested; test tasks; test method; rubrics; criteria for marking; description of typical performance at each level; description of what candidates at each level can do in the real world; and sample papers and samples of students' performance on tasks.
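As a purely illustrative sketch, a specification of this kind can be held in a structured form so that item writers and reviewers work from the same blueprint. The field names below are hypothetical, chosen to mirror the checklist above; they are not part of Alderson's framework or any published standard.

```python
# A minimal, hypothetical test specification record; the field names are
# illustrative, mirroring the checklist above, not a published standard.
reading_spec = {
    "purpose": "placement",
    "test_taker": "adult EFL learners",
    "level": "intermediate",
    "construct": "reading comprehension",
    "sections": 2,
    "time_per_section_minutes": 30,
    "text_types": ["news article", "short narrative"],
    "text_length_words": (250, 400),
    "skills_tested": ["skimming", "scanning", "inference"],
    "test_method": "multiple choice",
    "marking_criteria": "one mark per correct option",
}

def check_spec(spec, required=("purpose", "construct", "test_method")):
    """Report which essential fields a draft specification still lacks."""
    return [field for field in required if field not in spec]

missing = check_spec(reading_spec)  # empty list: the draft covers the essentials
```

A simple completeness check like `check_spec` reflects the point made above: the specification is a blueprint, so gaps in it surface before item writing begins rather than during moderation.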
2. Test construction and moderation
Test construction, which is commonly known as item writing, is the next step in test development after the test specifications have been formulated. Item writing must be based on the test specifications, although it is also possible to look at past papers. Item writing is the preparation of assessment tasks which can reveal the knowledge and skill of students when their responses to these tasks are inspected. Tasks which confuse, which do not engage the students, or which offend always obscure important evidence, either by failing to gather appropriate information or by distracting the student from the intended task.
In test editing or moderation, each item and the test as a whole are considered for the degree of match with the test specifications, likely level of difficulty, possible unforeseen problems, ambiguities in the wording of items and instructions, problems of layout, match between stems and choices, and the overall balance of the subtest or paper.
3. Pre-testing or trialing
We not only need to know how difficult the test items are, but we also need to know whether they work. This may mean that an item which is intended to test a particular structure actually does so, or it may mean that the item succeeds in distinguishing between students at different levels, so that weaker students score lower than stronger ones. It is not possible to predict with certainty whether items will work without trying them out.
The number of students on whom a test should be trialed depends on the importance and type of the test and on the availability of suitable students. The only guiding rule is the more the better, since the more students there are, the less effect chance will have on the results.
4. Test analysis
The test items that have been tried out must be analyzed to see whether they work. This analysis will show the extent to which each item works. For objective test items, two measures are traditionally calculated: the facility value and the discrimination index. For subjectively marked tests, such as summaries, essays, and oral interviews, item analysis is inappropriate; nevertheless, these tests still need to be tried out to see whether the items elicit the intended sample of language, whether the marking system, which should have been drafted during the item-writing stage, is usable, and whether the examiners are able to mark consistently.
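The two traditional item statistics can be computed directly from trial data. The sketch below is a minimal illustration, assuming dichotomously scored items (1 = correct, 0 = wrong) and the common convention of comparing an upper and a lower scoring group (here the top and bottom thirds of candidates by total score); the function names are mine, not from the source.

```python
def facility_value(responses):
    """Proportion of test takers who answered the item correctly (0.0 to 1.0).

    `responses` is a list of 1s and 0s, one per candidate, for a single item.
    """
    return sum(responses) / len(responses)

def discrimination_index(responses, total_scores, group_fraction=1 / 3):
    """Difference between the item's facility value among high scorers and
    among low scorers on the whole test; values near 1.0 mean the item
    separates stronger from weaker candidates well.
    """
    # Order this item's responses by each candidate's total test score.
    ranked = [r for _, r in sorted(zip(total_scores, responses),
                                   key=lambda pair: pair[0])]
    n = max(1, int(len(ranked) * group_fraction))  # size of each group
    low, high = ranked[:n], ranked[-n:]
    return sum(high) / n - sum(low) / n

# Trial data for one item across six candidates (illustrative numbers).
item_responses = [1, 1, 0, 1, 0, 0]
totals = [10, 9, 2, 8, 3, 1]

fv = facility_value(item_responses)          # 0.5: a mid-difficulty item
di = discrimination_index(item_responses, totals)  # 1.0: strong discrimination
```

Here only the three highest-scoring candidates got the item right, so it discriminates perfectly in this tiny sample; in practice much larger trial groups are needed, as the section above notes.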
5. Validation
The most important question of all in language testing is validity. Test validity can be interpreted as usefulness for a purpose. Since purposes vary, it is important to specify which purpose applies when making a claim about validity. Content validity refers to the extent to which the test reflects the content represented in curriculum statements (and the skills implied by that content). A test with high content validity would provide a close match with the intentions of the curriculum, as judged by curriculum experts and teachers.
The validation process involves the terms internal and external validity. The distinction is that internal validity relates to studies of the perceived content of the test and its perceived effect, while external validity relates to studies comparing students' test scores with measures of their ability gleaned from outside the test.
6. Public and user trial
The tests that have been constructed, tried out, and analyzed should also be evaluated by the public, especially the future users of the tests. The tests are presented to the future users, who analyze them, give comments or suggestions for their improvement, and approve them.