Senzoku Ronso, March 2001
Maiko Hata maihata@hotmail.com http://www.geocities.com/maihata/Newindex.html
Thomas Delaney thomas_delaney@hotmail.com
The Society for Testing English Proficiency test (the STEP test), known in Japan as Eiken, was first administered in 1963. It is published by the Society for Testing English Proficiency, Inc. (a non-profit foundation), which is fully authorized by the Japanese Ministry of Education, Science, and Culture. It was in 1968 that the Ministry officially approved the test and began publicly promoting it as an important element of the education process. The test, although fully designed by STEP, Inc., is composed of items that have been (and continue to be) solicited from junior high school, high school, and college teachers. Additionally, business people who use English in their work and native speakers of English continue to be consulted. (However, to what degree they play a part in the construction of the tests is not clear.) Since its inception the test has been administered to approximately forty million people.
In its early years, the STEP test consisted of only three grades (as the levels are termed): first, second, and third. A fourth grade was added in 1966, pre-first and fifth grades in 1987, and a pre-second grade in 1994, for a total of seven levels today. In 1997, the tests were slated for revision and the format was changed slightly with regard to the distribution of items. STEP, Inc. also planned an additional test that takes the use of gestures into account. For the sake of this discussion, however, we will address the pre-1997 version of the exam, since data for that version are more readily available.
The purpose of the test, as promoted by STEP, Inc., is to measure the four skills—reading, writing, listening, and speaking—in non-native speakers of English living in Japan. Accordingly, the examination is claimed to judge one’s ability to communicate in English. The test itself has a variety of purposes, including determination of English level for university purposes (a high level on the examination will entitle the student to a waiver of English requirements and extra units) and job acquisition. In a questionnaire administered by STEP, Inc., 80% of the colleges and universities surveyed believed that it would be advantageous for the students to pass at least the pre-first or second grades of the test for the purposes of job hunting. 70% of the companies questioned said that they consider the STEP to be an indicator of English proficiency. Although the test is not used for international purposes it could also serve as a predictor of how the same examinee might score on the TOEIC or TOEFL. For example, passing the first grade of the STEP test (the highest level) indicates that the same student would score at least 830 on the TOEIC and at least 585 on the TOEFL (see appendix A).
One of the information booklets provided to us by STEP, Inc. claims that the STEP test has had great success in Japanese society and is of great importance. Social prestige is even attributed to passing the various levels of the exam. Further, STEP, Inc. claims to have nationwide trust. If this at first sounds like hyperbole, it is anything but. Tremendous importance is indeed placed on the STEP test, and future success can sometimes hinge upon the test results. When asked, several Japanese students at the Monterey Institute of International Studies, a graduate school in California, expounded upon the importance they themselves had placed on the exams and the pressure they had felt to succeed. Americans who have lived and taught in Japan agreed, collectively stating that the STEP test is not something to be taken lightly. Since this test is the major instrument for evaluating English proficiency in Japan, its influence is indeed massive.
The STEP test is administered three times a year at 400 locations in 250 cities in Japan. These test-taking locations are referred to as “main places,” and STEP, Inc. sends people to administer the tests at these sites. All of the second stage segments (the interviews) are administered only at the main places. The second stage for the first grade is given in only eight major cities, because there is comparatively less demand to take the highest level test. “Semi-places,” on the other hand, are locations such as schools or companies which meet STEP, Inc.’s requirement of a minimum of thirty examinees for on-site self-administered tests. (A school or company with fewer than thirty examinees must send them to a main place.) At semi-places, teachers or employers administer the test following the instructions included in the test packet. The advantage for schools or companies of giving the test at a semi-place is that they can obtain the actual examinee scores. Scores are not available when the test is taken at a main place; instead, examinees who fail receive only an A, B, or C grade (A being the closest to the passing line). Another advantage for the schools and companies is that they can use their own equipment and people, thereby providing the test takers with a familiar, lower stress environment. The advantage of semi-places for STEP, Inc. is that it avoids the expense of providing additional proctors.
The STEP test instructions given to the test administrators at semi-places contain specific regulations that they are strongly advised to follow. These include giving the test on the determined date and time (so that test information cannot be forwarded to examinees who have not yet taken the test), taking extreme care to safeguard the actual test materials prior to the day of testing, checking the functionality and sound quality of the tape recorder before its actual use, and familiarizing the proctors with the administration of the exam prior to the day of testing. Basically, the test administrator has little more to do than control the physical conditions. However, it is still possible for the test administrator to see the test prior to the actual day of administration and pass this information along to the test takers. This could happen where schools or companies are giving the test to their own students or employees. Such a scenario is not impossible, since the prestige attached to passing the test (its face validity) sometimes outweighs the value placed on the actual ability being measured.
The test itself is administered in such a way that the proctors give some general instructions about the length of the test and how to mark the answer sheet, but they do not read the instructions for individual activities to the examinees. Rather, the examinee proceeds at his or her own pace, reading the instructions for each section as it is begun. The answers themselves are marked on forms similar to the Scantron forms used for testing in the United States.
Of the seven grades of the STEP test, the highest five actually consist of two stages (appendix B). The first stage is fairly uniform at each level, evaluating vocabulary, idioms, grammar, usage, composition, reading comprehension, and listening comprehension. An overall score is computed which determines passing or failing. All of the test items are of the objective type. The only exception to this uniform construction is that the first grade exam also includes a writing section. (For this paper we looked at the pre-first grade exam.) The first stage and second stage examinations are not taken on the same day; in fact, a period of one month intervenes. The examinee is not allowed to take the second stage of the exam unless the first stage has been passed. Conversely, if the examinee passes the first stage but fails the second, the second stage can be retaken without having to redo the first. Even if only the second stage is taken, the examination price remains the same.
The second stage of the exam concentrates primarily on the speaking ability of the examinees. Although we do not evaluate this segment of the test, it is important to briefly explain its make-up. For the pre-second, second, and third grades, the second stage is an interview-style test consisting of reading a passage aloud and responding to questions posed. In addition, the examinee’s overall understanding is tested by requiring him or her to give a verbal summary. The pre-first grade second stage exam is also an interview-style test, although in this case the examinee is presented with a four-part picture sequence and is expected to answer spoken questions about the story. Finally, the second stage of the first grade exam is in a panel format in which the examinee is graded by two examiners (one a native speaker) in the areas of content (persuasiveness, logicality, and comprehensiveness) and delivery (pronunciation, vocabulary, grammar, and fluency). The second stage of the first grade exam also has a listening comprehension section (this is in addition to the listening comprehension from the first stage).
We will now take a more detailed look at the components of the first stage exam (in this case, the pre-first grade exam specifically). The test consists of four multiple-choice reading sections and one multiple-choice listening section, each section made up of one or two parts. The reading sections, with a total of ninety questions, are self-paced: the examinee has ninety minutes to answer the questions. The listening section is paced according to the speed of the cassette and takes twenty minutes. Section one consists of thirty completion items which test vocabulary, idioms, and grammar. Examinees are required to select the most appropriate answer (among four) to complete the sentence. Section two consists of ten items: the first five involve ordering words in a sentence; the next five involve ordering sentences in a paragraph (see appendix C for example items from sections one and two). Section three consists of an excerpt from a text with words and short phrases missing. Examinees select the answers which best create a fluid text. Section four includes two excerpts, although in this case the examinee is required to choose the appropriate answer to a question based on the theme of the passage. With regard to the listening section, in the first exercise the test takers hear a sentence or a series of sentences and are required to locate the picture (among four) which best illustrates the information given on the cassette. In the second exercise the examinees again hear a sentence or series of sentences, although here they are expected to choose the most appropriate of the four choices given.
Norm Versus Criterion Referencing
The lack of clear-cut information about the scoring procedure makes this a difficult issue to address concretely. The test itself is very much geared towards testing the information the examinees should have learned in high school or at a university. In this regard, it is criterion referenced. However, because the passing line is given only in approximations (e.g., approximately 70% is needed to pass the pre-first grade exam), one can deduce that the score needed varies from test to test depending upon fluctuations in difficulty. Although it is not known how this passing line is adjusted, one can only infer that it is the result of some sort of norm-referencing process.
Even though “analysis of the level of difficulty and discriminatory power of multiple choice test items… and factor analyses to enable the appropriate measurement of various abilities” are conducted by STEP, concrete information regarding validity and reliability is not available to the public. We were given no clear reason by STEP, Inc. why this information was unavailable, merely that it was not their policy to disclose it. As seen above, it was stated that factor analysis was performed on the test items. However, clarification beyond this was not given.
In order to further our understanding of the STEP test, we decided to administer the test ourselves. Our two subjects were native speakers of Japanese and both second semester students in the Master of Arts in Teaching English to Speakers of Other Languages (TESOL) program at the Monterey Institute of International Studies. Since admission to this program requires a 600 score on the TOEFL test and both had been in the United States for around four months, it seems safe to call our subjects advanced speakers of English. Both subjects had taken the pre-first grade of the STEP test before and one had passed and one had failed. According to the comparison of levels of standardized tests (appendix A) a student scoring 600 on the TOEFL should pass the pre-first grade without difficulty (a passing score being merely 529 on the TOEFL). Thus, we were led to believe that the examinees would have few difficulties.
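The expectation drawn from appendix A can be made explicit with a small lookup. The following Python sketch is illustrative only: the function name is hypothetical, and treating the lower TOEFL bound of each band as a cut-off is our simplifying assumption, since the published bands share their end points.

```python
# Lower TOEFL bound of each STEP band, transcribed from appendix A.
# Using the lower bound as a cut-off is an assumption made for illustration.
TOEFL_LOWER_BOUNDS = [
    ("1st Grade", 585),
    ("Pre-1st Grade", 529),
    ("2nd Grade", 486),
    ("Pre-2nd Grade", 453),
]


def highest_step_band(toefl_score: int) -> str:
    """Return the highest STEP band whose lower TOEFL bound the score meets."""
    for grade, lower_bound in TOEFL_LOWER_BOUNDS:
        if toefl_score >= lower_bound:
            return grade
    # Appendix A lists the lowest band only as "X-453".
    return "3rd Grade"


# A TOEFL score of 600 falls in the top band of the comparison table.
print(highest_step_band(600))
```

Under this reading, a TOEFL score of 600 sits a full band above the pre-first grade, which is why few difficulties were anticipated.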
Seeing that this is a rather lengthy exam, we decided to ask the volunteers to work on only three parts of it. We chose twenty questions from the first section, which is sentence completion based on vocabulary, idioms, and grammar, and ten questions from the second section: five requiring word ordering in a sentence and five requiring sentence ordering in a paragraph.
Examinees were seated in a classroom, given the test booklets, and asked to begin after being given simple reminders about not talking during the test. We allotted forty-five minutes for the examinees to finish, although they each finished in about thirty minutes. Grading the exam revealed that each subject missed seven out of thirty questions, a surprisingly high number of errors considering the level of the test. Although at this percentage they would still pass (if only barely), we expected fewer errors. Also interesting was the fact that all but one of the questions missed occurred in the first section. The two examinees had an overlap of three missed questions.
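The arithmetic behind this comparison is simple. The short Python sketch below is for illustration only: the function name is hypothetical, and the 70% figure is the approximate first-stage passing line for the pre-first grade given in appendix B. Because our thirty items were only a subset of the full exam, the comparison with the passing line is a rough one.

```python
# Approximate first-stage passing line for the pre-first grade (appendix B).
APPROX_PASSING_LINE = 70.0


def percent_correct(missed: int, total: int) -> float:
    """Return the percentage of items answered correctly."""
    return 100.0 * (total - missed) / total


# Each subject missed seven of the thirty items we administered.
score = percent_correct(missed=7, total=30)
margin = score - APPROX_PASSING_LINE
print(f"{score:.1f}% correct, {margin:.1f} points above the approximate line")
```

Each subject therefore answered roughly 77% of the items correctly, only a few percentage points above the approximate passing line.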
A possible explanation for this result can be found in the examinees’ comments. After they took the exam, we asked them to briefly describe their opinions of it. Both of them agreed that the first section was “difficult” and “pointless.” One of the problems, they said, is that it is not English in context; it is merely one or two isolated sentences. They also said that the information being tested was too specific. Although they did not doubt that it was testing knowledge of vocabulary, it certainly was not testing ability to communicate. Their final comment was that it was really boring. However, the fact that they were only volunteers might explain their lack of interest in the material.
One of the weaknesses of this test is that of appropriateness. Because the test has very high face validity, people use it for many purposes. “Communicative proficiency” is the ability that STEP, Inc. claims to be measuring. However, the test seems to measure mainly the grammatical knowledge that university entrance exams measure. Originally, this test was established “to popularize and improve the level of practical English in Japan.” STEP, Inc. now claims it measures the ability to communicate. Nonetheless, since the face validity of this test is extremely high, people throughout Japan treat passing the test as a goal in itself instead of as an indicator of English proficiency. Prospective employers also use this test as an indicator of whether applicants actually studied while attending colleges or universities. (In Japan it is commonly acknowledged that university students do not study. Passing a pre-first or first grade exam would indicate that the individual had, on the contrary, studied, or had at least studied English. This is because the STEP exam tests material that is normally taught in college level courses.)
STEP, Inc. claims that they test the “four skills necessary for communication: Reading, writing, listening, and speaking” in order to measure the ability to communicate. However, this testing of the four skills happens only when the test taker passes the first stage written test, because speaking is tested only in the second stage interview test. (As mentioned earlier, the examinee must pass the first stage in order to move on to the second.) This means that test takers who are more proficient in speaking than in reading, writing, and listening are at a disadvantage. And even though STEP, Inc. claims that the ability measured is the ability to communicate, in the vocabulary section, the idiom section, and sometimes even the listening section, each question presents only a single sentence. This is certainly not the best way to measure the ability to communicate, since few communication skills are needed to solve these problems. For these reasons, we believe that the STEP test is generally lacking in construct and content validity. The exception to this would be an individual taking both stages of the first grade exam. (Having taken both stages of the first grade exam indicates that the individual has been tested in all four skills. None of the other levels can claim this, even when both stages have been taken.)
Another weakness is that of predictive validity. The proficiency STEP, Inc. claims for each level does not reflect the real level; the actual level required appears to be much higher. For example, they make the following claim about the abilities of an individual passing the pre-first grade test:
Successful examinees are able to understand and use English well enough to listen to speeches and express their opinions to others in public. (Junior-college graduate level or the sophomore year at a four-year college; appropriate for junior college students and adults in Japan).
However, the two volunteers who took sections of this same exam had some difficulty, scoring just above the passing line. This is surprising considering that their TOEFL scores would suggest a stronger performance. On the other hand, twenty-four out of the forty-three companies which answered a questionnaire given by STEP, Inc. indicated that they give some advantage to students with second grade STEP certificates. Seventeen out of forty-three indicated that they give some advantage to students with pre-first or first grade certificates.
STEP, Inc. claims that the test uses authentic materials. This is partially true, such as in the reading comprehension and vocabulary sections. The reading passages are taken from newspapers and magazines. Vocabulary and usage items are also taken from newspapers, radio and TV broadcasts, letters, and signs. However, most of the items for the vocabulary and idioms are given only in individual sentences with very little context. Also, the pictures used in the listening and interview tests (second stage) are very artificial (cartoons designed especially for the test).
Another weakness is that too much confidence seems to be placed in the STEP test. There is not much concern about reliability or validity even though this is the most commonly administered English proficiency test in Japan.
One of the strengths of the STEP test is that it has high face validity. As stated earlier, this could also be a weakness. However, high face validity confers an aura of privilege on those who do pass. For example, classes might be waived, additional units granted, and higher pay given (by companies). These advantages exist only because of the well-established status given to the STEP test.
The various levels of the STEP test allow test takers to choose the level that they feel best matches their proficiency. Also, because examinees can take two STEP tests of contiguous levels (e.g., pre-first and second grade, or second and pre-second grade) on the same day at the same location, they can attempt a wider range of levels. This can be useful when examinees are unsure of their level.
Although this test is administered only three times a year, it is highly available. Application forms for the test are available at 2700 bookstores for free, or by calling STEP, Inc. To apply, applicants go to the bookstores and pay the fee. They can also use the postal money order system and send it with the application form to STEP, Inc. directly. Test takers can choose the place to take the test from approximately 400 places in 250 cities. Also, as was previously mentioned, if the test takers fail in the second stage interview test, they can attempt the second stage again during the following administration without having to take the first stage written test. This is convenient.
Finally, the fee to take the exam is actually relatively low.
It is obvious that the STEP test does not measure the four skills STEP, Inc. claims, since test takers are given no opportunity to demonstrate their writing (with the exception of the first grade test). Nor are test takers given the opportunity to speak until they pass the first stage test. The STEP test is not necessarily communicative and is often too specific. We do not recommend this test for the purpose STEP, Inc. claims, namely the measurement of communicative ability. If the opportunity to write were provided in all grades, STEP would measure three skills: writing, reading, and listening. By providing the opportunity to speak in the interview test regardless of performance on the written test, it could measure all four skills.
Despite all the weaknesses we have pointed out, the STEP test is a well-organized, carefully structured exam. The high face validity can be appreciated. We should keep in mind that this test is not for ESL students, but for EFL students in Japan. This test can be a good indicator of the English ability students in Japan learn in schools, and most of the time that is exactly what universities and companies want to know. For this purpose, the STEP test can be usefully administered.
References
The Society for Testing English Proficiency, Inc. (1996). Information Pamphlets. Tokyo, Japan: STEP, Inc.
Appendix A
Comparison of Levels of Standardized Tests
STEP | TOEIC | TOEFL | BEST
1st Grade | 830-860 | 585-600 | 140
Pre-1st Grade | 670-830 | 529-585 | 121-140
2nd Grade | 545-670 | 486-529 | 108-121
Pre-2nd Grade | 450-545 | 453-486 | 95-108
3rd Grade | X-450 | X-453 | X-95
Appendix B
Passing Line
Grade | First-stage | Second-stage
First Grade | Approximately 70% | 60%
Pre-First Grade | Approximately 70% | 60%
Second Grade | Approximately 65% | 60%
Pre-Second Grade | Approximately 65% | 60%
Third Grade | Approximately 65% | 60%
Fourth Grade | Approximately 60% | ----
Fifth Grade | Approximately 60% | ----
Appendix C
These are some of the items that we gave to our test takers. We gave twenty items from section 1 and five from each of sections 2 [A] and 2 [B].
Section 1.
In this section, choose the best answer from among the four alternatives for each blank, and mark the answer sheet accordingly.
(1) After the storm destroyed the tent, the circus hurriedly erected a ( ) on the village green.
1. make-believe 2. makeup 3. makeshift 4. made-to-measure
(2) Tom is a great gambler, and he made a fortune by ( ) in stocks.
1. speculating 2. meditating 3. discriminating 4. devising
(3) I need a quiet room, a secluded corner, or some similar ( ) where I can occasionally be alone and relax.
1. secrecy 2. haven 3. privacy 4. parlor
Section 2 [A].
To complete each conversation, put numbers 1 to 5 into the best order. Then mark the choices on the answer sheet that show the numbers for the 2nd and 4th words or phrases.
A: Mmm… I’ve never tasted anything this good!
B: Yeah. But I suppose ( 1. if we 2. wouldn’t 3. taste so good 4. it 5. hadn’t been hiking ) for six hours.
Section 2 [B].
To make a paragraph, put sentences A to D into the best order. Then mark the choice on the answer sheet that shows this sentence order.
A: I admire this simple display of public spirit which has become such an integral part of her daily routine.
B: Every morning, while cycling to the office, I see an elderly woman on the bank of the Tama River.
C: Her quick movement suggests that she is completely accustomed to this task.
D: I have noticed several times that she picks up empty soft drink cans on the grassy river bank.
1. D-B-C-A 2. D-C-B-A 3. B-C-D-A 4. B-D-C-A