European Journal of Educational Research

Developing Assessment Instrument Using Polytomous Response in Mathematics

Abstract: This research is developmental research aimed at developing a good mathematical test instrument using polytomous responses based on classical and modern theories. The research design uses the Plomp model, which consists of five stages: (1) preliminary investigation, (2) design, (3) realization/construction, (4) revision, and (5) implementation (testing). The study was conducted in three vocational schools in Lampung Province, Indonesia, and involved 413 students, consisting of 191 male and 222 female students. The data were collected through a questionnaire and a test. The questionnaire was used to identify the assessment instruments currently employed by teachers and to obtain validation from experts in mathematics and educational evaluation. The test was an open polytomous response test consisting of 40 items. The data were analyzed using both classical and modern theories. The results show that (1) the open polytomous response test falls into the good category according to both classical and modern theory, although the discrimination power of several items under classical theory needs revision, and (2) the assessment instrument using the open multiple-choice polytomous response can provide information on the actual competence of students. This is evidenced by the consistency between the analysis results obtained from classical and modern theory and the students' arguments when giving reasons for their choices. Therefore, the open polytomous response test can be used as an alternative for learning assessment.


Introduction
Assessment is an important activity that needs to be administered by teachers in schools. The conventional paradigm often interprets assessment as a way to find out students' learning outcomes as a whole, so that assessment is positioned as an activity separate from the learning process (Syaifuddin, 2020). In the current paradigm, assessment in schools is divided into three types, namely assessment as learning, assessment for learning, and assessment of learning (Wulan, 2018). The three types of assessment aim to provide recognition of the achievement of student learning outcomes after the learning process (Earl, 2013). The assessment pyramid is shown in Figure 1.

Figure 1. Assessment Pyramid
Assessment can be administered through a test. A test is a tool or procedure used to find out or measure students' abilities in particular areas under specific rules (Arikunto, 2012). Tests come in two types, namely multiple-choice and essay. A multiple-choice test is a form of assessment in which each item provides options, one of which is the correct answer. An essay test is a form of assessment that requires answers in sentences or words. Each type of test has its own strengths and weaknesses.

In the 1980s, the first test focused on and developed by experts was the closed polytomous response test, also known as the two-tier test (Treagust, 1988). This test consists of two levels: at the first level, students choose answers on the multiple-choice test; at the second level, they choose reasons based on their answer choices at the first level (Chandrasegaran et al., 2007). Several studies on the closed polytomous response test have been carried out, such as a test on mathematical ability in middle school (Hilton et al., 2013; Rovita et al., 2020), a test on calculus material (Khiyarunnisa & Retnawati, 2018), a test on higher-order mathematical thinking skills (Sundari et al., 2021), and a test on mathematical connection material (Lestari et al., 2021). Although the closed polytomous response test has been widely developed, researchers have found weaknesses in it: students' misconceptions or actual competence cannot be identified in detail (Antara et al., 2019), the test instrument is difficult to construct (Khusnah, 2019), and students can still guess the answers (Myanda et al., 2020).
However, the closed polytomous response test also has strengths: the consistency of students' answer errors is easily observed (Treagust, 1988), and the correspondence between a student's answer choice and the stated reason is easy to identify (Diani et al., 2019).
To reduce the weaknesses of the closed polytomous response test, experts modified it into an open polytomous response test. The open polytomous response test is a form of multiple-choice test that provides space to write arguments for the answer choices (Retnawati, 2014). Studies on the open polytomous response test have included tests on calculus material in universities (Yang et al., 2017) and tests on mathematics material in senior high schools (Ayanwale, 2021) and junior high schools (Falani et al., 2020). These studies developed open polytomous response tests for students in college and in senior or junior high school, where mathematics is a primary subject; in vocational schools, by contrast, mathematics is a secondary subject (Oktaria, 2016). In addition, students in vocational schools are oriented more towards practical abilities and skills, in contrast to students in college or high school, who are more academically oriented, including in Indonesia (Permendikbud, 2016).
Currently, the Indonesian government expects vocational schools not to be left behind academically, especially in mathematics. The government is committed to improving how student learning is assessed in vocational schools, and one current approach is the use of polytomous tests. Students often pay little attention during mathematics exams for several reasons, such as considering mathematics an unimportant subject (Putri et al., 2017), a complicated subject (Vani et al., 2019), or a boring subject (Ikmawati, 2020). As a result, students tend to answer tests by guessing. To reduce this tendency, it is necessary to develop a polytomous response test (closed or open). Considering the disadvantages of the closed polytomous response test, it is reasonable to conduct research developing an open polytomous response test for students in vocational schools.
A developed test instrument must be reliable to qualify as a good test, and it is therefore necessary to analyze the quality of its items (Rosidin, 2017). There are two theories for analyzing item quality, namely classical and modern. Classical theory is a measurement theory for assessing tests based on the assumption of measurement error between true and observed scores; from this assumption, formulas for calculating the difficulty level and item discrimination were developed (Hambleton & Jones, 1993). Modern theory is a measurement theory for assessing students' abilities by comparing them with the abilities of their group, and it is known as Item Response Theory (IRT) (Hambleton & Linden, 1982). Classical theory is widely used by teachers because it is easy to apply. However, it has a weakness: it cannot separate the characteristics of students from those of items. Modern theory overcomes this weakness because, in modern theory, an item does not affect other items (local independence), items measure only one dimension (unidimensionality) (Anisa, 2013), and item parameters are independent of the particular respondents (parameter invariance) (Saepuzaman et al., 2021). Therefore, experts suggest that for a test instrument to be accountable, the quality of its items must be good according to both classical and modern analyses (Retnawati, 2014). Accordingly, the aim of this research is to develop a good mathematical test instrument using polytomous responses according to classical and modern theories in vocational schools. The research problems are stated as follows: (1) Does the developed open polytomous response test fall into the good category, based on classical and modern theory, so that it can be used as an assessment instrument in vocational schools?
and (2) Does the open polytomous response test instrument developed provide information on students' actual competence in vocational schools?

Research Design
This research follows a research and development model referring to Plomp's (2013) model, with a procedure consisting of five stages: preliminary investigation, design, realization or construction, revision, and implementation (testing).

Figure 2. Stages of Research Design
The preliminary investigation stage identifies the assessment instruments currently used by teachers. The design stage produces the open polytomous response test grid according to the basic competencies of mathematical concepts and a quality-assessment questionnaire sheet. The realization/construction stage develops the test items, which are then validated by experts. The revision stage improves the test items based on expert advice. The implementation (testing) stage tries out the test on students and analyzes the results.

Research Subject
The subjects of the study were students of vocational schools in Lampung Province, Indonesia. The research sample was determined using a non-probability sampling technique in the form of accidental sampling, which means selecting subjects who are easy to find and ready to be respondents (Malhotra, 2006). Three schools were selected as representatives, namely the Mitra Bhakti, Praja Utama, and Ma'arif NU vocational schools. The research subjects were 413 students in grade I (191 male and 222 female students), whose mathematical abilities on the National Exam (NE) were categorized as moderate (average 64.67 out of an ideal score of 100). The characteristics of the research subjects are shown in detail in Table 1.

Data Collection Techniques
Data were collected using a questionnaire and a test. The questionnaire contained questions to teachers about the assessment instruments they used, and it was also used to obtain expert validation of the developed instrument in order to determine content validity (Suhaini et al., 2021). The instrument was validated by two raters with expertise in mathematics and educational evaluation. The three aspects assessed by the experts were the suitability of the items with the indicators, the language, and the alternative answers to the questions. The scoring for instrument validation follows the criteria in Table 2 below.
After determining the content validity, the instrument was tested on students. This was followed by determining construct validity and reliability to ensure that the instrument could be analyzed further.
The instrument used was the open polytomous response test, which consisted of 40 questions on the concepts of sequences and series (arithmetic and geometric), quadratic equations and functions, and matrices. Each item contained five answer choices along with space for reasons. Student scores follow the polytomous scoring of the Partial Credit Model, in which answer choices and reasons are related (Retnawati, 2014), as shown in Table 3 below.

Data Analysis
The collected data were analyzed in two stages: (1) questionnaire data analysis (qualitative analysis) and (2) test data analysis (empirical analysis). The following is an explanation of each data analysis:

Questionnaire data analysis (qualitative analysis)
There are two sets of questionnaire data, namely the identification of assessment instruments in schools and the expert assessment of the developed instruments. The results of the two questionnaires were analyzed descriptively. For the expert judgment, the analysis was continued with an analysis of expert agreement using the Gregory index formula (Gregory, 2015):

V = D / (A + B + C + D)

where the two raters' item ratings are cross-tabulated into weak and strong relevance: A is the number of items both raters judge weakly relevant, B and C are the numbers of items on which the raters disagree, and D is the number of items both raters judge strongly relevant. The value of V lies in the range 0 to 1. The higher the value of V (close or equal to 1), the higher the validity of an item; conversely, the lower the value of V (close or equal to 0), the lower its validity.
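As a concrete illustration, the expert-agreement calculation can be sketched in Python. The function name, the 1-4 rating scale, and the cutoff of 3 below are assumptions for illustration only; the study reports only the resulting V value.

```python
def gregory_index(rater1, rater2, threshold=3):
    """Gregory's two-rater content-validity index V = D / (A + B + C + D).

    Ratings (assumed here to be on a 1-4 scale) are dichotomized into weak
    (< threshold) and strong (>= threshold) relevance and cross-tabulated:
    A = both weak, B/C = disagreement, D = both strong.
    """
    A = B = C = D = 0
    for r1, r2 in zip(rater1, rater2):
        s1, s2 = r1 >= threshold, r2 >= threshold
        if s1 and s2:
            D += 1
        elif s1 and not s2:
            B += 1
        elif not s1 and s2:
            C += 1
        else:
            A += 1
    return D / (A + B + C + D)

# Both raters judge every item strongly relevant, so V = 1.0,
# matching the perfect agreement reported in this study:
print(gregory_index([4, 4, 3, 4], [3, 4, 4, 3]))  # 1.0
```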

Test data analysis (empirical analysis)
After the content validity test, the researchers conducted construct validity and reliability tests. The construct validity test used exploratory factor analysis; the instrument is considered to have a good construct if the Kaiser-Meyer-Olkin (KMO) value is greater than 0.5 (Retnawati, 2014). The reliability test used Cronbach's alpha formula; the instrument is said to have good reliability if the Cronbach's alpha coefficient is at least 0.60 (Arikunto, 2012). If the instrument has good construct validity, further analyses can be conducted, namely of the difficulty level and item discrimination. The reason for analyzing the difficulty level and item discrimination is that both are preliminary analyses under the assumptions of measurement theory (Hambleton & Jones, 1993). To simplify the analysis, the Iteman program was used for classical theory and the Winsteps program for modern theory (Sarea & Ruslan, 2019). The Winsteps program was used because of its advantages: it can analyze polytomous data and estimate by maximum likelihood using a 1-parameter logistic model (Untary et al., 2020).
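For reference, Cronbach's alpha can be computed directly from a matrix of item scores. This is a minimal standard-library sketch with made-up data (the study itself used established software):

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for rows of per-student item scores.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)),
    where k is the number of items.
    """
    k = len(scores[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Two perfectly consistent items give the maximum alpha of 1.0:
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # 1.0
```

By the criterion above, an instrument is acceptable when the coefficient is at least 0.60; the instrument in this study reached 0.89.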

Analysis of test data with classical theory
a. The item difficulty level is the proportion of students who answered the item correctly. If an item has an index of 0.3-0.7, the item is good; if the index is below 0.3, the item is difficult; and if the index is above 0.7, the item is easy.
b. Discrimination is the ability of a test item to distinguish between high-ability and low-ability students. Discrimination is said to be good if the index is above 0.3; if the discrimination index is below 0.3, the item needs to be revised (Arikunto, 2012).
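The two classical indices above can be sketched as follows. The upper-lower 27% grouping used for discrimination is one common variant and is an assumption here; the study computed these indices with the Iteman program.

```python
def difficulty(item_correct):
    """Classical difficulty: proportion of students answering correctly."""
    return sum(item_correct) / len(item_correct)

def discrimination(item_correct, total_scores, frac=0.27):
    """Upper-lower discrimination index: difference between the correct
    proportions of the top and bottom 27% groups ranked by total score."""
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    g = max(1, int(len(order) * frac))
    lower, upper = order[:g], order[-g:]
    p_upper = sum(item_correct[i] for i in upper) / g
    p_lower = sum(item_correct[i] for i in lower) / g
    return p_upper - p_lower

# 10 students; the 5 strongest answer this item correctly:
correct = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
totals = list(range(10))
print(difficulty(correct))              # 0.5 -> good (0.3-0.7)
print(discrimination(correct, totals))  # 1.0 -> good (> 0.3)
```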

Analysis of test data with modern theory
a. The item difficulty level reflects the location of the item on the latent trait scale: it is the ability level at which about 50% of respondents are expected to answer the item correctly (DeMars, 2010). An item is said to be good if it has an index between -2 and +2 (Hambleton & Swaminathan, 1985). If the index is close to -2, the item is classified as very easy; if it is close to +2, the item is classified as very difficult (Retnawati, 2014). In the Winsteps program, the item difficulty level appears in the Measure column.
b. Item discrimination is indicated by the slope of the item characteristic curve. An item is said to have good discrimination if the slope of the curve is moderate (not too gentle or too steep); if the slope is too gentle or too steep, the item is not good. Another view states that a good index is above 0.4 (Crocker & Algina, 1986). In the Winsteps program, item discrimination appears in the Pt-Measure Correlation column.
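To make the difficulty interpretation concrete, the dichotomous Rasch (1-PL) response curve can be sketched as below. This is a simplification for illustration only; the study's items are polytomous and were analyzed with Winsteps under the Partial Credit Model.

```python
import math

def rasch_p(theta, b):
    """Rasch (1-PL) probability of a correct response:
    P(theta) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals item difficulty, the success probability is exactly 0.5,
# which is why difficulty b marks the point where ~50% answer correctly:
print(rasch_p(0.84, 0.84))  # 0.5
```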
According to modern theory, before analyzing item difficulty and discrimination, three assumptions must be tested, namely unidimensionality, local independence, and model fit (Hambleton et al., 1991). Unidimensionality means that each test item measures only one ability. Three methods are often used to test unidimensionality: analysis of the Eigenvalues of the inter-item correlation matrix, the Stout test of the unidimensionality assumption, and indices based on the residuals of the unidimensional solution (DeMars, 2010). In this study, the unidimensionality test used Eigenvalue analysis of the inter-item correlation matrix.
Local independence is the condition in which a respondent's answer to one item is not influenced by other items. It is tested by showing that the probability of a respondent's overall answer pattern equals the product of the probabilities of the respondent's answers to the individual items. If the unidimensionality assumption is accepted, the local independence assumption is also accepted (DeMars, 2010). The model fit test determines whether the model used is in accordance with the items. Model fit is tested by examining the outfit mean square (MNSQ) and the Pt-Measure Correlation: if the outfit MNSQ value is 0.5 to 1.5 and the Pt-Measure Correlation is positive, the item is said to fit the model (Linacre, 2012). In addition, the information function and standard error of measurement (SEM) are analyzed, which further describe the latent ability measured by the test as expressed through the contributions of its items.

Analysis of Questionnaire Data
Based on the questionnaire results, it was found that the teachers had never used polytomous responses. As many as 80% of teachers used essay tests and 20% used multiple-choice tests, with each instrument consisting of 2-5 items. In addition, only about 10% of teachers used assessment as a means of improving learning, such as improving lesson plans and teaching methods. The remaining 90% of teachers did not use assessment for learning improvement, for several reasons: they did not understand assessment (20%), did not know how to analyze assessments (50%), or did not know how to develop good assessment questions (30%). A summary of the identification questionnaire data follows. The two expert assessments showed that the content validity of the instrument is good. The analysis of expert judgment agreement is shown in Table 4 below. Based on the results in Table 4, it can be concluded that the instrument is valid because the value of V reaches 1; therefore, instrument testing could be continued. In addition to providing assessments, the experts also offered suggestions for improving the instrument, namely preparing questions using the ABCD format (Audience, Behavior, Competence, and Degree), avoiding ambiguous language or statements, improving the mathematical concepts, making the distractor answer choices plausible, and arranging the options in order.

Construct Validity
After the instrument was tested, a construct validity test followed. The results of the exploratory factor analysis are shown in Table 5. Based on Table 5, the KMO value is 0.76 (more than 0.5), so it can be concluded that the variables and samples used allow for further analysis.

Reliability
The estimation of the instrument's reliability yielded a Cronbach's alpha coefficient of 0.89 (more than 0.6). This means the instrument has good reliability, so the analysis of difficulty level and item discrimination could be continued using classical and modern methods.

Analysis of Test Data with Classical Theory
Analysis of test data in the classical way does not require assumption testing; the item difficulty level and item discrimination can be calculated directly once validity and reliability have been established. The results of the two analyses are presented in Table 7 below. Based on Table 7, all items have a difficulty level within the index range of 0.3 to 0.7, so they are categorized as good. Meanwhile, only two items had good discrimination, and the remaining items needed revision. These results indicate that all items were good in terms of difficulty level, but almost all items needed revision in terms of item discrimination.

Analysis of Test Data with Modern Theory
The Unidimensional Assumption Test

The unidimensional assumption test is the first assumption test, conducted with factor analysis. Factor analysis begins by testing the adequacy of the sample, constructing a variance-covariance matrix, and then calculating the Eigenvalues. The Eigenvalues were then used to calculate the percentage of explained variance and to draw the scree plot (Retnawati, 2014). The output of the factor analysis was the Kaiser-Meyer-Olkin Measure of Sampling Adequacy (KMO) statistic (KMO = .760, Sig. = .000) and the scree plot. The unidimensionality test is judged from the cumulative percentage of the Eigenvalues and the scree plot. If the cumulative percentage of the first factor's Eigenvalue is greater than 20%, the unidimensional assumption is fulfilled (Retnawati, 2014). Table 9 shows that the cumulative percentage of the first factor's Eigenvalue is 20.220%. Because this is more than 20%, the instrument is shown to measure only one factor or dimension. In addition, unidimensionality can also be seen in the scree plot, where the number of factors is indicated by the steepness of the graph against the Eigenvalues.

Figure 4. Scree Plot Unidimensional
Based on the scree plot, the Eigenvalues level off from the second factor onward. This demonstrates that the developed instrument has only one dominant factor. These results prove that the test instrument meets the unidimensional assumption, or in other words, measures only one dominant factor.
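The first-factor rule used above can be sketched numerically: compute each factor's share of total variance from the Eigenvalues and check whether the first share exceeds 20%. The Eigenvalues below are hypothetical, for illustration only.

```python
def explained_variance_pct(eigenvalues):
    """Percentage of total variance attributed to each factor."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]

# Hypothetical Eigenvalues for a small five-factor illustration:
pcts = explained_variance_pct([2.0, 0.9, 0.8, 0.7, 0.6])
print(round(pcts[0], 1))  # 40.0
print(pcts[0] > 20)       # True -> unidimensional by the study's 20% rule
```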

Local Independence Assumption Test
The local independence assumption is fulfilled if a student's answer to one item does not affect the student's answer to another item; that is, the score on one item should not depend on the scores of other items. Because the unidimensionality assumption has been met, local independence is automatically supported as well (Retnawati, 2014). Table 10 shows the variance-covariance values between groups of students' abilities. In the table, the covariance values between the different ability interval groups are small and close to zero. This shows that there is no correlation between items, so the local independence assumption is accepted.

Model Fit
The model fit test was analyzed using the Winsteps program. The requirements for an item to be declared "fit to the model" are: an Outfit MNSQ value of 0.5 to 1.5, an Outfit ZSTD value of -2 to 2, and a positive Pt-Measure Correlation (Sumintono & Widhiarso, 2015). An item is considered fit if at least one of the conditions is met. Fit can also be judged from an infit MNSQ of 0.77 to 1.3, but at this stage model fit was judged only on the outfit MNSQ and Pt-Measure Correlation values. Based on the results of the analysis, all items fit the model (Figure 5).

Figure 5. Item Fit on Model
The Item Difficulty Level

The item difficulty level was analyzed using the Winsteps program, and the results are presented in Table 12 (Measure column). The item difficulty levels lie in the range of -0.70 to 0.84. From Table 12, the most difficult item is item 40 (difficulty 0.84) and the easiest is item 1 (difficulty -0.70). Since the difficulty levels are within the range of -2 to +2, all items are in the good category. If further divided into three categories, difficulty levels in the range of -0.70 to 0.84 are moderate (Sumintono & Widhiarso, 2015). This can also be seen in the item difficulty map, where the difficulty levels lie between -2 and +2.

Item Discrimination

The item discrimination analysis used the Winsteps program, and the results are presented in Figure 8 (Pt-Measure Correlation column). Item discrimination lies in the range of 0.23 to 0.74, with 27 items having an index above 0.4 (good category) and 13 items having an index below 0.4 (poor category). However, the 13 items can still be used in the instrument as long as their index is above 0 (Alagumalai et al., 2005).

Figure 8. Item Discrimination
European Journal of Educational Research1451

Comparative Analysis Between Classical and Modern Theory
The results of the classical and modern analyses yielded the difficulty-level and item-discrimination indices as follows. Based on Table 11, the item difficulty levels analyzed by classical and modern theory give the same result (good category). However, item discrimination under modern theory places more items in the good category than under classical theory. Comparing the item discrimination indices between the classical (Table 7) and modern (Figures 6 and 8) analyses shows partial agreement between the discrimination categories: the 13 items categorized as poor by modern theory were also poor according to classical theory, while the remaining 27 items were poor only in the classical analysis (13 items match in discrimination category and 27 do not).

Information and Measurement Error (SEM) Function
The information function reveals the latent ability measured by the test as expressed through the contributions of its items; the test information function is the sum of the information functions of the individual items. The information function is inversely related to measurement error, or the standard error of measurement (SEM). The information function of the test will be high if the items composing the test have high information functions. Figure 9 depicts the relationship between the information function and the SEM. It shows that this instrument provides maximum information of 22.36, with the smallest measurement error of 0.21, for medium-ability students. The lower and upper limits of the interval are the ability scores at which the information function graph and the SEM graph intersect. The graph indicates that the greater the value of the information function, the smaller the measurement error (SEM); the item information function expresses the strength or contribution of the test items in revealing the latent trait measured by the test and describes how well items accord with the model, which helps with item selection (Retnawati, 2014). These results lead to the conclusion that this test instrument is most suitable for students with medium abilities.

Figure 9. Graph of Information Function and Measurement Error
Based on the test results, information about the actual competency of the vocational school students can also be obtained from their test answers. The instrument analysis is based on two patterns of student answers with the same tendency, classified according to Bloom's Taxonomy (Bloom, 1956). Six student answers were selected as samples from students with different abilities (high, medium, and low).

Item 1:
The cognitive domain to be achieved is C2 (understanding). Based on students' answers, the two dominant answer patterns are: (1) students have understood the general form of arithmetic sequences and can identify the first term and other terms, and (2) students have not been able to formulate the general form of an arithmetic sequence or perform algebraic operations on it.

Figure 10. An Example of Student Answers in Item 1
Item 2: The cognitive domain to be achieved is C2 (understanding). Based on students' answers, the two dominant answer patterns are: (1) students can determine the number of terms in a sequence using the general formula for an arithmetic sequence, or determine it without the general formula (by writing down all the terms from the first to the last), and (2) students can already use the general formula for arithmetic sequences but cannot determine the number of terms because of errors in performing algebraic operations on the general formula.

Figure 11. An Example of Student Answers in Item 2
Item 3: The cognitive domain to be achieved is C3 (applying). Based on students' answers, the two dominant answer patterns are: (1) students have understood and can determine the common difference from two non-adjacent terms of an arithmetic sequence using the general formula, and (2) other students can determine it without using the general formula, by writing out the known terms and inserting several terms between them.

Figure 12. An Example of Student Answers in Item 3
Item 4: The cognitive domain to be achieved is C3 (applying). Based on students' answers, the two dominant answer patterns are: (1) students have understood and can determine the nth term of an arithmetic sequence from two known non-adjacent terms using the general formula, and (2) students cannot determine the nth term using the general formula, but instead write out the known terms, insert several terms between them, and continue until the nth term.

Figure 13. An Example of Student Answers in Item 4
Item 5: The cognitive domain to be achieved is C1 (remembering). Based on students' answers, the two dominant answer patterns are: (1) students have understood and can determine the middle term of an arithmetic sequence, and (2) students can determine the middle term without using the general formula, by writing out the known terms, inserting several terms, and then identifying the middle one.

Description of Instrument Development and Student Ability
The instrument developed in this research is the open polytomous response test, and all parameters have been accepted. This instrument combines a multiple-choice test and an essay test. A multiple-choice test makes it easier to check students' answers, but their mathematical thinking processes cannot be examined in depth. An essay test has the advantage of revealing deeper mathematical thinking processes, but it takes a long time to check the answers.
Analysis of learning instruments is an important source of the composite score in the final report. In the final report, students' ability scores were first converted to a 0-10 scale (or, equivalently, 0-100). The conversion is a linear transformation: the student's score is divided by the ideal score and then multiplied by 10 for a 0-10 range, or by 100 for a 0-100 range. On the 0-10 scale, the students' mathematical ability scores ranged from 4.31 (lowest) to 8.56 (highest); on the 0-100 scale, from 43.1 to 85.6.
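The conversion described above is a one-line linear transformation. The ideal raw score of 160 in the example is an assumption (40 items with a maximum of 4 points each); the study does not state its raw ceiling explicitly.

```python
def convert_score(raw, ideal, scale=10):
    """Linear transformation: divide by the ideal (maximum) score, then
    rescale; scale=10 gives a 0-10 value, scale=100 gives a 0-100 value."""
    return raw / ideal * scale

# Assumed ideal raw score of 160 (40 items x 4 points):
print(convert_score(80, 160))       # 5.0 on the 0-10 scale
print(convert_score(80, 160, 100))  # 50.0 on the 0-100 scale
```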
The results of the student ability analysis are presented as predicates ranging from very low to very high according to the specified categories. The analysis shows that most students, 62% (253 students), have very low to medium abilities, while the remaining 38% (160 students) have high or very high abilities. Further analysis found that students with high and very high abilities tend to work according to the concepts with more creative solution steps (different from the teacher's examples), while students with very low to medium abilities can solve problems according to the concepts but with less creative solution steps (e.g., routine steps or steps following the teacher's example).
Other results show that assessment with an open polytomous response makes it easier for teachers to explore students' difficulties with the material; from this exploration, the teacher can follow up with remediation for students who have learning difficulties. An important finding of this study is that information about students' learning difficulties obtained through the open polytomous response test is more direct and complete than that from other assessment instruments, such as multiple-choice tests (Gierl et al., 2017) or essay tests (Putri et al., 2020).

Discussion
This research is development research aimed at developing a good mathematical assessment instrument using polytomous responses according to both classical and modern theory. The data analysis revealed differences between the classical and the modern analysis, namely in item discrimination. The classical analysis yielded 38 items with bad criteria and only 2 items with good criteria, whereas the modern analysis yielded 27 items with good criteria and 13 items with bad criteria. According to evaluation theory, modern theory aims to cover the weaknesses of classical theory: items categorized as poor under classical analysis are often categorized as good under modern analysis, and vice versa (Retnawati, 2014). This implies that an item that falls short under classical theory should also be analyzed under modern theory before it is revised or replaced.
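To illustrate the classical side of this comparison: item discrimination in classical test theory is commonly estimated as the correlation between an item's scores and the total test score. The sketch below uses hypothetical data; the function name and the scores are illustrative assumptions, not the study's actual analysis.

```python
import numpy as np

def point_biserial(item_scores, total_scores):
    """Classical item discrimination: correlation between the scores on
    one item and the total test scores (uncorrected totals, for brevity)."""
    return np.corrcoef(item_scores, total_scores)[0, 1]

# Hypothetical scores on one polytomous item (0-4) for eight students,
# alongside their (illustrative) total test scores
item = np.array([0, 1, 2, 2, 3, 4, 4, 3])
totals = np.array([5, 8, 10, 12, 14, 18, 17, 15])
d = point_biserial(item, totals)
print(round(d, 3))  # values near +1 indicate strong discrimination
```

An item flagged as poor by such a classical index would, per the recommendation above, still be examined under an IRT (modern) model before being revised or discarded.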
Research on learning assessment with open polytomous responses was carried out by Yang et al. (2017), who aimed to diagnose student errors in calculus material at the university level. Their study compared two types of tests, namely the two-tier test and the open polytomous response test, and found that the open polytomous response test provides more detailed information on student errors than the two-tier test (see Figure 15). This finding is in line with the results of the present study, which show that the open polytomous response test provides more detailed information about students' abilities.

Another study on learning assessment with open polytomous responses was conducted by Ayanwale (2021), which compared two methods of analyzing polytomous response tests, namely Parallel Analysis and the Partial Credit Model. The results showed that the Partial Credit Model analysis had a first-factor eigenvalue of 20.5% (an ideal eigenvalue being more than 20%), compared with 11.7% for the Parallel Analysis. Ayanwale's results are in line with the present study, which obtained a first-factor eigenvalue of 20.220%, suggesting that the instrument developed here is suitable for assessing vocational students in Indonesia, and possibly beyond.
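The first-factor eigenvalue percentage used in these comparisons is a common unidimensionality check: the largest eigenvalue of the inter-item correlation matrix is expressed as a share of the total variance. A minimal sketch with simulated data follows; the response matrix and the 20% benchmark applied to it are illustrative assumptions, not the study's data.

```python
import numpy as np

def first_eigen_pct(responses):
    """Share of total variance captured by the largest eigenvalue of the
    inter-item correlation matrix (a common unidimensionality check)."""
    corr = np.corrcoef(responses, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)  # ascending order for symmetric matrices
    return eigvals[-1] / eigvals.sum() * 100

# Simulated responses: 100 examinees x 10 items driven by one latent ability
rng = np.random.default_rng(1)
ability = rng.normal(size=(100, 1))
responses = ability + rng.normal(size=(100, 10))
pct = first_eigen_pct(responses)
print(round(pct, 1))
```

With one dominant latent trait, the first eigenvalue's share comfortably exceeds the 20% benchmark; a flat eigenvalue spectrum would instead suggest a multidimensional instrument.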
Other studies related to classical and modern theory were conducted by Sarea (2018) and Saepuzaman et al. (2021). Sarea found that a polytomous response test had good criteria under both classical and modern theory, with the classical analysis accepting more items than the modern one. Saepuzaman et al. likewise found good criteria under both theories, but with the modern analysis accepting more items than the classical one.
Analysis of the students' answers revealed two patterns in how students solved the questions: (1) using formulas and (2) trial and error. The formula pattern tended to be used by high-ability students, while the trial-and-error pattern tended to be used by students with medium and low abilities. Students using either pattern were able to answer the questions correctly, as shown in Figure 16 below.

Figure 16. An Example of Student Answer Patterns with (i) Formulas, (ii) Trial and Error
Both patterns are common ways of solving mathematical problems. This is in line with Mason et al. (2010), who list several approaches to solving mathematical problems, namely trial and error, using a drawing or model, analogy, and formulas. Syahlan (2017) states that the two approaches students most often use are (1) trial and error and (2) formulas: students usually use trial and error for easy problems and formulas for difficult ones. Combining the students' answers with Syahlan's view, students with medium and low abilities will tend to use trial and error on easy questions, while high-ability students will tend to use formulas on difficult questions.
These student answers can be important information for teachers in teaching mathematics. Before teaching new material, the teacher should know the students' initial abilities (high, medium, or low) and the difficulty level of the material. Teachers who know students' initial abilities can develop professionally in a student-centered way (Gonzalez, 2018), adjust the cognitive level of instruction and increase student learning engagement (Dong et al., 2020), and design pedagogical practices that correct students' misconceptions (Geofrey, 2021). In addition, teachers who know the difficulty level of the material can design remedial learning plans (Muhson et al., 2017; Wulanningtyas et al., 2020). In short, teachers gain many benefits from knowing students' actual abilities, and this can be achieved by using the open polytomous response test.

Conclusion
Based on the results of the research and discussion, it can be concluded that (1) the open polytomous response test is in the good category according to both classical and modern theory, so the instrument is accountable (it qualifies as a good test) under both analyses, and (2) the open polytomous response test can provide information on students' actual competence, as observed in the students' arguments when giving reasons for their choices. Therefore, the open polytomous response test can be used as an alternative for learning assessment.

Recommendation
Based on the research results, there are several recommendations for teachers, schools, and other researchers. Teachers should familiarize students with polytomous response tests before administering them. Schools, through principals or other leaders, should encourage teachers to take advantage of this test and to develop other assessment instruments. Other researchers should develop instruments with other polytomous responses (for assessment of learning and assessment as learning) on other materials. In addition, further research is suggested to develop a polytomous response instrument for assessment for learning (pretest), so that students' prior knowledge can be identified and learning can be made effective.