European Journal of Educational Research

The present research aimed to test an Amharic version of the multidimensional Work Task Motivation Scale for Teachers (WTMST), which measures the five pillars of university instructors' motivation toward teaching and student evaluation tasks based on self-determination theory (SDT). The WTMST was the first instrument to measure all five motivational elements, and it remains one of the most applicable instruments for assessing teachers' motivation. An Amharic version of the WTMST for teaching and student evaluation tasks was adapted and assessed with a large sample (N = 1,117). Our findings demonstrate excellent reliability and construct validity (convergent, discriminant, divergent, and factorial). Of the four theoretically competing models compared (single-factor, correlated factor, higher-order factor, and bi-factor models), the bi-factor model fitted best and was used to test measurement invariance across groups. The results also suggest that the factor structure of the WTMST for both teaching and student evaluation tasks is consistent across gender (men, women), university type (research, applied, and general university), age, and teaching experience. Therefore, the WTMST for teaching and student evaluation tasks may be a valid instrument in Ethiopian higher education settings.


Introduction
The concept of teachers' motivation carries varying meanings and dimensions. For instance, Ryan and Deci (2000) described motivation in terms of energy, direction, and persistence, that is, all aspects of activation and intention. They further stated that motivation matters in the real world because of its consequences. One of the most comprehensive, empirically grounded theories of human motivation is self-determination theory (SDT). It treats motivation broadly, uses concepts that are meaningful to people, covers a wide range of phenomena, is empirically based, and has practical value for human beings (Deci & Ryan, 2008a).
SDT is a relatively new approach to studying individuals' motivation and well-being. Because it combines applicable theory and practice, it offers many benefits and can be used by parents, health care providers, religious leaders, managers, coaches, and teachers (Ryan & Deci, 2017). Its prominence stems partly from the lack of other comprehensive theoretical models of motivation that have been assessed with different tools in various contexts (Fernet et al., 2008; Gagné & Deci, 2005; Gagné et al., 2010).
SDT distinguishes types of motivation by the goals or reasons that drive behaviour rather than by the amount of motivation (Deci & Ryan, 2008b; Fernet et al., 2008; Ryan & Deci, 2000). According to Fernet et al. (2008), SDT describes three broadly known kinds of motivation, ordered from low to high self-determination: amotivation, extrinsic motivation, and intrinsic motivation. Gagné and Deci (2005) and Fernet et al. (2008) provided evidence that this self-determination model of motivation types is the most applicable theoretical model for assessing teachers' work task motivation. Amotivation is the lack of motivation: individuals do not perceive a connection between their actions and the consequences of those actions. Extrinsic motivation comprises three forms of regulation. External regulation denotes behaviour controlled by the wish to avoid punishment or obtain a reward; introjected regulation is an external demand that has been internalized and is driven by feelings such as anxiety, shame, or guilt; and in identified regulation, the action is owned or accepted as personally valuable because it matches the person's aims and values. Finally, intrinsic motivation is the performance of an activity for the inherent gratification or satisfaction that the task itself provides.
Identified regulation and intrinsic motivation typically lead to positive outcomes, whereas introjected regulation, external regulation, and amotivation lead to negative ones (Fernet et al., 2008; Gagné & Deci, 2005). Self-determined types of motivation at work are also associated with higher job satisfaction and better employee well-being (Fernet et al., 2008). Teachers' work task motivation is likewise strongly associated with burnout (Fernet et al., 2008) and teacher well-being (Collie et al., 2015). Specifically, identified regulation and intrinsic motivation were positively related to the personal accomplishment dimension of the burnout inventory and appeared protective against emotional exhaustion and depersonalization (Fernet et al., 2008). In contrast, external regulation, introjected regulation, and amotivation were positively associated with emotional exhaustion and depersonalization and negatively associated with personal accomplishment (Fernet et al., 2008). Consequently, it is useful to assess the convergent validity (with positive constructs) and divergent validity (with negative constructs) of work task motivation using teacher well-being (workload, organizational, and student interaction well-being; Collie et al., 2015) and burnout (emotional exhaustion, depersonalization, and low personal accomplishment; Maslach et al., 1996).
Moreover, teachers' well-being on the job comprises three inseparable factors (Collie et al., 2015): workload well-being (e.g., student evaluation, marking assignments and tests, working after hours), organizational well-being (e.g., support and recognition from administrators, relationships and communication with administrators, participation in administrative tasks, and established school rules and procedures), and student interaction well-being (e.g., perceptions of student behaviour and motivation).
In the Canadian cultural context, Fernet et al. (2008) examined the psychometric properties of an SDT-based measure of work task motivation among elementary and high school teachers. The authors tested the construct validity of the Work Task Motivation Scale for Teachers across six teacher tasks (i.e., teaching, administrative, student evaluation, classroom preparation, classroom management, and complementary tasks) using a correlated factor model (Fernet et al., 2008). Similarly, Fernet (2011) fitted a correlated factor model to the Work Role Motivation Scale for School Principals (WRMSSP) in the same cultural context. These two studies on work task motivation established the measurement approach and provide crucial psychometric information for the current research. The authors also recommended that future research investigate multiple dimensions of teachers' motivation, examine construct validity, and employ multiple methods to minimize measurement bias (Fernet et al., 2008; Fernet, 2011).
In general, Fernet et al. (2008) addressed the WTMST across the six main teacher tasks (i.e., teaching, administrative, student evaluation, classroom preparation, classroom management, and complementary tasks) using a correlated factor model. However, we found no research comparing the four competing models, that is, the single-factor, correlated factor, higher-order factor, and bi-factor models. We therefore expect this research to provide a new model comparison and new measurement invariance evidence in the Ethiopian educational context, focusing on two major teacher tasks (teaching and student evaluation), and to give new insight to university educators and other stakeholders working to improve teachers' motivation.

The Objectives of the Present Study
The overall aim of our research is to contribute to the scientific use of the WTMST through its translation, adaptation, psychometric validation, and measurement equivalence testing in Ethiopian educational settings. This investigation is justified for several reasons. First, cross-cultural validation and adaptation of an instrument across groups should proceed only after measurement equivalence (invariance) has been confirmed (Davidov et al., 2014). Establishing measurement invariance is essential in two main situations: (a) when samples with different cultural backgrounds complete the same instrument, and (b) when data are collected in different nations using different language versions of the same instrument (Eremenco et al., 2005). Failure to establish measurement invariance hinders sound interpretation of the data and the demonstration of reliability and validity (Byrne & van de Vijver, 2010; Vandenberg & Lance, 2000). The literature therefore leaves gaps to be investigated in other cultural contexts. Second, no validation study of university teachers in Ethiopia has used model comparisons and measurement invariance across gender, age, university type, and teaching experience. Third, no research has examined the psychometric properties of the WTMST in Amharic or in an African cultural context. We therefore test an Ethiopian Amharic version of Fernet et al.'s (2008) teaching and student evaluation task scales, translated for this study. Fourth, to test the construct validity of the WTMST, we run single-group and multigroup confirmatory factor analyses based on Ryan and Deci's self-determination model as operationalized by Fernet et al. (2008). Fifth, we compare single-factor, correlated factor, bi-factor, and higher-order factor models, which earlier studies have overlooked, in line with recent methodological and analytical recommendations (Chen et al., 2006; Immekus & Imbrie, 2008; Liang & Luo, 2020; Stockdale et al., 2002; Wang et al., 2018). Sixth, and finally, we select the best-fitting model and test its measurement invariance across groups to support the cross-cultural validation of the scale (Chen et al., 2006; Eremenco et al., 2005).
In our questionnaire-based quantitative research, we formulate and test the following hypotheses:
Hypothesis 1: The five-factor structure of the WTMST is expected to be a reliable measure in Ethiopian higher education settings.
Hypothesis 2: The five-factor structure of the WTMST (H1a-d; Fernet et al., 2008) shows good model fit in single-factor, correlated factor, bi-factor, and higher-order confirmatory factor analysis (CFA) models with a sample of university teachers.
Hypothesis 3: The WTMST shows good convergent, discriminant, and divergent validity.
Hypothesis 4: The WTMST measurement models show a good fit across gender, age, university type, and experience in teaching.
Hypothesis 5: The five WTMST dimensions for teaching tasks and for student evaluation tasks are expected to be positively related.

Research Design
In a cross-sectional survey at three Ethiopian public universities, respondents completed the translated version of the WTMST (Fernet et al., 2008) and two related measures, the Teacher Well-Being Scale (TWBS; Collie et al., 2015) and the Maslach Burnout Inventory (MBI; Maslach et al., 1996). Forty-seven incomplete questionnaires were excluded before analysis, and the response rate was 95.3%. The final sample comprised 1,117 Ethiopian university teachers (835 men, 75%; 282 women, 25%) with a mean age of 31.1 years (SD = 6.1 years).

Participants
Of the participants, 431 (38.6%) were from Gondar University (a research university), 353 (31.6%) from the University of Woldia (a general university), and 333 (29.8%) from the University of Wollo (an applied university); all three institutions are in the Amhara Regional State of Ethiopia. Table 1 presents the socio-demographic information of all participants.

Details of the Measurement Tools
The Work Task Motivation Scale for Teachers (WTMST). The WTMST (Fernet et al., 2008) used in this study asks the questions "Why are you teaching?" and "Why are you evaluating students?". For each task, the WTMST comprises 15 self-report items rated on a 7-point scale from 1 = does not correspond at all to 7 = corresponds completely. The WTMST is based on SDT (Ryan & Deci, 2000). It includes five subscales of three items each (Fernet et al., 2008), covering intrinsic motivation, the three forms of extrinsic motivation (identified, introjected, and external regulation), and amotivation. It is a standardized, validated instrument for measuring teachers' motivation in the teaching context. Accordingly, the intrinsic motivation, identified regulation, introjected regulation, external regulation, and amotivation constructs are used in this study. In the original study, Cronbach's alpha values for the five components ranged from 0.77 to 0.92. Fernet et al. (2008) reported strong support for the scale's psychometric properties and published it in the Journal of Career Assessment. The construct reliability and validity of the present data are reported in the Results section. The adapted Amharic version of the WTMST showed sound psychometric properties in the Ethiopian context (see the translation of the instrument in Appendix A).
Teacher Well-Being Scale (TWBS): The original 16-item version of the TWBS is described in Collie et al. (2015), published in the Journal of Psychoeducational Assessment. Collie et al. showed that the teacher well-being construct has excellent internal and external validity and reliability (Collie, 2014; Collie et al., 2015). In this study, we used the Amharic version of the TWBS (Zewude & Hercz, 2021, 2022), whose authors performed various analyses to establish its psychometric properties in the Ethiopian context. As further evidence of construct validity, a CFA of the Ethiopian Amharic version in the present data yielded χ2 (101) = 455.63, p < 0.001, TLI = 0.967, CFI = 0.972, RMSEA = 0.056, 95% CI [0.051, 0.061]. These values meet the recommended criteria for acceptable fit (CFI and TLI ≥ 0.90, RMSEA < 0.08). The reliability of the teacher well-being dimensions ranged from 0.80 to 0.2, and the reliability of the total scale was 0.92.
The Job Burnout: The Maslach Burnout Inventory (MBI), developed by Maslach et al. (1996), is the most widely used tool for assessing job burnout. The MBI has 22 items grouped into three subdimensions: emotional exhaustion (EE), personal accomplishment (PA), and depersonalization (DP). Respondents rate items on a 7-point Likert scale from 0 (never) to 6 (every day). In the current study, a confirmatory factor analysis with robust maximum likelihood estimation examined the fit of the MBI, and the results supported its construct validity: χ2 (206) = 1330.45, p < 0.001, TLI = 0.931, CFI = 0.938, RMSEA = 0.070 [0.065, 0.074]. Cronbach's alpha for the burnout subscales ranged from 0.885 to 0.942, and the total scale reliability was α = 0.701.

Data Analysis
The statistical analyses were computed using IBM SPSS version 25.0 and Amos version 26.0. Several analyses were carried out to test the psychometric properties and measurement invariance of the Amharic version of the WTMST: reliability (Cronbach's alpha and composite reliability), measurement invariance (configural, metric, scalar, and residual), CFA (single-group and multigroup), validity (convergent, discriminant, and divergent), and structural equation modelling (Davidov et al., 2018; Hair et al., 2019; Kline, 2016).
In addition, the absence of multicollinearity was confirmed by checking that the correlations among the constructs were below 0.90, and the assumption of normality was verified. Outliers were also examined following the procedures of Hair et al. (2019), Kline (2016), and Tabachnick and Fidell (2018). Absolute values of skewness ≤ 2 and of kurtosis ≤ 4 indicate an approximately normal distribution (Kim, 2013; Mishra et al., 2019). Skewness values ranged from -0.017 to -0.720 and kurtosis values from -0.063 to -1.097, suggesting that all constructs are approximately normally distributed (Table 2).
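The two screening rules above (|skewness| ≤ 2, |kurtosis| ≤ 4, and inter-construct correlations below 0.90) can be sketched in Python as follows. This is a minimal illustration, not the SPSS procedure used in the study: it assumes moment-based skewness and excess kurtosis, whereas SPSS reports small-sample-adjusted versions that differ negligibly at N = 1,117. The helper names (`roughly_normal`, `collinear_pairs`) and the toy scores are hypothetical.

```python
import math
from itertools import combinations


def _central_moment(xs, order):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** order for x in xs) / len(xs)


def skewness(xs):
    """Moment-based skewness g1 = m3 / m2**1.5 (0 for symmetric data)."""
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal distribution)."""
    return _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3


def roughly_normal(xs, skew_limit=2.0, kurt_limit=4.0):
    """Rule of thumb from the text: |skewness| <= 2 and |kurtosis| <= 4."""
    return abs(skewness(xs)) <= skew_limit and abs(excess_kurtosis(xs)) <= kurt_limit


def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(scores, limit=0.90):
    """Flag construct pairs whose absolute correlation reaches the 0.90 limit."""
    flagged = []
    for a, b in combinations(sorted(scores), 2):
        r = pearson_r(scores[a], scores[b])
        if abs(r) >= limit:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical construct scores for six respondents (not study data).
scores = {
    "intrinsic": [5.0, 6.3, 4.7, 6.0, 5.3, 3.7],
    "amotivation": [2.0, 1.3, 3.0, 1.7, 2.3, 4.0],
}
for name, values in scores.items():
    print(name, round(skewness(values), 3), round(excess_kurtosis(values), 3), roughly_normal(values))
print(collinear_pairs(scores))
```

In practice each construct score would be the mean of its items for every respondent; with real data, a pair flagged by `collinear_pairs` would call the separation of the two constructs into question.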

Reliability
Reliability was assessed with both Cronbach's alpha coefficient and composite reliability (CR). Following Cronbach's (1951) guidelines, the reliability of each construct was interpreted as follows: α ≥ 0.90 = excellent; 0.80 to 0.89 = good; 0.70 to 0.79 = acceptable; 0.60 to 0.69 = questionable; 0.50 to 0.59 = poor; and α < 0.50 = unacceptable.
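As a minimal sketch of how these reliability indices can be computed, the Python below implements the standard Cronbach's alpha formula, the usual composite reliability formula for standardized loadings (assuming uncorrelated errors with variance 1 − λ²), and the interpretation bands listed above. The function names and example numbers are hypothetical, not values from this study.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score).

    `items` holds one list of scores per item, with respondents in the same order.
    """
    k = len(items)
    totals = [sum(responses) for responses in zip(*items)]
    item_variance = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))


def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    with each error variance taken as 1 - loading**2 (standardized solution)."""
    squared_sum = sum(loadings) ** 2
    error = sum(1 - lam ** 2 for lam in loadings)
    return squared_sum / (squared_sum + error)


def alpha_label(alpha):
    """Interpretation bands listed in the text."""
    for cut, label in [(0.90, "excellent"), (0.80, "good"), (0.70, "acceptable"),
                       (0.60, "questionable"), (0.50, "poor")]:
        if alpha >= cut:
            return label
    return "unacceptable"


# Hypothetical three-item subscale answered by five respondents on the 1-7 scale.
subscale = [[5, 6, 4, 7, 3], [5, 7, 4, 6, 3], [4, 6, 5, 7, 2]]
alpha = cronbach_alpha(subscale)
print(round(alpha, 3), alpha_label(alpha))
print(round(composite_reliability([0.82, 0.78, 0.85]), 3))
```

Population variances are used throughout; because alpha is a ratio of variances computed on the same respondents, the choice of population or sample variance does not change the result.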

Convergent, Divergent, and Discriminant Validity
First, convergent and discriminant validity were assessed using the average variance extracted (AVE) and the maximum shared variance (MSV). AVE values above the 0.5 threshold (AVE > 0.5) indicate good convergent validity, and factors whose MSV is lower than their AVE show adequate discriminant validity (Hair et al., 2019). In addition, we expected work task motivation to be positively associated with similar positive variables (convergent validity) and negatively associated with opposite variables (divergent validity) (George & Mallery, 2020; Hair et al., 2019). Pearson's correlation coefficient was therefore used to examine the association of work task motivation with teacher well-being (convergent validity) and with teacher burnout (divergent validity). Correlations were interpreted with the standard cut-points suggested by Schober et al. (2018): 0.90 to 1.00 = very strong, 0.70 to 0.89 = strong, 0.40 to 0.69 = moderate, 0.10 to 0.39 = weak, and 0.00 to 0.10 = negligible.
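The AVE and MSV rules above can be expressed as a short Python sketch. It assumes standardized loadings and latent correlations taken from a fitted CFA; the helper names and all numbers are hypothetical. Because MSV is the largest squared correlation of a construct with any other construct, requiring AVE > MSV is the same as requiring the AVE to exceed every squared inter-construct correlation (the Fornell-Larcker comparison). The last helper applies the Schober et al. (2018) bands to |r|.

```python
def ave(loadings):
    """Average variance extracted: mean of squared standardized loadings."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


def msv(construct, correlations):
    """Maximum shared variance: largest squared correlation with any other construct.

    `correlations` maps frozenset({a, b}) to the latent correlation between a and b.
    """
    return max(r ** 2 for pair, r in correlations.items() if construct in pair)


def validity_report(loadings_by_construct, correlations):
    report = {}
    for name, loadings in loadings_by_construct.items():
        a, m = ave(loadings), msv(name, correlations)
        report[name] = {
            "AVE": round(a, 3),
            "MSV": round(m, 3),
            "convergent": a > 0.5,   # AVE above 0.5
            "discriminant": m < a,   # MSV below AVE
        }
    return report


def correlation_label(r):
    """Schober et al. (2018) bands, applied to |r|."""
    size = abs(r)
    for cut, label in [(0.90, "very strong"), (0.70, "strong"), (0.40, "moderate"), (0.10, "weak")]:
        if size >= cut:
            return label
    return "negligible"


# Hypothetical standardized loadings and latent correlations (not study values).
loadings = {
    "intrinsic": [0.84, 0.88, 0.81],
    "identified": [0.79, 0.83, 0.76],
    "amotivation": [0.72, 0.80, 0.77],
}
correlations = {
    frozenset({"intrinsic", "identified"}): 0.62,
    frozenset({"intrinsic", "amotivation"}): -0.35,
    frozenset({"identified", "amotivation"}): -0.28,
}
for name, row in validity_report(loadings, correlations).items():
    print(name, row)
print(correlation_label(0.082), correlation_label(-0.122))
```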

Single and Multi-Modal Confirmatory Factorial Analysis (SMG-CFA)
We used confirmatory factor analysis to test the construct validity of the WTMST (Fernet et al., 2008) for two reasons. First, CFA provides evidence of the validity of individual measures based on overall model fit and other evidence of construct validity (Hair et al., 2019). Second, the variables were hypothesized in advance on theoretical grounds and had been empirically confirmed, rather than derived from the data (Lei & Wu, 2007).
Model comparison allows competing structural assumptions to be tested against each other and supports firmer conclusions about construct validity. Two approaches are recommended for selecting the best model: first, information criteria such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC); second, fit indices from maximum likelihood (ML) estimation, such as the Tucker-Lewis index (TLI), the comparative fit index (CFI), and the root mean square error of approximation (RMSEA) (Liang & Luo, 2020; Wang et al., 2018).
To achieve this, we have compared correlated factor models, the higher-order model, the single-factor model, and the bi-factor model of WTMST.
The CFA models examined in this study were as follows: Model 1. First, we examined the original inter-correlated five-factor model of the WTMST (Fernet et al., 2008).
Model 2. Single-factor model. All items of the five core pillars of the work task motivation model loaded on a single factor, testing a general factor against the multidimensional factor structure (Immekus & Imbrie, 2008).
Model 3. Bi-factor model. This model examines the dimensions of work task motivation that are independent of the general factor (work task motivation) and tests both the general factor and the specific dimensions as primary concerns (Chen et al., 2006; Wammerl et al., 2019). It also allows investigation of misfit caused by fitting a unidimensional model to multidimensional data and can justify the creation of subscales (Immekus & Imbrie, 2008).
Model 4. Higher-order factor model. Higher-order models were used to examine whether the lower-order factors of the WTMST are significantly associated with each other (Chen et al., 2006). Higher-order factors also reduce the number of path model relationships and provide a means of handling collinearity among formative indicators (Johnson et al., 2011). Moreover, bi-factor and higher-order factor models are the two best alternative approaches for representing general constructs that comprise several highly correlated domains (Chen et al., 2006).
To evaluate the fit of our single-group and multigroup CFA models, we did not rely on the chi-square test (χ2) because of its sensitivity to sample size, following Barrett (2007) and Steiger (2007). Instead, we used the most widely reported goodness-of-fit indices (GFI, TLI, CFI, and RMSEA) together with the information criteria AIC and BIC, as strongly recommended by Stockdale et al. (2002). The GFI evaluates how well the hypothesized model reproduces the observed data, whereas the TLI and CFI compare the hypothesized model with the independence (null) model. Moreover, the TLI is relatively independent of sample size, while the CFI estimates how well the sample model fits the population (Stockdale et al., 2002). For GFI, TLI, and CFI, the recommended cut-points are: 1 = exact fit, 0.95 to 0.99 = close fit, 0.90 to 0.95 = acceptable fit, 0.85 to 0.90 = mediocre fit, and < 0.85 = poor fit (Hu & Bentler, 1999). For RMSEA, the cut-points are: 0 = exact fit, 0.01 to 0.049 = close fit, 0.05 to 0.079 = good fit, 0.08 to 0.09 = mediocre fit, and > 0.10 = poor fit (Hu & Bentler, 1999). For groups of 10 to 20, Hu and Bentler (1999) suggested an RMSEA cut-point of ≤ 0.08 for acceptable fit.
In addition to goodness-of-fit indices, information criteria such as AIC and BIC are the most appropriate for model comparison and are valuable for selecting a good model (Hooper et al., 2008). For example, smaller AIC and BIC values for the default model than for the saturated and independence models indicate good fit; these statistics require a sample size of 200 or more to be reliable (Hooper et al., 2008).
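The cut-points above can be encoded as simple lookup functions. The Python sketch below is a hypothetical helper, not part of Amos or any SEM package; it follows the bands as listed, with one stated assumption: RMSEA values between 0.09 and 0.10, which the list does not cover, are treated as mediocre. The usage example plugs in the teaching-task bi-factor indices reported in the Results.

```python
def classify_incremental(value):
    """Bands for GFI, TLI and CFI as listed in the text."""
    if value >= 1.0:
        return "exact"
    for cut, label in [(0.95, "close"), (0.90, "acceptable"), (0.85, "mediocre")]:
        if value >= cut:
            return label
    return "poor"


def classify_rmsea(value):
    """Bands for RMSEA as listed in the text.

    Assumption: values between 0.09 and 0.10, which the text does not list, are
    treated as mediocre; only values above 0.10 count as poor.
    """
    if value <= 0.0:
        return "exact"
    for cut, label in [(0.05, "close"), (0.08, "good")]:
        if value < cut:
            return label
    return "mediocre" if value <= 0.10 else "poor"


def acceptable_fit(cfi, tli, rmsea):
    """Overall criterion used in the text: CFI and TLI >= 0.90, RMSEA < 0.08."""
    return cfi >= 0.90 and tli >= 0.90 and rmsea < 0.08


# Teaching-task bi-factor indices as reported in the Results.
reported = {"GFI": 0.977, "TLI": 0.979, "CFI": 0.985}
print({index: classify_incremental(v) for index, v in reported.items()}, classify_rmsea(0.037))
print(acceptable_fit(cfi=0.985, tli=0.979, rmsea=0.037))
```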
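A minimal sketch of the two information criteria, assuming the chi-square-based forms AIC = χ² + 2q and BIC = χ² + q ln N, where q is the number of freely estimated parameters and, for a covariance-only model with p observed items, q = p(p + 1)/2 − df. With p = 15 items and N = 1,117, these forms reproduce, to within rounding, the AIC and BIC values reported for the teaching-task models in the Results. The function names are hypothetical.

```python
import math


def free_parameters(n_items, df):
    """q = p(p + 1)/2 - df for a covariance-structure model without a mean structure."""
    return n_items * (n_items + 1) // 2 - df


def aic(chi2, q):
    return chi2 + 2 * q


def bic(chi2, q, n):
    return chi2 + q * math.log(n)


# Teaching-task chi-square values and degrees of freedom reported in the Results.
models = {"bi-factor": (189.74, 75), "correlated factor": (243.18, 80), "higher-order": (289.17, 85)}
results = {}
for name, (chi2, df) in models.items():
    q = free_parameters(15, df)
    results[name] = (round(aic(chi2, q), 2), round(bic(chi2, q, 1117), 2))
    print(name, q, *results[name])

# Lower values are preferred; here both criteria favour the same model.
print(min(results, key=lambda m: results[m][0]), min(results, key=lambda m: results[m][1]))
```

BIC penalizes each extra parameter by ln N (about 7.02 here) rather than 2, so it favours simpler models more strongly than AIC; agreement between the two criteria makes the model choice more robust.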

Measurement Invariance
Measurement invariance (MI), or equivalence, means that the same instrument measures the same construct without bias across languages and cultural backgrounds (Eremenco et al., 2005), and it must be established before groups (e.g., defined by culture, gender, age, or education) can be compared (Davidov et al., 2014). Non-equivalence is detected by testing MI across groups (Chen, 2007; Putnick & Bornstein, 2016). According to Byrne (2010), the CFA model must first be tested separately in each group for evidence of fit before MI testing begins.
For MI, we tested the psychometric equivalence of the constructs across groups using CFA (Putnick & Bornstein, 2016). In the initial stage, the CFA models were tested separately for each subgroup of gender, age, university type, and teaching experience, because good model fit is a prerequisite for testing MI (Byrne & van de Vijver, 2010). We then followed the well-established four-stage MI procedure (Millsap, 2011; Putnick & Bornstein, 2016; Vandenberg & Lance, 2000). In stage 1, configural invariance was tested to establish an unconstrained baseline model in which the same factor structure holds across all groups (Vandenberg & Lance, 2000). In stage 2, metric measurement invariance (MMI) was tested by constraining the factor loadings to be equal across groups, to check whether the groups respond to the indicators similarly. In stage 3, scalar or strong measurement invariance (SMI) was tested by constraining both the indicator intercepts and the factor loadings to be equal across groups. Finally, in stage 4, residual or strict measurement invariance (RMI) was tested, which additionally requires the item residuals of the metric- and scalar-invariant items to be equal across groups (Putnick & Bornstein, 2016). Following Millsap (2011) and Putnick and Bornstein (2016), the four sequential stages were evaluated with single-group and multigroup CFA using the following criteria: ΔTLI of 0 = perfect and ≤ 0.01 = acceptable, and ΔRMSEA ≤ 0.015 for metric, scalar, and residual invariance (Chen, 2007; Putnick & Bornstein, 2016).
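The sequential decision rule above can be sketched in Python: each stage is compared with the preceding, less constrained one, and testing stops at the first stage whose fit deteriorates beyond the cut-offs. This is a hypothetical helper operating on fit values already produced by SEM software; it assumes the change in CFI is used alongside ΔRMSEA (the Results use ΔCFI), defaults to 0.01 and 0.015 at every stage, and uses made-up fit values in the example. The looser metric-stage cut-offs adopted in the Results can be supplied through the `cutoffs` argument.

```python
# Hypothetical default cut-offs: maximum allowed drop in CFI and rise in RMSEA
# relative to the preceding, less constrained model.
DEFAULT_CUTOFFS = {
    "metric": (0.01, 0.015),
    "scalar": (0.01, 0.015),
    "residual": (0.01, 0.015),
}


def invariance_level(stages, cutoffs=DEFAULT_CUTOFFS):
    """Return the most constrained invariance level supported by the fit sequence.

    `stages` is an ordered list of (name, cfi, rmsea) tuples from configural to
    residual. Returns None when the configural baseline itself fits poorly.
    """
    name, cfi, rmsea = stages[0]
    if cfi < 0.90 or rmsea >= 0.08:  # good baseline fit is a prerequisite
        return None
    reached = name
    for (_, prev_cfi, prev_rmsea), (name, cfi, rmsea) in zip(stages, stages[1:]):
        max_drop_cfi, max_rise_rmsea = cutoffs[name]
        if prev_cfi - cfi > max_drop_cfi or rmsea - prev_rmsea > max_rise_rmsea:
            break
        reached = name
    return reached


# Hypothetical fit values for one grouping variable (not study results).
example = [("configural", 0.980, 0.030), ("metric", 0.978, 0.031),
           ("scalar", 0.975, 0.032), ("residual", 0.960, 0.036)]
print(invariance_level(example))  # scalar: the residual step loses 0.015 in CFI

looser = dict(DEFAULT_CUTOFFS, metric=(0.02, 0.03))
print(invariance_level(example, looser))
```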

Convergent, Divergent and Discriminant Validity
We evaluated the validity of teachers' multidimensional work task motivation for teaching and student evaluation tasks based on AVE and MSV scores (see Table 3). All five motivation constructs for both teaching and student evaluation tasks showed good convergent validity (AVE > 0.5), indicating that the items of each factor share substantial variance with their latent construct. Three methods were used to test the discriminant validity of teachers' work task motivation for both teaching and student evaluation tasks.
First, we checked whether each construct's AVE exceeded its MSV (see Table 3). For all sub-constructs of the motivation scale (teaching tasks), AVE was greater than MSV, with the following results (a …). Second, to assess discriminant validity, we compared each construct's AVE with its squared correlations with the other constructs (see Table 3) to check whether the AVE is higher than these squared correlations (Hair et al., 2019). By this criterion, the AVE of every construct in this study was higher than its squared correlations with the other constructs, which suggests that each factor's variance is better explained by its own items than by other factors.
Third, to demonstrate the convergent validity of teachers' work task motivation, we tested its association with another relevant positive psychology construct, teacher well-being. Work task motivation for both teaching and student evaluation tasks was positively and significantly related to teacher well-being (r = 0.1328, p < 0.01, and r = 0.082, p < 0.01, respectively). We also tested the divergent validity of teachers' work task motivation by examining its relationship with a negative psychological construct, burnout. Motivation for student evaluation tasks was negatively and significantly correlated with burnout (r = -0.122, p < 0.05). However, motivation for teaching tasks was positively associated with burnout (r = 0.106, p < 0.05) (see Table 4).
Based on these results, the work task motivation of teachers for teaching and student evaluation tasks in Ethiopian higher education settings largely meets the requirements of convergent, discriminant, and divergent validity, although the positive correlation between teaching-task motivation and burnout does not support divergent validity for that task. The results further suggest that the instruments are applicable.

Confirmatory Factor Analysis: Model Comparison and Evaluation
We examined four competing models (single-factor, correlated factor, bi-factor, and higher-order factor models) of teachers' motivation for teaching and student evaluation tasks and selected the best-fitting model for subsequent measurement invariance (equivalence) testing. The selection criteria were the highest GFI, TLI, and CFI, the lowest RMSEA, and the lowest AIC and BIC. On these criteria, the bi-factor model outperformed the other models for both teaching and student evaluation tasks (see Table 5). Figures 4 (a–d) illustrate the four competing models of the WTMST for teaching and student evaluation tasks. As Table 5 shows, all models except the single-factor model, namely the correlated factor, bi-factor, and higher-order factor models of the WTMST for both tasks, showed excellent fit, and the bi-factor model fitted best for both tasks and was therefore used for the measurement invariance analysis. For teaching tasks, the bi-factor, correlated factor, and higher-order factor models yielded, respectively: χ2 (75) = 189.74, p < 0.001, χ2/df = 2.53, GFI = 0.977, TLI = 0.979, CFI = 0.985, RMSEA = 0.037, 95% CI [0.031, 0.044], AIC = 279.74, BIC = 505.56; χ2 (80) = 243.18, p < 0.001, χ2/df = 3.04, GFI = 0.972, TLI = 0.971, CFI = 0.978, RMSEA = 0.043, 95% CI [0.037, 0.049], AIC = 323.18, BIC = 523.92; and χ2 (85) = 289.17, p < 0.001, χ2/df = 3.40, GFI = 0.965, TLI = 0.966, CFI = 0.973, RMSEA = 0.046, 95% CI [0.041, 0.052], AIC = 359.17, BIC = 534.81.
For student evaluation tasks, the three best-fitting models yielded: χ2 (76) = 202.72, p < 0.001, χ2/df = 2.67, GFI = 0.977, TLI = 0.985, CFI = 0.989, RMSEA = 0.039, 95% CI [0.032, 0.045], AIC = 290.72, BIC = 511.53 for the bi-factor model; χ2 (80) = 286.59, p < 0.001, χ2/df = 3.58, GFI = 0.967, TLI = 0.977, CFI = 0.983, RMSEA = 0.048, 95% CI [0.042, 0.054], AIC = 366.59, BIC = 567.24 for the correlated factor model; and χ2 (85) = 320.04, p < 0.001, χ2/df = 3.77, GFI = 0.948, TLI = 0.976, CFI = 0.980, RMSEA = 0.050, 95% CI [0.044, 0.056], AIC = 390.04, BIC = 565.68 for the higher-order factor model. Figure 1 illustrates the four competing models of the WTMST for teaching and student evaluation tasks. The bi-factor model of the WTMST for teaching and student evaluation tasks showed the highest GFI, TLI, and CFI, the lowest RMSEA, and the lowest AIC and BIC of the four CFA models. The higher-order and correlated factor models showed broadly similar results, whereas the single-factor model showed the worst fit and was rejected because its indices failed the standard global cut-offs. The poor fit of the single-factor model implies that the WTMST for both teaching and student evaluation tasks is multidimensional. Hence, three models fitted the WTMST for teaching and student evaluation tasks well in this study: the bi-factor, correlated factor, and higher-order factor models.
Additionally, the factor loadings of the three well-fitting models (bi-factor, correlated, and higher-order factor models) were high and significant in all five dimensions (p < 0.001). In all three models, all 15 items of the WTMST for teaching and student evaluation tasks had factor loadings above 0.40 (see Table 6).
In sum, based on the goodness-of-fit criteria (GFI, TLI, and CFI > 0.90; RMSEA < 0.08) and the lowest AIC and BIC values, the bi-factor model fitted best among the four competing models and was used for the subsequent measurement invariance analysis, to give a more complete picture of cross-cultural validity.

Measurement Invariance (MI) WTMST for Both Teaching and Student Evaluation Tasks
The researchers considered the following cut-points as the most fitting.MI: ΔCFI = 0.02 and ΔRMSEA = 0.03 for the metric invariance test and ΔCFI = 0.01 and ΔRMSEA ≤ 0.015 for the scalar and residual tests (Putnick & Bornstein, 2016).The teaching and student evaluation tasks of WTMST were constructed independently across groups, and the model shows a good fit.Single CFA was performed before testing MI (see Table 7).Figure 2 clearly illustrates the measurement invariance across groups of the teaching and student evaluation tasks of WTMST.
Experience in teaching. For university teachers' years of teaching experience (fewer than 5 years, 6 to 10 years, and 11 years or more), the single-group models showed an excellent fit to the data (see Table 7), and invariance held in the configural, metric, scalar and residual tests. Because the strict (residual) model was supported, all item loadings, intercepts and residual variances can be regarded as equal across the three levels of teaching experience.
The hybrid WTMST model of teaching and student evaluation tasks assumes that both teaching and student evaluation tasks are potential positive sources of work task motivation. According to Fernet et al. (2008), motivation toward teaching tasks and motivation toward student evaluation tasks on the WTMST are positively correlated. The hybrid two-factor model (see Figure 3) was fitted to these data with the following result: χ2 (394) = 1020.49, χ2/df = 2.59, GFI = 0.941, TLI = 0.965, CFI = 0.968, RMSEA = 0.038, 95% CI [0.035, 0.041]. However, although the fit of this hybrid model was good, the two task factors were negatively correlated in this study, which does not support Hypothesis 5, namely that teaching and student evaluation tasks are highly associated and support each other. This result might be due to cultural differences and to teachers' motivational preferences regarding their job tasks.

Discussion
This study sought to validate and evaluate four competing models (single-factor, correlated, bi-factor and higher-order) of the WTMST for teaching and student evaluation tasks by developing and examining a culturally adapted Amharic version of the WTMST with a large sample of instructors from Ethiopian public universities. In contrast to Fernet et al. (2008), who found a good fit for the WTMST using the correlated factor model, we also tested the single-factor, bi-factor and higher-order models in a diverse sample of public universities. These additional models gave further insight into the multi-dimensional nature of the WTMST for teaching and student evaluation tasks.
Given the lack of SDT-based, teacher-specific motivation measures in the Ethiopian context, and in light of previous research, the cross-cultural adaptation and validation of such an instrument is clearly important and useful.
The Amharic version of the WTMST, developed strictly following international guidelines for cultural adaptation (Davidov et al., 2014), showed high reliability and strong construct validity in the three competing models (bi-factor, correlated factor and higher-order), as well as good convergent validity. In addition, the CFA of the WTMST for both teaching and student evaluation tasks showed that the bi-factor model fitted much better than the correlated and higher-order factor models.
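Convergent validity of a CFA subscale is often judged from its standardized loadings through composite reliability (CR) and average variance extracted (AVE). The Python sketch below shows these two standard formulas, assuming the widely used rules of thumb CR ≥ 0.70 and AVE ≥ 0.50; whether the study used exactly these criteria is an assumption, the study's own indices are in Table 3, and the loadings below are hypothetical.

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    # Error variance of a standardized item is 1 - loading^2.
    error = sum(1 - lam ** 2 for lam in loadings)
    return total ** 2 / (total ** 2 + error)


def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


# Hypothetical standardized loadings for one three-item WTMST subscale.
loadings = [0.72, 0.80, 0.68]
cr = composite_reliability(loadings)
ave = average_variance_extracted(loadings)
print(f"CR = {cr:.2f}, AVE = {ave:.2f}")
# Common rules of thumb: CR >= 0.70 and AVE >= 0.50.
print("convergent validity supported:", cr >= 0.70 and ave >= 0.50)
```

For these illustrative loadings CR is about 0.78 and AVE about 0.54, so both rules of thumb would be met.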
The correlated and higher-order factor models produced broadly similar results, and both fitted better than the single-factor model. Because it did not meet the global cut-off criteria, the single-factor model was rejected as the worst-fitting of the four competing CFA models, which supports a multi-dimensional WTMST. The good fit of the multi-dimensional models and the poor fit of the single-factor model for both teaching and student evaluation tasks indicate that the dimensions are distinct. To our knowledge, this is the first study to confirm the multi-dimensionality of the WTMST for teaching and student evaluation tasks by evaluating four competing CFA models. Furthermore, this study tested MI across various groups (gender, age, university type and teaching experience).
The current study used single-group and multi-group CFA to test whether the bi-factor model of the WTMST for teaching and student evaluation tasks is invariant across gender, age, university type and teaching experience. In the first step, single-group CFAs showed acceptable fit indices for the subgroups of gender (male versus female), age (25-35, 36-45 and 46+ years old), university type (research, applied and general) and teaching experience (below 5, 6-10 and 11+ years). The only exception was the subgroup aged 46 and above (n = 57), whose fit was slightly lower than that of the other age subgroups. Measurement invariance was then tested with multi-group CFA. The models were invariant across all groups except age, for both the teaching and student evaluation tasks of the WTMST. The present findings regarding MI are consistent with the existing literature (Cheung & Rensvold, 2002; Millsap, 2011; Putnick & Bornstein, 2016; Vandenberg & Lance, 2000; van de Schoot et al., 2012).

Conclusion
Based on comparisons of multiple CFA models, this study shows that the adapted Amharic version of the WTMST for teaching and student evaluation tasks is psychometrically sound, reliable, valid and invariant, and it offers the scientific community a validated scale designed specifically for assessing university instructors' motivation. It should be noted, however, that the study was conducted with university instructors only, using large-scale data from Ethiopian higher education. The findings were also triangulated with several recommended advanced statistical methods to establish the psychometric properties of the WTMST for teaching and student evaluation tasks, both separately and combined.

Recommendations
The WTMST should be further validated with large-scale data from different types of universities, such as research, applied and general universities. In addition, comparing CFA models for teaching and student evaluation tasks and testing invariance across various groups may offer solid methodological evidence applicable to educational settings. However, this study targeted only two of the six teacher tasks (teaching and student evaluation); the other four are administrative, classroom preparation, classroom management and complementary tasks. Further studies should therefore focus on the remaining teacher tasks, using the multi-dimensional WTMST across various languages and cultures. In addition, primary and secondary school teachers were not involved in this study, so future inquiries should address this gap and compare school levels to inform interventions on teachers' motivation. Most notably, only university teachers took part in the validation and measurement invariance testing in Ethiopian higher education settings. Future studies should therefore include school administrators, students and teachers when assessing the WTMST for teaching and student evaluation tasks, to support better decisions in educational settings.
Furthermore, future studies could extend the present research by incorporating other aspects of teachers' motivational tasks and their association with teacher well-being and stress. Finally, the data in this study were self-reported and collected with a purely quantitative approach. Future studies should therefore combine quantitative and qualitative approaches, such as case studies, interviews and self-report measures, to triangulate findings and capture teachers' context-specific motivational problems.

Limitations
Several limitations of the study must be acknowledged. First, this study assessed only university teachers in Ethiopian higher education settings, although it drew on both the global and the context-specific scientific literature on the WTMST and teachers' relevant tasks. The WTMST should therefore also be useful for exploring the association between teacher well-being and burnout. Second, replicating the WTMST with two or more tasks in various nations would increase the relevance of the instrument in teaching contexts. Third, this Amharic version of the WTMST does not yet cover the remaining four of the six teacher tasks in Fernet et al.'s (2008) framework, i.e., administrative, classroom preparation, classroom management and complementary tasks. Future studies could fill these gaps by examining the remaining teacher tasks and relating them to teacher well-being in other educational and organizational settings.

Figure 1. The Four Competing CFA Models of the Work Task Motivation Scale for Teachers for Both Teaching and Student Evaluation Tasks
Note: (a) single-factor model, (b) bi-factor model, (c) higher-order factor model, and (d) five-factor correlated model.

Figure 2. Measurement Invariance of the Bi-factor Model of Teachers' Motivation for Both Teaching and Student Evaluation Tasks
Note: (a) configural MI (the same factor structure), (b) metric MI (equal factor loadings), (c) scalar MI (equal intercepts), and (d) residual MI (equal residual, or error, variances). The bi-factor model was used for all MI tests.
003] Note.N = 1,117; * p< 0.001; MI = measurement invariance, RMSEA = root mean squared error of approximation, TLI = Tucker-Lewis index; CFI = comparative fit index; ΔRMSEA = change root mean squared error of approximation, ΔTKI = change Tucker-Lewis index; ΔCFI = change comparative fit index.The Two-Factor Model of WTMST for both Teaching and Student Evaluation Tasks

Table 2
Descriptive Statistics: Mean, Standard Deviation, Skewness and Kurtosis Scores

Table 3
Reliability and Validity Indices of the Work Task Motivation Scale for Teachers (WTMST) for Teaching and Student Evaluation Tasks

Table 5
Comparison of Fit Indices in the Four Competitive Models for WTMST for both Teaching and Student Evaluation Tasks

Table 6
Standardized Factor Loadings for the Four Competing Models on the Work Task Motivation Scale for Teachers

Table 7
Single-Group CFA and Measurement Invariance of the WTMST (Teaching and Student Evaluation Tasks) across Various Groups