


Discussion

The psychometric properties of the ATST were examined using Rasch analyses. Before interpreting the results of these analyses, it is important to check the unidimensionality of the scale to ensure that the data fit the model reasonably well (Green & Denver, 2002). Previous studies have shown that internal consistency is often mistaken for unidimensionality (Garner, 1995; Kind, Jones & Barmby, 2007). The two are not equivalent: because all items in a unidimensional scale measure the same construct, they will be internally consistent, but an internally consistent scale may still comprise more than one factor. Internal consistency therefore does not imply unidimensionality. According to Kind, Jones, and Barmby (2007), a good instrument should be both internally consistent and unidimensional. Thus, beyond internal consistency, it is important to examine the unidimensionality of the ATST.
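To make this distinction concrete, the short simulation below is an illustrative sketch only (Python with NumPy, hypothetical data, not part of the ATST analysis): it generates responses driven by two unrelated factors and still yields a high Cronbach's alpha, showing that internal consistency by itself cannot establish unidimensionality.

# Illustrative sketch only (not part of the ATST analysis): simulate a
# two-factor scale and show that Cronbach's alpha can still be high,
# i.e. internal consistency does not guarantee unidimensionality.
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 500, 10

# Two unrelated latent factors, each driving half of the items
f1 = rng.normal(size=(n_persons, 1))
f2 = rng.normal(size=(n_persons, 1))
true_scores = np.hstack([np.repeat(f1, 5, axis=1), np.repeat(f2, 5, axis=1)])
scores = true_scores + rng.normal(scale=0.5, size=(n_persons, n_items))

# Cronbach's alpha = (k/(k-1)) * (1 - sum of item variances / variance of total score)
k = n_items
item_variances = scores.var(axis=0, ddof=1).sum()
total_variance = scores.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_variances / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")  # typically around .85 despite two factors

Under these assumptions alpha typically comes out around .85 even though half of the items measure one factor and half measure another, which is exactly the scenario described above.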

Based on the findings, a Cronbach's alpha of .94 indicated the internal consistency of the ATST. In addition, fit statistics were used to show how well the raw data fit the Rasch model (Wu & Chang, 2008). The expected value of the infit and outfit MNSQ for both persons and items is 1.00. In this study, the infit MNSQ was 1.02 for persons and 1.00 for items, while the outfit MNSQ was 1.02 for both persons and items. The mean standardized fit statistic (ZSTD) for infit and outfit is expected to be 0.0; here, the infit and outfit ZSTD for persons were both -.3, and for items were -.1 and .2 respectively. Furthermore, the point-biserial correlations for all items were positive, ranging from .40 to .70, indicating that item-level scoring accords with the underlying construct. Taken together, these results provide sufficient evidence of unidimensionality.
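For reference, the mean-square fit statistics reported above are conventionally defined from the standardized residuals of the Rasch model (a standard formulation stated here for clarity, not reproduced from this paper). With observed response x_{ni}, model-expected response E_{ni} and model variance W_{ni} for person n on item i,

\[
z_{ni} = \frac{x_{ni} - E_{ni}}{\sqrt{W_{ni}}}, \qquad
\text{Outfit MNSQ}_{i} = \frac{1}{N}\sum_{n=1}^{N} z_{ni}^{2}, \qquad
\text{Infit MNSQ}_{i} = \frac{\sum_{n=1}^{N} W_{ni}\, z_{ni}^{2}}{\sum_{n=1}^{N} W_{ni}},
\]

with the person statistics defined analogously by summing over items. Values near 1.00, as observed here, indicate that the observed variation in the responses is close to the amount the model predicts.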

After unidimensionality was established, item fit statistics were used to further evaluate item-level model fit. Only one of the 40 items fell outside the reasonable range, namely Item 3: "Scientists usually like to go to their laboratories when they have a day off." However, the deviation is small (infit MNSQ = 1.46; outfit MNSQ = 1.50). In most previous studies, misfitting items were eliminated from the item pool. For instance, Clements, Sarama, and Liu (2008) eliminated 37 items that showed either poor fit or poor correlation with the scale. Similarly, Wu and Chang (2008) excluded two misfitting items from their study.
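The screening step described here can be expressed as a simple rule. The sketch below is illustrative only: the 0.6 to 1.4 range is a commonly cited rule of thumb for rating-scale items and is an assumption rather than necessarily the criterion applied in this study, and apart from Item 3 the values are hypothetical.

# Illustrative sketch (assumed thresholds, partly hypothetical data): flag items
# whose infit/outfit MNSQ fall outside a chosen "reasonable" range.
item_fit = {
    "Item 3": {"infit": 1.46, "outfit": 1.50},  # values reported above
    "Item 4": {"infit": 0.98, "outfit": 1.01},  # hypothetical well-fitting item
}

LOW, HIGH = 0.6, 1.4  # assumed acceptance range (rule of thumb, not the study's criterion)

for item, stats in item_fit.items():
    within = LOW <= stats["infit"] <= HIGH and LOW <= stats["outfit"] <= HIGH
    status = "acceptable fit" if within else "flag for review"
    print(f"{item}: infit={stats['infit']:.2f}, outfit={stats['outfit']:.2f} -> {status}")

As the surrounding discussion argues, an item flagged in this way should be reviewed rather than deleted automatically.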

In practice, however, no item fits the Rasch model perfectly; unidimensionality is merely a statistical assumption. In addition, Choe (2010) pointed out that, beyond the ability being measured, it is entirely possible that other unintended factors intervene in the testing procedure. Moreover, fit statistics indicate "relative" rather than "absolute" fit (Wu & Adams, 2007): an item that misfits in one test may fit well among the items of another test. Hence, rather than serving as a rule for accepting or eliminating items from an item pool, fit statistics should serve as an indicator for detecting problematic items, as Wu and Adams (2007) note:

The fit statistics should not be used blindly to reject items, particularly those that “over-fit”, as you may remove the best items in your test because the rest of the items are not as “good” as these items (p. 85).

Hence, fit statistics should be used to demonstrate sufficient evidence of unidimensionality rather than to set a cut-off point for item selection. Bradley and Sampson (2006) took the same standpoint as Wu and Adams (2007): they constructed a Mathematics Quality Survey and examined its psychometric properties using Rasch analysis, and although one item fell outside the reasonable range, it was not removed from the scale; instead, the related results were interpreted with caution. In addition, Wilson (2005) pointed out that the researcher should judge whether to delete or keep an item flagged as misfitting under the Rasch model. Considering these points, the sufficient evidence of the unidimensionality of the ATST, and the small deviation of the infit and outfit values, Item 3 was not removed from the item pool.

After the appraisal of data-model fit, the psychometric properties of the ATST were analyzed and interpreted using the Rasch model. Based on the findings, both person reliability and item reliability are very high. High person reliability indicates the replicability of the person ordering we could expect if another parallel set of items measuring the same trait were given to the same group of respondents. Meanwhile, high item reliability indicates the replicability of the item ordering we could expect if the test were given to another group of respondents.
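In Rasch terms, both reliabilities are conventionally estimated as the proportion of observed variance in the measures that is not attributable to measurement error (a standard definition given here for clarity, not a formula taken from this paper):

\[
R_{\text{person}} = \frac{SD^{2}_{\text{person}} - MSE_{\text{person}}}{SD^{2}_{\text{person}}}, \qquad
R_{\text{item}} = \frac{SD^{2}_{\text{item}} - MSE_{\text{item}}}{SD^{2}_{\text{item}}},
\]

where SD^{2} is the observed variance of the person (or item) measures and MSE is the mean square standard error of those measures.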

Next, the person-item distribution was examined using the person-item map (Figure 3). It is important to ensure that an instrument can detect the full variability in a population; as Alquraan, Alshraideh, and Bsharah (2010) argue, an instrument should be able to assess individuals at both high and low levels of the measured trait. The findings showed that the items did not cover the full range of the trait. Theoretically, items with locations below -.75 and above .83 should be added to extend the range of the trait measured. However, only approximately one fifth of the students lie above, and few students lie below, the range that the items can measure. Furthermore, there is no noticeable gap in difficulty among the items, and the distributions of items and persons are approximately normal. Moreover, the ATST is not a high-stakes test used for selection purposes. Therefore, it is still acceptable that the ATST does not cover the range of the trait for the approximately one fifth of students who endorse its items easily.
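The targeting check described in this paragraph can be illustrated with a minimal sketch. The item range of -.75 to .83 logits is taken from the findings above, but the person measures below are simulated purely for illustration and do not reproduce the study's data.

# Illustrative sketch (hypothetical person measures): estimate the share of
# persons whose measures fall outside the range spanned by the item locations.
import numpy as np

rng = np.random.default_rng(1)
person_measures = rng.normal(loc=0.5, scale=1.0, size=300)  # hypothetical sample
item_min, item_max = -0.75, 0.83                            # item range reported above

above = np.mean(person_measures > item_max)
below = np.mean(person_measures < item_min)
print(f"Persons above the hardest item location: {above:.1%}")
print(f"Persons below the easiest item location: {below:.1%}")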

The psychometric properties of the rating scale used in the ATST were also examined. The ATST uses a five-point Likert scale, one of the most common attitude scales in social science research (Adelson & McCoach, 2010). The rating scale has a great influence on the quality of the data obtained (Bond & Fox, 2001). Although the Likert scale is used widely around the world, one of the major problems in using it is that the optimal number of response categories remains undetermined (Adelson & McCoach, 2010; Preston & Colman, 2000). Based on the findings, the average measures and the category probability curves show that the five-point Likert scale is suitable for the ATST. In conclusion, the Rasch analyses provide strong indications that the ATST used in this study has sufficient evidence of sound psychometric properties.
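As a closing illustration of the category diagnostics summarized above, the sketch below checks one of the usual criteria for a well-functioning rating scale, namely that the average person measure advances monotonically from the lowest to the highest response category. The values are hypothetical and are not the study's results.

# Illustrative sketch (hypothetical values): check that average person measures
# increase monotonically across the five response categories.
avg_measures = {1: -1.12, 2: -0.41, 3: 0.05, 4: 0.62, 5: 1.30}  # hypothetical

ordered = [avg_measures[c] for c in sorted(avg_measures)]
monotonic = all(a < b for a, b in zip(ordered, ordered[1:]))
print("Average measures advance monotonically:", monotonic)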

