Asia-Pacific Forum on Science Learning and Teaching, Volume 18, Issue 1, Article 10 (Jun., 2017)
Testing the assumptions of regression analysis
Before the regression analysis was run, the data were tested against the assumptions of regression analysis. According to Tabachnick and Fidell (2007), several points should be considered before performing regression analysis: the ratio of cases to independent variables, outliers, multicollinearity, singularity, normality, linearity, homoscedasticity, and independence of residuals. First, the ratio of cases to independent variables was examined. Having at least 20 times more cases than independent variables is a necessary condition for sample size (Tabachnick & Fidell, 2007). There were six independent variables, which calls for at least 120 cases, and there were 298 cases in this study. Therefore, the sample size was appropriate for regression analysis.
Outliers were then checked using z-scores and Mahalanobis distances, as suggested by Tabachnick and Fidell (2007). Examining z-scores is also an effective way of initially screening the data for normality (Osborne & Overbay, 2004). Three participants whose z-scores fell outside the range of -3 to +3 were removed from the data. In the remaining data (N = 298), the Mahalanobis distances ranged between .459 and 20.370. The critical value at the .001 significance level with 6 degrees of freedom was 22.458 (Tabachnick & Fidell, 2007). Because no case exceeded 22.458 and all z-scores were between -3 and +3 in each dimension, it could be claimed that there were no outliers in the data. According to Tabachnick and Fidell (2007), the data should also be free of multicollinearity before regression analysis is performed. To test for multicollinearity, the condition index (CI), variance inflation factor (VIF), and tolerance values, as well as the correlations among the independent variables, were checked. Table 1 presents the CI, VIF, and tolerance values of the independent variables.
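The outlier screening described above can be sketched in a few lines. This is a minimal illustration and not the authors' procedure: the function names are hypothetical, and the chi-square critical value is recomputed from the closed-form tail probability that holds for even degrees of freedom, on the usual assumption that Mahalanobis distances are compared against a chi-square distribution with as many degrees of freedom as there are variables.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with even df.

    For even df the tail has the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


def chi2_critical_even_df(alpha: float, df: int) -> float:
    """Find the value whose upper-tail probability equals alpha, by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid  # tail probability still above alpha: move right
        else:
            hi = mid
    return (lo + hi) / 2.0


def flag_univariate_outliers(z_scores, limit=3.0):
    """Return the indices of cases whose |z| exceeds the limit."""
    return [i for i, z in enumerate(z_scores) if abs(z) > limit]


# Critical value for alpha = .001 and 6 degrees of freedom (six independent variables).
critical = chi2_critical_even_df(alpha=0.001, df=6)

# Largest Mahalanobis distance reported in the text.
max_distance = 20.370
has_multivariate_outlier = max_distance > critical
```

With the reported maximum distance of 20.370, `has_multivariate_outlier` is `False`, which matches the conclusion drawn in the text.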
Table 1. CI, VIF and tolerance values of the independent variables
| Measures | CI | VIF | Tolerance |
|---|---|---|---|
| Importance | 10.367 | 1.138 | .879 |
| Comprehension | 11.831 | 1.084 | .923 |
| Requirement | 13.373 | 1.176 | .850 |
| Interest | 15.930 | 1.065 | .939 |
| Self-efficacy of learning physics | 19.287 | 1.139 | .878 |
| Mathematics achievement | 27.764 | 1.045 | .957 |
Note: Dependent variable is physics achievement.
The required CI, VIF, and tolerance values are as follows: CI values should be lower than 30, VIF values lower than 10, and tolerance values higher than .20 (Tabachnick & Fidell, 2007). The values presented in Table 1 satisfied these requirements. In addition, high correlations (.90 and higher) among the independent variables can cause multicollinearity in the data (Tabachnick & Fidell, 2007). Therefore, the correlations among the observed variables in this study were also examined (see Table 2).
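As a rough illustration, the Table 1 values can be checked against the stated cut-offs programmatically. The sketch below only re-uses the published numbers; the dictionary layout and names are assumptions made for this example. It also uses the standard identity that tolerance is the reciprocal of VIF, which the reported values should satisfy up to rounding.

```python
# Values transcribed from Table 1 (CI, VIF, tolerance per independent variable).
table1 = {
    "Importance": {"ci": 10.367, "vif": 1.138, "tolerance": 0.879},
    "Comprehension": {"ci": 11.831, "vif": 1.084, "tolerance": 0.923},
    "Requirement": {"ci": 13.373, "vif": 1.176, "tolerance": 0.850},
    "Interest": {"ci": 15.930, "vif": 1.065, "tolerance": 0.939},
    "Self-efficacy of learning physics": {"ci": 19.287, "vif": 1.139, "tolerance": 0.878},
    "Mathematics achievement": {"ci": 27.764, "vif": 1.045, "tolerance": 0.957},
}

# Cut-offs cited in the text (Tabachnick & Fidell, 2007).
CI_MAX, VIF_MAX, TOLERANCE_MIN = 30.0, 10.0, 0.20


def passes_cutoffs(row: dict) -> bool:
    """True when a variable satisfies all three multicollinearity cut-offs."""
    return (
        row["ci"] < CI_MAX
        and row["vif"] < VIF_MAX
        and row["tolerance"] > TOLERANCE_MIN
    )


all_pass = all(passes_cutoffs(row) for row in table1.values())

# Tolerance = 1 / VIF, so the largest gap should be rounding error only.
max_gap = max(abs(1.0 / row["vif"] - row["tolerance"]) for row in table1.values())
```

Here `all_pass` is `True`, and `max_gap` stays below .001, so the VIF and tolerance columns are mutually consistent.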
Table 2. Correlations among observed variables
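| Measures | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1. Importance | - | | | | | | |
| 2. Comprehension | .024 | - | | | | | |
| 3. Requirement | .310** | .113 | - | | | | |
| 4. Interest | .144* | .168** | .132* | - | | | |
| 5. Self-efficacy of learning physics | .180** | .232** | .234** | .144* | - | | |
| 6. Mathematics achievement | .111 | .048 | .172** | .081 | .132* | - | |
| 7. Physics achievement | .044 | .219** | .232** | .080 | .262** | .483** | - |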
** Correlation is significant at the 0.01 level.
* Correlation is significant at the 0.05 level.
As shown in Table 2, the maximum correlation among the independent variables (between ‘importance’ and ‘requirement’, variables 1 and 3) was .310. This value is far below the .90 threshold, which suggests that there is no multicollinearity in the data. In addition, mathematics achievement correlated most strongly with physics achievement (.483) and most weakly with ‘comprehension’ (.048). Another assumption, singularity, requires that no variable be a combination of two or more of the other variables (Tabachnick & Fidell, 2007). In this study, none of the variables was a combination of other variables. Therefore, the singularity assumption was also met.
Finally, the assumptions of normality, linearity, homoscedasticity, and independence of residuals were checked. These assumptions can be checked by examining “the residuals scatterplot” and “the Normal Probability Plot of the regression standardized residuals” (Pallant, 2005, p. 150). When the points in the Normal Probability Plot lie along a reasonably straight diagonal line, a normal distribution is suggested (Pallant, 2005). All the cases lay along a straight line in this study. Furthermore, according to Tabachnick and Fidell (2007), a residuals scatterplot that is roughly rectangular in shape suggests that the assumption of linearity is met. In this study the distribution of the residuals was roughly rectangular, so this assumption was also met. Homoscedasticity can also be checked by examining the residuals scatterplot. If the residuals are randomly scattered around zero and show a relatively even distribution, the data are not heteroscedastic (Osborne & Waters, 2002). In this study, the residuals were randomly scattered around zero and evenly distributed. In conclusion, all the assumptions for running the regression analysis were met.
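The correlation screening can be reproduced directly from Table 2. The following sketch, with hypothetical variable names, stores the lower triangle for the six independent variables only (physics achievement is the dependent variable and is left out) and looks for any pair at or above the .90 threshold.

```python
predictors = [
    "Importance",
    "Comprehension",
    "Requirement",
    "Interest",
    "Self-efficacy of learning physics",
    "Mathematics achievement",
]

# Lower triangle of Table 2, restricted to the independent variables.
lower_triangle = [
    [],
    [0.024],
    [0.310, 0.113],
    [0.144, 0.168, 0.132],
    [0.180, 0.232, 0.234, 0.144],
    [0.111, 0.048, 0.172, 0.081, 0.132],
]

# Map each pair of predictors to its correlation coefficient.
pairs = {
    (predictors[j], predictors[i]): r
    for i, row in enumerate(lower_triangle)
    for j, r in enumerate(row)
}

strongest_pair = max(pairs, key=lambda pair: abs(pairs[pair]))
strongest_r = pairs[strongest_pair]

# Pairs that would signal multicollinearity under the .90 rule of thumb.
flagged = [pair for pair, r in pairs.items() if abs(r) >= 0.90]
```

The strongest pair is ‘importance’ and ‘requirement’ at .310, and `flagged` is empty, in line with the text.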
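To make the idea behind the Normal Probability Plot concrete, the sketch below pairs sorted standardized residuals with the normal quantiles expected at the same ranks and measures how straight the resulting line is. The study's residuals are not available here, so the residuals are simulated and purely hypothetical; the plotting positions (i - 0.5) / n are one common convention and are also an assumption, as are all function names.

```python
import random
from statistics import NormalDist, mean, pstdev


def normal_probability_points(residuals):
    """Pair each expected normal quantile with the sorted standardized residual."""
    n = len(residuals)
    centre, spread = mean(residuals), pstdev(residuals)
    observed = sorted((r - centre) / spread for r in residuals)
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    expected = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, observed))


def pearson_r(points):
    """Pearson correlation between the x and y coordinates of the points."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical residuals: 298 draws from a standard normal distribution.
rng = random.Random(0)
simulated_residuals = [rng.gauss(0.0, 1.0) for _ in range(298)]

points = normal_probability_points(simulated_residuals)
straightness = pearson_r(points)  # close to 1 when the points follow a straight line
```

A value of `straightness` near 1 corresponds to points falling along the diagonal; strongly skewed or heavy-tailed residuals would pull it down.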
Validity and reliability of the scales
Confirmatory factor analysis (CFA; N = 298) was performed to test the construct validity of the scales. The CMIN/df and RMSEA values and the fit indices CFI, GFI, and TLI were examined. The CMIN/df, RMSEA, CFI, GFI, and TLI values of the self-efficacy of learning physics scale were 2.414, .069, .977, .956, and .968, respectively. Furthermore, the factor loadings (FL), measurement errors (ME), and significance of item loadings (p) of the self-efficacy of learning physics scale were examined (see Table 3).
Table 3. Factor loadings (FL), measurement errors (ME), and significance of item loadings (p) of self-efficacy of learning physics scale
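| Item | FL | ME | p |
|---|---|---|---|
| Self-efficacy | | | |
| 1 | .647 | - | - |
| 2 | .738 | .108 | < .001 |
| 3 | .727 | .095 | < .001 |
| 4 | .680 | .105 | < .001 |
| 5 | .702 | .100 | < .001 |
| 6 | .813 | .098 | < .001 |
| 7 | .849 | .097 | < .001 |
| 8 | .756 | .099 | < .001 |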
As shown in Table 3, the minimum factor loading was .647. Measurement errors were below .200, and the item loadings were significant at p < .001. CFA was also performed to obtain the CMIN/df, RMSEA, CFI, GFI, and TLI values of the ATP survey. These values were 1.412, .037, .956, .891, and .951, respectively. Table 4 presents the factor loadings (FL), measurement errors (ME), and significance of item loadings (p) of the ATP scale.
Table 4. Factor loadings (FL), measurement errors (ME), and significance of item loadings (p) of ATP scale
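| Item | FL | ME | p |
|---|---|---|---|
| Importance | | | |
| 29 | .728 | - | - |
| 11 | .775 | .076 | < .001 |
| 9 | .764 | .079 | < .001 |
| 26 | .770 | .078 | < .001 |
| 23 | .737 | .079 | < .001 |
| 28 | .697 | .078 | < .001 |
| 10 | .743 | .071 | < .001 |
| 13 | .676 | .080 | < .001 |
| 27 | .749 | .077 | < .001 |
| 24 | .780 | .079 | < .001 |
| Comprehension | | | |
| 14 | .529 | - | - |
| 19 | .665 | .149 | < .001 |
| 5 | .724 | .149 | < .001 |
| 2 | .402 | .147 | < .001 |
| 20 | .433 | .164 | < .001 |
| 6 | .650 | .158 | < .001 |
| 3 | .565 | .180 | < .001 |
| Requirement | | | |
| 12 | .708 | - | - |
| 1 | .682 | .084 | < .001 |
| 17 | .612 | .096 | < .001 |
| 25 | .620 | .088 | < .001 |
| 30 | .677 | .088 | < .001 |
| 15 | .777 | .089 | < .001 |
| 18 | .797 | .082 | < .001 |
| Interest | | | |
| 7 | .729 | - | - |
| 16 | .806 | .097 | < .001 |
| 21 | .761 | .085 | < .001 |
| 8 | .594 | .095 | < .001 |
| 4 | .503 | .076 | < .001 |
| 22 | .568 | .090 | < .001 |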
As seen in Table 4, the minimum factor loading was .402 and the maximum measurement error was .180. All item loadings were also significant at p < .001. Byrne (2010) suggests that the values of the fit indices CFI, GFI, and TLI should be above .90, the RMSEA value close to zero, and CMIN/df smaller than 3. When the CFA results of the two scales (the self-efficacy of learning physics scale and the ATP scale) were considered, it can be claimed that the two scales have a reasonable fit. Thus the two scales were validated.
Cronbach’s alpha reliability coefficients were also examined to check the reliability of the scales. The alpha coefficient of the self-efficacy of learning physics scale was .905. In addition, the alpha coefficients of the ATP scale’s dimensions ‘importance’, ‘comprehension’, ‘requirement’, and ‘interest’ were .924, .760, .870, and .835, respectively. The ATP scale’s overall alpha was .855. According to Pallant (2005), alpha coefficients above .700 are sufficient to claim that a scale is reliable. Therefore, the alpha coefficients found in this study were at an acceptable level.
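The reported fit values can be compared with the criteria attributed to Byrne (2010) as follows. This is only a sketch: the text gives no numeric cut-off for RMSEA ("close to zero"), so the value .08 used below is an assumption chosen for illustration, and the names are hypothetical.

```python
# Fit values reported in the text for the two scales.
fit_values = {
    "Self-efficacy of learning physics": {
        "cmin_df": 2.414, "rmsea": 0.069, "cfi": 0.977, "gfi": 0.956, "tli": 0.968,
    },
    "ATP": {
        "cmin_df": 1.412, "rmsea": 0.037, "cfi": 0.956, "gfi": 0.891, "tli": 0.951,
    },
}

RMSEA_MAX = 0.08  # ASSUMPTION: the text only says RMSEA should be close to zero.


def failed_criteria(values: dict) -> list:
    """Return the names of the fit statistics that miss their criterion."""
    checks = {
        "cmin_df": values["cmin_df"] < 3.0,
        "rmsea": values["rmsea"] < RMSEA_MAX,
        "cfi": values["cfi"] > 0.90,
        "gfi": values["gfi"] > 0.90,
        "tli": values["tli"] > 0.90,
    }
    return [name for name, ok in checks.items() if not ok]


shortfalls = {scale: failed_criteria(values) for scale, values in fit_values.items()}
```

Under these cut-offs every statistic passes except the GFI of the ATP scale (.891), which falls marginally short of .90; this is compatible with describing the fit as reasonable.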
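For readers unfamiliar with the coefficient, Cronbach's alpha for k items is k / (k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The sketch below applies that formula to a small, entirely hypothetical response matrix; it does not use the study's item-level data, which are not given here.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: five respondents, three items.
hypothetical_responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]

alpha = cronbach_alpha(hypothetical_responses)
is_acceptable = alpha > 0.70  # threshold cited from Pallant (2005)
```

The same `is_acceptable` comparison applied to the reported coefficients (.905, .924, .760, .870, .835, and .855) yields `True` in every case.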
Descriptive statistics including means (M) and standard deviations (SD) of each variable were calculated. Table 5 presents the descriptive results.
Table 5. Results of descriptive statistics
| Measures | N | M | SD |
|---|---|---|---|
| Importance | 298 | 4.131 | .671 |
| Comprehension | 298 | 3.139 | .700 |
| Requirement | 298 | 4.011 | .681 |
| Interest | 298 | 3.922 | .713 |
| Self-efficacy of learning physics | 298 | 4.794 | 1.418 |
| Mathematics achievement | 298 | 67.242 | 16.307 |
| Physics achievement | 298 | 68.309 | 13.996 |
As indicated in Table 5, the means of mathematics achievement and physics achievement were very close to each other. In addition, the means of the dimensions of attitude towards physics were close to 4, except for the dimension ‘comprehension’ (M = 3.139). This result suggests that the participants have a positive attitude towards physics. The mean of the participants’ self-efficacy of learning physics was also close to 5.
Hierarchical regression analysis
Hierarchical regression analysis was performed to test whether the variables ‘importance’, ‘comprehension’, ‘requirement’, ‘interest’, ‘self-efficacy of learning physics’ and ‘mathematics achievement’ predict physics achievement. Table 6 presents the regression analysis results.
Table 6. Results of hierarchical regression analysis
| Independent variables | B | SE | β | t | R² | ΔR² |
|---|---|---|---|---|---|---|
| Model 1 | | | | | .125 | .032 |
| Importance | -1.105 | 1.217 | -.053 | -.908 | | |
| Comprehension | 3.076 | 1.140 | .154 | 2.699* | | |
| Requirement | 3.794 | 1.210 | .185 | 3.135* | | |
| Interest | .197 | 1.107 | .010 | .178 | | |
| Self-efficacy of learning physics | 1.888 | .575 | .191 | 3.286* | | |
| Model 2 | | | | | .313 | .188 |
| Importance | -1.558 | 1.081 | -.075 | -1.441 | | |
| Comprehension | 3.029 | 1.012 | .151 | 2.994* | | |
| Requirement | 2.612 | 1.083 | .127 | 2.412* | | |
| Interest | -.185 | .984 | -.009 | -.189 | | |
| Self-efficacy of learning physics | 1.516 | .512 | .154 | 2.963* | | |
| Mathematics achievement | .380 | .043 | .443 | 8.913** | | |
**p < .001, *p < .05
As shown in Table 6, in model 1 the dimensions of attitude towards physics and self-efficacy of learning physics contributed significantly to the regression model, F(5, 292) = 8.358, p < .001. These variables accounted for 12.5% of the variation in physics achievement. In model 2, mathematics achievement was added to the equation, and it explained an additional 18.8% of the variation in physics achievement. This change in R² was significant, F(1, 291) = 79.443, p < .001. In this model, the dimensions ‘comprehension’ (t = 2.994) and ‘requirement’ (t = 2.412) of attitude towards physics, ‘self-efficacy of learning physics’ (t = 2.963), and ‘mathematics achievement’ (t = 8.913) were significant correlates of physics achievement. There was also a significant relationship between the independent variables taken together and physics achievement in model 2: together they explained 31.3% of the variation in physics achievement, F(6, 291) = 22.077, p < .001. Accordingly, the dimensions ‘comprehension’ (β = .151) and ‘requirement’ (β = .127) of attitude towards physics, ‘self-efficacy of learning physics’ (β = .154), and ‘mathematics achievement’ (β = .443) contributed positively and significantly to physics achievement in this model. Judging by the beta coefficients, mathematics achievement was the strongest positive predictor of physics achievement when the other variables were controlled.
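The F statistics quoted above can be approximately recovered from the rounded R² values using the standard formulas for the overall F test and the F test for a change in R². The sketch below is illustrative only; the function names are hypothetical, and because it starts from R² values rounded to three decimals, its results differ slightly from the reported statistics.

```python
def f_overall(r2: float, n: int, k: int) -> float:
    """Overall F for a model with k predictors fitted to n cases."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def f_change(r2_full: float, r2_reduced: float, n: int, k_full: int, k_added: int):
    """F for the R-squared change when k_added predictors enter the model.

    Returns the F value and its residual degrees of freedom.
    """
    df_residual = n - k_full - 1
    f_value = ((r2_full - r2_reduced) / k_added) / ((1.0 - r2_full) / df_residual)
    return f_value, df_residual


N_CASES = 298

# Model 2 (six predictors) against model 1 (five predictors).
f_step2, df_step2 = f_change(0.313, 0.125, N_CASES, k_full=6, k_added=1)
f_model2 = f_overall(0.313, N_CASES, k=6)

# With a single added predictor, the F for the change equals that predictor's t squared.
t_math_squared = 8.913 ** 2
```

The change statistic comes out near the reported F(1, 291) = 79.443, the overall statistic near F(6, 291) = 22.077, and the squared t value of mathematics achievement agrees with the reported F for the change to within rounding.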
Copyright (C) 2017 EdUHK APFSLT. Volume 18, Issue 1, Article 10 (Jun., 2017). All Rights Reserved.