Exploratory Data Analysis: Applying Analytical Strategies To An Area Of Research Interest

Overview of the Data File

1. Exploratory Data Analysis:


The data file provided for analysis sets out two main tasks:

  1. To create a repeated-measures experimental design
  2. To generate a hypothetical repeated-measures experimental design

The data file helps the teacher determine whether the students' test scores increase over a 12-week period. Two variables are considered. The first is “Gender”, which has two categories, male and female. The other is the test score, which is recorded seven times during the period: at the pre-test and at the end of the 2nd, 4th, 6th, 8th, 10th and 12th weeks.
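The layout described above (one gender value and seven scores per student) can be sketched as a small reshaping step, since many repeated-measures routines expect one row per student per occasion. This is only an illustration: the records, the field names and the `to_long` helper are hypothetical and are not taken from the data file.

```python
# Hypothetical wide-format records (NOT the study's data): one row per student,
# with the seven scores stored in measurement order.
TIME_POINTS = ["pre_test", "week_2", "week_4", "week_6", "week_8", "week_10", "week_12"]

wide_rows = [
    {"student": 1, "gender": "male", "scores": [20, 24, 27, 31, 36, 44, 47]},
    {"student": 2, "gender": "female", "scores": [25, 26, 30, 35, 38, 46, 52]},
]


def to_long(rows):
    """Convert wide rows into long format: one row per student per time point."""
    long_rows = []
    for row in rows:
        for time_point, score in zip(TIME_POINTS, row["scores"]):
            long_rows.append(
                {
                    "student": row["student"],
                    "gender": row["gender"],
                    "time": time_point,
                    "score": score,
                }
            )
    return long_rows


long_rows = to_long(wide_rows)
```

In this form, “Gender” is the between-subjects factor and `time` is the within-subjects factor.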

Table 1: Percentile Values of Scores

                                                                 


The three quartiles (1st, 2nd and 3rd) show that the scores increased consistently from the first measurement to the last.
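As a rough illustration of how the quartiles for each measurement can be obtained, the sketch below uses Python's standard `statistics` module. The scores are hypothetical, not the values behind Table 1, and the "inclusive" interpolation method is an assumption; statistical packages may use a different percentile definition, so values can differ slightly.

```python
import statistics

# Hypothetical scores for 12 students at two occasions (NOT the study's data).
scores = {
    "pre_test": [12, 18, 20, 22, 25, 28, 30, 33, 35, 40, 45, 50],
    "week_12": [30, 35, 40, 42, 45, 48, 52, 55, 58, 60, 65, 70],
}


def quartiles(values):
    """Return (Q1, Q2, Q3) using the 'inclusive' interpolation method."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3


# One (Q1, Q2, Q3) tuple per measurement occasion.
quartile_table = {name: quartiles(vals) for name, vals in scores.items()}
```

Comparing the tuples across occasions shows whether all three quartiles move upward together, which is the pattern the paragraph above describes.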

Table 2: Descriptive Statistics table of Scores of seven different weeks

                                                                       

Across the seven measurements, the range of test scores is highest at the pre-test (43), followed by the end of the 6th week (40). The spread of the test scores, measured by the standard deviation, is also highest at the pre-test (12.22), followed by the end of the 6th week (10.67).

The mean values show that the students' overall scores increased consistently from the first day (29.58) to the end of the 12th week (50).

As expected, the highest minimum and maximum scores are observed at the end of the course.
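The quantities compared above (mean, standard deviation, minimum, maximum and range) can be computed per measurement along the following lines. The `describe` helper and the sample values are hypothetical; the sketch assumes the sample standard deviation (divisor n − 1).

```python
import statistics


def describe(values):
    """Summarise one measurement occasion with basic descriptive statistics."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD, divisor n - 1 (assumed)
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


# Hypothetical pre-test scores for 12 students (NOT the study's data).
pre_test = [10, 15, 20, 22, 25, 28, 30, 35, 40, 45, 50, 53]
summary = describe(pre_test)
```

Applying `describe` to each of the seven score columns gives one row per occasion, in the same shape as a descriptive statistics table.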

                                                                      

The distribution of test scores on the first day of class shows a wide range and no outliers.

                                                                        

                                                                           


                                                                            

The distribution of test scores shows a wide range and no outliers.

2. Repeated Measures ANOVA:

                                                                                

Of the 12 students, 4 are male and 8 are female. The descriptive statistics table shows the following:

  • Female students score better than male students at the end of the 6th week of the course.
  • Male students score better than female students at the start of the course and at the end of the second, fourth, eighth, tenth and twelfth weeks.
  • The spread, measured by the standard deviation, is higher among males than among females throughout the observed periods.

                                                                           

Here, Mauchly’s test gives χ2 (20) = 56.876, p < 0.001 (reported as 0.0). Hence, it can be inferred that the assumption of “Sphericity” does not hold for this dataset.
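The reported statistic can be checked by hand. With k = 7 repeated measurements, the chi-square approximation for Mauchly’s test has k(k − 1)/2 − 1 = 20 degrees of freedom, which matches the value in parentheses. The sketch below also evaluates the upper-tail probability with the closed form that holds for even degrees of freedom. The function names are illustrative, not part of any library.

```python
import math


def mauchly_df(k):
    """Degrees of freedom of the chi-square approximation for k repeated measures."""
    return k * (k - 1) // 2 - 1


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with EVEN df.

    For even df the tail has the closed form
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


df = mauchly_df(7)                      # seven measurement occasions
p_value = chi2_sf_even_df(56.876, df)   # statistic reported in the text
```

The resulting `p_value` is far below 0.001, consistent with a software display of 0.0 and with the conclusion that sphericity is violated.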

If a data set violates the assumption of “Sphericity”, the degrees of freedom are adjusted by multiplying them by an estimate of sphericity (ε). It should be noted that, in small samples, even large deviations from sphericity may come out as non-significant.
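A minimal sketch of this adjustment follows, assuming the design described in this report (7 time points, 12 students, 2 gender groups), so that the unadjusted degrees of freedom for the “Time” effect are k − 1 = 6 and (k − 1)(N − g) = 60. The ε values used here are the two theoretical extremes, not the estimates from the SPSS output, and `adjusted_df` is a hypothetical helper.

```python
def adjusted_df(epsilon, k, n_subjects, n_groups):
    """Multiply the within-subject effect and error df by a sphericity estimate.

    epsilon must lie between the lower bound 1/(k - 1) and 1.
    """
    lower_bound = 1.0 / (k - 1)
    if not lower_bound <= epsilon <= 1.0:
        raise ValueError("epsilon must lie between 1/(k - 1) and 1")
    df_effect = k - 1                            # unadjusted effect df
    df_error = (k - 1) * (n_subjects - n_groups)  # unadjusted error df
    return epsilon * df_effect, epsilon * df_error


# No correction (epsilon = 1) versus the most conservative correction.
uncorrected = adjusted_df(1.0, k=7, n_subjects=12, n_groups=2)
most_conservative = adjusted_df(1.0 / 6, k=7, n_subjects=12, n_groups=2)
```

Any Greenhouse-Geisser or Huynh-Feldt estimate falls between these two cases, so the corrected F-test is evaluated on fewer degrees of freedom than the uncorrected one.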

Estimated Marginal Means

                                                                   

Part b)

The main effect of gender is considered here. The F-statistic is 0.480, with a p-value of 0.504. Since the p-value is greater than 0.05, the main effect of “Gender” on the test scores is not significant at the 5% level (Field, 2013).

We can conclude that the main effect of “Gender” on the variation in test scores is not statistically significant.

The post-hoc “Pairwise Comparisons” test shows that the mean test scores of the two genders differ by 3.946 in absolute value, with a p-value of 0.504. Therefore, the post-hoc test also finds the main effect of “Gender” non-significant (Girden, 1992).

                                                                       

                                                                     


Based on estimated marginal means
The mean difference is significant at the .05 level.
Adjustment for multiple comparisons: Bonferroni.

                                                                              

Part c)

The main effect of “Time” is discussed in this segment. The p-value of Wilks’ lambda is 0.002, which is less than 0.05. Therefore, we reject the null hypothesis that “Time” (the time period) has no effect on the average scores at the 5% level of significance.

Therefore, we can state that “Time” has a significant effect on the average test scores: the average test score changed over time.

According to the post-hoc test, some pairs of periods differ significantly in their average test scores. Based on the p-values at the 5% level of significance:

  • Period 1 does not differ significantly from periods 2, 3, 4 and 5, but differs from periods 6 and 7.
  • Period 2 does not differ significantly from periods 3, 4 and 5, but differs from periods 6 and 7.
  • Period 3 does not differ significantly from periods 4 and 5, but differs from periods 6 and 7.
  • Period 4 is reported to differ from periods 5, 6 and 7.
  • Period 5 differs from periods 6 and 7.
  • Periods 6 and 7 do not differ significantly.

Hence, overall, the average test scores are not equal across periods.
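With seven periods there are 21 pairwise comparisons, which is why an adjustment for multiple comparisons matters here. The sketch below assumes the standard Bonferroni rule (multiply each raw p-value by the number of comparisons and cap the result at 1). The raw p-values in the example calls are hypothetical, apart from the single gender comparison (0.504) quoted earlier in this report.

```python
from itertools import combinations

# All unordered pairs of the seven measurement periods.
periods = range(1, 8)
pairs = list(combinations(periods, 2))
n_comparisons = len(pairs)


def bonferroni(p_raw, m):
    """Bonferroni-adjusted p-value for one of m comparisons, capped at 1."""
    return min(1.0, p_raw * m)


# A hypothetical raw p-value of 0.01 is no longer significant after adjustment.
adjusted_time_example = bonferroni(0.01, n_comparisons)

# With two genders there is a single comparison, so the p-value is unchanged.
adjusted_gender = bonferroni(0.504, 1)
```

This also explains why the gender comparison keeps the same p-value in the post-hoc table as in the main test: with only one pair, the adjustment multiplies by 1.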

                                                                            

                                                                  

Part d)

The interaction effect of Time and Gender has a p-value of 0.607 (> 0.05). Therefore, we fail to reject the null hypothesis of no interaction at the 5% level of significance.

The interaction between these two factors is found to be insignificant: the change in average scores over time does not differ significantly by gender.


                                                                     

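Time factor:

  • Huynh-Feldt correction: p = 0.0 < 0.05. Therefore, the F-statistic is significant.
  • Greenhouse-Geisser correction: p = 0.0 < 0.05. Therefore, the F-statistic is significant.

Gender and Time factor:

  • Huynh-Feldt correction: p = 0.3 > 0.05. Therefore, the F-statistic is insignificant.
  • Greenhouse-Geisser correction: p = 0.3 > 0.05. Therefore, the F-statistic is insignificant.

These corrections adjust the F-test for the violation of “Sphericity”; they do not themselves test whether it holds.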

(Gchang.people.ysu.edu., 2018)

                                                                             

                                                                              

                                                                   

3. Analytical Strategy of ANOVA:

Two further variables that we could use in the repeated measures ANOVA are the average hours of study and the IQ level of the student.

The average hours of study is a quantitative variable, and the IQ level of the student is an ordinal variable. The average hours of study per day can range from 0 to 24 hours. The IQ ranges are given in the following table:

                                                              

We assume that neither the average hours of study nor the IQ level of a student changes across the three periods. Therefore, both factors should be treated as fixed in this analysis.

In repeated measures ANOVA, the relationships between pairs of experimental conditions are assumed to be similar. This assumption is called “Sphericity” (Health.uottawa.ca., 2018). When it is violated, the F-ratio cannot simply be compared to the tabulated values (ANOVA table) of the F-distribution. The degree of violation of “Sphericity” can be estimated by:

  1. Greenhouse and Geisser statistic
  2. Huynh and Feldt statistic
  3. The lower-bound estimate, that is, the lowest theoretically possible value for the data

The “Greenhouse and Geisser statistic” and the “Huynh and Feldt statistic” can both range from the lower bound to 1. Mauchly’s test examines whether the variances of the differences between conditions are equal.
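To make the range from the lower bound to 1 concrete, the sketch below computes the Greenhouse-Geisser estimate from a covariance matrix of the repeated measures, using the usual double-centering formula ε = (trace of the centered matrix)² / ((k − 1) × sum of its squared entries). Both matrices are hypothetical examples, not covariances from this study, and the function names are illustrative.

```python
def lower_bound_epsilon(k):
    """Lowest theoretically possible sphericity estimate for k conditions."""
    return 1.0 / (k - 1)


def greenhouse_geisser_epsilon(cov):
    """Greenhouse-Geisser epsilon from a symmetric k x k covariance matrix."""
    k = len(cov)
    row_means = [sum(row) / k for row in cov]
    grand_mean = sum(row_means) / k
    # Double-center the matrix: remove row and column means, add the grand mean back.
    centered = [
        [cov[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(k)]
        for i in range(k)
    ]
    trace = sum(centered[i][i] for i in range(k))
    sum_squares = sum(value * value for row in centered for value in row)
    return trace * trace / ((k - 1) * sum_squares)


# Hypothetical compound-symmetric matrix (equal variances, equal covariances):
# sphericity holds exactly, so epsilon is 1.
spherical = [[4.0, 1.0, 1.0], [1.0, 4.0, 1.0], [1.0, 1.0, 4.0]]

# Hypothetical matrix in which one condition has a much larger variance:
# sphericity is violated, so epsilon falls below 1.
unequal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 10.0]]

eps_spherical = greenhouse_geisser_epsilon(spherical)
eps_unequal = greenhouse_geisser_epsilon(unequal)
```

For the seven occasions in this report, `lower_bound_epsilon(7)` is 1/6, so any estimate for the “Time” factor lies between about 0.167 and 1.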

In the SPSS output, we look for Mauchly’s test statistic. If its p-value is less than 0.05, we conclude that the variances of the differences between conditions differ significantly, that is, the condition of “Sphericity” is violated.

Violating “Sphericity” reduces the power of the F-test in ANOVA, that is, it increases the probability of a type II error. In that situation, the degrees of freedom must be adjusted, which makes the F-statistic more conservative. Using the smaller degrees of freedom for study hours, IQ level and their interaction would address the violation of “Sphericity”.

Overall Thoughtful Consideration:

The repeated measures ANOVA leads to the conclusion that, as the course proceeded, the average test scores increased overall. However, the effect of gender on the scores in the different weeks is negligible. Test performance also does not vary significantly when the interaction of time and gender is considered. Adding the two chosen variables, average daily study hours and IQ level, could provide a better analysis. However, “Sphericity” must be checked, since it was violated in the analysis discussed above, and corrected for where necessary. Otherwise, the conclusions may be biased.

References:

Field, A. (2013). Discovering statistics using IBM SPSS statistics. Sage.

Gchang.people.ysu.edu. (2018). [online] Available at: https://gchang.people.ysu.edu/SPSSE/SPSS_EDA_16.pdf.

Girden, E. R. (1992). ANOVA: Repeated measures (No. 84). Sage.

Health.uottawa.ca. (2018). [online] Available at: https://health.uottawa.ca/biomech/courses/apa6101/Repeated%20Measures%20ANOVA.pdf.