Exam Preparation Time And Test Scores: Analysis And Solutions

Data Collection

One hundred students at the Holmes Institute were sampled and their data recorded. The data are used to examine the relationship between the time each student spent preparing for the exam and the mark the student reported.

Solutions

  1. Type of survey that could be used.

A one-on-one survey can be conducted by the instructors in the form of an interview, in which students respond to questions about the time they spend in class and the time they spend learning on their own. Alternatively, the students can be given questionnaires to fill in. These survey methods are preferable because they are simple and take little time.

  1. Sampling method that could be used to select the sample.

Simple random sampling would be used to select a sample of the required size. The method is preferable because it is simple and fast, and because every student has the same chance of being selected, which makes the sample likely to be representative of the whole population.
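The selection step can be sketched in Python as follows. This is only a minimal illustration: the sampling frame of 1,000 student IDs and the ID format are hypothetical, and only the sample size of 100 comes from the text.

```python
import random

# Hypothetical sampling frame: one ID per enrolled student (size assumed).
population = [f"S{i:04d}" for i in range(1, 1001)]

# Fixed seed so the draw is reproducible; drop it for a fresh sample each run.
rng = random.Random(42)

# Simple random sampling without replacement: every student has the same
# chance of selection, and no student can be picked twice.
sample = rng.sample(population, k=100)

print(len(sample), "students selected, for example", sample[:5])
```

Drawing from a list of all students, rather than from a single class, is what guards against the unrepresentative sample discussed later in the text.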

  1. Determination of the dependent and independent variables and the data types for each variable.

Since we are trying to evaluate how the marks scored by the students are related to preparation time, the dependent variable is the mark scored in the test and the independent variable is the time spent in preparation. Both variables are continuous numeric variables: preparation time is the amount of time spent preparing for the exam, and the mark is the score obtained in the examination.

  1. Issues likely to be faced in collecting the data using the survey method chosen.

The instructor is likely to collect incorrect or misleading information, because some respondents may be unwilling to give correct answers and others may lack the time to give sufficient information.

The instructor is also likely to obtain a sample that is not representative of the whole population if the simple random sampling is carried out poorly. For example, if the random sample is drawn from a single class, or from a group of individuals who probably share common preferences, then the sample will not be representative of the whole population.

Developing a distribution table with class interval, frequency, relative frequency and cumulative frequency for each variable; drawing the frequency histogram, relative frequency histogram and cumulative relative frequency histogram; and commenting, with reasons, on the shape of the frequency histogram for each variable.

The frequency histogram for preparation time is slightly skewed to the left: the longer tail extends toward the lower values, so the majority of the preparation times lie slightly above the average. The same is indicated by the numeric measure of skewness shown in the table below.

The relative frequency histogram is similar to the frequency histogram, except that each frequency is expressed as a proportion of the sum of all frequencies.

The frequency histogram for the test scores is slightly skewed to the left: the longer tail extends toward the lower values, so the majority of the scores obtained after the preparation lie slightly above the average. The same is indicated by the numeric measure of skewness shown in the table below.

The relative frequency histogram is similar to the frequency histogram, except that each frequency is expressed as a proportion of the sum of all frequencies.

The histogram of the cumulative relative frequency for the scores obtained in the test is shown below:
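The tabulation described above (class interval, frequency, relative frequency and cumulative relative frequency) could be reproduced along the following lines. This is a sketch under stated assumptions: the preparation times, the lower limit of 10 and the class width of 20 are hypothetical and are not the survey data.

```python
from itertools import accumulate

def frequency_table(values, low, width, n_classes):
    """Rows of (interval, frequency, relative frequency, cumulative relative frequency).

    Assumes every value is at least `low`; values at or beyond the upper
    limit of the last class are counted in the last class.
    """
    counts = [0] * n_classes
    for v in values:
        idx = min(int((v - low) // width), n_classes - 1)
        counts[idx] += 1
    total = len(values)
    cumulative = list(accumulate(counts))
    rows = []
    for i, (freq, cum) in enumerate(zip(counts, cumulative)):
        interval = (low + i * width, low + (i + 1) * width)
        rows.append((interval, freq, freq / total, cum / total))
    return rows

# Hypothetical preparation times (not the survey data).
prep_time = [12, 25, 31, 38, 44, 47, 52, 55, 58, 63, 67, 71]

table = frequency_table(prep_time, low=10, width=20, n_classes=4)
for (lower, upper), freq, rel, cum_rel in table:
    print(f"{lower:>3}-{upper:<3} {freq:>3} {rel:6.3f} {cum_rel:6.3f}")
```

The frequency column is what the frequency histogram plots, the third column gives the relative frequency histogram, and the last column gives the cumulative relative frequency histogram, which always ends at 1.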

  1. Using an appropriate scatter plot with a fitted line to investigate the relationship between the two variables with an explanation of the selection of each variable on the X and Y axes.

The scatter plot for the relationship between the marks scored and the time spent in preparing for the exam is shown below:

The variable marks is the dependent variable and is therefore placed on the Y-axis, while the variable preparation time is the independent variable and is therefore placed on the X-axis. The scatter plot indicates that there is a positive linear relationship between marks scored and time spent in preparation.

  1. Presenting the equation of the fitted line (regression) and estimating the effect of an increase in the independent variable by one unit on the dependent variable.

The fitted line is shown in the scatter plot above. The resulting regression equation is: 

This means that for every one-unit increase in the independent variable, preparation time, the mark obtained in the test is expected to increase by 0.5831 on average.
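The interpretation of the slope can be illustrated with a small least-squares sketch. The five data points below are hypothetical, so the fitted values differ from the 0.5831 reported above; the point is only that raising the independent variable by one unit changes the predicted mark by exactly the slope.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical preparation times and marks (not the survey data).
hours = [10, 20, 30, 40, 50]
marks = [42, 50, 53, 61, 66]

intercept, slope = fit_line(hours, marks)

def predict(x):
    return intercept + slope * x

print(f"marks = {intercept:.2f} + {slope:.2f} * time")
# One extra unit of preparation time shifts the prediction by the slope.
print(f"effect of one extra unit: {predict(31) - predict(30):.2f}")
```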

  1. Preparing a numerical summary report about the data on the two variables by including the mean, median, range, variance, standard deviation, smallest and largest values, quartiles, interquartile range and the 30th percentile for each variable.

The numerical measures, determined using Excel functions, are shown in the table below:
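For readers without Excel, the same summary measures can be computed with Python's standard library. The sketch below uses hypothetical marks, not the survey data. It uses inclusive interpolation for the quartiles and the 30th percentile, which is the convention of Excel's QUARTILE.INC and PERCENTILE.INC; which Excel functions the analysis actually used is not stated, so that choice is an assumption.

```python
import statistics as st

def summarize(values):
    """Numerical summary of one variable (sample variance and standard deviation)."""
    q1, _, q3 = st.quantiles(values, n=4, method="inclusive")
    deciles = st.quantiles(values, n=10, method="inclusive")
    return {
        "mean": st.mean(values),
        "median": st.median(values),
        "smallest": min(values),
        "largest": max(values),
        "range": max(values) - min(values),
        "variance": st.variance(values),  # divides by n - 1
        "std_dev": st.stdev(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "p30": deciles[2],  # third decile = 30th percentile
    }

# Hypothetical marks (not the survey data).
marks = [42, 50, 53, 61, 66, 70, 74, 81, 85, 90, 93]

summary = summarize(marks)
for name, value in summary.items():
    print(f"{name:>9}: {value:.2f}")
```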

  1. Computation and interpretation of a numerical measure to determine the strength and direction of the linear relationship between the two variables.

The numerical measure that determines the strength and direction of the linear relationship between the dependent and the independent variable is the correlation coefficient (Linoff, 2008). It is shown in the table below:

The correlation coefficient is 0.5466. Its positive sign indicates a positive linear relationship between the dependent and the independent variable. Since the value is closer to one than to zero, the relationship is moderately strong.
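A minimal sketch of how the correlation coefficient is computed is given below. The data are hypothetical and much more tightly clustered than the survey data, so the result is far higher than the reported 0.5466; the last line uses the reported value itself.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariation of x and y scaled by their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical preparation times and marks (not the survey data).
hours = [10, 20, 30, 40, 50]
marks = [42, 50, 53, 61, 66]

r = pearson_r(hours, marks)
print(f"r = {r:.4f}")

# In a simple linear regression, r squared is the share of variance explained.
reported_r = 0.5466
print(f"reported r squared = {reported_r ** 2:.4f}")
```

The sign of r gives the direction of the relationship and its magnitude gives the strength. Squaring the reported coefficient gives about 0.2988, so preparation time accounts for roughly 30% of the variation in marks.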

The following questions refer to part of a multiple regression output from Excel, used to determine whether or not the heights of sons are related to the heights of their fathers and mothers.

The standard error of the statistic and what it actually means

The standard error measures how precisely a sample statistic estimates the corresponding population value. Other things being equal, a larger sample gives a smaller standard error and a smaller sample gives a larger one. In this case the standard error of the model is 8.068, which is the typical distance between the observed heights of the sons and the heights predicted by the regression; the smaller this value, the more closely the model fits the data. The standard error is also reported for each regression coefficient, as a measure of the precision with which that coefficient is estimated. Here the standard error of the coefficient of the first variable is 0.0412, and that of the second variable is 0.0395. The coefficient of variable 1 (0.4849) is almost twelve times its standard error, so it can be distinguished from zero. The coefficient of variable 2 (-0.0229) is smaller in magnitude than its standard error, so it cannot be distinguished from zero.
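The comparison of each coefficient with its standard error is the t-ratio. As a rough check, it can be recomputed from the rounded figures quoted in this analysis (coefficients 0.4849 and -0.0229, standard errors 0.0412 and 0.0395). Because the inputs are rounded, the results agree with the t-statistics quoted in the analysis (11.7772 and -0.5811) only approximately.

```python
# (coefficient, standard error) as quoted in the text, rounded to 4 decimals.
coefficients = {
    "X1": (0.4849, 0.0412),
    "X2": (-0.0229, 0.0395),
}

# t-ratio: how many standard errors the estimate lies from zero.
t_ratios = {name: coef / se for name, (coef, se) in coefficients.items()}

for name, t in t_ratios.items():
    print(f"{name}: t = {t:.4f}")
```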

Data Analysis

The coefficient of determination and what it means.

The coefficient of determination is the R-squared value. It gives the proportion of the variation in the dependent variable that is explained by the independent variables in the regression. In this case the coefficient of determination is 0.2672 (26.72%), meaning that only 26.72% of the variation of the dependent variable around its mean is explained by the independent variables. In other words, the model leaves 73.28% of the variation unexplained.
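The definition can be made concrete with a short sketch. The observed and fitted heights below are hypothetical and are not taken from the Excel output; they only show how the explained share of variation is obtained.

```python
def r_squared(observed, fitted):
    """1 - SSE/SST: share of variation around the mean explained by the model."""
    mean_y = sum(observed) / len(observed)
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))  # unexplained
    sst = sum((y - mean_y) ** 2 for y in observed)             # total
    return 1 - sse / sst

# Hypothetical sons' heights and the heights a model predicted for them.
observed = [170, 175, 168, 180, 177]
fitted = [171, 174, 170, 178, 177]

r2 = r_squared(observed, fitted)
print(f"R squared = {r2:.4f}")
```

A model that simply predicted the mean height for every son would score 0, and a model that reproduced every observation exactly would score 1.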

The coefficient of determination adjusted for degrees of freedom, and what the coefficient of determination and the adjusted coefficient of determination say about how well the model fits the data.

The adjusted coefficient of determination is the adjusted R-squared value of the model; it corrects R-squared for the number of independent variables (terms) in the model. In this case its value is 0.2635 (26.35%). Both the coefficient of determination and the adjusted coefficient of determination describe how much of the variation in the dependent variable the model explains. However, the adjusted coefficient of determination is the one to use when there is more than one independent variable. Since this model has two independent variables, the adjusted coefficient of determination is the more appropriate measure of how well the model fits the data, although here the two values differ only slightly. Either way, the model explains only about a quarter of the variation, so the fit is weak.
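The adjustment follows the standard formula 1 - (1 - R squared)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The sample size is not given in the text, so the value n = 400 below is purely an assumption for illustration; it happens to give a result close to the reported 0.2635.

```python
def adjusted_r_squared(r2, n, k):
    """Penalise R squared for the k predictors used with n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

r2 = 0.2672  # coefficient of determination quoted in the text
k = 2        # two independent variables (X1 and X2)
n = 400      # ASSUMED sample size; not stated in the text

adj = adjusted_r_squared(r2, n, k)
print(f"adjusted R squared = {adj:.4f}")
```

With many observations and only two predictors the penalty is small, which is why the two measures are so close in this case.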

The overall utility of the model is tested with the F-test. The overall model can be said to be statistically significant since the value of Significance F is less than the chosen level of significance, which is 0.05. The individual predictors are assessed using the p-values of the coefficients and their respective t-statistics. A p-value less than the chosen level of significance, together with a t-statistic of large magnitude, means that the predictor is statistically significant (Rumsey, 2007). In this case, using the default significance level of 0.05, the first independent variable (X1) has a p-value of 0.0000 and a t-statistic of 11.7772, meaning it is statistically significant. In contrast, the second independent variable (X2) has a p-value of 0.5615 and a t-statistic of -0.5811. Since the p-value is greater than the significance level and the t-statistic is close to zero, this variable is not statistically significant and can therefore be dropped.

In this model we have more than one independent variable, so the coefficient of each variable tells us how much the dependent variable changes for a unit change in that independent variable when all other independent variables are held constant. The first independent variable (X1) has a coefficient of 0.4849, meaning that the dependent variable would increase by 0.4849 units for a unit increase in this independent variable. Likewise, the second independent variable (X2) has a coefficient of -0.0229, and therefore the dependent variable would decrease by 0.0229 units for a unit increase in this independent variable.
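The "held constant" reading of the coefficients can be sketched as follows, using the two coefficients quoted in the text. The intercept is not reported, so the sketch works only with changes in the predicted value, where the intercept cancels out; the function name is illustrative.

```python
B1 = 0.4849   # coefficient of X1, as quoted in the text
B2 = -0.0229  # coefficient of X2, as quoted in the text

def predicted_change(dx1, dx2):
    """Change in the predicted dependent variable for given changes in X1 and X2."""
    return B1 * dx1 + B2 * dx2

# One-unit increase in X1 with X2 held constant.
print(f"{predicted_change(1, 0):+.4f}")
# One-unit increase in X2 with X1 held constant.
print(f"{predicted_change(0, 1):+.4f}")
```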

Determination of whether the data allow the statistics practitioner to infer that the heights of sons and fathers are linearly related.

Taking the first variable in the table, labelled “Year”, as the first independent variable (X1) representing the fathers' heights, the data allow the statistics practitioner to infer that the heights of sons and fathers are linearly related in the positive direction. This is because this predictor is statistically significant, as determined above.

Determination of whether the data allow the statistics practitioner to infer that the heights of sons and mothers are linearly related.

Taking the second variable in the table, labelled “Rate”, as the second independent variable (X2) representing the mothers' heights, the statistics practitioner cannot conclude that the heights of sons and mothers are linearly related, even though the negative coefficient suggests a negative linear relationship. This is because the variable is not statistically significant, as determined above.

References

Linoff, G. (2008). Data analysis using SQL and Excel. Indianapolis, Ind.: Wiley Pub.

Rumsey, D. (2007). Intermediate statistics for dummies. 1st ed. Hoboken, N.J.: Wiley.