Analysis Of Variance (ANOVA) Table, Hypothesis Tests, And Prediction For Variance Inflation Factor

Description of Data

The data was collected for 382 days between 2014 and 2016. It has twelve variables: the first is the date, the next ten are used as independent variables, and the twelfth is used as the dependent variable. The variables indicate the variations in the prices of financial assets on the days on which the data was collected. Some of the variables and their purposes are listed below:

  • Date: Indicating the year in which the data was collected.
  • Aluminum_Vel1: Indicating the change in the price of aluminum, backdated by a day.
  • Copper_Vel1: Indicating the change in the price of copper, backdated by a day.
  • US_Gasoline_Vel1: Indicating the change in the price of gasoline, backdated by a day.
  • West_Texas_Vel1: Indicating the change in the price of West Texas Intermediate oil, backdated by a day.
  • SPDR_XL1: Indicating the U.S. industrial confidence for industrial-oriented firms.
  • CA-Dollar_Vel1: The exchange rate between the U.S. and Canadian dollar, backdated by a day.
  • SP 500: The Standard and Poor’s 500 index of stock prices.

The data also contains interaction variables, each of which is the product of two variables and forms a new variable. These variables are:

  • Year x WERN
  • 30year x Copper_Vel1
  • Aluminum_Vel1 x Aluminum_Vel1
  • Aluminum_Vel1 x West_Texas_Vel1
  • Baltic_Vel1 x Copper_Vel1
  • SPDR_XL1 x West_Texas_Vel1
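
As a minimal sketch of how such interaction variables can be formed, the Python snippet below multiplies two columns element by element. The helper name `interaction` and the three daily values are hypothetical and are used only for illustration; they are not taken from the dataset.

```python
def interaction(a, b):
    """Return the element-wise product of two equally long columns."""
    if len(a) != len(b):
        raise ValueError("columns must have the same length")
    return [x * y for x, y in zip(a, b)]


# Hypothetical daily values, for illustration only (not from the dataset).
aluminum_vel1 = [0.5, -1.0, 2.0]
west_texas_vel1 = [2.0, 3.0, -0.5]

# Product of two different variables.
aluminum_x_west_texas = interaction(aluminum_vel1, west_texas_vel1)

# A variable multiplied by itself gives a squared term.
aluminum_squared = interaction(aluminum_vel1, aluminum_vel1)
```

A term such as Aluminum_Vel1 x Aluminum_Vel1 is simply the square of the original variable, which lets a linear model capture curvature in that variable.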

The variable in the last column of the data is the dependent variable that needs to be predicted. The initial percentage variations in price have been sorted and ranked, and the ranks divided by the total number of rows in the data, so that the values vary from 0 to 1. Zero is the maximum decrease in price, the median of 0.5 indicates no change in price, and 1 indicates the maximum increase in price.
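
The exact ranking formula is not given in the text, so the following Python sketch is only one plausible reading of it. It assumes that tied values share their average rank and that ranks are rescaled as (rank - 1) / (n - 1) so that the smallest value maps to exactly 0 and the largest to exactly 1. The function name `rank_scale` and the sample values are hypothetical.

```python
def rank_scale(changes):
    """Map raw price changes to rank-based scores between 0 and 1.

    Assumptions (not stated in the source): ties receive their average
    rank, and ranks are rescaled with (rank - 1) / (n - 1).
    """
    n = len(changes)
    if n < 2:
        raise ValueError("need at least two observations")
    order = sorted(range(n), key=lambda i: changes[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and changes[order[j + 1]] == changes[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return [(r - 1) / (n - 1) for r in ranks]


# Hypothetical percentage changes: two days with no change in price.
scores = rank_scale([-2.0, 0.0, 0.0, 3.0])
```

In this small example the two unchanged days tie in the middle of the ranking and both receive a score of 0.5, which matches the description of 0.5 as "no change in price".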

The variance inflation factor (VIF) quantifies how much the variance of an estimated regression coefficient is inflated by correlation among the independent variables. It is used to test for multi-collinearity in the variables of the data set. Multi-collinearity is defined as the existence of a high correlation between two or more independent variables. Its presence makes it difficult to fit a reliable regression model to the dataset (Hinton, 2014).

The VIFs were determined for each of the independent variables using the PHStat Excel plug-in and are listed in the appendix. Since all of these values are less than five, there is little or no correlation between the independent variables. They can therefore be treated as independent, and none of them needs to be excluded when creating the prediction model.
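
The report obtained its VIFs from PHStat. As a rough illustration of the underlying calculation, the Python sketch below regresses each independent variable on the others and applies VIF = 1 / (1 - R²). It uses only the standard library, and all function names and sample values are hypothetical rather than taken from the report.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def r_squared(rows, y):
    """R^2 of an ordinary least squares fit of y on rows, with an intercept."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(rows):
    """VIF of each column: 1 / (1 - R^2) from regressing it on the other columns.

    Perfectly collinear columns give R^2 = 1 and are not handled here.
    """
    k = len(rows[0])
    result = []
    for j in range(k):
        target = [row[j] for row in rows]
        others = [[v for i, v in enumerate(row) if i != j] for row in rows]
        result.append(1.0 / (1.0 - r_squared(others, target)))
    return result


# Hypothetical data: two uncorrelated predictors, then two correlated ones.
uncorrelated = vif([[1, 1], [1, -1], [-1, 1], [-1, -1]])
correlated = vif([[1, 1], [2, 2], [3, 4], [4, 3]])
```

Uncorrelated predictors give a VIF of 1, and the value grows as a predictor becomes easier to explain from the others, which is why small VIFs are read as little or no multi-collinearity.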

Normal probability plots are used in statistics to identify any observable departure from normality in a set of values. Such departures include kurtosis, skewness and outliers. Here the plot is used to check for the presence of outliers in the residuals.

The plot is slightly non-linear at the 50th percentile because of the small number of trading days on which the share price was stagnant; these days all receive a rank equivalent to 0.5. Apart from this, the plot can be taken as linear, indicating that there are no observable outliers. It can therefore be concluded that the residuals are approximately normally distributed and that the hypothesis tests performed on the model can be considered reliable.
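
The coordinates of a normal probability plot can be sketched in Python as follows. This is an illustration, not the PHStat procedure used in the report: it assumes the plotting position (i - 0.5) / n, which is one common convention, and the residual values are hypothetical.

```python
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pair each sorted residual with its theoretical standard normal quantile.

    Assumption: plotting positions (i - 0.5) / n for i = 1..n.
    """
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # enumerate() is zero-based, so (i + 0.5) / n equals (rank - 0.5) / n.
    return [
        (std_normal.inv_cdf((i + 0.5) / n), value)
        for i, value in enumerate(ordered)
    ]


# Hypothetical residuals, for illustration only.
points = normal_plot_points([0.3, -1.2, 0.0, 1.1, -0.2])
```

If the residuals are close to normal, plotting the second coordinate against the first gives points that lie near a straight line; outliers show up as points far from that line at either end.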

Regression analysis assumes that the standardized residuals are normally distributed. A histogram of the standardized residuals is used to visualize their distribution.

The histogram indicates a normal distribution with just a little skewness to the right. The skewness is assumed to be negligible, and the residuals are concluded to be normally distributed. This means that the hypothesis tests and predictions carried out can be considered reliable.
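
A numerical check can complement the visual reading of the histogram. The Python sketch below standardizes a set of residuals and computes their skewness as the mean of the cubed standardized values; a positive result corresponds to a longer right tail. The function names and sample values are hypothetical, and the population standard deviation is used as a simplifying assumption.

```python
from statistics import fmean, pstdev


def standardize(residuals):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = fmean(residuals)
    sigma = pstdev(residuals)
    return [(r - mu) / sigma for r in residuals]


def skewness(residuals):
    """Mean of the cubed standardized residuals (0 for a symmetric sample)."""
    return fmean([z ** 3 for z in standardize(residuals)])


# Hypothetical residuals: one symmetric sample and one with a long right tail.
symmetric_skew = skewness([-2.0, -1.0, 0.0, 1.0, 2.0])
right_tail_skew = skewness([1.0, 2.0, 3.0, 4.0, 10.0])
```

A skewness near zero supports treating a slight visual asymmetry as negligible, while a clearly positive value would call the normality assumption into question.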

An analysis of variance (ANOVA) table is used in statistics to perform a hypothesis test of whether a relationship exists between the dependent and the independent variables. The null hypothesis is that there is no linear relationship between the variables, while the alternative hypothesis is that there is a linear relationship between them (Foster, Barkus & Yavorsky, 2006). The null hypothesis is rejected when the value of significance F is less than the overall significance level, and fails to be rejected otherwise. In this case the value of significance F is .

Variance Inflation Factor (VIF)

The value of significance F is less than the overall significance level of 0.05, so there is sufficient evidence in favor of the alternative hypothesis and the null hypothesis can be rejected. It can therefore be concluded that a relationship exists between the dependent variable and at least one of the independent variables.
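
For a multiple regression with an intercept, the overall F statistic in the ANOVA table can be written in terms of the coefficient of determination. The Python sketch below shows that relationship only. The function name is hypothetical, and the illustrative call assumes that all 382 days and ten independent variables described in the data section entered the model, together with the coefficient of determination of 14.76% reported in the text. It does not reproduce the author's ANOVA table, and converting the statistic to significance F additionally requires the F distribution, which is not shown.

```python
def overall_f_statistic(r_squared, n_obs, n_predictors):
    """Overall F statistic: (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    df_model = n_predictors
    df_residual = n_obs - n_predictors - 1
    if df_residual <= 0:
        raise ValueError("not enough observations for this many predictors")
    return (r_squared / df_model) / ((1.0 - r_squared) / df_residual)


# Illustrative call under the assumptions stated above (n = 382, k = 10).
f_value = overall_f_statistic(0.1476, 382, 10)
```

With several hundred observations, even a small coefficient of determination can produce an F statistic well above 1, which is consistent with rejecting the null hypothesis while the model still explains little of the variation.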

The coefficient of determination, or r-squared value, indicates the proportion of the variation in the dependent variable that is explained by the regression model. The table below shows the regression table with the coefficient of determination.

The coefficient of determination in this case is 14.76%, indicating that only 14.76% of the variation of the dependent variable (future change in price) around the mean is explained by the independent variables. The value is low, and therefore the regression model describes only a small part of the future change in share price for Werner Enterprises.
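
The definition behind this figure can be sketched in Python as one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The function name and the small data set are hypothetical and serve only to show the arithmetic.

```python
def coefficient_of_determination(actual, predicted):
    """R^2 = 1 - SS_residual / SS_total, with SS_total taken around the mean."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_residual / ss_total


# Hypothetical values: the predictions capture most of the variation.
r2 = coefficient_of_determination([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5])
```

A model that always predicts the mean of the dependent variable scores 0 on this measure and a perfect fit scores 1, so a value of 14.76% sits much closer to the former.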

If the future share price of the enterprise were completely unpredictable, the coefficient of determination would be zero. In this case the value is not zero, so the stock market is not perfectly random and predictions can still be made.

We look at the respective p-values of the independent variables and compare them against the overall significance level of the model. If the p-value of an independent variable is less than the overall significance level, we conclude that the variable is statistically significant and can be used in the model; otherwise the variable is said to be statistically insignificant and is dropped from the model.

From the table, every explanatory variable has a p-value less than the overall significance level of the whole model. They are therefore all statistically significant, and none of them should be dropped from the regression model.
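
The decision rule described above can be sketched in Python as a simple comparison of each p-value with the significance level of 0.05. The variable names `x1` to `x3` and their p-values are hypothetical placeholders, not values from the regression table.

```python
ALPHA = 0.05  # overall significance level used in the report


def split_by_significance(p_values, alpha=ALPHA):
    """Return (kept, dropped) variable names according to p < alpha."""
    kept = [name for name, p in p_values.items() if p < alpha]
    dropped = [name for name, p in p_values.items() if p >= alpha]
    return kept, dropped


# Hypothetical p-values, for illustration only.
kept, dropped = split_by_significance({"x1": 0.003, "x2": 0.048, "x3": 0.210})
```

When every variable passes the comparison, as reported for this model, the list of dropped variables is empty.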

The coefficients of the independent variables above indicate how each of them impacts the dependent variable when all other independent variables are kept constant.

The largest positive coefficient is for the interaction variable Aluminum_Vel1 x West_Texas_Vel1. The coefficient is 0.5711, meaning that the dependent variable (future change in share price) would increase by 0.5711 for a unit increase in this interaction variable. On the other hand, the largest negative coefficient is for the interaction variable SPDR_XL1 x West_Texas_Vel1. The coefficient is -0.487, meaning that the dependent variable would decrease by 0.487 for a unit increase in this interaction variable.
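
This reading of a coefficient amounts to multiplying it by the change in its variable while all other variables stay fixed. The short Python sketch below applies that arithmetic to the two coefficients quoted in the text; the function name and the sizes of the changes are hypothetical, and the result is in the units of the rank-scaled dependent variable.

```python
def partial_effect(coefficient, change_in_variable):
    """Change in the predicted response when one variable changes by the
    given amount and all other variables are held constant."""
    return coefficient * change_in_variable


# Coefficients quoted in the text, applied to a one-unit increase.
effect_aluminum_x_west_texas = partial_effect(0.5711, 1.0)
effect_spdr_x_west_texas = partial_effect(-0.487, 1.0)
```

Because the dependent variable is scaled to lie between 0 and 1, these effects describe shifts in the ranked change in price rather than changes in dollars.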

It is evident that none of the coefficients is equal to or very close to zero. Every independent variable is therefore related to the dependent variable to some extent, and none is a candidate for deletion from the regression model.

To predict the future change in share price, the ANOVA table and the confidence interval estimate and prediction output are used. Here, the historical data is used to derive a prediction of the variation in the share price, which is then compared with the actual change in share price.

It is evident that the predicted values of the share price are not close to the actual values. The last two rows indicate the limits within which the actual share price is expected to fall. The actual values do fall within these limits, which indicates that the prediction intervals are consistent with the observed outcomes even though the point predictions are imprecise.

On all the sampled days the difference between the actual and the predicted share price is within a maximum deviation of ±0.3. With the coefficient of determination being so low, the prediction interval is wide and does not guarantee a consistent prediction of whether the share price will go up or down.
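
The two checks described here, whether the actual values fall inside the prediction limits and how far they are from the predictions, can be sketched in Python as follows. The function names and all numeric values are hypothetical and are not the figures from the report's prediction table.

```python
def interval_coverage(actual, lower, upper):
    """Fraction of actual values lying inside their [lower, upper] limits."""
    inside = [lo <= a <= hi for a, lo, hi in zip(actual, lower, upper)]
    return sum(inside) / len(inside)


def max_abs_deviation(actual, predicted):
    """Largest absolute gap between actual and predicted values."""
    return max(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical rank-scaled values for three days, for illustration only.
actual = [0.62, 0.35, 0.80]
predicted = [0.50, 0.48, 0.55]
lower = [0.20, 0.18, 0.25]
upper = [0.80, 0.78, 0.85]

coverage = interval_coverage(actual, lower, upper)
largest_gap = max_abs_deviation(actual, predicted)
```

Full coverage combined with a large gap on a scale that only runs from 0 to 1 illustrates the point made above: wide intervals contain the outcomes but say little about the direction of the price change.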

Conclusion

Multiple linear regression has been applied to historic data to predict the future change in the share price of Werner Enterprises, Inc. The VIFs indicated that there was little or no correlation between the independent variables. Residual analysis indicated that the residuals were approximately normally distributed and that the model was therefore valid for hypothesis testing. The coefficient of determination indicated that only 14.76% of the variation of the dependent variable around the mean was explained by the independent variables. The analysis of variance indicated that there was a relationship between the dependent and the independent variables. The prediction performed indicated that there was a margin of difference between the actual change in share price and the predicted change in share price. This difference could be attributed to the coefficient of determination being so low. If the coefficient of determination were large enough, more accurate and reliable prediction would be achievable.

References

Foster, J. J., Barkus, E., & Yavorsky, C. (2006). Understanding and using advanced statistics (2nd ed.). London: SAGE.

Hinton, P. R. (2014). Statistics explained (3rd ed.). London: Routledge, Taylor & Francis Group.