Linear Regression Model For Housing Market Price

Understanding the factors that affect housing prices, and building robust prediction models, is essential both for prospective home buyers and for the real economy, as several studies have shown (Favara & Imbs, 2015). This study identifies and investigates selected factors that influence housing prices.

The study is based on housing data from Australia for the period 2002 to 2017. The data are annual, and the sample size is 15. The dependent variable is the housing market price in thousand AUD. The first predictor is a housing price index, a natural choice because price indices track price movements. The Sydney index is used because Sydney is Australia's largest city and a key real estate market (Wu, Deng, & Liu, 2014). The second predictor is the annual percentage change in price: a rise or fall in this figure is expected to signal whether prices will rise or fall (Adelino, Schoar, & Severino, 2015). The third predictor is the land area of the property in square meters, since a larger area is hypothesized, and has been found in earlier studies, to command a higher price (Xiao, Orford, & Webster, 2016). The fourth predictor is the age of the property in years, which studies have also identified as a determinant of price (Xu et al., 2018). Buyers often choose an older property to reduce expense, renovating an existing structure rather than bearing the cost of building from scratch (Knoll, Schularick, & Steger, 2017).

Scatter plots for visual analysis

The linear relationship between the market price of a typical property and the Sydney housing price index over 2002 to 2017 was explored with a scatter plot of market price against the index. The following figure shows a moderately strong positive relationship, with a correlation coefficient of 0.805.

Figure 1: Scatter plot of Sydney Price Index against Market Price
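The correlation coefficients quoted in this section are Pearson correlations. As a minimal sketch of how such a value is computed, the function below implements the standard formula in plain Python. The two short lists are hypothetical values chosen only to make the example runnable; they are not the study's data, which is not reproduced in this report.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustration values, NOT the study's observations.
index_demo = [100, 110, 120, 130, 140]   # price index
price_demo = [700, 730, 720, 780, 800]   # market price, thousand AUD

r_demo = pearson_r(index_demo, price_demo)
print(f"r = {r_demo:.3f}")
```

A value near +1 or -1 indicates a strong linear association, and a value near 0 a weak one, which is how the figures in this section are read.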


The following figure plots the annual percentage change in Sydney housing prices over 2002 to 2017 against the market price (in thousand AUD). The market price tends to increase with the annual percentage change, suggesting a positive relationship between the two. The correlation coefficient was found to be 0.405.

Figure 2: Scatter plot of Annual percentage change against Market Price

The market price (in thousand AUD) increases slightly with the total land area of the property, measured in square meters. The following figure shows this relationship, which suggests a mild positive correlation between the two variables. The correlation coefficient (r) was found to be 0.313.

Figure 3: Scatter plot of total area of real estate (in sq. m) against Market Price

The market price was seen to decrease as the age of the house, measured in years, increases. The following figure shows this relationship, which indicates a negative association between the age of the property and the market price. The correlation coefficient was found to be -0.6779.

Figure 4: Scatter plot of Age of House (in years) against Market Price

Full model and interpretation

The linear model with these four predictors was fitted by ordinary least squares, and the fitted regression equation was estimated as follows:

Market Price = 548.978 + 1.963 Sydney Price Index – 5.622 Annual % Change + 0.519 Total Number of Square Meters – 2.488 Age of House (in years)

The intercept implies a market price of 548.978 thousand AUD when all predictors equal zero. Holding the other predictors constant, the market price rises by 1.963 thousand AUD for each one-unit increase in the Sydney price index, falls by 5.622 thousand AUD for each one-point increase in the annual percentage change, rises by 0.519 thousand AUD for each additional square meter of land area, and falls by 2.488 thousand AUD for each additional year of age (Pfister, Schwarz, Carson, & Janczyk, 2013).
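The interpretation of each coefficient can be checked by evaluating the fitted equation directly. The sketch below copies the rounded coefficients from the equation above. The function name, the argument names, and the input values used in the demonstration are hypothetical and serve only to illustrate the "one-unit change, other predictors held constant" reading.

```python
# Rounded coefficients copied from the fitted equation (price in thousand AUD).
FULL_MODEL = {
    "intercept": 548.978,
    "sydney_price_index": 1.963,
    "annual_pct_change": -5.622,
    "area_sq_m": 0.519,
    "age_years": -2.488,
}

def predict_full(sydney_price_index, annual_pct_change, area_sq_m, age_years,
                 coef=FULL_MODEL):
    """Evaluate the fitted full-model equation for one property."""
    return (coef["intercept"]
            + coef["sydney_price_index"] * sydney_price_index
            + coef["annual_pct_change"] * annual_pct_change
            + coef["area_sq_m"] * area_sq_m
            + coef["age_years"] * age_years)

# Hypothetical inputs: raise the index by one unit, keep everything else fixed.
base = predict_full(100, 5, 200, 10)
bumped = predict_full(101, 5, 200, 10)
print(f"change in predicted price: {bumped - base:.3f} thousand AUD")
```

The difference between the two predictions equals the index coefficient, 1.963, whatever values the other three predictors are held at.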

The following table gives, for each predictor of the linear model, the estimated coefficient shown in the fitted regression equation above, its standard error, the t-statistic and p-value for the test of significance, and the 95 percent confidence interval (Abbott, 2014).

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | 548.9781 | 81.13154 | 6.766519 | 4.94E-05 | 368.2058 | 729.7504 |
| Sydney price Index | 1.963494 | 0.583205 | 3.366727 | 0.007161 | 0.664031 | 3.262957 |
| Annual % change | -5.6222 | 3.240109 | -1.73519 | 0.113362 | -12.8416 | 1.597209 |
| Total number of square meters | 0.519146 | 0.323909 | 1.602752 | 0.140071 | -0.20257 | 1.240859 |
| Age of house (years) | -2.48787 | 1.129751 | -2.20214 | 0.052252 | -5.00511 | 0.029376 |

Table 1: Regression Model Summary (Full Model)
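The t statistics and confidence limits in Table 1 follow from the coefficients and standard errors alone. The sketch below recomputes them in plain Python. It assumes the usual residual degrees of freedom n - k - 1 = 15 - 4 - 1 = 10 and hard-codes the corresponding two-sided 5% critical value of Student's t (about 2.228), rather than calling a statistics library.

```python
# (coefficient, standard error) pairs copied from Table 1.
TABLE_1 = {
    "Intercept": (548.9781, 81.13154),
    "Sydney price index": (1.963494, 0.583205),
    "Annual % change": (-5.6222, 3.240109),
    "Total number of square meters": (0.519146, 0.323909),
    "Age of house (years)": (-2.48787, 1.129751),
}

# Two-sided 5% critical value of Student's t with 10 degrees of freedom
# (assumed df = n - k - 1 = 15 - 4 - 1).
T_CRIT_DF10 = 2.228139

def t_and_ci(coef, se, t_crit=T_CRIT_DF10):
    """Return the t statistic and the 95% confidence limits for one coefficient."""
    t_stat = coef / se
    half_width = t_crit * se
    return t_stat, coef - half_width, coef + half_width

results = {}
for name, (coef, se) in TABLE_1.items():
    t_stat, lower, upper = t_and_ci(coef, se)
    # A coefficient is significant at the 5% level when its interval excludes zero.
    significant = not (lower <= 0.0 <= upper)
    results[name] = (t_stat, lower, upper, significant)
    print(f"{name:32s} t={t_stat:8.3f}  CI=({lower:9.3f}, {upper:9.3f})  "
          f"significant={significant}")
```

Checking whether the interval excludes zero is equivalent to comparing the p-value with 0.05, so this reproduces the significance decisions discussed for Table 1.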

Regression coefficients and their significance

The Sydney price index has a coefficient point estimate of 1.963, with a 95 percent confidence interval from 0.664 to 3.263; that is, we can be 95 percent confident that this interval contains the true regression coefficient. The p-value is 0.007, which is less than 0.05, so the coefficient is significantly different from zero at the 5% level of significance (Bluman, 2015).

The annual percentage change has a point estimate of -5.622 and a 95 percent confidence interval from -12.842 to 1.597. The p-value is 0.113, which is greater than 0.05, so the coefficient is not significantly different from zero at the 5% level.

The total number of square meters has a point estimate of 0.519 and a 95 percent confidence interval from -0.203 to 1.241. The p-value is 0.140, which is greater than 0.05, so the hypothesis that the coefficient is zero cannot be rejected at the 5% level.

Finally, the age of house (in years) has a point estimate of -2.488 and a 95 percent confidence interval from -5.005 to 0.029. The p-value is 0.052, marginally greater than 0.05, so the test fails to reject the hypothesis that the coefficient is zero at the 5% level (Siegel, 2016).

| Regression Statistics | |
| --- | --- |
| Multiple R | 0.889165 |
| R Square | 0.790614 |
| Adjusted R Square | 0.70686 |
| Standard Error | 43.88783 |
| Observations | 15 |

Table 2: Regression Statistics
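Two of the entries in Table 2 can be derived from R Square, the number of observations, and the number of predictors. The short sketch below applies the standard formulas; the variable names are illustrative only.

```python
import math

def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

r2_full = 0.790614      # R Square from Table 2
n_obs = 15              # Observations from Table 2
n_predictors = 4        # predictors in the full model

multiple_r = math.sqrt(r2_full)                       # Multiple R
adj_r2_full = adjusted_r2(r2_full, n_obs, n_predictors)

print(f"Multiple R        = {multiple_r:.6f}")
print(f"Adjusted R Square = {adj_r2_full:.5f}")
```

With only 15 observations and 4 predictors the adjustment is noticeable: the adjusted value is about 0.707 against an unadjusted 0.791.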

Coefficient of determination

The goodness of fit of the linear regression model, as measured by the coefficient of determination R², was found to be 0.79. This is the ratio of the variation explained by the model to the total variation of the dependent variable (Draper & Smith, 2014). The coefficient of determination indicates that the model explains 79 percent of the total variation in the response variable, housing market price.

The ANOVA test shows that the p-value (Significance F = 0.002) is less than 0.05, so the model as a whole is significant at the 5% level.

ANOVA

| | df | SS | MS | F | Significance F |
| --- | --- | --- | --- | --- | --- |
| Regression | 4 | 72728.59 | 18182.15 | 9.439675 | 0.001993 |
| Residual | 10 | 19261.41 | 1926.141 | | |
| Total | 14 | 91990 | | | |

Table 3: ANOVA Test for Model Significance
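The mean squares, the F statistic, the coefficient of determination, and the standard error of the regression all follow from the sums of squares and degrees of freedom in the ANOVA table. A minimal sketch of that arithmetic, with illustrative variable names, is:

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_regression, df_regression = 72728.59, 4
ss_residual, df_residual = 19261.41, 10

ss_total = ss_regression + ss_residual           # total variation
ms_regression = ss_regression / df_regression    # mean square, regression
ms_residual = ss_residual / df_residual          # mean square, residual

f_statistic = ms_regression / ms_residual        # overall F test statistic
r_squared = ss_regression / ss_total             # explained / total variation
standard_error = math.sqrt(ms_residual)          # standard error of the regression

print(f"F = {f_statistic:.4f}, R^2 = {r_squared:.4f}, SE = {standard_error:.4f}")
```

The computed R² of about 0.79 is the explained share of variation quoted in the text, and the standard error matches the value reported in Table 2.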

Next, a linear model with the total land area in square meters as the only predictor and the market price (in thousand AUD) as the response was fitted to examine the linear relationship between just these two variables. The following least-squares regression equation was obtained.

Market Price = 659.143 + 0.563 Total Number of Square Meters 

The equation shows a positive linear relationship between the predictor and the response: the market price increases by 0.563 thousand AUD for each additional square meter of land area. The following table summarizes the least-squares fit.

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | 659.143 | 101.2221 | 6.51185 | 1.97E-05 | 440.466 | 877.8201 |
| Total number of square meters | 0.563603 | 0.473897 | 1.189294 | 0.255593 | -0.46019 | 1.587396 |

Table 3: Model 2 Summary
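For a model with a single predictor, such as Model 2, the least-squares intercept and slope have a simple closed form. The sketch below shows that calculation in plain Python. The area and price lists are hypothetical values used only to make the example runnable; they are not the study's data, so the printed estimates will not match the table above.

```python
def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # change in y per unit change in x
    intercept = mean_y - slope * mean_x    # fitted line passes through the means
    return intercept, slope

# Hypothetical illustration values, NOT the study's observations.
area_demo = [300, 350, 400, 450, 500]    # land area, square meters
price_demo = [820, 860, 870, 930, 940]   # market price, thousand AUD

intercept_demo, slope_demo = fit_simple_ols(area_demo, price_demo)
print(f"price = {intercept_demo:.3f} + {slope_demo:.3f} * area")
```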

The slope coefficient has a 95 percent confidence interval from -0.460 to 1.587, which contains zero, and its p-value (0.256) is greater than 0.05. The coefficient is therefore not significantly different from zero at the 5% level of significance. The interval 440.466 to 877.8201 in the table belongs to the intercept.

The following table shows the regression statistics for the model. The coefficient of determination, a measure of goodness of fit, is 0.098, which means that the model explains only about 9.8 percent of the variation in the market price (in thousand AUD).

| Regression Statistics | |
| --- | --- |
| Multiple R | 0.31325 |
| R Square | 0.098125 |
| Adjusted R Square | 0.02875 |
| Standard Error | 79.88619 |
| Observations | 15 |

Table 4: Model 2 Regression Statistics

The following table gives the results of the ANOVA test for the significance of the overall linear model. The p-value (0.256) is greater than 0.05, so the model is not significant at the 5% level.

| | df | SS | MS | F | Significance F |
| --- | --- | --- | --- | --- | --- |
| Regression | 1 | 9026.557 | 9026.557 | 1.414421 | 0.255593 |
| Residual | 13 | 82963.44 | 6381.803 | | |
| Total | 14 | 91990 | | | |

Table 5: ANOVA Test for Model Significance for Model 2
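In a regression with one predictor, the overall F statistic equals the square of the slope's t statistic, and R Square equals the square of the correlation between predictor and response. This is why the slope's p-value and Significance F are both 0.255593. The sketch below checks these identities against the values reported for Model 2; small differences are due to rounding in the tables.

```python
# Values copied from the Model 2 tables.
t_slope = 1.189294          # t Stat for total number of square meters
f_reported = 1.414421       # F from the Model 2 ANOVA table
multiple_r = 0.31325        # Multiple R (equal to the correlation 0.313)
r2_reported = 0.098125      # R Square
ss_regression = 9026.557
ss_total = 91990.0

f_from_t = t_slope ** 2                    # F = t^2 with a single predictor
r2_from_r = multiple_r ** 2                # R^2 = r^2 with a single predictor
r2_from_ss = ss_regression / ss_total      # R^2 = explained / total variation

print(f"t^2 = {f_from_t:.6f}  vs reported F   = {f_reported}")
print(f"r^2 = {r2_from_r:.6f}  vs reported R^2 = {r2_reported}")
print(f"SSR/SST = {r2_from_ss:.6f}")
```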

The predicted market price under the second model for a property with a land area of 400 square meters is obtained by substituting this value into the fitted equation (Salkind, 2016):

Predicted Market Price = 659.143 + 400 × 0.563603 = 884.584 thousand AUD
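The same substitution can be written as a small function, which makes it easy to evaluate the second model at other areas. The function name and default arguments are illustrative; the coefficients are those reported for Model 2.

```python
def predict_model2(area_sq_m, intercept=659.143, slope=0.563603):
    """Predicted market price (thousand AUD) from land area under Model 2."""
    return intercept + slope * area_sq_m

predicted_400 = predict_model2(400)
print(f"Predicted price at 400 sq. m: {predicted_400:.3f} thousand AUD")
```

Because the slope is not statistically significant in this model, any such prediction should be read as indicative rather than precise.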

The study analyzed the linear relationship between the dependent variable, market price in thousand AUD, and four independent variables: age of house, Sydney house price index, annual percentage change in house price, and total land area in square meters. The linear model shown in the following diagram was studied and its validity investigated.

Figure 5: Conceptual Model

The linear model explaining housing price with the predictors age of house, land area in square meters, Sydney housing price index, and annual percentage change was found to be significant overall, as the ANOVA results in Table 3 show. The only variable with a statistically significant effect on the market price is the Sydney price index, as shown in Table 1. Age of house, although it had a moderately strong negative correlation with price, was not statistically significant in the model (Rumsey, 2015).

Comparison between two regression models

To examine the impact that land area alone may have on the market price, a second linear regression model was constructed with area as the only predictor of market price. Comparing the coefficients of determination, the first model explains a much larger share of the variation in the response (79 percent against 9.8 percent), so it is considered better than the second model, whose single predictor, land area, is not significant (Moorad & Wade, 2013). The market price was then predicted with the second model for a land area of 400 m² and found to be 884.584 thousand AUD.

However, the ANOVA test in Table 5 showed that the second model is not significant, since the p-value was greater than 0.05 (Anderson et al., 2016). Thus, although land area showed a mild positive association with the market price, it was not found to be significant, at least for the data considered here.

Conclusion

The study concludes that the Sydney house price index is a significant factor influencing the market price of houses in Australia. The land area of the property, measured in square meters, was found to have a positive association with the market price, but there was not enough evidence to say that it has a significant effect. Similarly, the age of the property has a negative association with the market price, but the evidence is insufficient to assert a significant effect. The same holds for the annual percentage change in housing price, which has a positive association with the market price but no significant effect in the model. Even so, the second model predicted a market price of 884.584 thousand AUD for a land area of 400 m².

References

Abbott, M. L. (2014). Understanding educational statistics using Microsoft Excel and SPSS. John Wiley & Sons.

Adelino, M., Schoar, A., & Severino, F. (2015). House prices, collateral, and self-employment. Journal of Financial Economics, 117(2), 288-306.

Anderson, D. R., Sweeney, D. J., Williams, T. A., Camm, J. D., & Cochran, J. J. (2016). Statistics for business & economics. Nelson Education.

Bluman, A. G. (2015). Elementary Statistics: A Step by Step Approach: a Brief Version. McGraw-Hill Education.

Draper, N. R., & Smith, H. (2014). Applied regression analysis (Vol. 326). John Wiley & Sons.

Favara, G., & Imbs, J. (2015). Credit supply and the price of housing. American Economic Review, 105(3), 958-92.

Ferrero, A. (2015). House price booms, current account deficits, and low interest rates. Journal of Money, Credit and Banking, 47(S1), 261-293.

Knoll, K., Schularick, M., & Steger, T. (2017). No price like home: Global house prices, 1870-2012. American Economic Review, 107(2), 331-53.

Moorad, J. A., & Wade, M. J. (2013). Selection gradients, the opportunity for selection, and the coefficient of determination. The American Naturalist, 181(3), 291-300.

Pfister, R., Schwarz, K., Carson, R., & Janczyk, M. (2013). Easy methods for extracting individual regression slopes: Comparing SPSS, R, and Excel. Tutorials in Quantitative Methods for Psychology, 9(2), 72-78.

Rumsey, D. J. (2015). U Can: statistics for dummies. John Wiley & Sons.

Salkind, N. J. (2016). Statistics for people who (think they) hate statistics. Sage Publications.

Siegel, A. (2016). Practical business statistics. Academic Press.

Wu, J., Deng, Y., & Liu, H. (2014). House price index construction in the nascent housing market: the case of China. The Journal of Real Estate Finance and Economics, 48(3), 522-545.

Xiao, Y., Orford, S., & Webster, C. J. (2016). Urban configuration, accessibility, and property prices: A case study of Cardiff, Wales. Environment and Planning B: Planning and Design, 43(1), 108-129.

Xu, Y., Zhang, Q., Zheng, S., & Zhu, G. (2018). House Age, Price and Rent: Implications from Land-Structure Decomposition. The Journal of Real Estate Finance and Economics, 56(2), 303-324.