Multiple Linear Regression Analysis For Market Price Of Homes

This paper examines how the Market price (Y) of houses changes with the Price index (X1), Annual % change (X2), Total number of square meters (X3), and Age of house in years (X4) by building a regression model. Y is the dependent variable and the other four variables are the predictor variables. The multiple linear regression model is fitted using the Data Analysis ToolPak of MS Excel (Slezák et al., 2014). There are 15 observations, covering the financial years 2002-03 to 2016-17. The output of the regression analysis is discussed in detail below, covering model estimation, confidence intervals, and interpretation.

Analysis for Sydney City

  • Graphical representations

The scatter plot is shown below:

The dependent variable, Market price, is plotted on the vertical axis and each of the predictor variables is plotted on the horizontal axis. The scatter plots show a positive relationship between the Sydney Price index and Market price, and between Total number of square meters and Market price, although the latter relationship is weaker. Annual % change and Age of house show a negative association with Market price (Tang & Zhang, 2013).

  • Description of the regression model:

Let the multiple linear regression model be defined as,

Y = b0 + b1X1 + b2X2 + b3X3 + b4X4 + e, where

b0 = the y-intercept

b1 = the partial regression coefficient of X1

b2 = the partial regression coefficient of X2

b3 = the partial regression coefficient of X3

b4 = the partial regression coefficient of X4

e = residual of estimation
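For reference, the same model can be written in matrix form. The block below is the standard textbook statement of the ordinary least squares (OLS) estimator; it is assumed here, not confirmed by the source, that this is what the Excel regression tool computes. The bold symbols are notation introduced only for this sketch: y is the 15 × 1 vector of Market prices, X is the 15 × 5 matrix whose first column is all ones and whose remaining columns hold X1 to X4, and e is the vector of residuals.

```latex
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e},
\qquad
\hat{\mathbf{b}} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y},
\qquad
\hat{\mathbf{b}} = (b_0,\, b_1,\, b_2,\, b_3,\, b_4)^{\top}
```

The estimator chooses the coefficients that minimise the sum of squared residuals, which is why each bi is read as a partial effect with the other predictors held constant.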

The output of the regression analysis is shown in the table below:

| Regression Statistics | |
|---|---|
| Multiple R | 0.88916481 |
| R Square | 0.79061406 |
| Adjusted R Square | 0.70685968 |
| Standard Error | 43.8878261 |
| Observations | 15 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 72728.5872 | 18182.14679 | 9.43967451 | 0.001993481 |
| Residual | 10 | 19261.4128 | 1926.141284 | | |
| Total | 14 | 91990 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 548.978108 | 81.13153739 | 6.766519231 | 4.94032E-05 | 368.2057774 | 729.7504386 |
| Sydney price Index | 1.963493894 | 0.583205471 | 3.366727492 | 0.007160758 | 0.664031125 | 3.262956664 |
| Annual % change | -5.622204236 | 3.240109357 | -1.735189655 | 0.113361729 | -12.84161778 | 1.597209306 |
| Total number of square meters | 0.519145629 | 0.3239088 | 1.60275247 | 0.140071458 | -0.202568152 | 1.240859409 |
| Age of house (years) | -2.48786597 | 1.129750872 | -2.2021368 | 0.052251738 | -5.005107781 | 0.029375841 |

Table 1

(Source: As created by the Author)

The estimated regression equation, taken from Table 1, is

Ŷ = 548.978 + 1.963 X1 - 5.622 X2 + 0.519 X3 - 2.488 X4

The equation describes a linear relationship between the Market price and the four predictor variables (Cameron & Trivedi, 2013). Annual % change and Age of house have a negative association with the Market price, whereas the Sydney price index and Total number of square meters have a positive association.

  • Interpretation of the Coefficients

Here, b0 = 548.978, b1 = 1.963, b2 = -5.622, b3 = 0.519, and b4 = -2.488.

The y-intercept indicates that the estimated Market price is 548.978 when all the independent variables are zero. The slope coefficient b1 states that a one-unit increase in the Sydney price index increases the Market price by 1.963 units, keeping the other variables constant. The coefficient b2 means that a one-unit increase in the Annual % change decreases the Market price by 5.622 units, other variables held constant. The value of b3 indicates a 0.519-unit increase in the Market price for a one-unit increase in X3, keeping X1, X2, and X4 constant. A one-unit increase in X4 decreases the estimated Y value by 2.488 units if the other variables are held constant (Hinton, 2014).
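The "one-unit increase, other variables held constant" reading can be illustrated with a short Python sketch. The coefficients are copied from Table 1; the function name `predict_market_price` and the baseline house are hypothetical, invented only for this example and not part of the data set.

```python
# Coefficients copied from Table 1 (Sydney).
SYDNEY_COEFFICIENTS = {
    "intercept": 548.978108,            # b0
    "price_index": 1.963493894,         # b1
    "annual_pct_change": -5.622204236,  # b2
    "square_meters": 0.519145629,       # b3
    "age_years": -2.48786597,           # b4
}


def predict_market_price(price_index, annual_pct_change, square_meters, age_years,
                         coefficients=SYDNEY_COEFFICIENTS):
    """Evaluate the fitted equation Y-hat = b0 + b1*X1 + b2*X2 + b3*X3 + b4*X4."""
    return (
        coefficients["intercept"]
        + coefficients["price_index"] * price_index
        + coefficients["annual_pct_change"] * annual_pct_change
        + coefficients["square_meters"] * square_meters
        + coefficients["age_years"] * age_years
    )


# Hypothetical house, invented purely for illustration (not from the data set).
baseline = {"price_index": 100.0, "annual_pct_change": 5.0,
            "square_meters": 200.0, "age_years": 10.0}
one_year_older = dict(baseline, age_years=baseline["age_years"] + 1.0)

# Changing only X4 by one unit shifts the prediction by exactly b4.
age_effect = predict_market_price(**one_year_older) - predict_market_price(**baseline)
print(round(age_effect, 3))  # -2.488
```

Whatever baseline values are chosen, the difference equals b4, because the model is linear and the other three inputs cancel out.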

  • R-square

The coefficient of determination, R², is 0.790614058. A higher R² means that the regression model fits the data more closely, so this value indicates a good fit.

Here, 79.06% of the variability of the Market price is explained by the four independent variables. The adjusted R², which accounts for the number of predictors, is 0.7069 (Table 1).
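The summary figures in Table 1 can be cross-checked from the ANOVA sums of squares. The following Python sketch uses the standard definitions of R², adjusted R², the overall F statistic, and the standard error of the regression; the variable names are chosen for this example only.

```python
import math

# Values copied from the ANOVA block of Table 1 (Sydney).
n_obs = 15
n_predictors = 4
ss_regression = 72728.5872
ss_residual = 19261.4128
ss_total = ss_regression + ss_residual  # matches the reported total of 91990

df_regression = n_predictors
df_residual = n_obs - n_predictors - 1  # 10

# Share of total variation explained by the model.
r_squared = ss_regression / ss_total
# R-squared penalised for the number of predictors.
adjusted_r_squared = 1.0 - (1.0 - r_squared) * (n_obs - 1) / df_residual

# Mean squares and the overall F statistic.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

# Standard error of the regression (root mean squared residual).
standard_error = math.sqrt(ms_residual)

print(round(r_squared, 4), round(adjusted_r_squared, 4))
print(round(f_statistic, 3), round(standard_error, 3))
```

The four computed values agree with the R Square, Adjusted R Square, F, and Standard Error entries reported in Table 1.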

  • Confidence intervals (CI)

The 95% CI for the regression coefficient of each of the independent variables can be evaluated using the formula bi ± t × SE(bi), i = 1 to 4, where SE(bi) is the standard error of bi and t is the critical value of the t-statistic at the 5% significance level (Zou, 2013). With n - k - 1 = 15 - 4 - 1 = 10 residual degrees of freedom, t ≈ 2.228. The values are already obtained in the Excel output (Table 1).

95% CI for X1 = (0.664031125, 3.262956664); the researcher can be 95% confident that the regression coefficient of the Sydney Price index lies between 0.664031125 and 3.262956664. The interval excludes zero, so this coefficient is significant at the 5% level.

95% CI for X2 = (-12.84161778, 1.597209306); with 95% confidence, the coefficient of Annual % change lies between the lower bound -12.84161778 and the upper bound 1.597209306. The interval contains zero, so the coefficient is not significant at the 5% level.

95% CI for X3 = (-0.202568152, 1.240859409); the coefficient of X3 lies within this interval with 95% confidence, with lower bound -0.202568 and upper bound 1.240859. This interval also contains zero.

95% CI for X4 = (-5.005107781, 0.029375841); as with the previous three, the coefficient of X4 lies within the calculated CI at the 95% confidence level. The interval only just contains zero, which matches the P-value of 0.052 in Table 1.
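The intervals above can be reproduced from the coefficients and standard errors in Table 1. This is a minimal Python sketch: the critical value 2.2281389 (two-sided 5% level, 10 degrees of freedom) is hard-coded from a t table as an assumption rather than computed, and the helper name `confidence_interval` is introduced only for this example.

```python
# Two-sided 5% critical value of Student's t with 10 degrees of freedom
# (taken from a t table; hard-coded to keep the sketch dependency-free).
T_CRITICAL_DF10 = 2.2281389

# (coefficient, standard error) pairs copied from Table 1 (Sydney).
sydney_estimates = {
    "Intercept": (548.978108, 81.13153739),
    "Sydney price Index": (1.963493894, 0.583205471),
    "Annual % change": (-5.622204236, 3.240109357),
    "Total number of square meters": (0.519145629, 0.3239088),
    "Age of house (years)": (-2.48786597, 1.129750872),
}


def confidence_interval(coefficient, standard_error, t_critical=T_CRITICAL_DF10):
    """Return (lower, upper) bounds of coefficient +/- t_critical * standard_error."""
    margin = t_critical * standard_error
    return coefficient - margin, coefficient + margin


intervals = {
    name: confidence_interval(b, se) for name, (b, se) in sydney_estimates.items()
}

for name, (lower, upper) in intervals.items():
    # An interval that contains zero means the coefficient is not
    # significantly different from zero at the 5% level.
    contains_zero = lower <= 0.0 <= upper
    print(f"{name}: ({lower:.4f}, {upper:.4f}) contains zero: {contains_zero}")
```

The bounds agree with the Lower 95% and Upper 95% columns of Table 1 to within rounding of the critical value.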

The Market price is then re-estimated with a simple linear regression model that uses Total number of square meters (X3) as the only predictor.

The regression analysis gives R² = 0.098, which implies that the model is not a good fit. The estimated regression equation is

Ŷ = 659.143 + 0.5636 X3

The R² value of the original multiple regression model is 0.790614058, which is higher than that of the re-estimated model (0.098125419). Thus, the multiple linear regression model is a better fit for the Market price than the simple linear regression model. The four predictor variables together explain 79.06% of the variability in the dependent variable, whereas Total number of square meters alone explains only 9.81%.

If Square meters = 400, then the estimated Market price = 659.143 + (0.5636 × 400) = 884.583, that is, $884,583.
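The prediction at 400 square meters is simple arithmetic on the fitted line, sketched below in Python. The function name is hypothetical, and treating the Market price as being measured in thousands of dollars is an inference from the text's own conversion of 884.583 to a dollar figure, not something the source states explicitly.

```python
def predict_from_square_meters(square_meters, intercept=659.143, slope=0.5636):
    """Estimated Market price from the Sydney simple regression line.

    Assumes the price is expressed in thousands of dollars (inferred
    from the text, which converts 884.583 to a dollar amount).
    """
    return intercept + slope * square_meters


price_thousands = predict_from_square_meters(400)
price_dollars = round(price_thousands * 1000)

print(f"{price_thousands:.3f}")  # 884.583
print(price_dollars)             # 884583
```

Because this model has R² of only about 0.098, the point estimate should be read with caution; most of the variation in Market price is left unexplained by floor area alone.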

This part shows the output of the regression analysis for the Market price of houses in Brisbane:

| Regression Statistics | |
|---|---|
| Multiple R | 0.847362 |
| R Square | 0.7180224 |
| Adjusted R Square | 0.6052313 |
| Standard Error | 12.573407 |
| Observations | 15 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 4025.588 | 1006.397 | 6.365952 | 0.008182698 |
| Residual | 10 | 1580.906 | 158.0906 | | |
| Total | 14 | 5606.493 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 89.871245 | 16.91237 | 5.313936 | 0.000341 | 52.18814276 | 127.5543 |
| Brisbane Price Index | -0.5082788 | 0.47905 | -1.06101 | 0.313636 | -1.575668349 | 0.559111 |
| Annual % change | 1.4620069 | 1.016153 | 1.438766 | 0.180771 | -0.802123664 | 3.726137 |
| Total number of square meters | 0.056465 | 0.094729 | 0.59607 | 0.564376 | -0.154603966 | 0.267534 |
| Age of house (years) | -0.7797538 | 0.348462 | -2.2377 | 0.049196 | -1.556176484 | -0.00333 |

Table 2

(Source: As created by the Author)

The estimated regression equation, taken from the above table, is

Ŷ = 89.871 - 0.508 X1 + 1.462 X2 + 0.056 X3 - 0.780 X4

The equation shows a negative linear relationship of the Market price with the Brisbane price index and Age of house. However, the relationship is positive for Total number of square meters and Annual % change (Cohen, West & Aiken, 2014).

The y-intercept indicates that the estimated Market price is 89.871 when all the independent variables are zero. The Market price increases by 1.462 units for a one-unit increase in X2, keeping the other variables constant. Similarly, it increases by 0.056 units for a one-unit increase in X3. A one-unit increase in the Brisbane price index (X1) is associated with a 0.508-unit decrease in the Market price. Because the regression coefficient of X4 is negative, a one-unit increase in X4 corresponds to a 0.78-unit decrease in Y.

The coefficient of determination, R², is 0.718, which suggests that the regression model is a good fit for the given predictor variables. Approximately 71.8% of the variation in the Market price is explained by the four independent variables.

95% CI for X1 = (-1.575668349, 0.559110736); with 95% confidence, the regression coefficient of the Brisbane Price index lies between -1.575668349 and 0.559110736.

95% CI for X2 = (-0.802123664, 3.726137411); with 95% confidence, the coefficient of Annual % change lies between the lower bound -0.802123664 and the upper bound 3.726137411.

95% CI for X3 = (-0.154603966, 0.267533947); the coefficient of X3 lies within the given interval with 95% confidence.

95% CI for X4 = (-1.556176484, -0.003331143); these are the lower and upper bounds within which the coefficient of X4 lies with 95% confidence. This is the only one of the four intervals that excludes zero, so Age of house is the only predictor that is significant at the 5% level (P-value 0.049).
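Checking whether a 95% interval contains zero is equivalent to comparing the absolute t statistic with the critical value. The Python sketch below recomputes the t statistics for Brisbane from the coefficients and standard errors in Table 2; the critical value for 10 degrees of freedom is hard-coded from a t table as an assumption, and the names are chosen for this example only.

```python
# Two-sided 5% critical value of Student's t with 10 degrees of freedom
# (hard-coded from a t table rather than computed).
T_CRITICAL_DF10 = 2.2281389

# (coefficient, standard error) pairs copied from Table 2 (Brisbane).
brisbane_estimates = {
    "Brisbane Price Index": (-0.5082788, 0.47905),
    "Annual % change": (1.4620069, 1.016153),
    "Total number of square meters": (0.056465, 0.094729),
    "Age of house (years)": (-0.7797538, 0.348462),
}


def t_statistic(coefficient, standard_error):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / standard_error


significant = {}
for name, (b, se) in brisbane_estimates.items():
    t_value = t_statistic(b, se)
    # Significant at the 5% level when |t| exceeds the critical value,
    # which happens exactly when the 95% CI excludes zero.
    significant[name] = abs(t_value) > T_CRITICAL_DF10
    print(f"{name}: t = {t_value:.4f}, significant at 5%: {significant[name]}")
```

The recomputed t values match the t Stat column of Table 2, and only Age of house clears the threshold, which agrees with the intervals listed above.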

A simple linear regression of the Market price on Total number of square meters has R² = 0.097, which is small and indicates a very poor fit. The model explains only 9.73% of the variation in the Market price.

The regression model is defined as,

Ŷ = 65.988 + 0.139 X3

The R² value of the original multiple regression model is 0.718022376, which is higher than that of the re-estimated model (0.097334455). Thus, the multiple linear regression model is a better fit for the Market price than the simple linear regression model.

If Square meters = 400, then the estimated Market price = 65.988 + (0.139 × 400) = 121.588, or $121,588.

The output of the regression analysis for the Market price of houses in Melbourne is shown in the table below:

| Regression Statistics | |
|---|---|
| Multiple R | 0.801295 |
| R Square | 0.642074 |
| Adjusted R Square | 0.498904 |
| Standard Error | 19.79055 |
| Observations | 15 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 7025.999 | 1756.5 | 4.484691 | 0.0247334 |
| Residual | 10 | 3916.658 | 391.6658 | | |
| Total | 14 | 10942.66 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 82.70826 | 26.59056 | 3.110437 | 0.011052 | 23.460804 | 141.9557 |
| Melbourne Price Index | -0.67601 | 0.927016 | -0.72923 | 0.482589 | -2.741533 | 1.389509 |
| Annual % change | 1.851263 | 1.501679 | 1.232796 | 0.245847 | -1.494686 | 5.197213 |
| Total number of square meters | 0.116315 | 0.145828 | 0.79762 | 0.443618 | -0.20861 | 0.44124 |
| Age of house (years) | -1.24947 | 0.451735 | -2.76594 | 0.019926 | -2.255999 | -0.24294 |

Table 3

(Source: As created by the Author)

The estimated regression equation, taken from the above table, is

Ŷ = 82.708 - 0.676 X1 + 1.851 X2 + 0.116 X3 - 1.249 X4

The equation shows a negative linear relationship of the Market price with the Melbourne price index and Age of house. However, the relationship is positive for Total number of square meters and Annual % change.

Here, b0 = 82.708, b1 = -0.676, b2 = 1.851, b3 = 0.116, and b4 = -1.249.

The y-intercept indicates that the estimated Market price is 82.708 if all Xi's are zero. The Market price decreases by 0.676 units for a one-unit increase in X1, keeping the other variables constant. A decrease of 1.24947 units in the Market price occurs for a one-unit increase in X4. A one-unit increase in X2 corresponds to an increase of 1.851 units, and a one-unit increase in X3 corresponds to an increase of 0.116 units in Y (Bates et al., 2014).

The coefficient of determination, R², is 0.642074, which suggests a moderately good fit of the regression model for the given predictor variables. 64.21% of the variation in the Market price is explained by the four independent variables (Nimon & Oswald, 2013).
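With only 15 observations and four predictors, it can help to look at the adjusted R² and the overall F statistic alongside R². Both follow from R² itself through standard formulas, as in this Python sketch; the function names are introduced only for this example.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """R-squared penalised for the number of predictors in the model."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


def overall_f_statistic(r_squared, n_obs, n_predictors):
    """Overall F statistic expressed in terms of R-squared."""
    explained_per_df = r_squared / n_predictors
    unexplained_per_df = (1.0 - r_squared) / (n_obs - n_predictors - 1)
    return explained_per_df / unexplained_per_df


# R Square, observations, and number of predictors from Table 3 (Melbourne).
melbourne_r2 = 0.642074
melbourne_adjusted = adjusted_r_squared(melbourne_r2, n_obs=15, n_predictors=4)
melbourne_f = overall_f_statistic(melbourne_r2, n_obs=15, n_predictors=4)

print(round(melbourne_adjusted, 4))  # 0.4989
print(round(melbourne_f, 2))         # 4.48
```

Both results agree with the Adjusted R Square and F entries in Table 3.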

95% CI for X1 = (-2.741533069, 1.389508577); with 95% confidence, the regression coefficient of the Melbourne Price index lies between -2.741533069 and 1.389508577.

95% CI for X2 = (-1.494686069, 5.19721276); with 95% confidence, the coefficient of Annual % change lies between the lower bound -1.494686069 and the upper bound 5.19721276.

95% CI for X3 = (-0.208609648, 0.441240343); the coefficient of X3 lies within the given interval with 95% confidence.

95% CI for X4 = (-2.255998646, -0.242941926); these are the lower and upper bounds within which the coefficient of X4 lies with 95% confidence. As in Brisbane, this is the only interval that excludes zero.

A simple linear regression of the Market price on Total number of square meters has R² = 0.108.

The regression model is defined as,

Ŷ = 50.092 + 0.204 X3

The R² value of the original multiple regression model is 0.642074, which is higher than that of the re-estimated model (0.108085). Thus, the multiple linear regression model is a better fit for the Market price than the simple linear regression model.

If Square meters = 400, then the estimated Market price = 50.09162 + (0.204012 × 400) = 131.69642, or $131,696.
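The comparison between the two models can be laid side by side for the three cities. The Python sketch below uses only figures quoted in the text (R² values and the simple-regression intercepts and slopes); the dictionary layout and key names are choices made for this example.

```python
# R-squared values and simple-regression lines as quoted in the text.
city_models = {
    "Sydney": {"r2_multiple": 0.790614058, "r2_simple": 0.098125419,
               "intercept": 659.143, "slope": 0.5636},
    "Brisbane": {"r2_multiple": 0.718022376, "r2_simple": 0.097334455,
                 "intercept": 65.988, "slope": 0.139},
    "Melbourne": {"r2_multiple": 0.642074, "r2_simple": 0.108085,
                  "intercept": 50.09162, "slope": 0.204012},
}

SQUARE_METERS = 400

summary = {}
for city, model in city_models.items():
    # Extra share of variance explained by the three additional predictors.
    r2_gain = model["r2_multiple"] - model["r2_simple"]
    # Point prediction from the simple regression line at 400 square meters.
    predicted = model["intercept"] + model["slope"] * SQUARE_METERS
    summary[city] = {"r2_gain": r2_gain, "predicted": predicted}
    print(f"{city}: R2 gain = {r2_gain:.3f}, "
          f"predicted price at {SQUARE_METERS} sq m = {predicted:.3f}")
```

In every city the multiple regression model explains more than half of the variation that the simple model leaves unexplained, which is the basis for preferring it.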

Conclusion

From the above multiple linear regression analyses of the Market price of houses for three cities in Australia (Sydney, Brisbane, and Melbourne), it can be concluded that the four chosen independent variables provide a good fit for the dependent variable Market price in all three cities, as each coefficient of determination is greater than 0.5. On the other hand, the re-estimated simple linear regression model, in which Total number of square meters is the only predictor variable, does not explain the variability in the Market price well. Therefore, the multiple linear regression model is the recommended model for predicting the market price.

References

Bates, D., Maechler, M., Bolker, B., & Walker, S. (2014). lme4: Linear mixed-effects models using Eigen and S4. R package version, 1(7), 1-23.

Cameron, A. C., & Trivedi, P. K. (2013). Regression analysis of count data (Vol. 53). Cambridge University Press.

Cohen, P., West, S. G., & Aiken, L. S. (2014). Applied multiple regression/correlation analysis for the behavioral sciences. Psychology Press.

Hinton, P. R. (2014). Statistics explained. Routledge.

Nimon, K. F., & Oswald, F. L. (2013). Understanding the results of multiple linear regression: Beyond standardized regression coefficients. Organizational Research Methods, 16(4), 650-674.

Slezák, P., Bokes, P., Pavol, N. Á., & Waczulíková, I. (2014). Microsoft Excel add-in for the statistical analysis of contingency tables. International Journal for Innovation Education and Research, 2(5), 90-100.

Tang, Q. Y., & Zhang, C. X. (2013). Data Processing System (DPS) software with experimental design, statistical analysis and data mining developed for use in entomological research. Insect Science, 20(2), 254-260.

Zou, G. Y. (2013). Confidence interval estimation for the Bland–Altman limits of agreement with multiple observations per individual. Statistical Methods in Medical Research, 22(6), 630-642.