Regression Analysis For Predicting Market Price Of Houses In Sydney, Melbourne And Brisbane

Discussion

The paper aims to build a predictive model of the Market price of houses from four explanatory variables (the Price index, the Annual % change, the Total number of square meters, and the Age of house in years) using multiple linear regression. The model is fitted with the Data Analysis ToolPak of Microsoft Excel. The dependent variable (Y) is Market price, and the four predictor variables are denoted as follows:

Price index -> X1, Annual % change -> X2, Total number of square meters -> X3, Age of house (years) -> X4. The analysis also measures the strength of association between the variables and estimates the Market price from the values of the explanatory variables for three Australian cities: Sydney, Melbourne, and Brisbane. Observations are recorded for 15 financial years, from 2002-03 to 2016-17.

The multiple regression model is given by

$$Y = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4 + e$$

where b0 = the y-intercept, b1 = partial regression coefficient of X1, b2 = partial regression coefficient of X2, b3 = partial regression coefficient of X3, b4 = partial regression coefficient of X4, and e = the random error term.
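The paper fits this model with Excel's Data Analysis ToolPak. As a hedged illustration of the same ordinary least squares (OLS) computation outside Excel, the sketch below uses NumPy's `np.linalg.lstsq`. The function name `fit_ols` and the toy data are hypothetical; they are not the study's 15-year data set.

```python
import numpy as np

def fit_ols(X, y):
    """Fit Y = b0 + b1*X1 + ... + bk*Xk + e by ordinary least squares.

    X: (n, k) array of explanatory variables, one column per predictor.
    y: (n,) array of the response (here, Market price).
    Returns the coefficient vector [b0, b1, ..., bk].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical toy data (NOT the study's data): 15 rows, 4 predictors, and a
# response that is an exact linear function of them, so OLS should recover
# the generating coefficients.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(15, 4))
true_b = np.array([5.0, 2.0, -1.5, 0.5, -3.0])
y_demo = true_b[0] + X_demo @ true_b[1:]

b_hat = fit_ols(X_demo, y_demo)
print(np.round(b_hat, 6))
```

With the real data, the columns of X would hold X1 to X4 for the 15 financial years, and the returned coefficients should agree with Excel's Coefficients column up to rounding.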

A.2. Graphical representation

Firstly, the data are analysed with a scatter plot in which the response variable, Market price, is plotted on the vertical axis and each of the four explanatory variables is plotted on the horizontal axis. The scatter plot is shown below:

Figure 1

(Source: As created by the Author)

The data points for each explanatory variable are plotted with a linear trend line, which shows graphically how the response variable changes as each explanatory variable changes. The plot shows a positive relationship for X1 and X3 and a negative relationship for X2 and X4.

The output of the regression analysis from the Excel tool is shown below:

Regression Statistics
Multiple R            0.88916481
R Square              0.79061406
Adjusted R Square     0.70685968
Standard Error        43.8878261
Observations          15

ANOVA
              df    SS            MS             F            Significance F
Regression    4     72728.5872    18182.14679    9.43967451   0.001993481
Residual      10    19261.4128    1926.141284
Total         14    91990

                                 Coefficients    Standard Error   t Stat         P-value        Lower 95%
Intercept                        548.978108      81.13153739      6.766519231    4.94032E-05    368.2057774
Sydney price Index               1.963493894     0.583205471      3.366727492    0.007160758    0.664031125
Annual % change                  -5.622204236    3.240109357      -1.735189655   0.113361729    -12.84161778
Total number of square meters    0.519145629     0.3239088        1.60275247     0.140071458    -0.202568152
Age of house (years)             -2.48786597     1.129750872      -2.2021368     0.052251738    -5.005107781

Table 1

(Source: As created by the Author)
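The ANOVA block of Table 1 can be reproduced from its sums of squares: each mean square is SS divided by its degrees of freedom, F is the ratio of the two mean squares, and the Standard Error in the Regression Statistics block is the square root of the residual mean square. A small sketch in Python (the helper name `anova_summary` is hypothetical):

```python
import math

def anova_summary(ss_reg, ss_res, n, k):
    """Return (MS_regression, MS_residual, F, standard error of the regression)
    for a linear model with k explanatory variables and n observations."""
    df_reg = k              # regression degrees of freedom
    df_res = n - k - 1      # residual degrees of freedom
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res
    f_stat = ms_reg / ms_res
    std_err = math.sqrt(ms_res)
    return ms_reg, ms_res, f_stat, std_err

# Sums of squares from the ANOVA block of Table 1 (Sydney): n = 15, k = 4.
ms_reg, ms_res, f_stat, std_err = anova_summary(72728.5872, 19261.4128, n=15, k=4)
print(f"MS regression = {ms_reg:.5f}, MS residual = {ms_res:.5f}")
print(f"F = {f_stat:.5f}, standard error = {std_err:.5f}")
```

The Significance F column additionally needs the upper tail of the F-distribution with (4, 10) degrees of freedom; with SciPy available, `scipy.stats.f.sf(f_stat, 4, 10)` should give a value close to the reported 0.001993.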

A.4. Regression equation

The estimated regression equation is written as

$$\hat{Y} = 548.978 + 1.963X_1 - 5.622X_2 + 0.519X_3 - 2.488X_4$$

There is a positive linear association of Y with the Sydney price index and the Total number of square meters. The association is negative for the Annual % change and the Age of house.

The intercept b0 = 548.978 is the estimated Market price when all the explanatory variables are zero. The coefficient b1 = 1.963 states that the Market price increases by 1.963 units for a one-unit increase in X1, holding the other variables constant. The coefficient b2 = -5.622 means that a one-unit increase in the Annual % change decreases the Market price by 5.622 units. The Market price increases by 0.519 units for a one-unit increase in X3 (b3 = 0.519) and decreases by 2.488 units for a one-unit increase in the Age of house (b4 = -2.488).
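The partial-effect interpretation can be checked numerically: with the other inputs held fixed, raising one explanatory variable by one unit changes the prediction by exactly its coefficient. The sketch below uses the Table 1 coefficients; the helper `predict_market_price` is hypothetical, and the input values passed to it are arbitrary illustrations, not observations from the data set.

```python
# Coefficients of the Sydney multiple regression model, copied from Table 1.
B0 = 548.978108
B1, B2, B3, B4 = 1.963493894, -5.622204236, 0.519145629, -2.48786597

def predict_market_price(x1, x2, x3, x4):
    """Estimated Market price from the fitted Sydney equation."""
    return B0 + B1 * x1 + B2 * x2 + B3 * x3 + B4 * x4

# Arbitrary illustrative inputs (hypothetical, not taken from the data).
base = predict_market_price(100.0, 5.0, 400.0, 20.0)
one_year_older = predict_market_price(100.0, 5.0, 400.0, 21.0)

# Holding X1, X2 and X3 fixed, one extra year of age shifts the prediction by B4.
effect_of_age = one_year_older - base
print(round(effect_of_age, 3))
```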

Regression analysis for Sydney

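The coefficient of determination, R², is calculated using the formula

$$R^2 = \frac{SS_{\mathrm{Regression}}}{SS_{\mathrm{Total}}} = 1 - \frac{SS_{\mathrm{Residual}}}{SS_{\mathrm{Total}}}$$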

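Here, R² = 72728.5872 / 91990 = 0.7906, so the model fits the data well: approximately 79% of the variation in the Market price is explained by the four explanatory variables. With only 15 observations and four predictors, the adjusted R² of 0.7069 is a more conservative measure of fit.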
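Using the sums of squares from the ANOVA block of Table 1, R² can be computed either as SS Regression / SS Total or as 1 - SS Residual / SS Total, and both give the reported R Square. A minimal check in Python:

```python
# Sums of squares from the ANOVA block of Table 1 (Sydney).
SS_REGRESSION = 72728.5872
SS_RESIDUAL = 19261.4128
SS_TOTAL = 91990.0

# R^2 as the share of total variation explained by the regression ...
r2_explained = SS_REGRESSION / SS_TOTAL
# ... and equivalently as one minus the unexplained share.
r2_from_residual = 1 - SS_RESIDUAL / SS_TOTAL

print(f"{r2_explained:.6f} {r2_from_residual:.6f}")
```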

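The 95% confidence interval (CI) of each regression coefficient $b_i$ is calculated by the formula

$$b_i \pm t_{\alpha/2,\,n-k-1} \times SE(b_i), \qquad i = 1, \dots, 4,$$

where $SE(b_i)$ is the standard error of the coefficient $b_i$, and $t_{\alpha/2,\,n-k-1}$ is the critical value of the t-statistic at the 5% significance level with n - k - 1 = 15 - 4 - 1 = 10 degrees of freedom, approximately 2.228 (Zou, 2013). The Excel regression output already reports these 95% CIs, which are interpreted below.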
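Each interval is the coefficient plus or minus the critical t value times the coefficient's standard error. The sketch below recomputes the Table 1 intervals this way. The critical value is hard-coded from standard t tables (an assumption made to keep the example dependency-free), and the names `coefficient_ci` and `TABLE_1` are hypothetical.

```python
# Two-sided 5% critical value of Student's t with n - k - 1 = 15 - 4 - 1 = 10
# degrees of freedom, taken from standard t tables (assumed constant here;
# with SciPy installed it could be computed as scipy.stats.t.ppf(0.975, 10)).
T_CRIT_DF10 = 2.228139

def coefficient_ci(b, se, t_crit=T_CRIT_DF10):
    """95% confidence interval b +/- t_crit * SE(b) for one regression coefficient."""
    half_width = t_crit * se
    return b - half_width, b + half_width

# (coefficient, standard error) pairs from Table 1 (Sydney).
TABLE_1 = {
    "b1 Sydney price index": (1.963493894, 0.583205471),
    "b2 Annual % change": (-5.622204236, 3.240109357),
    "b3 Total number of square meters": (0.519145629, 0.3239088),
    "b4 Age of house": (-2.48786597, 1.129750872),
}

intervals = {name: coefficient_ci(b, se) for name, (b, se) in TABLE_1.items()}
for name, (lower, upper) in intervals.items():
    print(f"{name}: ({lower:.6f}, {upper:.6f})")
```

The recomputed bounds agree with Excel's Lower 95% and Upper 95% values to about six decimal places; the small residual difference comes from rounding the critical value.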

CI for b1 (Sydney price index) = (0.664031125, 3.262956664): we are 95% confident that the true coefficient of the Sydney price index lies between 0.664 and 3.263. Because the interval excludes zero, this coefficient is significant at the 5% level.

CI for b2 (Annual % change) = (-12.84161778, 1.597209306): with 95% confidence, the true coefficient of the Annual % change lies between the lower bound -12.842 and the upper bound 1.597. The interval contains zero, so this coefficient is not significant at the 5% level.

CI for b3 (Total number of square meters) = (-0.202568152, 1.240859409): with 95% confidence, the true coefficient lies between -0.203 and 1.241; the interval contains zero.

CI for b4 (Age of house) = (-5.005107781, 0.029375841): with 95% confidence, the true coefficient lies between -5.005 and 0.029. The interval only just contains zero, consistent with its p-value of 0.052.

A.8. Regression model between Market price and Total number of square meters

The re-estimated simple linear regression model, with Total number of square meters as the only explanatory variable and Market price as the response variable, is summarised below:

Regression Statistics
Multiple R            0.313249771
R Square              0.098125419
Adjusted R Square     0.028750451
Standard Error        79.88618957
Observations          15

                                 Coefficients    Standard Error   t Stat         P-value       Lower 95%       Upper 95%
Intercept                        659.1430408     101.222089       6.511849797    1.9669E-05    440.466012      877.82007
Total number of square meters    0.563603274     0.4738972        1.189294376    0.2555933     -0.460189381    1.58739593

Table 2

(Source: As created by the Author)

The regression equation is

$$\hat{Y} = 659.143 + 0.5636X_3$$

The original multiple regression model has R² = 0.790614058, which is much greater than the R² = 0.098125419 of the re-estimated model. The original model explains about 79% of the variation in the Market price, whereas the re-estimated model explains only 9.81%. Therefore, the multiple linear regression model fits the Market price better than the simple linear regression model. Because R² never decreases when predictors are added, the adjusted R² values (0.707 versus 0.029) give a fairer comparison, and they lead to the same conclusion.
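Plain R² cannot decrease when predictors are added, so a hedged way to make the model comparison fairer is to penalise the number of predictors with adjusted R². The sketch below recomputes the Adjusted R Square values reported in Tables 1 and 2 from R², the number of observations n, and the number of predictors k (the helper name `adjusted_r2` is hypothetical):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

N_OBS = 15  # financial years 2002-03 to 2016-17

multiple_model = adjusted_r2(0.790614058, N_OBS, k=4)  # Table 1, four predictors
simple_model = adjusted_r2(0.098125419, N_OBS, k=1)    # Table 2, X3 only

print(f"multiple: {multiple_model:.4f}, simple: {simple_model:.4f}")
```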

A.10. Predicted Market price value

Given X3 = 400, the predicted Market price = 659.143 + (0.5636 × 400) = 884.583, that is, $884,583.
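The prediction is a single substitution into the fitted simple regression equation. The sketch below repeats it with the rounded coefficients used in the text and with the full-precision coefficients from Table 2. It assumes, as the conversion of 884.583 to $884,583 implies, that Market price is recorded in thousands of dollars; the helper name `predict_simple` is hypothetical.

```python
def predict_simple(intercept, slope, x3):
    """Predicted Market price from a simple regression on X3 (square meters)."""
    return intercept + slope * x3

X3 = 400  # total number of square meters, as given in the text

rounded = predict_simple(659.143, 0.5636, X3)             # coefficients as rounded in the text
unrounded = predict_simple(659.1430408, 0.563603274, X3)  # full-precision Table 2 values

# Market price is assumed to be in thousands of dollars, so multiply by 1000.
print(f"rounded:   ${rounded * 1000:,.0f}")
print(f"unrounded: ${unrounded * 1000:,.0f}")
```

The two results differ by only about one dollar, so rounding the slope to four decimals has no practical effect on this prediction.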

B.2. Scatter plot analysis

Before the regression analysis, the scatter plot is displayed to check the association among the variables.

Figure 2

(Source: As created by the Author)

From the scatter plot, the Total number of square meters shows a positive linear relationship with the Market price. None of the other variables shows a strong association.

Graphical representation

Regression Statistics
Multiple R            0.801295
R Square              0.642074
Adjusted R Square     0.498904
Standard Error        19.79055
Observations          15

ANOVA
              df    SS          MS          F           Significance F
Regression    4     7025.999    1756.5      4.484691    0.024733446
Residual      10    3916.658    391.6658
Total         14    10942.66

                                 Coefficients   Standard Error   t Stat      P-value     Lower 95%      Upper 95%
Intercept                        82.70826       26.59056         3.110437    0.011052    23.46080361    141.955708
Melbourne Price Index            -0.67601       0.927016         -0.72923    0.482589    -2.74153307    1.38950858
Annual % change                  1.851263       1.501679         1.232796    0.245847    -1.49468607    5.19721276
Total number of square meters    0.116315       0.145828         0.79762     0.443618    -0.20860965    0.44124034
Age of house (years)             -1.24947       0.451735         -2.76594    0.019926    -2.25599865    -0.2429419

Table 3

(Source: As created by the Author)

B.4. Regression equation

According to the above analysis, the estimated regression equation is written as

$$\hat{Y} = 82.708 - 0.676X_1 + 1.851X_2 + 0.116X_3 - 1.249X_4$$

The Market price has a positive linear relationship with the Annual % change and the Total number of square meters. However, a negative association exists with the Melbourne price index and the Age of house.

When all the explanatory variables are zero, the estimated Market price is 82.708. Holding the other variables constant, a one-unit increase in X1 decreases Y by 0.676 units, a one-unit increase in X2 increases Y by 1.851 units, a one-unit increase in X3 increases Y by 0.116 units, and a one-unit increase in X4 decreases Y by 1.249 units (Nimon & Oswald, 2013).

B.6. R²

Here R² = 0.642074, so the regression model gives a moderately good fit: 64.2% of the variation in the Market price for Melbourne is explained by the explanatory variables.

B.7. Confidence Interval (CI)

CI for b1 (Melbourne price index) = (-2.741533069, 1.389508577): at the 95% confidence level, the true coefficient of the Melbourne price index lies between -2.742 and 1.390. The interval contains zero.

CI for b2 (Annual % change) = (-1.494686069, 5.19721276): with 95% confidence, the true coefficient of the Annual % change lies between the lower bound -1.495 and the upper bound 5.197; the interval contains zero.

CI for b3 (Total number of square meters) = (-0.208609648, 0.441240343): the true coefficient lies within this interval with 95% confidence; the interval contains zero.

CI for b4 (Age of house) = (-2.255998646, -0.242941926): with 95% confidence, the true coefficient lies between the lower endpoint -2.256 and the upper endpoint -0.243. Because the whole interval is below zero, Age of house is the only predictor that is significant at the 5% level (p = 0.0199).
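A 95% confidence interval that lies entirely on one side of zero corresponds to a coefficient that is significant at the 5% level in a two-sided test. The sketch below applies that rule to the Melbourne intervals from Table 3; the names `CI_MELBOURNE` and `excludes_zero` are hypothetical.

```python
# 95% confidence intervals for the Melbourne coefficients (Table 3).
CI_MELBOURNE = {
    "b1 Melbourne price index": (-2.741533069, 1.389508577),
    "b2 Annual % change": (-1.494686069, 5.19721276),
    "b3 Total number of square meters": (-0.208609648, 0.441240343),
    "b4 Age of house": (-2.255998646, -0.242941926),
}

def excludes_zero(lower, upper):
    """True when the interval lies entirely above or entirely below zero,
    i.e. the coefficient is significant at the 5% level (two-sided)."""
    return lower > 0 or upper < 0

significant = [name for name, (lo, hi) in CI_MELBOURNE.items() if excludes_zero(lo, hi)]
print(significant)
```

The result agrees with the P-value column of Table 3, where only Age of house has p < 0.05.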

B.8. Regression model between Market price and Total number of square meters

Regression Statistics
Multiple R            0.328763
R Square              0.108085
Adjusted R Square     0.039476
Standard Error        27.40006
Observations          15

                                 Coefficients   Standard Error   t Stat      P-value     Lower 95%      Upper 95%
Intercept                        50.09162       34.71803         1.442813    0.172733    -24.9121204    125.095356
Total number of square meters    0.204012       0.162541         1.255142    0.231522    -0.14713685    0.55516167

Table 4

(Source: As created by the Author)

The regression equation for the re-estimated model can be written as

$$\hat{Y} = 50.092 + 0.204X_3$$

B.9. Comparison of R²

From Table 3, R² = 0.642074, and from Table 4, R² = 0.108085. The multiple regression model is therefore a much better fit for the response variable than the re-estimated model: the original model explains 64.21% of the variation in the Market price, whereas X3 alone explains only 10.8%.

B.10. Market price prediction

When X3 = 400, the predicted Market price = 50.09162 + (0.204012 × 400) = 131.69642, that is, $131,696.

C.2. Graphical representation (Scatter plot)

Figure 3

(Source: As created by the Author)

The Excel output is shown in the table below:

Regression Statistics
Multiple R            0.847362
R Square              0.7180224
Adjusted R Square     0.6052313
Standard Error        12.573407
Observations          15

ANOVA
              df    SS          MS          F           Significance F
Regression    4     4025.588    1006.397    6.365952    0.008182698
Residual      10    1580.906    158.0906
Total         14    5606.493

                                 Coefficients   Standard Error   t Stat      P-value     Lower 95%       Upper 95%
Intercept                        89.871245      16.91237         5.313936    0.000341    52.18814276     127.5543
Brisbane Price Index             -0.5082788     0.47905          -1.06101    0.313636    -1.575668349    0.559111
Annual % change                  1.4620069      1.016153         1.438766    0.180771    -0.802123664    3.726137
Total number of square meters    0.056465       0.094729         0.59607     0.564376    -0.154603966    0.267534
Age of house (years)             -0.7797538     0.348462         -2.2377     0.049196    -1.556176484    -0.00333

Regression calculation

Table 5

(Source: As created by the Author)

Regression equation

The estimated regression equation can be written as

$$\hat{Y} = 89.871 - 0.508X_1 + 1.462X_2 + 0.056X_3 - 0.780X_4$$

The y-intercept suggests that the Market price is 89.871 when all the explanatory variables are zero. The coefficient of X1 is negative, so a one-unit increase in X1 decreases Y by 0.508 units. Y increases by 1.462 units for a one-unit increase in X2 when the rest of the variables are fixed. Similarly, the Market price increases by 0.056 units for a one-unit increase in X3 and decreases by 0.780 units for a one-unit increase in X4, keeping the other variables constant.

C.6. R²

The above table shows that the coefficient of determination is 0.718, so 71.8% of the variability of the response variable is explained by the explanatory variables in the regression model.

C.7. Confidence Interval

95% CI for b1 (Brisbane price index) = (-1.575668349, 0.559110736): with 95% confidence, the true coefficient of the Brisbane price index lies between -1.576 and 0.559; the interval contains zero.

95% CI for b2 (Annual % change) = (-0.802123664, 3.726137411): with 95% confidence, the true coefficient of the Annual % change lies between the lower bound -0.802 and the upper bound 3.726; the interval contains zero.

95% CI for b3 (Total number of square meters) = (-0.154603966, 0.267533947): the true coefficient lies within this interval with 95% confidence; the interval contains zero.

95% CI for b4 (Age of house) = (-1.556176484, -0.003331143): with 95% confidence, the true coefficient lies between -1.556 and -0.003. The interval lies just below zero, so Age of house is significant at the 5% level, though only marginally (p = 0.049).

C.8. Regression model between Market price and Total number of square meters

The Excel output for the simple regression model is shown below:

Regression Statistics
Multiple R            0.311985
R Square              0.097334
Adjusted R Square     0.027899
Standard Error        19.73047
Observations          15

                                 Coefficients   Standard Error   t Stat      P-value     Lower 95%     Upper 95%
Intercept                        65.98835       25.00006         2.639528    0.020414    11.979013     119.9977
Total number of square meters    0.138577       0.117044         1.183972    0.257621    -0.114282     0.391436

Table 6

(Source: As created by the Author)

The fitted simple linear regression model is written as

$$\hat{Y} = 65.988 + 0.139X_3$$

C.9. Comparison of R²

Comparing Table 5 and Table 6, the multiple linear regression model explains far more of the variability in the Market price: its R² is 0.718022376, against 0.097334455 for the re-estimated model. The higher the R², the better the regression model fits the data, and the adjusted R² values (0.605 versus 0.028) confirm the same ranking.

C.10. Market price value (predicted)

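Substituting X3 = 400 into the simple regression equation above gives the predicted Market price:

Market price = 65.988 + (0.139 × 400) = 121.588, that is, $121,588.

With the unrounded coefficients from Table 6 (65.98835 + 0.138577 × 400 ≈ 121.419), the prediction is about $121,419; the difference comes from rounding the slope to 0.139.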

Conclusion

The linear regression analyses above, and the interpretation of their components for the three Australian cities, show that the multiple linear regression model fits the response variable better than the simple linear regression model in all three cities. This conclusion is drawn from the values of the coefficient of determination. The strength of association is evaluated with the Excel output and also represented graphically with scatter plots. The paper has also discussed the 95% confidence intervals of the regression coefficients of all the explanatory variables. This study helps to understand how the Market price of a house depends on the four chosen explanatory variables, and the estimated multiple linear regression equations can be used to predict the Market price when the values of the explanatory variables are known.

References

Nimon, K. F., & Oswald, F. L. (2013). Understanding the results of multiple linear regression: Beyond standardized regression coefficients. Organizational Research Methods, 16(4), 650-674.

Zou, G. Y. (2013). Confidence interval estimation for the Bland–Altman limits of agreement with multiple observations per individual. Statistical Methods in Medical Research, 22(6), 630-642.