Statistical Analysis Of NSW Public Transport

Background Information

Section (a): This assignment is a statistical analysis of New South Wales government public transport by bus, train, ferry and light rail. It analyzes several aspects of New South Wales transport, such as the mode of travel, time, location, count and date of travel, from 8th to 14th August 2016.

It is critical that the New South Wales government, which is undertaking a long-term master plan, uses the opportunity to give clearer and better transport services to passengers in NSW over time (Bowditch, 2018). This paper therefore presents recommendations to the New South Wales government, based on the analytical findings, to support adjustments to particular aspects of the transport system. The assignment is thus an analysis of factors such as mode of transport, gender, location, time of travel and date of tap. The data could have been gathered by interviewing travelers directly, by observation, or by having passengers fill in questionnaires.

Section (b): Dataset 1 is a secondary dataset since it originates from the NSW master plan. It contains information about the public transport used by people in NSW. According to the New South Wales Long Term Transport Master Plan (December 2012), New South Wales public transport offers four basic modes: bus, train, ferry and light rail.

Dataset 1 is a secondary dataset because the information was collected by the government, initially for other related research work. The dataset contains the following variables:

  • Mode of travel (type of public transport, i.e. bus, train, ferry or light rail)
  • Gender
  • Date of the tap on or off (from 8th to 14th August 2016)
  • Time of travel (00:15 to 23:00)
  • Tap (on or off)
  • Count
  • Location (location of the stop: postcodes for buses, station names for the other modes)

The study is observational, because the data were collected purely by observation.

Dataset 2: Dataset 2 comprises variables related to New South Wales transport that were collected first-hand from a sample of travelers. It was collected using methods such as direct observation and interviews, and is therefore a primary dataset. The variables in dataset 2 are:

  • Mode/Type of travel (by train, bus, light rail and by ferry)
  • Gender (female or male)

There is also bias in dataset 2, for the following reasons:

  1. No structured method of data collection was applied in the study, and as a result the data obtained are biased to some extent.
  2. The data have a few outliers, since some of the values obtained were outside the expected range.
  3. The sample chosen for the survey was small, which can lead to inappropriate inferences about the whole population.

Analysis of Single Variable in Dataset 1

Section (a)

The variable ‘mode’ is categorical. Since it is a single categorical variable, we can use a frequency table as the numerical summary and a pie chart as the graphical display. The following frequency table shows the modes of transport used by people from 8th to 14th August 2016.

Numerical Summary

Mode          Count   Proportion (p)
Bus             472   0.472
Ferry            26   0.026
Light rail       20   0.020
Train           482   0.482
Grand Total    1000   1.000
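The proportion column can be checked with a short calculation. The sketch below is only an illustration, not part of the original analysis: each proportion is the mode's count divided by the grand total of 1000, so for example the ferry proportion is 26/1000 = 0.026. The names `mode_counts` and `proportions` are chosen here for illustration.

```python
# Counts taken from the frequency table of the variable 'mode'.
mode_counts = {"Bus": 472, "Ferry": 26, "Light rail": 20, "Train": 482}


def proportions(counts):
    """Return each category's share of the grand total, rounded to 3 decimals."""
    total = sum(counts.values())
    return {mode: round(n / total, 3) for mode, n in counts.items()}


mode_props = proportions(mode_counts)

# Print the table: mode, count, proportion.
for mode, p in mode_props.items():
    print(f"{mode:<12}{mode_counts[mode]:>6}   {p:.3f}")
print(f"{'Grand Total':<12}{sum(mode_counts.values()):>6}   {sum(mode_props.values()):.3f}")
```

The proportions should sum to 1, which is a quick way to spot a misplaced decimal point in a hand-built table.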

From the above chart, we can see that the yellow segment, representing people using train services, has the highest frequency (482 of 1000). Bus users are the second highest (472). The ferry (26) is in third position, and light rail (20) is the least used mode of transport.

Hypothesis testing (Part 2b)

p̂ = 0.482 (the highest value in the proportion column from Part 2a, which is the proportion for train)

n = 1000

Step 1

H0: p= 0.5

H1: p>0.5

Step 2

Are the conditions satisfied?

n·p0 ≥ 10, i.e. 1000 × 0.5 = 500 ≥ 10

n·(1 − p0) ≥ 10, i.e. 1000 × (1 − 0.5) = 500 ≥ 10

Yes. Both conditions are satisfied, which means the p-value can be computed as the area in the tail of the standard normal distribution beyond z.

Step 3

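Calculation of the test statistic

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}$$

Now,

$$z = \frac{0.482 - 0.5}{\sqrt{0.5 (1 - 0.5) / 1000}} = \frac{-0.018}{0.0158} = -1.14$$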

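Step 4

P(Z > -1.14)

 = 0.8729

Because H1 is right-tailed (p > 0.5), the p-value is the area to the right of z = -1.14. The p-value is greater than alpha (0.8729 > 0.05, using the usual α = 0.05), so do not reject H0.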

Step 5

The test statistic, -1.14, is below the critical value of 1.645, so do not reject H0. There is no significant evidence that more than 50% of public transport users in NSW use the train, the most common mode of transport found in Part 2a.
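The five steps above can be reproduced numerically. The following is a minimal sketch, assuming a right-tailed one-proportion z-test with p̂ = 0.482, p0 = 0.5 and n = 1000 as stated in the text. The function name `one_proportion_z_test` is illustrative, and the normal tail area is computed with the standard library's `math.erfc` rather than a statistics package.

```python
import math


def one_proportion_z_test(p_hat, p0, n):
    """Right-tailed z-test of H0: p = p0 against H1: p > p0.

    Returns the test statistic z and the p-value P(Z > z).
    """
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / standard_error
    # Upper-tail area of the standard normal: P(Z > z) = 0.5 * erfc(z / sqrt(2)).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


n, p0, p_hat, alpha = 1000, 0.5, 0.482, 0.05

# Conditions for using the normal approximation.
conditions_ok = n * p0 >= 10 and n * (1 - p0) >= 10

z, p_value = one_proportion_z_test(p_hat, p0, n)
reject_h0 = p_value < alpha

print(f"conditions satisfied: {conditions_ok}")
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Because the sample proportion is below 0.5, z is negative and the right-tail area is larger than 0.5 (about 0.87 here), so the test cannot reject H0 at any conventional significance level.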

Section 3

Part A

The variables ‘location’ and ‘count’ are categorical and quantitative respectively. Suitable graphical displays for one categorical and one quantitative variable are side-by-side dot plots, histograms and box plots. We have used a box plot to display these two variables.

[Figure: box plot of people count (y-axis) by train station location (x-axis)]

In the above box plot, the x-axis represents the location of the train station and the y-axis represents the number of people. The box plot for Parramatta station is skewed to the right, with a maximum of 443 people, whereas Bankstown station has a maximum of 125 people.
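A box plot is drawn from the five-number summary (minimum, first quartile, median, third quartile, maximum). The sketch below shows how such a summary could be computed. The values in `sample_counts` are hypothetical and are not the assignment's data; only the maximum of 443 echoes the figure quoted above. The right-skew check is a simple heuristic that compares the distance from the median to each extreme.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical people counts for one station (illustration only).
sample_counts = [12, 18, 25, 31, 40, 52, 75, 120, 443]

low, q1, median, q3, high = five_number_summary(sample_counts)

# Heuristic: a longer upper tail than lower tail suggests right skew.
right_skewed = (high - median) > (median - low)

print(f"min={low}, Q1={q1}, median={median}, Q3={q3}, max={high}")
print(f"right-skewed (heuristic): {right_skewed}")
```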

Part B

Variables involved:

  1. Tap: categorical
  2. Count: quantitative

Step-1

H0: all the means are equal

H1: at least two means are different

Step-2

Condition check

  1. The sample size (n) in each group is ≥ 30 (satisfied).
  2. The standard deviation is similar in each group (satisfied), because no standard deviation is more than twice any other.

All the conditions are satisfied.

From the analysis output, the p-value is 0.243.

Since the p-value is greater than 0.05 (0.243 > 0.05), do not reject H0. There is not enough evidence of a difference in the means.
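The condition check and the test statistic behind this comparison of means can be sketched in code. This is an illustration under stated assumptions, not the original computation: the groups below are hypothetical, generated only so that each has 30 values, and the function names are chosen for this example. The sketch computes the one-way ANOVA F statistic by hand; turning F into a p-value additionally needs the F distribution (for instance `scipy.stats.f.sf`, if SciPy is available).

```python
import statistics


def check_anova_conditions(groups, min_n=30, max_sd_ratio=2.0):
    """Return (sizes_ok, sd_ok) for the two conditions used in the text."""
    sizes_ok = all(len(g) >= min_n for g in groups)
    sds = [statistics.stdev(g) for g in groups]
    # No standard deviation may be twice (or more) any other.
    sd_ok = max(sds) / min(sds) < max_sd_ratio
    return sizes_ok, sd_ok


def f_statistic(groups):
    """One-way ANOVA F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [statistics.mean(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical counts: three groups of 30 values each (illustration only).
groups = [[base + (i % 7) for i in range(30)] for base in (10, 11, 12)]

sizes_ok, sd_ok = check_anova_conditions(groups)
print(f"n >= 30 in each group: {sizes_ok}, similar SDs: {sd_ok}")
print(f"F = {f_statistic(groups):.3f}")
```

A large F means the group means differ by more than the within-group variation would explain; a p-value above 0.05, as reported in the text, corresponds to an F too small to reject H0.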

Section 4

In this survey we have two variables and twenty-four cases. The two variables are ‘Gender’ and ‘Mode’. Both are categorical, so a segmented bar chart is a suitable graphical display for them.

In the above segmented bar diagram, the x-axis represents the variable ‘gender’ and the y-axis represents the variable ‘mode’. In each bar, blue represents the proportion of males and grey represents the proportion of females. Males use the train more than females do, and males also use the bus more than females do. Females use the ferry more than males do, and females also use light rail more than males do.
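The segment sizes in a segmented bar chart are conditional proportions from a two-way table of the two categorical variables. The sketch below shows one way to compute them. The `responses` list is hypothetical: it has twenty-four cases like the survey, but the individual values are invented for illustration and are not the survey data.

```python
from collections import Counter

# Hypothetical (gender, mode) responses, 24 cases in total (illustration only).
responses = (
    [("Male", "Train")] * 5 + [("Male", "Bus")] * 4
    + [("Male", "Ferry")] * 1 + [("Male", "Light rail")] * 2
    + [("Female", "Train")] * 4 + [("Female", "Bus")] * 3
    + [("Female", "Ferry")] * 2 + [("Female", "Light rail")] * 3
)


def mode_shares_by_gender(rows):
    """Return {(gender, mode): share of that gender's responses using that mode}."""
    cell_counts = Counter(rows)                          # two-way table cells
    gender_totals = Counter(gender for gender, _ in rows)  # row totals
    return {
        (gender, mode): count / gender_totals[gender]
        for (gender, mode), count in cell_counts.items()
    }


shares = mode_shares_by_gender(responses)
for (gender, mode), share in sorted(shares.items()):
    print(f"{gender:<7}{mode:<11}{share:.2f}")
```

Comparing shares within each gender, rather than raw counts, keeps the comparison fair when the two groups have different sizes.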

Discussion and Conclusion

The analysis of the variables in the sections above supports the following discussion and conclusions, together with a few recommendations to NSW transport based on the findings.

From the frequency analysis, it is evident that train and bus are the main modes of travel for people in New South Wales. Both modes are used frequently by both genders. Conversely, light rail was the least used mode of transport, with low counts for both genders. Hence I would recommend that NSW transport consider improving the efficiency of train transport by increasing the frequency of services or by extending the system.

Based on the hypothesis test in Section 2, it can be concluded that there is no evidence that any single mode accounts for more than half (50%) of public transport use in NSW. The train and the bus make up about 48% and 47% of the sampled New South Wales transport use respectively, which makes them the most preferred means of transport, and the findings further show that both genders prefer these two modes to the rest of the available transport. Males are found to use the train more than females. In the case of the ferry, females are more frequent users than males, and females also use light rail more than males. I would also recommend that New South Wales transport consider investing more in trains and buses.

Finally, a larger sample should be used in future research, as a small sample of the population tends to yield inappropriate results for inference and conclusions.

References

Bowditch, G. (2018). NSW Long Term Transport Masterplan. SMART Infrastructure Facility, University of Wollongong.

Bruce, P. C. (2014). Introductory Statistics and Analysis [e-book]. New Jersey: John Wiley & Sons.

Diggle, P. J. (2015). Statistics: A Data Science for the 21st Century. Journal of the Royal Statistical Society.

Hanneman, R. A., Kposowa, A. J., & Riddle, M. D. (2013). Basic Statistics for Social Research. San Francisco: Jossey-Bass (Wiley).

Jarman, K. H. (2015). Beyond Basic Statistics [e-book]. New Jersey: John Wiley & Sons.

Lock, R. H., Lock, P. F., Lock Morgan, K., Lock, E. F., & Lock, D. F. (2013). Statistics: Unlocking the Power of Data. Wiley & Sons.