A Statistical Modelling Assignment: Exploring Relationships Between Different Variables

Data collection and limitations

This is a statistical modelling assignment that tests data collection and analysis. It also tests the interpretation of analysis results and the ability to make meaningful statistical inferences. In short, it is a data-driven decision-making assignment. The assignment is divided into four major sections: introduction, descriptive statistics, inferential statistics, and discussion and conclusion.

Two datasets are used in this assignment: dataset1 and dataset2. Dataset1 describes individuals' gender, occupation, salary and gifts. It has four variables: gender (Gender), occupation code (Occ_code), salary/wage amount (Sw_amt) and gift amount (Gift_amt). This is secondary data extracted from the database of the Australian Taxation Office (ATO). The first five cases of this dataset are displayed below.

Gender   Occ_code   Sw_amt   Gift_amt
Male     0          0        0
Male     7          4310     0
Female   2          70839    131
Female   3          79996    383
Male     9          0        0

The research question associated with this dataset is whether there is a relationship between the amount of salary and the gender of an individual. Several statistical analyses are used to answer it: descriptive statistics, inferential statistics and hypothesis testing.

Dataset2 is a sample of countries across Africa, America and Asia and their development indicators. This is secondary data extracted from the United Nations website (Public, 2010). The data describes the level of sustainable economic development across these countries (Public, 2010). The variables in this dataset are continent, access to improved sanitation facilities, mobile telephone subscriptions and women’s average years in school. The first five cases of this dataset are displayed below.

Continent   Access to improved sanitation facilities   Mobile telephone subscriptions   Women’s average years in school
AFRICA      87.60                                      106.38                           7.74
AFRICA      51.59                                      60.84                            5.31
AFRICA      19.72                                      85.64                            2.73
AFRICA      63.43                                      169.00                           8.71
AFRICA      19.73                                      80.64                            1.86

This research is interested in the level of education among women in these countries. To assess it, the study examines the average number of years women spend in school. This is done using descriptive statistics (i.e. frequency table, graphical display and summary statistics).

The descriptive statistics in this section are based entirely on dataset1. Descriptive statistics give the characteristics of the data in the form of graphical displays of the variables and summary statistics (David & David, 2000). Graphical displays of data include pie charts, bar graphs and line graphs (Krishnamoorthy, 2005).

Likewise, summary statistics are the numerical characteristics of the data, such as the mean, median and mode (Knight, 2000). Four descriptive analyses are carried out: three graphical displays and one numerical summary. They are outlined below.

Variables involved in the datasets

The first is a graphical display describing the relationship between the variables Gender and Occ_code, produced as a bar graph. The bar graph shows the number of workers in each occupation category by gender (male or female). The length of each bar represents the number of individuals of that gender in each occupation (Tim, 2005). From the graph, it is clear that there are more males than females in occupation codes 0, 2, 3, 7 and 8.

The second descriptive analysis is a graphical display describing the relationship between the variables Gender and Sw_amt. A scatter plot has been developed to compare salary amounts across gender (males and females). The individual earnings are plotted, and individuals with higher salary amounts appear further away from the x-axis (Krishnamoorthy, 2005).

From the plot, it is clear that, on average, males earn more than their female counterparts. It is also clear that the majority of individuals (males and females) earn relatively low salaries.

The third descriptive analysis describes the relationship between the variables Gender and Sw_amt using a suitable numerical summary. It compares the salaries earned by male and female employees. The numerical measures used are the mean, maximum and minimum (Krishnamoorthy, 2005).

From the table below, the mean salary for males is 53933.88 while the mean salary for females is 36044.18. This indicates that, in general, males earn more than females. The minimum salary is 0 for both males and females.

On the other hand, the maximum salary earned by a male employee is 839840 while the maximum salary earned by a female employee is 308183. This is a further indication that, in general, males earn higher salaries than females.

Statistic   Males      Females
Mean        53933.88   36044.18
Minimum     0          0
Maximum     839840     308183
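A grouped summary of this kind can be sketched in a few lines of Python. The snippet below is only an illustration: it uses the first five cases of dataset1 shown earlier rather than the full dataset, so its figures do not match the table above and say nothing about the overall pattern. The function name summarise_by_gender is a hypothetical helper, not part of the original analysis.

```python
from statistics import mean

# First five cases of dataset1 as (gender, salary) pairs.
# Assumption: used here for illustration only; the full dataset is not reproduced.
sample = [
    ("Male", 0),
    ("Male", 4310),
    ("Female", 70839),
    ("Female", 79996),
    ("Male", 0),
]


def summarise_by_gender(records):
    """Return the mean, minimum and maximum salary for each gender."""
    groups = {}
    for gender, salary in records:
        groups.setdefault(gender, []).append(salary)
    return {
        gender: {"mean": mean(values), "min": min(values), "max": max(values)}
        for gender, values in groups.items()
    }


summary = summarise_by_gender(sample)
for gender, stats in summary.items():
    print(gender, stats)
```

Run on the full dataset, the same grouping would be expected to reproduce the mean, minimum and maximum reported in the table.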

The fourth and last descriptive analysis in this section describes the relationship between the variables Sw_amt and Gift_amt. It outlines how the salary paid to an individual is related to the amount of gift given to that individual. A line graph has been developed to describe this relationship. From the graph, it is clear that the amount of salary is generally higher than the amount of gift.

Descriptive statistics, inferential statistics and hypothesis testing

This section uses both dataset1 and dataset2 to make statistical inferences. Statistical inference is the use of the outcome of a sample analysis to describe the population from which the sample was drawn (Knight, 2000). Inferences are made from descriptive statistics, graphical displays or summary statistics (Tim, 2005). The inferences made are outlined below.

The first inference is based on the median salary, which is 34788.5. This is the value separating the higher half of the salary amounts from the lower half.

Sw_amt
Mean             44811.134
Standard Error   1694.223927
Median           34788.5

Therefore, the top four occupations based on the median are those with salaries equal or very close to the median. These are professionals, clerical and administrative workers, community and personal service workers, and machinery operators and drivers. In these occupations there are two males and two females, which is a fair representation of gender.

Gender   Occ_code   Sw_amt   Gift_amt
Male     5          34916    0
Female   2          34890    40
Male     7          34645    11
Female   4          34687    186
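One way to read "closest to the median" is to rank cases by their absolute distance from the median salary. The sketch below applies that idea to the four cases in the table above, using the reported median of 34788.5. It is an assumed illustration of the selection step, not the procedure actually used in the analysis, and the helper name distance_to_median is hypothetical.

```python
from collections import Counter

MEDIAN_SALARY = 34788.5  # median of Sw_amt reported in the summary output

# The four cases listed in the table: (gender, occ_code, sw_amt, gift_amt).
rows = [
    ("Male", 5, 34916, 0),
    ("Female", 2, 34890, 40),
    ("Male", 7, 34645, 11),
    ("Female", 4, 34687, 186),
]


def distance_to_median(row):
    """Absolute distance between a case's salary and the median salary."""
    return abs(row[2] - MEDIAN_SALARY)


# Rank the cases from closest to furthest from the median.
ranked = sorted(rows, key=distance_to_median)
gender_counts = Counter(row[0] for row in ranked)

for row in ranked:
    print(row, distance_to_median(row))
print(dict(gender_counts))
```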

The second statistical inference is a hypothesis test of whether the proportion of machinery operators and drivers who are male is more than 80%. To test this, we first state the following hypotheses:

H0: p= 0.8

H1: p> 0.8

The rejection rule is to reject the null hypothesis whenever the p-value is less than the alpha value of 0.05. The test statistic is calculated from the following values:

Sample proportion = 0.49

Sample size = 44

Sample SD = 65891.4

Therefore, the test statistic is calculated as

T = 0.000047

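The reported p-value is 0.002458, which is less than the alpha value. However, the sample proportion (0.49) lies below the hypothesised value of 0.8, so the sample cannot support the one-sided alternative p > 0.8. We conclude that there is insufficient evidence that more than 80% of machinery operators and drivers in the population are male (Knight, 2000).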
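For reference, the textbook one-sample z-test for a proportion can be sketched as below. This is a minimal illustration that assumes the normal approximation and uses the sample proportion (0.49), sample size (44) and hypothesised value (0.8) stated above. It is not necessarily the calculation that produced the reported test statistic and p-value, and its results differ from those figures. The function name proportion_z_test_upper is hypothetical.

```python
import math


def proportion_z_test_upper(p_hat, p0, n):
    """One-sample z-test for H0: p = p0 against H1: p > p0.

    Uses the normal approximation with the standard error computed under H0.
    Returns the z statistic and the upper-tail p-value.
    """
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Upper-tail probability of the standard normal: P(Z >= z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


z, p_value = proportion_z_test_upper(p_hat=0.49, p0=0.8, n=44)
reject_null = p_value < 0.05

print(f"z = {z:.4f}, p-value = {p_value:.6f}, reject H0: {reject_null}")
```

Under these assumptions the z statistic is negative, because the sample proportion is below 0.8, so the upper-tail test does not reject the null hypothesis. This agrees with the conclusion that the data give no evidence of a male proportion above 80%.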

The third inferential analysis is a hypothesis test of whether there is a difference in salary amount between the genders. This is done with a single factor ANOVA test, which tests whether there is a significant difference between the means of two or more groups (Tim, 2005).

The following hypotheses are used in conducting this test:

H0: There is no difference in salary amount between males and females

H1: There is a difference in the salary amount between males and females

The following table is the output of the single factor ANOVA. The p-value is 0.000000202, which is less than the alpha value of 0.05. Hence we reject the null hypothesis that there is no difference in salary amount between males and females. We conclude that there is statistically significant evidence of a difference in salary amount between males and females.

ANOVA
Source of Variation   SS         df    MS         F         P-value   F crit
Between Groups        7.74E+10   1     7.74E+10   27.4072   2.02E-7   3.851103
Within Groups         2.73E+12   966   2.83E+09
Total                 2.81E+12   967
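The F statistic in the table can be checked from the sums of squares and degrees of freedom. The sketch below uses the rounded values printed in the table, so the result agrees with the reported F only approximately. The variable names are illustrative.

```python
# Values taken from the ANOVA table (rounded as printed there).
ss_between = 7.74e10   # sum of squares between groups
df_between = 1         # degrees of freedom between groups
ss_within = 2.73e12    # sum of squares within groups
df_within = 966        # degrees of freedom within groups
f_critical = 3.851103  # critical F value reported in the table

# A mean square is a sum of squares divided by its degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# The F statistic is the ratio of the two mean squares.
f_statistic = ms_between / ms_within
reject_null = f_statistic > f_critical

print(f"MS between = {ms_between:.3e}")
print(f"MS within  = {ms_within:.3e}")
print(f"F = {f_statistic:.2f}, reject H0: {reject_null}")
```

Because the computed F is far above the critical value, the check supports rejecting the null hypothesis of equal mean salaries.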

Dataset1: Gender, salary/wage amount, and gift amount

The fourth and final analysis has been done on dataset2. It uses a suitable numerical summary to assess the level of sustainable economic development in Africa by analysing the average number of years that women spend in school. The table below shows the summary output.

From the output below, the average number of years women spend in school is 5.76. The minimum is 1.41 years and the maximum is 11.48 years. This indicates that women spend a relatively short time in education (OECD, 2004), possibly because the majority of them do not go on to tertiary education. This suggests that the level of sustainable economic development in Africa is still low (Mark, 2009).

Women’s average years in school
Mean      5.76
Minimum   1.41
Maximum   11.48

 

Conclusion

A number of conclusions can be drawn from this study. The evidence shows a pay disparity between males and females: males earn more than females. Further research could be conducted to find out whether there is pay parity between males and females for the same job. Research could also be done to establish the root cause of this pay disparity, and to establish whether there is a correlation between male and female salaries.

It has also been observed that there is a difference between the amount of salary and the amount of gift. The amount of salary is far higher than the amount of gift. Research could be conducted to establish whether there is a significant difference between the salary an individual earns and the amount of gift.

Women in Africa spend a short period of time in education. This indicates that the majority of African women do not pursue higher education, which could suggest a low level of economic development among African women (OECD, 2004). Research could be done in this area to establish the causes of low education levels among women in Africa.

References

David, J. S., & David, S. (2000). Handbook of Parametric and Nonparametric Statistical Procedures.

Knight, K. (2000). Mathematical Statistics (Texts in Statistical Science Series). Chapman and Hall.

Krishnamoorthy, K. (2005). Handbook of Statistical Distributions with Applications.

Mark, H. (2009). Economic Development, Education and Transnational Corporations (Routledge Studies in Development Economics).

OECD. (2004). Economic, Environmental and Social Aspects (OECD Sustainable Development Studies). OECD Publishing.

Public, U. N. (2010). Auditing for Social Change: A Strategy for Citizen Engagement in Public Sector Accountability (Economic & Social Affairs).

Tim, S. (2005). Mastering Statistical Process Control: A handbook for Performance Improvement Using Ca