Data Analysis Report On Maternal Health In Australia


Maternal health is a major public health concern worldwide. Governments, together with organisations such as the World Health Organisation and private health institutions, are committed to improving the health of citizens, and substantial funds have been directed to the health sector for this purpose. Maternal health has remained among the main public health concerns because it is one of the major determinants of a country's development. A strong health system reduces pregnancy-related complications, which in turn lowers child mortality, a double achievement for society.


Health data were extracted from the World Development Indicators database, which was last updated in March 2018. The data were pre-processed and only the variables meeting our criteria were retained. Only Australian data are analysed, to evaluate the risk of maternal death in relation to factors such as the number of maternal deaths and the amount spent on health as a percentage of GDP. The data were extracted in MS Excel by filtering the data labels on the keywords ‘health’ and ‘death’, which returned 17 variables related to health and to health-related deaths.
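The keyword filter described above was done in MS Excel. As a rough sketch only, the same idea can be expressed in R with grepl. The small indicators table below is a hypothetical stand-in for the full list of indicator labels, not the file used in this report.

```r
# Hypothetical stand-in for the indicator list (NOT the report's file)
indicators <- data.frame(
  code  = c("SH.MMR.RISK.ZS", "SH.MMR.DTHS", "SH.XPD.TOTL.ZS",
            "NY.GDP.MKTP.KD.ZG", "SP.POP.TOTL"),
  label = c("Lifetime risk of maternal death (%)",
            "Number of maternal deaths",
            "Health expenditure, total (% of GDP)",
            "GDP growth (annual %)",
            "Population, total"),
  stringsAsFactors = FALSE
)

# Keep rows whose label mentions "health" or "death", ignoring case
keep     <- grepl("health|death", indicators$label, ignore.case = TRUE)
selected <- indicators[keep, ]
selected$code
```

Matching on the label rather than the code keeps the filter readable, at the cost of also catching any label that merely contains one of the keywords.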

In this paper, the data are loaded into R for exploratory and advanced analysis. The exploratory analysis includes a one-variable analysis of three variables, for which descriptive statistics and appropriate graphs are reported. In addition, a two-variable analysis is conducted, with appropriate plots generated to present the data effectively. Cluster analysis and linear regression are then reported in the advanced analysis section. Journal articles are used to reference ideas and facts included in the report.

The main data file from the World Development Indicators was processed in MS Excel by extracting the required variables and arranging them in a tidy dataset format, with one variable per column. The data were then saved in CSV format to allow easy upload into R for analysis. A metadata file was created to describe the coded variable names, which allows effective reference and a better understanding of the analysis. The R working directory was set to the folder containing the dataset, which was imported using the code below. The required packages were loaded using the library function; these are cluster and fpc, which support cluster analysis and its visualisation. The data included in this analysis are for Australia only.



mydata <- read.csv("mydata.csv")
attach(mydata)  # assumed step: the code below refers to columns by bare name
library(cluster); library(fpc)


One-Variable Analysis


summary(SH.MMR.RISK.ZS)  # call inferred from the output below
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
## 0.01152 0.01280 0.01320 0.01358 0.01490 0.01550       1

boxplot(SH.MMR.RISK.ZS, col = 5,
        main = "Lifetime Risk of Maternal Death", outline = TRUE, names = TRUE)


Figure 1: A Boxplot of Lifetime risk of maternal death

The lifetime risk of maternal death was treated as a probability. Expressed as a percentage, it ranged from a minimum of 1.152% to a maximum of 1.550%, with a mean of 1.358% and a median of 1.320%. The mean exceeds the median, which suggests that the risk of maternal death is not symmetrically distributed. The boxplot shows that the risk is skewed to the right, an indication that in most of the years from 1995 to 2014 the risk was below the mean value.
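Comparing the mean with the median is an informal check for skew. As a rough sketch, the direction of skew can also be summarised with a moment-based sample skewness. The helper name sample_skewness and both example vectors are hypothetical and are not taken from the report's data.

```r
# Moment-based sample skewness: 0 for symmetric data, > 0 for a right skew
sample_skewness <- function(x) {
  x <- x[!is.na(x)]                       # drop missing values first
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

symmetric    <- c(1, 2, 3, 4, 5)          # hypothetical, symmetric about 3
right_skewed <- c(1, 1, 1, 2, 10)         # hypothetical, long right tail

sample_skewness(symmetric)
sample_skewness(right_skewed)
mean(right_skewed) > median(right_skewed) # mean pulled above the median
```

With only about twenty yearly observations, such a statistic is noisy, so it is best read alongside the boxplot rather than instead of it.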



summary(SH.MMR.DTHS)  # call inferred from the output below
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##   19.00   20.00   20.00   20.05   21.00   21.00       1

hist(SH.MMR.DTHS, col = 5,
     main = "Number of Maternal Deaths within 42 days", xlab = "Maternal Deaths")


Figure 2: A histogram of the number of maternal deaths within 42 days

The mean number of women who died within 42 days of giving birth or of the termination of the pregnancy in any other way was 20.05, and the median was 20. There is no great deviation between the mean and the median, hence the conclusion that the number of women who died from pregnancy-related issues within 42 days of the end of the pregnancy was approximately normally distributed, as depicted in the histogram.


summary(SH.XPD.TOTL.ZS)  # call inferred from the output below
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##   7.260   8.006   8.473   8.443   9.023   9.422       2

boxplot(SH.XPD.TOTL.ZS, outline = TRUE, col = 5,
        main = "Total Health Expenditure (% of GDP)")


Figure 3: Boxplot of total health expenditure (% GDP)

Total expenditure on health is approximately normally distributed, with a mean of 8.443% and a median of 8.473% of GDP. Between 1995 and 2014, the highest percentage of GDP spent on health was 9.422% and the lowest was 7.260%, so the spread between the maximum and minimum values is small. There are no outliers in the percentage of GDP spent on health (Oldford, 2016).
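A boxplot flags as outliers the points lying more than 1.5 times the interquartile range beyond the hinges. As a minimal sketch, the same rule can be checked numerically with boxplot.stats. The spend vector below is hypothetical and only mimics the range reported above.

```r
# Hypothetical values in % of GDP (NOT the report's data)
spend <- c(7.26, 7.5, 7.8, 8.0, 8.1, 8.3, 8.5, 8.6, 8.9, 9.0, 9.1, 9.422)

# Points beyond 1.5 * IQR from the hinges, i.e. what boxplot() would draw
outliers <- boxplot.stats(spend)$out
length(outliers)

# Adding one extreme, made-up year shows what a flagged point looks like
spend_extreme    <- c(spend, 15.0)
outliers_extreme <- boxplot.stats(spend_extreme)$out
outliers_extreme
```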

Number and risk of maternal deaths


Figure 4: Number and risk of maternal deaths

A scatter plot was selected to visualize the relationship between number and risk of maternal deaths because both are quantitative variables.


Figure 5: Total health expenditure by the risk of maternal death

The scatter plot was the appropriate graph to visualize the relationship between total health expenditure and the risk of maternal death because both are quantitative and continuous variables.


Cluster analysis is a statistical technique that groups observations so that those within a cluster are more similar to one another than to those in other clusters. K-means partitions the observations into k groups by assigning each observation to its nearest centroid and then recomputing each centroid as the mean of the observations assigned to it, repeating until the assignments stabilise. Each centroid therefore represents a distinct group in terms of the variables used. In R, the kmeans function performs k-means clustering, while the cluster and fpc packages provide further functions for cluster analysis and for visualising the results.
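To make the idea concrete, here is a minimal k-means sketch using base R's kmeans. The toy data frame is hypothetical and deliberately contains two well-separated groups; it is not the report's data.

```r
# Hypothetical toy data (NOT the report's data): two well-separated groups
toy <- data.frame(
  x = c(1.0, 1.2, 0.8, 1.1, 0.9, 10.0, 10.3, 9.8, 10.1, 9.9),
  y = c(2.0, 1.8, 2.1, 2.2, 1.9,  8.0,  8.2, 7.9,  8.1, 7.8)
)

toy_scaled <- scale(toy)   # standardise each column before clustering
set.seed(1)                # k-means starts from randomly chosen centres
toy_fit <- kmeans(toy_scaled, centers = 2, nstart = 10)

toy_fit$cluster            # cluster label assigned to each row
toy_fit$centers            # centroid of each cluster, in standardised units
```

Setting nstart above 1 reruns the algorithm from several random starts and keeps the best solution, which makes the result less dependent on the initial centres.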

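## Cluster analysis ####
# Select columns 10, 11 and 14, then drop rows with missing values
mydata_1 <- mydata[, c(10,11,14)]
mydata_1 <- na.omit(mydata_1)

# Standardise the variables so that each is on a comparable scale
mydata_1 <- scale(mydata_1)
clusters <- kmeans(mydata_1, 19)
plotcluster(mydata_1, clusters$cluster)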


Figure 6: Cluster plot

The cluster analysis was run with 19 centres and returned 19 clusters, suggesting that each year was distinct from the others (Everitt, Landau, Leese, & Stahl, 2011). The variables therefore separated the years individually, and no grouping of similar years emerged. These clusters can be seen in the cluster plot above.
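Because k-means returns as many clusters as are requested, it can help to compare several values of k before interpreting a solution. The sketch below is an illustration only: it uses base R's kmeans on hypothetical random data (not the report's data) and records the total within-cluster sum of squares for k from 1 to 6.

```r
set.seed(42)
# Hypothetical data: 20 "years" by 3 standardised variables of random noise
sim <- scale(matrix(rnorm(60), nrow = 20, ncol = 3))

ks  <- 1:6
# Total within-cluster sum of squares for each candidate k
wss <- sapply(ks, function(k) {
  kmeans(sim, centers = k, nstart = 25)$tot.withinss
})

data.frame(k = ks, total_within_ss = round(wss, 2))
```

The within-cluster sum of squares tends to fall as k grows, so the usual heuristic is to look for the k after which the drop levels off, rather than to take the smallest value.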

Linear regression 1

fit1 <- lm(SH.MMR.RISK.ZS ~ SH.MMR.DTHS)


plot(SH.MMR.DTHS, SH.MMR.RISK.ZS, xlab = "Number of Maternal deaths",
     ylab = "The risk of maternal deaths",
     main = "Relationship between Number of maternal deaths and Risk")
abline(fit1, col = 2)


Figure 7: Linear plot of fit 1

There is a positive linear relationship between the number and the risk of maternal deaths. The number of maternal deaths is a significant predictor of the risk of maternal deaths: each additional maternal death is associated with an increase of 0.144% in the risk. The regression equation is shown below (Crawley, 2012).



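Linear regression 2

# fit2 is not defined in the source; assumed here to mirror fit1
fit2 <- lm(SH.MMR.RISK.ZS ~ SH.XPD.TOTL.ZS)

plot(SH.XPD.TOTL.ZS, SH.MMR.RISK.ZS, xlab = "Total Health Expenditure (% of GDP)",
     ylab = "The risk of maternal deaths",
     main = "The relationship between %GDP spent on health and Risk of Maternal death")
abline(fit2, col = 2)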


Figure 8: A plot of fit 2

There is a negative relationship between health expenditure (% of GDP) and the risk of maternal deaths: as the amount spent on health increases, the risk of maternal death decreases. An increase in total health spending of 1% of GDP is associated with a 0.168% reduction in the risk of maternal death (Sainani, 2013).
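Slope interpretations of this kind are read from the coefficient table of a fitted lm object. The following is a minimal sketch on hypothetical values (not the report's data), showing where the slope and its p-value sit and confirming that the slope is the predicted change for a one-unit increase in the predictor.

```r
# Hypothetical yearly values (NOT the report's data)
deaths <- c(19, 20, 20, 21, 19, 21, 20, 20)
risk   <- c(0.0118, 0.0131, 0.0134, 0.0152, 0.0121, 0.0149, 0.0129, 0.0136)

fit   <- lm(risk ~ deaths)
coefs <- summary(fit)$coefficients   # estimate, std. error, t value, p-value

slope   <- coefs["deaths", "Estimate"]
p_value <- coefs["deaths", "Pr(>|t|)"]

# The slope equals the change in predicted risk for one additional death
pred <- predict(fit, newdata = data.frame(deaths = c(20, 21)))
unname(diff(pred))
slope
```

The sign of the slope gives the direction of the relationship, and the p-value indicates whether the predictor is statistically significant at the chosen level.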


In conclusion, the amount spent on health (% of GDP) and the number and risk of maternal deaths recorded for Australia since 1995 jointly differ across all years. Australia has been improving its health expenditure significantly from 1995 to 2014, which has in turn been improving the quality of health.

One challenge in this analysis was that the double dots used in the MS Excel database to denote missing values affected how R read the data. This led to difficulties in R detecting the correct data types. I had to remove the dots in MS Excel, which solved the problem and allowed the data types to be detected properly for effective data analysis.
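An alternative to editing the file by hand, offered here only as a sketch, is to tell read.csv which string marks a missing value through its na.strings argument. The CSV text below is a hypothetical three-row example, not the report's file.

```r
# Hypothetical CSV content (NOT the report's file); ".." marks a missing value
csv_text <- "Year,SH.MMR.DTHS\n1995,21\n1996,..\n1997,20"

# Default import: the ".." entry stops the column being read as numeric
raw <- read.csv(text = csv_text)
class(raw$SH.MMR.DTHS)

# Declaring ".." as the missing-value marker yields a numeric column with NA
clean <- read.csv(text = csv_text, na.strings = "..")
class(clean$SH.MMR.DTHS)
clean
```

This keeps the source file unchanged, so the import can be repeated on an updated download without redoing the manual clean-up.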


Crawley, M. J. (2012). Regression. In The R Book (pp. 449–497).

Everitt, B. S., Landau, S., Leese, M., & Stahl, D. (2011). Cluster Analysis (5th ed.). Wiley.

Oldford, R. W. (2016). Self-Calibrating Quantile–Quantile Plots. American Statistician, 70(1), 74–90.

Sainani, K. L. (2013). Understanding linear regression. PM&R, 5(12), 1063–1068.