Analyzing Large Datasets With Pivot Tables: London Air Quality And Authority Average Rent

London Average Air Quality Level Dataset

The dataset contains background and roadside average readings for nitric oxide, nitrogen dioxide, oxides of nitrogen, ozone, particulate matter (PM10 and PM2.5), and sulphur dioxide, measured in micrograms per cubic metre of air (µg/m³). The spreadsheet shows the index value and level for each reading, grouped by category, and includes charts. The charts show pollutant levels by time of day for each month, illustrating how the effect of pollution varies over time. Trend graphs of nitrogen dioxide and PM2.5 are the main focus of the discussion. An air quality strategy backed by appropriate policy could support improvements in air quality, with the monitoring network across London providing accurate and up-to-date information on air quality trends under London weather conditions.

Each pollutant in the dataset is measured every 15 minutes by the London Air Quality Network, and average values are calculated from these individual measurements. The averages are then placed into the relevant DEFRA air quality index bands for indicative purposes only; they should not be treated as official air quality monitoring under European legislation. The definition, sources, and consequences of each pollutant can also be summarised alongside the data.
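The averaging and banding step can be sketched as follows. This is a minimal illustration, assuming the readings arrive as a plain list of 15-minute values with no gaps and that hourly means are what get banded. The function names and the cut-offs in `NO2_BANDS` are hypothetical illustrative values for nitrogen dioxide, not taken from this dataset, and should be checked against the current DEFRA index.

```python
# Hypothetical band table: (inclusive upper bound in ug/m3, band name).
# These nitrogen dioxide hourly-mean cut-offs are illustrative only; check
# them against the current DEFRA Daily Air Quality Index before relying on them.
NO2_BANDS = [(200, "Low"), (400, "Moderate"), (600, "High"), (float("inf"), "Very High")]


def hourly_means(readings):
    """Average each run of four consecutive 15-minute readings into one hourly mean."""
    if len(readings) % 4:
        raise ValueError("expected whole hours: a multiple of four 15-minute readings")
    return [sum(readings[i:i + 4]) / 4 for i in range(0, len(readings), 4)]


def band_for(value, bands=NO2_BANDS):
    """Return the name of the first band whose upper bound is at least `value`."""
    for upper, name in bands:
        if value <= upper:
            return name
    raise ValueError(f"{value} is outside the band table")


readings = [180, 190, 210, 220, 390, 410, 420, 430]  # two hours of made-up data
means = hourly_means(readings)
print(means)                         # [200.0, 412.5]
print([band_for(m) for m in means])  # ['Low', 'High']
```

Real network data contains gaps. A fuller version would need a rule for how many missing 15-minute values an hour may have before its mean is discarded.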

Rents are the weekly amounts that local authorities charge for social housing, and the data are reported by financial year. Stock figures are presented alongside the rent data and are used to calculate the averages for each area, region, and the nation as a whole. Average rent data for 2003/2004 to 2007/2008 are based on total stock figures taken from audited Housing Revenue Account (HRA) subsidy base data claim forms. Before 2003/2004, average rent data were based on total stock figures from HRA subsidy claim forms. For 2008/2009, the total stock figure is taken at the start of the financial year from HRA subsidy base data forms. The average rent for that year is provisional and will be updated when the stock figure for the start of the following year becomes available.
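The description suggests that stock figures weight the averages. Under that reading, a regional average rent would be the sum of each authority's average rent multiplied by its stock, divided by total stock. The sketch below assumes exactly that weighting rule and a simple list of (rent, stock) pairs. Both the function name and the figures are hypothetical.

```python
def stock_weighted_average_rent(authorities):
    """Average weekly rent across authorities, weighted by each one's dwelling stock.

    `authorities` is a list of (average_weekly_rent, stock) pairs; both the
    input layout and the weighting rule are assumptions made for illustration.
    """
    total_stock = sum(stock for _, stock in authorities)
    if total_stock <= 0:
        raise ValueError("total stock must be positive")
    return sum(rent * stock for rent, stock in authorities) / total_stock


# Two hypothetical authorities: 1,000 homes at 80.00 and 3,000 homes at 100.00 a week.
print(stock_weighted_average_rent([(80.0, 1000), (100.0, 3000)]))  # 95.0
```

A plain unweighted mean of the two rents would be 90.00. Weighting by stock pulls the average towards the authority with more homes.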

For some authorities, stock figures determine how their reported rents are taken into account in the average rent. Some local authorities recorded a lower average rent in 2003/2004 than in 2002/2003. Where a local authority made a mid-year stock transfer, its figures are estimated for the purpose of the regional and national averages. These values refer to authorities that transferred either all or part of their stock during the financial year.

Case Processing Summary

Variables | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent
Month * London Mean Roadside: Nitric Oxide (ug/m3) | 93 | 79.5% | 24 | 20.5% | 117 | 100.0%

The case processing summary shows that month and the mean roadside nitric oxide value were used together to build a cross-tabulation in SPSS. Of the 117 cases, 24 (20.5%) have missing values, leaving 93 (79.5%) valid cases for the analysis.
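The counts in this summary can be reproduced with a short sketch. It assumes each case is a Python dict. It also treats a case as valid only when every variable in the cross-tabulation has a value, which is an assumed listwise rule. The column names and the synthetic rows are hypothetical and simply reproduce the 93/24 split reported above.

```python
def case_processing_summary(rows, variables):
    """Count valid, missing, and total cases for a cross-tabulation.

    A case (a dict) is treated as valid only if every listed variable is present
    and not None; this listwise rule is an assumption for illustration.
    """
    total = len(rows)
    valid = sum(1 for row in rows if all(row.get(v) is not None for v in variables))

    def pct(n):
        # Percentage of all cases, rounded to one decimal place as in the table.
        return round(100 * n / total, 1) if total else 0.0

    return {
        "valid": (valid, pct(valid)),
        "missing": (total - valid, pct(total - valid)),
        "total": (total, pct(total)),
    }


# Hypothetical rows: 93 complete cases and 24 with no nitric oxide reading.
rows = [{"month": i, "nitric_oxide": 40.0} for i in range(93)]
rows += [{"month": i, "nitric_oxide": None} for i in range(24)]
print(case_processing_summary(rows, ["month", "nitric_oxide"]))
# {'valid': (93, 79.5), 'missing': (24, 20.5), 'total': (117, 100.0)}
```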

Chi-Square Tests

Test | Value | df | Asymp. Sig. (2-sided)
Pearson Chi-Square | 8463.000 (a) | 8372 | .240
Likelihood Ratio | 840.291 | 8372 | 1.000
Linear-by-Linear Association | .038 | 1 | .845
N of Valid Cases | 93

(a) 8556 cells (100.0%) have expected count less than 5. The minimum expected count is .01.

The Pearson chi-square statistic is 8463.000 with 8372 degrees of freedom, and its significance (p = .240) is greater than 0.05. On this basis the test does not detect a statistically significant association between month and roadside nitric oxide level. However, all 8556 cells (100.0%) have an expected count below 5 because the table has far more cells than the 93 valid cases. Under these conditions the chi-square approximation is unreliable, so the result should be treated with caution. The clustered bar chart in this analysis shows the monthly readings of the pollutant, indicating how much is emitted in each month. Reviewing the data in this way can support further scientific analysis.
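The following sketch shows what the statistics in the table measure. For a small contingency table it computes:

- the Pearson chi-square;
- the likelihood-ratio (G) statistic;
- the degrees of freedom;
- the share of cells with an expected count below 5.

It is a simplified illustration rather than SPSS's own implementation, and it assumes no row or column of the table sums to zero. The function name is hypothetical. A p-value could then be taken from the chi-square distribution, for example with `scipy.stats.chi2.sf(statistic, df)` if SciPy is available.

```python
import math


def chi_square_statistics(observed):
    """Return (pearson, g, df, share_low) for an r x c table of counts.

    share_low is the fraction of cells whose expected count is below 5.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("drop empty rows and columns before testing")
    n = sum(row_totals)
    pearson = 0.0
    g = 0.0
    low = 0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            if expected < 5:
                low += 1
            pearson += (obs - expected) ** 2 / expected
            if obs > 0:  # cells with a zero count contribute nothing to G
                g += obs * math.log(obs / expected)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return pearson, 2 * g, df, low / (len(row_totals) * len(col_totals))


# A well-filled 2 x 2 table: every expected count is 15.
pearson, g, df, share_low = chi_square_statistics([[10, 20], [20, 10]])
print(round(pearson, 3), df, share_low)  # 6.667 1 0.0

# A sparse table: every expected count is 0.5, so every cell triggers the warning.
pearson, g, df, share_low = chi_square_statistics([[1, 0], [0, 1]])
print(round(pearson, 3), df, share_low)  # 2.0 1 1.0
```

The second call reproduces, in miniature, the situation flagged in the table footnote: when every expected count is tiny, the statistic is still computable, but its chi-square p-value is not trustworthy.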

London Authority Average Rent Dataset

Case Processing Summary

Variables | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent
Area * 2015-16 | 280 | 100.0% | 0 | 0.0% | 280 | 100.0%

The case processing summary shows that area and the 2015-16 rent value were cross-tabulated in this analysis. Unlike the air quality dataset, this dataset has no missing values, so all 280 cases are valid for the analysis.

Chi-Square Tests

Test | Value | df | Asymp. Sig. (2-sided)
Pearson Chi-Square | 8960.000 (a) | 1440 | .000
Likelihood Ratio | 525.633 | 1440 | 1.000
N of Valid Cases | 280

(a) 1516 cells (99.9%) have expected count less than 5. The minimum expected count is .00.

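The table reports the chi-square tests for the cross-tabulation of area against 2015-16 rent values. The Pearson chi-square significance is .000. That is below 0.05 and on its own would suggest an association between area and rent, but the likelihood ratio significance is 1.000. Because 1516 cells (99.9%) have an expected count below 5, the chi-square approximation does not hold for this table. Neither result therefore supports a firm conclusion about whether area and 2015-16 rent values are related.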

This section describes four common features of data mining tools: regression, association rule discovery, classification, and clustering.

Regression: Regression is one of the most common features of data mining tools. It is a straightforward predictive technique that models how a dependent variable depends on one or more independent variables in the given dataset, assuming either a linear or a nonlinear form of dependency.
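The simplest case is a straight line y = intercept + slope * x, fitted by ordinary least squares with only the standard library. The sketch below is a minimal example, assuming a single numeric predictor. The function name `fit_simple_linear_regression` and the sample values are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # spread of x around its mean
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # co-variation of x and y
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)        # 1.0 2.0
print(intercept + slope * 5)   # 11.0 (prediction for x = 5)
```

A nonlinear dependency can often be handled with the same idea by regressing on transformed inputs (for example x squared), though data mining tools usually offer richer models for this.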

Association rule discovery: Association rule discovery is another important feature of data mining tools. Given a set of records, it finds dependency rules that predict the occurrence of an item from the occurrence of other items. Each rule has the form X -> Y and is reported only if it meets two minimum thresholds:

- support: the share of records containing both X and Y;
- confidence: the share of records containing X that also contain Y.
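The support and confidence thresholds described above can be sketched in Python as a brute-force search over rules with one item on each side. This is a simplified illustration, not the Apriori algorithm used by many tools. The helper names and the four shopping-basket records are hypothetical.

```python
from itertools import permutations


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    return sum(1 for r in records if itemset <= r) / len(records)


def confidence(x, y, records):
    """Fraction of records containing X that also contain Y."""
    sx = support(x, records)
    return support(x | y, records) / sx if sx else 0.0


def single_item_rules(records, min_support, min_confidence):
    """Return (X, Y, support, confidence) for every rule X -> Y meeting both thresholds."""
    items = sorted(set().union(*records))
    rules = []
    for a, b in permutations(items, 2):
        x, y = frozenset([a]), frozenset([b])
        s = support(x | y, records)
        if s >= min_support:
            c = confidence(x, y, records)
            if c >= min_confidence:
                rules.append((a, b, s, c))
    return rules


records = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
for a, b, s, c in single_item_rules(records, min_support=0.5, min_confidence=0.6):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
# bread -> butter: support=0.50, confidence=0.67
# bread -> milk: support=0.50, confidence=0.67
# butter -> bread: support=0.50, confidence=1.00
# milk -> bread: support=0.50, confidence=0.67
```

The pair milk and butter appears together in only one of four records (support 0.25), so neither direction of that rule is reported.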

Classification: In classification, a data mining tool builds a model during a training phase from records whose class labels are already known. Each record is described by a chosen set of attributes, one of which is the class. The model learns the class as a function of the other attributes so that it can predict the class of unseen records from similar datasets. Once trained, the model can assign a class label to any record for which the values of the other attributes are provided.
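The train-then-predict workflow can be made concrete with a k-nearest-neighbours classifier. It is chosen here only because it is short; the text does not name a particular classification algorithm. The training records, attribute values, function name, and the class labels "low" and "high" are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Predict a class label for `query` by majority vote of its k nearest training records.

    `training` is a list of (attribute_tuple, class_label) pairs.
    """
    if not 0 < k <= len(training):
        raise ValueError("k must be between 1 and the number of training records")
    # Rank training records by Euclidean distance to the query and keep the k closest.
    nearest = sorted(training, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


training = [((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((5.0, 5.0), "high"), ((5.5, 4.5), "high")]
print(knn_predict(training, (1.1, 0.9)))  # low
print(knn_predict(training, (4.8, 5.2)))  # high
```

Here the "training phase" is simply storing the labelled records. Other classifiers, such as decision trees, build an explicit model from them instead.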

Clustering: Clustering groups objects, for example customers into different consumer groups, so that objects in the same cluster are similar to one another according to a chosen similarity measure or rule. The clustering problem is to find groupings in which similarity within each cluster is high and similarity between clusters is low.
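The clustering objective above can be sketched with k-means (Lloyd's algorithm), one common clustering method swapped in here for illustration. This minimal version assumes numeric points, Euclidean distance, and the first k points as starting centroids. The function name and sample points are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means using the first k points as initial centroids.

    Returns (labels, centroids), where labels[i] is the cluster index of points[i].
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centroids = kmeans(points, 2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The result depends on the starting centroids. Practical tools typically run several random starts or use a smarter initialisation and keep the best grouping.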
