Frequency Analysis Of Smartphone Entertainment Types And Statistical Analysis Of Income Distribution

Random Number Selection Process

The student ID number selected for this assignment is MIT17122. Following the provided guideline for selecting random numbers, the last three digits of the student ID, 122, are used. Consequently, the random number selection starts from row 22 and column 1. The random numbers are provided in blocks of six digits, so each block yields two random numbers (Hamman et al., 2016): the first three digits and the last three digits each form a distinct three-digit random number. In the corresponding Excel sheet, the first column identifies the random number selected and the second column gives its value. For instance, the first selected random number is 937 (Chatterjee & Hadi, 2015). The third column records whether the selected random number is “Good” or “Not-Good”. “Good” means the number can be used as a sample number (Wilson, Bhatnagar & Townsend, 2017); “Not-Good” means it has to be rejected. Random numbers from 001 to 300 are accepted; any other number, including 000, is rejected.
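The selection rule described above can be sketched in Python. This is only an illustration, not the spreadsheet actually used: the function names `split_block` and `classify` and the sample blocks are hypothetical, and only the leading value 937 is taken from the text.

```python
def split_block(block: str) -> tuple[int, int]:
    """Split one six-digit block into its first and last three-digit numbers."""
    if len(block) != 6 or not block.isdigit():
        raise ValueError(f"expected a six-digit block, got {block!r}")
    return int(block[:3]), int(block[3:])


def classify(number: int, population_size: int = 300) -> str:
    """Label a number 'Good' if it lies in 001..population_size, else 'Not-Good'."""
    return "Good" if 1 <= number <= population_size else "Not-Good"


# Hypothetical blocks for illustration; only the leading 937 comes from the text.
blocks = ["937104", "000256", "300301"]

labelled = []
for block in blocks:
    for number in split_block(block):
        labelled.append((f"{number:03d}", classify(number)))

for value, label in labelled:
    print(value, label)
```

Keeping the acceptance rule in its own function makes the boundary cases (000 rejected, 300 accepted, 301 rejected) easy to check separately from the block splitting.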


The selected samples are laid out in the file named “SampleSmartPhoneData”, which contains 50 samples drawn from the provided list of 300.

As required, a frequency column chart and a relative frequency pie chart have been constructed to show the number and proportion of each entertainment type (Wun et al., 2016).

According to the frequency column chart, 21 of the samples contain entertainment in the form of Music, Videos and Movies (Weaver et al., 2018).

It is evident from the frequency column chart that music, videos and movies are the most commonly downloaded form of entertainment.


eBooks account for a proportion of 0.18 of the entertainment types in the sample.
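The counts behind a frequency column chart and the proportions behind a relative frequency pie chart can be tabulated as in the sketch below. The labels are partly hypothetical: the count of 21 for Music, Videos and Movies and the eBooks proportion of 0.18 (taken here as 9 of 50 samples) follow the text, while the "Other" group is an assumed placeholder for the remaining categories, which are not itemised here.

```python
from collections import Counter

# Counts for "Music, Videos and Movies" (21) and "eBooks" (0.18 * 50 = 9)
# follow the text; "Other" is a hypothetical placeholder for the rest.
entertainment = (
    ["Music, Videos and Movies"] * 21
    + ["eBooks"] * 9
    + ["Other"] * 20
)

frequency = Counter(entertainment)
total = sum(frequency.values())
relative_frequency = {kind: count / total for kind, count in frequency.items()}

for kind, count in frequency.most_common():
    print(f"{kind:<26} {count:>3} {relative_frequency[kind]:.2f}")
```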

The table below lists the incomes in descending order. The corresponding CN numbers are included for convenience.

CN | V1 | Rank
60 | $250,000 | 1.5
193 | $250,000 | 1.5
140 | $180,000 | 3.5
225 | $180,000 | 3.5
72 | $160,000 | 5
113 | $155,000 | 6
114 | $102,983 | 7
300 | $101,262 | 8
137 | $100,267 | 9
243 | $100,200 | 10
252 | $99,742 | 11
237 | $99,398 | 14.5
242 | $99,398 | 14.5
223 | $99,398 | 14.5
248 | $99,398 | 14.5
57 | $99,398 | 14.5
165 | $99,398 | 14.5
273 | $99,374 | 18
46 | $99,336 | 19
202 | $98,955 | 20
241 | $98,678 | 21
180 | $98,673 | 22.5
205 | $98,673 | 22.5
102 | $98,645 | 24
249 | $98,191 | 25
146 | $97,756 | 26
277 | $97,338 | 27.5
277 | $97,338 | 27.5
134 | $97,000 | 29
293 | $96,286 | 30
98 | $95,957 | 31
88 | $95,931 | 32
49 | $95,877 | 33
131 | $95,297 | 34
62 | $95,000 | 35
153 | $93,250 | 36
153 | $93,250 | 37
91 | $90,164 | 38
221 | $90,025 | 39
169 | $88,887 | 40
123 | $72,000 | 41
268 | $70,000 | 43
176 | $70,000 | 43
176 | $70,000 | 43
2 | $62,500 | 45.5
251 | $62,500 | 45.5
4 | $55,000 | 47
234 | $45,000 | 48
234 | $45,000 | 49
25 | $40,000 | 50

The location of a percentile, that is, the position in the ranked data at which the value of the desired percentile is found, is given by

\[ L_P = (n + 1)\,\frac{P}{100} \]

where n is the total number of observations and P is the desired percentile.

Here, the desired percentile is the 70th, so P = 70. Substituting P = 70 and n = 50, the location is

\[ L_{70} = (50 + 1) \times \frac{70}{100} = 35.7 \]

It can be written as IR + FR = 35 + 0.7 = 35.7, where IR is the integer part and FR is the fractional part of the location.

The value with rank 35 is $95,000 and the value with rank 36 is $93,250. The value corresponding to the 70th percentile is then obtained from

\[ \text{value} = FR \times (x_{IR} - x_{IR+1}) + x_{IR+1} \]

0.7*(95000-93250) + 93250

= 0.7*1750 + 93250

= 1225 + 93250

= $94475
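The 70th percentile calculation above can be checked with a short Python sketch. It applies the location rule (n + 1) × P / 100 and the same interpolation step used in the worked figures to the incomes ranked from highest to lowest. The function names `percentile_location` and `value_at_location` are illustrative choices, not part of the assignment.

```python
# Incomes (V1) ranked from highest to lowest, as in the ranking table.
incomes_desc = [
    250000, 250000, 180000, 180000, 160000, 155000, 102983, 101262, 100267, 100200,
    99742, 99398, 99398, 99398, 99398, 99398, 99398, 99374, 99336, 98955,
    98678, 98673, 98673, 98645, 98191, 97756, 97338, 97338, 97000, 96286,
    95957, 95931, 95877, 95297, 95000, 93250, 93250, 90164, 90025, 88887,
    72000, 70000, 70000, 70000, 62500, 62500, 55000, 45000, 45000, 40000,
]


def percentile_location(n: int, p: float) -> float:
    """Location of the p-th percentile: (n + 1) * p / 100."""
    return (n + 1) * p / 100


def value_at_location(ranked: list[int], location: float) -> float:
    """Interpolate as in the worked figures: FR * (x_IR - x_IR+1) + x_IR+1.

    Ranks are 1-based, so the value with rank r is ranked[r - 1].
    """
    ir = int(location)            # integer part (IR)
    fr = location - ir            # fractional part (FR)
    at_ir = ranked[ir - 1]        # value with rank IR
    at_next = ranked[ir]          # value with rank IR + 1
    return fr * (at_ir - at_next) + at_next


location = percentile_location(len(incomes_desc), 70)
value = value_at_location(incomes_desc, location)
print(f"location = {location:.1f}, value = ${value:,.2f}")
```

The sketch indexes the list by position, so tied incomes occupy consecutive positions rather than sharing an averaged rank.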

CN | V1 | Rank
60 | $250,000 | 1
193 | $250,000 | 1
140 | $180,000 | 2
225 | $180,000 | 2
72 | $160,000 | 3
113 | $155,000 | 4
114 | $102,983 | 5
300 | $101,262 | 6
137 | $100,267 | 7
243 | $100,200 | 8
252 | $99,742 | 9
237 | $99,398 | 10
242 | $99,398 | 10
223 | $99,398 | 10
248 | $99,398 | 10
57 | $99,398 | 10
165 | $99,398 | 10
273 | $99,374 | 11
46 | $99,336 | 12
202 | $98,955 | 13
241 | $98,678 | 14
180 | $98,673 | 15
205 | $98,673 | 15
102 | $98,645 | 16
249 | $98,191 | 17
146 | $97,756 | 18
277 | $97,338 | 19
277 | $97,338 | 19
134 | $97,000 | 20
293 | $96,286 | 21
98 | $95,957 | 22
88 | $95,931 | 23
49 | $95,877 | 24
131 | $95,297 | 25
62 | $95,000 | 26
153 | $93,250 | 27
153 | $93,250 | 28
91 | $90,164 | 29
221 | $90,025 | 30
169 | $88,887 | 31
123 | $72,000 | 32
268 | $70,000 | 33
176 | $70,000 | 33
176 | $70,000 | 33
2 | $62,500 | 34
251 | $62,500 | 34
4 | $55,000 | 35
234 | $45,000 | 36
234 | $45,000 | 36
25 | $40,000 | 37

The first and third quartiles correspond to the 25th and 75th percentiles, and the calculations are carried out in a similar fashion. To determine the 25th percentile value, the location is

\[ (n + 1)\,\frac{P}{100} = (50 + 1) \times \frac{25}{100} = 12.75 \]

This can be expressed as IR + FR = 12 + 0.75 = 12.75.

The value with rank 12 is $99,336 and the value with rank 13 is $98,955. The value corresponding to the 25th percentile is then calculated as:

0.75*(99336-98955) + 98995

= 0.75*381 + 98995

= 285.75 + 98995

= $99280.75


To find the 75th percentile, proceeding in a similar fashion, the location is

\[ (n + 1)\,\frac{P}{100} = (50 + 1) \times \frac{75}{100} = 38.25 \]

This can be expressed as IR + FR = 38 + 0.25 = 38.25.

The value with rank 38 is $90,164 and the value with rank 39 is $90,025. The value corresponding to the 75th percentile is then calculated as:

0.25*(90164-90025) + 90025

= 0.25*139 + 90025

= 34.75 + 90025

= $90059.75

  1. c.

Before answering this question, it is important to clarify the idea of a percentile. With the incomes ranked from highest to lowest, as in the tables above, the percentile is taken here as the percentage of observations lying above a given value. The 70th percentile therefore marks the value above which 70 percent of the observations lie. In this case the value is $94,475, which implies that 70 percent of the 50 selected samples have an annual income above $94,475.

  1. d.

The interquartile range is defined as the difference between the third quartile and the first quartile. Taking the magnitude of the difference, the interquartile range in this case is:

|90059.75 – 99280.75| = $9221

The interquartile range focuses on the variation within a data set: it measures the spread of the middle 50% of the values around the median. In this case the interquartile range is $9,221, which implies that the annual incomes of the middle 50% of the data are spread over a range of $9,221.

The following descriptive statistics table was constructed in Excel and pasted here.

Statistic | Column1
Mean | 101554.46
Standard Error | 5815.206028
Median | 97973.5
Mode | 99398
Standard Deviation | 41119.71617
Sample Variance | 1690831058
Kurtosis | 5.801565461
Skewness | 2.114964151
Range | 210000
Minimum | 40000
Maximum | 250000
Sum | 5077723
Count | 50
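Most of the descriptive statistics above can be recomputed with Python's standard `statistics` module, as a cross-check on the Excel output. This is a sketch under the assumption that the 50 incomes listed in the ranking table are the full input column. Kurtosis and skewness are left out because the standard library has no direct function for them.

```python
import statistics

# Annual incomes (V1) of the 50 sampled records, from the ranking table.
incomes = [
    250000, 250000, 180000, 180000, 160000, 155000, 102983, 101262, 100267, 100200,
    99742, 99398, 99398, 99398, 99398, 99398, 99398, 99374, 99336, 98955,
    98678, 98673, 98673, 98645, 98191, 97756, 97338, 97338, 97000, 96286,
    95957, 95931, 95877, 95297, 95000, 93250, 93250, 90164, 90025, 88887,
    72000, 70000, 70000, 70000, 62500, 62500, 55000, 45000, 45000, 40000,
]

n = len(incomes)
mean = statistics.mean(incomes)
median = statistics.median(incomes)
mode = statistics.mode(incomes)
sample_sd = statistics.stdev(incomes)       # sample SD, divides by n - 1
sample_var = statistics.variance(incomes)   # sample variance, divides by n - 1
std_error = sample_sd / n ** 0.5            # standard error of the mean
data_range = max(incomes) - min(incomes)

print(f"Mean               {mean:.2f}")
print(f"Standard Error     {std_error:.3f}")
print(f"Median             {median}")
print(f"Mode               {mode}")
print(f"Standard Deviation {sample_sd:.3f}")
print(f"Sample Variance    {sample_var:.0f}")
print(f"Range              {data_range}")
print(f"Sum                {sum(incomes)}")
print(f"Count              {n}")
```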

The upper and lower inner fences are calculated by the provided formulae.

Upper inner fence: 103891.3

Lower inner fence: 85449.25
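The reported fence values can be reproduced by the arithmetic below. It rests on an assumption: the "provided formulae" are not shown here, so the sketch uses the common 1.5 × IQR rule, applied to the quartile values exactly as they are labelled in the calculations above.

```python
# Quartile values as labelled in the calculations above.
q1 = 99280.75   # value reported for the 25th percentile
q3 = 90059.75   # value reported for the 75th percentile

iqr = abs(q3 - q1)   # magnitude of the difference between the quartiles

# Assumed rule: 1.5 * IQR beyond the quartiles as labelled in the text.
upper_inner_fence = q3 + 1.5 * iqr
lower_inner_fence = q1 - 1.5 * iqr

print(f"IQR               = {iqr:.2f}")
print(f"Upper inner fence = {upper_inner_fence:.2f}")
print(f"Lower inner fence = {lower_inner_fence:.2f}")
```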

The measure of central tendency chosen is the mean, or average. Among the measures of central tendency, such as the median and the mode, the mean is regarded here as the best measure, and it is chosen primarily for that reason. Also, since the interquartile range and the range show that the data are well spread, the median and mode are not the best choice. The mean is defined as

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

The measure of dispersion chosen for this data set is the standard deviation (SD). The sample SD is the square root of the sum of the squared deviations from the mean, divided by n − 1. Since the mean is chosen as the measure of central tendency, it is convenient in practice to use the standard deviation to measure the level of dispersion. SD is defined as

\[ s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

where \(\bar{x}\) is the sample mean and n is the number of observations.

The V1 variable is the annual income of the samples under consideration. The mean, as mentioned above, is $101,554.46, so on average the annual income across the 50 samples is that amount. This may not look like a central value, as the incomes range from $40,000 to $250,000.

The median, the middle value of the data set, is $97,973.5. This means that the incomes of 50% of the observations lie above this value and the rest lie below it. The median also indicates that the majority of people have incomes in the vicinity of this value.

Quartiles are the values that divide the ordered data set into four equal groups. All of the quartile values have been calculated above. Here, the first quartile is $99,280.75, the median is $97,973.5 and the third quartile is $90,059.75.

The first quartile gives the value above which 25% of the observations lie, and the third quartile gives the value above which 75% of the observations lie. The median, which is the 50th percentile or second quartile, is the middle value of the data: 50% of the observations are above it and the rest are below.

Measures of variation include the range, the sample SD and the sample variance. All of the values were calculated in Excel and are reported above. They are:

Standard Deviation | 41119.71617
Sample Variance | 1690831058
Range | 210000

Clearly, the SD and the variance are very high. The range also indicates how widely the data set is dispersed.

The three measures that help in recognizing whether the data follow a normal distribution are the mean, the median and the skewness. For a normal distribution, the mean, median and mode are all equal, which is not the case for this data set (Leamer, 2016). The skewness is also high at about 2.11, whereas the skewness of a normal distribution is zero. Thus the data do not follow a normal distribution.

The following table is drawn up to determine the number of observations within the required range.

CN | V1 | Z
25 | $40,000 | -1.49696
234 | $45,000 | -1.37536
234 | $45,000 | -1.37536
4 | $55,000 | -1.13217
2 | $62,500 | -0.94977
251 | $62,500 | -0.94977
268 | $70,000 | -0.76738
176 | $70,000 | -0.76738
176 | $70,000 | -0.76738
123 | $72,000 | -0.71874
169 | $88,887 | -0.30806
221 | $90,025 | -0.28039
91 | $90,164 | -0.27701
153 | $93,250 | -0.20196
153 | $93,250 | -0.20196
62 | $95,000 | -0.1594
131 | $95,297 | -0.15218
49 | $95,877 | -0.13807
88 | $95,931 | -0.13676
98 | $95,957 | -0.13613
293 | $96,286 | -0.12812
134 | $97,000 | -0.11076
277 | $97,338 | -0.10254
277 | $97,338 | -0.10254
146 | $97,756 | -0.09238
249 | $98,191 | -0.0818
102 | $98,645 | -0.07076
180 | $98,673 | -0.07007
205 | $98,673 | -0.07007
241 | $98,678 | -0.06995
202 | $98,955 | -0.06322
46 | $99,336 | -0.05395
273 | $99,374 | -0.05303
237 | $99,398 | -0.05244
242 | $99,398 | -0.05244
223 | $99,398 | -0.05244
248 | $99,398 | -0.05244
57 | $99,398 | -0.05244
165 | $99,398 | -0.05244
252 | $99,742 | -0.04408
243 | $100,200 | -0.03294
137 | $100,267 | -0.03131
300 | $101,262 | -0.00711
114 | $102,983 | 0.034741
113 | $155,000 | 1.299755
72 | $160,000 | 1.421351
140 | $180,000 | 1.907736
225 | $180,000 | 1.907736
60 | $250,000 | 3.610082
193 | $250,000 | 3.610082


The z-scores that bound the region are 1.5 and -1.5. From the standard normal table, the area between 0 and 1.5 is 0.43319. Taking both sides together, 86.638% of observations are expected to lie within the region (Wan et al., 2014). This means about 43 of the 50 observations are expected to lie in the specified region.

The following table identifies the observations that fall within the required region.

CN | V1 | TRUE/FALSE
25 | $40,000 | TRUE
234 | $45,000 | TRUE
234 | $45,000 | TRUE
4 | $55,000 | TRUE
2 | $62,500 | TRUE
251 | $62,500 | TRUE
268 | $70,000 | TRUE
176 | $70,000 | TRUE
176 | $70,000 | TRUE
123 | $72,000 | TRUE
169 | $88,887 | TRUE
221 | $90,025 | TRUE
91 | $90,164 | TRUE
153 | $93,250 | TRUE
153 | $93,250 | TRUE
62 | $95,000 | TRUE
131 | $95,297 | TRUE
49 | $95,877 | TRUE
88 | $95,931 | TRUE
98 | $95,957 | TRUE
293 | $96,286 | TRUE
134 | $97,000 | TRUE
277 | $97,338 | TRUE
277 | $97,338 | TRUE
146 | $97,756 | TRUE
249 | $98,191 | TRUE
102 | $98,645 | TRUE
180 | $98,673 | TRUE
205 | $98,673 | TRUE
241 | $98,678 | TRUE
202 | $98,955 | TRUE
46 | $99,336 | TRUE
273 | $99,374 | TRUE
237 | $99,398 | TRUE
242 | $99,398 | TRUE
223 | $99,398 | TRUE
248 | $99,398 | TRUE
57 | $99,398 | TRUE
165 | $99,398 | TRUE
252 | $99,742 | TRUE
243 | $100,200 | TRUE
137 | $100,267 | TRUE
300 | $101,262 | TRUE
114 | $102,983 | TRUE
113 | $155,000 | TRUE
72 | $160,000 | TRUE
140 | $180,000 | FALSE
225 | $180,000 | FALSE
60 | $250,000 | FALSE
193 | $250,000 | FALSE

It is evident that 46 of the observations fall in the given region.
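The comparison between the observed count and the count expected under normality can be reproduced with the sketch below. It assumes the z-scores were computed with the sample mean and sample standard deviation reported earlier. The normal-curve area comes from `statistics.NormalDist` rather than a printed table, so it differs from the tabulated 0.43319 only in later decimal places.

```python
import statistics

# Annual incomes (V1) of the 50 sampled records, from the ranking table.
incomes = [
    250000, 250000, 180000, 180000, 160000, 155000, 102983, 101262, 100267, 100200,
    99742, 99398, 99398, 99398, 99398, 99398, 99398, 99374, 99336, 98955,
    98678, 98673, 98673, 98645, 98191, 97756, 97338, 97338, 97000, 96286,
    95957, 95931, 95877, 95297, 95000, 93250, 93250, 90164, 90025, 88887,
    72000, 70000, 70000, 70000, 62500, 62500, 55000, 45000, 45000, 40000,
]

mean = statistics.mean(incomes)
sd = statistics.stdev(incomes)   # sample standard deviation

# Standardise each income and count those with -1.5 <= z <= 1.5.
z_scores = [(x - mean) / sd for x in incomes]
within = sum(1 for z in z_scores if -1.5 <= z <= 1.5)

# Share of a normal population expected between z = -1.5 and z = 1.5.
standard_normal = statistics.NormalDist()
expected_share = standard_normal.cdf(1.5) - standard_normal.cdf(-1.5)
expected_count = expected_share * len(incomes)

print(f"observed within +/-1.5 SD: {within}")
print(f"expected under normality:  {expected_count:.1f}")
```

The observed count exceeds the expected count because most incomes are clustered tightly near the centre, while a few very large incomes inflate the standard deviation.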

Writing the intercept as \(\beta_0\) and the slope as \(\beta_1\), the regression equation is

\[ y = \beta_0 + \beta_1 x + \varepsilon \]

where x is age and y is the percentage of phone usage for work purposes.

The primary purpose of this model is to test whether there is a linear relation between age and the percentage of phone usage for work purposes (Abdullah, Doucouliagos & Manning, 2015). If there is a linear relation between the two, then \(\beta_1\) will not be equal to zero.

Since the estimated value of \(\beta_1\) is 0.608, the variables are positively and linearly related.

The intercept coefficient provides the least squares estimate of \(\beta_0\).

The slope coefficient is the value corresponding to age in the Coefficients column (Kolaczyk & Csárdi, 2014). That is the estimate of \(\beta_1\). Thus, as described above, it represents a positive linear relation.

The coefficient of determination is the R-squared value in the table, 0.126743917. It means that about 12.7% of the variation in the percentage of phone usage for work around its mean is explained by age.
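How the intercept, the slope and the coefficient of determination arise from the data can be illustrated with a small least squares sketch. The regression data are not reproduced here, so the ages and usage percentages below are hypothetical and the printed estimates will not match the reported 0.608 and 0.1267. The helper name `fit_line` is also only illustrative.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Return (intercept, slope, r_squared) for the least squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variation in y around its mean explained by x.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_residual / ss_total
    return intercept, slope, r_squared


# Hypothetical ages and work-usage percentages, for illustration only.
ages = [22, 28, 35, 41, 47, 53, 60]
work_usage_pct = [30, 20, 45, 35, 60, 40, 55]

b0, b1, r2 = fit_line(ages, work_usage_pct)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}, R-squared = {r2:.3f}")
```

A positive slope with a small R-squared, as in the reported output, indicates a relation that points upward but leaves most of the variation in usage unexplained by age.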

References

Hamman, R. D., Kennedy III, W. C., Rump, W. J., & Irwin, K. E. (2016). U.S. Patent Application No. 15/152,009.

Wun, T., Payne, J., Huron, S., & Carpendale, S. (2016, June). Comparing bar chart authoring with Microsoft Excel and tangible tiles. In Computer Graphics Forum (Vol. 35, No. 3, pp. 111-120).

Weaver, K. F., Morales, V., Dunn, S. L., Godde, K., & Weaver, P. F. (2018). Showing Your Data. An Introduction to Statistical Analysis in Research: With Applications in the Biological and Life Sciences, First, 61-190.

Wan, X., Wang, W., Liu, J., & Tong, T. (2014). Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Medical Research Methodology, 14(1), 135.

Kolaczyk, E. D., & Csárdi, G. (2014). Statistical analysis of network data with R (Vol. 65). New York: Springer.

Leamer, E. E. (2016). S-values: Conventional context-minimal measures of the sturdiness of regression coefficients. Journal of Econometrics, 193(1), 147-161.

Draper, N. R., & Smith, H. (2014). Applied regression analysis. John Wiley & Sons.

Harrell Jr, F. E. (2015). Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. Springer.

Marcolini, G., Bellin, A., & Chiogna, G. (2017). Performance of the Standard Normal Homogeneity Test for the homogenization of mean seasonal snow depth time series. International Journal of Climatology.

Mowery, D. C., Nelson, R. R., Sampat, B. N., & Ziedonis, A. A. (2015). Ivory tower and industrial innovation: University-industry technology transfer before and after the Bayh-Dole Act. Stanford University Press.

Chatterjee, S., & Hadi, A. S. (2015). Regression analysis by example. John Wiley & Sons.

Wilson, L., Bhatnagar, P., & Townsend, N. (2017). Comparing trends in mortality from cardiovascular disease and cancer in the United Kingdom, 1983–2013: joinpoint regression analysis. Population Health Metrics, 15(1), 23.

Abdullah, A., Doucouliagos, H., & Manning, E. (2015). Does education reduce income inequality? A meta-regression analysis. Journal of Economic Surveys, 29(2), 301-316.