Application Of Descriptive Statistics In Probability Distributions

Service Times in a Hospital

Questions:


Question 1    

(a) For each of the following scenarios:

• classify the variable as either numerical or categorical, AND

• state whether the scale of measurement is nominal, ordinal, interval or ratio.


(i) Fast food restaurants sell soft drinks in three sizes – small, medium and large.    
         
(ii) A manufacturing company is sending millions of car parts overseas.    

(b) Classify each of the following scenarios as a statistical problem in either descriptive statistics, probability or statistical inference.

(i) Based on the survey conducted by Harris Green Pty Ltd, researchers predicted that group buying websites will be the most popular method for buying electrical and electronics products in the future.    
          
(ii) A survey of 1000 adult drivers conducted by News Today shows that 45% of drivers admit to drinking and 36% admit to talking on the mobile phone while driving a vehicle.

Question 2

The following data show the service times (in seconds) for a sample of 96 customers who arrived at the service counter of a local hospital.

105  101   99  100  105  101  102   91
102  100  104  100   98   99  107   99
101   97  101   92  100  100  101  103
 94  106   94  102   93  109  100  103
103  109   96  101  103  103  101  100
 98   96   98  104   96  105  103   97
102  106  100  108  100  100   99   99
104   98  106  107  108  102   93  100
101  105  108   99   96  101  100   99
106   95   92  108  102  105  105   81
 89  103  108   98  109  106  101  102
104   97  103  108  104   98  109  108

(a) Construct a stem-and-leaf diagram using this data. You need to generate between 5 and 10 stems only for the diagram.    

(b) Draw a histogram for the frequency distribution with the first class “79 to less than 86”. On the same graph draw the frequency polygon.    

(c) Draw an ogive for the frequency distribution in Part (b) with the first class “79 to less than 86”.    

Question 3
    
A health research agency has recently collected the following information when investigating the occurrences of skin cancer in a certain population of beach goers:

• 7% of beach goers who do not use any sun-screen lotion develop skin cancer at some stage in their life.

• 1% of beach goers who use sun-screen lotion develop skin cancer at some stage in their life.

• 90% of beach goers use sun-screen.

Use this information to answer the following questions. (Hint: construct a contingency table.)

(a) If a beach goer is randomly selected, what is the probability that the person uses sun-screen lotion and yet develops skin cancer at some stage in life?      

(b) If a beach goer is randomly selected, what is the probability that the person develops skin cancer at some stage in life?        

(c) If a beach goer is randomly selected who has already developed skin cancer, what is the probability that the person does not use sun-screen lotion?        

(d) What is the probability that a randomly selected beach goer will not develop skin cancer in their lifetime or uses sun-screen lotion?

Question 4    

(a) A local supermarket receives a delivery of fresh fruit each morning at a time that varies uniformly between 6:00am and 8:00am. By what delivery time can you be confident that 95 percent of deliveries will have arrived?

(b) The maintenance department of a city’s electric power company finds that it is cost-efficient to replace all street-light bulbs at once, rather than to replace the bulbs individually as they burn out. Assume that the lifetime of a bulb is normally (Gaussian) distributed with a mean of 8000 hours and a standard deviation of 300 hours. If the department wants no more than 3% of the bulbs to burn out before they are replaced, after how many hours should all of the bulbs be replaced?    

(c) The time between unplanned shutdowns of an Internet service provider has an exponential distribution with a mean of 20 days. Find the probability that the time between two unplanned shutdowns is 13 days.    

Question 5    

(a) The probability of success in a trial is 0.70. In 500 trials, what is the probability of succeeding between 280 and 355 times? Use normal approximation to the binomial distribution with continuity correction.    

(b) Toyota requires a quality assurance check of new cars before a shipment is made. The tolerable exception rate for this internal control is 0.05. During an audit, 400 cars were sampled from a population of 4,000 cars, and 10 were found that violated the internal control. Calculate the upper bound for a 95% one-sided confidence interval estimate for the rate of noncompliance.    

(c) BP wishes to estimate the mean amount of water that has seeped into the fuel storage tanks at its refineries in Sydney. A preliminary sample of n = 16 tanks showed that the standard deviation, s = 48 litres. How much larger should the sample be in order to estimate the mean water content of the tanks to within ±10 litres with 95% confidence?


1.

a> i> Categorical variable and ordinal scale

The variable is categorical because the drinks fall into three categories: small, medium and large. The scale of measurement is ordinal because these categories have a natural order.

ii> Numerical variable and ratio scale

The variable is the number of car parts being sent overseas. A count is numerical and has a true zero, so it is measured on a ratio scale.

b> i> Statistical inference. The researchers used survey (sample) data to predict that group buying websites will be the most popular method for buying electrical and electronics products in the future. Drawing a conclusion about a population, or a prediction about the future, from sample data is statistical inference.

ii> Descriptive statistics. The percentages (45% and 36%) simply summarise the responses of the 1000 drivers surveyed. The data are only described; no inference or prediction is made beyond the sample.

2. The stem-and-leaf diagram is given below.

Most of the data lie in the 90s and 100s, so each of the stems 9 and 10 is split into five lines (leaves 0-1, 2-3, 4-5, 6-7 and 8-9). This gives 10 stems in total. The 80s contain the fewest values (only 81 and 89), and these are shown as extremes.

Because the data lie in a narrow interval, the stem-and-leaf plot gives a good picture of the individual values. Among the values in the 90s, the upper half is much more frequent: 99 occurs most often (7 times), followed by 98 (6 times). Overall, the values 100 (12 times) and 101 (10 times) have the highest frequencies.
 Stem-and-Leaf Plot

 Frequency    Stem &  Leaf

     2.00 Extremes    (=<89)
     1.00        9 .  1
     4.00        9 .  2233
     3.00        9 .  445
     7.00        9 .  6666777
    13.00        9 .  8888889999999
    22.00       10 .  0000000000001111111111
    15.00       10 .  222222233333333
    11.00       10 .  44444555555
     7.00       10 .  6666677
    11.00       10 .  88888889999

 Stem width:        10
 Each leaf:       1 case(s)
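The split-stem layout above can be reproduced with a short script. The sketch below is a minimal Python illustration that makes the same choices as the plot: each of the stems 9 and 10 is split into five lines (leaves 0-1, 2-3, 4-5, 6-7, 8-9), and values of 89 or less are listed as extremes. The names `DATA` and `split_stem_plot` are illustrative and do not come from any package.

```python
# Service times (seconds) for the 96 customers, in the order listed in Question 2.
DATA = [
    105, 101, 99, 100, 105, 101, 102, 91, 102, 100, 104, 100,
    98, 99, 107, 99, 101, 97, 101, 92, 100, 100, 101, 103,
    94, 106, 94, 102, 93, 109, 100, 103, 103, 109, 96, 101,
    103, 103, 101, 100, 98, 96, 98, 104, 96, 105, 103, 97,
    102, 106, 100, 108, 100, 100, 99, 99, 104, 98, 106, 107,
    108, 102, 93, 100, 101, 105, 108, 99, 96, 101, 100, 99,
    106, 95, 92, 108, 102, 105, 105, 81, 89, 103, 108, 98,
    109, 106, 101, 102, 104, 97, 103, 108, 104, 98, 109, 108,
]


def split_stem_plot(data, extreme_cutoff=89):
    """Build a split-stem plot: each stem (tens digit) gets five lines
    holding leaves 0-1, 2-3, 4-5, 6-7 and 8-9. Values <= extreme_cutoff
    are returned separately as extremes."""
    extremes = sorted(x for x in data if x <= extreme_cutoff)
    body = sorted(x for x in data if x > extreme_cutoff)
    rows = []
    for stem in range(body[0] // 10, body[-1] // 10 + 1):
        for first_leaf in range(0, 10, 2):
            # Sorted input means the leaves come out in ascending order.
            leaves = "".join(
                str(x % 10)
                for x in body
                if x // 10 == stem and first_leaf <= x % 10 <= first_leaf + 1
            )
            rows.append((stem, leaves))
    return extremes, rows


extremes, rows = split_stem_plot(DATA)
print(f"{len(extremes):>4}  Extremes (=<89)")
for stem, leaves in rows:
    print(f"{len(leaves):>4}  {stem:>3} . {leaves}")
```

Each printed line shows the leaf count (the frequency), the stem and the leaves, in the same arrangement as the plot above.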

The histogram shows almost the same shape as the stem-and-leaf plot.

Very few values lie at or below 94 (9 of the 96 customers). The values are concentrated between 98 and 106: 60 customers have service times above 98 and up to 106 seconds, so most customers were served within this range. The stem-and-leaf diagram likewise shows the highest frequencies for 100 and 101. Above 106 the frequency falls to 13.

The shape would be described more precisely by a table of summary statistics such as the mean and standard deviation. The frequency polygon representing the histogram is given below the histogram (Conway, 1963).

Upper Limit (inclusive)   Frequency
   86                          1
   90                          1
   94                          7
   98                         14
  102                         36
  106                         24
  110                         13
  More                         0
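The question specifies a first class of "79 to less than 86", which implies a class width of 7. The table above instead groups values by inclusive upper limits of 86, 90 and so on up to 110.

As a hedged sketch, the frequencies for the question's classes can be counted as follows. It assumes equal-width classes of 7 starting at 79; the width is inferred from the first class and is not stated in the question. `class_frequencies` is a hypothetical helper name.

```python
# Service times (seconds) for the 96 customers, in the order listed in Question 2.
DATA = [
    105, 101, 99, 100, 105, 101, 102, 91, 102, 100, 104, 100,
    98, 99, 107, 99, 101, 97, 101, 92, 100, 100, 101, 103,
    94, 106, 94, 102, 93, 109, 100, 103, 103, 109, 96, 101,
    103, 103, 101, 100, 98, 96, 98, 104, 96, 105, 103, 97,
    102, 106, 100, 108, 100, 100, 99, 99, 104, 98, 106, 107,
    108, 102, 93, 100, 101, 105, 108, 99, 96, 101, 100, 99,
    106, 95, 92, 108, 102, 105, 105, 81, 89, 103, 108, 98,
    109, 106, 101, 102, 104, 97, 103, 108, 104, 98, 109, 108,
]


def class_frequencies(data, first_lower=79, width=7):
    """Count values in half-open classes [lower, lower + width), starting at
    first_lower and adding classes until the largest value is covered."""
    classes = []
    lower = first_lower
    while lower <= max(data):
        upper = lower + width
        count = sum(1 for x in data if lower <= x < upper)
        classes.append((lower, upper, count))
        lower = upper
    return classes


for lower, upper, count in class_frequencies(DATA):
    print(f"{lower} to less than {upper}: {count}")
```

With these classes the counts are 1, 4, 25, 53 and 13, which sum to 96.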

Upper Limit (inclusive)   Frequency   Cumulative %
   86                          1          1.04%
   90                          1          2.08%
   94                          7          9.38%
   98                         14         23.96%
  102                         36         61.46%
  106                         24         86.46%
  110                         13        100.00%
  More                         0        100.00%

The ogive (cumulative frequency diagram) plots the cumulative percentage against each class upper limit and rises steadily from left to right. Because an ogive is a cumulative frequency diagram, the cumulative frequencies must be calculated first; the table above therefore lists the frequency and the cumulative percentage for each upper limit.

The slope of the ogive shows how quickly the data accumulate. It is steepest between 98 and 102, where the cumulative percentage rises from 23.96% to 61.46%.
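The cumulative column can be checked with a few lines of Python. This minimal sketch takes the frequencies from the table above, omitting the empty "More" bin. It returns each upper limit with its cumulative frequency and cumulative percentage, which are the points an ogive joins. The function name `ogive_points` is illustrative.

```python
from itertools import accumulate

# Upper limits and frequencies from the frequency table; the empty "More" bin is omitted.
UPPER_LIMITS = [86, 90, 94, 98, 102, 106, 110]
FREQUENCIES = [1, 1, 7, 14, 36, 24, 13]


def ogive_points(limits, freqs):
    """Return (upper limit, cumulative frequency, cumulative %) for each class."""
    total = sum(freqs)
    return [
        (upper, cum, 100 * cum / total)
        for upper, cum in zip(limits, accumulate(freqs))
    ]


for upper, cum, pct in ogive_points(UPPER_LIMITS, FREQUENCIES):
    print(f"<= {upper:>3}: {cum:>2} customers, {pct:6.2f}%")
```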


3. From the given data the following contingency table has been constructed. Beach goers are first divided into those who use sunscreen and those who do not, and then into those who develop skin cancer and those who do not.

The 7% and 1% figures are rates within each group, so each must be multiplied by the size of its group to express it as a percentage of all beach goers:
1% × 90% = 0.9% of beach goers use sunscreen and develop skin cancer.
7% × 10% = 0.7% of beach goers do not use sunscreen and develop skin cancer.

The table is a bivariate table showing each cell as a percentage of all beach goers.

                          Skin Cancer   No Skin Cancer   Total
Use Sunscreen                  0.9           89.1           90
Do Not Use Sunscreen           0.7            9.3           10
Total                          1.6           98.4          100

a>    Total % of beach goers = 100%
% of favourable cases = 0.9% (use sunscreen and develop skin cancer)
Therefore probability = 0.9/100 = 0.009

b>    Total % of beach goers = 100%
% of favourable cases = 0.9% (beach goers using sunscreen) + 0.7% (beach goers not using sunscreen) = 1.6%
Therefore probability = 1.6/100 = 0.016

c>    Percentage of people who develop skin cancer = 1.6%
Percentage of people who have skin cancer and do not use sunscreen = 0.7%
Therefore probability = 0.7/1.6 = 0.4375

d>    % of beach goers not developing skin cancer in their lifetime = 98.4%
% of beach goers using sunscreen = 90%
% of beach goers who use sunscreen and do not develop skin cancer = 89.1%
By the addition rule, probability = (98.4 + 90 − 89.1)/100 = 99.3/100 = 0.993
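Below is a minimal Python sketch of the contingency-table approach. It assumes the 7% and 1% figures are conditional rates, that is, the skin cancer rate among non-users and among users of sunscreen. Each cell is then a joint probability, P(A and B) = P(A) × P(B | A).

Part (c) follows from dividing one cell by a column total. Part (d) uses the fact that "no skin cancer or uses sunscreen" covers every cell except "no sunscreen and skin cancer". The variable and function names are hypothetical.

```python
# Figures from Question 3, written as proportions.
P_USE = 0.90                # P(uses sunscreen)
P_CANCER_IF_USE = 0.01      # P(skin cancer | uses sunscreen)
P_CANCER_IF_NO_USE = 0.07   # P(skin cancer | does not use sunscreen)


def contingency_table(p_use, p_c_use, p_c_no_use):
    """Joint probability of each cell, using P(A and B) = P(A) * P(B | A)."""
    p_no_use = 1 - p_use
    return {
        ("use", "cancer"): p_use * p_c_use,
        ("use", "no cancer"): p_use * (1 - p_c_use),
        ("no use", "cancer"): p_no_use * p_c_no_use,
        ("no use", "no cancer"): p_no_use * (1 - p_c_no_use),
    }


table = contingency_table(P_USE, P_CANCER_IF_USE, P_CANCER_IF_NO_USE)
p_a = table[("use", "cancer")]                                # (a) joint probability
p_b = table[("use", "cancer")] + table[("no use", "cancer")]  # (b) column total
p_c = table[("no use", "cancer")] / p_b                       # (c) conditional on cancer
# (d) "no cancer OR uses sunscreen" covers every cell except (no use, cancer).
p_d = 1 - table[("no use", "cancer")]

print(f"(a) {p_a:.3f}  (b) {p_b:.3f}  (c) {p_c:.4f}  (d) {p_d:.3f}")
```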

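4. a> The delivery time follows a uniform distribution between 6:00 am and 8:00 am.

The mean of the uniform distribution is (6 + 8)/2 = 7, i.e. 7:00 am.

And the variance = (8 − 6)²/12 = 4/12 = 0.333

Because the delivery time is uniform rather than normal, the required time is the 95th percentile of the distribution, not a z-based confidence limit.

For a uniform distribution on [a, b], P(X < x) = (x − a)/(b − a). Setting (x − 6)/(8 − 6) = 0.95 gives x = 6 + 0.95 × 2 = 7.9 hours.

Therefore we can be confident that 95 percent of deliveries will arrive before 7.9 hours, i.e. 7:54 am.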
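Both 4(a) and 4(b) ask for a percentile of a continuous distribution, so they can be sketched together. This hedged Python example uses the formula a + q(b − a) for the uniform percentile. For the normal percentile it uses the standard library's `statistics.NormalDist.inv_cdf` (Python 3.8+).

Part (b) is not worked in the answers, so the second computation is offered only as an illustration of the usual method. The replacement time is taken as the 3rd percentile of a normal distribution with mean 8000 hours and standard deviation 300 hours. The helper names are hypothetical.

```python
from statistics import NormalDist


def uniform_percentile(a, b, q):
    """q-th quantile of a Uniform(a, b) distribution: a + q * (b - a)."""
    return a + q * (b - a)


def hours_to_clock(hours):
    """Format decimal hours such as 7.9 as 'H:MM' (7.9 -> '7:54')."""
    minutes = round(hours * 60)
    return f"{minutes // 60}:{minutes % 60:02d}"


# 4(a): delivery time ~ Uniform(6, 8), in hours after midnight.
t95 = uniform_percentile(6.0, 8.0, 0.95)
print(f"4(a): 95% of deliveries arrive before {hours_to_clock(t95)} am")

# 4(b): bulb lifetime ~ Normal(8000, 300); replace at the 3rd percentile.
replace_after = NormalDist(mu=8000, sigma=300).inv_cdf(0.03)
print(f"4(b): replace all bulbs after about {replace_after:.0f} hours")
```

The normal percentile works out to roughly 7436 hours, corresponding to z ≈ −1.88.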

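c> The exponential distribution here describes the time between unplanned shutdowns of the Internet service provider, where the mean time between shutdowns is 20 days (Ahsanullah & Hamedani, 2010).

The pdf of an exponential distribution is f(x) = λe^(−λx) for x ≥ 0, and P(X ≤ x) = 1 − e^(−λx).

Here the mean is 1/λ = 20 days, so λ = 1/20 = 0.05 per day.

Let x = time between two unplanned shutdowns = 13 days.

Because the distribution is continuous, the probability that the time is exactly 13 days is 0. The pdf value f(13) = 0.05e^(−0.65) ≈ 0.0261 is a density, not a probability.

Interpreting the question as the probability that the time between two unplanned shutdowns is at most 13 days:

P(X ≤ 13) = 1 − e^(−0.05 × 13) = 1 − e^(−0.65) ≈ 1 − 0.5220 = 0.4780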
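Below is a minimal Python sketch contrasting the exponential density with the cumulative probability. It assumes λ = 1/mean and uses only the `math` module. The helper names are illustrative.

```python
import math

MEAN_DAYS = 20.0
RATE = 1 / MEAN_DAYS  # lambda = 1 / mean = 0.05 shutdowns per day


def exponential_pdf(x, rate):
    """Density f(x) = rate * exp(-rate * x); a density, not a probability."""
    return rate * math.exp(-rate * x)


def exponential_cdf(x, rate):
    """P(X <= x) = 1 - exp(-rate * x) for x >= 0."""
    return 1 - math.exp(-rate * x)


print(f"f(13)      = {exponential_pdf(13, RATE):.4f}")
print(f"P(X <= 13) = {exponential_cdf(13, RATE):.4f}")
```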

5. Each trial results in either success or failure, so the number of successes follows a binomial distribution. The number of trials, 500, is large, and p = 0.70 lies well inside the interval 0.05 to 0.95: np = 350 and n(1 − p) = 150 are both far above 5. The binomial distribution can therefore be approximated by the normal distribution.

a> Success probability for a trial, i.e. p = 0.70, so q = 1 − p = 0.30

Total number of trials = 500

Therefore n = 500

This binomial distribution is to be approximated by a normal distribution (Frederic P. Miller, 2009).

Therefore Z = (X − np)/√(npq), where X is a binomial variable with n = 500 and p = 0.70.

Mean of X: np = 500 × 0.70 = 350, and variance npq = 500 × 0.70 × 0.30 = 105, so √(npq) = √105 ≈ 10.247

With the continuity correction, P(280 ≤ X ≤ 355) ≈ P(279.5 < X < 355.5), so the required problem is to find P[(279.5 − 350)/√105 < Z < (355.5 − 350)/√105]

= P[−6.88 < Z < 0.537]

= P[Z < 0.537] − P[Z < −6.88] ≈ 0.704 − 0.000 = 0.704

Therefore the required probability is approximately 0.704.
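The approximation can be checked numerically. This sketch assumes the inclusive reading P(280 ≤ X ≤ 355). It applies the continuity correction with `statistics.NormalDist` and, for comparison, sums the exact binomial probabilities with `math.comb` (Python 3.8+). The function names are hypothetical.

```python
import math
from statistics import NormalDist


def normal_approx_with_cc(n, p, lo, hi):
    """Approximate P(lo <= X <= hi), X ~ Binomial(n, p), by a normal
    distribution with mean np, sd sqrt(np(1-p)) and continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    z = NormalDist()
    return z.cdf((hi + 0.5 - mu) / sigma) - z.cdf((lo - 0.5 - mu) / sigma)


def binomial_exact(n, p, lo, hi):
    """Exact P(lo <= X <= hi) by summing binomial probabilities."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))


approx = normal_approx_with_cc(500, 0.70, 280, 355)
exact = binomial_exact(500, 0.70, 280, 355)
print(f"normal approximation: {approx:.4f}, exact binomial: {exact:.4f}")
```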

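b> Here the tolerable exception rate is 0.05. The confidence bound, however, is based on the sample rate of noncompliance, which is then compared with this tolerable rate.

Sample size n = 400, population size N = 4,000, and number of violations = 10

Therefore p = 10/400 = 0.025 and q = 1 − 0.025 = 0.975

For a one-sided 95% confidence interval, Zα = 1.645; 1.96 is the two-sided value.

Because the sample is 10% of the population, the finite population correction √((N − n)/(N − 1)) = √(3600/3999) ≈ 0.9488 is applied.

p*q/n = 0.025 × 0.975/400 = 0.0000609, so √(p*q/n) ≈ 0.00781

Error = Zα × √(p*q/n) × 0.9488 = 1.645 × 0.00781 × 0.9488 ≈ 0.0122

Therefore upper bound = 0.025 + 0.0122 ≈ 0.0372

Since the upper bound of 3.72% is below the tolerable exception rate of 5%, the auditor can be 95% confident that the rate of noncompliance in the population is below the tolerable exception rate.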
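The bound can be reproduced in Python. This is a minimal sketch assuming the normal approximation for a proportion with the finite population correction, which applies here because the 400 cars are 10% of the 4,000. `NormalDist().inv_cdf(0.95)` gives the one-sided z value of about 1.645. The function name is illustrative.

```python
import math
from statistics import NormalDist


def upper_bound_proportion(x, n, N, confidence=0.95):
    """One-sided upper confidence bound for a population proportion, using the
    normal approximation and the finite population correction sqrt((N-n)/(N-1))."""
    p_hat = x / n
    z = NormalDist().inv_cdf(confidence)  # one-sided z, about 1.645 at 95%
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    fpc = math.sqrt((N - n) / (N - 1))
    return p_hat + z * standard_error * fpc


ub = upper_bound_proportion(x=10, n=400, N=4000)
print(f"95% upper bound for the noncompliance rate: {ub:.4f}")
print("below the tolerable rate of 0.05" if ub < 0.05 else "not below 0.05")
```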

c> Here the standard deviation calculated from the preliminary sample is S = 48 litres

The error permitted in the estimate is E = ±10 litres

The confidence level is 95%, so Zα/2 = 1.96

The sample size needed to estimate the mean to within ±E is given by the formula

n = (Zα/2 × S/E)²

n = (1.96 × 48/10)² = (9.408)² = 88.51, which is rounded up to 89 tanks

Since 16 tanks have already been sampled, 89 − 16 = 73 more tanks are needed.
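Below is a short Python sketch of the sample size formula. It assumes z is taken from the standard normal distribution (about 1.96 for 95%) and that the result is always rounded up. `sample_size_for_mean` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(s, e, confidence=0.95):
    """Smallest n with z * s / sqrt(n) <= e, i.e. n = (z * s / e)^2 rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 at 95%
    return math.ceil((z * s / e) ** 2)


n_required = sample_size_for_mean(s=48, e=10)
already_sampled = 16
print(f"required n = {n_required}, additional tanks = {n_required - already_sampled}")
```

Rounding up, rather than to the nearest integer, guarantees the margin of error does not exceed ±10 litres.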

References

Ahsanullah, M., & Hamedani, G. (2010). Exponential distribution. New York: Nova Science Publishers.

Conway, F. (1963). Descriptive statistics. Leicester: Leicester University Press.

Frederic P. Miller, A. (2009). Confidence interval. [S.l.]: Vdm Pub. House.