Data Analytics Case Study: Analyzing Sales Data For An ECommerce Bookstore

About Data

Books play an important part in everyone's life, whether for students or for casual readers. Most goods are now ordered online, and books are no exception: eCommerce websites offer wide availability, and major businesses such as Amazon maintain dedicated book sections. Book sales have risen as a result, because an order can be placed online and the desired book, if available, can be bought from distant locations and delivered by the company within a stipulated time (Praveena & Bharathi, 2017).

The boom in the eCommerce industry and the rising number of orders placed on such websites allow sellers to cater to customers from all over the world. This rapid growth brings a number of challenges for service providers; customer satisfaction and competition are the most important aspects on which today's eCommerce businesses compete.

The hypothetical data were generated using Excel functions. The attributes and their descriptions are as follows:

| Attribute | Description |
|---|---|
| Book Name | Name of the book |
| Book Price | Cost price of the book in AUD ($) |
| Sale Price | Sale price of the book in AUD ($) |
| Profit | Sale price minus cost price of the book in AUD ($) |
| Shipping Type | Free/Paid |
| Customer Type | New/Existing |
| Number of Customers | Number of customers who purchased the particular book |
| Region | Western Australia/South Australia |
| Category | Category of the book |
| Total Monthly Sale | Total sale price of the book in AUD ($) |
| Total Monthly Profit | Total profit from the book sale in AUD ($) |

The following derived variables are the focus of the case study:

Total monthly sale (in AUD $) = Book sale price (in AUD $) × Number of customers

Total monthly profit (in AUD $) = Profit (in AUD $) × Number of customers
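As a minimal sketch of these two formulas, the Python snippet below derives both quantities for a single record. The class name, field names and sample values are hypothetical and are not taken from the case-study data.

```python
from dataclasses import dataclass


@dataclass
class BookRecord:
    """One row of the data set (field names are illustrative assumptions)."""
    book_price: float   # cost price in AUD
    sale_price: float   # sale price in AUD
    customers: int      # number of customers who bought the book

    @property
    def profit(self) -> float:
        # Profit = sale price - cost price
        return self.sale_price - self.book_price

    @property
    def total_monthly_sale(self) -> float:
        # Total monthly sale = sale price * number of customers
        return self.sale_price * self.customers

    @property
    def total_monthly_profit(self) -> float:
        # Total monthly profit = profit * number of customers
        return self.profit * self.customers


# Hypothetical record: cost 30 AUD, sold at 38 AUD to 5 customers.
record = BookRecord(book_price=30.0, sale_price=38.0, customers=5)
print(record.total_monthly_sale)    # 190.0
print(record.total_monthly_profit)  # 40.0
```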

The study focuses on the following:

  1. Profit analysis based on shipping type, region, customer type and book category.
  2. Whether the mean number of customers differs across shipping types, regions, book categories and customer types.
  3. Correlation analysis
  4. Regression analysis

Statistical techniques and tools are central to how the data analysis is conducted. For the profit analysis, we calculate the total monthly sale and the corresponding total monthly profit for each attribute: customer type, region, shipping type and book category. Descriptive statistics are computed for the number of customers by shipping type, region, book category and customer type. The t-test and one-way ANOVA are used to test for differences in the mean number of customers who bought books across these attributes. The regression model was built in Python 3.6, and MS-Excel was used for the rest of the data analysis (Martey, Ahmed & Attoh-Okine, 2017).

In this section, we carried out the following:

  • Descriptive Statistics
  • Two-sample t-test
  • Correlation Analysis
  • Profit Analysis
  • Regression Analysis
  • One-way ANOVA

Table 1 presents the profit analysis: total monthly sale in AUD $, total monthly profit in AUD $ and the respective profit percentages by shipping type, customer type, region and category of the books on sale.

Table 1: Profit analysis by shipping type, customer type, region and category

| Attribute | Level | Total Monthly Sale (in $) | Total Monthly Profit (in $) | Profit Percentage |
|---|---|---|---|---|
| Shipping Type | FREE | 22360 | 5908 | 8.28% |
| | PAID | 48941 | 13284 | 18.63% |
| Customer Type | Existing | 16586 | 3507 | 3.89% |
| | New | 73646 | 15685 | 17.38% |
| Region | SA | 53846 | 11408 | 12.64% |
| | WA | 36386 | 7784 | 8.63% |
| Category | Comics & Graphic Novels | 18404 | 3918 | 4.34% |
| | Literature & Fiction | 38821 | 8271 | 9.17% |
| | Mystery, Thriller & Suspense | 11656 | 2466 | 2.73% |
| | Romance | 21351 | 4537 | 5.03% |

From Table 1 we can conclude that the eCommerce bookstore earns an average profit of about 9% on each book. Profits differ considerably across the levels of each attribute: the profit share of PAID shipping and of NEW customers is higher than that of the other levels considered. Among categories, profits from Literature & Fiction and Romance are higher than those from Comics & Graphic Novels and Mystery, Thriller & Suspense.
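The text does not define the profit percentage reported in Table 1. The figures appear to be each level's profit divided by the combined sales of all levels of the same attribute; this is an inference from the numbers, not something the source states. The sketch below applies that assumed definition to the Shipping Type and Customer Type rows, and the results agree with the table to within rounding. The function name is hypothetical.

```python
def profit_share(levels):
    """Return {level: profit as a percentage of the attribute's total sales}.

    `levels` maps a level name to a (total_sale, total_profit) pair.
    The definition is an assumption inferred from the figures in Table 1.
    """
    total_sale = sum(sale for sale, _ in levels.values())
    return {
        name: 100.0 * profit / total_sale
        for name, (_, profit) in levels.items()
    }


# Figures copied from Table 1 (AUD).
shipping = {"FREE": (22360, 5908), "PAID": (48941, 13284)}
customer = {"Existing": (16586, 3507), "New": (73646, 15685)}

for attribute in (shipping, customer):
    for level, pct in profit_share(attribute).items():
        print(f"{level}: {pct:.2f}%")
```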

Project Problem

Total monthly sale and profit are proportional to the number of customers. We therefore present summary statistics for the number of customers who bought books, by shipping type, customer type, region and category. Table 2 shows the size, mean and standard deviation of the number of customers for each level (Lee, Lee & Dass, 2018).

Table 2: Summary statistics for the number of customers who bought books, by shipping type, customer type, region and category

| Attribute | Level | Size | Mean | Standard Deviation |
|---|---|---|---|---|
| Shipping Type | FREE | 401 | 5.115 | 2.111 |
| | PAID | 882 | 4.337 | 2.224 |
| Customer Type | Existing | 237 | 4.668 | 2.334 |
| | New | 1043 | 4.889 | 2.448 |
| Region | SA | 766 | 4.114 | 2.221 |
| | WA | 517 | 5.339 | 2.332 |
| Category | Comics & Graphic Novels | 258 | 4.992 | 2.448 |
| | Literature & Fiction | 555 | 4.116 | 2.219 |
| | Mystery, Thriller & Suspense | 166 | 4.335 | 2.787 |
| | Romance | 304 | 5.59 | 2.323 |

The following observations can be made from Table 2, whose values were calculated in MS-Excel:

  1. On average, 4.75 customers buy a given book on the eCommerce platform.
  2. The mean number of customers is higher for free shipping than for paid shipping.
  3. The mean number of new customers (4.889) is higher than that of existing customers (4.668).
  4. The mean number of customers in WA is larger than in SA.
  5. The mean number of customers is higher for Romance books than for any other category considered.
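Table 2 was produced in MS-Excel; the same size, mean and standard deviation per level could be computed in Python as sketched below. The rows and the `summarise` helper are hypothetical, and the sample (n − 1) standard deviation is assumed because the source does not say which form was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarise(rows, key):
    """Group customer counts by `key` and return size, mean and sample SD."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["customers"])
    return {
        level: {"size": len(values), "mean": mean(values), "sd": stdev(values)}
        for level, values in groups.items()
    }


# Hypothetical rows, not the case-study data.
rows = [
    {"shipping": "FREE", "customers": 6},
    {"shipping": "FREE", "customers": 4},
    {"shipping": "FREE", "customers": 5},
    {"shipping": "PAID", "customers": 3},
    {"shipping": "PAID", "customers": 5},
]

for level, stats in summarise(rows, "shipping").items():
    print(level, stats)
```

Passing `"region"` or `"category"` as the key would give the other blocks of the table, provided each row carries that field.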

We now analyse the differences in the mean number of customers who bought books by shipping type, region and customer type. The null hypothesis is that there is no significant difference in the mean number of customers between the levels of an attribute; the alternative hypothesis is that there is a significant difference. The test used is the two-sample t-test assuming unequal variances.

Table 3: Two-sample independent t-test for shipping type, customer type and region

| Attribute | Levels | Test Statistic | p-value |
|---|---|---|---|
| Shipping Type | Free and Paid | 11.92 | 0.000 |
| Customer Type | New and Existing | 0.29 | 0.832 |
| Region | WA and SA | -9.36 | 0.000 |

From Table 3 we can conclude the following:

  1. There is a significant difference in the mean number of customers who bought books with paid and with free shipping.
  2. There is no significant difference in the mean number of customers between existing and new customers.
  3. There is a significant difference in the mean number of customers between the SA and WA regions.
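A two-sample t-test with unequal variances (Welch's test) can be sketched from summary statistics alone. The example below feeds in the Region rows of Table 2, with SA as the first sample. Because those inputs are rounded, the result should only be expected to land near the Region row of Table 3, not to reproduce it. The p-value uses a normal approximation, which is an assumption that is reasonable only because the degrees of freedom are large.

```python
from math import sqrt
from statistics import NormalDist


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and degrees of freedom from summary statistics."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Region rows of Table 2: SA first, then WA.
t_stat, df = welch_t(4.114, 2.221, 766, 5.339, 2.332, 517)

# With df this large the t distribution is close to the standard normal,
# so the two-sided p-value is approximated with the normal CDF.
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"t = {t_stat:.2f}, df = {df:.0f}, p ~ {p_approx:.3f}")
```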

Using one-way ANOVA, we test whether the mean number of customers who bought books differs across categories. The null and alternative hypotheses are as follows:

Null Hypothesis: there is no significant difference in the mean number of customers who made purchases across categories.

Alternative Hypothesis: there is a significant difference in the mean number of customers who made purchases across categories.

Table 4: Output of one-way ANOVA for Category

| Attribute | Levels | F Statistic | p-value |
|---|---|---|---|
| Category | Comics & Graphic Novels; Mystery, Thriller & Suspense; Romance; Literature & Fiction | 14.12 | 0.000 |

The one-way ANOVA across categories rejects the null hypothesis (p < 0.001): there is a significant difference in the mean number of customers who placed orders across categories (Giguelay & Huet, 2018).

In this section we focus on the correlations among the different variables in the book sales data; Table 5 summarises the results (Li, Tian, Tang & Cao, 2018).
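For reference, the F statistic of a one-way ANOVA is the ratio of the between-group mean square to the within-group mean square. The sketch below computes it from raw samples; the three small groups are hypothetical and stand in for customer counts per category, so the output is unrelated to Table 4.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic and (df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

    # Within-group sum of squares: observations around their own group mean.
    ss_within = 0.0
    for g in groups:
        group_mean = mean(g)
        ss_within += sum((x - group_mean) ** 2 for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, (df_between, df_within)


# Hypothetical customer counts for three categories.
samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, dfs = one_way_anova_f(samples)
print(f"F = {f_stat:.2f}, df = {dfs}")
```

A large F relative to the F distribution with these degrees of freedom leads to rejecting the null hypothesis of equal group means.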

Table 5: Pearson’s correlation coefficient

| | Product Price | Sale Price | Profit | Number of Customers |
|---|---|---|---|---|
| Product Price | 1 | 0.985 | 0.023 | 0.109 |
| Sale Price | 0.985 | 1 | 0.166 | 0.108 |
| Profit | 0.023 | 0.166 | 1 | 0.006 |
| Number of Customers | 0.109 | 0.108 | 0.006 | 1 |

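The following observations can be made from Table 5:

  1. Product price is positively but weakly correlated with the number of customers (r = 0.109).
  2. Sale price is positively but weakly correlated with the number of customers (r = 0.108).
  3. Profit is positively correlated with the number of customers, but the correlation is negligible (r = 0.006).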
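Pearson's correlation coefficient, the measure reported in Table 5, is the covariance of two variables divided by the product of their standard deviations. A minimal sketch follows; the price and customer figures are hypothetical and are not taken from the case-study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical sale prices (AUD) and customer counts for five books.
sale_price = [12.0, 18.5, 25.0, 31.0, 40.0]
customers = [3, 6, 4, 7, 5]
print(f"r = {pearson_r(sale_price, customers):.3f}")
```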

We used simple linear regression to predict the total monthly sale (the target variable) from the number of customers who bought the books.

Table 6: Output of Regression Analysis

| Statistic | Value |
|---|---|
| F Statistic | 5978.55 |
| p-value | 0.000 |
| R² | 0.804 |
| Intercept | -12.283 |
| Slope | 38.782 |

Table 6 shows a p-value below 0.001 (reported as 0.000), which indicates that the relationship between total monthly sale and the number of customers is highly significant (Chen, Kang, Xing, Zhao & Milton, 2018). With an R² of about 0.80, the model fits the data quite well. The fitted model is:

Total sale (in $) = -12.283 + 38.782 × Number of customers
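The sketch below shows how a simple linear regression of this form can be fitted by ordinary least squares and how the reported coefficients can be used for prediction. The function names and the toy data are hypothetical; only the intercept and slope defaults come from Table 6.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_total_sale(customers, intercept=-12.283, slope=38.782):
    """Predicted total sale in AUD, using the coefficients from Table 6."""
    return intercept + slope * customers


# Toy data lying exactly on y = 1 + 2x, to check the fitting routine.
toy_intercept, toy_slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(toy_intercept, toy_slope)

# Prediction for a book bought by 5 customers.
print(round(predict_total_sale(5), 3))  # 181.627
```

The negative intercept means the model predicts a negative sale for zero customers, so it should only be applied within the range of customer counts seen in the data.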

The analysis suggests the following:

  1. The mean number of customers is higher with free shipping, which suggests that making shipping free would attract more orders.
  2. The mean number of customers in WA is higher than in SA; the company should therefore promote more in the SA region to increase sales.
  3. The mean number of customers is higher for Romance books than for any other category, so more Romance titles should be stocked to increase sales.

The following actions are recommended:

  1. Increase the strength of the shipping staff to cater for more orders.
  2. Adopt new marketing strategies, such as digital marketing, to attract more orders.
  3. Target other regions of Australia for book sales.
  4. Keep more Romance titles on the portal.

Conclusions

On average, 4.75 customers buy a given book on the eCommerce platform. The mean number of customers is higher for free shipping than for paid shipping. The mean number of new customers is higher than that of existing customers. The mean number of customers in WA is larger than in SA. The mean number of customers is higher for Romance books than for any other category considered.

There is a significant difference in the mean number of customers who bought books with paid and with free shipping. There is no significant difference between existing and new customers. There is a significant difference between customers from the SA and WA regions.

The one-way ANOVA across categories rejects the null hypothesis: there is a significant difference in the mean number of customers who placed orders across categories. The correlation analysis shows that product price, sale price and profit are all positively correlated with the number of customers, although the correlations are weak and the one with profit is negligible.

References

Chen, S., Kang, J., Xing, Y., Zhao, Y., & Milton, D. (2018). Estimating large covariance matrix with network topology for high-dimensional biomedical data. Computational Statistics & Data Analysis, 127, 82-95. doi: 10.1016/j.csda.2018.05.008

Giguelay, J., & Huet, S. (2018). Testing k-monotonicity of a discrete distribution. Application to the estimation of the number of classes in a population. Computational Statistics & Data Analysis, 127, 96-115. doi: 10.1016/j.csda.2018.02.006

Lee, K., Lee, J., & Dass, S. (2018). Inference for differential equation models using relaxation via dynamical systems. Computational Statistics & Data Analysis, 127, 116-134. doi: 10.1016/j.csda.2018.05.014

Li, H., Tian, G., Tang, N., & Cao, H. (2018). Assessing non-inferiority for incomplete paired-data under non-ignorable missing mechanism. Computational Statistics & Data Analysis, 127, 69-81. doi: 10.1016/j.csda.2018.05.009

Martey, E., Ahmed, L., & Attoh-Okine, N. (2017). Track geometry big data analysis: A machine learning approach. 2017 IEEE International Conference On Big Data (Big Data). doi: 10.1109/bigdata.2017.8258381

Praveena, M., & Bharathi, B. (2017). A survey paper on big data analytics. 2017 International Conference On Information Communication And Embedded Systems (ICICES). doi: 10.1109/icices.2017.8070723

Singh, K., & Wajgi, R. (2016). Data analysis and visualization of sales data. 2016 World Conference On Futuristic Trends In Research And Innovation For Social Welfare (Startup Conclave). doi: 10.1109/startup.2016.7583967