Data Analytics Case Study: Analyzing Sales Data For An ECommerce Bookstore
- December 28, 2023/ Uncategorized
About Data
Books play an important part in everyone's life, whether the reader is a student or a casual reader. Most goods are now ordered online, and books are no exception: ecommerce websites offer good availability and easy purchasing. Every major ecommerce business, such as Amazon, has a dedicated books section, and book sales have been rising because a customer can order online, buy a desired title from a distant location if it is available, and have it delivered by the company within a stipulated time (Praveena & Bharathi, 2017).
The boom in the eCommerce industry and the rising number of orders placed on such websites allow sellers to cater to customers from all over the world. Because this growth has been exponential, it brings a number of challenges for service providers. The most important are customer satisfaction and competition, the main grounds on which today's eCommerce businesses compete with each other.
The hypothetical data set was generated using Excel functions. Its attributes are described below:
| Attribute | Description |
| --- | --- |
| Book Name | Name of the book |
| Book Price | Cost price of the book in AUD ($) |
| Sale Price | Sale price of the book in AUD ($) |
| Profit | Sale price minus cost price of the book in AUD ($) |
| Shipping Type | Free/Paid |
| Customer Type | New/Existing |
| Number of Customers | Number of customers who purchased the particular book |
| Region | Western Australia/South Australia |
| Category | Category of the book |
| Total Monthly Sale | Total sale price of the book in AUD ($) |
| Total Monthly Profit | Total profit from the book sale in AUD ($) |
The following derived variables are the focus of the case study:
Total monthly sale (in AUD $) = Sale Price (in AUD $) × Number of customers
Total monthly profit (in AUD $) = Profit (in AUD $) × Number of customers
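The two formulas above can be sketched in a few lines of Python. This is only an illustration: the rows and the field names (`book_price`, `sale_price`, `customers`) are hypothetical stand-ins for the attributes described earlier, not the actual data set.

```python
# Hypothetical rows; field names mirror the attribute list, values are made up.
books = [
    {"book_name": "Book A", "book_price": 20.0, "sale_price": 25.0, "customers": 4},
    {"book_name": "Book B", "book_price": 12.0, "sale_price": 15.0, "customers": 6},
]


def add_monthly_totals(row):
    """Return a copy of the row with profit and the two monthly totals added."""
    profit = row["sale_price"] - row["book_price"]  # Profit = Sale Price - Cost Price
    return {
        **row,
        "profit": profit,
        "total_monthly_sale": row["sale_price"] * row["customers"],
        "total_monthly_profit": profit * row["customers"],
    }


enriched = [add_monthly_totals(row) for row in books]
for row in enriched:
    print(row["book_name"], row["total_monthly_sale"], row["total_monthly_profit"])
```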
The study focuses on the following:
- Profit analysis by shipping type, region, customer type and book category.
- Whether the mean number of customers differs across shipping types, regions, book categories and customer types.
- Correlation Analysis
- Regression Analysis
Statistical techniques and tools form the basis of the data analysis. For the profit analysis, we calculate the total monthly sale and the corresponding total monthly profit for each attribute: customer type, region, shipping type and category of books on sale. Descriptive statistics for the number of customers are reported by shipping type, region, book category and customer type. A t-test and a one-way ANOVA are used to test for differences in the mean number of customers who bought the books across these criteria. The regression model was built using Python 3.6, and MS-Excel was used for the rest of the data analysis (Martey, Ahmed & Attoh-Okine, 2017).
In this section, we carried out the following:
- Descriptive statistics
- Two sample t test
- Correlation Analysis
- Profit Analysis
- Regression Analysis
- One-way ANOVA
The table below presents the profit analysis: total monthly sales in AUD $, total monthly profit in AUD $ and the respective profit percentages by shipping type, customer type, region and category of the books on sale.
Table 1: Profit analysis by shipping type, customer type, region and category
| Attribute | Level | Total Monthly Sale (in $) | Total Monthly Profit (in $) | Profit Percentage |
| --- | --- | --- | --- | --- |
| Shipping Type | FREE | 22360 | 5908 | 8.28% |
| Shipping Type | PAID | 48941 | 13284 | 18.63% |
| Customer Type | Existing | 16586 | 3507 | 3.89% |
| Customer Type | New | 73646 | 15685 | 17.38% |
| Region | SA | 53846 | 11408 | 12.64% |
| Region | WA | 36386 | 7784 | 8.63% |
| Category | Comics & Graphic Novels | 18404 | 3918 | 4.34% |
| Category | Literature & Fiction | 38821 | 8271 | 9.17% |
| Category | Mystery, Thriller & Suspense | 11656 | 2466 | 2.73% |
| Category | Romance | 21351 | 4537 | 5.03% |
From Table 1 we can conclude that the eCommerce bookstore earns an average profit of about 9% on each book. Profits differ considerably across the levels of each attribute: the profit share of PAID shipping and of NEW customers is higher than that of the other levels of those attributes. Profits from Literature & Fiction and Romance are higher than those from Comics & Graphic Novels and Mystery, Thriller & Suspense.
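As a rough check on the Profit Percentage column in Table 1, the sketch below assumes that each percentage is the level's total monthly profit divided by the total monthly sale summed over all levels of the same attribute. The article does not state this definition, so it is an assumption; under it, the Region and Category figures copied from Table 1 come out the same as the published column to two decimals.

```python
# (total monthly sale, total monthly profit) in AUD, copied from Table 1.
table1 = {
    "Region": {"SA": (53846, 11408), "WA": (36386, 7784)},
    "Category": {
        "Comics & Graphic Novels": (18404, 3918),
        "Literature & Fiction": (38821, 8271),
        "Mystery, Thriller & Suspense": (11656, 2466),
        "Romance": (21351, 4537),
    },
}


def profit_shares(levels):
    """Profit of each level as a percentage of the attribute's total sales.

    Assumed definition: 100 * level profit / sum of sales over all levels.
    """
    total_sales = sum(sale for sale, _ in levels.values())
    return {
        name: round(100 * profit / total_sales, 2)
        for name, (_, profit) in levels.items()
    }


region_shares = profit_shares(table1["Region"])
category_shares = profit_shares(table1["Category"])
print(region_shares)
print(category_shares)
```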
Project Problem
Total monthly sale and profit are proportional to the number of customers, so here we present summary statistics for the number of customers who bought the books, by shipping type, customer type, region and category. Table 2 displays the size, mean and standard deviation of the number of customers who bought books (Lee, Lee & Dass, 2018).
Table 2: Summary statistics for the number of customers who bought the books, by shipping type, customer type, region and category
| Attribute | Level | Size | Mean | Standard Deviation |
| --- | --- | --- | --- | --- |
| Shipping Type | FREE | 401 | 5.115 | 2.111 |
| Shipping Type | PAID | 882 | 4.337 | 2.224 |
| Customer Type | Existing | 237 | 4.668 | 2.334 |
| Customer Type | New | 1043 | 4.889 | 2.448 |
| Region | SA | 766 | 4.114 | 2.221 |
| Region | WA | 517 | 5.339 | 2.332 |
| Category | Comics & Graphic Novels | 258 | 4.992 | 2.448 |
| Category | Literature & Fiction | 555 | 4.116 | 2.219 |
| Category | Mystery, Thriller & Suspense | 166 | 4.335 | 2.787 |
| Category | Romance | 304 | 5.59 | 2.323 |
The following observations can be made from the table, whose calculations were done in MS-Excel:
- On average, about 4.75 customers buy each book on the eCommerce platform.
- The mean number of customers is higher for free shipping (5.115) than for paid shipping (4.337).
- The mean number of new customers (4.889) is slightly higher than that of existing customers (4.668).
- The mean number of customers is higher in WA (5.339) than in SA (4.114).
- Romance has the highest mean number of customers (5.59) of all the categories considered.
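The per-level size, mean and standard deviation reported in Table 2 were computed in MS-Excel. A minimal Python equivalent, using only the standard library, might look like the sketch below. The records are hypothetical and are not taken from the case-study data; `summarise` is an illustrative helper name.

```python
from statistics import mean, stdev

# Hypothetical (level, number of customers) records, for illustration only.
records = [
    ("FREE", 6), ("FREE", 4), ("FREE", 5),
    ("PAID", 3), ("PAID", 5), ("PAID", 4), ("PAID", 4),
]


def summarise(rows):
    """Group rows by level and return size, mean and sample standard deviation."""
    groups = {}
    for level, customers in rows:
        groups.setdefault(level, []).append(customers)
    return {
        level: {"size": len(values), "mean": mean(values), "sd": stdev(values)}
        for level, values in groups.items()
    }


summary = summarise(records)
for level, stats in summary.items():
    print(level, stats)
```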
We now analyse the differences in the mean number of customers who bought the books by shipping type, region and customer type. The null hypothesis is that there is no significant difference in the mean number of customers between the levels of each attribute; the alternative hypothesis is that there is a significant difference. The test used is a two-sample t-test assuming unequal variances.
Table 3: Two sample independent test for shipping type, customer type and region
| Attribute | Levels | Test Statistic | p-value |
| --- | --- | --- | --- |
| Shipping Type | Free and Paid | 11.92 | 0.000 |
| Customer Type | New and Existing | 0.29 | 0.832 |
| Region | WA and SA | -9.36 | 0.000 |
From Table 3 we can conclude the following:
- There is a significant difference in the mean number of customers between paid and free shipping.
- There is no significant difference in the mean number of customers between existing and new customers.
- There is a significant difference in the mean number of customers between the SA and WA regions.
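For readers who want to see how a two-sample test with unequal variances works, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom from two samples. The samples `free` and `paid` are hypothetical, so the output does not reproduce Table 3.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom.

    Does not assume equal variances in the two groups.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical numbers of customers per book, for illustration only.
free = [6, 5, 7, 4, 6, 5]
paid = [4, 3, 5, 4, 2, 4, 5]

t_stat, dof = welch_t(free, paid)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

The p-value is then read from the t distribution with `dof` degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(free, paid, equal_var=False)` returns the statistic and the p-value in one call.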
Using one-way ANOVA, we test whether the mean number of customers differs across book categories. The null and alternative hypotheses are as follows:
Null Hypothesis: there is no significant difference in the mean number of customers who made purchases across the different categories
Alternative Hypothesis: there is a significant difference in the mean number of customers who made purchases across the different categories
Table 4: Output of one-way ANOVA for Category
| Attribute | Levels | F Statistic | P Value |
| --- | --- | --- | --- |
| Category | Comics & Graphic Novels, Mystery, Thriller & Suspense, Romance and Literature & Fiction | 14.12 | 0.000 |
From the one-way ANOVA for category, we reject the null hypothesis in favour of the alternative: there is a significant difference in the mean number of customers who placed orders across the different categories (Giguelay & Huet, 2018).
This section focuses on the correlation among the numeric attributes of the book sales data; Table 5 reports the results (Li, Tian, Tang & Cao, 2018).
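The F statistic of a one-way ANOVA is the ratio of the between-group mean square to the within-group mean square. The sketch below shows that calculation on three small hypothetical groups; it is not the category data behind Table 4.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of samples."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]
    k, n = len(groups), len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for group, m in zip(groups, group_means) for x in group
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical numbers of customers for three book categories.
categories = [[4, 5, 6], [2, 3, 4], [6, 7, 8]]
f_stat = one_way_anova_f(categories)
print(f"F = {f_stat:.2f}")
```

With SciPy installed, `scipy.stats.f_oneway(*categories)` gives the same statistic together with its p-value.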
Table 5: Pearson’s correlation coefficient
| | Product Price | Sale Price | Profit | Number of customers |
| --- | --- | --- | --- | --- |
| Product Price | 1 | 0.985 | 0.023 | 0.109 |
| Sale Price | 0.985 | 1 | 0.166 | 0.108 |
| Profit | 0.023 | 0.166 | 1 | 0.006 |
| Number of customers | 0.109 | 0.108 | 0.006 | 1 |
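The following observations can be made from Table 5:
- Product price is weakly positively correlated with the number of customers (r = 0.109).
- Sale price is weakly positively correlated with the number of customers (r = 0.108).
- Profit is positively correlated with the number of customers, but the correlation is negligible (r = 0.006).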
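Each entry in Table 5 is a Pearson correlation coefficient between two columns. A minimal sketch of that calculation follows; the `product_price`, `sale_price` and `customers` values are hypothetical, so the printed coefficients are illustrative only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical columns, for illustration only.
product_price = [10, 12, 15, 20]
sale_price = [13, 15, 19, 25]
customers = [5, 3, 6, 4]

print(round(pearson_r(product_price, sale_price), 3))
print(round(pearson_r(product_price, customers), 3))
```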
We used simple linear regression to predict total monthly sales (the target variable) from the number of customers who bought the books.
Table 6: Output of Regression Analysis
| Statistic | Value |
| --- | --- |
| F Statistic | 5978.55 |
| P Value | 0.000 |
| R² | 0.804 |
| Intercept | -12.283 |
| Slope | 38.782 |
As Table 6 shows, the p-value of 0.000 indicates that the relationship between monthly sales and the number of customers is highly significant (Chen, Kang, Xing, Zhao & Milton, 2018). The R² of 0.804 suggests that the model fits the data quite well. The fitted model is:
Total sale (in $) = -12.283 + 38.782 × Number of customers
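The fitted line can be applied directly to estimate total sales for a given number of customers. The sketch below uses the intercept and slope reported in Table 6, and adds a small ordinary least squares helper to show how such coefficients are obtained. `fit_line` and its toy input are illustrative assumptions, not the code used in the case study.

```python
INTERCEPT = -12.283  # from Table 6
SLOPE = 38.782       # from Table 6


def predict_total_sale(customers):
    """Predicted total sale in AUD for a given number of customers."""
    return INTERCEPT + SLOPE * customers


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, slope


print(round(predict_total_sale(5), 2))  # about 181.63 AUD

# Toy data lying exactly on y = 1 + 2x, to show the helper recovers the line.
a, b = fit_line([1, 2, 3], [3, 5, 7])
print(a, b)
```

Because the intercept is negative, the line gives a negative prediction for zero customers, so it is best applied within the range of customer counts seen in the data.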
Based on the analysis, the following recommendations can be made:
- The higher mean number of customers for free shipping suggests that shipping should be made free to attract more orders.
- The mean number of customers is higher in WA than in SA, so the company should promote more in the SA region to increase sales.
- Romance has the highest mean number of customers of any category, so more Romance titles should be kept on the portal to increase sales.
- The shipping staff should be strengthened to cater for more orders.
- New marketing strategies, such as digital marketing, should be adopted to bring in more orders.
- Other regions of Australia should be targeted for selling the books.
Conclusions
On average, about 4.75 customers buy each book on the eCommerce platform. The mean number of customers is higher for free shipping than for paid shipping. The mean number of new customers is slightly higher than that of existing customers. The mean number of customers is higher in WA than in SA. Romance has the highest mean number of customers of all the categories considered.
There is a significant difference in the mean number of customers between paid and free shipping. There is no significant difference between existing and new customers. There is a significant difference between the SA and WA regions.
From the one-way ANOVA for category, we reject the null hypothesis: there is a significant difference in the mean number of customers who placed orders across the different categories. The correlation analysis shows that product price and sale price are weakly positively correlated with the number of customers, while the positive correlation between profit and the number of customers is negligible.
References
Chen, S., Kang, J., Xing, Y., Zhao, Y., & Milton, D. (2018). Estimating large covariance matrix with network topology for high-dimensional biomedical data. Computational Statistics & Data Analysis, 127, 82-95. doi: 10.1016/j.csda.2018.05.008
Giguelay, J., & Huet, S. (2018). Testing k-monotonicity of a discrete distribution. Application to the estimation of the number of classes in a population. Computational Statistics & Data Analysis, 127, 96-115. doi: 10.1016/j.csda.2018.02.006
Lee, K., Lee, J., & Dass, S. (2018). Inference for differential equation models using relaxation via dynamical systems. Computational Statistics & Data Analysis, 127, 116-134. doi: 10.1016/j.csda.2018.05.014
Li, H., Tian, G., Tang, N., & Cao, H. (2018). Assessing non-inferiority for incomplete paired-data under non-ignorable missing mechanism. Computational Statistics & Data Analysis, 127, 69-81. doi: 10.1016/j.csda.2018.05.009
Martey, E., Ahmed, L., & Attoh-Okine, N. (2017). Track geometry big data analysis: A machine learning approach. 2017 IEEE International Conference On Big Data (Big Data). doi: 10.1109/bigdata.2017.8258381
Praveena, M., & Bharathi, B. (2017). A survey paper on big data analytics. 2017 International Conference On Information Communication And Embedded Systems (ICICES). doi: 10.1109/icices.2017.8070723
Singh, K., & Wajgi, R. (2016). Data analysis and visualization of sales data. 2016 World Conference On Futuristic Trends In Research And Innovation For Social Welfare (Startup Conclave). doi: 10.1109/startup.2016.7583967