Data Mining Techniques For Large Data Sets

December 21, 2023

Data Mining Techniques for Managing Large Data Sets

Every field of industry depends on large amounts of data, which must be converted into useful information before it can generate profit for the business. This volume of data and information is also needed to give a comprehensive view of operations (Wu et al. 2014). It therefore has to be managed properly so that accurate records are kept.

This report discusses the need for a data mining process to manage the large amount of data and information collected in the data set. It also discusses the classification techniques of data mining.

This report outlines the use of Bayes' theorem for analyzing the monthly sales in the data set and justifies that choice. The analysis targets the geographic areas covered by the data set, and on this basis recommendations are provided for improving the business in the market.

The research followed the classification technique of data mining, which helps to sort the data set into groups based on quality, price and category. This function of data mining also helps to maintain and improve search over the data set (Roiger 2017). The research used Bayes' theorem to carry out the classification. Its probabilistic approach provided a prior classification of the products and data. As noted by Horner and Richard (2016), Bayes' theorem is a statement about conditional probability:

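$$P(A_i \mid A) = \frac{P(A_i)\,P(A \mid A_i)}{\sum_{j=1}^{N} P(A_j)\,P(A \mid A_j)},$$

where $P(A_i)$ is the probability of an event $A_i$, $P(A \mid A_i)$ is the conditional probability of $A$ given $A_i$, $P(A_i \mid A)$ is the conditional probability of $A_i$ given that $A$ has already occurred, the events $A_1, \dots, A_N$ are mutually disjoint, and their union covers $A$, that is, $A \subseteq \bigcup_{i=1}^{N} A_i$.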
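The formula can be illustrated with a minimal Python sketch. The function name `bayes_posterior` and the three priors and likelihoods are hypothetical values chosen for illustration; they are not taken from the data set.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(A_i | A) for every i, given P(A_i) and P(A | A_i)."""
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1")
    # Numerator of Bayes' theorem for each event: P(A_i) * P(A | A_i)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator: sum over j of P(A_j) * P(A | A_j), i.e. P(A)
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("P(A) is zero, so the posterior is undefined")
    return [j / evidence for j in joint]


# Hypothetical numbers (assumption): three disjoint events A_1, A_2, A_3
priors = [0.5, 0.3, 0.2]          # P(A_i)
likelihoods = [0.10, 0.20, 0.40]  # P(A | A_i)

posterior = bayes_posterior(priors, likelihoods)
print(posterior)
```

The posterior values always sum to 1, because the denominator is the total probability of $A$ over all $N$ events.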

As mentioned by Zheng (2015), Bayes' theorem gives the probability of an event based on prior knowledge of conditions related to that event. Following Aggarwal (2015), a decision tree was not considered applicable to this data set analysis. A decision tree consists of a root node, internal nodes, branches and leaf nodes, and these components are not required in the analysis (Lu, Setiono and Liu 2017). The decision tree algorithm is better suited to projects that have several interrelated steps.

Bayes Theorem for Analyzing Monthly Sales

The research methodology uses Bayes' theorem to analyze the data set. The process followed in the analysis is described below, together with the techniques applied using Bayes' theorem (Shmueli et al. 2017). The graphs and charts obtained from the analysis of the data set are also discussed.

The analysis of the data set shows 412 customers from the NSW region, 432 from QLD, 436 from SA, 402 from TAS, 413 from VIC and 405 from WA. In total there are 838 existing customers, 853 loyal customers and 809 new customers in the data set.

The summary statistics for the data set are as follows. The mean price is $67 (maximum $119, minimum $15). For the review rating, the mean is 2.995200 (maximum 4, minimum 2). The mean number of customers who bought is approximately 39. The sales of the company have been increasing in recent years, and the mean value for monthly sales is $500.891200. It can therefore be inferred that the company is progressing at a good pace (Olson and Wu 2017). However, the company needs some changes in how it markets its products; a marketing strategy and plan might help it maintain a good position in the market (Linchangco, Jay and Brouwer 2017).
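Summary statistics of this kind can be computed with a few lines of Python. The following is a minimal sketch using only the standard library; the `prices` list is a hypothetical sample, not the actual price column of the data set.

```python
from statistics import mean


def summarize(values):
    """Return the mean, minimum and maximum of a numeric column."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


# Hypothetical sample of product prices in dollars (assumption)
prices = [15, 42, 67, 88, 119]

summary = summarize(prices)
print(summary)
```

The same helper could be applied to the review ratings or to the monthly sales column.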

From the above graph, it is evident that the TAS region has the maximum number of sales in the data set, at 440. Further analysis of the data set using a group-by clause gave 420 sales in QLD and 418 sales in SA. In addition, there are 399 sales from VIC and 391 sales from the WA region.
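A group-by count of this kind can be sketched in plain Python as follows. The helper `count_by`, the field names and the four sample rows are assumptions made for illustration; the report does not show the code or the raw rows that were actually used.

```python
from collections import Counter


def count_by(records, field):
    """Count how many records share each value of `field` (a group-by count)."""
    return Counter(record[field] for record in records)


# Hypothetical sample rows (assumption); the real data set has 2500 rows
sales = [
    {"Region": "TAS", "Product": "Pens"},
    {"Region": "QLD", "Product": "Pasta"},
    {"Region": "TAS", "Product": "JM SHIRTS"},
    {"Region": "SA", "Product": "Pens"},
]

sales_per_region = count_by(sales, "Region")
# most_common(1) returns the single largest group as a (value, count) pair
top_region, top_count = sales_per_region.most_common(1)[0]
print(top_region, top_count)
```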

The above graph depicts the ratings of the products according to the reviews provided by customers. It is observed that the largest number of products are rated two stars and the smallest number are rated three stars. Following the same process as above, we found 803 products with rating “2”, 908 products with rating “4” and 789 products with rating “3”.

In total, 526 cheddar cheese packets, 495 JM shirts, 484 pasta packets, 496 pens and 499 Reebok limited edition shoes were sold.

Recommendations for Business Optimization based on Regional Analysis

Scatter plot for sale of products in regions

Box plot for the Price of the Products

For the products and prices, the above box plot shows that the mean price lies between $60 and $80.

Cross table 1

CustomerType	Review 2	Review 3	Review 4	All
EXISTING	274	274	282	830
LOYAL	286	269	280	835
NEW	295	259	281	835
All	855	802	843	2500
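The counts in Cross table 1 can be turned into conditional probabilities, which are the quantities that Bayes' theorem works with. The sketch below is an illustration rather than the code used in the report: the counts are copied from the table, and the function name `row_conditionals` is hypothetical.

```python
# Counts copied from Cross table 1 (customer type by review rating)
cross_table = {
    "EXISTING": {2: 274, 3: 274, 4: 282},
    "LOYAL": {2: 286, 3: 269, 4: 280},
    "NEW": {2: 295, 3: 259, 4: 281},
}


def row_conditionals(table):
    """Return P(rating | customer type) for every row of the table."""
    result = {}
    for customer_type, counts in table.items():
        row_total = sum(counts.values())
        result[customer_type] = {
            rating: count / row_total for rating, count in counts.items()
        }
    return result


conditionals = row_conditionals(cross_table)
for customer_type, probs in conditionals.items():
    print(customer_type, {rating: round(p, 3) for rating, p in probs.items()})
```

Each row of the result sums to 1, since it is a probability distribution over the three ratings for one customer type.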

Existing customers rated 248 products with rating 2, 268 products with rating 3 and 322 products with rating 4. The ratings given by customers in the different regions are shown below:

Region	Review 2	Review 3	Review 4	All
NSW	154	134	144	432
QLD	151	127	142	420
SA	134	131	153	418
TAS	164	143	133	440
VIC	128	131	140	399
WA	124	136	131	391
All	855	802	843	2500

Cross table 2
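As an illustration of how Bayes' theorem applies to the region-by-rating counts above, the following sketch computes the probability that a sale came from each region given its review rating. The counts are copied from the table; the function name and the choice of rating 4 are assumptions for the example, and the report does not state that this exact calculation was performed.

```python
# Counts copied from the region-by-rating cross table
region_table = {
    "NSW": {2: 154, 3: 134, 4: 144},
    "QLD": {2: 151, 3: 127, 4: 142},
    "SA": {2: 134, 3: 131, 4: 153},
    "TAS": {2: 164, 3: 143, 4: 133},
    "VIC": {2: 128, 3: 131, 4: 140},
    "WA": {2: 124, 3: 136, 4: 131},
}


def posterior_region_given_rating(table, rating):
    """Apply Bayes' theorem to get P(region | rating) from the counts."""
    grand_total = sum(sum(counts.values()) for counts in table.values())
    # Prior P(region): share of all rows that belong to the region
    priors = {r: sum(c.values()) / grand_total for r, c in table.items()}
    # Likelihood P(rating | region): share of the region's rows with this rating
    likelihoods = {r: c[rating] / sum(c.values()) for r, c in table.items()}
    # Evidence P(rating): total probability over all regions
    evidence = sum(priors[r] * likelihoods[r] for r in table)
    return {r: priors[r] * likelihoods[r] / evidence for r in table}


posterior = posterior_region_given_rating(region_table, 4)
print({region: round(p, 3) for region, p in posterior.items()})
```

Because the priors and likelihoods are both derived from the same table, each posterior reduces to the cell count divided by the column total, which is a useful check on the arithmetic.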

From the NSW region, 121 products are rated 2, 145 products are rated 3 and 146 products are rated 4. Other figures for the different regions are shown below:

Total sales in the given data set amount to $166,702.

Product	NSW	QLD	SA	TAS	VIC	WA	All
Cheddar cheese	78	91	79	105	86	87	526
JM SHIRTS	87	93	84	70	82	79	495
Pasta	89	74	83	79	76	83	484
Pens	89	81	93	93	72	68	496
Reebok Limited Edition Shoes	89	81	79	93	83	74	499
All	432	420	418	440	399	391	2500
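A table like this can also be queried to see where a given product sells least, relative to each region's total sales. The sketch below is a hedged illustration: the counts are copied from the product-by-region table, while the function name `product_share_by_region` is hypothetical and the report does not describe such a step.

```python
# Counts copied from the product-by-region table
product_by_region = {
    "Cheddar cheese": {"NSW": 78, "QLD": 91, "SA": 79, "TAS": 105, "VIC": 86, "WA": 87},
    "JM SHIRTS": {"NSW": 87, "QLD": 93, "SA": 84, "TAS": 70, "VIC": 82, "WA": 79},
    "Pasta": {"NSW": 89, "QLD": 74, "SA": 83, "TAS": 79, "VIC": 76, "WA": 83},
    "Pens": {"NSW": 89, "QLD": 81, "SA": 93, "TAS": 93, "VIC": 72, "WA": 68},
    "Reebok Limited Edition Shoes": {"NSW": 89, "QLD": 81, "SA": 79, "TAS": 93, "VIC": 83, "WA": 74},
}


def product_share_by_region(table, product):
    """Return the share of each region's total sales that comes from `product`."""
    regions = list(table[product])
    # Column totals: all products sold in each region
    totals = {r: sum(row[r] for row in table.values()) for r in regions}
    return {r: table[product][r] / totals[r] for r in regions}


shares = product_share_by_region(product_by_region, "Pens")
# Region where the product makes up the smallest share of sales
weakest_region = min(shares, key=shares.get)
print(weakest_region, round(shares[weakest_region], 3))
```

Using shares rather than raw counts avoids penalizing a region simply because it has fewer sales overall.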

After fitting the Bayes model, the predicted mean monthly revenue is $500.891200.

Conclusion

It can be concluded that the data set used in this report has been analyzed using several theorems and procedures, including Python programming. The graphs and charts included in the report present the results of the analysis. The data in the data set has been organized using data mining classification, and the techniques applied with the analytical tool have been discussed in the report. Data mining techniques have helped in managing the large data set and in handling the different kinds of data needed for the analysis.

From the above analysis of the data set, it can be seen that product sales vary across different parts of the country. Proper management is therefore required in order to increase the sales of products in the market. Strategies such as market planning and targeting products at the right places might help to increase the sales of different products in different regions. As per the analysis, the VIC region has the lowest number of new customers at 115, QLD has 135 new customers and SA has the maximum at 144. The product that needs more promotion is pens. The analysis found that pens sell less well in the region. Demand for the product has to be increased through proper advertising, which might help in promoting it in the market. Pens have to be promoted in the NSW region, as the number of new customers from that region is minimum.

Classification Techniques for Data Mining

The use of promotion strategies might help in developing customers' interest in pens. The company might offer free shipping to customers. In several cases, customers are not willing to buy a product because of high shipping charges, so the company might bear the cost of shipping and make it free for customers. This strategy would increase customers' interest in the product. The company has to restart its advertising program and reduce the cost of pens by dropping the shipping cost. Delivery times also need to be minimized, which helps in maintaining customers' interest.

Customer feedback plays an important role in increasing the demand for the product in the market, so the company has to focus on customer satisfaction. A satisfied customer is capable of bringing another customer to the company.

Staff training might be another strategy for initiating the marketing plan of the company. Training sessions need to be introduced to enhance the skills and knowledge of employees, and the company's customer relationship network needs to be strengthened. Campaigns and seminars need to be conducted to provide rewards and prizes to employees with good performance. This helps to enhance employees' confidence and can increase the production level of the company. The sales of the company depend entirely on its brand image in the market. Most of the products are rated 3. Most of the products sold are valued at $60-$80.

References

Aggarwal, C.C., 2015. Data mining: the textbook. Springer.

Chaurasia, V. and Pal, S., 2017. Data mining techniques: To predict and resolve breast cancer survivability.

Gholizadeh, A., Carmon, N., Klement, A., Ben-Dor, E. and Borůvka, L., 2017. Agricultural Soil Spectral Response and Properties Assessment: Effects of Measurement Protocol and Data Mining Technique. Remote Sensing, 9(10), p.1078.

Horner, M.W. and Richard, A., 2016. Social data mining for understanding public perceptions of autonomous vehicles: National trends and the case of Florida (No. 16-3786).

Larose, D.T., 2014. Discovering knowledge in data: an introduction to data mining. John Wiley & Sons.

Linchangco, R., Jay, J.J. and Brouwer, C., 2017. Linking Nutrition and Molecular Biology Using Data Mining and Graph Theory. The FASEB Journal, 31(1 Supplement), pp.lb134-lb134.

Lior, R., 2014. Data mining with decision trees: theory and applications (Vol. 81). World Scientific.

Lu, H., Setiono, R. and Liu, H., 2017. Neurorule: A connectionist approach to data mining. arXiv preprint arXiv:1701.01358.

Olson, D.L. and Wu, D.D., 2017. Data Mining Models and Enterprise Risk Management. In Enterprise Risk Management Models (pp. 119-132). Springer, Berlin, Heidelberg.

Roiger, R.J., 2017. Data mining: a tutorial-based primer. CRC Press.

Shmueli, G., Bruce, P.C., Yahav, I., Patel, N.R. and Lichtendahl Jr, K.C., 2017. Data Mining for Business Analytics: Concepts, Techniques, and Applications in R. John Wiley & Sons.

Witten, I.H., Frank, E., Hall, M.A. and Pal, C.J., 2016. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann.

Wu, X., Zhu, X., Wu, G.Q. and Ding, W., 2014. Data mining with big data. IEEE Transactions on Knowledge and Data Engineering, 26(1), pp.97-107.

Zheng, Y., 2015. Trajectory data mining: an overview. ACM Transactions on Intelligent Systems and Technology (TIST), 6(3), p.29.
