Analysis Of Lounge Quality By Skytrax: Logistic Regression And Decision Tree Models
- December 28, 2023/ Uncategorized
Pre-processing of Data
The Airport Quality Agency (AQA) is reviewing different aspects of the quality experienced by passengers travelling by air around the world. AQA has segregated passengers’ recommendations into four aspects: (1) airlines, (2) airports, (3) lounges and (4) seats, listed in their order of importance. To evaluate these aspects, Skytrax is involved in assessing airport quality. As an initial step, Skytrax has conducted a short passenger survey that includes a short descriptive study. AQA intends to collect future data through social networking sites such as Twitter and Facebook, which would allow it to collect information anonymously. At present, the data collected by Skytrax are analysed with a BI tool, RapidMiner. The present analysis covers the information available on lounges. It is strongly believed that, as travellers’ expectations are met more favourably, they will be more likely to recommend air travel.
The aim of the present data analysis is to identify the factors that play a significant role in endorsement by airline travellers.
Pre-processing is an essential first step towards a proper analysis. The data collected by Skytrax contain information on various aspects of passengers’ air travel experience, and some values are missing. For the analysis, all missing values have been imputed. Since we are interested in assessing the “lounge” at airports, it is prudent to use the customers’ recommendation as the target for prediction. In the dataset, customers have rated the comfort, cleanliness, beverages, catering, washrooms, Wi-Fi connectivity and staff service available at airport lounges. All of these individual ratings are included to explain customers’ recommendation of lounges.
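The article does not say which imputation method was used. As a minimal sketch, assuming simple mean imputation and a made-up three-row sample (the function name `impute_mean` and the sample values are illustrative, not taken from the Skytrax data), the step could look like this:

```python
from statistics import mean


def impute_mean(rows, columns):
    """Return a copy of rows where None in each listed column is replaced
    by the mean of that column's observed values (assumed method)."""
    filled = [dict(row) for row in rows]  # copy so the input is not modified
    for col in columns:
        observed = [row[col] for row in rows if row[col] is not None]
        if not observed:
            continue  # no observed value to estimate from; leave the column as is
        col_mean = mean(observed)
        for row in filled:
            if row[col] is None:
                row[col] = col_mean
    return filled


# Hypothetical sample rows, for illustration only.
sample = [
    {"comfort_rating": 4, "wifi_connectivity_rating": None},
    {"comfort_rating": None, "wifi_connectivity_rating": 2},
    {"comfort_rating": 2, "wifi_connectivity_rating": 4},
]
imputed = impute_mean(sample, ["comfort_rating", "wifi_connectivity_rating"])
```

Mean imputation keeps every survey response in the dataset, at the cost of slightly shrinking the variance of the imputed columns.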
To analyse the relationship, it is important to understand the customers’ level of recommendation. Since we have been assigned to analyse the data on airport lounges, the comfort rating of the lounge is studied first. The histogram shows that the average customer rating is in favour of the comfort of the lounges. Hence, we can infer that airline travellers feel that the comfort of the lounges is good.
Logistic Regression Model
Next, the correlation amongst the quality parameters is tested. The correlation matrix shows that the variables under study are positively correlated. Moreover, the variables used to assess the overall rating have a strong positive correlation with it. It can therefore be inferred that the higher an attribute’s correlation with the overall rating, the more closely that attribute tracks customers’ overall perception of quality.
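The correlation matrix itself is produced in RapidMiner. As a rough illustration of what it contains, the sketch below computes Pearson correlations between rating columns; the helper names and the toy ratings are assumptions made for this example only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Nested dict holding the correlation of every column with every other."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy ratings, not the survey data.
toy = {
    "overall_rating": [1, 2, 3, 4, 5],
    "comfort_rating": [1, 2, 2, 4, 5],
    "staff_service_rating": [2, 1, 3, 4, 4],
}
matrix = correlation_matrix(toy)
```

Each diagonal entry is 1 and the matrix is symmetric, so only the upper or lower triangle needs to be read.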
To analyse the quality of the lounges, two predictive models were developed using logistic regression and the decision tree technique. The following sections present the processes and outputs of the two models.
The process shown above is followed to run logistic regression in RapidMiner. Its advantage is that it reports the accuracy and performance of the model and also validates the model on the dataset. Because the missing values were imputed during pre-processing, the logistic regression model can be fitted on the complete dataset. The customers’ “recommendation” is selected as the dependent variable. Since the recommendation is binomial, logistic regression is a suitable method for evaluating the relation between the dependent and independent variables. The individual rating variables are selected as the independent variables.
From the above image it is seen that the model can predict the recommendation with 77.72% accuracy.
The precision of the logistic regression is 68.49% (±3.96%).
From the above ROC curve it is found that the AUC is 0.863 (±0.028). Since the AUC is high, the ability of the logistic regression to separate recommended from non-recommended lounges is satisfactory. Moreover, the variability in the estimate is low (±0.028).
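For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen recommended case receives a higher model score than a randomly chosen non-recommended case. The sketch below computes it that way; the scores and labels are invented for illustration and the function name `auc` is an assumption.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    case has the higher score; ties count as half a win.
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented scores: two recommended (1) and two not recommended (0).
example_auc = auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why a value well above 0.5 is read as a good result.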
The fitted relation for recommendation, expressed as the linear predictor (log-odds) of the logistic model, can be written as:
- Recommendation = 0.715*overall_rating + 0.343*comfort_rating + 0.010*cleanliness_rating + 0.210*bar_beverages_rating + 0.033*catering_rating - 0.030*washrooms_rating + 0.0033*wifi_connectivity_rating + 0.222*staff_service_rating - 6.074
From the above equation it can be interpreted that, except for washrooms, the ratings of all other variables tend to add to the recommendation of the lounges. Further, the coefficients of overall rating, comfort rating, bar beverages rating and staff service rating are statistically significant at the 0.05 level of significance. Thus overall rating, comfort rating, bar beverages rating and staff service rating have the greater impact on the recommendation of a lounge. An analysis of the coefficients demonstrates that the overall rating of the customers has the highest positive impact on recommendation of a lounge. On the other hand, the cleanliness and Wi-Fi connectivity ratings are among those with the lowest positive impact on recommendation.
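To make the equation concrete, the sketch below plugs ratings into it. It assumes, as is standard for logistic regression, that the linear expression gives the log-odds of a recommendation, which the logistic function turns into a probability. The coefficients are copied as printed in the equation above; the example ratings are hypothetical.

```python
from math import exp

# Coefficients as printed in the equation above.
COEFFICIENTS = {
    "overall_rating": 0.715,
    "comfort_rating": 0.343,
    "cleanliness_rating": 0.010,
    "bar_beverages_rating": 0.210,
    "catering_rating": 0.033,
    "washrooms_rating": -0.030,
    "wifi_connectivity_rating": 0.0033,
    "staff_service_rating": 0.222,
}
INTERCEPT = -6.074


def log_odds(ratings):
    """Linear predictor: intercept plus the weighted sum of the ratings."""
    return INTERCEPT + sum(COEFFICIENTS[name] * ratings[name] for name in COEFFICIENTS)


def probability_of_recommendation(ratings):
    """Logistic function applied to the log-odds (assumed interpretation)."""
    return 1.0 / (1.0 + exp(-log_odds(ratings)))


# Hypothetical customer: overall rating 8, every other rating 4.
example = {name: 4 for name in COEFFICIENTS}
example["overall_rating"] = 8
example_probability = probability_of_recommendation(example)
```

Because the overall rating carries the largest coefficient, changing it by one point moves the log-odds more than a one-point change in any other rating.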
Decision Tree Model
A decision tree model in RapidMiner provides the root for the “recommendation” of the lounge. The model is useful for testing datasets. A complete dataset can be divided into two groups: training and validation. A model is built on the training dataset and then validated on the validation dataset. Moreover, a decision tree provides us with several choices between the variables under study.
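The split into training and validation groups can be sketched as follows. The 30% validation share and the fixed seed are assumptions made for this example; the article does not state the proportions used in RapidMiner.

```python
import random


def train_validation_split(rows, validation_fraction=0.3, seed=0):
    """Shuffle the rows reproducibly and split them into two groups.

    Returns (training_rows, validation_rows). The fraction and seed are
    illustrative defaults, not values taken from the article.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# Ten placeholder row identifiers stand in for survey responses.
training, validation = train_validation_split(range(10))
```

Keeping the validation rows out of model building is what allows the reported accuracy to reflect performance on unseen responses.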
From the above image it is seen that the decision tree model can predict the recommendation with an accuracy of 78.95% (±3.14%).
From the above image it is seen that the decision tree model predicts the recommendation with a precision of 66.55% (±3.98%).
Thus, it is seen that the accuracy and precision of the decision tree model are reasonably high, with low variability.
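Accuracy and precision measure different things, which is why they differ for the same model. The sketch below shows how each follows from the counts in a confusion matrix; the counts used are hypothetical and are not the ones behind the figures reported above.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Share of predicted recommendations that were actual recommendations."""
    if tp + fp == 0:
        raise ValueError("precision is undefined when nothing is predicted positive")
    return tp / (tp + fp)


# Hypothetical counts: true positives, false positives,
# true negatives and false negatives.
example_accuracy = accuracy(tp=40, fp=10, tn=45, fn=5)
example_precision = precision(tp=40, fp=10)
```

A model can therefore have a higher accuracy than precision when it classifies the non-recommended cases well but produces a fair number of false recommendations.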
From the above ROC curve it is found that the AUC is 0.80 (±0.031). Since the AUC is high, the ability of the decision tree to separate recommended from non-recommended lounges is satisfactory. Moreover, the variability in the estimate is low (±0.031).
The decision tree shows that it is valid when the overall rating is less than 3500. At the bottom of the decision tree, a decision can be taken if the staff service rating is more than 3. After the staff service rating, the comfort rating needs to be considered. At the third level the washroom rating needs to be considered. At the final level the rating of the bars and beverages needs to be considered. Thus a decision can be arrived at about the recommendation of the lounge.
k-NN Model
The above process has been used in RapidMiner to evaluate a k-NN model. k-NN classifies an observation according to the classes of its nearest neighbours in the data.
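As a minimal sketch of the idea, assuming Euclidean distance and a majority vote among k = 3 neighbours (the article does not state the settings used in RapidMiner), a k-NN prediction could be written as below. The points and labels are invented.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points, measured by Euclidean distance."""
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: dist(train_points[i], query),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Invented (overall_rating, comfort_rating) pairs; 1 = recommended.
points = [(1, 1), (2, 1), (2, 2), (8, 4), (9, 5), (7, 4)]
labels = [0, 0, 0, 1, 1, 1]
high_rated = knn_predict(points, labels, (8, 5))
low_rated = knn_predict(points, labels, (1, 2))
```

Because the method relies on distances, ratings measured on different scales would normally be rescaled before use.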
The accuracy of the model is 76.90% (±1.27%).
We find that k-NN has a higher classification error (23.10%) than the logistic regression and decision tree models. Thus it is the least suitable of the three for use.
Clustering Model
The clustering classifier is used to evaluate lounge type based on the rating factors. The above image shows the process through which the clusters are evaluated. The clusters can be used to get information on the factors that would be helpful in rating the lounges.
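The article does not name the clustering algorithm. As a sketch under the assumption that a k-means style method with Euclidean distance is used, the clustering step could look like the following. The starting centroids and the toy points are made up, and the number of clusters is simply the number of starting centroids supplied.

```python
from math import dist


def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, and repeat."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        centroids = [
            tuple(sum(v) / len(members) for v in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
    assignments = [nearest_centroid(p, centroids) for p in points]
    return centroids, assignments


def average_distance_to_centroid(points, centroids, assignments):
    """Mean Euclidean distance from each point to its own cluster centroid."""
    return sum(dist(p, centroids[a]) for p, a in zip(points, assignments)) / len(points)


# Toy two-dimensional ratings forming three obvious groups.
toy_points = [(1, 1), (1, 2), (2, 1), (5, 5), (5, 6), (6, 5), (9, 1), (9, 2), (10, 1)]
centres, groups = kmeans(toy_points, [(1, 1), (5, 5), (9, 1)])
spread = average_distance_to_centroid(toy_points, centres, groups)
```

The average distance to the centroid is one common way to compare different choices of the number of clusters: it falls as clusters are added, and the point where it stops falling quickly suggests a reasonable number.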
The above image provides information on the ideal number of clusters. It is seen that, on the basis of the lounge types, the data can be clustered into 3 clusters. The average Euclidean distance between the clusters is 3.552.
The above shows that recommendation is closely related to overall rating of the lounges.
In addition, recommendation can also be used to predict the catering rating.
The above decision tree can be used to predict the value of a cluster.
Conclusion
From the above analysis it is seen that the lounges have a good overall rating. Moreover, the decision tree model is found to be suitable for evaluating the customers’ ratings. Thus, the data quality is found to be suitable. Skytrax should use the decision tree model for evaluating airport quality.