Optimization CatBoost using GridSearchCV for Sentiment Analysis Customer Reviews in Digital Transportation Industry
DOI:
https://doi.org/10.38043/tiers.v6i2.7201
Keywords:
sentiment analysis, uber customer reviews, catboost, gridsearchcv, digital transportation
Abstract
The rapid expansion of ride-hailing services has generated a massive volume of user feedback, making automated sentiment analysis essential for understanding customer satisfaction. This study classifies public sentiment towards the Uber application into positive, neutral, and negative categories using CatBoost, a gradient boosting algorithm chosen for its Ordered Boosting mechanism, which is designed to reduce overfitting and improve the model's generalization. Although the text is represented numerically with TF-IDF, CatBoost is selected for its superior performance on heterogeneous datasets compared to other boosting frameworks such as XGBoost and LightGBM. The dataset comprises 12,000 customer reviews collected from the Google Play Store between January and March 2024 using web scraping techniques and uploaded to Kaggle. The data underwent preprocessing, including lemmatization and TF-IDF vectorization, to structure the textual features. To maximize model performance, hyperparameter optimization was conducted using GridSearchCV. The experimental results show that the optimization process improved the model's performance, raising Accuracy from 0.907 to 0.910 and the F1-Score from 0.893 to 0.897. The largest gain was in the AUC score, which increased from 0.949 to 0.957, indicating a better ability to distinguish between sentiment classes. However, while the model exhibited high precision in identifying positive and negative polarities, analysis of the confusion matrix revealed limitations in correctly predicting the neutral class, suggesting challenges related to class imbalance. These findings indicate that an optimized CatBoost model is a robust tool for sentiment classification, though future work is recommended to address minority class detection.
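The gap the abstract describes, high overall accuracy alongside weak neutral-class predictions, can be illustrated with a small calculation. The following is a minimal sketch, assuming a hypothetical 3x3 confusion matrix: the counts, the `CONFUSION` and `LABELS` names, and the helper functions are invented for illustration and are not the study's data, results, or code.

```python
# Hypothetical confusion matrix: rows are true classes, columns are predictions.
# The counts are invented for illustration and are NOT the study's results.
LABELS = ["negative", "neutral", "positive"]

CONFUSION = [
    [400, 10, 40],   # true negative reviews
    [60, 20, 70],    # true neutral reviews (the minority class)
    [30, 10, 560],   # true positive reviews
]


def accuracy(matrix):
    """Share of all reviews that land on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total


def per_class_scores(matrix):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    scores = []
    for k in range(len(matrix)):
        true_pos = matrix[k][k]
        predicted = sum(row[k] for row in matrix)  # column sum: predicted as k
        actual = sum(matrix[k])                    # row sum: truly class k
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append((precision, recall, f1))
    return scores


def macro_f1(matrix):
    """Unweighted mean of per-class F1, so each class counts equally."""
    scores = per_class_scores(matrix)
    return sum(f1 for _, _, f1 in scores) / len(scores)


for label, (p, r, f1) in zip(LABELS, per_class_scores(CONFUSION)):
    print(f"{label:>8}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
print(f"accuracy={accuracy(CONFUSION):.3f} macro_f1={macro_f1(CONFUSION):.3f}")
```

With these invented counts, accuracy stays high because the two large classes dominate, while neutral recall and the macro-averaged F1 are much lower. The abstract does not state which F1 averaging the study reports, so the macro average here is only one possible choice.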
License
Copyright (c) 2025 Yahya Nur Ifriza, Ratna Nur Mustika Sanusi, Hendra Febriyanto, Azlina Kamaruddin

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.