Optimizing Socioeconomic Features for Poverty Prediction in South Sumatera
DOI: https://doi.org/10.38043/tiers.v6i1.6244
Keywords: Poverty Prediction, Machine Learning, Feature Engineering, Random Forest, SHAP Analysis
Abstract
Poverty in South Sumatera remains a complex challenge shaped by interacting socioeconomic factors. Traditional methods often fail to capture the nonlinear relationships that accurate prediction depends on. This study improves poverty prediction by optimizing feature engineering on a 32-variable socioeconomic dataset from South Sumatera covering 2019 to 2023. Data preprocessing included cleaning, imputation, normalization, and outlier handling. Feature aggregation created four composite indices: an Education Index (P1, P2, P3), a Health Index (AH1–AH4), an Economic Index (IE, GR, AI, EG), and a Healthcare Workforce Index (HW1–HW9). Feature interaction produced ratios such as Income vs. Economy (AN/Education Index), Infrastructure vs. Health (road length/Healthcare Workforce Index), and Unemployment vs. Workforce (HI/AT), which highlight interdependencies among the variables. Dimensionality reduction (PCA) and Lasso Regression selected eight key predictors, including Year and Poverty Level. Among the models tested, Random Forest performed best (R² = 0.7244, MAE = 0.2489). SHAP analysis identified the Education and Economic Indices as the top predictors. Optimized feature engineering improved both model accuracy and interpretability, supporting targeted poverty reduction strategies in South Sumatera.
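The aggregation and interaction steps described in the abstract can be illustrated with a minimal sketch. It assumes that a composite index is the row-wise mean of min-max-scaled components; the abstract does not state the aggregation rule actually used, so this is only one plausible reading. The helper names (`min_max`, `composite_index`, `safe_ratio`) and the sample records are hypothetical, and the column codes (P1, P2, P3, HI, AT) are taken from the abstract as plain labels.

```python
from statistics import mean


def min_max(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_index(rows, columns):
    """Row-wise mean of the min-max-scaled columns (assumed aggregation rule)."""
    scaled = [min_max([row[c] for row in rows]) for c in columns]
    return [mean(col[i] for col in scaled) for i in range(len(rows))]


def safe_ratio(numerator, denominator, eps=1e-9):
    """Ratio feature; eps keeps a zero denominator from raising an error."""
    return numerator / (denominator + eps)


# Made-up illustrative records, not data from the study.
rows = [
    {"P1": 7.0, "P2": 60.0, "P3": 90.0, "HI": 4.0, "AT": 100.0},
    {"P1": 9.0, "P2": 80.0, "P3": 95.0, "HI": 6.0, "AT": 120.0},
    {"P1": 8.0, "P2": 70.0, "P3": 100.0, "HI": 5.0, "AT": 200.0},
]

# Composite feature: Education Index built from P1, P2, P3.
education_index = composite_index(rows, ["P1", "P2", "P3"])

# Interaction feature: Unemployment vs. Workforce (HI/AT).
unemployment_vs_workforce = [safe_ratio(r["HI"], r["AT"]) for r in rows]
```

Scaling each component before averaging keeps a variable measured in large units from dominating the index; a weighted mean or a PCA-based score would be equally reasonable alternatives under different assumptions.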
Copyright (c) 2025 terttiaavini terttiaavini

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.