Resampling Techniques in Rainfall Classification of Banjarbaru using Decision Tree Method
DOI:
https://doi.org/10.38043/tiers.v4i2.5069
Keywords:
Random undersampling, Random oversampling, SMOTE, Decision Tree, Imbalanced Dataset
Abstract
Continuous heavy rain, as occurred in 2021, can cause flood emergencies in various areas of Banjarbaru. Classification modeling is therefore needed to predict rainfall classes from climate parameters. A common problem in such classification tasks is an imbalanced class distribution, which occurs when the minority class has far fewer observations than the majority class. This research compares three resampling techniques for handling imbalanced rainfall data in Banjarbaru with a Decision Tree model: Random Undersampling, Random Oversampling, and SMOTE. The techniques are compared on sensitivity, specificity, and G-Mean. The results show that the best model is the Decision Tree with Random Undersampling, because it gives the highest G-Mean and has sensitivity and specificity both above 70%. Based on this model, the variables that separate the Rainy and Cloudy classes are minimum temperature, maximum temperature, and sunshine duration, with maximum temperature being the best separator.
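As a rough illustration of the quantities named in the abstract, the sketch below computes sensitivity, specificity, and G-Mean (the geometric mean of the two) and shows simple random undersampling and random oversampling in plain Python. It is not the authors' code: the function names, the toy labels and predictions, the choice of "Rainy" as the positive class, and the fixed seed are all assumptions made for illustration, and the data are invented rather than taken from the Banjarbaru dataset.

```python
import math
import random
from collections import Counter


def sensitivity_specificity(y_true, y_pred, positive):
    """Return (sensitivity, specificity), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def g_mean(sens, spec):
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(sens * spec)


def _group_by_class(rows, labels):
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class is as small as the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    n_min = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        kept = rng.sample(members, n_min)  # sampling without replacement
        out_rows.extend(kept)
        out_labels.extend([label] * len(kept))
    return out_rows, out_labels


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate rows so every class is as large as the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    n_max = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        extra = rng.choices(members, k=n_max - len(members))  # with replacement
        out_rows.extend(members + extra)
        out_labels.extend([label] * n_max)
    return out_rows, out_labels


# Toy data (invented): 4 "Rainy" and 6 "Cloudy" observations.
y_true = ["Rainy"] * 4 + ["Cloudy"] * 6
y_pred = ["Rainy", "Rainy", "Rainy", "Cloudy",
          "Cloudy", "Cloudy", "Cloudy", "Cloudy", "Cloudy", "Rainy"]
rows = list(range(len(y_true)))  # stand-ins for feature vectors

sens, spec = sensitivity_specificity(y_true, y_pred, positive="Rainy")
score = g_mean(sens, spec)

under_rows, under_labels = random_undersample(rows, y_true)
over_rows, over_labels = random_oversample(rows, y_true)
print(sens, spec, score)
print(Counter(under_labels), Counter(over_labels))
```

In practice, resampling would be applied to the training split only, so that the test data keep the original class distribution when sensitivity, specificity, and G-Mean are computed. SMOTE differs from random oversampling in that it synthesizes new minority observations by interpolating between neighbours rather than duplicating existing rows, and is not shown here.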
Copyright (c) 2023 Selvi Annisa, Yeni Rahkmawati
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.