Resampling Techniques in Rainfall Classification of Banjarbaru using Decision Tree Method

Authors

  • Selvi Annisa, Universitas Lambung Mangkurat
  • Yeni Rahkmawati, Universitas Lambung Mangkurat

DOI:

https://doi.org/10.38043/tiers.v4i2.5069

Keywords:

Random Undersampling, Random Oversampling, SMOTE, Decision Tree, Imbalanced Dataset

Abstract

Continuous heavy rain, such as that of 2021, can cause flood emergencies in several areas of Banjarbaru, so a classification model is needed to predict rainfall classes from climate parameters. A common obstacle in classification is an imbalanced class distribution, which occurs when the minority class has far fewer observations than the majority class. This research compares three resampling techniques, Random Undersampling, Random Oversampling, and SMOTE, for handling imbalanced rainfall data in Banjarbaru with a Decision Tree model. The techniques are compared on sensitivity, specificity, and G-Mean. The results show that the Decision Tree with Random Undersampling is the best model: it gives the highest G-Mean, with sensitivity and specificity both above 70%. Under this model, the variables that separate the Rainy and Cloudy classes are minimum temperature, maximum temperature, and sunshine duration, with maximum temperature being the best separator.
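The three resampling techniques and the comparison metrics named in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy feature values, the example predictions, and the choice of Rainy as the minority class are all hypothetical, and `smote_like` is a simplified stand-in for SMOTE (interpolation toward one of the k nearest minority neighbours). G-Mean is computed as the square root of sensitivity times specificity.

```python
import math
import random

def random_undersample(X, y, minority, rng):
    """Keep all minority rows and an equally sized random subset of majority rows."""
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    keep = min_idx + rng.sample(maj_idx, len(min_idx))
    return [X[i] for i in keep], [y[i] for i in keep]

def random_oversample(X, y, minority, rng):
    """Append randomly chosen copies of minority rows until the classes are equal in size."""
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_extra = len(y) - 2 * len(min_idx)  # majority count minus minority count
    extra = [rng.choice(min_idx) for _ in range(n_extra)]
    return X + [X[i] for i in extra], y + [y[i] for i in extra]

def smote_like(X, y, minority, rng, k=3):
    """Append synthetic minority rows placed between a minority row and a near minority neighbour."""
    pts = [X[i] for i, label in enumerate(y) if label == minority]
    n_extra = len(y) - 2 * len(pts)
    synthetic = []
    for _ in range(n_extra):
        base = rng.choice(pts)
        # k nearest minority neighbours of the chosen row (needs at least 2 minority rows)
        neighbours = sorted((p for p in pts if p is not base),
                            key=lambda p: math.dist(p, base))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return X + synthetic, y + [minority] * n_extra

def evaluate(y_true, y_pred, positive):
    """Return (sensitivity, specificity, G-Mean) with `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, math.sqrt(sensitivity * specificity)

# Hypothetical toy data: [minimum temperature, maximum temperature]; values are made up.
X = [[24.0, 33.1], [23.5, 32.0], [24.2, 34.0], [23.9, 33.5],
     [24.5, 34.2], [23.8, 32.8], [22.9, 30.1], [23.0, 30.5]]
y = ["Cloudy"] * 6 + ["Rainy"] * 2  # Rainy as minority is an assumption for illustration

rng = random.Random(0)
for resample in (random_undersample, random_oversample, smote_like):
    X_res, y_res = resample(X, y, "Rainy", rng)
    print(resample.__name__, {c: y_res.count(c) for c in ("Cloudy", "Rainy")})

# Hypothetical predictions from some classifier, only to show the metric calculation.
y_true = ["Rainy"] * 4 + ["Cloudy"] * 5
y_pred = ["Rainy", "Rainy", "Rainy", "Cloudy", "Cloudy", "Cloudy", "Cloudy", "Cloudy", "Rainy"]
print(evaluate(y_true, y_pred, "Rainy"))
```

In a workflow like the one the abstract describes, each resampler would be applied to the training data only, a decision tree would be fitted on the resampled set, and `evaluate` would be run on predictions for untouched test data, so that the three techniques are compared on the same footing.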

Published

2023-12-25

How to Cite

Annisa S, Rahkmawati Y. Resampling Techniques in Rainfall Classification of Banjarbaru using Decision Tree Method. TIERS [Internet]. 2023 Dec. 25 [cited 2024 Dec. 22];4(2):122-8. Available from: https://journal.undiknas.ac.id/index.php/tiers/article/view/5069
