Detecting Data Leakage in Cloud Storage Using Decision Tree Classification

  • Parlindungan Harahap, Universitas Islam Negeri Sumatera Utara, Indonesia
  • Muhammad Siddik Hasibuan, Universitas Islam Negeri Sumatera Utara, Indonesia
Keywords: Cloud Storage, Data Leakage Detection, Decision Tree, GridSearchCV, Machine Learning

Abstract

Data leakage in cloud storage systems poses a significant security threat, potentially leading to unauthorized access, loss of sensitive information, and operational disruptions. This research proposes a classification model for detecting potential data leakage incidents using the Decision Tree algorithm. The dataset, obtained from the Kaggle public repository, contains user activity logs representing both normal and anomalous behaviors in cloud storage environments. Several preprocessing steps were applied to improve model quality, including handling missing values, removing outliers, and converting categorical data into numerical form. Hyperparameter optimization was performed using GridSearchCV to determine the best configuration for the Decision Tree classifier. Experimental results demonstrate that the optimized model achieved high classification performance, with an accuracy of 70.84%, a precision of 55% for the data leakage class, and an F1-score of 40%. The analysis also highlights the significance of certain features, such as multi-factor authentication usage and access to confidential data, in predicting potential leakage events. This study provides a theoretical contribution by establishing a robust methodology for applying Decision Tree algorithms to a novel cloud security dataset, offering a scalable and interpretable framework for automated threat detection.

Published
2025-09-30
How to Cite
Harahap, P., & Hasibuan, M. (2025). Detecting Data Leakage in Cloud Storage Using Decision Tree Classification. Journal of Information Systems and Informatics, 7(3), 2516-2534. https://doi.org/10.51519/journalisi.v7i3.1215
Section
Articles