Machine Learning-Based E-Archive for Archives Management of South Sumatra Province

  • Toni Tri Atmojo, Universitas Bina Darma, Indonesia
  • Yesi Novaria Kunang, Universitas Bina Darma, Indonesia
Keywords: Information Retrieval, TF-IDF, BM25, Archives

Abstract

Archives play a crucial role in institutional operations, yet retrieving specific information from them efficiently can be challenging. This research addresses the problem by developing an information retrieval system for archive search. The system employs two ranking methods: TF-IDF (Term Frequency-Inverse Document Frequency) weighting, which measures how important a word is to a document within a document collection, and BM25, a ranking function that scores documents by their relevance to the input query. Both methods are applied after a preprocessing stage, and each computes a relevance score for every document against the given query. The system is evaluated with three metrics: precision (the fraction of retrieved documents that are relevant), recall (the fraction of relevant documents that are retrieved), and the F1 score (the harmonic mean of precision and recall). Tested with various keywords on a dataset of 350 documents, the BM25 method achieved an average precision of 0.75, recall of 0.6, and an F1 score of 0.6665. The TF-IDF method scored lower, with a precision of 0.33, recall of 0.2, and an F1 score of 0.2500.
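
The ranking and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The whitespace tokenizer stands in for the unspecified preprocessing stage, the smoothed IDF shared by both scorers and the BM25 parameters k1 = 1.5 and b = 0.75 are assumptions, and all function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed stand-in for the preprocessing stage: lowercase, split on whitespace.
    return text.lower().split()


def idf(term, docs):
    # Smoothed inverse document frequency (assumption); always positive.
    n = sum(1 for d in docs if term in d)
    return math.log(1 + (len(docs) - n + 0.5) / (n + 0.5))


def tfidf_score(query, doc, docs):
    # Sum of (normalized term frequency * IDF) over the query terms.
    if not doc:
        return 0.0
    tf = Counter(doc)
    return sum(tf[t] / len(doc) * idf(t, docs) for t in query)


def bm25_score(query, doc, docs, k1=1.5, b=0.75):
    # BM25: saturating term frequency with document-length normalization.
    tf = Counter(doc)
    avgdl = sum(len(d) for d in docs) / len(docs)
    score = 0.0
    for t in query:
        denom = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf(t, docs) * tf[t] * (k1 + 1) / denom
    return score


def precision_recall_f1(retrieved, relevant):
    # Precision, recall, and their harmonic mean (F1) for one query.
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Hypothetical three-document archive and query, for illustration only.
docs = [tokenize(t) for t in (
    "surat keputusan gubernur",
    "laporan keuangan tahunan",
    "surat undangan rapat",
)]
query = tokenize("surat keputusan")
bm25 = [bm25_score(query, d, docs) for d in docs]
tfidf = [tfidf_score(query, d, docs) for d in docs]
```

In this toy collection both scorers rank the first document highest, because it contains both query terms. For a query that retrieves four documents of which three are relevant, out of five relevant documents in total, `precision_recall_f1` gives a precision of 0.75 and a recall of 0.6, with an F1 score equal to their harmonic mean.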

Published
2023-12-03
How to Cite
Atmojo, T., & Kunang, Y. (2023). Machine Learning-Based E-Archive for Archives Management of South Sumatra Province. Journal of Information Systems and Informatics, 5(4), 1491-1507. https://doi.org/10.51519/journalisi.v5i4.566