Machine Learning-Based E-Archive for Archives Management of South Sumatra Province
Abstract
Archives play a crucial role in institutional operations, yet retrieving specific information from them efficiently can be challenging. This research addresses the problem by developing an information retrieval system that ranks archived documents against a user query. The system uses two scoring methods: TF-IDF (Term Frequency-Inverse Document Frequency) weighting, which measures how important a term is to a document relative to the whole collection, and BM25, a ranking function that scores documents by their relevance to the input query. Documents and queries pass through a preprocessing stage before either method is applied, after which the system computes the relevance of each document to the query. The system is evaluated with three metrics: precision (the fraction of retrieved documents that are relevant), recall (the fraction of relevant documents that are retrieved), and the F1 Score (the harmonic mean of precision and recall). The system was tested on a dataset of 350 documents using various keywords. BM25 achieved an average precision of 0.75, recall of 0.6, and F1 Score of 0.6665. TF-IDF scored lower, with a precision of 0.33, recall of 0.2, and F1 Score of 0.2500.
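The two scoring methods and the three evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The whitespace tokenizer, the BM25 parameters k1 = 1.5 and b = 0.75, the smoothed IDF variant, the function names, and the three toy documents are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace only.
    return text.lower().split()


def tf_idf_scores(query, docs):
    """Score each document as the sum of tf * idf over the query terms."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if df[term]:
                score += tf[term] * math.log(n / df[term])
        scores.append(score)
    return scores


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document with BM25 (k1 and b are assumed defaults)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(doc) for doc in tokenized) / n
    df = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            # Smoothed IDF variant that stays non-negative (an assumption).
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall and F1 for one query from document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy collection (hypothetical, not taken from the 350-document dataset).
docs = [
    "surat keputusan gubernur",
    "laporan keuangan tahunan",
    "surat undangan rapat",
]
bm25 = bm25_scores("surat keputusan", docs)
tfidf = tf_idf_scores("surat keputusan", docs)

# Four documents retrieved, five relevant, three in common.
p, r, f1 = precision_recall_f1([1, 2, 3, 4], [1, 2, 3, 5, 6])
```

In this toy collection both methods rank the first document highest because it contains both query terms, and the second document scores zero because it contains neither. The metric example uses a single query with precision 0.75 and recall 0.6, for which the harmonic mean is about 0.667. Metrics averaged over several queries need not satisfy this identity exactly.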


Copyright (c) 2023 Journal of Information Systems and Informatics

This work is licensed under a Creative Commons Attribution 4.0 International License.