Topic Modelling Using VSM-LDA For Document Summarization

  • Luthfi Atikah, Politeknik Astra
  • Novrindah Alvi Hasanah, Universitas Islam Negeri Maulana Malik Ibrahim Malang
  • Agus Zainal Arifin, Institut Teknologi Sepuluh Nopember

Abstract

Summarization simplifies the contents of a document by eliminating elements considered unimportant without reducing the core meaning the document is meant to convey. A document, however, usually contains more than one topic, so the topics need to be identified for the summarization process to be more effective. Latent Dirichlet Allocation (LDA) is a commonly used method for identifying topics. However, when a program is run on a different dataset, LDA exhibits "order effects": the resulting topics differ if the order of the training data is changed. Given the same input document, LDA produces inconsistent topics, which results in low coherence values. Therefore, this paper proposes a topic modelling method that combines LDA with the Vector Space Model (VSM) for automatic summarization. The proposed method can overcome order effects and identifies document topics calculated from the TF-IDF weights in the VSM generated by LDA. On a Twitter dataset of 1300 entries, the proposed topic modelling method reached a highest coherence value of 0.72. The resulting summaries obtained a ROUGE-1 score of 0.78, a ROUGE-2 score of 0.67, and a ROUGE-L score of 0.80.
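
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of scoring text against topic terms in a TF-IDF vector space, not the authors' method. It assumes that a topic model such as LDA has already produced a list of topic terms, treats that list as a pseudo-document, and ranks sentences by cosine similarity to it. The function names (`tfidf_vectors`, `cosine`, `rank_sentences`), the whitespace tokenization, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (hypothetical helper)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        if not doc:
            vectors.append({})
            continue
        tf = Counter(doc)
        # Smoothed IDF (an assumption): log((1 + N) / (1 + df)) + 1, always positive.
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sentences(sentences, topic_terms):
    """Rank sentences by similarity to a list of topic terms (assumed to come from LDA)."""
    tokenized = [s.lower().split() for s in sentences]
    # The topic terms are appended as a pseudo-document so they share the same space.
    vectors = tfidf_vectors(tokenized + [list(topic_terms)])
    topic_vec = vectors[-1]
    scores = [cosine(vec, topic_vec) for vec in vectors[:-1]]
    return sorted(zip(sentences, scores), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sentences = [
        "the battery drains fast",
        "battery life and charging speed matter",
        "we went to the beach",
    ]
    for sentence, score in rank_sentences(sentences, ["battery", "charging"]):
        print(f"{score:.3f}  {sentence}")
```

In a summarizer built along these lines, the highest-ranked sentences for each topic would be candidates for the summary; how the paper actually selects sentences is not stated in the abstract.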

Published
2022-12-30
How to Cite
Atikah, L., Hasanah, N., & Arifin, A. (2022). Topic Modelling Using VSM-LDA For Document Summarization. Ultimatics : Jurnal Teknik Informatika, 14(2), 91-95. https://doi.org/10.31937/ti.v14i2.2854