Improving Multi-Document Summarization Performance by Utilizing Comprehensive Document Features

  • Rosalina Rosalina, President University

Abstract

The rapid growth of information and communication technology has sharply increased the volume of information available on the web, leading to information overload. Multi-document summarization offers an effective way to address this problem. To improve the performance of multi-document summarization, this research combines four sentence features (sentence centroid, sentence position, sentence length, and the IsTheLongestSentence value) to weight sentences and identify the most informative content of a text. In addition, this research uses a new method to calculate the weight of the sentence position feature. Performance was evaluated using the ROUGE metrics ROUGE-N, ROUGE-L, ROUGE-W, ROUGE-S, and ROUGE-SU. The proposed method outperforms the MEAD system on clusters D133C and D134H: on ROUGE-1, ROUGE-S, and ROUGE-SU for cluster D133C, and on ROUGE-2, ROUGE-3, ROUGE-4, ROUGE-L, and ROUGE-W for cluster D134H. This suggests that the extracted summaries capture the important words and favor longer sentences, since a longer sentence contains more material that can match the reference summaries.
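
The feature combination described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the weighting scheme, the normalization, or the new position formula, so everything below is an assumption made for illustration. In particular, the centroid score is a simplified word-frequency stand-in, the position score is a plain linear decay, the features are combined with equal weights, and the names `score_sentences` and `top_sentences` are hypothetical.

```python
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(sentence):
    """Lowercase a sentence and split it into words, dropping edge punctuation."""
    words = (w.strip(PUNCT).lower() for w in sentence.split())
    return [w for w in words if w]


def score_sentences(sentences, weights=(1.0, 1.0, 1.0, 1.0)):
    """Return one combined score per sentence.

    All four feature formulas are illustrative assumptions, not the
    formulas used in the paper.
    """
    if not sentences:
        return []
    tokens = [tokenize(s) for s in sentences]
    counts = Counter(w for t in tokens for w in t)
    lengths = [len(t) for t in tokens]
    max_len = max(lengths) or 1
    n = len(sentences)

    # Centroid stand-in: sum of cluster-wide frequencies of the distinct
    # words in the sentence, normalized by the best sentence.
    centroid_raw = [sum(counts[w] for w in set(t)) for t in tokens]
    max_centroid = max(centroid_raw) or 1

    w_c, w_p, w_l, w_x = weights
    scores = []
    for i in range(n):
        centroid = centroid_raw[i] / max_centroid
        position = (n - i) / n          # assumed: earlier sentences score higher
        length = lengths[i] / max_len   # assumed: length relative to the longest
        is_longest = 1.0 if lengths[i] == max_len else 0.0
        scores.append(w_c * centroid + w_p * position
                      + w_l * length + w_x * is_longest)
    return scores


def top_sentences(sentences, k):
    """Pick the k highest-scoring sentences, kept in their original order."""
    scores = score_sentences(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    docs = ["The cat sat on the mat.", "The cat ate.", "Dogs bark."]
    for sentence, score in zip(docs, score_sentences(docs)):
        print(f"{score:.3f}  {sentence}")
```

With equal weights, a sentence that is long, early, and rich in frequent words ranks highest, which matches the abstract's observation that longer sentences tend to be selected. Changing `weights` shows how much each feature contributes to the ranking.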

Index Terms— multi-document summarization, document features, centroid based summarization

Published
2016-04-19
How to Cite
Rosalina, R. (2016). Improving Multi-Document Summarization Performance by Utilizing Comprehensive Document Features. Ultimatics : Jurnal Teknik Informatika, 8(1), 32-36. https://doi.org/10.31937/ti.v8i1.500