Improving Multi-Document Summarization Performance by Utilizing Comprehensive Document Features
Abstract
The rapid growth of information and communication technology has sharply increased the volume of information available on the web, leading to information overload. Multi-document summarization is an effective way to address this problem. To improve the performance of multi-document summarization, this research combines four sentence features (sentence centroid, sentence position, sentence length, and the IsTheLongestSentence value) to weight sentences and identify the most informative content of a text. In addition, this research uses a new method to calculate the weight of the sentence position feature. The results were evaluated using the ROUGE metrics ROUGE-N, ROUGE-L, ROUGE-W, ROUGE-S, and ROUGE-SU. The proposed method outperforms the MEAD system on clusters D133C and D134H: on ROUGE-1, ROUGE-S, and ROUGE-SU for cluster D133C, and on ROUGE-2, ROUGE-3, ROUGE-4, ROUGE-L, and ROUGE-W for cluster D134H. This indicates that the method captures the important words in the extracted summary and that it generates longer sentences, since a longer sentence contains more material that can match the reference summaries.
Index Terms— multi-document summarization, document features, centroid based summarization
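The feature combination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the equal feature weights, the raw term-frequency centroid (no IDF), and the linearly decaying position score are all assumptions made for illustration. In particular, the position score below is a simple stand-in and not the new position-weighting method the abstract mentions. For brevity, the sketch treats its input as the sentences of a single document.

```python
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(sentence):
    """Lowercase a sentence and strip surrounding punctuation from each word."""
    words = (w.strip(PUNCT).lower() for w in sentence.split())
    return [w for w in words if w]


def score_sentences(sentences, weights=(1.0, 1.0, 1.0, 1.0)):
    """Score each sentence as a weighted sum of four features.

    The weights are hypothetical; the abstract does not state the values used.
    """
    if not sentences:
        return []
    tokens = [tokenize(s) for s in sentences]

    # Assumed centroid: raw term frequency over all sentences (no IDF).
    centroid = Counter(w for toks in tokens for w in toks)
    raw_centroid = [sum(centroid[w] for w in set(toks)) for toks in tokens]
    max_centroid = max(raw_centroid) or 1

    lengths = [len(toks) for toks in tokens]
    max_len = max(lengths) or 1
    n = len(sentences)
    w_c, w_p, w_l, w_x = weights

    scores = []
    for i in range(n):
        f_centroid = raw_centroid[i] / max_centroid   # centroid feature
        f_position = (n - i) / n                      # assumed linear decay
        f_length = lengths[i] / max_len               # normalized length
        f_longest = 1.0 if lengths[i] == max_len else 0.0  # IsTheLongestSentence
        scores.append(w_c * f_centroid + w_p * f_position
                      + w_l * f_length + w_x * f_longest)
    return scores


def summarize(sentences, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


sentences = [
    "Floods hit the coastal city on Monday.",
    "Officials said the floods damaged hundreds of homes across the coastal city and nearby villages.",
    "Rain is expected.",
]
print(summarize(sentences, k=2))
```

Because both the length feature and the IsTheLongestSentence flag reward long sentences, a scheme of this shape tends to favor longer sentences, which is consistent with the behavior the abstract reports.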
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike International License (CC-BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Copyright without Restrictions
The journal allows the author(s) to hold the copyright without restrictions and will retain publishing rights without restrictions.
Submitted papers are assumed to contain no proprietary material unprotected by patent or patent application; responsibility for technical content and for protection of proprietary material rests solely with the author(s) and their organizations and is not the responsibility of ULTIMATICS or its Editorial Staff. The main (first/corresponding) author is responsible for ensuring that the article has been seen and approved by all the other authors. It is the responsibility of the author to obtain all necessary copyright release permissions for the use of any copyrighted materials in the manuscript prior to submission.