Extracting sentences from the beginning of a news article in single-document summarization

I am working on a single-document summarization task on news datasets and have run some experiments. One simple experiment that works well is extracting sentences just from the beginning of the news article. Now I want to find papers or research results about this type of sentence selection.
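
To make clear what I mean, here is a minimal sketch of this lead baseline. It assumes NLTK's sentence tokenizer is available; the function name `lead_baseline` and the choice of `k` are just illustrative:

```python
import nltk

nltk.download("punkt", quiet=True)  # one-time download of the sentence tokenizer models


def lead_baseline(document: str, k: int = 3) -> str:
    """Return the first k sentences of the document, in their original order."""
    sentences = nltk.sent_tokenize(document)
    return " ".join(sentences[:k])


news = ("The council approved the new budget on Monday. "
        "Spending on schools will rise by five percent. "
        "Opposition members criticised the cuts to transport. "
        "A final vote is expected next month.")
print(lead_baseline(news, k=2))
```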

Is there any research showing how well it works to choose sentences just from the beginning of the text, without any reordering?



Keselman, Schubert

Computational models for text summarization

The paper deals with methods (models) for text summarization. The reference (baseline) model was a "first sentence" model:

As a baseline for our models we used a trivial model that repeats the first sentence of the input document.

Then, various experiments and results are presented; notice that the "first sentence" model is always included as the baseline.

Moreover, one of the datasets for training and evaluation of models in this paper is DUC, which may be interesting to you.


Steinberger (doctoral Thesis, 2005)

Text Summarization within the LSA Framework

In section 2.1, the author discusses document summarization approaches based on sentence extraction. He identifies five approaches:

  • Surface Level Approaches
  • Corpus-based Approaches
  • Cohesion-based Approaches
  • Rhetoric-based Approaches
  • Graph-based Approaches

(The "First Sentence Approach" belongs to the *Surface Level Approaches") The author further describes these approaches and compares them.


Khodra, Widyantoro, Aziz, Trilaksono (Journal of ICT Research and Applications, 2011)

Free Model of Sentence Classifier for Automatic Extraction of Topic Sentences

The authors identify and test features for finding the most important sentences in a text (see the list of 58 features below). Surprisingly, the conclusion states that the position of the sentence is the dominant feature: taking all of the other features into consideration leads only to a small improvement. (A minimal sketch of computing two of the simplest features appears after the list.)

  1. position
  2. sentence length
  3. number of words before a main verb
  4. adjective incidence
  5. existential there incidence
  6. incidence of 3rd person singular grammatical form
  7. anaphora incidence
  8. coordinators incidence
  9. cardinal number incidence
  10. incidence of past tense endings
  11. Hypernymy
  12. Polysemy
  13. concreteness index
  14. affect_formulaic
  15. bad_formulaic
  16. comparison_formulaic
  17. continue_formulaic
  18. contrast_formulaic
  19. detail_formulaic
  20. future_formulaic
  21. gap_formulaic
  22. good_formulaic
  23. here_formulaic
  24. in_order_to_formulaic
  25. method_formulaic
  26. no_textstructure_formulaic
  27. similarity_formulaic
  28. them_formulaic
  29. textstructure_formulaic
  30. tradition_formulaic
  31. us_previous_formulaic
  32. affect
  33. argumentation
  34. better_solution
  35. change
  36. comparison
  37. continue
  38. contrast
  39. interest
  40. need
  41. presentation
  42. problem
  43. research
  44. solution
  45. textstructure
  46. use
  47. copula
  48. aim_ref_agent
  49. gap_agent
  50. general_agent
  51. problem_agent
  52. ref_agent
  53. ref_us_agent
  54. solution_agent
  55. textstructure_agent
  56. them_agent
  57. them_pronoun_agent
  58. us_agent
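
Just as an illustration (not the authors' implementation), here is a minimal sketch of computing two of the simplest features from this list, position and sentence length, for each sentence:

```python
from typing import Dict, List


def simple_features(sentences: List[str]) -> List[Dict[str, float]]:
    """Compute two surface features per sentence:
    - position: normalized position in the document (0.0 = first sentence)
    - sentence_length: number of whitespace-separated tokens
    """
    n = len(sentences)
    features = []
    for i, sent in enumerate(sentences):
        features.append({
            "position": i / max(n - 1, 1),
            "sentence_length": float(len(sent.split())),
        })
    return features


# A position-dominant scorer would simply rank sentences by low "position",
# which reduces to the lead baseline discussed in the question.
```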

For you, the most important part of the paper may be Table 5. Read the explanation of the table in the paper carefully, along with the whole of Section 4.3.


Other papers worth examining:


Luhn (1958)

The Automatic Creation of Literature Abstracts


Kupiec, Pedersen, Chen (1995)

A Trainable Document Summarizer


Yang, Pedersen (1997)

A Comparative Study on Feature Selection in Text Categorization


Sebastiani (2002)

Machine Learning in Automated Text Categorization



Before evaluating how good a first-sentence summary is, you should decide how to evaluate summarization in the first place.

In supervised learning it is typically easy to know whether a prediction matches the target: they should be identical. On top of that, you can choose a metric that fits your needs (e.g., accuracy, precision, recall) and compare classifiers.

The problem with evaluating text summarization is that deciding whether a summary is good is subjective and prone to error.

A possible metric is ROUGE, which is a set of heuristics (e.g., longest common subsequence) that compare the candidate summary to reference summaries. Note that a good ROUGE score is only an estimate, but an estimate that will let you benchmark your algorithm against others. See "Evaluation Measures for Text Summarization" by Josef Steinberger and Karel Ježek for a discussion of metrics and other algorithms. Showing that you get a good score with respect to an algorithm based on cue words would be a good result.
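
To make the idea concrete, here is a simplified, sentence-level sketch of ROUGE-L (the longest-common-subsequence variant) in plain Python. The full metric also supports stemming, multiple references, and summary-level LCS, so treat this only as an illustration:

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists
    (classic dynamic programming over a (len(a)+1) x (len(b)+1) table)."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, start=1):
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Simplified sentence-level ROUGE-L F1 between a candidate summary and one reference."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


print(rouge_l_f1("the council approved the budget",
                 "the city council approved the new budget on monday"))
```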

Another possibility is building a gold standard by comparing the first sentence to the text and manually labeling whether it is a good summary. While manual labeling will give you a good estimate of the performance of your algorithm, it is costly in terms of time. A more severe disadvantage is that this gold standard is fitted to your algorithm and is hard to reuse for other algorithms. Suppose that taking the second sentence works just as well as taking the first one; the gold standard built for the first-sentence algorithm won't be able to show that.

For a good estimate, I suggest that you use ROUGE for comparison and a gold standard for absolute results. If you have the resources to create a gold standard for the benchmark algorithms as well, the comparison will become much more robust.
