Keselman, Schubert
The paper deals with methods (models) for text summarization. The reference (base) model was "first sentence model":
As a baseline for our models we used a trivial model that repeats the
first sentence of the input document.
Then, various experiments and results are presented, like this one: (notice that "first sentence model" is always present as "baseline")
![enter image description here](https://i.stack.imgur.com/M4E6j.png)
Moreover, one of the datasets for training and evaluation of models in this paper is DUC, which may be interesting to you.
Steinberger (doctoral Thesis, 2005)
In section 2.1, the author discusses document summarization approaches based on sentence extraction. He identifies five approaches:
- Surface Level Approaches
- Corpus-based Approaches
- Cohesion-based Approaches
- Rhetoric-based Approaches
- Graph-based Approaches
(The "First Sentence Approach" belongs to the *Surface Level Approaches") The author further describes these approaches and compares them.
Khodra, Widyantoro, Aziz, Trilaksono (Journal of ICT Research and Applications, 2011)
The author identifies and tests methods for identifying the most important sentences in a text (see list of 58 items below, called features). Surprisingly, in the conclusion, it is said that position of the sentence is a dominant feature, meaning that including all other features into consideration leads only to small improvement.
- position
- sentence length
- number of words before a main verb
- adjective incidence
- existential there incidence
- incidence of 3rd person singular grammatical form
- anaphora incidence
- coordinators incidence
- cardinal number incidence
- incidence of past tense endings
- Hypernymy
- Polysemy
- concreteness index
- affect_formulai
- bad_formulaic
- comparison_formulaic
- continue_formulaic
- contrast_formulaic
- detail_formulaic
- future_formulaic
- gap_formulaic
- good_formulaic
- here_formulaic
- in_order_to_formulaic
- method_formulaic
- no_textstructure_formulaic
- similarity_formulaic
- them_formulaic
- textstructure_formulaic
- tradition_formulaic
- us_previous_formulaic
- affect
- argumentation
- better_solution
- change
- comparison
- continue
- contrast
- interest
- need
- presentation
- problem
- research
- solution
- textstructure
- use
- copula
- aim_ref_agent
- gap_agent
- general_agent
- problem_agent
- ref_agent
- ref_us_agent
- solution_agent
- textstructure_agent
- them_agent
- them_pronoun_agent
- us_agent
For you, the most important part of the paper may be table 5:
![enter image description here](https://i.stack.imgur.com/kMmsg.png)
Read carefully the explanation of the table in the paper, and the whole Section 4.3.
Other papers worth examining:
Luhn (1958)
Kupiec, Pedersen, Chen (1995)
Yang, Pedersen (1997)
Sebastiani (2002)