General Versus Specific Sentences: Automatic Identification and Application to Analysis of News Summaries

Louis, Annie; Nenkova, Ani

Files

MS_CIS_11_07.pdf (150.58 KB)

Related Collections

Technical Reports (CIS)

Permalink

https://repository.upenn.edu/handle/20.500.14332/7921

Author

Louis, Annie

Nenkova, Ani

Abstract

In this paper, we introduce the task of identifying general and specific sentences in news articles. Instead of embarking on a new annotation effort to obtain data for the task, we explore the possibility of leveraging existing large corpora annotated with discourse information to train a classifier. We introduce several classes of features that capture lexical and syntactic information, as well as word specificity and polarity. We then use the classifier to analyze the distribution of general and specific sentences in human and machine summaries of news articles. We discover that while all types of summaries tend to be more specific than the original documents, human abstracts contain a more balanced mix of general and specific sentences, whereas automatic summaries are overwhelmingly specific. Our findings give strong evidence for the need for a new task in (abstractive) summarization: identification and generation of general sentences.
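
The distribution analysis described in the abstract can be illustrated with a minimal sketch. It assumes each sentence has already been labeled "general" or "specific" by some classifier; the function name `specificity_profile` and the label lists are hypothetical and made up for illustration, not taken from the report or its data.

```python
from collections import Counter


def specificity_profile(labels):
    """Return the fraction of sentences labeled 'general' and 'specific'.

    `labels` is a list of per-sentence labels, assumed to come from a
    general/specific sentence classifier (not implemented here).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # An empty summary has no sentences to profile.
        return {"general": 0.0, "specific": 0.0}
    return {
        "general": counts["general"] / total,
        "specific": counts["specific"] / total,
    }


# Hypothetical classifier output, for illustration only.
human_abstract = ["general", "specific", "general", "specific"]
machine_summary = ["specific", "specific", "specific", "general"]

human_profile = specificity_profile(human_abstract)
machine_profile = specificity_profile(machine_summary)
```

Comparing such profiles across source documents, human abstracts, and automatic summaries is one way to express the kind of contrast the abstract reports.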

Publication date

2011-01-01

Comments

University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-11-07.

Collection

Reports
