General Versus Specific Sentences: Automatic Identification and Application to Analysis of News Summaries

Author

Louis, Annie

Abstract

In this paper, we introduce the task of identifying general and specific sentences in news articles. Instead of embarking on a new annotation effort to obtain data for the task, we explore the possibility of leveraging existing large corpora annotated with discourse information to train a classifier. We introduce several classes of features that capture lexical and syntactic information, as well as word specificity and polarity. We then use the classifier to analyze the distribution of general and specific sentences in human and machine summaries of news articles. We discover that while all types of summaries tend to be more specific than the original documents, human abstracts contain a more balanced mix of general and specific sentences, whereas automatic summaries are overwhelmingly specific. Our findings give strong evidence for the need for a new task in (abstractive) summarization: identification and generation of general sentences.

Publication date

2011-01-01

Comments

University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-11-07.
