Methods For Text Summarization Evaluation

Deutsch, Daniel

Methods For Text Summarization Evaluation

Files

Deutsch_upenngdas_0175C_15195.pdf (3.06 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Computer and Information Science

Subject

evaluation
evaluation metrics
question-answering
summarization
Artificial Intelligence and Robotics

Copyright date

2022-10-05T20:22:00-07:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/32097

View all metadata

Author

Deutsch, Daniel

Abstract

The ability to effectively evaluate a learned model is a critical component of machine learning research; without it, progress on tasks cannot be measured and is thus impossible. In the natural language processing task of text summarization, evaluation is incredibly difficult: the notion of the "perfect" summary content is ill-defined, but even if it could be defined, that content can be expressed in many different ways, making it difficult to identify in a summary. The evaluation metrics that researchers propose for text summarization must overcome these challenges in some way. In this thesis, I identify problems with the existing methodologies for evaluating summaries as well as meta-evaluating the quality of an evaluation metric and propose solutions for improving them. I demonstrate that commonly used evaluation metrics fail to properly evaluate the information content of summaries and propose an evaluation metric based on question-answering to address the shortcomings of existing metrics. Then, I argue that the class of metrics which attempt to evaluate the quality of a summary's content without the aid of a human-written reference is inherently biased and limited in its ability to evaluate summaries. Finally, I identify that the methodology for quantifying how well an automatic metric agrees with human judgments of summary quality fails to provide a complete understanding of a metric's performance. To that end, I propose new statistical analysis tools to address the limitations of the standard meta-evaluation procedure and provide a new protocol for meta-evaluating metrics that better evaluates metrics in realistic use cases.

Advisor

Dan Roth

Date of degree

2022-01-01

Collection

Dissertations and Theses

Methods For Text Summarization Evaluation

Files

Embargo Date

Degree type

Graduate group

Discipline

Subject

Funder

Grant number

License

Copyright date

Distributor

Related resources

Permalink

Author

Contributor

Abstract

Advisor

Date of degree

Date Range for Data Collection (Start Date)

Date Range for Data Collection (End Date)

Digital Object Identifier

Series name and number

Volume number

Issue number

Publisher

Publisher DOI

Journal Issues

Comments

Recommended citation

Collection

Penn's Heritage