Methods for Text Summarization Evaluation

Degree type

Doctor of Philosophy (PhD)

Graduate group

Computer and Information Science

Subject

evaluation
evaluation metrics
question-answering
summarization
Artificial Intelligence and Robotics

Copyright date

2022-10-05T20:22:00-07:00

Abstract

The ability to effectively evaluate a learned model is a critical component of machine learning research; without it, progress on a task cannot be measured and is therefore impossible. In the natural language processing task of text summarization, evaluation is extremely difficult: the notion of the "perfect" summary content is ill-defined, and even if it could be defined, that content can be expressed in many different ways, making it difficult to identify in a summary. The evaluation metrics that researchers propose for text summarization must overcome these challenges in some way. In this thesis, I identify problems with the existing methodologies both for evaluating summaries and for meta-evaluating the quality of an evaluation metric, and I propose solutions for improving them. I demonstrate that commonly used evaluation metrics fail to properly evaluate the information content of summaries and propose an evaluation metric based on question answering to address the shortcomings of existing metrics. Then, I argue that the class of metrics that attempt to evaluate the quality of a summary's content without the aid of a human-written reference is inherently biased and limited in its ability to evaluate summaries. Finally, I find that the standard methodology for quantifying how well an automatic metric agrees with human judgments of summary quality fails to provide a complete understanding of a metric's performance. To that end, I propose new statistical analysis tools to address the limitations of the standard meta-evaluation procedure and provide a new meta-evaluation protocol that better evaluates metrics in realistic use cases.
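The standard meta-evaluation procedure referred to in the abstract scores a set of summarization systems with both an automatic metric and human annotators, then measures how strongly the two agree. The Python sketch that follows is a minimal illustration under stated assumptions, not the thesis's implementation. It computes a system-level Pearson correlation, then attaches a percentile bootstrap confidence interval by resampling both systems and input documents, one common way to express uncertainty in such a correlation. The function names, the hypothetical `metric[i][j]` / `human[i][j]` layout (score of system i on input document j), and the choice of Pearson's r are all assumptions made for illustration.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson's r between two equal-length sequences of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / (var_x * var_y) ** 0.5


def system_level_correlation(metric, human):
    """Correlate per-system average metric scores with per-system average human scores.

    Hypothetical layout: metric[i][j] and human[i][j] are the scores that
    system i's summary of input document j received.
    """
    sys_metric = [mean(row) for row in metric]
    sys_human = [mean(row) for row in human]
    return pearson(sys_metric, sys_human)


def bootstrap_ci(metric, human, n_samples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the system-level correlation.

    Each resample draws systems and documents independently, with replacement.
    """
    rng = random.Random(seed)
    n_sys, n_docs = len(metric), len(metric[0])
    samples = []
    for _ in range(n_samples):
        sys_idx = [rng.randrange(n_sys) for _ in range(n_sys)]
        doc_idx = [rng.randrange(n_docs) for _ in range(n_docs)]
        m = [[metric[i][j] for j in doc_idx] for i in sys_idx]
        h = [[human[i][j] for j in doc_idx] for i in sys_idx]
        try:
            samples.append(system_level_correlation(m, h))
        except ZeroDivisionError:
            # Degenerate resample (e.g., the same system drawn every time): skip it.
            continue
    samples.sort()
    lo_idx = int(len(samples) * alpha / 2)
    hi_idx = min(len(samples) - 1, int(len(samples) * (1 - alpha / 2)))
    return samples[lo_idx], samples[hi_idx]
```

For example, `system_level_correlation(metric, human)` returns a single correlation value, while `bootstrap_ci(metric, human)` returns a `(low, high)` interval. A wide interval signals that the correlation could change noticeably under a different sample of systems and documents, which is the kind of information a single correlation number does not convey.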

Date of degree

2022-01-01
