Reinforcement Learning With High-Level Task Specifications

Degree type

Doctor of Philosophy (PhD)

Graduate group

Electrical & Systems Engineering

Subject

Game theory
Inverse reinforcement learning
Learning-based control
Learning from demonstration
Reinforcement learning
Temporal logic specifications
Artificial Intelligence and Robotics
Computer Sciences

Copyright date

2019-10-23

Abstract

Reinforcement learning (RL) has been widely used, for example, in robotics, recommendation systems, and financial services. Existing RL algorithms typically optimize reward-based surrogates rather than the task performance itself. As a result, they suffer from several shortcomings in providing guarantees for the task performance of the learned policies: an optimal policy for a surrogate objective may not achieve optimal task performance; a reward function that yields satisfactory task performance in one environment may not transfer well to another; and RL algorithms tackle nonlinear, nonconvex optimization problems and may, in general, not be able to find globally optimal policies. The goal of this dissertation is to develop RL algorithms that explicitly account for formal high-level task specifications and equip the learned policies with provable guarantees for the satisfaction of these specifications. The resulting RL and inverse RL algorithms utilize multiple representations of task specifications, including conventional reward functions, expert demonstrations, temporal logic formulas, and trajectory-based constraint functions, as well as their combinations. These algorithms offer several promising capabilities. First, they automatically generate a memory transition system, which is critical for tasks that cannot be implemented by memoryless policies. Second, the formal specifications can act as reliable performance criteria for the learned policies regardless of the quality of the designed reward functions and variations in the underlying environments. Third, the algorithms enable online RL that never violates critical task and safety requirements, even during exploration.
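
The following is a minimal, hypothetical sketch (not taken from the dissertation) of the kind of memory transition system mentioned above: a small labeled MDP is composed with a deterministic finite automaton for the task "first visit A, then visit B", and the automaton state serves as the memory that lets a memoryless policy over the product satisfy the specification. All names (build_product, dfa_delta, the toy states and labels) are illustrative assumptions.

```python
# Hypothetical sketch: product of a labeled MDP with a DFA "memory" for the
# task "first visit A, then visit B". Names and structure are illustrative,
# not taken from the dissertation.

from itertools import product

# Small deterministic MDP: states, actions, and transitions.
mdp_states = ["s0", "sA", "sB"]
actions = ["go_A", "go_B", "stay"]
mdp_next = {
    ("s0", "go_A"): "sA", ("s0", "go_B"): "sB", ("s0", "stay"): "s0",
    ("sA", "go_A"): "sA", ("sA", "go_B"): "sB", ("sA", "stay"): "sA",
    ("sB", "go_A"): "sA", ("sB", "go_B"): "sB", ("sB", "stay"): "sB",
}
label = {"s0": None, "sA": "A", "sB": "B"}  # atomic proposition seen in each state

# DFA for "eventually A, and after that eventually B".
dfa_states = ["q0", "q1", "q_acc"]
def dfa_delta(q, obs):
    if q == "q0" and obs == "A":
        return "q1"
    if q == "q1" and obs == "B":
        return "q_acc"
    return q
accepting = {"q_acc"}

# Product construction: a product state (s, q) carries the automaton state q
# as memory, so a memoryless policy over the product can realize the task.
def build_product():
    trans = {}
    for (s, q), a in product(product(mdp_states, dfa_states), actions):
        s_next = mdp_next[(s, a)]
        q_next = dfa_delta(q, label[s_next])
        trans[((s, q), a)] = (s_next, q_next)
    return trans

prod_trans = build_product()

# A policy over the product state still "remembers" whether A has been
# visited: go to A first, then to B, then stay.
def policy(s, q):
    if q == "q0":
        return "go_A"
    if q == "q1":
        return "go_B"
    return "stay"

# Roll out the policy and check acceptance of the task automaton.
state, q = "s0", dfa_delta("q0", label["s0"])
for _ in range(5):
    a = policy(state, q)
    state, q = prod_trans[((state, q), a)]
print("task satisfied:", q in accepting)  # expected: True
```

Over the original MDP alone, no memoryless policy can distinguish "A not yet visited" from "A already visited"; the automaton component of the product supplies exactly that memory.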

Date of degree

2019-01-01
