Safe Reinforcement Learning Using Shield Synthesis for LTL Modulo Theories
In recent years, machine learning models, and in particular reinforcement learning (RL), have achieved remarkable success in various domains, including sequential decision-making tasks. However, these models tend to exhibit unsafe behaviors, precluding their deployment in safety-critical systems. To cope with this issue, considerable research focuses on methods that guarantee the safe behavior of a given RL model, a line of work referred to as "safe reinforcement learning".
A prominent approach to safe RL is shielding, which incorporates an external component, called a shield, generated using formal methods, that blocks or corrects unwanted behavior at runtime. Despite significant progress, shielding suffers from a major setback: classical shielding starts from properties in propositional temporal logics, typically LTL, and is unsuitable for richer logics. This, in turn, limits the widespread applicability of shielding in many real-world systems where the dynamics are complex.
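To make the shielding setup concrete, the sketch below shows how a shield might wrap a learned policy at runtime. It is a minimal illustration under stated assumptions: the names ShieldedPolicy, is_safe, and safe_fallback are hypothetical and do not refer to our implementation or any particular library.

```python
# Minimal sketch of runtime shielding (illustrative names only).
# Assumes a hypothetical safety check `is_safe(state, action)` obtained from
# a synthesized shield, and a `safe_fallback(state)` that always returns a
# provably safe action.

class ShieldedPolicy:
    def __init__(self, policy, is_safe, safe_fallback):
        self.policy = policy                # learned RL policy: state -> action
        self.is_safe = is_safe              # shield's check for (state, action)
        self.safe_fallback = safe_fallback  # guaranteed-safe corrective action

    def act(self, state):
        proposed = self.policy(state)       # action proposed by the RL agent
        if self.is_safe(state, proposed):
            return proposed                 # safe: pass through unchanged
        return self.safe_fallback(state)    # unsafe: block and correct


# Toy usage: cap the commanded speed at 1.0.
shielded = ShieldedPolicy(
    policy=lambda s: 2.0 * s,               # aggressive learned policy
    is_safe=lambda s, a: a <= 1.0,          # safety check from the shield
    safe_fallback=lambda s: 1.0,            # safe substitute action
)
print(shielded.act(0.3))  # 0.6 (allowed)
print(shielded.act(0.9))  # 1.0 (corrected)
```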
In this work, we address this gap and extend shielding to LTL modulo theories, building upon recent advances in reactive synthesis modulo theories. LTL modulo theories allows the use of both temporal modalities and literals from arbitrary first-order theories. This enables a novel approach for generating shields that conform to complex safety specifications in these more expressive logics.
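As a purely illustrative (hypothetical) example of such a specification, a safety property in LTL modulo linear arithmetic can mix temporal operators with arithmetic literals over numeric variables, rather than Boolean propositions:

```latex
% Illustrative LTL modulo theories safety property (not taken from the paper):
% "always, if the distance d to an obstacle drops below 2, then at the next
%  step the commanded speed v is at most 1."
% The atoms (d < 2) and (v <= 1) are linear-arithmetic literals.
\[
  \mathsf{G}\bigl( (d < 2) \rightarrow \mathsf{X}\,(v \le 1) \bigr)
\]
```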
This is a "neurosymbolic" solution, in the sense that we build a system that combines the complex goals and behaviors of RL with the safety guarantees of formal synthesis.
The roadmap of the talk is to (1) motivate safe reinforcement learning, (2) introduce temporal logic and reactive synthesis, (3) present our work on synthesis modulo theories, and (4) describe how we extend synthesis modulo theories to shielding.