
ShieldAgent
Shielding Agents via Verifiable Safety Policy Reasoning
¹University of Chicago, ²University of Illinois at Urbana-Champaign

Automatic Extraction: Parses policy documents and extracts verifiable rules
Rule Refinement: Iteratively refines rules to ensure they can be concretely verified
Action-Based Circuits: Organizes rules into probabilistic circuits for each agent action
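
To make the grouping step concrete, here is a minimal sketch of how extracted rules could be organized per agent action. The `Rule` schema, the `weight` field, the `build_action_circuits` helper, and the example rules are all hypothetical and chosen only for illustration; the page does not specify ShieldAgent's actual data structures, and a plain per-action list stands in for a real probabilistic circuit.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One verifiable rule extracted from a policy document (hypothetical schema)."""
    rule_id: str
    action: str          # agent action the rule constrains, e.g. "click" or "type"
    description: str     # natural-language statement of the rule
    weight: float = 1.0  # assumed importance weight, not taken from the source


def build_action_circuits(rules):
    """Group rules by the agent action they constrain, preserving input order."""
    circuits = defaultdict(list)
    for rule in rules:
        circuits[rule.action].append(rule)
    return dict(circuits)


# Illustrative rules only; these are not from any real policy document.
rules = [
    Rule("r1", "type", "Do not enter passwords on untrusted pages."),
    Rule("r2", "click", "Do not confirm purchases without user consent."),
    Rule("r3", "type", "Do not post personal data in public forms."),
]
circuits = build_action_circuits(rules)
```

Keying the structure by action means that a later lookup for a given action touches only the rules that can apply to it, which is the property the localization step relies on.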

Efficient Localization: Identifies relevant rule circuits for each action
Formal Verification: Executes specialized tools to verify each policy rule
Probabilistic Inference: Assigns safety labels and identifies rule violations
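
The three steps above can be sketched as a small pipeline: look up the rules for an action, run a check for each rule, and turn the violations into a safety label. This is a toy illustration under stated assumptions, not ShieldAgent's implementation: the rule dictionaries, the lambda-based checks, the `exp(-penalty)` scoring, and the 0.5 threshold are all hypothetical stand-ins for the specialized verification tools and the probabilistic inference that the page mentions without detailing.

```python
import math


def localize(circuits, action):
    """Return only the rules registered for this action (empty if none)."""
    return circuits.get(action, [])


def shield(circuits, action, observation, threshold=0.5):
    """Label an action as safe or unsafe and report which rules it violates."""
    rules = localize(circuits, action)
    # A rule's check returns True when the observation complies with the rule.
    violated = [r for r in rules if not r["check"](observation)]
    # Toy scoring (an assumption): each violation lowers the safety
    # probability exponentially in the rule's weight.
    penalty = sum(r["weight"] for r in violated)
    p_safe = math.exp(-penalty)
    label = "safe" if p_safe >= threshold else "unsafe"
    return label, p_safe, [r["id"] for r in violated]


# Hypothetical circuit with one keyword-based check standing in for a real verifier.
circuits = {
    "type": [
        {
            "id": "r1",
            "weight": 2.0,
            "check": lambda obs: "password" not in obs["text"],
        },
    ],
}

risky = shield(circuits, "type", {"text": "my password is hunter2"})
benign = shield(circuits, "type", {"text": "hello world"})
unconstrained = shield(circuits, "click", {"text": "submit"})
```

In this sketch an action with no registered rules is treated as safe by default; a stricter deployment might prefer the opposite default.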

Comprehensive Coverage: 2K instructions across 7 risk categories and 6 web environments
Attack Simulation: Risky trajectories generated under agent-based and environment-based attacks
Explicit Risk Definition: Clear safety policies and violation mechanisms



ShieldAgent outperforms prior methods by 11.3% on SA-Guard and by 9.4% on existing benchmarks, with a high rule recall of 90.1%.

ShieldAgent reduces API queries by 64.7% and inference time by 58.2%, providing significant computational savings while maintaining high accuracy.
BibTeX
@article{chen2024shieldagent,
title={ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning},
author={Chen, Zhaorun and Kang, Mintong and Li, Bo},
journal={arXiv preprint arXiv:XXXX.XXXXX},
year={2024}
}
All content on this website is for research purposes only. If you have any questions or concerns, please contact us.