Great find. AgentLeak highlights a real blind spot — most governance happens at the output layer, but sensitive data leaks through intermediate messages, shared memory, and tool inputs long before the final output.
I’ve been working on this problem from a different angle. Instead of auditing after the fact, I built deterministic governance that intercepts at every layer before data moves between agents.
Here’s what works in practice with CrewAI:
Governed Memory — every write to shared memory passes through a Constitution Enforcer before it’s stored. The enforcer uses deterministic pattern matching (not another LLM) to block PII, API keys, and other sensitive data from ever entering the memory store. If a write doesn’t pass governance, it never reaches the next agent.
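To make the idea concrete, here’s a minimal sketch of a deterministic memory gate. The names (`governed_write`, `GovernanceViolation`) and the specific patterns are illustrative, not the actual dof-sdk API:

```python
import re

# Hypothetical deterministic patterns; a real enforcer would carry many more.
BLOCKED_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "api_key": re.compile(r"\b(?:sk|pk)[-_][A-Za-z0-9]{16,}\b"),
}

class GovernanceViolation(Exception):
    pass

def governed_write(store: dict, key: str, value: str) -> None:
    """Write to shared memory only if no sensitive pattern matches."""
    for name, pattern in BLOCKED_PATTERNS.items():
        if pattern.search(value):
            # The value never touches the store, so no downstream agent sees it.
            raise GovernanceViolation(f"blocked: {name} pattern in memory write")
    store[key] = value
```

Because the gate sits in front of the store rather than auditing it afterwards, a blocked value simply never exists in shared memory.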
AST Verification on tool inputs — when agents generate code that calls tools, static analysis via Python’s ast module catches dangerous patterns (eval, subprocess, hardcoded secrets) before execution. This covers the “tool input” leakage channel AgentLeak describes.
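A bare-bones version of that check looks like this, assuming tool inputs arrive as Python source strings. The function name and the specific deny-lists are illustrative, not the dof-sdk API:

```python
import ast

DANGEROUS_CALLS = {"eval", "exec", "compile", "__import__"}
DANGEROUS_MODULES = {"subprocess", "os"}

def audit_tool_input(source: str) -> list[str]:
    """Statically walk the AST; return findings (empty list = passed)."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Direct calls to dangerous builtins, e.g. eval("...")
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DANGEROUS_CALLS:
                findings.append(f"dangerous call: {node.func.id}")
        # Imports of shell/process modules
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            names = [a.name for a in node.names]
            module = getattr(node, "module", None)
            for n in names + ([module] if module else []):
                if n.split(".")[0] in DANGEROUS_MODULES:
                    findings.append(f"dangerous import: {n}")
        # Crude hardcoded-secret heuristic on string constants
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if node.value.startswith(("sk-", "AKIA")):
                findings.append("possible hardcoded secret")
    return findings
```

The point is that this runs before execution: nothing the model generated has been evaluated by the time the findings come back.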
The crew_factory pattern — on retries, the entire crew is rebuilt with fresh provider assignments. This breaks the correlation between a compromised intermediate state and the retry, which is relevant to AgentLeak’s finding about state persistence across agent interactions.
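Schematically, the retry path looks like the sketch below. `Agent` here is a simplified stand-in, not the real CrewAI class, and the provider names are made up; the point is that every attempt gets freshly constructed agents with no carried-over state:

```python
import random
from typing import Callable, Optional

PROVIDERS = ["provider_a", "provider_b", "provider_c"]  # illustrative names

class Agent:
    def __init__(self, role: str, provider: str):
        self.role = role
        self.provider = provider
        self.scratch: list[str] = []  # intermediate state, discarded on rebuild

def crew_factory(roles: list[str], rng: random.Random) -> list[Agent]:
    """Rebuild the whole crew with freshly drawn provider assignments."""
    return [Agent(role, rng.choice(PROVIDERS)) for role in roles]

def run_with_retries(roles: list[str],
                     task: Callable[[list[Agent]], str],
                     attempts: int = 3) -> Optional[str]:
    rng = random.Random()
    for _ in range(attempts):
        crew = crew_factory(roles, rng)  # brand-new agents every attempt
        try:
            return task(crew)
        except RuntimeError:
            continue  # retry never sees the previous attempt's crew
    return None
```

Rebuilding rather than reusing is what breaks the correlation: any state a compromised attempt accumulated lives only in objects the retry never touches.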
Inter-agent message governance — every output that flows from one agent to another passes through the same 7-layer governance stack. There’s no unmonitored channel between agents because the governance layer sits in the execution path, not alongside it.
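The in-path placement can be sketched in a few lines. This toy stack has two checks where the real one has seven, and all names are illustrative, but the structural point carries: delivery is only reachable through the checks, so there is no ungoverned channel:

```python
import re
from typing import Callable, Optional

Check = Callable[[str], Optional[str]]  # returns a violation message or None

def pii_check(msg: str) -> Optional[str]:
    return "PII detected" if re.search(r"[\w.+-]+@[\w-]+\.\w+", msg) else None

def secret_check(msg: str) -> Optional[str]:
    return "secret detected" if re.search(r"\bsk-[A-Za-z0-9]{16,}\b", msg) else None

GOVERNANCE_STACK: list[Check] = [pii_check, secret_check]

def send(message: str, deliver: Callable[[str], None]) -> None:
    """Every inter-agent message runs the full stack before delivery."""
    for check in GOVERNANCE_STACK:
        violation = check(message)
        if violation:
            raise PermissionError(f"message blocked: {violation}")
    deliver(message)  # reachable only after every check passes
```

Sitting in the execution path (rather than tailing a log) is what makes this enforcement instead of observation.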
The key insight: you can’t solve inter-agent leakage by checking the final output. You have to put governance IN the pipeline, not around it. 5 of our 7 governance layers use zero LLM tokens — deterministic checks that can’t be bypassed by prompt injection.
Validated with 685 tests across 15 modules, externally verified through GitHub Actions CI. Works with CrewAI out of the box:
pip install dof-sdk
The framework is open source at Cyberpaisa/deterministic-observability-framework on GitHub: deterministic architecture for multi-agent observability, verification, and controlled degradation.
Haven’t run the AgentLeak benchmark against DOF yet — that would be interesting. Their 32 attack classes should map well onto our TestGenerator, which already produces 400 adversarial tests across 4 categories. Happy to collaborate on that.
Honest caveat: our hallucination detection is regex-based (0% FDR on semantic adversarial tests). We catch structural PII leaks deterministically but not semantic ones. AgentLeak’s contribution on semantic leakage channels is something we’re actively thinking about.