Verifiable Tool Invocation Receipts for CrewAI Flows

I have been exploring a practical question about agent execution integrity and auditability: when an AI agent calls a sensitive tool, what should the evidence record look like, and how can it be checked outside the original agent runtime?

AI agents increasingly call tools, APIs, and data systems. After a tool call, teams may need portable evidence of:

  • what request was made
  • what policy snapshot applied
  • which tool was used
  • what input/output evidence was recorded
  • whether an independent verifier can detect later changes
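One way to make that evidence portable is to bind each artifact into a bundle by content hash. The sketch below is illustrative only: the field names (`request_hash`, `policy_snapshot_hash`, and so on) are hypothetical and may not match the template's actual receipt profile.

```python
import hashlib
import json
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    """Content hash used to bind each artifact into the bundle."""
    return hashlib.sha256(data).hexdigest()


def canon(obj) -> bytes:
    """Canonical JSON bytes, so hashes are reproducible across runs."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


@dataclass
class EvidenceBundle:
    # Hypothetical fields mirroring the evidence list above.
    request_hash: str          # what request was made
    policy_snapshot_hash: str  # what policy snapshot applied
    tool_id: str               # which tool was used
    input_hash: str            # input evidence
    output_hash: str           # output evidence


def build_bundle(request: dict, policy: dict, tool_id: str,
                 tool_input: dict, tool_output: dict) -> EvidenceBundle:
    return EvidenceBundle(
        request_hash=sha256_hex(canon(request)),
        policy_snapshot_hash=sha256_hex(canon(policy)),
        tool_id=tool_id,
        input_hash=sha256_hex(canon(tool_input)),
        output_hash=sha256_hex(canon(tool_output)),
    )
```

Because every field is a deterministic hash of canonical bytes, an independent verifier that recomputes the hashes can detect any later change to the underlying artifacts.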

I built a small open-source CrewAI Flow template around this pattern. The Flow wraps a sensitive tool call with:

  • an exact-match policy snapshot
  • a tool manifest
  • `guarded_tool_call()`
  • an evidence bundle
  • an Ed25519-signed execution receipt
  • an independent verification report

The core reusable piece is `guarded_tool_call()`. The CrewAI Flow itself is intentionally thin orchestration around a receipt-and-validator pattern, so the validator can run outside the agent runtime that originally produced the artifacts.
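A minimal sketch of what a guarded call might look like. This is not the template's actual API: the signature, the `allowed_tools` policy field, and the returned record fields are all assumptions made for illustration.

```python
import hashlib
import json
import time
from typing import Any, Callable


def _digest(obj: Any) -> str:
    """SHA-256 over canonical JSON, so evidence hashes are reproducible."""
    return hashlib.sha256(
        json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    ).hexdigest()


def guarded_tool_call(tool_fn: Callable[..., Any], tool_id: str,
                      policy_snapshot: dict, **kwargs) -> dict:
    """Wrap a sensitive tool call and record evidence around it.

    Illustrative sketch: the real guarded_tool_call() may differ.
    """
    # Exact-match policy check: the tool must appear in the snapshot.
    if tool_id not in policy_snapshot.get("allowed_tools", []):
        raise PermissionError(f"tool {tool_id!r} not permitted by policy snapshot")
    started = time.time()
    output = tool_fn(**kwargs)
    # Evidence record binding the call to the policy snapshot and I/O.
    return {
        "tool_id": tool_id,
        "policy_snapshot_hash": _digest(policy_snapshot),
        "input_hash": _digest(kwargs),
        "output_hash": _digest(output),
        "started_at": started,
        "finished_at": time.time(),
    }
```

The record itself carries no raw input/output, only hashes, which keeps the receipt small while still letting a verifier detect tampering if the raw artifacts are retained elsewhere.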

Architecture summary:

```text
Execution request
→ policy snapshot
→ tool manifest
→ guarded tool call
→ evidence bundle
→ signed execution receipt
→ independent validator
→ verification report
```
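The last two stages, signing and independent verification, can be sketched with Ed25519 as follows. This uses the `pyca/cryptography` package and canonical JSON as the signed payload; it is an assumption-laden sketch, not the template's actual receipt format.

```python
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def _payload(receipt: dict) -> bytes:
    # Canonical JSON so signer and verifier serialize identical bytes.
    return json.dumps(receipt, sort_keys=True, separators=(",", ":")).encode()


def sign_receipt(receipt: dict, key: Ed25519PrivateKey) -> bytes:
    """Produce an Ed25519 signature over the receipt's canonical bytes."""
    return key.sign(_payload(receipt))


def verify_receipt(receipt: dict, signature: bytes,
                   pub: Ed25519PublicKey) -> bool:
    """Independent check: needs only the receipt, signature, and public key,
    so it can run outside the agent runtime that produced the receipt."""
    try:
        pub.verify(signature, _payload(receipt))
        return True
    except InvalidSignature:
        return False
```

Changing any receipt field after signing changes the canonical bytes, so verification fails, which is exactly the "detect later changes" property the evidence list above asks for.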

Links:

I would appreciate concrete feedback on the receipt profile and validator rules:

  • Are the receipt fields minimal enough?
  • Which validator checks should be mandatory by default?
  • Should replay detection be mandatory or profile-specific?
  • Should policy snapshot references include issuer or trust-anchor metadata?
  • What evidence should be present for practical third-party audit?
  • Should tool manifests be treated as evidence, policy, or both?
  • What should remain out of scope?

Scope boundary:

This validates signed execution evidence. It does not:

  • prove semantic correctness of a tool output
  • prove that the policy itself is correct
  • replace sandboxing, IAM, access control, monitoring, or human approval
  • certify legal compliance
  • require or expose raw chain-of-thought

I am especially interested in feedback on:

  • receipt profile
  • validator rules
  • CrewAI Flow integration
  • audit/evidence assumptions