Verifiable Tool Invocation Receipts for CrewAI Flows

I have been exploring a practical question about agent execution integrity and auditability: when an AI agent calls a sensitive tool, what should the evidence record look like, and how can it be checked outside the original agent runtime?

AI agents increasingly call tools, APIs, and data systems. After a tool call, teams may need portable evidence of:

  • what request was made
  • what policy snapshot applied
  • which tool was used
  • what input/output evidence was recorded
  • whether an independent verifier can detect later changes
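One way to make that evidence portable is to bind each artifact into a bundle by content hash. The sketch below is illustrative only: the field names (`request_hash`, `policy_snapshot_hash`, and so on) are hypothetical and may not match the template's actual receipt profile.

```python
import hashlib
import json
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    """Content hash used to bind each artifact into the bundle."""
    return hashlib.sha256(data).hexdigest()


def canon(obj) -> bytes:
    """Canonical JSON bytes, so hashes are reproducible across runs."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


@dataclass
class EvidenceBundle:
    # Hypothetical fields mirroring the evidence list above.
    request_hash: str          # what request was made
    policy_snapshot_hash: str  # what policy snapshot applied
    tool_id: str               # which tool was used
    input_hash: str            # input evidence
    output_hash: str           # output evidence


def build_bundle(request: dict, policy: dict, tool_id: str,
                 tool_input: dict, tool_output: dict) -> EvidenceBundle:
    return EvidenceBundle(
        request_hash=sha256_hex(canon(request)),
        policy_snapshot_hash=sha256_hex(canon(policy)),
        tool_id=tool_id,
        input_hash=sha256_hex(canon(tool_input)),
        output_hash=sha256_hex(canon(tool_output)),
    )
```

Because every field is a deterministic hash of canonical bytes, an independent verifier that recomputes the hashes can detect any later change to the underlying artifacts.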

I built a small open-source CrewAI Flow template around this pattern. The Flow wraps a sensitive tool call with:

  • an exact-match policy snapshot
  • a tool manifest
  • `guarded_tool_call()`
  • an evidence bundle
  • an Ed25519-signed execution receipt
  • an independent verification report

The core reusable piece is `guarded_tool_call()`. The CrewAI Flow itself is intentionally thin orchestration around a receipt-and-validator pattern, so the validator can run outside the agent runtime that originally produced the artifacts.
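A minimal sketch of what a guarded call might look like. This is not the template's actual API: the signature, the `allowed_tools` policy field, and the returned record fields are all assumptions made for illustration.

```python
import hashlib
import json
import time
from typing import Any, Callable


def _digest(obj: Any) -> str:
    """SHA-256 over canonical JSON, so evidence hashes are reproducible."""
    return hashlib.sha256(
        json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    ).hexdigest()


def guarded_tool_call(tool_fn: Callable[..., Any], tool_id: str,
                      policy_snapshot: dict, **kwargs) -> dict:
    """Wrap a sensitive tool call and record evidence around it.

    Illustrative sketch: the real guarded_tool_call() may differ.
    """
    # Exact-match policy check: the tool must appear in the snapshot.
    if tool_id not in policy_snapshot.get("allowed_tools", []):
        raise PermissionError(f"tool {tool_id!r} not permitted by policy snapshot")
    started = time.time()
    output = tool_fn(**kwargs)
    # Evidence record binding the call to the policy snapshot and I/O.
    return {
        "tool_id": tool_id,
        "policy_snapshot_hash": _digest(policy_snapshot),
        "input_hash": _digest(kwargs),
        "output_hash": _digest(output),
        "started_at": started,
        "finished_at": time.time(),
    }
```

The record itself carries no raw input/output, only hashes, which keeps the receipt small while still letting a verifier detect tampering if the raw artifacts are retained elsewhere.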

Architecture summary:

```text
Execution request
→ policy snapshot
→ tool manifest
→ guarded tool call
→ evidence bundle
→ signed execution receipt
→ independent validator
→ verification report
```
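The last two stages, signing and independent verification, can be sketched with Ed25519 as follows. This uses the `pyca/cryptography` package and canonical JSON as the signed payload; it is an assumption-laden sketch, not the template's actual receipt format.

```python
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def _payload(receipt: dict) -> bytes:
    # Canonical JSON so signer and verifier serialize identical bytes.
    return json.dumps(receipt, sort_keys=True, separators=(",", ":")).encode()


def sign_receipt(receipt: dict, key: Ed25519PrivateKey) -> bytes:
    """Produce an Ed25519 signature over the receipt's canonical bytes."""
    return key.sign(_payload(receipt))


def verify_receipt(receipt: dict, signature: bytes,
                   pub: Ed25519PublicKey) -> bool:
    """Independent check: needs only the receipt, signature, and public key,
    so it can run outside the agent runtime that produced the receipt."""
    try:
        pub.verify(signature, _payload(receipt))
        return True
    except InvalidSignature:
        return False
```

Changing any receipt field after signing changes the canonical bytes, so verification fails, which is exactly the "detect later changes" property the evidence list above asks for.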

Links:

I would appreciate concrete feedback on the receipt profile and validator rules:

  • Are the receipt fields minimal enough?
  • Which validator checks should be mandatory by default?
  • Should replay detection be mandatory or profile-specific?
  • Should policy snapshot references include issuer or trust-anchor metadata?
  • What evidence should be present for practical third-party audit?
  • Should tool manifests be treated as evidence, policy, or both?
  • What should remain out of scope?

Scope boundary:

This validates signed execution evidence. It does not:

  • prove semantic correctness of a tool output
  • prove that the policy itself is correct
  • replace sandboxing, IAM, access control, monitoring, or human approval
  • certify legal compliance
  • require or expose raw chain-of-thought

I am especially interested in feedback on:

  • receipt profile
  • validator rules
  • CrewAI Flow integration
  • audit/evidence assumptions