Issues with complex crew architecture, tool calling / memory. Running in air-gapped environments

Ok, this one is going to be a doozy. I am working on a complex simulation benchmarking system for local LLMs using CrewAI. I was an early adopter of the tech but had to spend some time away, and boy, it has changed. (Super glad to see CrewAI take off, Joao is awesome.)

So, a breakdown of the problem: local LLMs are having issues with tool calls and memory. (I haven't had this issue with local LLMs in the past.)

Agents interact in a virtual world that is controlled by two JSON files: one holds the world state, and the other is an internal "email" communication system between the agents.

NOTE: THIS ALL NEEDS TO RUN IN AN AIR GAPPED ENVIRONMENT

I am dynamically generating the agents and crews from YAML config files. The crew runs, BUT the agents either refuse to use tools or use placeholder text instead of the proper commands. The logs run to a few million lines, so breaking them all down is pretty hard. I tried to use Gemini but it choked, so I had to use some better custom longer-context models to break them down for me.
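Since the crews are generated dynamically, a small pre-flight check over the parsed YAML can catch wiring gaps before a multi-million-line run. A sketch (the key names here are hypothetical; adapt them to your schema):

```python
def validate_crew_config(crew_cfg: dict) -> list[str]:
    """Return a list of problems found in one parsed crew config dict.
    Key names ('function_calling_llm', 'agents', 'tools', 'name') are
    placeholders for whatever the real YAML schema uses."""
    problems = []
    if not crew_cfg.get("function_calling_llm"):
        problems.append("missing function_calling_llm (memory writes will fail)")
    for agent in crew_cfg.get("agents", []):
        if not agent.get("tools"):
            problems.append(f"agent {agent.get('name', '?')!r} has no tools assigned")
    return problems
```

Running something like this over every generated config before kickoff surfaces misconfiguration as one line per crew instead of millions of lines of runtime failures.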

IV. Summary of Errors

  1. CRITICAL Configuration Error: Both Crew ‘{redacted} A’ and Crew ‘{redacted} B’ are missing their dedicated function_calling_llm instance. This is the primary and most severe error.

  2. Repeated Memory Failure: As a direct result of the missing crew LLM, every single task execution attempt for both crews fails to add information to long-term memory (Failed to add to long term memory: … ‘NoneType’ object has no attribute ‘function_calling_llm’). This prevents agents from building context and remembering information between steps.

Error:
Failed to add to long term memory: Failed to convert text into a Pydantic model due to error: ‘NoneType’ object has no attribute ‘function_calling_llm’

  3. Incorrect Agent Output Format: Many agents (especially in Crew B) failed to adhere to the critical instruction of responding only with a JSON tool call. They output plain text, explanatory text, or error messages instead.

  4. Tool Call Failures:

  • Agents frequently attempted tool calls using placeholder data ("…") instead of context gathered from (failed) memory.

  • Some agents called the wrong tool for their assigned task (e.g., calling reader instead of sender/analyzer/{world state}).

  • One agent ({redacted}) attempted to call a tool (Draft {redacted} Message) that doesn't appear in its assigned tool list or the registry.

  5. Context Contamination / Hallucination: Agents produced outputs based on incorrect or generic information, likely due to the memory failure (e.g., {redacted} _a's {redacted} report, {redacted} _a's inputs, {redacted} _b repeating {redacted} _b's errors).

  6. Output Corruption: In the later stages of Crew B's execution, {redacted} _b started producing garbage/corrupted token sequences instead of coherent text or JSON.

  7. Inconsistent Formatting: A minor issue where Crew B agents sometimes prefixed their output with ToolCall:.
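Errors 1 and 2 point at the crew-level LLM never being wired up. For reference, a minimal sketch of setting it explicitly when building a Crew, assuming an OpenAI-compatible local endpoint (the model name and URL below are placeholders, and the agents/tasks are elided):

```python
from crewai import Crew, LLM

# Placeholder model name and endpoint; point these at your local server.
local_llm = LLM(
    model="openai/my-local-model",
    base_url="http://localhost:8080/v1",
    api_key="not-needed",
)

crew = Crew(
    agents=[...],
    tasks=[...],
    memory=True,
    # Without this, memory writes die with:
    # 'NoneType' object has no attribute 'function_calling_llm'
    function_calling_llm=local_llm,
)
```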

The weird thing is that only one agent seems able to use tools. All the others refuse or send malformed data. I have tried about a dozen models so far. No luck.

Any thoughts on how to troubleshoot this?

So far, my research suggests CrewAI has not been good at supporting local hosting and memory. Am I wrong? I never had issues with that in the past. Still searching to see what I can find.

Also, I am using a few different things I may need to investigate: my own version and endpoint of llama.cpp with expanded features hosting custom models, which may be an issue. I also had to custom-patch LiteLLM to get it to work, and that may be affecting this; there was a "NoneType choices" issue that kept popping up until I patched it. I'm not really even making it to the point where the crew calls the /embeddings/ URL.
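For context, the "NoneType choices" failure happens when code indexes an OpenAI-style response's choices without checking them. A simplified sketch of the kind of guard involved (not the actual LiteLLM patch):

```python
def extract_message(response: dict) -> str:
    """Pull the assistant message out of an OpenAI-style chat completion body,
    failing loudly instead of with a 'NoneType' attribute error."""
    choices = response.get("choices")
    if not choices:  # None or empty: the upstream server returned a bad body
        raise ValueError(f"completion has no choices: {response.get('error', response)}")
    message = choices[0].get("message") or {}
    return message.get("content") or ""
```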

Anyone got any ideas? Bueller? Bueller?

Hey @jklre, thanks for sharing what you're building. It seems the issue lies in the model config. There have been a few issues raised lately about Ollama models specifically, but this is related to a LiteLLM bug, and they have PRs under review to patch the tool-calling issue; so nothing is wrong on the CrewAI side.

For your case, I'd also recommend you check out our new docs on how to implement your own custom LLM, which is great for OSS models you want to integrate:
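As a rough sketch of what that looks like, assuming your llama.cpp server exposes an OpenAI-compatible chat-completions endpoint (the class, endpoint URL, and model name below are placeholders, following the custom-LLM docs):

```python
import json
import urllib.request

from crewai import BaseLLM  # base class for custom LLM implementations

class LlamaCppLLM(BaseLLM):
    """Talks straight to a llama.cpp OpenAI-compatible server."""

    def __init__(self, model: str = "local-model",
                 endpoint: str = "http://localhost:8080/v1/chat/completions"):
        super().__init__(model=model)
        self.endpoint = endpoint

    def call(self, messages, tools=None, callbacks=None, available_functions=None) -> str:
        # CrewAI may pass a bare string or a list of chat messages.
        if isinstance(messages, str):
            messages = [{"role": "user", "content": messages}]
        payload = {"model": self.model, "messages": messages}
        if tools:
            payload["tools"] = tools
        req = urllib.request.Request(
            self.endpoint,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        choices = body.get("choices") or []
        if not choices:
            raise ValueError(f"no choices in response: {body}")
        return choices[0]["message"].get("content") or ""

    def supports_function_calling(self) -> bool:
        return True
```

You'd then pass an instance of this as the `llm` (or `function_calling_llm`) on your agents and crews.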

The other thing to keep in mind, of course, is that not all models (especially OSS ones) are good at function/tool calling, and it may take a few tries for the agent to pass in the correct input.

If you still run into issues, feel free to sketch a diagram of your crew, or use one of the LLMs to generate a mermaid diagram of how each component interacts; you can redact the names, and it will help us troubleshoot this for you better.

Thanks Tony. Is there any way to cut out or bypass LiteLLM? We recently found a zero-day in it that allows bad actors full access to your endpoints.