Hi,
We have been using crewAI to prototype a feature, and it's running under Amazon Bedrock AgentCore. We ran into issues when using azure/o3 and azure/o3-mini as our LLMs. The agent is supposed to use crewAI's FileReadTool, and while it sometimes does, tool usage is very intermittent; the common behaviour is skipping the tool entirely. For instance, it may call the tool in 1 out of 10 runs.
We also tested the models below, and all of them performed the tool calling accurately in every run, without exception:
- bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0
- gemini/gemini-2.5-pro
- azure/gpt-4.1
OS and package versions:
- macOS: 15.5 (24F74)
- crewai>=0.150.0
- crewai_tools==0.58.0
We have also tried `function_calling_llm: azure/gpt-4.1`, and it did not work for either o3 or o3-mini.
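For reference, this is roughly how we set it in the agent YAML (a minimal sketch; the full config is below):

```yaml
analyst:
  llm: azure/o3
  function_calling_llm: azure/gpt-4.1  # delegate tool-call formatting to GPT-4.1
```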
Below are the agent's YAML, the task's YAML, and the Python code. I can't share everything, so I tried to keep it minimally reproducible.
```yaml
analyst:
  role: >
    Strategic Analyst
  goal: >
    Analyze multiple markdown files ({markdown_files}).
  backstory: >
    You are a top-tier analyst.
  allow_delegation: false
  llm: azure/o3
  temperature: 0.1
```
```yaml
markdown_generation_task:
  description: >
    Read all files ({markdown_files}) using the FileReadTool.
  expected_output: >
    A single, comprehensive Markdown document.
  agent: analyst
```
```python
from crewai import Agent, Task
from crewai.project import agent, task
from crewai_tools import FileReadTool

file_tool = FileReadTool()

# These two methods live inside our @CrewBase class (trimmed for brevity).

@agent
def analyst(self) -> Agent:
    return Agent(
        config=self.agents_config["analyst"],
        tools=[file_tool],
        verbose=True,
    )

@task
def markdown_generation_task(self) -> Task:
    return Task(
        config=self.tasks_config["markdown_generation_task"],
        output_file="file.md",
        tools=[file_tool],
    )
```
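For completeness, the crew is wired up with the standard `@CrewBase` pattern; a trimmed sketch (the class name is a placeholder):

```python
from crewai import Crew, Process
from crewai.project import CrewBase, crew


@CrewBase
class AnalysisCrew:
    """Holds the @agent and @task methods shown above."""

    @crew
    def crew(self) -> Crew:
        return Crew(
            agents=self.agents,  # collected from the @agent methods
            tasks=self.tasks,    # collected from the @task methods
            process=Process.sequential,
            verbose=True,
        )
```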
Well, there’s a pretty long discussion on the CrewAI GitHub about failures in function calling (tool usage). Here I shared my two cents, and here I proposed a proof-of-concept for how we might be able to improve this aspect of the framework.
Right off the bat, I’d suggest paying closer attention to your prompt engineering. Try to be more explicit about how your agent is supposed to use its tools. I think that could significantly improve the chances of them being used correctly.
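For instance, something along these lines in your task description (just a rough sketch, untested with o3):

```yaml
markdown_generation_task:
  description: >
    For EACH file in {markdown_files}, you MUST first call the
    FileReadTool with that file's path. Never answer from memory;
    if a file cannot be read, report the tool error instead of
    inventing content.
```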
I’d also suggest sharing the logs/output from your execution. That way, other users can get a better handle on the issues you’re running into and can actually contribute to the discussion.
Hey Max,
Thank you for taking the time to answer and for the heads-up and insights.
I have been following the GitHub issues, but I’m not sure if they are related to my case. I have run the same prompt with Claude 4 over 30 times, and a smaller number of times with Gemini-2.5-Pro and GPT-4.1. With these models, the process works as intended (i.e., by calling the tool) 100% of the time.
This leads me to believe the issue lies with the combination of the o3 family of models and CrewAI. When I run the same query using only LiteLLM, the models seem to work correctly, although I don’t have reliable data on that yet. This points toward the issue mentioned in your second link; it’s likely that the way CrewAI prompts the o3 models is not optimal, and I will investigate this further.
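For the record, the plain-LiteLLM check looked roughly like this (the tool schema is hand-written to mirror FileReadTool, so treat it as a sketch):

```python
import litellm

# Hand-written schema approximating FileReadTool, just for this check.
tools = [{
    "type": "function",
    "function": {
        "name": "file_read_tool",
        "description": "Read the contents of a file at the given path.",
        "parameters": {
            "type": "object",
            "properties": {"file_path": {"type": "string"}},
            "required": ["file_path"],
        },
    },
}]

response = litellm.completion(
    model="azure/o3",
    messages=[{"role": "user", "content": "Read report.md and summarize it."}],
    tools=tools,
)
# o3 seemed to return tool_calls here, though we haven't measured it rigorously.
print(response.choices[0].message.tool_calls)
```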
Regarding the incorrect output, the failures manifested in two ways: either a blank output, or a message saying, “I do not have access to the files.” In the latter case, the LLM would still generate the markdown following the given instructions, but fill it with made-up content.
We’d like to use the o3 models because of their cost-effectiveness and their capabilities with “agency” and function calling. It seems we might have to move to a “Flow” paradigm and call the model directly within a function.
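i.e., something like this, where we read the files ourselves and hand the content straight to the model (a rough sketch; the file paths are hypothetical):

```python
from pathlib import Path

from crewai.flow.flow import Flow, start
from litellm import completion


class MarkdownFlow(Flow):
    @start()
    def generate(self) -> str:
        # Read the files ourselves instead of relying on the LLM
        # to decide to call FileReadTool.
        body = "\n\n".join(
            Path(p).read_text() for p in ["notes/a.md", "notes/b.md"]
        )
        response = completion(
            model="azure/o3",
            messages=[{"role": "user", "content": f"Analyze these files:\n\n{body}"}],
        )
        return response.choices[0].message.content


# MarkdownFlow().kickoff()
```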
Edit: Just as a final thought, the fact that `function_calling_llm: azure/gpt-4.1` (or any other of the aforementioned working models) won't work intrigues me.
Hey Jon, thanks for the feedback.
Looks like you’ve run into a situation where the framework’s default behavior isn’t working as expected, and at this point, the best solution is to debug your agentic system. To do that, you’ll need to isolate the problem and get an x-ray of what’s happening under the hood:
- Build a simple, specific Crew that deals directly with the LLM and the tool you're having trouble with. This way, you avoid sharing sensitive customer (or personal) data while still having a minimal reproducible example of the issue (see the sketch after this list).
- Then I recommend adding a crucial tool for this debugging/optimization phase of your agentic system: a monitoring and observability platform. I suggest AgentOps or Phoenix, as they both offer smooth integration with CrewAI and a decent enough free tier.
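Something as bare-bones as this would do for the first step (a sketch; the path, role, and wording are placeholders):

```python
from crewai import Agent, Crew, Task
from crewai_tools import FileReadTool

file_tool = FileReadTool()

analyst = Agent(
    role="Strategic Analyst",
    goal="Read sample.md with the FileReadTool and summarize it.",
    backstory="You are a top-tier analyst.",
    llm="azure/o3",
    tools=[file_tool],
    verbose=True,
)

read_task = Task(
    description="Read sample.md using the FileReadTool and summarize it.",
    expected_output="A short Markdown summary of sample.md.",
    agent=analyst,
)

result = Crew(agents=[analyst], tasks=[read_task], verbose=True).kickoff()
print(result)
```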
Keep the community posted on your progress so everyone can benefit from your findings.