How to get CrewAI to properly use Ollama models

This has nothing to do with CrewAI; it's an Ollama behavior. By default, Ollama truncates the prompt to a 2048-token context window (the `num_ctx` parameter), even when the input is longer and the model itself supports a much larger context. A workaround is presented below.

To test it, I'll apply the suggested workaround to Qwen2.5-3B: raise the context window to 32k inside Ollama and save the result as a new model:

$ ollama run qwen2.5:3b
>>> /set parameter num_ctx 32768
Set parameter 'num_ctx' to '32768'
>>> /save qwen2.5:3b-32k
Created new model 'qwen2.5:3b-32k'
>>> /bye
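The same model can also be created non-interactively with a Modelfile, which is more convenient for scripting. This is a sketch of the equivalent steps; the file name `Modelfile` is my choice and can be anything:

```shell
# Write a two-line Modelfile that inherits qwen2.5:3b
# and raises the context window to 32k tokens.
printf 'FROM qwen2.5:3b\nPARAMETER num_ctx 32768\n' > Modelfile

# Build the new tag from the Modelfile.
ollama create qwen2.5:3b-32k -f Modelfile
```

Either route produces the same `qwen2.5:3b-32k` tag used below.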

Then I'll ask the modified model to summarize a text of over 12k tokens. Here's example code that successfully uses the modified model:

from crewai import Agent, Task, Crew, LLM, Process
import requests

# Point CrewAI at the locally served 32k-context model.
# The "ollama/" prefix routes the request to the Ollama provider.
ollama_llm = LLM(
    model="ollama/qwen2.5:3b-32k",
    base_url="http://localhost:11434"
)

summarizer_agent = Agent(
    role="Expert Text Summarizer",
    goal=(
        "Create a concise, single-paragraph summary of the "
        "provided text content."
    ),
    backstory=(
        "You are a highly skilled editor with a talent for "
        "distilling complex information into a brief, easy-to-understand "
        "single paragraph summary."
    ),
    llm=ollama_llm,
    allow_delegation=False,
    verbose=False
)

summarize_task = Task(
    description=(
        "Read and analyze the following book content carefully: "
        "\n---\n{book_content}\n---\n"
        "Your goal is to extract the main ideas, themes, and key "
        "points presented in this text."
    ),
    expected_output=(
        "A single, well-written paragraph that accurately summarizes "
        "the essence of the provided book content. It should capture "
        "the core message or plot without going into excessive detail."
    ),
    agent=summarizer_agent
)

book_crew = Crew(
    agents=[summarizer_agent],
    tasks=[summarize_task],
    process=Process.sequential,
    verbose=False
)

# Fetch a public-domain book from Project Gutenberg as the long input.
url = "https://www.gutenberg.org/ebooks/75653.txt.utf-8"
response = requests.get(url)
response.raise_for_status()
book_content_string = response.text
print(f"\nLength of 'book_content_string': {len(book_content_string):,d} chars.")

result = book_crew.kickoff(
    inputs={
        "book_content": book_content_string
    }
)

print(f"\n## 🤖📚 Book Summary\n\n{result.raw}")
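As a quick sanity check that the book really exceeds the default window, a rough heuristic of about 4 characters per token for English prose can be used (an approximation; the exact count depends on the model's tokenizer):

```python
def approx_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough token estimate: ~4 characters per token for English prose."""
    return len(text) // chars_per_token

# A ~12k-token input corresponds to roughly 48k characters,
# far beyond Ollama's default 2048-token context window.
print(approx_tokens("x" * 48_000))  # → 12000
```

Anything above roughly 8k characters would already overflow the 2048-token default, so the book here needs the 32k variant.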