I’m trying to use qwen2.5 models but I’m clearly doing something wrong: for a model with a 128k context window, on a machine with 128 GB of RAM and 24 GB of VRAM, I CAN’T PROCESS prompts longer than 5k tokens. I’ve searched everywhere. I saw people using ChatOllama, using OllamaLLM, using ChatOpenAI but with Ollama models; I was trying to use the LLM class. NONE WORK. Either you get an error, or, in the case of the LLM class, the model’s context window isn’t configured correctly.
So please, I BEG, for as great as this framework is, its documentation is next to useless: HOW DO I USE OLLAMA MODELS CORRECTLY? (I’ve been at it for AT LEAST a month.)
This has nothing to do with CrewAI; it’s an Ollama behavior. By default, Ollama truncates the input at 2048 tokens, even if the input is longer and the model itself could handle a much larger context. A workaround is presented below.
So, to test it, I’ll apply the suggested workaround for Qwen2.5-3B by raising the context window to 32k in Ollama and saving the result as a new model:
$ ollama run qwen2.5:3b
>>> /set parameter num_ctx 32768
Set parameter 'num_ctx' to '32768'
>>> /save qwen2.5:3b-32k
Created new model 'qwen2.5:3b-32k'
>>> /bye
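As an alternative to the interactive /save session above, the same derived model can be built non-interactively from a Modelfile. The Modelfile keywords (FROM, PARAMETER num_ctx) are standard Ollama syntax; the file name Modelfile.qwen32k is arbitrary and chosen here for illustration:

```shell
# Write a minimal Modelfile: inherit from the base model
# and raise the context window to 32k tokens.
cat > Modelfile.qwen32k <<'EOF'
FROM qwen2.5:3b
PARAMETER num_ctx 32768
EOF

# Build the derived model (requires a running Ollama installation).
ollama create qwen2.5:3b-32k -f Modelfile.qwen32k
```

This is handy for scripting the setup, e.g. in a provisioning step, instead of typing the parameters into `ollama run` by hand.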
Then I’ll ask the modified model to summarize a text of over 12k tokens. Here’s the example code that successfully uses the modified model:
from crewai import Agent, Task, Crew, LLM, Process
import requests

# Point CrewAI's LLM wrapper at the locally served 32k-context model
ollama_llm = LLM(
    model="ollama/qwen2.5:3b-32k",
    base_url="http://localhost:11434"
)

summarizer_agent = Agent(
    role="Expert Text Summarizer",
    goal=(
        "Create a concise, single-paragraph summary of the "
        "provided text content."
    ),
    backstory=(
        "You are a highly skilled editor with a talent for "
        "distilling complex information into a brief, easy-to-understand "
        "single paragraph summary."
    ),
    llm=ollama_llm,
    allow_delegation=False,
    verbose=False
)

summarize_task = Task(
    description=(
        "Read and analyze the following book content carefully: "
        "\n---\n{book_content}\n---\n"
        "Your goal is to extract the main ideas, themes, and key "
        "points presented in this text."
    ),
    expected_output=(
        "A single, well-written paragraph that accurately summarizes "
        "the essence of the provided book content. It should capture "
        "the core message or plot without going into excessive detail."
    ),
    agent=summarizer_agent
)

book_crew = Crew(
    agents=[summarizer_agent],
    tasks=[summarize_task],
    process=Process.sequential,
    verbose=False
)

# Download a public-domain book from Project Gutenberg as the test input
url = "https://www.gutenberg.org/ebooks/75653.txt.utf-8"
response = requests.get(url)
response.raise_for_status()
book_content_string = response.text
print(f"\nLength of 'book_content_string': {len(book_content_string):,d} chars.")

result = book_crew.kickoff(
    inputs={
        "book_content": book_content_string
    }
)
print(f"\n## 🤖📚 Book Summary\n\n{result.raw}")
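Before kicking off the crew, it can be useful to check whether the downloaded text will even fit in the configured window. A rough character-based heuristic works for this: English prose averages around 4 characters per token, though the real count depends on the model’s tokenizer, so treat the numbers below as estimates only. The helper names and the 1024-token reserve are my own choices, not part of CrewAI or Ollama:

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; real counts depend on the tokenizer."""
    return max(1, round(len(text) / chars_per_token))

def fits_context(text: str, num_ctx: int = 32768, reserve: int = 1024) -> bool:
    """Leave `reserve` tokens of headroom for the prompt template and reply."""
    return estimate_tokens(text) + reserve <= num_ctx

sample = "word " * 12000  # 60,000 chars, i.e. roughly 15k tokens
print(estimate_tokens(sample), fits_context(sample))  # prints: 15000 True
```

If `fits_context` returns False, the input would need to be chunked (or num_ctx raised further, memory permitting) before handing it to the summarizer.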