I’m trying to use qwen2.5 models but I’m clearly doing something wrong: for a model with a 128k context window, on a machine with 128 GB of RAM and 24 GB of VRAM, I CAN’T PROCESS prompts longer than 5k tokens. I’ve searched everywhere. I saw people using ChatOllama, using OllamaLLM, using ChatOpenAI but with Ollama models; I was trying to use the LLM class. NONE WORK. Either you get an error, or, in the case of the LLM class, the model’s context window isn’t configured correctly.
So please, I BEG, for as great as this framework is, its documentation is next to useless: HOW DO I USE OLLAMA MODELS CORRECTLY? (I’ve been at it for AT LEAST a month.)
This has nothing to do with CrewAI; it’s an Ollama behavior. By default, Ollama truncates the input at 2048 tokens, even if the input is longer and the model itself could handle a much larger context. A workaround is presented below.
So, to test it, I’ll apply the suggested workaround for Qwen2.5-3B by raising the context window to 32k in Ollama and saving the result as a new model:
$ ollama run qwen2.5:3b
>>> /set parameter num_ctx 32768
Set parameter 'num_ctx' to '32768'
>>> /save qwen2.5:3b-32k
Created new model 'qwen2.5:3b-32k'
>>> /bye
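As an alternative to the interactive /save session above, the same derived model can be built non-interactively from a Modelfile. The Modelfile keywords (FROM, PARAMETER num_ctx) are standard Ollama syntax; the file name Modelfile.qwen32k is arbitrary and chosen here for illustration:

```shell
# Write a minimal Modelfile: inherit from the base model
# and raise the context window to 32k tokens.
cat > Modelfile.qwen32k <<'EOF'
FROM qwen2.5:3b
PARAMETER num_ctx 32768
EOF

# Build the derived model (requires a running Ollama installation).
ollama create qwen2.5:3b-32k -f Modelfile.qwen32k
```

This is handy for scripting the setup, e.g. in a provisioning step, instead of typing the parameters into `ollama run` by hand.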
Then I’ll ask the modified model to summarize a text of over 12k tokens. Here’s the example code that successfully uses the modified model:
from crewai import Agent, Task, Crew, LLM, Process
import requests

# Point CrewAI's LLM wrapper at the locally served 32k-context model
ollama_llm = LLM(
    model="ollama/qwen2.5:3b-32k",
    base_url="http://localhost:11434"
)

summarizer_agent = Agent(
    role="Expert Text Summarizer",
    goal=(
        "Create a concise, single-paragraph summary of the "
        "provided text content."
    ),
    backstory=(
        "You are a highly skilled editor with a talent for "
        "distilling complex information into a brief, easy-to-understand "
        "single paragraph summary."
    ),
    llm=ollama_llm,
    allow_delegation=False,
    verbose=False
)

summarize_task = Task(
    description=(
        "Read and analyze the following book content carefully: "
        "\n---\n{book_content}\n---\n"
        "Your goal is to extract the main ideas, themes, and key "
        "points presented in this text."
    ),
    expected_output=(
        "A single, well-written paragraph that accurately summarizes "
        "the essence of the provided book content. It should capture "
        "the core message or plot without going into excessive detail."
    ),
    agent=summarizer_agent
)

book_crew = Crew(
    agents=[summarizer_agent],
    tasks=[summarize_task],
    process=Process.sequential,
    verbose=False
)

# Download a public-domain book from Project Gutenberg as the test input
url = "https://www.gutenberg.org/ebooks/75653.txt.utf-8"
response = requests.get(url)
response.raise_for_status()
book_content_string = response.text
print(f"\nLength of 'book_content_string': {len(book_content_string):,d} chars.")

result = book_crew.kickoff(
    inputs={
        "book_content": book_content_string
    }
)
print(f"\n## 🤖📚 Book Summary\n\n{result.raw}")
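Before kicking off the crew, it can be useful to check whether the downloaded text will even fit in the configured window. A rough character-based heuristic works for this: English prose averages around 4 characters per token, though the real count depends on the model’s tokenizer, so treat the numbers below as estimates only. The helper names and the 1024-token reserve are my own choices, not part of CrewAI or Ollama:

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; real counts depend on the tokenizer."""
    return max(1, round(len(text) / chars_per_token))

def fits_context(text: str, num_ctx: int = 32768, reserve: int = 1024) -> bool:
    """Leave `reserve` tokens of headroom for the prompt template and reply."""
    return estimate_tokens(text) + reserve <= num_ctx

sample = "word " * 12000  # 60,000 chars, i.e. roughly 15k tokens
print(estimate_tokens(sample), fits_context(sample))  # prints: 15000 True
```

If `fits_context` returns False, the input would need to be chunked (or num_ctx raised further, memory permitting) before handing it to the summarizer.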