Hi Support Team,
I would like to understand how the crew/agent behaves when a tool we design returns a large output that, in general, exceeds the token limit. How does the crew handle this? Does it put the whole output into the next prompt? If so, how does the provider respond, and how does the agent handle it gracefully?
I'm also interested in this topic. Is there a way to pass a large amount of data (that exceeds the token limit) to another agent?
Hey there, here’s a Reddit thread on tool output issues where others faced similar problems.
I’ve also bumped into the same issue when using tools that output a lot. It seems like there’s no built-in way to cap the output from these tools directly. I noticed this especially when working with the FileReadTool; configuring it to split up the responses into smaller parts didn’t work. Even when I set max_tokens and max_completion_tokens to a small number on the LLM object, it still throws errors:
llm.py-llm:187 - ERROR: LiteLLM call failed: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: VertexAIException BadRequestError - {\n "error": {\n "code": 400,\n "message": "The input token count (2036021) exceeds the maximum number of tokens allowed (1000000).",\n "status": "INVALID_ARGUMENT"\n }
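For reference, the setup I was testing looked roughly like this (model name and file path are just illustrative):

from crewai import LLM
from crewai_tools import FileReadTool

# Capping the completion size on the LLM object.
llm = LLM(
    model="vertex_ai/gemini-1.5-flash",
    max_tokens=512,
    max_completion_tokens=512
)

# The tool still returns the whole file, so as far as I can tell these caps
# only limit the model's output and do nothing about the huge input that the
# tool result adds to the next prompt.
file_read_tool = FileReadTool(file_path="very_large_file.txt")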
any solution other than changing the llm model?
You might need to use RAG; that way, you can control the context size. Other than that, I can’t think of anything else right now.
I thought RAG and [context] between tasks handled this but I may be wrong.
Alright folks, lemme throw in my two cents on this topic. But before I dive in, let me make one thing crystal clear:
There’s no magic bullet for cramming more context into your LLM than its context window can handle. Any attempt to do so will inevitably boil down to some form of content synthesis, and let’s be honest, any synthesis means you’re losing something, okay? The RAG technique is still one of the most solid and reliable approaches because the way it trims things down uses semantic criteria to try and fetch the best chunks. This shrinks the context but actually boosts the quality of the answers. So, yeah, what @zinyando and @jklre suggested is definitely your best bet.
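Just so we’re all picturing the same thing, “RAG” here basically means: keep the big document outside the prompt and only pull the most relevant chunks into the Agent’s context. Here’s a bare-bones sketch of that idea as a custom tool (the tool name is made up, the file path is a placeholder, and keyword overlap stands in for real embeddings, so treat it as an illustration rather than a drop-in solution):

from typing import Type
from pydantic import BaseModel, Field
from crewai.tools import BaseTool

class RetrievalInput(BaseModel):
    query: str = Field(..., description="What to look for in the document.")

class NaiveRetrievalTool(BaseTool):
    name: str = "Naive Retrieval Tool"
    description: str = "Returns only the document chunks most relevant to a query."
    args_schema: Type[BaseModel] = RetrievalInput
    file_path: str = "some_big_file.txt"  # placeholder path
    chunk_size: int = 1000
    top_k: int = 5

    def _run(self, query: str) -> str:
        with open(self.file_path, encoding="utf-8") as f:
            text = f.read()
        chunks = [text[i:i + self.chunk_size] for i in range(0, len(text), self.chunk_size)]
        # Rank chunks by naive keyword overlap with the query; a real RAG
        # setup would use embeddings and a vector store instead.
        words = set(query.lower().split())
        chunks.sort(key=lambda c: sum(w in c.lower() for w in words), reverse=True)
        return "\n---\n".join(chunks[:self.top_k])

Whatever the retrieval trick, the point is the same: the Agent only ever sees around top_k * chunk_size characters per call, no matter how big the underlying file is.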
Now, for those situations where you figure RAG isn’t strictly necessary, but you still gotta watch that context window limit (like when an Agent keeps calling tools over and over for the same task, and the context from those tools just keeps piling up), nothing beats having the tool itself offer some way to truncate the content. To encourage y’all to get into the habit of cooking up custom tools that actually fit what your projects need, I’m sharing some simple alternative versions of FileReadTool and ScrapeWebsiteTool. These are two of the usual suspects for retrieving context, and they can easily blow past those limits. The cool thing about these alternative versions is they give you four ways to retrieve content:
- Full content extraction: This makes the tools behave just like their counterparts in crewai_tools – fully compatible.
- Partial reads based on character limits (head truncation): This is your classic truncation. The tool just grabs the first max_chars characters from the content. For some things, that’s all you need.
- Partial reads based on chunking (random chunks): With this mode, the content gets sliced into 1000-character chunks. max_chars is rounded down to the nearest multiple of 1000, and that many chunks (max_chars / 1000) are returned. The kicker is, the first and last chunks of the original content are always included, and the ones in between are picked at random. This mode is pretty handy if you need to keep a good mix of the content while making sure you don’t send more than max_chars to your Agent’s LLM. (There’s a rough sketch of this selection logic right after the trade-off note below.)
- AI-powered summarization: Here, you pass in a crewai.LLM, and it takes care of summarizing the content. To stop the summarizer LLM itself from hitting its context limit, it uses method #3 (random chunking) to pull up to 34,000 characters (around 8,500 tokens) and then whips up a summary of up to 6,000 characters (around 1,500 tokens). How many tokens you actually get back depends on how well the summarizer LLM plays ball. Now, this summarization LLM doesn’t need to be (and frankly, I don’t think it should be) super sophisticated. Summarizing is a pretty straightforward job, and any LLM with a 12,000-token context window can handle it, so you can get good summaries without spending a fortune.
Just keep in mind that these three truncation methods come with a trade-off: your Agent won’t be getting the entire content anymore (though, let’s be real, it might not have been getting it all anyway, which is probably why we’re even talking about this).
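For the curious, the random-chunks mode boils down to something like this (a rough sketch of the selection logic; the actual tool handles a few more edge cases):

import random

def pick_random_chunks(content: str, max_chars: int, chunk_size: int = 1000) -> str:
    # Round max_chars down to a whole number of chunks, keeping room for at
    # least the first and last chunk.
    budget = max(2, max_chars // chunk_size)
    chunks = [content[i:i + chunk_size] for i in range(0, len(content), chunk_size)]
    if len(chunks) <= budget:
        return content
    # Always keep the first and last chunk; fill the rest of the budget with
    # randomly sampled middle chunks, kept in their original order.
    middle = sorted(random.sample(range(1, len(chunks) - 1), budget - 2))
    return "".join(chunks[i] for i in [0, *middle, len(chunks) - 1])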
You can find the alternative version of FileReadTool here, and the alternative version of ScrapeWebsiteTool here. In the examples below, I used this file (Dracula, a hefty 865,173 characters) to test out the FileReadTool, and this site (24,319 characters) for the ScrapeWebsiteTool.
VersatileFileReadTool - Full Content Extraction (Default/Compatibility Mode):
from versatile_file_read_tool import VersatileFileReadTool
file_read_tool = VersatileFileReadTool()
content = file_read_tool.run(
file_path="dracula_by_bram_stoker.txt"
)
print(f"Retrieved {len(content)} chars")
# Using Tool: File Reading Tool
# Retrieved 865173 chars
VersatileFileReadTool - Partial Extraction (Head Truncation Mode):
from versatile_file_read_tool import VersatileFileReadTool
file_read_tool = VersatileFileReadTool(
retrieval_mode="head",
max_chars=13002
)
content = file_read_tool.run(
file_path="dracula_by_bram_stoker.txt"
)
print(f"Retrieved {len(content)} chars")
# Using Tool: File Reading Tool
# Retrieved 13002 chars
VersatileFileReadTool - Partial Extraction (Random Chunking Mode):
from versatile_file_read_tool import VersatileFileReadTool
file_read_tool = VersatileFileReadTool(
retrieval_mode="random_chunks",
max_chars=13321
)
content = file_read_tool.run(
file_path="dracula_by_bram_stoker.txt"
)
print(f"Retrieved {len(content)} chars")
# Using Tool: File Reading Tool
# Retrieved 12212 chars
VersatileFileReadTool - AI-Powered Extraction (Summarization Mode):
from versatile_file_read_tool import VersatileFileReadTool
from crewai import LLM
import os
os.environ["GEMINI_API_KEY"] = "<YOUR_API_KEY>"
gemini_llm = LLM(
model="gemini/gemini-2.5-flash-preview-04-17",
temperature=0.3
)
file_read_tool = VersatileFileReadTool(
retrieval_mode="summarize",
llm=gemini_llm
)
content = file_read_tool.run(
file_path="dracula_by_bram_stoker.txt"
)
print(f"Retrieved {len(content)} chars")
print(content)
# Using Tool: File Reading Tool
# Retrieved 3967 chars
# This text contains excerpts from Bram Stoker's novel *Dracula*, presented...
VersatileScrapeWebsiteTool - AI-Powered Extraction (Summarization Mode):
from versatile_scrape_website_tool import VersatileScrapeWebsiteTool
from crewai import LLM
import os
os.environ["GEMINI_API_KEY"] = "<YOUR_API_KEY>"
gemini_llm = LLM(
model="gemini/gemini-2.5-flash-preview-04-17",
temperature=0.3
)
scrape_website_tool = VersatileScrapeWebsiteTool(
retrieval_mode="summarize",
llm=gemini_llm
)
content = scrape_website_tool.run(
website_url="https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/"
)
print(f"Retrieved {len(content)} chars")
print(content)
# Using Tool: Website Scraping Tool
# Retrieved 2621 chars
# Google Cloud has announced the Agent2Agent Protocol (A2A), a new open protocol...
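And to close the loop, dropping one of these tools into a crew works the same as any other tool. The agent and task below are just placeholders to show the wiring:

from crewai import Agent, Task, Crew
from versatile_file_read_tool import VersatileFileReadTool

file_read_tool = VersatileFileReadTool(
    retrieval_mode="random_chunks",
    max_chars=13000
)

analyst = Agent(
    role="Literature Analyst",
    goal="Describe the main themes of large text files without blowing past the context window",
    backstory="You analyze long documents from partial reads.",
    tools=[file_read_tool],
    verbose=True
)

theme_task = Task(
    description="Read dracula_by_bram_stoker.txt and describe its main themes.",
    expected_output="A short thematic summary.",
    agent=analyst
)

crew = Crew(agents=[analyst], tasks=[theme_task])
print(crew.kickoff())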
Thank you for this… This is much needed as we start to push up against context windows on our projects. It would be great if these ideas sat alongside the official CrewAI docs.
Keep them coming