What are the strategies for handling outputs which exceed the max output token limit?

Hi,

I'm wondering what the best strategy is for handling outputs that exceed the LLM's output token limit, especially when the output is largely deterministic and can't be summarised.

For example, Gemini has an output limit of 8,192 tokens. In the web UI you can simply re-prompt the model with the last part of the output and ask it to continue, and you get the rest of the output.
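Outside CrewAI, that "continue" workflow is easy to do by hand. A minimal sketch, assuming a hypothetical `generate()` wrapper around whatever client you use that reports whether the response was cut off (the names and the continuation prompt are illustrative, not a real API):

```python
def generate(prompt: str) -> tuple[str, bool]:
    """Hypothetical wrapper around your LLM client.
    Returns (text, truncated); truncated is True when the response
    stopped because it hit the max output token limit."""
    raise NotImplementedError  # plug in the Gemini (or other) client here

def generate_full(prompt: str, max_rounds: int = 5) -> str:
    """Re-prompt with the tail of the previous output until the model
    finishes naturally or we give up."""
    text, truncated = generate(prompt)
    parts = [text]
    for _ in range(max_rounds):
        if not truncated:
            break
        tail = parts[-1][-500:]  # last part of the output, reused as context
        cont, truncated = generate(
            f"{prompt}\n\nYour previous answer was cut off. Continue exactly "
            f"from where this ends, without repeating anything:\n{tail}"
        )
        parts.append(cont)
    return "".join(parts)
```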

In CrewAI, if I want to store the output in a Pydantic output structure, you can't really re-prompt within the same crew without losing the previous output. Does this need a new feature request so a task can return its output in multiple parts?
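For context, my setup is roughly the standard structured-output pattern (a sketch; the model name, task text, and `Report` fields are placeholders):

```python
from pydantic import BaseModel
from crewai import Agent, Crew, Task

class Report(BaseModel):           # the structure I want the task to fill
    title: str
    sections: list[str]

writer = Agent(
    role="Writer",
    goal="Produce the full report",
    backstory="...",
    llm="gemini/gemini-1.5-pro",   # placeholder model name
)

task = Task(
    description="Write the full report for {topic}",
    expected_output="The complete report as structured data",
    agent=writer,
    output_pydantic=Report,        # this is where the 8k output cap bites
)

crew = Crew(agents=[writer], tasks=[task])
result = crew.kickoff(inputs={"topic": "..."})
report = result.pydantic           # a truncated output fails to parse here
```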

I was thinking that maybe it needs a follow-up crew which takes the last few lines of the previous crew's output and re-prompts to check whether any more output is due, but this feels like a hack to work around the issue. (Although I'm unsure this would work, as the follow-up crew would include all the preamble generated by CrewAI, which might stop the LLM from simply continuing.)
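Roughly what I had in mind, continuing from the setup above (a sketch: `continue_output` stands in for the follow-up crew or a direct re-prompt, and the truncation check is just a placeholder heuristic):

```python
import json

def looks_truncated(text: str) -> bool:
    """Placeholder heuristic: a complete answer is valid JSON, a cut-off one isn't."""
    try:
        json.loads(text)
        return False
    except json.JSONDecodeError:
        return True

raw = crew.kickoff(inputs={"topic": "..."}).raw   # take the raw text, not .pydantic

rounds = 0
while looks_truncated(raw) and rounds < 5:
    tail = raw[-500:]                     # last few lines of the previous output
    raw += continue_output(tail)          # hypothetical follow-up crew / re-prompt
    rounds += 1

report = Report.model_validate_json(raw)  # only parse once the output is complete
```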

Also, just to add: I don't want to switch to another LLM provider (e.g. OpenAI) with a higher output token limit, as I need a more capable model, and Gemini consistently gives better results in this case. And the individual task can't easily be broken down into smaller subtasks; this is already the smallest chunk, but its output is large.