What are the strategies for handling outputs which exceed the max output token limit?

Hi,

I'm wondering what the best strategy is for handling outputs that exceed the LLM's output token limit, especially when the output is largely deterministic and can't be summarised.

For example, Gemini has an output limit of 8,192 tokens. In the web UI you can simply re-prompt the model with the last part of the output and ask it to continue, and you get the rest of the output.
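Outside CrewAI, that "continue" workflow is easy to do by hand. A minimal sketch, assuming a hypothetical `generate()` wrapper around whatever client you use that reports whether the response was cut off (the names and the continuation prompt are illustrative, not a real API):

```python
def generate(prompt: str) -> tuple[str, bool]:
    """Hypothetical wrapper around your LLM client.
    Returns (text, truncated); truncated is True when the response
    stopped because it hit the max output token limit."""
    raise NotImplementedError  # plug in the Gemini (or other) client here

def generate_full(prompt: str, max_rounds: int = 5) -> str:
    """Re-prompt with the tail of the previous output until the model
    finishes naturally or we give up."""
    text, truncated = generate(prompt)
    parts = [text]
    for _ in range(max_rounds):
        if not truncated:
            break
        tail = parts[-1][-500:]  # last part of the output, reused as context
        cont, truncated = generate(
            f"{prompt}\n\nYour previous answer was cut off. Continue exactly "
            f"from where this ends, without repeating anything:\n{tail}"
        )
        parts.append(cont)
    return "".join(parts)
```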

In CrewAI, if I want to store the output in a Pydantic output structure, you can't really re-prompt within the same crew without losing the previous output. Does this need a new feature request so a task can return its output in multiple parts?
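For context, my setup is roughly the standard structured-output pattern (a sketch; the model name, task text, and `Report` fields are placeholders):

```python
from pydantic import BaseModel
from crewai import Agent, Crew, Task

class Report(BaseModel):           # the structure I want the task to fill
    title: str
    sections: list[str]

writer = Agent(
    role="Writer",
    goal="Produce the full report",
    backstory="...",
    llm="gemini/gemini-1.5-pro",   # placeholder model name
)

task = Task(
    description="Write the full report for {topic}",
    expected_output="The complete report as structured data",
    agent=writer,
    output_pydantic=Report,        # this is where the 8k output cap bites
)

crew = Crew(agents=[writer], tasks=[task])
result = crew.kickoff(inputs={"topic": "..."})
report = result.pydantic           # a truncated output fails to parse here
```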

I was thinking that maybe it needs a follow-up crew which takes the last few lines of the previous crew's output and re-prompts to check whether any more output is due, but this feels like a hack to work around the issue. (Although I'm unsure this would work, as the follow-up crew would include all the preamble generated by CrewAI, which might stop the LLM from simply continuing.)
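Roughly what I had in mind, continuing from the setup above (a sketch: `continue_output` stands in for the follow-up crew or a direct re-prompt, and the truncation check is just a placeholder heuristic):

```python
import json

def looks_truncated(text: str) -> bool:
    """Placeholder heuristic: a complete answer is valid JSON, a cut-off one isn't."""
    try:
        json.loads(text)
        return False
    except json.JSONDecodeError:
        return True

raw = crew.kickoff(inputs={"topic": "..."}).raw   # take the raw text, not .pydantic

rounds = 0
while looks_truncated(raw) and rounds < 5:
    tail = raw[-500:]                     # last few lines of the previous output
    raw += continue_output(tail)          # hypothetical follow-up crew / re-prompt
    rounds += 1

report = Report.model_validate_json(raw)  # only parse once the output is complete
```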

Also, just to add: I don't want to switch to another LLM provider (e.g. OpenAI) with a higher output token limit, as I need a more capable model, and Gemini consistently gives better results in this case. And the individual task can't easily be broken down into smaller subtasks; this is already the smallest chunk, but its output is large.