Overcoming "litellm.ContextWindowExceededError" when the context is too long

Hey everyone,

I am trying to set up an agent that scrapes and analyzes data, but I run into a problem during the first step, the scraping.

At the end, the scraper (I use Firecrawl) returns about 220,000 tokens of content, which is above the Anthropic context window of 200,000.

So when the agent tries to feed the LLM, I get an error and my scraper agent fails.

Error during LLM call: litellm.ContextWindowExceededError: litellm.BadRequestError: AnthropicError - {"type":"error","error":{"type":"invalid_request_error","message":"prompt is too long: 217095 tokens > 200000 maximum"}}
An unknown error occurred. Please check the details below.
Error details: litellm.ContextWindowExceededError: litellm.BadRequestError: AnthropicError - {"type":"error","error":{"type":"invalid_request_error","message":"prompt is too long: 217095 tokens > 200000 maximum"}}
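One way to avoid the hard failure is to pre-check the prompt size before calling the LLM. Below is a minimal sketch, assuming roughly 4 characters per token as a heuristic; real tokenizers (including Anthropic's) will differ, so keep a safety margin.

```python
# Rough pre-flight check before sending scraped text to the LLM.
# Assumption: ~4 characters per token on average; this is a heuristic,
# not the model's real tokenizer, so keep a safety margin.

CONTEXT_LIMIT = 200_000  # Anthropic's window, per the error message
SAFETY_MARGIN = 0.9      # only use 90% of the window

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return len(text) // 4

def fits_in_context(text: str, limit: int = CONTEXT_LIMIT) -> bool:
    """True if the text is likely to fit within the context window."""
    return estimate_tokens(text) <= limit * SAFETY_MARGIN

scraped = "word " * 300_000  # stand-in for a large Firecrawl result
if not fits_in_context(scraped):
    print(f"Too long: ~{estimate_tokens(scraped)} tokens; split or summarize first")
```

If the check fails, you can split, summarize, or truncate the scraped content before the agent ever hands it to the model.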

I don't really know how to tackle this issue. If someone has an idea :slight_smile:

You could try using Google Gemini models; they have a larger context window.


Well, with over 200k tokens, you might also want to consider an intermediate step where the scraping results feed into a RAG system, which would then be used during your data analysis phase.
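To illustrate the RAG idea from the post above: chunk the scraped text, then retrieve only the chunks relevant to a question instead of sending everything. A real setup would use embeddings and a vector store; the keyword-overlap scoring below is just an illustrative stand-in.

```python
# Minimal sketch of a RAG intermediate step: chunk the scraped text,
# then retrieve only the relevant chunks for the analysis prompt.
# Keyword overlap stands in for real embedding-based similarity.

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping character-based chunks."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def retrieve(chunks: list[str], question: str, top_k: int = 3) -> list[str]:
    """Rank chunks by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    return sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )[:top_k]
```

Only the top-ranked chunks (a few thousand tokens) go into the LLM prompt, no matter how large the original scrape was.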

@zinyando’s suggestion to use an LLM with a huge context window is great. It’s (almost) a form of Cache-Augmented Generation (CAG), just without the ‘cache’. Alternatively, using RAG is another way to reduce the dependency on those massive context windows.


Smart! Thank you, I will try that.

I don't have access to Gemini models, but after scraping a website the token count exceeded my LLM's context window (>128k). Is there a variable I can declare so that the output is truncated after a certain length, or another way to overcome this problem?
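I'm not aware of a built-in variable for this, but a pragmatic workaround is to hard-truncate the scraped text to a token budget before it reaches the LLM. A minimal sketch, again assuming ~4 characters per token (adjust for your model's real tokenizer):

```python
# Hard-truncate scraped text to fit a token budget before the LLM call.
# Assumption: ~4 characters per token; a heuristic, not a real tokenizer.

def truncate_to_budget(text: str, max_tokens: int = 120_000,
                       chars_per_token: int = 4) -> str:
    """Keep roughly the first `max_tokens` worth of text."""
    max_chars = max_tokens * chars_per_token
    if len(text) <= max_chars:
        return text
    # Cut at the last whitespace before the limit to avoid splitting a word.
    cut = text.rfind(" ", 0, max_chars)
    return text[:cut if cut != -1 else max_chars]
```

Note that truncation simply drops the tail of each page; summarizing or a RAG setup preserves more of the information.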

What is your exact use case after scraping the website?

Depending on how you are scraping it and how you intend to use the data, you might try a RAG setup, as @Max_Moura mentioned above.

I am trying to scrape some websites on a given topic. The website links are collected by the SerperDev tool, the collected links are scraped by Firecrawl, and I have to generate a report on the scraped content. That is exactly the use case.


Hey,

I was trying to set up a RAG after scraping, then I realized I can't get past the first scraping task because of the text length. It seems that at the end of the scraping, the agent tries to feed the LLM right away, when it should just write the result and nothing more.

Any idea how I can make the agent stop and just write the result?


Interesting! But it doesn’t seem to work with FirecrawlCrawlWebsiteTool.

crawler_tool = FirecrawlCrawlWebsiteTool(url="some url")

tools=[crawler_tool(result_as_answer=True), file_writer_tool]

It returns an error:

TypeError: 'FirecrawlCrawlWebsiteTool' object is not callable
During handling of the above exception, another exception occurred:

Did you manage to make it work?

I think in the latest version of CrewAI this has been resolved.
Now, if this particular error is received, CrewAI internally summarizes all the past messages and then hits the LLM again. Do update your CrewAI version and try this again.