Hi, I'm not an experienced user of crewAI, so maybe my question is pretty simple, but I haven't found the answer here yet.
So I have a very simple test setup:
I have a file with a list of around 50 URLs that I want to visit to collect a similar set of data from each.
I use gpt-4o-mini, and my crew has only one agent and one task:

@agent
def data_collector(self) -> Agent:
    return Agent(
        config=self.agents_config['data_collector'],
        verbose=True,
        tools=[file_read_tool, selenium_tool]
    )
where
selenium_tool = SeleniumScrapingTool()
file_read_tool = FileReadTool()
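
For completeness, the whole crew is wired up roughly like this (simplified; the class name, task name and config paths below are placeholders for my real ones):

from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
from crewai_tools import FileReadTool, SeleniumScrapingTool

file_read_tool = FileReadTool()
selenium_tool = SeleniumScrapingTool()

@CrewBase
class DataCollectionCrew:
    # the YAML configs hold the agent's role/goal/backstory and the task description
    agents_config = 'config/agents.yaml'
    tasks_config = 'config/tasks.yaml'

    @agent
    def data_collector(self) -> Agent:
        return Agent(
            config=self.agents_config['data_collector'],
            verbose=True,
            tools=[file_read_tool, selenium_tool]
        )

    @task
    def collect_data(self) -> Task:
        return Task(config=self.tasks_config['collect_data'])

    @crew
    def crew(self) -> Crew:
        return Crew(agents=self.agents, tasks=self.tasks, process=Process.sequential, verbose=True)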
So when I run crewai, it successfully collects the data I need, but it never runs to the end of the list: it always processes a few URLs (14-19) and finishes without any errors or warnings.
Yes, I've read a similar question here:
But setting the result_as_answer parameter doesn't really help here.
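If I understood that thread correctly, result_as_answer is set on the tool instance that is passed to the agent, so the raw tool output becomes the task's final answer; something like:

# result_as_answer=True should make the scraper's raw output the final answer,
# skipping the agent's final rewording step; in my case it didn't change much
selenium_tool = SeleniumScrapingTool(result_as_answer=True)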
Hitting the gpt-4o-mini output token limit (16,000 as far as I know) is probably not the reason either: I check the number of output tokens used for each run, and it never gets close to 16,000. The last run used about 12,000, for example.
But anyway, I understand that if I want to collect data from a huge number of URLs, let's say 100, 200 or 300, I will run out of output tokens with any model.
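For reference, this is roughly how I check token usage after each run (using the placeholder crew class from above):

crew = DataCollectionCrew().crew()
result = crew.kickoff()

# both report the aggregated token usage for the whole run
print(result.token_usage)   # UsageMetrics on the CrewOutput
print(crew.usage_metrics)   # same numbers, available on the Crew object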
So how do I avoid this problem? Maybe I should create a flow, fetch the URLs from the file, and then use kickoff_for_each(urls), as explained in the Advanced Use Cases with crewAI course?
Please suggest the direction I should follow to solve this issue.
Hi, I've tried playing with max_iter as well as max_rpm, setting them to a high value and None, respectively.
It seems the number of processed items increased slightly, from fewer than 20 to 30-35.
But the agent still stops before finishing the input list, without any explanation.
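Concretely, I set them on the agent like this (the exact values are just what I experimented with):

@agent
def data_collector(self) -> Agent:
    return Agent(
        config=self.agents_config['data_collector'],
        verbose=True,
        tools=[file_read_tool, selenium_tool],
        max_iter=100,   # allow far more reasoning/tool-use iterations than the default
        max_rpm=None    # no requests-per-minute throttling
    )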
I wonder if there is any way to force it to work through all of the list's items?
I wrote this requirement into the task description, but it is being ignored.
Update
I also tried hardcoding the input list of URLs instead of reading it from a file. The results were somewhat better: on some runs almost all URLs were processed, except a few with issues that I'm currently working on.
But some runs again ended with only 20-30 URLs processed.
So the results are not stable, and that is the main problem.
So, as planned, I switched to the flow + kickoff_for_each solution, and this way my agent was able to process the whole list of URLs right to the end.
Changing parameters like max_iter didn't bring results, so a working solution is to use kickoff_for_each to loop through the list.
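
For anyone hitting the same issue, the working setup looks roughly like this (the file name, the {url} input key and the crew class are placeholders for my real ones, and the task description is written to interpolate {url}):

from crewai.flow.flow import Flow, listen, start

class CollectionFlow(Flow):
    @start()
    def load_urls(self):
        # read the URL list in plain Python instead of asking the agent to do it
        with open('urls.txt') as f:
            return [line.strip() for line in f if line.strip()]

    @listen(load_urls)
    def scrape_all(self, urls):
        # one full crew run per URL, so no single run has to carry the whole list
        inputs = [{'url': url} for url in urls]
        return DataCollectionCrew().crew().kickoff_for_each(inputs=inputs)

results = CollectionFlow().kickoff()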