Spider Scraper Output Parsing Error

Hi,

I use SpiderTool for some agents with gpt-4o, and I get this error nearly every time an agent searches the web:


Agent: Professioneller YouTube-Content-Researcher

Thought: The search results did not provide specific information from the Handpan-Portal website related to video conclusions or CTAs. I will access the Handpan-Portal YouTube channel directly to gather potential insights on how they typically conclude their videos related to Handpan and musical instruments.

Using tool: Spider scrape & crawl tool

Tool Input:

{"url": "https://www.youtube.com/@handpan-portal/videos", "params": {"mode": "scrape"}}

Tool Output:

[{'content': '(https://www.youtube.com/)(https://www.youtube.com/)\n[About](https://www.youtube.com/about/)[Press](https://www.youtube.com/about/press/)[Copyright](https://www.youtube.com/about/copyright/)[Contact us](https://www.youtube.com/t/contact_us/)[Creators](https://www.youtube.com/creators/)[Advertise](https://www.youtube.com/ads/)[Developers](https:// YouTube | Google for Developers)[Terms](https://www.youtube.com/t/terms)[Privacy](https://www.youtube.com/t/privacy)[Policy & Safety](https://www.youtube.com/about/policies/)[How YouTube works](https://www.youtube.com/howyoutubeworks?utm_campaign=ytgen&utm_source=ythp&utm_medium=LeftNav&utm_content=txt&u=https://www.youtube.com/howyoutubeworks?utm_source=ythp&utm_medium=LeftNav&utm_campaign=ytgen)[Test new features](https://www.youtube.com/new)[NFL Sunday Ticket](https:// The Exclusive Home of NFL Sunday Ticket - YouTube & YouTube TV)\n© 2024 Google LLC', 'costs': {'ai_cost': 0, 'compute_cost': 0.0001, 'file_cost': 0.0007, 'total_cost': 0.0009, 'transform_cost': 0.0001}, 'error': None, 'status': 200, 'url': 'https://www.youtube.com/@handpan-portal/videos'}]
Error parsing LLM output, agent will retry: I did it wrong. Invalid Format: I missed the 'Action:' after 'Thought:'. I will do right next, and don't use a tool I have already used.

If you don’t need to use any more tools, you must give your best complete final answer, make sure it satisfy the expect criteria, use the EXACT format below:

Thought: I now can give a great answer
Final Answer: my best complete final answer to the task.
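
From what I can tell, the parser expects every tool call to follow CrewAI's ReAct-style layout, roughly like this (illustrative only; the exact wording comes from CrewAI's default prompt and may differ by version):

```
Thought: <reasoning about the next step>
Action: Spider scrape & crawl tool
Action Input: {"url": "https://www.youtube.com/@handpan-portal/videos", "params": {"mode": "scrape"}}
Observation: <tool output>
```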

Does somebody have an idea how to solve this?

Best regards and thanks for any help!
Milan

Can you please show the code? How do you set the LLM?

Hi, this is the same answer as in my other topic: Spider Web Scraper


Yes for sure.

I use a YAML file to define the agent. The definition is in German, but I translated it for you:
topic_researcher:
  role: >

  goal: |
    Suche im Internet:
    Nutze für die Suche im Internet das Tool "SerperDevTool". Wähle danach die passenden Suchergebnisse aus und crawle deren URLs mit dem Tool "SpiderTool".
  backstory: |

  llm: openai/gpt-4o

Translation of the goal:
Search on the Internet:
Use the tool "SerperDevTool" for searching on the Internet. Then select the appropriate search results and crawl their URLs with the tool "SpiderTool".


I use a YAML file to define the task:

research_task:
  description: |
    1. **Initiale Informationssuche:**
       - **Tool:** Verwende das Tool **SerperDevTool**, um im Internet nach relevanten Informationen zum Thema "{topic}" zu suchen.
       - **Suchbegriffe:** Nutze kurze und klar verständliche Suchbegriffe, die der typischen Suchweise von Menschen im Internet entsprechen.
       - **Auswahl der Ergebnisse:** Wähle die am besten passenden und relevantesten Suchergebnisse aus und notiere deren URLs.

    2. **Inhaltsanalyse:**
       - **Tool:** Verwende das Tool **Spider scrape & crawl tool**, um die ausgewählten URLs zu scrapen und deren Inhalte vollständig zu extrahieren. WICHTIG: Halte dich EXAKT an die Anleitung des Tools und die Formatvorgaben!
       - **Datenextraktion:** Sammle alle relevanten Informationen aus den gescrapten Webseiten.
  expected_output: |

  agent: topic_researcher

Translation of the description:

1. **Initial information search:**
   - **Tool:** Use the tool **SerperDevTool** to search the internet for relevant information on the topic "{topic}".
   - **Keywords:** Use short and easily understandable keywords that correspond to the typical search behavior of people on the internet.
   - **Selection of results:** Choose the most suitable and relevant search results and note their URLs.

2. **Content analysis:**
   - **Tool:** Use the tool **Spider scrape & crawl tool** to scrape the selected URLs and extract their content completely. IMPORTANT: Follow the instructions of the tool and the formatting guidelines EXACTLY!
   - **Data extraction:** Collect all relevant information from the scraped websites.
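
The matching @task method in the crew file follows the standard CrewAI pattern; a minimal sketch (not copied verbatim from my project):

```python
@task
def research_task(self) -> Task:
    # Pulls description, expected_output and agent from the tasks YAML file
    return Task(config=self.tasks_config["research_task"])
```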


In the crew file I give the tools to the agent:

# Imports at the top of the crew file:
#   from crewai import Agent
#   from crewai.project import CrewBase, agent, crew, task
#   from crewai_tools import SerperDevTool, SpiderTool

@agent
def topic_researcher(self) -> Agent:
    return Agent(
        config=self.agents_config["topic_researcher"],
        # tools=[MyCustomTool()],  # example of a custom tool, loaded at the beginning of the file
        verbose=True,
        tools=[SerperDevTool(), SpiderTool()],
    )
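
For completeness, the crew is then kicked off with the topic injected into the "{topic}" placeholder, roughly like this (the crew class name here is made up; it is whatever @CrewBase class holds these methods):

```python
# Hypothetical class name for illustration only.
result = TopicResearchCrew().crew().kickoff(inputs={"topic": "Handpan"})
print(result)
```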


I hope this is what you were asking for.

Thank you very much for your help.

Best
Milan

@Milan Make sure to set gpt-4o for all agents! By default, gpt-4o-mini is used, which is a less capable LLM and may cause errors. Try this and let me know if it fixes the issue.
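
For example, besides llm: openai/gpt-4o in the YAML, you can pin the model directly on the agent. A minimal sketch, assuming a recent crewai version that ships the LLM helper class (tool imports as in your crew file):

```python
from crewai import Agent, LLM

@agent
def topic_researcher(self) -> Agent:
    return Agent(
        config=self.agents_config["topic_researcher"],
        tools=[SerperDevTool(), SpiderTool()],
        llm=LLM(model="gpt-4o"),  # explicit model; should take precedence over any default
        verbose=True,
    )
```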

Hi and thank you very much for taking the time to help me!

I had already set all research agents to gpt-4o, but the error still occurred. I just did a test with the research agents on gpt-4o-mini, and there were no errors in that run. However, they responded in English instead of German :wink:

Best
Milan

Wait, what? It worked with the mini version?