I need to locate and extract a specific web element from a webpage. The challenge is that I cannot rely on selectors (e.g., CSS or XPath), since I don’t know the exact HTML structure of the page in advance. Instead, I can only describe the target element, for example: a `div`, `table`, `list`, or `grid` containing a structured set of items that represent users, participants, etc. (some kind of catalog).
For simpler pages, I’ve successfully used the FirecrawlScrapeWebsiteTool to scrape the entire page content and passed the output to a second agent, which could then locate the element based on my description.
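Roughly, that working setup looks like this (the role/goal strings and the URL are placeholders of mine, and FIRECRAWL_API_KEY is assumed to be set in the environment):

```python
# Rough sketch of my two-agent setup for simpler pages.
from crewai import Agent, Task, Crew
from crewai_tools import FirecrawlScrapeWebsiteTool

scraper = Agent(
    role="Scraper",
    goal="Fetch the full content of the target page",
    backstory="Fetches raw page content for downstream analysis.",
    tools=[FirecrawlScrapeWebsiteTool(url="https://example.com/participants")],
)

locator = Agent(
    role="Element locator",
    goal="Identify the container holding the catalog of users/participants",
    backstory="Reads raw HTML and spots structured, repeated item lists.",
)

scrape_task = Task(
    description="Scrape the page and return its full content.",
    expected_output="The raw page content.",
    agent=scraper,
)

locate_task = Task(
    description=(
        "From the scraped content, find the div/table/list/grid element "
        "that contains the set of user/participant entries."
    ),
    expected_output="The matching container element's markup.",
    agent=locator,
    context=[scrape_task],
)

Crew(agents=[scraper, locator], tasks=[scrape_task, locate_task]).kickoff()
```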
However, this approach fails on larger pages, where the HTML runs to thousands of lines: the scraped content exceeds the LLM’s context window, making it impractical to process in one pass.
I’ve tried several tools, including:
- WebsiteSearchTool
- SeleniumScrapingTool
- ScrapeWebsiteTool
- FirecrawlScrapeWebsiteTool
But none of them has reliably and efficiently located the element.
I also tried saving the page’s HTML markup to a local file and reading it back with the FileReadTool, planning to locate the desired element in the saved file. However, I kept hitting this error:
```
File reading error: 'charmap' codec can't decode byte 0x98 in position ...
```
The 'charmap' codec is Python’s name for the Windows default encoding (cp1252), so this likely means the FileReadTool opens the file with the platform default rather than UTF-8 and chokes on non-ASCII bytes in the saved HTML.
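For what it’s worth, saving and re-reading the file in plain Python with an explicit encoding should sidestep the decode error (the URL below is a placeholder):

```python
# Minimal sketch: write and read the page with an explicit encoding so the
# platform default ('charmap'/cp1252 on Windows) never comes into play.
import requests

resp = requests.get("https://example.com/participants")  # placeholder URL
resp.encoding = resp.apparent_encoding  # let requests sniff the charset

with open("page.html", "w", encoding="utf-8") as f:
    f.write(resp.text)

# errors="replace" keeps a single bad byte from aborting the whole read
with open("page.html", "r", encoding="utf-8", errors="replace") as f:
    html = f.read()
```

That only addresses the decoding, though; the file is still far too large to hand to an agent whole.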
So I would appreciate it if someone who has already solved a similar problem could suggest:
- What’s the most efficient way to solve my task?
- Should I focus on locating the element directly on the page in real-time, possibly using a tool that can interpret the page dynamically?
- If saving the HTML and processing it offline is the better approach, what tool can I use to accurately read and process the HTML file? (A rough sketch of what I had in mind follows below.)
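For context, this is the kind of offline pre-filtering I was considering for the second option: shrink the saved HTML down to candidate container elements first, so only a fraction of the page has to fit into the context window. The tag list and the minimum-children heuristic are guesses on my part, not a proven recipe:

```python
# Sketch: pre-filter the saved HTML down to candidate "catalog" containers
# (div/table/ul/ol) before handing anything to an agent. The tag names and
# the >= 5 direct-children threshold are assumptions, not a fixed rule.
from bs4 import BeautifulSoup

with open("page.html", "r", encoding="utf-8", errors="replace") as f:
    soup = BeautifulSoup(f.read(), "html.parser")

candidates = []
for el in soup.find_all(["div", "table", "ul", "ol"]):
    # Several similar direct children is a rough signal for repeated
    # user/participant entries.
    if len(el.find_all(recursive=False)) >= 5:
        candidates.append(el)

# Only the (much smaller) candidate snippets would go to the locator agent.
snippets = [str(c)[:2000] for c in candidates]
```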