I am using a CSV file with tabular data; I have a column called `product_name` which I pass as part of the inputs to the crew. Then I ask an Agent to use the `CSVSearchTool` to retrieve the data for this product, but the data does not match; it's as if the Agent retrieves data for other products. Is there documentation or an example that shows how to use CSV files or structured data, and how Agents can read and use it? I have set the LLM temperature to zero to reduce the risk of hallucinations, but it seems the Agent is not able to find the correct information in the CSV file.
@jets6276 Set `allow_code_execution` to `True` for the agent. This allows the agent to write and run code when executing tasks, which should help improve performance. The default is `False`.
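For reference, that flag is set directly on the `Agent`; here's a minimal sketch (the role, goal, and backstory are illustrative placeholders, not from the original post):

```python
from crewai import Agent

# Hypothetical agent; role/goal/backstory are made-up placeholders.
data_agent = Agent(
    role="Data Analyst",
    goal="Answer questions about products stored in a CSV file",
    backstory="You are meticulous about reading tabular data correctly.",
    allow_code_execution=True,  # lets the agent write and run code; default is False
)
```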
rokbenko’s solution didn’t seem to solve the problem for me. What I’ve done instead is follow this example, where the CSV search tool is initialized with the CSV path and then passed as a tool to the agent.
example:

```python
from crewai_tools import CSVSearchTool

csv_search_tool = CSVSearchTool(csv_file_path)
```

then inside the agent:

```python
@agent
def agent_name(self) -> Agent:
    return Agent(
        config=self.agents_config['agent_name'],
        tools=[csv_search_tool]
    )
```
Hi Leticia, even following this approach, the quality of the responses hasn’t improved at all. I’ve been using ChromaDB, and have tried different models, e.g. OpenAI and Gemini. Have you been able to achieve satisfactory performance with this tool? If so, is there any advice you could share? Many thanks, Alexandre
Hey Alexandre, welcome aboard!
I haven’t really dug into the `CSVSearchTool` code myself. So, while you wait for someone to give you a more spot-on answer for your use case, I’m going to take your question as a chance for us to reflect a bit on something Barry Zhang from Anthropic mentioned in this presentation:

> Think like your agents
Alright, so we’ve got a CSV file packed with information. But let me simplify things a bit. Here’s what our file looks like:
```
Name,Age,ID,Pet
Peter,40,89,Lucy
Susan,35,11,Buddy
David,28,22,Daisy
Laura,32,18,Rocky
Alexandre,30,23,Bella
Mary,25,56,Max
```
Now, say you ask your agent: “What’s Alexandre’s age?” or even “Who owns Max?” As part of the RAG (Retrieval-Augmented Generation) process (`CSVSearchTool` is one of those RAG tools), your file gets broken up into chunks. After a semantic search, the agent gets the following chunk as context for both questions:

```
Laura,32,18,Rocky
Alexandre,30,23,Bella
Mary,25,56,Max
```
Notice what’s going on here? Thinking like our agents, we find “Alexandre” and see the numbers 30 and 23. So, what’s Alexandre’s age? Then we also see “Mary” and “Max” on the same line, and with a bit of intelligence, we can guess there’s some kind of relationship there. But who exactly owns whom? See how, sometimes, by thinking like our agents, we start spotting weaknesses in how we’re tackling real-world use cases?
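To make the chunking problem concrete, here’s a toy splitter. It’s a deliberately naive sketch (real RAG pipelines use smarter splitters, and the 70-character chunk size is made up for illustration), but it shows how only the first chunk keeps the header row:

```python
# Toy illustration of RAG chunking: pack whole lines into chunks of at
# most ~70 characters. Only the first chunk ends up with the CSV header.
csv_text = """Name,Age,ID,Pet
Peter,40,89,Lucy
Susan,35,11,Buddy
David,28,22,Daisy
Laura,32,18,Rocky
Alexandre,30,23,Bella
Mary,25,56,Max"""

def split_into_chunks(text, chunk_size=70):
    """Naive splitter: never breaks a line, starts a new chunk when full."""
    chunks, current = [], ""
    for line in text.splitlines():
        if current and len(current) + len(line) + 1 > chunk_size:
            chunks.append(current)
            current = ""
        current = f"{current}\n{line}" if current else line
    if current:
        chunks.append(current)
    return chunks

chunks = split_into_chunks(csv_text)

# The second chunk is exactly the fragment from the example above:
# bare rows with no header, so 30 vs. 23 is ambiguous for Alexandre.
print(chunks[1])
```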
Now imagine instead that you received a JSON chunk made up of those same last three rows — still assuming your data got split during RAG:
```json
[
  {
    "Name": "Laura",
    "Age": 32,
    "ID": 18,
    "Pet": "Rocky"
  },
  {
    "Name": "Alexandre",
    "Age": 30,
    "ID": 23,
    "Pet": "Bella"
  },
  {
    "Name": "Mary",
    "Age": 25,
    "ID": 56,
    "Pet": "Max"
  }
]
```
Our content is still fragmented, right? But this time, each fragment carries enough information to answer those original questions, and answer them well. I’m not sure if this exactly lines up with your real use case, but this kind of thought process can definitely help whenever you’re trying to solve real problems with agentic systems, whether that’s workflows or agents.
By the way, here’s some Python code that converts a CSV file into a JSON format like the one above, which you can then use with the `JSONSearchTool`. Happy coding!
```python
import pandas as pd
import json


def csv_to_json(csv_filepath, json_filepath):
    """
    Reads a CSV file using pandas and saves its contents as a JSON file.

    Args:
        csv_filepath (str): Path to the input CSV file.
        json_filepath (str): Path where the output JSON file will be saved.
    """
    try:
        # Load CSV into a pandas DataFrame
        df = pd.read_csv(csv_filepath)

        # Convert DataFrame to a list of dictionaries
        json_data = df.to_dict(orient='records')

        # Write JSON data to file with indentation for readability
        with open(json_filepath, "w", encoding="utf-8") as f:
            json.dump(json_data, f, indent=2)

        print(f"Successfully converted '{csv_filepath}' to '{json_filepath}'")
    except FileNotFoundError:
        print(f"Error: CSV file not found at '{csv_filepath}'")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")


csv_file = "pets.csv"
json_file = "pets.json"

csv_to_json(csv_file, json_file)
```
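And to close the loop, a sketch of pointing the RAG tool at the generated file. I’m assuming `JSONSearchTool` accepts a `json_path` argument here, so double-check the `crewai_tools` docs for your installed version:

```python
from crewai_tools import JSONSearchTool

# Assumption: json_path is the keyword for a fixed input file; verify
# against the crewai_tools documentation for your version.
json_search_tool = JSONSearchTool(json_path="pets.json")

# The tool can then be passed to an agent via tools=[json_search_tool].
```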