Problem: I’m using `CSVSearchTool` to search for products in a CSV database with thousands of records, but regardless of the query, I always receive only 3 results, even when I know there are many more relevant products in the database.
Current code:

```python
from crewai_tools import CSVSearchTool
import os
from dotenv import load_dotenv

load_dotenv(override=True)
os.environ["OPENAI_API_KEY"] = os.getenv("CHAVE_API")

csv_tool = CSVSearchTool(
    csv="base_produtos.csv",
    config=dict(
        llm=dict(
            provider="openai",
            config=dict(
                model="gpt-4.1-mini",
                temperature=0.0,
            ),
        ),
        embedder=dict(
            provider="openai",
            config=dict(
                model="text-embedding-3-small",
            ),
        ),
    ),
)
```
Example of the problem:
- Query: “Do you have dipyrone?”
- Current result: `["DIPYRONE 1G 10CP", "DIPYRONE 500MG 30CP", "DIPYRONE 500MG ENV 10CP"]` (always 3)
- Expected result: all available dipyrone products (should be 8-10 products)
Attempts made:
- Tested with different queries - always 3 results
- Verified there are more products in the CSV database - there are dozens of dipyrone variations
- Tried changing the LLM temperature - no effect
Question: How can I configure the `CSVSearchTool` to return more than 3 results? Is there any parameter like `k`, `limit`, or `results_limit` that I can use in the configuration?
Thanks for any help!
Hey Valclemir,
Try using a custom `Adapter` like the one I’ve laid out below. I took the chance to swap out the search functionality between `Adapter`s: the original one relies on Embedchain’s `.query()` method, but the new one uses the `.search()` method instead, which gives you the flexibility to adjust the number of results you get back.
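To make the difference concrete: Embedchain's `.search()` returns the raw retrieved chunks as a list of dicts (with `context` and `metadata` keys), and `num_documents` controls how many come back, whereas the `.query()` path retrieves a fixed, small number of chunks (typically 3, which matches what you're seeing) before answering. The sketch below uses mocked data so nothing is actually queried; only the dict shape and the formatting idea are meant to mirror the real thing:

```python
# Mock data only: the dict shape ("context" + "metadata" keys) is meant to
# mirror what Embedchain's App.search() returns; no real index is queried.
mock_search_results = [
    {"context": "DIPYRONE 1G 10CP", "metadata": {"row": 12}},
    {"context": "DIPYRONE 500MG 30CP", "metadata": {"row": 47}},
    {"context": "DIPYRONE 500MG ENV 10CP", "metadata": {"row": 83}},
]

def format_results(results):
    # One line per retrieved chunk, so the LLM sees every match, not a summary.
    return "\n".join(
        f"**Context:** '{r['context']}' (**Metadata:** '{r['metadata']}')"
        for r in results
    )

print(format_results(mock_search_results))
```

With the real `App.search(query=..., num_documents=10)` call, raising `num_documents` is what lifts the 3-result cap.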
I’ve also beefed up the text that’s returned to the LLM. You really want to think about the tool output as something that enriches the LLM’s understanding. Whenever you’re building custom tools, consider this output as part of the LLM’s actual prompt.
Finally, you’ll notice I got rid of the `llm` attribute you were passing in your configuration and just stuck with the `embedder`. Even the original `Adapter` doesn’t require the `llm` attribute: it’s only used (in the default `Adapter`) if you set `summarize=True`, in which case Embedchain itself hands your Agent a result that’s been summarized by the LLM specified in that parameter. Since you’re using the raw data from your search, that `llm` parameter is pretty much never necessary.
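For completeness, here’s a sketch (a config fragment, not something you need for your case) of the one situation where that `llm` block is actually consumed: enabling Embedchain-side summarization on the default adapter via `summarize=True`:

```python
# Illustrative only: with summarize=True on the default adapter, Embedchain
# summarizes the retrieved chunks using the llm config below. Without it,
# the llm block is dead weight.
from crewai_tools import CSVSearchTool

csv_tool = CSVSearchTool(
    csv="base_produtos.csv",
    summarize=True,  # only now does the llm config matter
    config=dict(
        llm=dict(provider="openai", config=dict(model="gpt-4.1-mini")),
        embedder=dict(provider="openai", config=dict(model="text-embedding-3-small")),
    ),
)
```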
```python
from typing import Any

from crewai_tools.tools.rag.rag_tool import Adapter
from embedchain import App


class CustomEmbedchainAdapter(Adapter):
    embedchain_app: App

    def query(self, question: str) -> str:
        response = "---\n"
        response += f"**Additional Context for Query '{question}':**\n"
        search_results = self.embedchain_app.search(
            query=question,
            num_documents=5,  # Up to 5 relevant chunks
        )
        if search_results:
            for item in search_results:
                # Index the result dicts by key rather than relying on
                # dict-value ordering.
                response += f"**Context:** '{item['context']}' "
                response += f"(**Metadata:** '{item['metadata']}')\n"
        else:
            response += "No relevant context found for this query.\n"
        response += "---\n"
        return response.strip()

    def add(self, *args: Any, **kwargs: Any) -> None:
        self.embedchain_app.add(*args, **kwargs)
```
```python
#
# Test it out
#
from crewai_tools import CSVSearchTool
import os

os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_API_KEY>"

embedchain_config = {
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-small"
        }
    }
}

csv_tool = CSVSearchTool(
    csv="/path/to/your/file.csv",
    config=embedchain_config,
    adapter=CustomEmbedchainAdapter(
        embedchain_app=App.from_config(config=embedchain_config)
    )
)

print(
    csv_tool.run("<YOUR_QUERY>")
)
```
Max, thank you very much for your help!