CSVSearchTool always returns only 3 results - How to increase the limit?

Problem: I’m using CSVSearchTool to search for products in a CSV database with thousands of records, but regardless of the query, I always receive only 3 results, even when I know there are many more relevant products in the database.

Current code:

from crewai_tools import CSVSearchTool
import os
from dotenv import load_dotenv

load_dotenv(override=True)
os.environ["OPENAI_API_KEY"] = os.getenv("CHAVE_API")

csv_tool = CSVSearchTool(
    csv="base_produtos.csv",
    config=dict(
        llm=dict(
            provider="openai",
            config=dict(
                model="gpt-4.1-mini",
                temperature=0.0
            ),
        ),
        embedder=dict(
            provider="openai",
            config=dict(
                model="text-embedding-3-small"
            ),
        ),
    )
)

Example of the problem:

  • Query: “Do you have dipyrone?”
  • Current result: ["DIPYRONE 1G 10CP", "DIPYRONE 500MG 30CP", "DIPYRONE 500MG ENV 10CP"] (always 3)
  • Expected result: All available dipyrone products (should be 8-10 products)

Attempts made:

  1. Tested with different queries - always 3 results
  2. Verified there are more products in the CSV database - there are dozens of dipyrone variations
  3. Tried changing the LLM temperature - no effect

Question: How can I configure the CSVSearchTool to return more than 3 results? Is there any parameter like k, limit, or results_limit that I can use in the configuration?

Thanks for any help!

Hey Valclemir,

Try using a custom Adapter like the one I’ve laid out below. It also changes how the search itself is done: the original Adapter relies on Embedchain’s .query() method, while the new one uses the .search() method instead, which lets you set how many results you get back.

I’ve also beefed up the text that’s returned to the LLM. You really want to think about the tool output as something that enriches the LLM’s understanding. Whenever you’re building custom tools, consider this output as part of the LLM’s actual prompt.

Finally, you’ll notice I dropped the llm attribute you were passing in your configuration and kept only the embedder. Even the original Adapter doesn’t require the llm attribute: it’s only used (in the default Adapter) if you set summarize=True, in which case Embedchain hands your Agent a result that’s already been summarized by the LLM specified in that parameter. Since you’re working with the raw data from your search, the llm parameter isn’t needed here.
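As a side note before the adapter itself, here is a toy sketch of why the result count is a retrieval setting and not something the LLM temperature can change. It is not Embedchain's implementation: the 2-d vectors are made up, the product names beyond the three in the question are hypothetical, and the default of k=3 simply mirrors the behaviour reported above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, rows, k=3):
    """Return the texts of the k rows most similar to query_vec."""
    ranked = sorted(rows, key=lambda row: cosine(query_vec, row[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Made-up embeddings; variants 4 and 5 and the last row are hypothetical.
rows = [
    ("DIPYRONE 1G 10CP", [1.0, 0.0]),
    ("DIPYRONE 500MG 30CP", [0.9, 0.1]),
    ("DIPYRONE 500MG ENV 10CP", [0.8, 0.2]),
    ("DIPYRONE (hypothetical variant 4)", [0.7, 0.3]),
    ("DIPYRONE (hypothetical variant 5)", [0.6, 0.4]),
    ("UNRELATED PRODUCT (hypothetical)", [0.0, 1.0]),
]

query_vec = [1.0, 0.0]
default_hits = top_k(query_vec, rows)      # capped at 3 by the default k
more_hits = top_k(query_vec, rows, k=5)    # raising k returns more rows
```

However many rows match, only k of them come back, so the fix has to happen where k is set.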

from typing import Any
from crewai_tools.tools.rag.rag_tool import Adapter
from embedchain import App

class CustomEmbedchainAdapter(Adapter):
    embedchain_app: App

    def query(self, question: str) -> str:
        response = "---\n"
        response += f"**Additional Context for Query '{question}':**\n"
    
        # Each result is expected to be a dict with "context" and "metadata" keys.
        search_results = self.embedchain_app.search(
            query=question,
            num_documents=5,  # Up to 5 relevant chunks; raise this to get more
        )

        if search_results:
            for item in search_results:
                # Read the keys explicitly instead of relying on dict value order.
                response += f"**Context:** '{item['context']}' "
                response += f"(**Metadata:** '{item['metadata']}')\n"
        else:
            response += "No relevant context found for this query.\n"
        
        response += "---\n"
        return response.strip()

    def add(self, *args: Any, **kwargs: Any) -> None:
        self.embedchain_app.add(*args, **kwargs)

#
# Test it out
#

from crewai_tools import CSVSearchTool
import os

os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_API_KEY>"

embedchain_config = {
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-small"
        }
    }
}

csv_tool = CSVSearchTool(
    csv="/path/to/your/file.csv",
    config=embedchain_config,
    adapter=CustomEmbedchainAdapter(
        embedchain_app=App.from_config(config=embedchain_config)
    )
)

print(
    csv_tool.run("<YOUR_QUERY>")
)
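
If you'd rather not hard-code the number of results, the formatting logic can be pulled into a function that takes it as a parameter. This is only a sketch you can run without crewai or Embedchain installed: build_context and FakeApp are hypothetical names, and FakeApp just imitates a .search() that returns dicts with "context" and "metadata" keys, which is the shape the adapter above assumes.

```python
def build_context(app, question, num_documents=10):
    """Format search results as extra prompt context for the LLM."""
    results = app.search(query=question, num_documents=num_documents)
    lines = ["---", f"**Additional Context for Query '{question}':**"]
    if results:
        for item in results:
            lines.append(
                f"**Context:** '{item['context']}' "
                f"(**Metadata:** '{item['metadata']}')"
            )
    else:
        lines.append("No relevant context found for this query.")
    lines.append("---")
    return "\n".join(lines)


class FakeApp:
    """Hypothetical stand-in for the real app; returns the first N rows."""

    def __init__(self, rows):
        self.rows = rows

    def search(self, query, num_documents):
        hits = [
            {"context": row, "metadata": {"row": index}}
            for index, row in enumerate(self.rows)
        ]
        return hits[:num_documents]


fake_app = FakeApp(["DIPYRONE 1G 10CP", "DIPYRONE 500MG 30CP", "DIPYRONE 500MG ENV 10CP"])
print(build_context(fake_app, "Do you have dipyrone?", num_documents=2))
```

In the real adapter the same idea would mean passing the limit through to self.embedchain_app.search() instead of the literal 5.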

Max, thank you very much for your help!
