Issue with the search_tool

Please help me solve this

C:\Users\puppy\prompt_mail\promptmail.venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:623: UserWarning: <built-in function callable> is not a Python type (it may be an instance of an object), Pydantic will allow any object with no validation since we cannot even enforce that the input is an instance of the given type. To get rid of this error wrap the type with `pydantic.SkipValidation`.
  warn(
❌ Error while running PromptMail: ‘search_tool’

Welcome to the community.

Quick tip: to boost your chances of getting help, you should give us more context about your problem. Show us how you’re implementing it and provide the full error traceback.


Hi, I am facing a similar issue; I can share the details from my end.
I am trying to learn CrewAI, specifically a RAG implementation using PDFSearchTool.

The simple code I ran was:

from crewai_tools import PDFSearchTool

# --- Tools ---
# PDF SOURCE: https://www.gpinspect.com/wp-content/uploads/2021/03/sample-home-report-inspection.pdf
pdf_search_tool = PDFSearchTool(
    pdf="./example_home_inspection.pdf",
)

My PDF-to-ChromaDB conversion was not being carried out successfully. When I run that code, it gives:

Pydantic will allow any object with no validation since we cannot even enforce that the input is an instance of the given type. To get rid of this error wrap the type with `pydantic.SkipValidation`.
  warn(

and the ChromaDB insert-batches progress stays at 0%, then the code completes its execution.

Edit:

I tried using an older version of Pydantic (2.10) to see if it would resolve the issue. Though I was able to get rid of the warning, the 0% issue still persists.
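
For reference, the warning's suggested fix (wrapping the annotation with `pydantic.SkipValidation`) would look like this in user code. This is a minimal sketch, and it doesn't really help here, since the warning is raised from inside crewai_tools' dependencies rather than from code I control:

from pydantic import BaseModel, SkipValidation

class Triggers(BaseModel):
    fn: callable  # the built-in `callable` is not a type -> emits the UserWarning

class Silenced(BaseModel):
    fn: SkipValidation[callable]  # wrapped as suggested -> no warning, no validation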

Code:

from crewai_tools import CSVSearchTool

tool = CSVSearchTool(csv='Pokemon.csv')

Just a quick question: have you got a simple CrewAI version working?
I found it better to get something simple working first, to understand the quirks of CrewAI, and then build on it. This helps rule out issues with versions of dependencies.

Try this https://www.youtube.com/watch?v=-kSOTtYzgEw

If you have one working already then it could be something about the PDF file.

Hi Tony, yes, I have sample code running. What I am currently working on: I want to demo an agentic framework that uses multiple agents, one of which should be a RAG implementation in a chat-based format. I am using a local model via Ollama (phi4-mini) and have a simple terminal chat operational:

import os
from crewai import LLM, Agent, Task, Crew, Process


VERBOSE_MODE = False 


terminal_assistant = Agent(
    role='Terminal Assistant',
    goal='Answer questions asked by the user in the terminal accurately and concisely.',
    backstory=(
        "You are an AI assistant powered by the Gemma model running locally via Ollama. "
        "Your purpose is to directly answer the user's prompts entered in the terminal."
    ),
    llm=LLM(model="ollama/phi4-mini", base_url="http://localhost:11434"),
    verbose=VERBOSE_MODE,
    allow_delegation=False,
)


query_task = Task(
    description=(
        "Analyze the user's question: '{user_question}'. "
        "Provide a clear, direct, and helpful answer based on your knowledge."
    ),
    expected_output='A concise and accurate textual answer to the user\'s question.',
    agent=terminal_assistant,
)


qa_crew = Crew(
    agents=[terminal_assistant],
    tasks=[query_task],
    process=Process.sequential,
    verbose=VERBOSE_MODE
)




if __name__ == "__main__":
    print("-----------------------------------------------------")
    print("CrewAI Terminal Assistant")
    print("-----------------------------------------------------")
    print("Enter your question below (type 'exit' to quit):")

    while True:
        user_input = input("\nYour Question: ")
        if user_input.lower() == 'exit':
            print("Exiting...")
            break
        if not user_input:
            print("Please enter a question.")
            continue

        crew_inputs = {'user_question': user_input}
        print("\n... Processing your question ...\n")
        try:
            result = qa_crew.kickoff(inputs=crew_inputs)
            print("\n--- Assistant's Response ---")
            print(result)
            print("--------------------------")
        except Exception as e:
            print("\n--- An Error Occurred ---")
            print(f"Error: {e}")
            print("-------------------------")

Following this, I was testing a RAG implementation using a PDF and was facing the same issue as above. So I tried the CSV and JSON tools as well, and they yielded the same result: 0%, then the code completing. I followed the same syntax as in the documentation:

from crewai_tools import JSONSearchTool


# Restricting search to a specific JSON file
# Use this initialization method when you want to limit the search scope to a specific JSON file.
tool = JSONSearchTool(json_path='data.json')

from crewai_tools import CSVSearchTool


tool = CSVSearchTool(csv='Pokemon.csv')


I would follow the video example by Tony (https://www.youtube.com/watch?v=-kSOTtYzgEw) to get it working, as this is a knowledge-source (RAG) example.


Hey, @Open_Mail.

I took a look at the configuration you shared, and I didn’t see the embedding setup.

Just to recap, all of CrewAI’s RAG tools (like PDFSearchTool, YoutubeChannelSearchTool, etc.) rely on having an embedding model configured according to the Embedchain library’s standard.

Since it looks like you’re using Ollama models locally, I was expecting to see something like this:

from crewai_tools import PDFSearchTool

embedchain_config = {
    "embedder": {
        "provider": "ollama",
        "config": {
            "model": "nomic-embed-text",
            "base_url": "http://localhost:11434"
        }
    }
}

pdf_search_tool = PDFSearchTool(
    pdf="./example_home_inspection.pdf",
    config=embedchain_config
)

Could you please check if adding this modification gets your setup working successfully?

Hi @Max_Moura, thanks for pointing it out. I tried the snippet you provided:

from crewai_tools import PDFSearchTool

embedchain_config = {
    "embedder": {
        "provider": "ollama",
        "config": {
            "model": "nomic-embed-text",
            "base_url": "http://localhost:11434"
        }
    }
}

tool = PDFSearchTool(pdf="../PDFs/research_paper.pdf", embedchain_config=embedchain_config)

and I am getting the following error:

D:\Sample\1.Testing\myenv\Lib\site-packages\pydantic\_internal\_generate_schema.py:623: UserWarning: <built-in function callable> is not a Python type (it may be an instance of an object), Pydantic will allow any object with no validation since we cannot even enforce that the input is an instance of the given type. To get rid of this error wrap the type with `pydantic.SkipValidation`.
  warn(
D:\Sample\1.Testing\myenv\Lib\site-packages\chromadb\api\rust.py:236: DeprecationWarning: legacy embedding function config
D:\Sample\1.Testing\myenv\Lib\site-packages\chromadb\api\models\CollectionCommon.py:158: DeprecationWarning: legacy embedding function config
  return load_collection_configuration_from_json(self._model.configuration_json)
Inserting batches in chromadb:   0%|                                                                                    | 0/1 [00:01<?, ?it/s]

And after this, same as the previous situation, the code completes execution at 0%, as shown above.

Alright, here’s a fully working version of your code with the following setup:

  • qwen2.5:3b for the LLM via Ollama, because phi4-mini’s performance was pretty frustrating.
  • nomic-embed-text for embeddings, also via Ollama.
  • https://www.gpinspect.com/wp-content/uploads/2021/03/sample-home-report-inspection.pdf as the data source.

Directory structure:

crewai_local_qa/
├── main.py
└── sample-home-report-inspection.pdf

main.py file:

from crewai import LLM, Agent, Task, Crew, Process
from crewai_tools import PDFSearchTool

VERBOSE_LOGGING = True

embedding_config = {
    "embedder": {
        "provider": "ollama",
        "config": {
            "model": "nomic-embed-text",
            "base_url": "http://localhost:11434",
        },
    }
}

pdf_search_tool = PDFSearchTool(
    pdf="./sample-home-report-inspection.pdf",
    config=embedding_config
)

language_model = LLM(
    model="ollama/qwen2.5:3b",
    base_url="http://localhost:11434",
    temperature=0.1
)

qa_agent = Agent(
    role="Information Retrieval Specialist",
    goal=(
        "Accurately answer user questions based on the content of the "
        "provided document."
    ),
    backstory=(
        "You are an AI assistant specialized in extracting information "
        "from the given documents. Your sole purpose is to consult "
        "the documents using your tools and provide answers based "
        "exclusively on its contents. Do not use any external "
        "knowledge."
    ),
    llm=language_model,
    tools=[pdf_search_tool],
    verbose=VERBOSE_LOGGING,
    allow_delegation=False,
)

answer_generation_task = Task(
    description=(
        "Analyze the user's question:\n\n'{user_question}'\n\n"
        "Use available tools to locate relevant information. Formulate a "
        "clear and direct answer based *only* on the retrieved context. "
        "If the document does not contain the answer, explicitly state "
        "that the information is not available in the provided context."
    ),
    expected_output=(
        "A concise, factual summary derived solely from the retrieved "
        "context, presented in a single paragraph. If the information "
        "cannot be found, clearly state that."
    ),
    agent=qa_agent,
)

rag_crew = Crew(
    agents=[qa_agent],
    tasks=[answer_generation_task],
    process=Process.sequential,
    verbose=VERBOSE_LOGGING,
)

if __name__ == "__main__":
    result = rag_crew.kickoff(
        inputs={
            "user_question": "Any recommendations on hot and cold water lines?"
        }
    )
    print("\n--- Assistant's Response ---")
    print(result.raw)
    print("--------------------------")

First off, make sure this version runs smoothly on your machine before you start tweaking it for your real use case.
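
Also, assuming a default local Ollama install, make sure both models have been pulled before the first run, e.g.:

ollama pull qwen2.5:3b
ollama pull nomic-embed-text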

Hi @Max_Moura, thanks for the sample code, but I am still facing the same error.
The only change I made was the Ollama model. I'll share what happened: after the first execution, it prompted me to pip install ollama:

raise ImportError("Ollama Embedder requires extra dependencies. Install with `pip install ollama`") from None
ImportError: Ollama Embedder requires extra dependencies. Install with `pip install ollama`

Then, after installing it and executing main again, I was stuck (waited 5-10 mins) at the following:

D:\Sample\8.sample_code_provided\venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:623: UserWarning: <built-in function callable> is not a Python type (it may be an instance of an object), Pydantic will allow any object with no validation since we cannot even enforce that the input is an instance of the given type. To get rid of this error wrap the type with `pydantic.SkipValidation`.
  warn(
D:\Sample\8.sample_code_provided\venv\Lib\site-packages\ollama\_types.py:81: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.
  if key in self.model_fields:
D:\Sample\8.sample_code_provided\venv\Lib\site-packages\embedchain\embedder\ollama.py:27: LangChainDeprecationWarning: The class `OllamaEmbeddings` was deprecated in LangChain 0.3.1 and will be removed in 1.0.0. An updated version of the class exists in the :class:`~langchain-ollama package and should be used instead. To use it run `pip install -U :class:`~langchain-ollama` and import as `from :class:`~langchain_ollama import OllamaEmbeddings``.
  embeddings = OllamaEmbeddings(model=self.config.model, base_url=config.base_url)
Inserting batches in chromadb:   0%|                                                                                    | 0/3 [00:00<?, ?it/s]

So naturally I keyboard-interrupted using Ctrl+C, and this is the complete error output:

D:\Sample\8.sample_code_provided\venv\Lib\site-packages\pydantic\_internal\_generate_schema.py:623: UserWarning: <built-in function callable> is not a Python type (it may be an instance of an object), Pydantic will allow any object with no validation since we cannot even enforce that the input is an instance of the given type. To get rid of this error wrap the type with `pydantic.SkipValidation`.
  warn(
D:\Sample\8.sample_code_provided\venv\Lib\site-packages\ollama\_types.py:81: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.
  if key in self.model_fields:
D:\Sample\8.sample_code_provided\venv\Lib\site-packages\embedchain\embedder\ollama.py:27: LangChainDeprecationWarning: The class `OllamaEmbeddings` was deprecated in LangChain 0.3.1 and will be removed in 1.0.0. An updated version of the class exists in the :class:`~langchain-ollama package and should be used instead. To use it run `pip install -U :class:`~langchain-ollama` and import as `from :class:`~langchain_ollama import OllamaEmbeddings``.
  embeddings = OllamaEmbeddings(model=self.config.model, base_url=config.base_url)
Inserting batches in chromadb:   0%|                                                                                    | 0/3 [02:18<?, ?it/s]
Traceback (most recent call last):
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\urllib3\util\connection.py", line 73, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Sample\8.sample_code_provided\main.py", line 16, in <module>
    pdf_search_tool = PDFSearchTool(
                      ^^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\crewai_tools\tools\pdf_search_tool\pdf_search_tool.py", line 34, in __init__
    self.add(pdf)
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\crewai_tools\tools\pdf_search_tool\pdf_search_tool.py", line 60, in add
    super().add(*args, **kwargs)
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\crewai_tools\tools\rag\rag_tool.py", line 57, in add
    self.adapter.add(*args, **kwargs)
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\crewai_tools\adapters\pdf_embedchain_adapter.py", line 32, in add
    self.embedchain_app.add(*args, **kwargs)
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\embedchain\embedchain.py", line 192, in add
    documents, metadatas, _ids, new_chunks = self._load_and_embed(
                                             ^^^^^^^^^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\embedchain\embedchain.py", line 416, in _load_and_embed
    self.db.add(documents=batch_docs, metadatas=batch_meta, ids=batch_ids, **kwargs)
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\embedchain\vectordb\chroma.py", line 159, in add
    self.collection.add(
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\chromadb\api\models\Collection.py", line 81, in add
    add_request = self._validate_and_prepare_add_request(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\chromadb\api\models\CollectionCommon.py", line 90, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\chromadb\api\models\CollectionCommon.py", line 213, in _validate_and_prepare_add_request
    add_embeddings = self._embed_record_set(record_set=add_records)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\chromadb\api\models\CollectionCommon.py", line 526, in _embed_record_set
    return self._embed(input=record_set[field])  # type: ignore[literal-required]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\chromadb\api\models\CollectionCommon.py", line 539, in _embed
    return self._embedding_function(input=input)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\chromadb\api\types.py", line 466, in __call__
    result = call(self, input)
             ^^^^^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\embedchain\embedder\base.py", line 20, in __call__
    return self.embedding_fn(input)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\langchain_community\embeddings\ollama.py", line 214, in embed_documents
    embeddings = self._embed(instruction_pairs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\langchain_community\embeddings\ollama.py", line 202, in _embed
    return [self._process_emb_response(prompt) for prompt in iter_]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\langchain_community\embeddings\ollama.py", line 167, in _process_emb_response
    res = requests.post(
          ^^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\requests\api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\requests\adapters.py", line 667, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\urllib3\connectionpool.py", line 787, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\urllib3\connectionpool.py", line 493, in _make_request
    conn.request(
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\urllib3\connection.py", line 445, in request
    self.endheaders()
  File "C:\Users\OpenMail \AppData\Roaming\uv\python\cpython-3.12.10-windows-x86_64-none\Lib\http\client.py", line 1333, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Users\OpenMail \AppData\Roaming\uv\python\cpython-3.12.10-windows-x86_64-none\Lib\http\client.py", line 1093, in _send_output
    self.send(msg)
  File "C:\Users\OpenMail \AppData\Roaming\uv\python\cpython-3.12.10-windows-x86_64-none\Lib\http\client.py", line 1037, in send
    self.connect()
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\urllib3\connection.py", line 276, in connect
    self.sock = self._new_conn()
                ^^^^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\urllib3\connection.py", line 198, in _new_conn
    sock = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Sample\8.sample_code_provided\venv\Lib\site-packages\urllib3\util\connection.py", line 81, in create_connection
    sock.close()
  File "C:\Users\OpenMail \AppData\Roaming\uv\python\cpython-3.12.10-windows-x86_64-none\Lib\socket.py", line 501, in close
    def close(self):

KeyboardInterrupt
^C

To run this code I created a new venv using uv and installed the latest crewai and crewai[tools], with Python 3.12.10.

It definitely looks like a setup issue on your end. To try and reproduce your error, I set up a fresh virtual environment for testing:

cd crewai_local_qa
uv venv crewai_uv_env
source crewai_uv_env/bin/activate
uv pip install crewai crewai-tools pytest setuptools ollama
uv run main.py

And everything worked fine.
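
One more sanity check worth doing: given the ConnectionRefusedError buried in your traceback, confirm that the Ollama server is actually reachable and that the embedding model is available before kicking off the crew. A minimal sketch, assuming Ollama's default REST endpoint on port 11434:

# Verify the Ollama server is up and nomic-embed-text is pulled.
# Adjust BASE_URL if your Ollama instance listens elsewhere.
import requests

BASE_URL = "http://localhost:11434"

response = requests.get(f"{BASE_URL}/api/tags", timeout=5)
response.raise_for_status()
models = [m["name"] for m in response.json().get("models", [])]
print("Models served by Ollama:", models)

if not any("nomic-embed-text" in name for name in models):
    print("Embedding model missing -- run: ollama pull nomic-embed-text")

If that GET fails with a connection error, the 0% insertion stall is simply the embedder waiting on a server that isn't running; start it with `ollama serve` (or launch the Ollama desktop app) and try again.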