String Knowledge sources not working with Gemini

tonykipkemboi · January 6, 2025, 2:41pm

Could you share the code in question? I’ll try troubleshooting this for you all this morning.

Evan_Scallan · January 6, 2025, 4:56pm

I made a few updates to a custom CSV Knowledge source that requires no additional configuration after being instantiated. I hope this can serve as some additional help or context. This is tested and working:

from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
import csv
import hashlib
from typing import Dict, Any
from pydantic import Field
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage
from dotenv import load_dotenv
import os

load_dotenv()

# Create and configure knowledge storage
csv_storage: KnowledgeStorage = KnowledgeStorage(
    embedder_config={
        "provider": "azure",
        "config": {
            "model": "text-embedding-3-small",
            "api_key": os.environ.get("AZURE_OPENAI_API_KEY")
        }
})

csv_storage.initialize_knowledge_storage()

class AzureCSVKnowledgeSource(BaseKnowledgeSource):
    """Knowledge source that loads its content from a CSV file."""
    document_path: str = Field(
        description="The relative path to a document"
    )

    def __init__(self, document_path, storage: KnowledgeStorage = None):
        super().__init__(document_path=document_path, storage=csv_storage)
        self.document_path = document_path
        if storage:
            self.storage = storage
            self.add()

    def load_content(self) -> Dict[Any, str]:
        """ Read the CSV file and return a dictionary containing all rows joined as a string.
        This method demonstrates how to parse the CSV so that chunking can later
        be done in a more structured way."""
        try:
            rows = []
            with open(self.document_path, 'r', encoding='utf-8') as file:
                reader = csv.reader(file)
                # Convert each row (list of columns) into a single string
                for row in reader:
                    row_as_text = ", ".join(row)
                    rows.append(row_as_text)

            # Join the entire CSV into one text. If you prefer, you could also return
            # the rows as a list and handle them directly in add().
            content = "\n".join(rows)
            return {"document_data": content}

        except FileNotFoundError:
            print("File not found!")
        except PermissionError:
            print("You don't have permission to access this file.")
        except Exception as e:
            print("An error occurred:", e)
        # Every error path ends up here, so callers always get a dict back
        return {}

    def _chunk_csv_rows(self, rows: list[str], rows_per_chunk: int = 20) -> list[str]:
        """
        A helper method specifically for CSV row-based chunking.
        Groups every `rows_per_chunk` rows into a single text chunk.
        """
        chunks = []
        for i in range(0, len(rows), rows_per_chunk):
            # Join a subset of rows into one chunk
            chunk_rows = rows[i: i + rows_per_chunk]
            chunk_text = "\n".join(chunk_rows)
            chunks.append(chunk_text)
        return chunks

    def generate_unique_id(self, content: str) -> str:
        """Generate a unique ID using a hash of the content."""
        return hashlib.sha256(content.encode('utf-8')).hexdigest()

    def add(self) -> None:
        """
        Load the CSV content, chunk it (row-based or character-based), and save it.
        """
        content_dict = self.load_content()
        if not content_dict:
            return  # In case of errors in load_content

        # The full CSV as a single string
        full_content = content_dict.get("document_data", "")

        # Split that string back into row-based text so we can chunk by row count.
        # If you'd rather keep it as columns or parse differently, adjust here.
        rows = full_content.split("\n")

        # You can tune `rows_per_chunk` according to your needs.
        self.chunks = self._chunk_csv_rows(rows, rows_per_chunk=20)
        # OPTIONAL: If you still want character-based chunking, you could use the inherited
        # `_chunk_text` method instead:
        # self.chunks = self._chunk_text(full_content)

        # Create metadata for each chunk
        self.metadata = []
        for chunk in self.chunks:
            self.metadata.append({"id": self.generate_unique_id(chunk)})

        # Validate metadata and chunk alignment
        if len(self.chunks) != len(self.metadata):
            raise ValueError(
                f"Mismatch in chunks and metadata lengths: "
                f"{len(self.chunks)} vs {len(self.metadata)}"
            )

        # Save documents with the associated metadata
        self.save_documents(metadata=self.metadata)

Given that your env variables and embedder attribute for Crew or Agent are correctly set up, all you have to do is instantiate your knowledge source. Example:

text_data = AzureCSVKnowledgeSource(document_path="./knowledge_and_documentation/ascension.txt")

You can then pass this on to your Crew or Agent(s) as needed. You can alter the class slightly to take in a string instead of a file if needed.
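
For readers who want to check the chunking and ID logic without crewai installed, the two helpers can be exercised on their own. This is a minimal standalone sketch: `chunk_rows` and `chunk_id` are free-function copies of `_chunk_csv_rows` and `generate_unique_id` from the class above, and the sample rows are made up.

```python
import hashlib


def chunk_rows(rows: list[str], rows_per_chunk: int = 20) -> list[str]:
    """Group every `rows_per_chunk` rows into one newline-joined chunk."""
    return [
        "\n".join(rows[i:i + rows_per_chunk])
        for i in range(0, len(rows), rows_per_chunk)
    ]


def chunk_id(content: str) -> str:
    """Content hash, so identical text always produces the same ID."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


# Made-up sample data: 45 rows
rows = [f"{i}, item-{i}" for i in range(45)]
chunks = chunk_rows(rows)
metadata = [{"id": chunk_id(c)} for c in chunks]

print(len(chunks))                          # 3
print([c.count("\n") + 1 for c in chunks])  # [20, 20, 5]
```

Because the ID is a hash of the chunk text, two chunks with identical content get the same ID. Whether the storage backend treats that as a duplicate is not something this sketch checks.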

Ricram2 · January 10, 2025, 3:11pm

from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource


# Create a knowledge source
content = "Users name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(
    content=content,
)

# Create an LLM with a temperature of 0 to ensure deterministic outputs
llm = LLM(model="llama3", temperature=0)

# Create an agent with the knowledge store
agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="""You are a master at understanding people and their preferences.""",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)
task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[string_source],
    embedder={
        "provider": "ollama",
        "config": {
            "model": "nomic-embed-text",
            "api_key": ""
        }
    }
)

result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"})

crewai run 
Running the Crew
warning: `VIRTUAL_ENV=/Users/ricram2/canada_tax/.venv` does not match the project environment path `.venv` and will be ignored
/Users/ricram2/canada_tax/crewai/tax_crew/.venv/lib/python3.11/site-packages/pydantic/_internal/_config.py:345: UserWarning: Valid config keys have changed in V2:
* 'fields' has been removed
  warnings.warn(message, UserWarning)
Traceback (most recent call last):
  File "/Users/ricram2/canada_tax/crewai/tax_crew/.venv/bin/run_crew", line 5, in <module>
    from tax_crew.main import run
  File "/Users/ricram2/canada_tax/crewai/tax_crew/src/tax_crew/main.py", line 5, in <module>
    from tax_crew.crew import TaxCrew
  File "/Users/ricram2/canada_tax/crewai/tax_crew/src/tax_crew/crew.py", line 7, in <module>
    string_source = StringKnowledgeSource(
                    ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ricram2/canada_tax/crewai/tax_crew/.venv/lib/python3.11/site-packages/pydantic/main.py", line 214, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ricram2/canada_tax/crewai/tax_crew/.venv/lib/python3.11/site-packages/crewai/knowledge/storage/knowledge_storage.py", line 72, in __init__
    self._set_embedder_config(embedder_config)
  File "/Users/ricram2/canada_tax/crewai/tax_crew/.venv/lib/python3.11/site-packages/crewai/knowledge/storage/knowledge_storage.py", line 195, in _set_embedder_config
    else self._create_default_embedding_function()
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ricram2/canada_tax/crewai/tax_crew/.venv/lib/python3.11/site-packages/crewai/knowledge/storage/knowledge_storage.py", line 179, in _create_default_embedding_function
    return OpenAIEmbeddingFunction(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ricram2/canada_tax/crewai/tax_crew/.venv/lib/python3.11/site-packages/chromadb/utils/embedding_functions/openai_embedding_function.py", line 56, in __init__
    raise ValueError(
ValueError: Please provide an OpenAI API key. You can get one at https://platform.openai.com/account/api-keys
An error occurred while running the crew: Command '['uv', 'run', 'run_crew']' returned non-zero exit status 1.
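
Reading the traceback from the top: the error is raised while `StringKnowledgeSource(...)` is being constructed (crew.py, line 7), before the `Crew` and its `embedder` setting are ever reached. At that point `_set_embedder_config` takes its `else` branch and builds the default OpenAI embedding function, which raises when no key is set. The sketch below is a simplified stand-in for that control flow, not crewai's actual code; `pick_embedder` and its return values are hypothetical names used only for illustration.

```python
import os
from typing import Optional


def pick_embedder(embedder_config: Optional[dict]) -> str:
    """Hypothetical stand-in for the fallback seen in the traceback."""
    if embedder_config:
        # An explicit config wins, e.g. {"provider": "ollama", ...}
        return embedder_config["provider"]
    # No config supplied: fall back to OpenAI, which needs a key
    if not os.environ.get("OPENAI_API_KEY"):
        raise ValueError("Please provide an OpenAI API key.")
    return "openai"


ollama_cfg = {"provider": "ollama", "config": {"model": "nomic-embed-text"}}
print(pick_embedder(ollama_cfg))  # ollama

os.environ.pop("OPENAI_API_KEY", None)
try:
    pick_embedder(None)  # mirrors building the source with no embedder config
except ValueError as err:
    print(err)  # Please provide an OpenAI API key.
```

If that reading is right, it would explain why changing the `LLM` or the `Crew` embedder alone does not make the message go away in this setup.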

tonykipkemboi · January 10, 2025, 7:53pm

The issue comes from the instantiation of the LLM class. Try the code below, but make sure to modify the model to match what you have locally through Ollama.

Pick one of your local models and use it below. Make sure the model name starts with the ollama/ prefix, i.e. ollama/{your-model}:

from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource


# Create a knowledge source
content = "Users name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(content=content)

# Create an LLM with a temperature of 0 to ensure deterministic outputs
llm = LLM(
    model="ollama/llama3.2:latest", # run !ollama list to see models you have
    temperature=0, 
    api_key=""
)

# Create an agent with the knowledge store
agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="""You are a master at understanding people and their preferences.""",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)
task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[string_source],
    embedder={
        "provider": "ollama",
        "config": {
            "model": "nomic-embed-text",
            "api_key": ""
        }
    }
)

result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"})
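
A small helper can normalize model names copied from `ollama list` so the `ollama/` prefix is never forgotten. This is only a sketch: `with_ollama_prefix` is a hypothetical name and not part of crewai.

```python
def with_ollama_prefix(model: str) -> str:
    """Return `model` with a leading "ollama/" if it lacks one."""
    model = model.strip()
    return model if model.startswith("ollama/") else f"ollama/{model}"


print(with_ollama_prefix("llama3:latest"))           # ollama/llama3:latest
print(with_ollama_prefix("ollama/llama3.2:latest"))  # ollama/llama3.2:latest
```

The result can then be passed as the `model` argument of `LLM(...)` in the code above.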

Ricram2 · January 10, 2025, 10:56pm

Nope. I did as you suggested and got the same issue. Here is the code:

from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource


# Create a knowledge source
content = "Users name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(
    content=content,
)

# Create an LLM with a temperature of 0 to ensure deterministic outputs
llm = LLM(model="llama3:latest", temperature=0)

# Create an agent with the knowledge store
agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="""You are a master at understanding people and their preferences.""",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)
task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[string_source],
    embedder={
        "provider": "ollama",
        "config": {
            "model": "nomic-embed-text:latest",
            "api_key": ""
        }
    }
)

result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"})

Here is the output of ollama list:


NAME                       ID              SIZE      MODIFIED   
nomic-embed-text:latest    0a109f422b47    274 MB    6 days ago    
llama3:latest              365c0bd3c000    4.7 GB    6 days ago    
llama2:latest              78e26419b446    3.8 GB    6 days ago
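
To rule out a missing model, the same list can be checked from Python. Ollama's local HTTP API reports the pulled models at `GET /api/tags`; the sketch below assumes the default address `http://localhost:11434`, and `fetch_installed` and `missing_models` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def fetch_installed(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask a running Ollama server which models it has pulled."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def missing_models(installed: list[str], required: list[str]) -> list[str]:
    """Return the required models that are absent from `installed`.

    A bare name such as "llama3" is treated as "llama3:latest".
    """
    def normalize(name: str) -> str:
        return name if ":" in name else f"{name}:latest"

    have = {normalize(n) for n in installed}
    return [r for r in required if normalize(r) not in have]


# Names taken from the `ollama list` output above
installed = ["nomic-embed-text:latest", "llama3:latest", "llama2:latest"]
print(missing_models(installed, ["llama3", "nomic-embed-text"]))  # []
print(missing_models(installed, ["llama3.2:latest"]))  # ['llama3.2:latest']
```

With a server running, `missing_models(fetch_installed(), [...])` performs the same check against the live list.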

tonykipkemboi · January 11, 2025, 5:43pm

What error did you get this time around? Also, what system are you using, and are you in a virtual environment?

Ricram2 · January 13, 2025, 2:28am

It is the same error about the OpenAI key. And yes, I’m using the project’s own venv.

smrati_katiyar · January 14, 2025, 12:38pm

llm = LLM(
    model="ollama/llama3.2:latest",
    base_url="http://localhost:11434",
    temperature=0.2,
    embedder={
        "provider": "ollama",
        "config": {
            "model": "nomic-embed-text:latest",
            "api_key": ""
        }
    }
)

This is how I have configured my LLM.

I am using knowledge_sources in my crew:

crew = Crew(
    agents=[node_selector, flow_designer],
    tasks=[select_node, setup_nodes],
    process=Process.sequential,
    verbose=True,
    # planning=True,  # Enable planning feature
    knowledge_sources=[
        text_source
    ]
)

I’m still getting the following error, and my whole knowledge source is ignored by the agent when it answers:

 Failed to init knowledge: Please provide an OpenAI API key.

tonykipkemboi · January 15, 2025, 5:45pm

Are you comfortable sharing your code with me to test?

smrati_katiyar · January 15, 2025, 5:53pm

Please find below

import os
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource


# Create a text file knowledge source
text_source = CrewDoclingSource(
    file_paths=["remedy_tickets_info.md", "jira_tickets_info.md", "translation_service_info.md", "named_entity_recognition_service_info.md"]
)


llm = LLM(
    model="ollama/llama3.2:latest",
    base_url="http://localhost:11434",
    temperature=0.2,
    embedder={
        "provider": "ollama",
        "config": {
            "model": "nomic-embed-text:latest",
            "api_key": ""
        }
    }
)


# Create agents
node_selector = Agent(
    role='Node selector',
    goal='Select nodes from a list of nodes to accomplish a given task',
    backstory='You are an expert in selecting nodes. If some task is given to you, you can decide which nodes will be useful '
              'to finish a task. Once you have shortlisted the nodes you give these nodes to flow designer. Flow designer is expert in arranging the nodes '
              'in correct order to finish the task. Remember your job is to select the most appropriate nodes only',
    verbose=True,
    max_iter=2,
    llm=llm,
    # knowledge_sources=[
    #     text_source
    # ]
)

flow_designer = Agent(
    role='Flow designer',
    goal='Based on the nodes given to you, arrange them in certain order and explain how to use them to finish the given task successfully.',
    backstory='You are an expert in using nodes given to you by Node Selector. '
              'Once you have received the nodes you know how to arrange them in certain order to finish the task. You think carefully multiple times '
              'to best organise nodes in the most appropriate order.',
    verbose=True,
    max_iter=2,
    llm=llm,
    # knowledge_sources=[
    #     text_source
    # ]
)

# Define tasks
select_node = Task(
    description='Select Nodes to convert a remedy ticket into a servicenow card.',
    expected_output='Give a list of nodes which can be useful to finish this task. In the output only include node names, each node should be '
                     'in a new line. Only one node name should be there in one line. If you have selected multiple nodes as output put each one of their names '
                     'in a new line',
    agent=node_selector
)

setup_nodes = Task(
    description='Node selector has done selection of nodes for you, your job is to arrange these nodes in appropriate order and explain how they can be used to '
                'convert a remedy ticket into service now card',
    expected_output='Explain the ordering of nodes and how it can be used to achieve the task',
    agent=flow_designer,
    output_file='node-setup/remedy-to-servicenow.md',  # Node selection and use will be explained here
    context=[select_node]
)

# Assemble a crew with planning enabled
crew = Crew(
    agents=[node_selector, flow_designer],
    tasks=[select_node, setup_nodes],
    process=Process.sequential,
    verbose=True,
    # planning=True,  # Enable planning feature
    knowledge_sources=[
        text_source
    ]
)

# Execute tasks
crew_output = crew.kickoff()


# Accessing the crew output
print(f"Raw Output: {crew_output.raw}")
print(f"Tasks Output: {crew_output.tasks_output}")
print(f"Token Usage: {crew_output.token_usage}")
print(crew.usage_metrics)

Output:

Davi_Santos · January 28, 2025, 1:01pm

I’m facing the same issue here

Tony_Wood · January 28, 2025, 2:07pm

Try setting your .env info as described here: Quickstart - CrewAI

Ricram2 · January 28, 2025, 4:30pm

This is not the issue.

Guigui_DuSud · February 6, 2025, 8:49am

Wow, it looks like I’m not the only one trying Ollama and failing at RAG.

I’m also trying to set up a working PDFKnowledgeSource per the docs. I even tried the CrewAI chat assistant, which guided me wrong.
Here is the full crew code:

from crewai import Agent, Crew, Process, Task, LLM
from crewai.project import CrewBase, agent, crew, task
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource



ollamaLLM = LLM(
	model="ollama/llama3.2",
	base_url="http://localhost:11434",
	temperature=0
)


# If you want to run a snippet of code before or after the crew starts, 
# you can use the @before_kickoff and @after_kickoff decorators
# https://docs.crewai.com/concepts/crews#example-crew-class-with-decorators


pdf_source = PDFKnowledgeSource(
    file_paths=["Séance7-IPAetanalysedecontenu.pdf", "TD Analyse Contenu et IPA_étudiants.pdf"],
	collection_name="data_agathe",
)


@CrewBase
class Yousfifromscratch():
	"""Yousfifromscratch crew"""

	# Learn more about YAML configuration files here:
	# Agents: https://docs.crewai.com/concepts/agents#yaml-configuration-recommended
	# Tasks: https://docs.crewai.com/concepts/tasks#yaml-configuration-recommended
	agents_config = 'config/agents.yaml'
	tasks_config = 'config/tasks.yaml'

	# If you would like to add tools to your agents, you can learn more about it here:
	# https://docs.crewai.com/concepts/agents#agent-tools
	@agent
	def expert_analyst(self) -> Agent:
		return Agent(
			config=self.agents_config['expert_analyst'],
			verbose=True,
			llm=ollamaLLM,
			allow_delegation=False,
			#knowledge_storage=[pdf_source],
			embedder={
        		"provider": "ollama",
        		"config": {
					"model": "nomic-embed-text",
					 "api_key": ""
	     		}
   			 }
		)

	@agent
	def task_coordinator(self) -> Agent:
		return Agent(
			config=self.agents_config['task_coordinator'],
			verbose=True,
			llm=ollamaLLM,
			
		)

	@agent
	def human_supervisor(self) -> Agent:
		return Agent(
			config=self.agents_config['human_supervisor'],
			verbose=True,
			
		)

	# To learn more about structured task outputs, 
	# task dependencies, and task callbacks, check out the documentation:
	# https://docs.crewai.com/concepts/tasks#overview-of-a-task
	@task
	def text_analysis_task(self) -> Task:
		return Task(
			config=self.tasks_config['text_analysis_task'],
			output_file='report.txt'
		)

	@task
	def supervision_task(self) -> Task:
		return Task(
			config=self.tasks_config['supervision_task'],
			
		)

	@crew
	def crew(self) -> Crew:
		"""Creates the Yousfifromscratch crew"""
		# To learn how to add knowledge sources to your crew, check out the documentation:
		# https://docs.crewai.com/concepts/knowledge#what-is-knowledge

		return Crew(
			agents=self.agents, # Automatically created by the @agent decorator
			tasks=self.tasks, # Automatically created by the @task decorator
			process=Process.sequential,
			verbose=True,
			knowledge_sources=[pdf_source],
			embedder={
        		"provider": "ollama",
        		"config": {
					"model": "nomic-embed-text",
	     		}
   			 }
			
			# process=Process.hierarchical, # In case you wanna use that instead https://docs.crewai.com/how-to/Hierarchical/
		)

I hope this helps to solve the issue.

Matthew · February 6, 2025, 6:41pm

If this helps, Knowledge doesn’t work for me with Ollama either, at least not in every setup.
It works for me in a single crew file (MyCrew.py) but does not work in a ‘Multiple File Crew’ (main.py, crew.py, tasks.yaml, agents.yaml, etc.).

subbu · February 6, 2025, 6:44pm

It’s been 2 months since I opened this thread, and I’m still not sure why it’s so complex to set up a simple agent with a knowledge base using crewai.

@rokbenko / @tonykipkemboi I have updated to the latest crewai SDK and am now passing the knowledge base config to the Agent instead of the Crew. I’m also testing with an OpenAI GPT model, and now I’m seeing this error. Any idea how to fix it?

crewai run
Running the Crew
warning: `VIRTUAL_ENV=../.venv` does not match the project environment path `.venv` and will be ignored
[2025-02-07 00:05:56,061][INFO] Logging is set to INFO, use `logging_level` argument or `COMPOSIO_LOGGING_LEVEL` change this

[2025-02-07 00:05:59][ERROR]: Failed to upsert documents: APIStatusError.__init__() missing 2 required keyword-only arguments: 'response' and 'body'
Traceback (most recent call last):
  File "../.venv/bin/run_crew", line 8, in <module>
    sys.exit(run())
             ^^^^^
  File "../src/bot/main.py", line 22, in run
    HrSlackBotCrew(get_policies_content()).hr_crew().kickoff(inputs=inputs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "../.venv/lib/python3.12/site-packages/crewai/project/crew_base.py", line 35, in __init__
    self.map_all_task_variables()
  File "../.venv/lib/python3.12/site-packages/crewai/project/crew_base.py", line 166, in map_all_task_variables
    self._map_task_variables(
  File "../.venv/lib/python3.12/site-packages/crewai/project/crew_base.py", line 199, in _map_task_variables
    self.tasks_config[task_name]["agent"] = agents[agent_name]()
                                            ^^^^^^^^^^^^^^^^^^^^
  File "../.venv/lib/python3.12/site-packages/crewai/project/utils.py", line 7, in memoized_func
    cache[key] = func(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^
  File "../src/bot/crew.py", line 48, in hr_agent
    return Agent(
           ^^^^^^
  File "../.venv/lib/python3.12/site-packages/pydantic/main.py", line 214, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for Agent
  Value error, Invalid Knowledge Configuration: APIStatusError.__init__() missing 2 required keyword-only arguments: 'response' and 'body' [type=value_error, input_value={'llm': <crewai.llm.LLM o... like Google and Meta.'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error

Matthew_Scarmuzza · February 23, 2025, 4:05am

Have there been any updates on this issue? I’m also having the error that asks for an openAI key despite not using the openAI embedding model.

Guigui_DuSud · February 23, 2025, 1:14pm

I managed to get both Knowledge sources working.

# Create a knowledge source
content_source = CrewDoclingSource(
    file_paths=[
        "theme.md",
		
        # "https://lilianweng.github.io/posts/2024-07-07-hallucination",
    ],	
)


	@agent
	def semantic_classificor(self) -> Agent:
		return Agent(
			config=self.agents_config['semantic_classificor'],
			verbose=True,
			llm=localllm,
			embedder={
				"provider": "ollama",
				"config": {
					"model": "nomic-embed-text:latest",
					"base_url":"http://localhost:11434",
					# "api_key": GEMINI_API_KEY,
        		}
    		},
			knowledge_sources= [content_source],
		)

This seems to work as expected. I tried to ensure a clean env was used, but I observed

I also tried PDF successfully:

import glob

files = glob.glob("knowledge/HB documents/**/*.pdf", recursive=True)
# Strip the leading "knowledge/" from each path
files = [file.replace("knowledge/", "", 1) for file in files]
# print(files)

QIAcuityknowlege = PDFKnowledgeSource(
    file_paths=files,
)

	@agent
	def researcher(self) -> Agent:
		return Agent(
			config=self.agents_config['researcher'],
			llm=localllm,
			verbose=True,
			embedder={
				"provider": "ollama",
				"config": {
					"model": "nomic-embed-text:latest",
					"base_url":"http://localhost:11434",
        		}
    		},
			knowledge_sources=[QIAcuityknowlege]
		)
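
Stripping the leading `knowledge/` suggests the file paths are expected relative to the project's `knowledge` directory. The same step can be sketched with `pathlib`, which avoids string replacement. The directory layout here is created in a temp folder purely for demonstration, the file names are made up, and `relative_pdf_paths` is a hypothetical helper. Unlike the snippet above, it searches the whole directory it is given rather than only `HB documents`.

```python
import tempfile
from pathlib import Path


def relative_pdf_paths(knowledge_dir: Path) -> list[str]:
    """Recursively find PDFs and return paths relative to `knowledge_dir`."""
    return sorted(
        p.relative_to(knowledge_dir).as_posix()
        for p in knowledge_dir.rglob("*.pdf")
    )


# Demo layout in a throwaway directory (made-up file names)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "knowledge"
    (root / "HB documents" / "sub").mkdir(parents=True)
    (root / "HB documents" / "a.pdf").touch()
    (root / "HB documents" / "sub" / "b.pdf").touch()
    (root / "notes.txt").touch()
    files = relative_pdf_paths(root)

print(files)  # ['HB documents/a.pdf', 'HB documents/sub/b.pdf']
```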

I do not know if it is because I put the embedder above the knowledge_sources argument.
I can see that my app loads the knowledge successfully and refers to it.

The crewai version is: crewai, version 0.102.0
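
Since behaviour seems to differ between releases in this thread, it helps to post the installed versions along with any error. A stdlib-only sketch follows; `installed_version` is a hypothetical helper name, and the package list is just the libraries that appear in the tracebacks above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in ("crewai", "chromadb", "pydantic"):
    print(name, installed_version(name))
```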

Ruben_Casillas · February 23, 2025, 2:04pm

Nice! thanks for sharing, I am trying it right now.

Thom_Web_Com · March 2, 2025, 12:04pm

Hi team,

I am facing a similar issue with Gemini: the error about 'response' and 'body' missing. I think they are working on it.
