Agent does not recognize the knowledge sources file

I have a problem: my agent cannot access the file I placed in the knowledge folder. I have already debugged the file and confirmed the text is extracted completely, yet when I pass it to the agent it still cannot access it and returns the following response:

“Unfortunately, I don’t have access to external files or documents, including the “main_document.md” file mentioned. To proceed, please provide the content of the document here, and I will assist you in extracting the required information and formatting it into the specified JSON schema.”

I tried different approaches to pass this file as a knowledge source, using TextFileKnowledgeSource, StringKnowledgeSource, and CrewDoclingSource, but none of them work. In the task description I clearly stated that the content in the knowledge folder must be consulted as a knowledge source, in the following section:

" Input Data for reference:
Document Content: The content is available in the knowledge source file main_document.md. Make sure to thoroughly analyze this document for the required information."

I also tried passing the parameter knowledge_sources=[self.knowledge_source] to the agent directly, to the crew, and even to both.

My document is located at knowledge/data/file/main_document.md.

My agent code:

class Agents:
    def __init__(self, model: str, api_key: str, temperature: float, top_p: float):
        os.environ["OPENAI_API_KEY"] = api_key
        self.llm = LLM(
            model=model,
            api_key=api_key,
            temperature=temperature,
            top_p=top_p,
        )

        self.agents_config = self.load_yaml("config/agents.yaml")
        self.tasks_config = self.load_yaml("config/tasks.yaml")

    def load_yaml(self, filepath):
        with open(filepath, "r") as f:
            return yaml.safe_load(f)

    files = glob.glob("knowledge/data/**/*.md", recursive=True)
    files = [file.replace("knowledge/", "", 1) for file in files]

    print(f"Found {files} markdown files.")
    knowledge_source = CrewDoclingSource(file_paths=files)

    # with open("knowledge/data/file/main_document.md", "r") as file:
    #     content = file.read()

    # print(f"Found {content} markdown files.")
    # knowledge_source = StringKnowledgeSource(content=content)

    @agent
    def get_specific_info_agent(self) -> Agent:
        return Agent(
            config=self.agents_config["GetSpecificInfoAgent"],
            llm=self.llm,
            max_iter=5,
            verbose=True,
            knowledge_sources=[self.knowledge_source],
        )

    @task
    def get_specific_info_task(self) -> Task:
        return Task(
            config=self.tasks_config["GetSpecificInfoTask"],
            agent=self.get_specific_info_agent(),
        )

    def call_agent(self) -> Crew:
        return Crew(
            agents=[
                self.get_specific_info_agent(),
            ],
            tasks=[
                self.get_specific_info_task(),
            ],
            verbose=True,
            knowledge_sources=[self.knowledge_source],
        )

It's my first time using this parameter in my agent system. I followed the documentation, but I don't know if I missed something important. I would love it if someone could help me.

I am running into a similar issue with PDFKnowledgeSource. Did you get past this?

To try and help you out, let’s look at a simple, working example of how to use PDFKnowledgeSource. The same goes for StringKnowledgeSource or JSONKnowledgeSource, for that matter. All of them will be used internally as a source for RAG.

When we talk about RAG in CrewAI, we’re actually talking about the Embedchain library. And you’ll need to follow that library’s standards when configuring things like embeddings, for example. Note that the configuration is slightly different from what we use for LLMs, because CrewAI uses the LiteLLM library for LLMs. One example of this difference is when using Gemini: the environment variable for the API key for LiteLLM is GEMINI_API_KEY, whereas for Embedchain it’s GOOGLE_API_KEY. If you need help configuring the embedding or the LLM, there are plenty of examples here in the forum and in the CrewAI documentation.
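For instance, if you rely on environment variables instead of passing the key explicitly, the two layers read different variables. A rough sketch of what I mean (not part of the example below):

import os

# LiteLLM (the layer CrewAI uses for LLM calls) reads GEMINI_API_KEY
os.environ["GEMINI_API_KEY"] = "<YOUR-KEY>"

# Embedchain (the layer CrewAI uses for knowledge/RAG embeddings) reads GOOGLE_API_KEY,
# so the same key may need to be exposed under both names
os.environ["GOOGLE_API_KEY"] = os.environ["GEMINI_API_KEY"]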

In the following setup, I used Gemini, but you can and should adapt it to your own environment. I downloaded the file https://www.cs.sjsu.edu/faculty/pollett/masters/Semesters/Fall24/adithya/CrewAI.pdf to use as our knowledge source. Notice that I placed it inside the knowledge directory so CrewAI can find it.

Directory structure:

crewai_pdfknowledgesource_example/
├── knowledge/
│   └── CrewAI.pdf
└── knowledge_test.py

knowledge_test.py file:

from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource
import os

os.environ["GEMINI_API_KEY"] = "<YOUR-KEY>"

pdf_source = PDFKnowledgeSource(
    file_paths=["CrewAI.pdf"]
)

embedder_config = {
    "provider": "google",
    "config": {
        "model": "models/text-embedding-004",
        "api_key": os.environ["GEMINI_API_KEY"]
    }
}

llm = LLM(
    model="gemini/gemini-2.0-flash",
    temperature=0.1
)

expert_agent = Agent(
    role="CrewAI Expert Assistant",
    goal="Answer questions about the CrewAI framework accurately.",
    backstory=(
        "You are 'CrewBot', an AI specialized solely in CrewAI. Your purpose "
        "is to explain CrewAI concepts (Agents, Tasks, Tools, Process, Crew) "
        "based on your internal knowledge. You decline questions unrelated "
        "to CrewAI."
    ),
    llm=llm,
    verbose=True,
    allow_delegation=False,
)

info_task = Task(
    description=(
        "Instructions:\n"
        "1. Answer the question '{user_question}' clearly and concisely.\n"
        "2. Base your answer **STRICTLY** on your internal knowledge "
        "of the CrewAI framework.\n"
        "3. **DO NOT** invent information or answer about topics "
        "outside the scope of CrewAI.\n"
        "4. If the question is not about CrewAI, politely state that "
        "you only know about CrewAI. For example: 'Sorry, I can only "
        "answer questions about the CrewAI framework.'\n"
        "5. Maintain a helpful and focused tone."
    ),
    expected_output=(
        "A direct and accurate answer to the user's question "
        "based solely on CrewAI knowledge. If the question is off-topic,"
        "provide a polite refusal."
    ),
    agent=expert_agent,
)

qa_crew = Crew(
    agents=[expert_agent],
    tasks=[info_task],
    process=Process.sequential,
    verbose=True,
    knowledge_sources=[pdf_source],
    embedder=embedder_config
)

result = qa_crew.kickoff(
    inputs={
        "user_question": "Tell me about Multi-Agent Collaboration and Memory."
    }
)

print(f"\n🤖 Answer:\n\n{result.raw}\n")

I got this to work as a standalone tester app, thank you. However, I'm not able to invoke the crew from a driver.

Glad to hear you made some progress on that front.

I’m not quite sure what you mean there. Are you trying to call the crew from another file, is that it?

I have two files: one that defines a crew called Enrichment:

@CrewBase
class Enrichment():
    """Enrichment crew"""

    # Create a PDF knowledge source
    pdf_source = PDFKnowledgeSource(
        file_paths=["MyPDF.pdf"],
    )

    # Create an Excel knowledge source
    excel_source = ExcelKnowledgeSource(
        file_paths=["MyExcel.xlsm"]
    )

    embedder_config = {
        "provider": "google",
        "config": {
            "model": "models/text-embedding-004",
            "api_key": os.environ["GEMINI_API_KEY"]
        }
    }

    llm = LLM(
        model="gemini/gemini-2.0-flash",
        temperature=0.1
    )

    # Learn more about YAML configuration files here:
    # Agents: https://docs.crewai.com/concepts/agents#yaml-configuration-recommended
    # Tasks: https://docs.crewai.com/concepts/tasks#yaml-configuration-recommended
    agents_config = 'config/agents.yaml'
    tasks_config = 'config/tasks.yaml'

    # If you would like to add tools to your agents, you can learn more about it here:
    # https://docs.crewai.com/concepts/agents#agent-tools
    @agent
    def researcher(self) -> Agent:
        return Agent(
            config=self.agents_config['researcher'],
            verbose=True,
            llm=self.llm
        )

    # To learn more about structured task outputs,
    # task dependencies, and task callbacks, check out the documentation:
    # https://docs.crewai.com/concepts/tasks#overview-of-a-task
    @task
    def research_task(self) -> Task:
        return Task(
            config=self.tasks_config['research_task'],
        )

    @crew
    def crew(self) -> Crew:
        """Creates the Enrichment crew"""
        # To learn how to add knowledge sources to your crew, check out the documentation:
        # https://docs.crewai.com/concepts/knowledge#what-is-knowledge
        crew = Crew(
            agents=[self.researcher],
            tasks=[self.research_task],
            process=Process.sequential,
            verbose=True,
            knowledge_sources=[self.pdf_source, excel_source],
            embedder=self.embedder_config,
        )
        return crew

and a main.py that calls the crew:

def call_enrich_deals(self, deals):
    crew = Enrichment().crew()
    try:
        result = crew.kickoff()
        print(f"result = {result}")
    except Exception as e:
        # Handle the exception
        print(f"An error occurred: {e}")

It appears that the crew starts but never runs to completion when I call the crew subclass from the driver (main.py).

Well, your code isn’t quite complete enough for me to give a more definitive opinion. For instance, I can only assume that agents.yaml and tasks.yaml are properly set up.

One thing that stands out in your code is that the deals parameter in call_enrich_deals isn’t actually being used, and also, you’re not passing any inputs when you call crew.kickoff(). Maybe your research_task is expecting an input that never arrived?
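For example, assuming your research_task description in tasks.yaml contains a placeholder like {deals} (that's just a guess on my part, since I can't see your YAML), the driver would need to hand that value over via kickoff, roughly like this:

def call_enrich_deals(self, deals):
    crew = Enrichment().crew()
    try:
        # Pass the driver's data into the crew so any {deals} placeholder
        # in tasks.yaml gets interpolated before the task runs
        result = crew.kickoff(inputs={"deals": deals})
        print(f"result = {result}")
    except Exception as e:
        print(f"An error occurred: {e}")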