Agent does not recognize the knowledge sources file

To help you out, let's look at a simple, working example of how to use PDFKnowledgeSource. The same applies to StringKnowledgeSource and JSONKnowledgeSource, for that matter; all of them are used internally as sources for RAG.

When we talk about RAG in CrewAI, we're actually talking about the Embedchain library, and you'll need to follow that library's standards when configuring things like embeddings. Note that this configuration differs slightly from the LLM configuration, because CrewAI uses the LiteLLM library for LLMs. One example of this difference shows up with Gemini: the API-key environment variable for LiteLLM is GEMINI_API_KEY, whereas for Embedchain it's GOOGLE_API_KEY. If you need help configuring the embedder or the LLM, there are plenty of examples here on the forum and in the CrewAI documentation.
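As a minimal sketch of that difference (the variable names are the ones mentioned above, and the embedder dict mirrors Embedchain's provider/config shape), you can set both variables from a single key:

```python
import os

# LiteLLM (which CrewAI uses for LLM calls) reads this variable:
os.environ["GEMINI_API_KEY"] = "<YOUR-KEY>"

# Embedchain (which CrewAI uses for RAG/embeddings) reads this one instead:
os.environ["GOOGLE_API_KEY"] = os.environ["GEMINI_API_KEY"]

# Alternatively, pass the key explicitly in the embedder config,
# following Embedchain's provider/config structure:
embedder_config = {
    "provider": "google",
    "config": {
        "model": "models/text-embedding-004",
        "api_key": os.environ["GEMINI_API_KEY"],
    },
}

print(embedder_config["provider"])
```

Either approach works; passing api_key explicitly (as the script below does) keeps the dependency on the right variable visible in one place.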

In the following setup, I used Gemini, but you can and should adapt it to your own environment. I downloaded the file https://www.cs.sjsu.edu/faculty/pollett/masters/Semesters/Fall24/adithya/CrewAI.pdf to use as our knowledge source. Notice that I placed it inside the knowledge directory so CrewAI can find it.

Directory structure:

crewai_pdfknowledgesource_example/
├── knowledge/
│   └── CrewAI.pdf
└── knowledge_test.py

knowledge_test.py file:

from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource
import os

os.environ["GEMINI_API_KEY"] = "<YOUR-KEY>"

# The file path is relative to the knowledge/ directory, not the project root
pdf_source = PDFKnowledgeSource(
    file_paths=["CrewAI.pdf"]
)

# Embedder config follows Embedchain's standards (provider/config structure)
embedder_config = {
    "provider": "google",
    "config": {
        "model": "models/text-embedding-004",
        "api_key": os.environ["GEMINI_API_KEY"]
    }
}

# LLM config follows LiteLLM's standards ("<provider>/<model-name>")
llm = LLM(
    model="gemini/gemini-2.0-flash",
    temperature=0.1
)

expert_agent = Agent(
    role="CrewAI Expert Assistant",
    goal="Answer questions about the CrewAI framework accurately.",
    backstory=(
        "You are 'CrewBot', an AI specialized solely in CrewAI. Your purpose "
        "is to explain CrewAI concepts (Agents, Tasks, Tools, Process, Crew) "
        "based on your internal knowledge. You decline questions unrelated "
        "to CrewAI."
    ),
    llm=llm,
    verbose=True,
    allow_delegation=False,
)

info_task = Task(
    description=(
        "Instructions:\n"
        "1. Answer the question '{user_question}' clearly and concisely.\n"
        "2. Base your answer **STRICTLY** on your internal knowledge "
        "of the CrewAI framework.\n"
        "3. **DO NOT** invent information or answer about topics "
        "outside the scope of CrewAI.\n"
        "4. If the question is not about CrewAI, politely state that "
        "you only know about CrewAI. For example: 'Sorry, I can only "
        "answer questions about the CrewAI framework.'\n"
        "5. Maintain a helpful and focused tone."
    ),
    expected_output=(
        "A direct and accurate answer to the user's question "
        "based solely on CrewAI knowledge. If the question is off-topic, "
        "provide a polite refusal."
    ),
    agent=expert_agent,
)

qa_crew = Crew(
    agents=[expert_agent],
    tasks=[info_task],
    process=Process.sequential,
    verbose=True,
    knowledge_sources=[pdf_source],
    embedder=embedder_config
)

result = qa_crew.kickoff(
    inputs={
        "user_question": "Tell me about Multi-Agent Collaboration and Memory."
    }
)

print(f"\n🤖 Answer:\n\n{result.raw}\n")