Difference between RAG and knowledge source

Hi there!

I’ve been running some experiments on information retrieval from documents (mostly PDFs), and I’ve noticed a consistent difference between the PDF RAG search tool and knowledge sources.
The PDF RAG search seems to return information without much depth, like a text summary that misses a lot of the important details (e.g. statistics) that the agents using this tool were asked for in their prompts. Knowledge sources, by contrast, have produced a more detailed summary that answers the key points, drastically different from the other tool. That left me wondering why this happens, since both seem to use embedders, vector databases, queries…

My final question is: how exactly are the “external information sources” being used to provide information for the crew that works differently than RAG tools?

thx!

Bump.

I don’t have an answer, but I am also interested in this. I’m not exactly sure what the difference is between a PDF passed in as knowledge (from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource) versus using the PDF search tool (from crewai_tools import PDFSearchTool).

Bump. I’m also super interested, and I have failed to get any RAG functions to work with the crews.

I totally agree. A PDF passed as knowledge works much better than the PDF RAG tool.

I have been using knowledge sources “as if” they were RAG. I find this works best for me together with the default memory (Memory - CrewAI), which stores and updates your long-term memory and is also useful.

I would also love to know more about when to use knowledge and when to use tools.

What about using NotebookLM as your RAG? Is there a tool to do that?

I have the same question: when is it better to use PDFSearchTool, and when Knowledge? They are both RAG solutions that CrewAI offers. I know Knowledge can be shared across the agents in a crew, but that is not enough to base a decision on. I’ve been looking into it and even wrote a post about it: CrewAI meets RAG: built-in and custom solutions - DEV Community
I’d like a guideline from someone who knows the framework from the inside.

Great question.

I use PDFSearch when I want to search a single file, and knowledge when I have a lot of PDFs I want to reference locally.

You can also use mem0, just to mix things up: Memory - CrewAI

Thanks Tony, this is what perplexity.ai answered when I asked my question:
"To choose between PDFSearchTool and Knowledge for decision-making clarity, focus on the nature of your documents and the level of detail and sharing needed for clarity in decisions:

  • Use PDFSearchTool if your decision-making relies on fast, precise, and grounded extraction of specific passages from PDF documents. It excels in semantic search and clear retrieval of exact content, which aids clarity by sticking closely to source content without adding inference or nuance. This is suitable when clarity depends on referencing precise document facts without broad contextual summarization.
  • Use Knowledge when you want a shared, curated knowledge base that integrates diverse document types (PDF, text, CSV, etc.) and supports more detailed, synthesized summaries with key facts and statistics. This is helpful for decisions requiring broader contextual understanding and for sharing insights across teams or agents to achieve collective clarity.

For clearer decision-making clarity:

| Key Aspect | PDFSearchTool | Knowledge |
|---|---|---|
| Document focus | Single/multiple PDFs semantically searched | Aggregated, multi-format knowledge base |
| Result style | Exact passage extraction, precise and grounded | Detailed summarization, richer context |
| Sharing/collaboration | Typically single use, not shared | Shared across agents, supports collaboration |
| Decision clarity suitability | Clarity from direct quote/reference | Clarity from synthesized insights, collaborative views |

If your decision needs are narrowly scoped to exact facts in PDFs, PDFSearchTool offers clarity through precision. If your decisions benefit from a comprehensive, sharable knowledge context, Knowledge supports clarity by providing depth and shared understanding.

Additionally, consider the trade-off that PDFSearchTool’s approach may be less prone to hallucination but less nuanced, while Knowledge may offer richer answers but require careful knowledge base curation to avoid confusion or hallucinations. Combining both or integrating additional tools may also further improve decision clarity depending on complexity and scale.

This approach aligns with CrewAI’s design, helping you select the tool that best supports clarity and confidence in your decision-making workflow."

Both the RagTools (e.g., PDFSearchTool, CSVSearchTool, CodeDocsSearchTool, and TXTSearchTool) and the knowledge_sources parameter do the exact same thing. They implement a RAG system following its canonical steps:

Split the document into chunks → Generate embeddings for the chunks → Save the embeddings into a vector database (ChromaDB by default) → Query the database to find relevant information → Return the relevant chunks to enhance the context.
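For anyone who wants to see those five steps in miniature, here is a toy sketch in plain Python. It is not CrewAI's or Embedchain's code. The names (`chunk_text`, `embed`, `ToyVectorStore`, `build_prompt`) are made up for illustration, the "embedding" is just a bag-of-words count standing in for a real embedding model, and the "vector database" is a list in memory standing in for ChromaDB.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=12, overlap=3):
    """Step 1: split the document into overlapping windows of words."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Step 2 (toy): a bag-of-words vector instead of a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Steps 3 and 4: keep (vector, chunk) pairs and rank them per query."""

    def __init__(self):
        self.rows = []

    def add(self, chunks):
        for chunk in chunks:
            self.rows.append((embed(chunk), chunk))

    def query(self, question, top_k=2):
        q = embed(question)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [chunk for _, chunk in ranked[:top_k]]


def build_prompt(question, store, top_k=2):
    """Step 5: put the retrieved chunks in front of the question."""
    context = "\n".join(store.query(question, top_k))
    return f"Context:\n{context}\n\nQuestion: {question}"


document = (
    "Revenue grew 12 percent in 2023 compared with the previous year. "
    "The company opened three new offices in Europe. "
    "Employee headcount reached 450 by December."
)
chunks = chunk_text(document, chunk_size=10, overlap=2)
store = ToyVectorStore()
store.add(chunks)
context = store.query("How much did revenue grow?", top_k=1)
prompt = build_prompt("How much did revenue grow?", store, top_k=1)
print(prompt)
```

Even in this toy version you can see why chunk size and the number of retrieved chunks matter: if the statistic you asked about sits in a chunk that doesn't make the top results, the agent never sees it.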

The RagTools adopt what I consider a very elegant solution: outsourcing the process to a dedicated and mature library that is great for small to medium-sized projects. Embedchain has rich compatibility with data sources, vector databases, and embedding models, and it even allows the internal use of LLMs to generate summaries of the chunks, implementing a complete Agentic RAG. All the available configurations in Embedchain become available to CrewAI. This seems like a great choice by the framework. There is one limitation: CrewAI’s default adapter relies on Embedchain’s query method, which doesn’t allow for additional parameters like num_documents (number of relevant documents to fetch) or where (for metadata filtering). However, in this other thread, I presented a trivial modification that allows the use of the more flexible search method.
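To make concrete what those two parameters buy you, here is a small stand-alone sketch in plain Python. It does not call Embedchain. The `search` function, the record layout, and the word-overlap scoring are all assumptions of mine for illustration, and only the parameter names `num_documents` and `where` are borrowed from the description above.

```python
import re


def tokens(text):
    """Lowercased word set, used here as a crude relevance signal."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(records, query, num_documents=3, where=None):
    """Toy retrieval: filter by metadata first, then keep the best matches.

    `where` is an exact-match metadata filter and `num_documents` caps how
    many results come back. Both are hypothetical stand-ins for the real thing.
    """
    where = where or {}
    candidates = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in where.items())
    ]
    q = tokens(query)
    ranked = sorted(candidates, key=lambda r: len(q & tokens(r["text"])), reverse=True)
    return ranked[:num_documents]


records = [
    {"text": "Revenue grew 12 percent in 2023.", "metadata": {"source": "report.pdf"}},
    {"text": "Revenue fell 3 percent in 2020.", "metadata": {"source": "archive.pdf"}},
    {"text": "Three new offices opened in Europe.", "metadata": {"source": "report.pdf"}},
]

# Without a filter, chunks from both files compete for the top slots.
unfiltered = search(records, "revenue percent", num_documents=2)

# With a filter, only chunks from report.pdf are considered at all.
filtered = search(records, "revenue percent", num_documents=1, where={"source": "report.pdf"})
```

With a fixed query method you get whatever number of chunks the adapter decides on, from every document in the collection. With something like `search`, you can ask for more chunks when the answer needs detail, or restrict retrieval to one file.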

On the other hand, the knowledge_sources parameter implements classes like Knowledge and KnowledgeStorage to recreate and directly manage a ChromaDB collection. Personally, I find the idea of being able to say, “I want to provide knowledge to my crew,” to be semantically well-thought-out. For a framework that aims to model real team organization, this naming seems very appropriate. However, IMHO, the implementation of this idea is fragile because it violates good engineering practices, especially the DRY (Don’t Repeat Yourself) principle. In my view, CrewAI’s implementation is an attempt to duplicate what already exists in Embedchain, which can lead to inconsistencies (as seen in the configuration differences between the two) and a potential increase in maintenance, which can impact the DX. A solution that seems quite interesting to me would be for CrewAI to transparently inject the corresponding RagTools in direct relation to the provided knowledge_sources. This would maintain the “provide knowledge to my crew” semantics while avoiding code duplication.

So, to answer the initial question: are they interchangeable? Well, if you evaluate and conclude that CrewAI’s in-house (and duplicated) implementation is on par with that of a dedicated library, then yes, they are the same thing.
