Both the RagTools (e.g., PDFSearchTool, CSVSearchTool, CodeDocsSearchTool, and TXTSearchTool) and the knowledge_sources parameter do the exact same thing. They implement a RAG system following its canonical steps:
Split the document into chunks → Generate embeddings for the chunks → Save the embeddings into a vector database (ChromaDB by default) → Query the database to find relevant information → Return the relevant chunks to enhance the context.
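To make those five steps concrete, here is a minimal, dependency-free sketch. It is not how Embedchain or CrewAI implement them: the "embedding" is a bag-of-words count vector instead of a model, the "vector database" is an in-memory list standing in for ChromaDB, and every name (chunk_text, embed, InMemoryVectorStore, build_context) is hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def chunk_text(text: str) -> list[str]:
    """Step 1: split the document into chunks (naively, one sentence per chunk)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def embed(text: str) -> Counter:
    """Step 2: toy 'embedding' = lowercase word counts (a real system uses a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryVectorStore:
    """Step 3: stand-in for a vector database such as ChromaDB."""
    rows: list = field(default_factory=list)  # (chunk, embedding) pairs

    def add(self, chunks: list[str]) -> None:
        for chunk in chunks:
            self.rows.append((chunk, embed(chunk)))

    def query(self, question: str, k: int = 2) -> list[str]:
        """Step 4: rank the stored chunks by similarity to the question."""
        q = embed(question)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_context(question: str, store: InMemoryVectorStore, k: int = 2) -> str:
    """Step 5: return the relevant chunks, ready to be added to the LLM prompt."""
    return "\n".join(store.query(question, k))


if __name__ == "__main__":
    doc = ("CrewAI agents work in crews. Each agent has a role and a goal. "
           "ChromaDB is the default vector database. Embeddings are stored there for retrieval.")
    store = InMemoryVectorStore()
    store.add(chunk_text(doc))
    print(build_context("Which vector database is the default?", store, k=1))
```

Swapping the toy pieces for a real text splitter, embedding model, and vector database is exactly the work that both approaches discussed here take care of.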
The RagTools adopt what I consider a very elegant solution: outsourcing the process to Embedchain, a dedicated and mature library that is well suited to small and medium-sized projects. Embedchain supports a wide range of data sources, vector databases, and embedding models, and it can even use an LLM internally to generate summaries of the chunks, implementing a complete Agentic RAG. Every configuration available in Embedchain therefore becomes available to CrewAI, which seems like a great choice by the framework. There is one caveat: CrewAI’s default adapter relies on Embedchain’s query method, which doesn’t accept additional parameters such as num_documents (the number of relevant documents to fetch) or where (for metadata filtering). However, in this other thread, I presented a trivial modification that allows the use of the more flexible search method.
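What those two parameters control can be shown with a toy retriever. This is a hypothetical sketch, not Embedchain’s or CrewAI’s code: the search function, its shared-word scoring (a stand-in for vector similarity), and the exact-match where filter are assumptions chosen only to illustrate the semantics of num_documents and where.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    text: str
    metadata: dict


def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(chunks: list, query: str, num_documents: int = 3,
           where: Optional[dict] = None) -> list:
    """Hypothetical search: filter by metadata, then keep the top-N chunks."""
    # `where`: keep only chunks whose metadata matches every key/value pair.
    if where:
        chunks = [c for c in chunks
                  if all(c.metadata.get(k) == v for k, v in where.items())]
    q = _tokens(query)
    # Rank by number of shared words (stand-in for embedding similarity).
    ranked = sorted(chunks, key=lambda c: len(q & _tokens(c.text)), reverse=True)
    # `num_documents`: cap how many relevant chunks are returned.
    return ranked[:num_documents]


if __name__ == "__main__":
    chunks = [
        Chunk("Refund policy: refunds are issued within 30 days.", {"source": "policy.pdf"}),
        Chunk("Refund requests go through the support portal.", {"source": "faq.txt"}),
        Chunk("Our office is closed on public holidays.", {"source": "faq.txt"}),
    ]
    for c in search(chunks, "refund requests", num_documents=1, where={"source": "faq.txt"}):
        print(c.metadata["source"], "->", c.text)
```

Without these knobs, the caller gets whatever number of chunks the adapter hard-codes, from every source in the collection.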
On the other hand, the knowledge_sources parameter relies on classes like Knowledge and KnowledgeStorage to recreate this pipeline and manage a ChromaDB collection directly. Personally, I find the idea of being able to say, “I want to provide knowledge to my crew,” to be semantically well thought out: for a framework that aims to model real team organization, this naming seems very appropriate. However, IMHO, the implementation of this idea is fragile because it violates good engineering practices, especially the DRY (Don’t Repeat Yourself) principle. In my view, CrewAI’s implementation is an attempt to duplicate what already exists in Embedchain, which can lead to inconsistencies (as seen in the configuration differences between the two) and a heavier maintenance burden, which in turn can hurt the developer experience (DX). A solution that seems quite interesting to me would be for CrewAI to transparently inject the RagTools corresponding to the provided knowledge_sources. This would preserve the “provide knowledge to my crew” semantics while avoiding code duplication.
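A rough, purely hypothetical sketch of that injection idea: each knowledge source is mapped by file extension to the RagTool that would handle it, falling back to the generic RagTool. The mapping table, RagToolSpec, and tools_for_sources are illustrative assumptions; a real implementation would instantiate the crewai_tools classes and attach them to agents, and their constructor arguments are deliberately not shown here.

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical mapping from file extension to the RagTool that would handle it.
TOOL_BY_EXTENSION = {
    ".pdf": "PDFSearchTool",
    ".csv": "CSVSearchTool",
    ".txt": "TXTSearchTool",
}


@dataclass(frozen=True)
class RagToolSpec:
    """Stand-in for an instantiated tool: which tool, pointed at which source."""
    tool_name: str
    source: str


def tools_for_sources(sources: list) -> list:
    """Derive one tool per knowledge source, falling back to the generic RagTool."""
    specs = []
    for src in sources:
        tool = TOOL_BY_EXTENSION.get(Path(src).suffix.lower(), "RagTool")
        specs.append(RagToolSpec(tool, src))
    return specs


if __name__ == "__main__":
    for spec in tools_for_sources(["handbook.PDF", "sales.csv", "notes.md"]):
        print(spec)
```

The user-facing API stays “give the crew some knowledge”, while all chunking, embedding, and retrieval logic lives in a single place.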
So, to answer the initial question: are they interchangeable? Well, if you evaluate and conclude that CrewAI’s in-house (and duplicated) implementation is on par with that of a dedicated library, then yes, they are the same thing.