I am working on enabling a Docling knowledge source in CrewAI.
CrewAI version: 0.108.0
Docling version: 2.29.0
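from crewai.knowledge.source.crew_docling_source import CrewDoclingSource
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage

content_source = CrewDoclingSource(
    file_paths=["Browser Related.docx"],
    collection_name="sourcedocling",
    storage=KnowledgeStorage(collection_name="sourcedocling"),
    # storage="./storage",  # also tried passing a plain path string
)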
It works fine, but I am not able to use the storage parameter correctly. I want my Docling vector DB to be saved in my project directory so that my crew and my agents can access it later.
No luck so far. I have tried several approaches, including declaring CREWAI_STORAGE_DIR="./storage_DB" in my .env file.
I have also gone through this module, but I had no luck with the base path either; the DB does not seem to get saved anywhere. I am not sure whether I am initializing the KnowledgeStorage instance correctly, or whether there is an issue in the module so that save() and initialize_knowledge_storage() are not being called correctly.
CrewDoclingSource also saves through save_documents(), but I cannot find the vector DB anywhere.
Yes, the Crew and Agent handle this correctly by default, but in that case the document is processed again on every call, which adds latency that I don't think is acceptable.
Additionally, do you have any idea how collection_name works, and how I can use it if I need to create 3 different vector DBs for 3 different agents?
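If the main worry is re-processing the document on every run, one generic workaround (my own suggestion, not a CrewAI feature) is to fingerprint the source file and only hand it to the knowledge source when its content has changed. The helpers needs_ingestion and record_ingestion and the JSON manifest file are hypothetical names; the sketch only decides whether ingestion is needed, and whether CrewAI then serves queries from the already-persisted collection without the source attached may depend on the version.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _load_manifest(manifest_path: Path) -> dict:
    """Hypothetical manifest: maps file path -> fingerprint of the last ingested version."""
    return json.loads(manifest_path.read_text()) if manifest_path.exists() else {}


def needs_ingestion(path: Path, manifest_path: Path) -> bool:
    """True if the file is new or its content changed since it was last recorded."""
    return _load_manifest(manifest_path).get(str(path)) != file_fingerprint(path)


def record_ingestion(path: Path, manifest_path: Path) -> None:
    """Remember the fingerprint of a file that was just embedded."""
    manifest = _load_manifest(manifest_path)
    manifest[str(path)] = file_fingerprint(path)
    manifest_path.write_text(json.dumps(manifest, indent=2))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp) / "Browser Related.docx"
        doc.write_bytes(b"placeholder bytes")
        manifest = Path(tmp) / "ingested.json"
        print(needs_ingestion(doc, manifest))  # True: never seen before
        record_ingestion(doc, manifest)
        print(needs_ingestion(doc, manifest))  # False: unchanged
        doc.write_bytes(b"edited")
        print(needs_ingestion(doc, manifest))  # True: content changed
```

Hashing the bytes rather than checking the modification time means a touched but unchanged file is not re-embedded.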
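On the collection_name question: as far as I can tell from the 0.108 source, KnowledgeStorage keeps all collections in the same ChromaDB directory and tells them apart only by name, prefixing a non-empty collection_name with "knowledge_". So three vector DBs for three agents could be three collection names under one storage path. The stdlib sketch below only models that naming so that clashes are caught early; the prefix rule is my reading of the source and may differ in other versions, and the agent and collection names are made up.

```python
from typing import Dict, Optional


def chroma_collection_name(collection_name: Optional[str]) -> str:
    """Approximate how KnowledgeStorage (0.108, as I read it) names its Chroma collection."""
    return f"knowledge_{collection_name}" if collection_name else "knowledge"


def plan_collections(agent_to_collection: Dict[str, str]) -> Dict[str, str]:
    """Map each agent to the Chroma collection its knowledge would live in.

    Raises ValueError if two agents would end up sharing one collection.
    """
    resolved = {
        agent: chroma_collection_name(name)
        for agent, name in agent_to_collection.items()
    }
    if len(set(resolved.values())) != len(resolved):
        raise ValueError("two agents map to the same collection")
    return resolved


if __name__ == "__main__":
    # Hypothetical agents and collection names, for illustration only.
    plan = plan_collections(
        {"researcher": "browser_docs", "writer": "style_guide", "reviewer": "policies"}
    )
    print(plan)
    # {'researcher': 'knowledge_browser_docs', 'writer': 'knowledge_style_guide', 'reviewer': 'knowledge_policies'}
    # In CrewAI each entry would correspond to its own
    # KnowledgeStorage(collection_name="browser_docs"), and so on.
```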
Yep, it turns out I was missing something: db_storage_path() relies on get_project_directory_name(), which respects the CREWAI_STORAGE_DIR environment variable. So you can control where your vector database is saved. For example:
from pathlib import Path
from crewai.utilities.paths import db_storage_path
import os
os.environ["CREWAI_STORAGE_DIR"] = str(Path.cwd()) # Set to current working directory
base_knowledge_db_path = Path(db_storage_path()) / "knowledge"
print(base_knowledge_db_path)
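To see why CREWAI_STORAGE_DIR="./storage_DB" did not end up in the project folder while str(Path.cwd()) does, here is a simplified model of the path resolution as I read crewai/utilities/paths.py together with the appdirs package: the variable is used as an app name and joined onto the platform user-data directory, so a relative value nests under that directory while an absolute value replaces it. resolve_knowledge_dir is a hypothetical helper, not a CrewAI function, and /home/me/.local/share is only an assumed Linux default; other platforms use a different root.

```python
import posixpath
from typing import Optional


def resolve_knowledge_dir(
    user_data_root: str, storage_dir_env: Optional[str], cwd_name: str
) -> str:
    """Simplified, hypothetical model of where the knowledge DB ends up.

    app_name is CREWAI_STORAGE_DIR if set and non-empty, else the current
    folder's name; it is joined onto the user-data root, then "knowledge"
    is appended. posixpath keeps the example identical on every OS.
    """
    app_name = storage_dir_env or cwd_name
    # join() discards the root when app_name is absolute
    data_dir = posixpath.join(user_data_root, app_name)
    return posixpath.join(data_dir, "knowledge")


if __name__ == "__main__":
    root = "/home/me/.local/share"  # assumed Linux user-data root
    print(resolve_knowledge_dir(root, "./storage_DB", "myproj"))
    # /home/me/.local/share/./storage_DB/knowledge  (relative value nests under the root)
    print(resolve_knowledge_dir(root, "/home/me/myproj/storage_DB", "myproj"))
    # /home/me/myproj/storage_DB/knowledge  (absolute value wins)
    print(resolve_knowledge_dir(root, None, "myproj"))
    # /home/me/.local/share/myproj/knowledge  (default: folder name only)
```

In practice that suggests setting CREWAI_STORAGE_DIR to an absolute path, for example str(Path("storage_DB").resolve()), and making sure the value is actually in os.environ before the knowledge storage is created; a .env file only has an effect if something loads it into the environment.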