My CrewAI app creates the storage database, but it doesn't reuse the saved embeddings when I run the app again. I have local storage set to:
custom_storage_dir = Path(__file__).parent / "storage"
os.environ["CREWAI_STORAGE_DIR"] = str(custom_storage_dir)
My script:
import os
from pathlib import Path

# Point CrewAI at a local storage directory (set before importing crewai)
custom_storage_dir = Path(__file__).parent / "storage"
os.environ["CREWAI_STORAGE_DIR"] = str(custom_storage_dir)

from crewai import Agent, Task, Crew, Process
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage
from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource

# Knowledge storage with an explicit embedder and collection name
custom_storage = KnowledgeStorage(
    embedder={
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
    },
    collection_name="custom_dba",
)

# Knowledge source backed by the text file, attached to the custom storage
doc_sources = TextFileKnowledgeSource(
    file_paths=["estrutura_db.txt"],
)
doc_sources.storage = custom_storage

agent = Agent(
    role="Answer only SQL-related questions, always using very short answers.",
    goal="Display queries according to the question and show the query results.",
    backstory="You are an assistant who helps users solve SQL problems. You have access to documentation and in-depth knowledge of SQL.",
    verbose=True,
    allow_delegation=False,
    memory=True,
    knowledge_sources=[doc_sources],
)
Every time I run the app it asks the same question, and every time it re-embeds the full 14,000 tokens from my txt file instead of reusing what was stored.
To check what is actually stored, I ran this script:
from pathlib import Path
import os

# Configuration of the storage directory
custom_storage_dir = Path(__file__).parent / "storage"
os.environ["CREWAI_STORAGE_DIR"] = str(custom_storage_dir)

import chromadb
from crewai.utilities.paths import db_storage_path

# Connect to CrewAI's ChromaDB
storage_path = db_storage_path()
chroma_path = os.path.join(storage_path, "knowledge")

if os.path.exists(chroma_path):
    client = chromadb.PersistentClient(path=chroma_path)
    collections = client.list_collections()
    print("ChromaDB Collections:")
    for collection in collections:
        print(f" - {collection.name}: {collection.count()} documents")
else:
    print("No ChromaDB storage found")

print("CREWAI_STORAGE_DIR:", os.getenv("CREWAI_STORAGE_DIR"))
print("Current working directory:", os.getcwd())
print("Computed storage path:", db_storage_path())
And I got this result:
E:\projects\api_teste_python
(.venv) λ python pa.py
ChromaDB Collections:
- knowledge_Answer_only_SQL_related_questions__uses: 13 documents
CREWAI_STORAGE_DIR: E:\projects\api_teste_python\storage
Current working directory: E:\projects\api_teste_python
Computed storage path: E:\projects\api_teste_python\storage
What I can't understand is why, given that the embeddings are stored, the entire document is sent to text-embedding-3-small every time I ask a question.
According to the tokenizer, my document is 14,000 tokens. Every time I ask a question, the usage shown on the text-embedding-3-small dashboard goes up by 14,000, so the stored embeddings are apparently not being reused.
Note: The question is always the same; it is hard-coded in the script.
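
To make the reuse behavior being asked about concrete, here is a minimal sketch that skips embedding when the source file has not changed since the last run. It is only an illustration under stated assumptions and is not how CrewAI implements knowledge storage: `file_fingerprint`, `load_manifest`, `needs_embedding`, `mark_embedded`, and the manifest file are hypothetical names, and the actual embedding call is left out.

```python
import hashlib
import json
from pathlib import Path


def file_fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_manifest(manifest_path: Path) -> dict:
    """Load the {file name: fingerprint} map, or an empty dict if none exists."""
    if manifest_path.exists():
        return json.loads(manifest_path.read_text(encoding="utf-8"))
    return {}


def needs_embedding(source: Path, manifest_path: Path) -> bool:
    """True if the file is new or its content changed since it was recorded."""
    manifest = load_manifest(manifest_path)
    return manifest.get(source.name) != file_fingerprint(source)


def mark_embedded(source: Path, manifest_path: Path) -> None:
    """Record the file's current fingerprint (hypothetical bookkeeping step)."""
    manifest = load_manifest(manifest_path)
    manifest[source.name] = file_fingerprint(source)
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
```

With a check like this, a second run against an unchanged estrutura_db.txt would get `False` from `needs_embedding`, so only the question itself would have to be sent to the embedding model rather than the whole 14,000-token document.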