Persistence Knowledge

My CrewAI app creates the embedding storage, but it doesn't reuse the saved embeddings when I run the app again. I have local storage set to:

custom_storage_dir = Path(__file__).parent / "storage"
os.environ["CREWAI_STORAGE_DIR"] = str(custom_storage_dir)

My script:

import os
from pathlib import Path

custom_storage_dir = Path(__file__).parent / "storage"
os.environ["CREWAI_STORAGE_DIR"] = str(custom_storage_dir)

from crewai import Agent, Task, Crew, Process
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage
from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource

custom_storage = KnowledgeStorage(
    embedder={
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
    },
    collection_name="custom_dba",
)

doc_sources = TextFileKnowledgeSource(
    file_paths=["estrutura_db.txt"],
)

doc_sources.storage = custom_storage

agent = Agent(
    role="Answer only SQL-related questions, always using very short answers.",
    goal="Display queries according to the question and show the query results.",
    backstory="You are an assistant who helps users solve SQL problems. You have access to documentation and in-depth knowledge of SQL.",
    verbose=True,
    allow_delegation=False,
    memory=True,
    knowledge_sources=[doc_sources],
)

Every run asks the same hard-coded question, and each time it re-embeds the full ~14,000 tokens from my txt file instead of reusing the stored embeddings.

I ran this diagnostic script:

from pathlib import Path
import os

# Configuration of the storage directory
custom_storage_dir = Path(__file__).parent / "storage"
os.environ["CREWAI_STORAGE_DIR"] = str(custom_storage_dir)

import chromadb
from crewai.utilities.paths import db_storage_path

# Connect to CrewAI's ChromaDB
storage_path = db_storage_path()
chroma_path = os.path.join(storage_path, "knowledge")

if os.path.exists(chroma_path):
    client = chromadb.PersistentClient(path=chroma_path)
    collections = client.list_collections()

    print("ChromaDB Collections:")
    for collection in collections:
        print(f"  - {collection.name}: {collection.count()} documents")
else:
    print("No ChromaDB storage found")

print("CREWAI_STORAGE_DIR:", os.getenv("CREWAI_STORAGE_DIR"))
print("Current working directory:", os.getcwd())
print("Computed storage path:", db_storage_path())

And I got this result:

E:\projects\api_teste_python
(.venv) λ python pa.py
ChromaDB Collections:
- knowledge_Answer_only_SQL_related_questions__uses: 13 documents
CREWAI_STORAGE_DIR: E:\projects\api_teste_python\storage
Current working directory: E:\projects\api_teste_python
Computed storage path: E:\projects\api_test_python\storage

What I can't understand is why, given that the embeddings are stored, every question still sends the entire document to text-embedding-3-small.
In the tokenizer my document comes out at ~14,000 tokens, and after every question the text-embedding-3-small usage dashboard goes up by another 14,000, so nothing is being reused.

Note: the question is always the same one, hard-coded in the script.

firstly, i noticed your storage paths don't match. was this intentional, or am I missing something?

  • CREWAI_STORAGE_DIR: E:\projects\api_teste_python\storage
  • Computed storage path: E:\projects\api_test_python\storage
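
you can confirm those two strings really are different directories with a quick stdlib check (the paths below are copied verbatim from your output; note `api_teste_python` vs `api_test_python`):

```python
from pathlib import Path

# The two paths printed by the diagnostic script, copied verbatim --
# the project folder name differs: "api_teste_python" vs "api_test_python".
env_dir = Path(r"E:\projects\api_teste_python\storage")
computed = Path(r"E:\projects\api_test_python\storage")

print(env_dir == computed)  # -> False: these are two different directories
```

if the env var points one place and the computed storage path resolves somewhere else, the embeddings get written to a directory that is never read back on the next run.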

secondly, try using crew-level knowledge sources instead of agent-level and see if that fixes the issue.

Hi, thanks for replying!
I used os.environ["CREWAI_STORAGE_DIR"] = str(Path(__file__).parent / "storage").

My script:

import os
from pathlib import Path
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource
import warnings
warnings.filterwarnings("ignore")

os.environ["CREWAI_STORAGE_DIR"] = str(Path(__file__).parent / "storage")

doc_sources = TextFileKnowledgeSource(
    file_paths=["estrutura_db.txt"],
)

llm = LLM(
    model="gpt-4.1-nano-2025-04-14",
    max_tokens=150,
    temperature=0,
)

agent = Agent(
    role="Answer only SQL-related questions, always using very short answers.",
    goal="Show the queries matching the question and show the query results.",
    backstory="You are an assistant who helps users solve SQL problems. You have access to documentation and in-depth knowledge of SQL.",
    verbose=True,
    allow_delegation=False,
    memory=True,
    llm=llm,
)

task = Task(
    description="Interact with the user: {question}",
    expected_output="A reply to the interaction.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    knowledge_sources=[doc_sources],
    memory=True,
    process=Process.sequential,
)

result = crew.kickoff(inputs={"question": "show the fields that have a foreign key in channel"})
print(result)

Even with crew-level knowledge sources, the full content of the doc is still sent to the model.
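
One quick way to narrow this down (a minimal stdlib sketch, no CrewAI imports needed, assuming you run it from the project directory) is to check whether a `knowledge` folder actually exists under the directory your env var points at. If it doesn't, the embeddings are landing somewhere else and every run starts from an empty collection:

```python
import os
from pathlib import Path

# Check whether ChromaDB's "knowledge" folder exists under the directory
# configured via CREWAI_STORAGE_DIR (falls back to ./storage if unset).
configured = Path(os.getenv("CREWAI_STORAGE_DIR", "storage"))
knowledge_dir = configured / "knowledge"

print("configured storage:", configured)
print("knowledge collection on disk:", knowledge_dir.exists())
```

If this prints False while your diagnostic script still finds 13 documents, the write path and the read path disagree, which matches the `api_teste_python` / `api_test_python` mismatch in your earlier output.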