Persistence Knowledge

My CrewAI app creates the embedding storage, but it doesn't reuse the saved embeddings when I run the app again. I have local storage set to:

custom_storage_dir = Path(__file__).parent / "storage"
os.environ["CREWAI_STORAGE_DIR"] = str(custom_storage_dir)

My script:

import os
from pathlib import Path

custom_storage_dir = Path(__file__).parent / "storage"
os.environ["CREWAI_STORAGE_DIR"] = str(custom_storage_dir)

from crewai import Agent, Task, Crew, Process
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage
from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource

custom_storage = KnowledgeStorage(
    embedder={
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
    },
    collection_name="custom_dba",
)

doc_sources = TextFileKnowledgeSource(
    file_paths=["estrutura_db.txt"],
)

doc_sources.storage = custom_storage

agent = Agent(
    role="Answer only SQL-related questions, always using very short answers.",
    goal="Display queries according to the question and show the query results.",
    backstory="You are an assistant who helps users solve SQL problems. You have access to documentation and in-depth knowledge of SQL.",
    verbose=True,
    allow_delegation=False,
    memory=True,
    knowledge_sources=[doc_sources],
)

Every run asks the same hard-coded question, and each time it re-embeds the full ~14,000 tokens from my txt file instead of reusing the stored embeddings.

I ran this diagnostic script:

from pathlib import Path
import os

# Configuration of the storage directory
custom_storage_dir = Path(__file__).parent / "storage"
os.environ["CREWAI_STORAGE_DIR"] = str(custom_storage_dir)

import chromadb
from crewai.utilities.paths import db_storage_path

# Connect to CrewAI's ChromaDB
storage_path = db_storage_path()
chroma_path = os.path.join(storage_path, "knowledge")

if os.path.exists(chroma_path):
    client = chromadb.PersistentClient(path=chroma_path)
    collections = client.list_collections()

    print("ChromaDB Collections:")
    for collection in collections:
        print(f"  - {collection.name}: {collection.count()} documents")
else:
    print("No ChromaDB storage found")

print("CREWAI_STORAGE_DIR:", os.getenv("CREWAI_STORAGE_DIR"))
print("Current working directory:", os.getcwd())
print("Computed storage path:", db_storage_path())

And I got this result:

E:\projects\api_teste_python
(.venv) λ python pa.py
ChromaDB Collections:
- knowledge_Answer_only_SQL_related_questions__uses: 13 documents
CREWAI_STORAGE_DIR: E:\projects\api_teste_python\storage
Current working directory: E:\projects\api_teste_python
Computed storage path: E:\projects\api_test_python\storage

What I can't understand is why, given that the embeddings are stored, every question still sends the entire document to text-embedding-3-small.
In the tokenizer my document comes out at ~14,000 tokens, and after every question the text-embedding-3-small usage dashboard goes up by another 14,000, so nothing is being reused.

Note: the question is always the same one, hard-coded in the script.

firstly, i noticed your storage paths don't match. was this intentional, or am I missing something?

  • CREWAI_STORAGE_DIR: E:\projects\api_teste_python\storage
  • Computed storage path: E:\projects\api_test_python\storage
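
you can confirm those two strings really are different directories with a quick stdlib check (the paths below are copied verbatim from your output; note `api_teste_python` vs `api_test_python`):

```python
from pathlib import Path

# The two paths printed by the diagnostic script, copied verbatim --
# the project folder name differs: "api_teste_python" vs "api_test_python".
env_dir = Path(r"E:\projects\api_teste_python\storage")
computed = Path(r"E:\projects\api_test_python\storage")

print(env_dir == computed)  # -> False: these are two different directories
```

if the env var points one place and the computed storage path resolves somewhere else, the embeddings get written to a directory that is never read back on the next run.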

secondly, try using crew-level knowledge sources instead of agent-level and see if that fixes the issue.

Hi, thanks for replying!
I used os.environ["CREWAI_STORAGE_DIR"] = str(Path(__file__).parent / "storage").

My script:

import os
from pathlib import Path
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource
import warnings
warnings.filterwarnings("ignore")

os.environ["CREWAI_STORAGE_DIR"] = str(Path(__file__).parent / "storage")

doc_sources = TextFileKnowledgeSource(
    file_paths=["estrutura_db.txt"],
)

llm = LLM(
    model="gpt-4.1-nano-2025-04-14",
    max_tokens=150,
    temperature=0,
)

agent = Agent(
    role="Answer only SQL-related questions, always using very short answers.",
    goal="Show the queries matching the question and show the query results.",
    backstory="You are an assistant who helps users solve SQL problems. You have access to documentation and in-depth knowledge of SQL.",
    verbose=True,
    allow_delegation=False,
    memory=True,
    llm=llm,
)

task = Task(
    description="Interact with the user: {question}",
    expected_output="A reply to the interaction.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    knowledge_sources=[doc_sources],
    memory=True,
    process=Process.sequential,
)

result = crew.kickoff(inputs={"question": "show the fields that have a foreign key in channel"})
print(result)

Even with crew-level knowledge sources, the full content of the doc is still sent to the model.
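
One quick way to narrow this down (a minimal stdlib sketch, no CrewAI imports needed, assuming you run it from the project directory) is to check whether a `knowledge` folder actually exists under the directory your env var points at. If it doesn't, the embeddings are landing somewhere else and every run starts from an empty collection:

```python
import os
from pathlib import Path

# Check whether ChromaDB's "knowledge" folder exists under the directory
# configured via CREWAI_STORAGE_DIR (falls back to ./storage if unset).
configured = Path(os.getenv("CREWAI_STORAGE_DIR", "storage"))
knowledge_dir = configured / "knowledge"

print("configured storage:", configured)
print("knowledge collection on disk:", knowledge_dir.exists())
```

If this prints False while your diagnostic script still finds 13 documents, the write path and the read path disagree, which matches the `api_teste_python` / `api_test_python` mismatch in your earlier output.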