Craft a guide to using Memory in CrewAI please

Can you please provide example doco and code for the following areas:

RAG

e.g. the RAG tools: CSV RAG Search - CrewAI, PDF RAG Search - CrewAI, etc.

Please provide working examples: how agents can access it, how you let them see it, whether it's global, etc.
Then a simple “poem” like example with it working so we can grok it fully.

Knowledge and Memory

Can you please provide a simple example of what these mean. Do they mean RAG? Do they mean global Flow variables, or are they Crew-specific?

How long do they last, how can we see if they work, and how do we clear them?

Examples

Please provide full, working examples, and also include cases like the following in your code tests:

# Create a text file knowledge source
text_source = CrewDoclingSource(
    file_paths=["document.txt", "another.txt"]
)
This does not work for me, as CrewDoclingSource does not support .txt files, but the doco says it does. A test rig exercising all core features would help pick up these issues.

Specifically, I get:

Error loading content: File format not allowed: knowledge/members.txt. Supported formats: [<InputFormat.MD: 'md'>, <InputFormat.ASCIIDOC: 'asciidoc'>, <InputFormat.PDF: 'pdf'>, <InputFormat.DOCX: 'docx'>, <InputFormat.HTML: 'html'>, <InputFormat.IMAGE: 'image'>, <InputFormat.XLSX: 'xlsx'>, <InputFormat.PPTX: 'pptx'>]

I recognise a lot of this is new and you are a growing team, so I'm grateful for the doco.


Hey @Tony_Wood, I'm thinking of doing a blog post on these topics, because it seems the community needs more tutorials and example code for memory and knowledge configuration and usage.

What are the pain points you have noticed around these topics, and which material ended up bridging the gap for you in understanding them? I'm assuming your questions were answered, since this issue is a bit old :sweat_smile:

@Tony_Wood, hey, correct me if I’m off base here, but I couldn’t spot anything in the docs saying CrewDoclingSource handles TXT files. What it does say is:

The CrewDoclingSource is actually quite versatile and can handle multiple file formats including MD, PDF, DOCX, HTML, and more.

Now, you could probably (haven’t tried it myself, full disclosure) just rename your files to something like document.md or another.md – since Markdown is basically just text with some bells and whistles. But the thing is, Docling is a bit of a heavyweight for plain old TXT files, kind of like you brilliantly pointed out in that other discussion. For something lean and mean like TXT files, you’ve got TextFileKnowledgeSource on your side.

Seeing as you can pile on as many knowledge sources as you want, it’s probably a smart move to use the right tool for the right job for each file type in your knowledge base. Here’s a quick example where I’m mixing CrewDoclingSource (for this webpage) with TextFileKnowledgeSource (using content I snagged from this article) to build up a more well-rounded knowledge base.

Just a couple of things to keep in mind:

  • Make sure your local knowledge files are tucked away in the knowledge folder.
  • You can pass the knowledge_sources parameter either globally to the Crew (like I did in the example below, just to keep things simple) or target it specifically to an Agent, which was covered over in this other thread.

knowledge/elon_musk_offline.txt file:

"Elon Musk becomes first person to surpass $400B net worth, according to 
Bloomberg," by Rob Wile.

Musk spent a quarter of a billion dollars this year helping elect Donald Trump 
to a second term as president.

Elon Musk's estimated net worth has now surpassed $400 billion, according to 
an estimate from Bloomberg News — making him the first person ever to reach 
such a milestone.

knowledge_simple_example.py file:

from crewai import LLM, Agent, Crew, Process, Task
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource
from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource
import os

os.environ["GEMINI_API_KEY"] = "<YOUR_API_KEY>"

remote_sources = CrewDoclingSource(
    file_paths=["https://dantekim.com/notes/elon-musk-tesla-spacex-and-the-quest-for-a-fantastic-future/"],
)

local_sources = TextFileKnowledgeSource(
    file_paths=["elon_musk_offline.txt"],
)

google_embedder = {
    "provider": "google",
    "config": {
        "model": "models/text-embedding-004",
        "api_key": os.environ["GEMINI_API_KEY"],
    },
}

gemini_llm = LLM(
    model="gemini/gemini-2.5-flash-preview-04-17",
    temperature=0.7
)

kb_qa_agent = Agent(
    role="Knowledge Base QA Bot",
    goal=(
        "Answer questions using ONLY the provided knowledge base."
    ),
    backstory=(
        "I am an AI assistant designed to answer questions *exclusively* "
        "from the information contained within the provided knowledge sources. "
        "If the information isn't found, I will clearly state that. "
        "My answers should be concise, single paragraphs, and not exceed 400 characters."
    ),
    llm=gemini_llm,
    allow_delegation=False,
    verbose=True
)

kb_qa_task = Task(
    description=(
        "Carefully analyze the provided knowledge and answer the following question:\n\n"
        "'{question}'\n\n"
        "You *must* use only the information available in the knowledge base. "
        "If the answer cannot be found in the provided documents, "
        "you must explicitly state that the information is not available."
    ),
    expected_output=(
        "A single paragraph answer, with a maximum of 400 characters, "
        "based *strictly* on the information found in the knowledge base. "
        "Alternatively, a statement like 'Information not found in the knowledge base.'"
    ),
    agent=kb_qa_agent
)

knowledge_crew = Crew(
    agents=[kb_qa_agent],
    tasks=[kb_qa_task],
    process=Process.sequential,
    knowledge_sources=[remote_sources, local_sources],
    embedder=google_embedder,
    verbose=True
)

response = knowledge_crew.kickoff(
    inputs={"question": "What do you know about Musk's life and fortune?"}
)

print(f"\n[🤖 Response]: {response.raw}")