@Tony_Wood, hey, correct me if I'm off base here, but I couldn't spot anything in the docs saying `CrewDoclingSource` handles TXT files. What the docs do say is:
> The `CrewDoclingSource` is actually quite versatile and can handle multiple file formats, including MD, PDF, DOCX, HTML, and more.
Now, you could probably (haven't tried it myself, full disclosure) just rename your files to something like `document.md` or `another.md`, since Markdown is basically just text with some bells and whistles. But the thing is, Docling is a bit of a heavyweight for plain old TXT files, as you brilliantly pointed out in that other discussion. For something lean and mean like TXT files, you've got `TextFileKnowledgeSource` on your side.
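If you did want to try the rename route, here's a minimal sketch of how you might mirror your TXT files as Markdown twins (the `mirror_txt_as_md` helper is hypothetical, just to illustrate the idea; I haven't verified how Docling treats the results):

```python
from pathlib import Path
import shutil

def mirror_txt_as_md(knowledge_dir: str) -> list[str]:
    """Copy every .txt file in the knowledge folder to a .md twin so a
    Markdown-capable loader could pick it up. Originals are left untouched."""
    copies = []
    for txt in sorted(Path(knowledge_dir).glob("*.txt")):
        md = txt.with_suffix(".md")
        shutil.copyfile(txt, md)  # content is byte-identical; only the extension differs
        copies.append(md.name)
    return copies
```

That said, I'd still reach for `TextFileKnowledgeSource` instead of a workaround like this.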
Since you can pile on as many knowledge sources as you want, it's probably a smart move to use the right tool for each file type in your knowledge base. Here's a quick example where I'm mixing `CrewDoclingSource` (for this webpage) with `TextFileKnowledgeSource` (using content I snagged from this article) to build a more well-rounded knowledge base.
Just a couple of things to keep in mind:
- Make sure your local knowledge files are tucked away in the `knowledge` folder.
- You can pass the `knowledge_sources` parameter either globally to the `Crew` (like I did in the example below, just to keep things simple) or target it specifically to an `Agent`, which was covered over in this other thread.
`knowledge/elon_musk_offline.txt` file:

```
"Elon Musk becomes first person to surpass $400B net worth, according to
Bloomberg," by Rob Wile.

Musk spent a quarter of a billion dollars this year helping elect Donald Trump
to a second term as president.

Elon Musk's estimated net worth has now surpassed $400 billion, according to
an estimate from Bloomberg News — making him the first person ever to reach
such a milestone.
```
`knowledge_simple_example.py` file:

```python
import os

from crewai import LLM, Agent, Crew, Process, Task
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource
from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource

os.environ["GEMINI_API_KEY"] = "<YOUR_API_KEY>"

# Remote source: Docling parses the web page.
remote_sources = CrewDoclingSource(
    file_paths=["https://dantekim.com/notes/elon-musk-tesla-spacex-and-the-quest-for-a-fantastic-future/"],
)

# Local source: plain-text file, resolved relative to the `knowledge` folder.
local_sources = TextFileKnowledgeSource(
    file_paths=["elon_musk_offline.txt"],
)

google_embedder = {
    "provider": "google",
    "config": {
        "model": "models/text-embedding-004",
        "api_key": os.environ["GEMINI_API_KEY"],
    },
}

gemini_llm = LLM(
    model="gemini/gemini-2.5-flash-preview-04-17",
    temperature=0.7,
)

kb_qa_agent = Agent(
    role="Knowledge Base QA Bot",
    goal="Answer questions using ONLY the provided knowledge base.",
    backstory=(
        "I am an AI assistant designed to answer questions *exclusively* "
        "from the information contained within the provided knowledge sources. "
        "If the information isn't found, I will clearly state that. "
        "My answers should be concise, single paragraphs, and not exceed 400 characters."
    ),
    llm=gemini_llm,
    allow_delegation=False,
    verbose=True,
)

kb_qa_task = Task(
    description=(
        "Carefully analyze the provided knowledge and answer the following question:\n\n"
        "'{question}'\n\n"
        "You *must* use only the information available in the knowledge base. "
        "If the answer cannot be found in the provided documents, "
        "you must explicitly state that the information is not available."
    ),
    expected_output=(
        "A single paragraph answer, with a maximum of 400 characters, "
        "based *strictly* on the information found in the knowledge base. "
        "Alternatively, a statement like 'Information not found in the knowledge base.'"
    ),
    agent=kb_qa_agent,
)

knowledge_crew = Crew(
    agents=[kb_qa_agent],
    tasks=[kb_qa_task],
    process=Process.sequential,
    knowledge_sources=[remote_sources, local_sources],  # both sources feed the same knowledge base
    embedder=google_embedder,
    verbose=True,
)

response = knowledge_crew.kickoff(
    inputs={"question": "What do you know about Musk's life and fortune?"}
)
print(f"\n[🤖 Response]: {response.raw}")
```