So, I’ve been trying to use the crewAI docs as the knowledge source for a crew.
Along the way, I ran into the following problems:
- StringKnowledgeSource
accepts a single string as a knowledge source and breaks it into chunks, but does not generate metadata for those chunks, even though the storage layer requires one metadata entry per chunk. This produces errors like:
[WARNING]: Failed to init knowledge: Unequal lengths for fields: ids: 487, metadatas: 1, documents: 487 in upsert.
when used as:
StringKnowledgeSource(
    content=load_crewai_docs(),  # load_crewai_docs() returns a single-line string
)
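The warning suggests that the underlying upsert expects ids, metadatas, and documents to have the same length, while a single metadata entry is being passed for 487 chunks. A minimal sketch of one possible workaround, assuming that reading is correct: expand the metadata to one dict per chunk before saving. The helper name `expand_metadata` and the `chunk_index` key are hypothetical and not part of crewAI.

```python
from typing import Any, Dict, List, Union

Metadata = Dict[str, Any]


def expand_metadata(
    chunks: List[str],
    metadata: Union[Metadata, List[Metadata], None] = None,
) -> List[Metadata]:
    """Return exactly one metadata dict per chunk (hypothetical helper)."""
    if metadata is None:
        metadata = {}
    if isinstance(metadata, dict):
        # Copy the shared dict for every chunk and tag it with its position.
        return [{**metadata, "chunk_index": i} for i in range(len(chunks))]
    if len(metadata) != len(chunks):
        # Fail early with a readable message instead of inside the upsert.
        raise ValueError(
            f"Got {len(metadata)} metadata entries for {len(chunks)} chunks"
        )
    return list(metadata)


chunks = ["first chunk", "second chunk", "third chunk"]
per_chunk = expand_metadata(chunks, {"source": "crewai-docs"})
assert len(per_chunk) == len(chunks)
```

The resulting list could then be passed wherever the knowledge source hands metadata to storage, so that the lengths agree.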
- There is a typo in the docs (knowledge section, sample custom knowledge source):
def add(self) -> None:
    """Process and store the articles."""
    content = self.load_content()
    for _, text in content.items():
        chunks = self._chunk_text(text)
        self.chunks.extend(chunks)
    self._save_documents()
→ the parent class has a save_documents() method, without the _ prefix, so the call above does not match it.
- I’ve written a custom LocalTxTFileKnowledgeSource in which I generate dummy metadata for each chunk:
import uuid
from typing import Dict

from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
from pydantic import Field


class LocalTxTFileKnowledgeSource(BaseKnowledgeSource):
    file_path: str = Field(description="Path to the local .txt file")

    def load_content(self) -> Dict[str, str]:
        # Read the whole file and key it by its path.
        try:
            with open(self.file_path, "r", encoding="utf-8") as file:
                content = file.read()
            return {self.file_path: content}
        except Exception as e:
            raise ValueError(f"Failed to read the file {self.file_path}: {str(e)}")

    def add(self) -> None:
        """Process and store the file content."""
        content = self.load_content()
        for _, text in content.items():
            chunks = self._chunk_text(text)
            self.chunks.extend(chunks)
        # One dummy metadata entry per chunk, so the lengths match.
        chunks_metadata = [
            {
                "chunk_id": str(uuid.uuid4()),
                "source": self.file_path,
                "description": f"Chunk {i + 1} from file {self.file_path}",
            }
            for i in range(len(chunks))
        ]
        self.save_documents(metadata=chunks_metadata)
(My guess is that the metadata entries should contain some kind of summary headers instead?)
When used in the crew settings:
knowledge_sources=[
    LocalTxTFileKnowledgeSource(
        file_path="docs_crewai/singlefile.txt",
    )
]
I didn’t get any errors, and it seems the documents were saved. However, the agents in the crew behaved as if they never accessed the data, so maybe it wasn’t loaded after all, or it was loaded in an incorrect format?