[BUG] Knowledge Source metadata generation doesn't work (and possibly the knowledge store at all)

So, I've been struggling to use the crewAI docs as the knowledge source for the crew.

Along the way, I ran into the following errors:

  1. StringKnowledgeSource accepts a single string as a knowledge source and splits it into chunks, but it does not generate metadata for those chunks, even though metadata is required for each one. This produces errors like:
[WARNING]: Failed to init knowledge: Unequal lengths for fields: ids: 487, metadatas: 1, documents: 487 in upsert.

when used as:

            StringKnowledgeSource(
                content=load_crewai_docs(),  # load_crewai_docs returns a single-line string
            )
  2. There is a typo in the docs (Knowledge section, custom knowledge source example):
    def add(self) -> None:
        """Process and store the articles."""
        content = self.load_content()
        for _, text in content.items():
            chunks = self._chunk_text(text)
            self.chunks.extend(chunks)

        self._save_documents()

→ the parent class defines save_documents() without the leading underscore, so the example should call self.save_documents().

  3. I made a custom LocalTxTFileKnowledgeSource that includes dummy metadata generation:
import uuid
from typing import Any, Dict, List

from pydantic import Field
from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource


class LocalTxTFileKnowledgeSource(BaseKnowledgeSource):
    file_path: str = Field(description="Path to the local .txt file")

    def load_content(self) -> Dict[str, str]:
        try:
            with open(self.file_path, "r", encoding="utf-8") as file:
                content = file.read()
            return {self.file_path: content}
        except Exception as e:
            raise ValueError(f"Failed to read the file {self.file_path}: {e}") from e

    def add(self) -> None:
        """Process and store the file content."""
        content = self.load_content()
        # One metadata dict per chunk, accumulated across every loaded text
        chunks_metadata: List[Dict[str, Any]] = []

        for _, text in content.items():
            chunks = self._chunk_text(text)
            self.chunks.extend(chunks)

            chunks_metadata.extend(
                {
                    "chunk_id": str(uuid.uuid4()),
                    "source": self.file_path,
                    "description": f"Chunk {i + 1} from file {self.file_path}",
                }
                for i in range(len(chunks))
            )

        self.save_documents(metadata=chunks_metadata)

(My guess is that the metadata entries must contain some kind of summary headers instead?)

When used in the crew settings:

            knowledge_sources=[
                LocalTxTFileKnowledgeSource(
                    file_path="docs_crewai/singlefile.txt",
                )
            ],

I didn't get any errors, and it seems the documents were saved. However, the agents in the crew behaved as if they had no access to the data. Maybe it wasn't loaded at all, or it was loaded in an incorrect format?

Opened issue: crewAIInc/crewAI #1757 on GitHub, "[BUG] Knowledge Source metadata generation doesn't work (and possibly the knowledge store at all)".
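The mismatch in the first point is a count problem: the text is split into many chunks (documents: 487), but only one metadata entry reaches the upsert (metadatas: 1). Here is a stdlib-only sketch of the per-chunk pattern used in the custom source above: chunk first, then build exactly one non-empty metadata dict per chunk. `chunk_text` and `build_chunk_metadata` are hypothetical helpers written for illustration; `chunk_text` is a simple fixed-size character splitter with overlap, not crewAI's own `_chunk_text`, and the sizes are arbitrary.

```python
import uuid
from typing import Any, Dict, List


def chunk_text(text: str, chunk_size: int = 100, chunk_overlap: int = 20) -> List[str]:
    """Hypothetical chunker: fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


def build_chunk_metadata(chunks: List[str], source: str) -> List[Dict[str, Any]]:
    """Return one non-empty metadata dict per chunk, so the lists line up."""
    return [
        {
            "chunk_id": str(uuid.uuid4()),
            "source": source,
            "description": f"Chunk {i + 1} from {source}",
        }
        for i in range(len(chunks))
    ]


text = "x" * 450  # stand-in for a long single-line document
chunks = chunk_text(text)
metadata = build_chunk_metadata(chunks, "docs_crewai/singlefile.txt")

# documents and metadatas now have the same length, as the upsert expects
print(len(chunks), len(metadata))
```

The same idea is what `chunks_metadata` does in the custom source; a single dict for all chunks is what leaves the counts unequal.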


I also ran into this same issue.

I used the example from the docs and got similar output:

[2024-12-21 20:59:12][ERROR]: Failed to upsert documents: Expected metadata to be a non-empty dict, got 0 metadata attributes in upsert.
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create a knowledge source
content = "Users name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(
    content=content,
)

# Create an LLM with a temperature of 0 to ensure deterministic outputs
llm = LLM(model="gpt-4o-mini", temperature=0)

# Create an agent with the knowledge store
agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="""You are a master at understanding people and their preferences.""",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)
task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[
        string_source
    ],  # Enable knowledge by adding the sources here. You can also add more sources to the sources list.
)

result = crew.kickoff(
    inputs={"question": "What city does John live in and how old is he?"}
)

and got the following output;


The same thing is happening here. Has anyone managed to solve it?

[2024-12-26 14:19:09][ERROR]: Failed to upsert documents: Unequal lengths for fields: ids: 14, metadatas: 1, documents: 14 in upsert.

[2024-12-26 14:19:09][WARNING]: Failed to init knowledge: Unequal lengths for fields: ids: 14, metadatas: 1, documents: 14 in upsert.

I'm also having issues with CrewDoclingSource when using the code from the docs example: I get “NameError: name ‘DoclingDocument’ is not defined”.

Has anyone gotten the knowledge feature of crewAI working yet?
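Both upsert errors posted in this thread (the unequal-lengths one and the non-empty-dict one) are consistent with one rule: each chunk needs its own non-empty metadata dict. The sketch below reproduces those two failure modes in plain Python. `check_upsert` is a hypothetical helper modeled only on the wording of the logged messages, not the vector store's actual validation, and wrapping a lone dict into a one-item list is an assumption about how a single metadata dict turns into `metadatas: 1`.

```python
from typing import Any, Dict, List, Union


def check_upsert(
    ids: List[str],
    documents: List[str],
    metadata: Union[Dict[str, Any], List[Dict[str, Any]]],
) -> None:
    """Hypothetical check that raises in the two situations reported in this thread."""
    # Assumption: a lone dict is wrapped into a one-element list of metadatas
    metadatas = [metadata] if isinstance(metadata, dict) else metadata

    if not (len(ids) == len(metadatas) == len(documents)):
        raise ValueError(
            f"Unequal lengths for fields: ids: {len(ids)}, "
            f"metadatas: {len(metadatas)}, documents: {len(documents)} in upsert."
        )
    for entry in metadatas:
        if not entry:
            raise ValueError(
                "Expected metadata to be a non-empty dict, "
                f"got {len(entry)} metadata attributes in upsert"
            )


cases = [
    (["chunk one", "chunk two", "chunk three"], {"source": "a.txt"}),  # one dict, three chunks
    (["Users name is John."], {}),  # one chunk, empty metadata
]
errors = []
for docs, meta in cases:
    ids = [f"id-{i}" for i in range(len(docs))]
    try:
        check_upsert(ids, docs, meta)
    except ValueError as err:
        errors.append(str(err))
        print(err)

# One non-empty dict per document passes both checks
docs = ["chunk one", "chunk two"]
check_upsert(["id-0", "id-1"], docs, [{"source": "a.txt"} for _ in docs])
```

This would also explain why the short single-sentence example fails differently from the long documents: presumably it yields a single chunk, so the lengths match and only the empty-metadata check is left to fail.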


It is not working for me either.

DoclingSource
content: List[DoclingDocument] = Field(default_factory=list)
              ^^^^^^^^^^^^^^^

NameError: name ‘DoclingDocument’ is not defined

Same error with CrewDoclingSource. NameError: name ‘DoclingDocument’ is not defined.

Is there a solution yet?
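One plausible reason an install check never gets the chance to fire: in the Python versions current at the time (3.13 and earlier), a class-level annotation such as `content: List[DoclingDocument]` is evaluated while the class body runs, i.e. when the module is imported, which is before any check in `__init__` can run. Python 3.14 defers that evaluation until the annotations are read. The traceback above points at exactly such an annotation. The sketch below reproduces the pattern with a made-up optional dependency; `not_installed_pkg`, `FakeDocument` and `GuardedSource` are hypothetical names, and it is an assumption (not checked against the crewAI source) that the real module guards its import with try/except and checks availability in `__init__`.

```python
from typing import List

try:
    from not_installed_pkg import FakeDocument  # hypothetical optional dependency

    PKG_AVAILABLE = True
except ImportError:
    PKG_AVAILABLE = False

error = None
try:
    class GuardedSource:
        # Before Python 3.14 this annotation is evaluated as the class body runs
        content: List[FakeDocument] = []

        def __init__(self) -> None:
            # The intended guard, which can only run once the class exists
            if not PKG_AVAILABLE:
                raise ImportError("not_installed_pkg is required; please install it.")

    # On 3.14+, annotations are evaluated lazily; reading them forces evaluation
    GuardedSource.__annotations__
except NameError as err:
    error = err

print(f"{type(error).__name__}: {error}")
```

Quoting the annotation or adding `from __future__ import annotations` defers the lookup in plain Python; a pydantic model may still try to resolve the name later, so that alone is not necessarily a fix.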

Yeah, after the latest update the metadata issues were fixed and the new knowledge source was introduced. I also got those import errors.

So, there are a few issues:

  1. You need to install the docling package. For some reason, this exception is not raised:
        if not DOCLING_AVAILABLE:
            raise ImportError(
                "The docling package is required to use CrewDoclingSource. "
                "Please install it using: uv add docling"
            )
  2. Note that the supported formats are:
        default_factory=lambda: DocumentConverter(
            allowed_formats=[
                InputFormat.MD,
                InputFormat.ASCIIDOC,
                InputFormat.PDF,
                InputFormat.DOCX,
                InputFormat.HTML,
                InputFormat.IMAGE,
                InputFormat.XLSX,
                InputFormat.PPTX,
            ]
        )

So the docs are incorrect when they show .txt files as a source:

# Create a text file knowledge source
text_source = CrewDoclingSource(
    file_paths=["document.txt", "another.txt"]
)
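Since `.txt` is not in that list, it can help to check file paths before building the source. The sketch below is a hypothetical pre-flight helper; the suffix-to-format mapping is an assumption made for illustration (it is not read from docling, and the image suffixes in particular are a guess), so adjust it to whatever `InputFormat` values your installed version actually accepts.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical mapping from file suffix to the format names listed above
SUPPORTED_SUFFIXES: Dict[str, str] = {
    ".md": "MD",
    ".adoc": "ASCIIDOC",
    ".pdf": "PDF",
    ".docx": "DOCX",
    ".html": "HTML",
    ".htm": "HTML",
    ".png": "IMAGE",
    ".jpg": "IMAGE",
    ".jpeg": "IMAGE",
    ".xlsx": "XLSX",
    ".pptx": "PPTX",
}


def split_by_support(file_paths: List[str]) -> Tuple[List[str], List[str]]:
    """Return (supported, unsupported) paths, judged by file suffix alone."""
    supported: List[str] = []
    unsupported: List[str] = []
    for path in file_paths:
        if Path(path).suffix.lower() in SUPPORTED_SUFFIXES:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


ok, rejected = split_by_support(["document.txt", "another.txt", "guide.md", "Report.PDF"])
print(ok)
print(rejected)
```

Plain-text files rejected this way could go through a source that reads raw text instead, such as a custom one like the LocalTxTFileKnowledgeSource earlier in the thread.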