🤔 Flows - CrewDoclingSource Markdown Knowledge Source Error, Ollama Local LLM Environment

Matthew · February 2, 2025, 1:02am

Hi!
I would greatly appreciate any insights on this issue.
I’m attempting to utilize a markdown.md file as a knowledge source within a crew that is currently in a flow.
The entire environment is local, and I’m working with Ollama.
The knowledge file is located in a folder named [knowledge], which resides in the same directory as main.py.
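
A minimal sketch for checking where a relative knowledge path could end up pointing. It assumes (this is not confirmed anywhere in this thread) that relative `file_paths` are looked up under a `knowledge` folder relative to the current working directory, which may differ between running a crew directly and starting it from a flow. The helper name `candidate_knowledge_paths` is hypothetical.

```python
from pathlib import Path


def candidate_knowledge_paths(file_name, script_dir, cwd=None):
    """Return (path, exists) pairs for places a knowledge file might be looked up.

    Assumption: relative names are resolved under a "knowledge" folder.
    """
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    candidates = [
        cwd / "knowledge" / file_name,               # relative to where the process was started
        Path(script_dir) / "knowledge" / file_name,  # next to the script, e.g. main.py
    ]
    return [(path, path.is_file()) for path in candidates]
```

Calling `candidate_knowledge_paths("knowledge.md", Path(__file__).parent)` from main.py and printing the pairs shows whether the two locations agree in the environment where the flow is launched.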

Knowledge integration works fine in a standalone, single-file .py crew, where I can also use a .md file successfully.

Here is the Error:

[ERROR]: Failed to upsert documents: APIStatusError.__init__() missing 2 required keyword-only arguments: 'response' and 'body'
[WARNING]: Failed to init knowledge: APIStatusError.__init__() missing 2 required keyword-only arguments: 'response' and 'body'
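
One possible reading of this message, offered as an assumption rather than a diagnosis: some underlying failure was caught and then re-created from its message alone, and because the exception's constructor also requires `response` and `body`, the resulting TypeError hides the original cause. The sketch below reproduces that pattern with a hypothetical stand-in class, `StandInStatusError`, which is not the real library class.

```python
class StandInStatusError(Exception):
    """Hypothetical stand-in whose constructor mirrors the shape named in the error."""

    def __init__(self, message, *, response, body):
        super().__init__(message)
        self.response = response
        self.body = body


def rebuild_with_message_only(exc_type, message):
    """Mimic code that re-creates an exception from its message alone."""
    try:
        return exc_type(message)
    except TypeError as secondary:
        # The original failure is lost; only this constructor error gets reported.
        return secondary


result = rebuild_with_message_only(StandInStatusError, "connection refused")
print(type(result).__name__, "-", result)
```

If that reading is right, the useful information is whatever failed before the TypeError, which is why the debug logging suggested later in this thread is worth enabling.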

Versions:
Crewai: 0.100.1
crewai-tools: 0.33.0
Python 3.12.8
Using Ollama only. Local LLMs I have tried: deepseek-r1:7b, llama3.2:3b, and others,
at different temperatures.

Here is my code:

# /crews/poem_crew/poemcrew.py
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource
knowledge_source = CrewDoclingSource(
    file_paths=["knowledge.md"]
)

And

# /crews/poem_crew/poemcrew.py
@CrewBase
class PoemCrew:
    """Poem Crew"""

    agents_config = "config/agents.yaml"
    tasks_config = "config/tasks.yaml"
    llm = LLM(model="ollama/deepseek-r1:7b", temperature=0.7)

    @agent
    def poem_writer(self) -> Agent:
        return Agent(
            config=self.agents_config["poem_writer"],
            # memory=True,
            llm=self.llm,
            # LLM=LLM(model="ollama/llama3-70b-8192", temperature=0.3),
        )

    @task
    def write_poem(self) -> Task:
        return Task(
            config=self.tasks_config["write_poem"],
        )

    @crew
    def crew(self) -> Crew:

        return Crew(
            agents=self.agents,
            tasks=self.tasks,
            process=Process.sequential,
            # memory=True,
            verbose=True,
            knowledge_sources=[knowledge_source],
            embedder={
                "provider": "ollama",
                "config": {
                    "model": "mxbai-embed-large"
                }
            }
        )
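
Before debugging the crew itself, it may help to confirm that the Ollama server is reachable and that the embedding model has been pulled. The sketch below uses Ollama's `GET /api/tags` endpoint, which lists locally installed models. The helper names are hypothetical, and the default port 11434 is taken from the snippets in this post.

```python
import json
import urllib.request


def installed_models(tags_payload):
    """Extract model names from the JSON returned by Ollama's GET /api/tags."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def has_model(tags_payload, wanted):
    """True if `wanted` is installed; a name without a tag is treated as ':latest'."""
    if ":" not in wanted:
        wanted = f"{wanted}:latest"
    return wanted in installed_models(tags_payload)


def fetch_tags(base_url="http://localhost:11434", timeout=5):
    """Query a running Ollama server; raises URLError if it is not reachable."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return json.load(response)
```

With the server running, `has_model(fetch_tags(), "mxbai-embed-large")` returning False would suggest running `ollama pull mxbai-embed-large` first.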

I have also tried the following:

# /crews/poem_crew/poemcrew.py
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource
knowledge_source = CrewDoclingSource(
    file_paths=["knowledge.md"],
    chunk_size=4000,     # Characters per chunk (default)
    chunk_overlap=200,  # Overlap between chunks (default)
)

and

# /crews/poem_crew/poemcrew.py
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage
knowledge_source = CrewDoclingSource(
    file_paths=["knowledge.md"],
    storage=KnowledgeStorage(
        embedder_config={
            "provider": "ollama",
            "model": "nomic-embed-text",
            "base_url": "http://localhost:11434"
        }
    )
)

The above gives a slightly different error:
Failed to upsert documents: APIStatusError.__init__() missing 2 required keyword-only arguments: 'response' and 'body'

I also tried updating:

pip install --upgrade crewai crewai-tools transformers tokenizers docling docling-core
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource
from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource

Does anyone have an example of a flow with a crew.py that uses Knowledge, CrewDoclingSource, and a markdown .md file with just Ollama?

** Is this a flow problem?

github.com/crewAIInc/crewAI

[BUG] Markdown Knowledge Sub-Module not working

opened 08:50PM - 14 Jan 25 UTC

closed 12:17PM - 20 Feb 25 UTC

jaredrobinsonchurch

bug no-issue-activity

### Description
I am trying to use markdown files as knowledge for the crew but… when I try to use CrewDoclingSource with markdown files the docling\backend\md_backend.py file line snippet_text = str(element.children[0].children[0].children) throws error IndexError: list index out of range An error occurred while running the crew: Command '['uv', 'run', 'run_crew']' returned non-zero exit status 1. If I convert a file to a word document it works fine, but id rather not have to do that for all incoming documents.

### Steps to Reproduce
Create any basic crew project and use a CrewDoclingSource knowledge for a markdown file, that's all I did

### Expected behavior
MD Files would be useable by the crew as a knowledge base

### Screenshots/Code snippets
content_source = CrewDoclingSource(
    file_paths=['0-introductory-overview.md', '3-priesthood-principles.md'],
)

@crew
def crew(self) -> Crew:
    """Creates the Search crew"""
    return Crew(
        agents=self.agents,
        tasks=self.tasks,
        process=Process.sequential,
        verbose=True,
        knowledge_sources=[
            content_source
        ]
    )

### Operating System
Windows 11

### Python Version
3.12

### crewAI Version
0.95.0

### crewAI Tools Version
0.25.8

### Virtual Environment
Venv

### Evidence
PS C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search> crewai run
Running the Crew
[2025-01-14 13:16:40][ERROR]: Error loading content: list index out of range
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\.venv\Scripts\run_crew.exe\__main__.py", line 5, in <module>
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\src\search\main.py", line 5, in <module>
    from search.crew import Search
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\src\search\crew.py", line 21, in <module>
    content_source = CrewDoclingSource(
                     ^^^^^^^^^^^^^^^^^^
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\.venv\Lib\site-packages\crewai\knowledge\source\crew_docling_source.py", line 33, in __init__
    super().__init__(*args, **kwargs)
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\.venv\Lib\site-packages\pydantic\main.py", line 214, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\.venv\Lib\site-packages\pydantic\_internal\_model_construction.py", line 126, in wrapped_model_post_init
    original_model_post_init(self, context)
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\.venv\Lib\site-packages\crewai\knowledge\source\crew_docling_source.py", line 66, in model_post_init
    self.content = self._load_content()
                   ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\.venv\Lib\site-packages\crewai\knowledge\source\crew_docling_source.py", line 80, in _load_content
    raise e
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\.venv\Lib\site-packages\crewai\knowledge\source\crew_docling_source.py", line 70, in _load_content
    return self._convert_source_to_docling_documents()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\.venv\Lib\site-packages\crewai\knowledge\source\crew_docling_source.py", line 92, in _convert_source_to_docling_documents
    return [result.document for result in conv_results_iter]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\.venv\Lib\site-packages\docling\document_converter.py", line 212, in convert_all
    for conv_res in conv_res_iter:
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\.venv\Lib\site-packages\docling\document_converter.py", line 247, in _convert
    for item in map(
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\.venv\Lib\site-packages\docling\document_converter.py", line 288, in _process_document
    conv_res = self._execute_pipeline(in_doc, raises_on_error=raises_on_error)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\.venv\Lib\site-packages\docling\document_converter.py", line 311, in _execute_pipeline
    conv_res = pipeline.execute(in_doc, raises_on_error=raises_on_error)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\.venv\Lib\site-packages\docling\pipeline\base_pipeline.py", line 52, in execute
    raise e
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\.venv\Lib\site-packages\docling\pipeline\base_pipeline.py", line 44, in execute
    conv_res = self._build_document(conv_res)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\.venv\Lib\site-packages\docling\pipeline\simple_pipeline.py", line 41, in _build_document
    conv_res.document = conv_res.input._backend.convert()
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\.venv\Lib\site-packages\docling\backend\md_backend.py", line 340, in convert
    self.iterate_elements(parsed_ast, 0, doc, None)
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\.venv\Lib\site-packages\docling\backend\md_backend.py", line 306, in iterate_elements
    self.iterate_elements(child, depth + 1, doc, parent_element)
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\.venv\Lib\site-packages\docling\backend\md_backend.py", line 306, in iterate_elements
    self.iterate_elements(child, depth + 1, doc, parent_element)
  File "C:\Users\jred\OneDrive - Church of Jesus Christ\Desktop\githubRepos\aei_agent_exploration\crewai\search\.venv\Lib\site-packages\docling\backend\md_backend.py", line 212, in iterate_elements
    snippet_text = str(element.children[0].children[0].children)
                       ~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
An error occurred while running the crew: Command '['uv', 'run', 'run_crew']' returned non-zero exit status 1.

### Possible Solution
None, I think the sub library just needs to be fixed

### Additional context
There appears to be some kind of disconnect between the open source documentation (https://docs.crewai.com/concepts/knowledge) and reality because the documentation says text is supported but running it the library says md is supported.


Deepak_Dhiman · February 11, 2025, 11:21am

Hello

This can also mean that embeddings are not being generated and saved in Chroma. Add the logging setup below before you run the crew code and watch the logs. You might see a 429 status code somewhere.

import logging
logging.basicConfig(level=logging.DEBUG)

In that case, try a different embedder, as shown below:

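import os

from crewai import Crew

# agent, task, and pdf_source are assumed to be defined earlier in the script
crew = Crew(
    agents=[agent],
    tasks=[task],
    memory=False,
    verbose=True,
    knowledge_sources=[pdf_source],
    embedder={
        "provider": "google",
        "config": {
            "model": "models/text-embedding-004",
            "api_key": os.getenv("GEMINI_API_KEY"),
        },
    },
)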

Hope it helps

Matthew · February 12, 2025, 12:14am

Hello Deepak_Dhiman,
Thank you for reaching out.
I am using debug logging, and I have added the following at the top of my crew.py:

print('🪰--- #1 | Debugging 2025.02.10  -------🪰')
import logging
import os
import litellm
logging.basicConfig(level=logging.DEBUG)
litellm.set_verbose = True
os.environ['LITELLM_LOG'] = 'DEBUG'  # Enable detailed logging for LiteLLM
os.environ['LITELLM_CACHE'] = 'True'   # Enable caching to improve API response performance

I am using Ollama.
What exactly is the 429 status code?

Deepak_Dhiman · February 12, 2025, 9:53am

Hi Matthew

A 429 status code means the API is rate limiting or overloaded, which is common with OpenAI and Azure OpenAI. Since you are running a different model through Ollama, I am not sure how much this applies to you. Nonetheless, if you are getting an error like APIStatusError.__init__() missing 2 required keyword-only arguments: 'response' and 'body', it indicates that the embeddings are not being created and saved in Chroma DB.

Why don't you try another provider such as Hugging Face, Gemini, or Groq? That way you will know whether the error is model-specific or something else. My suggestion is that you experiment with a smaller data file for now. All the best.

https://help.openai.com/en/articles/6891829-error-code-429-rate-limit-reached-for-requests
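
When a hosted provider does answer with 429, a common mitigation is to retry with exponential backoff. The sketch below shows the general pattern only; `RateLimited` is a hypothetical exception standing in for whatever a given client raises on 429, and the attempt count and delays are arbitrary choices.

```python
import time


class RateLimited(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""


def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on RateLimited, wait base_delay * 2**attempt and try again."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the rate-limit error
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the waiting behaviour can be exercised without real delays. This is unlikely to matter for a purely local Ollama setup, where rate limits normally do not apply.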

Matthew · February 15, 2025, 6:17pm

Fixed the issue.
I went back to a single Python file, crew.py,
and now run just crew.py (I kept agents.yaml and tasks.yaml in the config folder).
I added this to the bottom of crew.py:
news_input = get_news_input()

poem_crew_instance = PoemCrew() # Create an instance of PoemCrew
crew_object = poem_crew_instance.crew() # Call the crew() method to get a Crew object
result = crew_object.kickoff(inputs=news_input) # Now call kickoff()
print(result)

and everything works great with Ollama, Gemini, Groq, etc.!
Hope this helps someone who is stuck!
Never Give Up!

Thom_Web_Com · March 1, 2025, 7:43pm

I also have this issue when trying to use a PDF as a knowledge source.

I tried the Gemini embedder as suggested, but it does not work.

Matthew · April 10, 2025, 12:47am

I have it working now!
CrewDoclingSource with a markdown knowledge file
Ollama using the knowledge .md file
The knowledge markdown file is updated in Obsidian and lives outside of the crew's script directory

Here is the code for the community! Hope this helps someone.

from pathlib import Path

from crewai.knowledge.source.crew_docling_source import CrewDoclingSource
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage

# userprofile_dir is assumed to be defined earlier in the script (the user's profile folder)
knowledge_file = Path(userprofile_dir) / "OneDrive" / "Documents" / "Obsidian2025" / "matt" / "IN" / "knowledge" / "knowledge.md"
print(f"Looking for knowledge file at: {knowledge_file}")
print(f"File exists: {knowledge_file.exists()}")

# Initialize the knowledge storage with Ollama embeddings
knowledge_storage = KnowledgeStorage()
knowledge_storage._set_embedder_config({
    "provider": "ollama",
    "config": {
        "model": "mxbai-embed-large",
        "api_key": "NA",
        "base_url": "http://127.0.0.1:11434"
    }
})

knowledge_source = CrewDoclingSource(
    file_paths=[knowledge_file],
    storage=knowledge_storage
)
print(f"✅ Knowledge source initialized with file: {knowledge_file}")
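
To check the embedding side independently of CrewAI, the Ollama server can be called directly. The sketch below builds a request for Ollama's `/api/embeddings` endpoint using the same model and base URL as above; the helper names `build_embedding_request` and `embed` are hypothetical, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request


def build_embedding_request(text, model="mxbai-embed-large", base_url="http://127.0.0.1:11434"):
    """Prepare (but do not send) a POST to Ollama's /api/embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, **kwargs):
    """Send the request and return the embedding vector as a list of floats."""
    with urllib.request.urlopen(build_embedding_request(text, **kwargs), timeout=30) as response:
        return json.load(response)["embedding"]
```

If `embed("hello")` returns a non-empty list while the crew still fails, the problem is more likely in how the knowledge source or storage is wired up than in Ollama itself.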
