Hello,
My project receives text from a Streamlit page, and I need an agent to decide whether that incoming text is valid according to a set of business rules. I'm trying to load those rules as a knowledge source from a .txt file. My configuration follows the docs, and I have set up both an embedder and chunking.
Whenever I run the crew with this knowledge source, I get the same error no matter what I try.
The error is:
raise Exception(f"An error occurred while running the crew: {e}")
Exception: An error occurred while running the crew: Failed to create or get collection
  File "D:\anaconda\envs\ocr\Lib\site-packages\chromadb\rate_limit\simple_rate_limit\__init__.py", line 23, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda\envs\ocr\Lib\site-packages\chromadb\api\segment.py", line 202, in create_collection
    check_index_name(name)
  File "D:\anaconda\envs\ocr\Lib\site-packages\chromadb\api\segment.py", line 90, in check_index_name
    raise ValueError(msg)
ValueError: Expected collection name that (1) contains 3-63 characters, (2) starts and ends with an alphanumeric character, (3) otherwise contains only alphanumeric characters, underscores or hyphens (-), (4) contains no two consecutive periods (..) and (5) is not a valid IPv4 address,
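The five rules in that message can be checked locally before a name ever reaches ChromaDB. The following is only a minimal sketch that paraphrases the error text; it is not ChromaDB's own validation code. The function name `collection_name_problems` is hypothetical, and allowing single periods inside a name is an assumption based on rule (4) forbidding only consecutive ones.

```python
import ipaddress
import re

# Assumption: single periods are tolerated in the middle of a name,
# because rule (4) only forbids two consecutive periods.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*[A-Za-z0-9]")


def collection_name_problems(name: str) -> list[str]:
    """Return the rules from the error message that `name` breaks (hypothetical helper)."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("(1) length must be 3-63 characters")
    if not _NAME_PATTERN.fullmatch(name):
        problems.append("(2)/(3) invalid first, last, or inner character")
    if ".." in name:
        problems.append("(4) contains two consecutive periods")
    try:
        ipaddress.IPv4Address(name)
        problems.append("(5) is a valid IPv4 address")
    except ValueError:
        pass  # not an IPv4 address, so rule (5) is satisfied
    return problems


if __name__ == "__main__":
    for candidate in ["text_file_rules", "a" * 70, "192.168.0.1"]:
        print(repr(candidate[:20]), collection_name_problems(candidate))
```

An empty list means the name passes all five rules as worded in the message, so the hardcoded `text_file_rules` should be acceptable if it were the name actually used.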
As far as I can tell, the problem is that ChromaDB generates the collection name from my data. The name it generates on its own is more than 68 characters long. I tried hardcoding a name that satisfies the rules, but ChromaDB still generates its own invalid name.
My code is:
import os

from crewai import Agent, Crew, Process, Task, LLM
from crewai.project import CrewBase, agent, crew, task
from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource

# Knowledge source built from the business rules text file
text_source = TextFileKnowledgeSource(
    file_paths=["Businessrulex.txt"],
    chunk_size=200,     # Maximum size of each chunk (default: 4000)
    chunk_overlap=10,   # Overlap between chunks (default: 200)
    collection_name="text_file_rules",
)

llm = LLM(model="gpt-4o-mini", temperature=0)
@CrewBase
class Visionproj():
    """Visionproj crew"""

    agents_config = 'config/agents.yaml'
    tasks_config = 'config/tasks.yaml'

    @agent
    def claimagent(self) -> Agent:
        return Agent(
            config=self.agents_config['cagent'],
            knowledge_sources=[text_source],
            verbose=True,
            llm=llm,
            embedder={
                "provider": "openai",
                "config": {
                    "model": "text-embedding-3-small",
                    "api_key": os.environ['OPENAI_API_KEY'],
                }
            }
        )

    @task
    def claimtask(self) -> Task:
        return Task(
            config=self.tasks_config['ctask'],
            verbose=True,
        )

    @crew
    def crew(self) -> Crew:
        """Creates the Visionproj crew"""
        return Crew(
            agents=self.agents,  # Automatically created by the @agent decorator
            tasks=self.tasks,  # Automatically created by the @task decorator
            process=Process.sequential,
            verbose=True,
            knowledge_sources=[text_source],
            embedder={
                "provider": "openai",
                "config": {
                    "model": "text-embedding-3-small",
                    "api_key": os.environ['OPENAI_API_KEY'],
                }
            }
        )
So far I have tried hashlib.sha256(text.encode()), removing collection_name and letting it create its own name (it still created a very long, unusable name), client.get_or_create_collection(name="test"), hardcoding collection_name="", and more. I'm still stuck on this ChromaDB error, and I don't know what other issues I will face later. Updating the libraries did not help. My agents and tasks YAML files are configured correctly, by the way.
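A hashing approach like the one mentioned above could be sketched as follows. This is only an illustration under stated assumptions: `short_collection_name` and the `kb` prefix are hypothetical names, and the sketch does not change how the collection name is actually chosen at runtime. One detail worth noting is that a full SHA-256 hex digest is 64 characters, one more than the 63 allowed, so it has to be truncated.

```python
import hashlib


def short_collection_name(text: str, prefix: str = "kb", max_len: int = 63) -> str:
    """Derive a deterministic name of at most `max_len` characters (hypothetical helper).

    Assumes `prefix` is a short alphanumeric string, so the result starts
    with an alphanumeric character and ends with a hex digit.
    """
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()  # 64 hex characters
    return f"{prefix}-{digest}"[:max_len]


if __name__ == "__main__":
    name = short_collection_name("contents of Businessrulex.txt")
    print(name, len(name))
```

The same input text always yields the same name, so repeated runs would address the same collection rather than creating a new one each time.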
Any help would be greatly appreciated.
Thank you!