Hello,
I have an issue integrating our company embedder with RAG tools like DirectorySearchTool.
Background: I use an internal company embedding model with an OpenAI-compatible interface, but it has a batch size limitation that cannot be configured explicitly via the standard CrewAI 1.14.1 interface. So in principle the following code does its job, but it violates the batch size restriction:
from crewai_tools import DirectorySearchTool

dicTool = DirectorySearchTool(
    config={
        "embedding_model": {
            "provider": "openai",
            "config": {
                "model_name": "text-embedding-gte-multilingual-base",
                "api_key": "sk-xxx",
                "api_base": "https://companyURL/api/v1",
            },
        },
        "vectordb": {
            "provider": "chromadb",
            "config": {
                "collection_name": "project_code_index",
                "allow_reset": True,
            },
        },
    }
)
Problem: My idea is to write a wrapper around the embedder (splitting big batches into small batches) and to use a callback mechanism. But the following code (partially extracted from the RAG Tool documentation of CrewAI)
from crewai.rag.core.base_embeddings_callable import EmbeddingFunction
from crewai.rag.embeddings.providers.custom.types import CustomProviderSpec
from crewai_tools import DirectorySearchTool

class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input):
        # Your custom embedding logic
        return None  # embeddings

my_embedding_model: CustomProviderSpec = {
    "provider": "custom",
    "config": {
        "embedding_callable": MyEmbeddingFunction,
    },
}

dicTool = DirectorySearchTool(
    config={
        "embedding_model": my_embedding_model,
        "vectordb": {
            "provider": "chromadb",
            "config": {
                "collection_name": "project_code_index",
                "allow_reset": True,
            },
        },
    }
)
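For context, the batching logic I want the embedding callable to perform is roughly the following sketch. It is standalone Python and not CrewAI API: `embed_batch` and `MAX_BATCH_SIZE` are placeholders standing in for the real call to the company embedder and its (assumed) limit.

```python
MAX_BATCH_SIZE = 16  # assumed batch limit of the company embedder

def embed_batch(texts):
    # Placeholder for the real company embedder call; here it just
    # returns one dummy one-dimensional vector per input text.
    return [[float(len(t))] for t in texts]

class BatchingEmbeddingFunction:
    """Wraps an embedder that only accepts small batches:
    splits large inputs into chunks and concatenates the results."""

    def __init__(self, max_batch_size=MAX_BATCH_SIZE):
        self.max_batch_size = max_batch_size

    def __call__(self, input):
        embeddings = []
        for start in range(0, len(input), self.max_batch_size):
            chunk = input[start:start + self.max_batch_size]
            embeddings.extend(embed_batch(chunk))
        return embeddings
```

The goal is to register something like this class (with the real embedder call inside) as the `embedding_callable` in the custom provider config above.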
runs into an error:
pydantic_core._pydantic_core.ValidationError: 1 validation error for DirectorySearchTool
config
Value error, Invalid configuration for embedding provider 'custom':
- config.embedding_callable: Input should be a subclass of EmbeddingFunction [type=value_error, input_value={'embedding_model': {'pro..., 'allow_reset': True}}}, input_type=dict]
I searched a lot, but none of the custom callback examples I found work.
Thanks