Building a Vendor-Agnostic Chat Application: Can CrewAI Replace Proprietary Agent Logic (like in OpenAI ChatKit)?

How to Combine CrewAI Agents with Universal Chat History and Multi-LLM Support?

Hello CrewAI Community,

I’m developing a new internal chat application for my team, and I’m aiming for maximum flexibility and autonomy. I’m impressed by the collaborative power of CrewAI, but I’m facing a challenge in fitting it into a full, persistent chat solution.

I originally looked at frameworks like OpenAI’s ChatKit for Python because it handles the chat session history, user management, and multi-turn conversations well. However, it’s very limiting—it forces me to use only OpenAI’s LLMs and their proprietary storage solutions.

:light_bulb: My Goal: OmniChatKit (A Universal Chat Solution)

I want to build a solution (let’s call it OmniChatKit for now) that achieves three things:

  1. Universal LLM Support: Use ANY LLM (GPT-4, Claude, local models like Llama, etc.) based on cost or performance needs.

  2. Persistent History: Store all chat history (messages, user sessions) in standard databases like Redis or Postgres.

  3. Powerful Agent Collaboration: When a user asks a complex question, I want to deploy autonomous, specialized agents using CrewAI to perform the task.

:thinking: Seeking Advice on Integrating CrewAI

My main pain point is connecting the powerful task-based automation of CrewAI back into the continuous, history-based session of a chat application.

  1. Bridging the Gap: How do you recommend integrating a CrewAI execution run into an ongoing chat session? When a Crew finishes a task, what is the best practice to capture that final output and seamlessly save it back into the user’s continuous chat history?

  2. LLM Configuration: Since my application needs to configure many different LLM providers (for chat history summarization, quick replies, and for the CrewAI agents themselves), is it best to just use the LLM support built into CrewAI (LiteLLM) for all LLM calls in my application, or should I create a separate, unified layer?

  3. Scalability Concerns: For an application supporting many concurrent chat sessions, are there any best practices for running CrewAI (e.g., using a manager agent, specific Process type, or cloud deployment strategy) to ensure rapid response times?

Any advice from people who have successfully built a long-running, multi-user chat interface powered by CrewAI would be incredibly helpful!

Thank you,

Hans Tsang

Integrating CrewAI into a flexible, persistent chat platform like your envisioned OmniChatKit involves aligning three layers:

  • the session layer (conversation state),
  • the agent orchestration layer (CrewAI),
  • and the LLM abstraction layer (LiteLLM/unified adapter).

Below is a practical architecture and set of best practices based on CrewAI documentation, AMP platform insights, and recent developer experiences.


Bridging CrewAI with Chat Session History

CrewAI focuses on task-based execution — that is, a Crew completes a run, returns a structured output, and stops. To make it work in a continuous session, treat each Crew run as a sub-process in the wider conversation.

Recommended integration strategy:

  1. Session coupling via message queue or middleware
    Use a pub/sub or task queue (Redis Streams, Kafka, or Celery) to route a user’s task request to the CrewAI execution service. The Crew’s response (including intermediate messages or logs) can then be streamed back to the chat UI.
  2. Persist outputs as chat turns
    Once a Crew finishes, persist the final summarized message in your chat schema (e.g., a “role=assistant_system” entry). Include metadata such as Crew name, participants, and execution time for auditing or model selection later (see the sketch after this list).
  3. Context handoff with embeddings
    Before invoking a new Crew run, embed the last N turns into vector storage (e.g., PostgreSQL + pgvector) to give agents contextual awareness. CrewAI doesn’t maintain conversational memory across runs by default, so external persistence is key.
  4. Optionally use CrewAI AMP
    AMP (Agent Management Platform) can handle observation, tracing, and centralized state sync if you move beyond the open-source library.

Universal Multi-LLM Configuration

CrewAI natively supports a wide range of LLM providers via LiteLLM, the abstraction layer powering all of its model connections. You can rely on it for most operations:

  • Use LiteLLM inside CrewAI for agent execution — it already supports OpenAI, Anthropic, Google, Cohere, Ollama, Hugging Face, Mistral, AWS Bedrock, and others.
  • Build a unified adapter for your overall app (chat summarization, classification, etc.) using the same LiteLLM syntax (llm='provider/model-name').
    This minimizes friction between your chat components and CrewAI runs.

Example snippet:

from crewai import Agent

# The llm string follows the LiteLLM "provider/model-name" convention.
researcher = Agent(
    role="Research Specialist",
    goal="Synthesize knowledge from multiple LLM sources",
    backstory="Expert on cross-model aggregation.",
    llm="anthropic/claude-3-opus-20240229",
)

If you need per-chat-layer specialization (e.g., using GPT-4 for summaries, Llama for fast replies), centralize model routing in your adapter and call CrewAI’s LiteLLM backend through it.
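
A minimal sketch of that kind of adapter, assuming a hand-rolled routing table and LiteLLM's completion() call for the non-CrewAI parts of the app; the route names and models below are illustrative, not prescriptive.

from litellm import completion

# Hypothetical routing table: one place to decide which model serves which job.
MODEL_ROUTES = {
    "summarize": "openai/gpt-4o",
    "quick_reply": "ollama/llama3",
    "crew_agent": "anthropic/claude-3-opus-20240229",
}

def chat_call(route: str, messages: list[dict]) -> str:
    # Direct LLM call for chat-layer jobs (summaries, quick replies).
    response = completion(model=MODEL_ROUTES[route], messages=messages)
    return response.choices[0].message.content

def crew_model(route: str = "crew_agent") -> str:
    # Hand the same model string to CrewAI agents via their llm parameter.
    return MODEL_ROUTES[route]

Because the chat layer and the Crew agents read from the same routing table, swapping a provider becomes a one-line change.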


Scalability and Deployment Practices

Building a multi-user, concurrent CrewAI environment requires isolating long-running or high-load crews:

  1. Use async workers for Crew execution
    Deploy each Crew run as an async job (a sketch follows this list). This allows your chat app to remain responsive even during long multi-agent workflows.
  2. Manager agent pattern
    Introduce a “manager” crew that evaluates user tasks and spins up specialized crews dynamically (e.g., using the Process.sequential or Process.hierarchical configurations).
  3. Horizontal scaling
    Host your CrewAI instances as microservices behind a load balancer. CrewAI AMP supports serverless scaling when deployed with AWS Bedrock or container orchestration.
  4. Persistent storage for memory and telemetry
    Use Redis for transient worker state and Postgres for histories. CrewAI’s tracing tools help monitor each agent’s lifecycle end-to-end.
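
For point 1, here is a minimal sketch of offloading a blocking Crew run so the chat endpoint stays responsive. It reuses the hypothetical run_crew_for_session helper from the earlier sketch and plain asyncio.to_thread; a Celery or RQ worker would serve the same purpose at larger scale.

import asyncio

async def handle_complex_message(session_id: str, user_request: str) -> str:
    # crew.kickoff() is blocking, so push it onto a worker thread
    # (or hand it to a Celery/RQ task in a real deployment).
    return await asyncio.to_thread(run_crew_for_session, session_id, user_request)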

Example Architecture Flow

  1. Frontend: User sends a message to OmniChatKit API.
  2. Router: Determines if the message requires CrewAI (e.g., multi-step reasoning); see the combined sketch after this list.
  3. Crew Orchestrator: Triggers a CrewAI run (Python or via AMP API).
  4. LLM Adapter: Uses unified LiteLLM config for chosen model.
  5. Result Handler: Stores Crew output and injects it into session history.
  6. Summarizer Agent (optional): Periodically summarizes chat logs for context compression.
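
To make the flow concrete, here is a rough sketch of steps 2 through 5 glued together, reusing the hypothetical helpers from the earlier sketches (save_message, chat_call, handle_complex_message). The needs_crew heuristic is deliberately naive; a real router might use a classifier model instead.

async def handle_user_message(session_id: str, text: str) -> str:
    save_message(session_id, role="user", content=text, metadata={})

    # Step 2: route the message. Long or research-style requests go to CrewAI.
    needs_crew = len(text.split()) > 30 or "research" in text.lower()

    if needs_crew:
        # Steps 3-5: Crew orchestrator plus result handler (persists its own output).
        return await handle_complex_message(session_id, text)

    # Simple turns go straight through the LiteLLM adapter.
    reply = chat_call("quick_reply", [{"role": "user", "content": text}])
    save_message(session_id, role="assistant", content=reply, metadata={"route": "quick_reply"})
    return reply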

In short, CrewAI can drive your agent layer effectively if you treat chat history as an external persistence concern and delegate model routing to a LiteLLM-based abstraction shared between chat and CrewAI. For production-level load, AMP or AWS Bedrock integration is recommended for scaling and monitoring.

This setup makes OmniChatKit architecturally modular, vendor-agnostic, and fully capable of blending persistent chat memory, multi-LLM interchangeability, and autonomous CrewAI collaboration.

Hope this helps!