Crew AI Performance Issues Compared to Direct GPT 4o-mini for Sales Email Generation. What we are doing wrong?

Nick · September 20, 2024, 3:56pm

Hey Crew AI Community,

I’m Nick, co-founder of SalesFlow.Team. We’re working on automating sales emails for Salesforce users, but recently, we’ve run into a bit of a head-scratcher with Crew AI. Our goal is to craft personalized, context-driven emails that introduce our service to new leads. Specifically, we want to promote SalesFlow.Team to Salesforce users, tailoring the messaging to each lead’s industry.

We decided to give Crew AI a shot, thinking it would improve the quality of our generated emails comparing to direct proping to GPT, but… well, the results have been disappointing so far - compared to using GPT 4o-mini directly. I think we are doing something wrong - please help.

What We’re Trying to Achieve:

We need Crew AI to help generate personalized emails promoting SalesFlow.Team to new leads in Salesforce. The emails should highlight the ways our service boosts productivity by automating follow-ups, intro emails, quotes, etc., and focus on increasing engagement (especially for cold outreach).

The Issue:

We used the same prompts for both Crew AI and GPT 4o-mini. I literally copied the same prompts from our two agents (CRM Investigator and Email Crafter/Sender), along with the corresponding tasks (retrieve activity from CRM and craft/send email). The target in this case is a cold outreach to Doctors Without Borders (non-profit - so they care about fundrasing)

Here’s the result we got from Crew AI:

Crew AI Email Result:
“Hi Bertha,
I hope this message finds you well! My name is Nick, and I’m a co-founder at SalesFlow.Team. We specialize in helping Salesforce users like yourself enhance productivity through our AI-powered service.
Our platform automates the generation and sending of intro emails, follow-ups, quotes, and sales emails, all tailored based on your full communication history within Salesforce. This means you can focus on what truly matters—building relationships and closing deals—while we handle the repetitive tasks.
We’ve seen substantial increases in response rates and demo flows—up to 50% compared to traditional approaches—thanks to our integration with Salesforce. Plus, we’re currently offering special pricing for our first pilot customers, ensuring you receive significant value from our service.
I would love to schedule a quick 15-minute demo to showcase how SalesFlow.Team can specifically benefit your organization. If you’re not the right contact for this, I’d appreciate it if you could forward this email to the appropriate person.
Looking forward to your response!
Best regards,
…”

While it’s not terrible, it’s just kind of flat and generic. It feels like it’s missing the personal touch that we’re looking for in these emails. All this WAS in both promts to GPT and Crew AI agents and tasks.

Now compare that to the result from GPT 4o-mini directly:

Direct GPT 4o-mini Email Result:

Subject: Elevate Your Outreach with SalesFlow.Team

Hi Nick,
I hope this message finds you well!

I’m reaching out as a co-founder of SalesFlow.Team, your AI co-pilot for sales communication. We’ve been following the incredible work that Doctors Without Borders does in providing humanitarian aid and health support worldwide, and I believe we can help enhance your outreach efforts.

Our platform seamlessly integrates with Salesforce, automating the generation and sending of intro emails, follow-ups, quotes, and sales communications. By utilizing your full communication history within Salesforce, we ensure that every interaction is personalized and context-driven.

Here’s how SalesFlow.Team can boost your productivity:

** Enhanced Fundraising: Our tailored messaging can significantly improve your outreach, helping you connect more effectively with potential donors.*
** Automated Communication: We leverage AI to draft relevant emails based on your existing deals and engagements, ensuring timely follow-ups without overwhelming your team.*
** Proven Results: Early pilot projects have shown up to a 50% increase in response rates compared to traditional methods, allowing teams like yours to maximize their impact.*

As a special offer for our initial pilot customers, we’re providing exclusive pricing to deliver substantial value. I’d love to schedule a quick 15-minute demo to showcase how our features can support your mission in action.

If you think someone else in your team would be more appropriate for this conversation, please feel free to forward this email.

Looking forward to the opportunity to collaborate!
Best regards,
…

This email feels a lot more polished, focused in indusrty process (target id Non-profit) personable, and aligned with the tone we’re aiming for. Aagin and again - prompts are identical!

What We’ve Tried:

We used the exact same prompts for both Crew AI and GPT 4o-mini directly.
Ensured we were using the same GPT 4o-mini model in both cases.
Copied the structure and content requirements, focusing on the same objective (personalization, industry specifics, highlighting Salesforce integration, and cold outreach).
Tried several times - killed at least 24 hours - put

What We’re Looking For:

We’re hoping to get some insight into why Crew AI’s using GPT 4o-mini is delivering weaker results compared to using GPT 4o-mini directly. Is there something we’re missing, or are there optimizations we could apply to make the output more tailored and human-like?

We’d really appreciate any advice or suggestions on improving the output, whether it’s tweaking the prompts, adjusting settings, or any other potential solutions. Thanks in advance for the help! Will share promts privately to anybody interested. Questions are welocome !!!

Dabnis · September 20, 2024, 4:46pm

My first thought is this:
The core of CrewAI constructs it’s own prompts/context before passing off to GPTxxx. This is the reason why you can not do a direct comparison of prompts between GPT & CrewAI.

Solution
Get an experienced CrewAI accredited dev to re-do the prompts for you.

I’m relatively new to CrewAI, but I’m sure that others will add their thoughts below.

@matt

Best of luck .

Nick · September 20, 2024, 4:52pm

We see in Langtrace that Crew AI sends the content to LLM (4o mini) with request to generate a industry specific mail. And result is poor. Is there a guidance somewhere for how to set goals backstories, task descriptions in a optimal way ?

And we do copmare because if we do not get better result than LLM - there is juts no sence for use engaing Crew AI.

Dabnis · September 20, 2024, 5:04pm

Hi Nick,
Apologies for the confusion, I was referring to the comparison of the actual prompts, not so much the output.

Prompt engineering for CrewAI: Apart from the standard consideration of promt engineering and doing this youtube search all I can suggest is the docs

jklre · September 20, 2024, 6:48pm

I get wildly different results depending on prompts and models. You need to spend time and find out which is the best prompt and the best model for you. Have you tried out e11 and fabric for your prompting?

ZnK · October 14, 2024, 2:42am

You can try “Forcing Tool as Result”. I think it might work for this case: https://docs.crewai.com/how-to/force-tool-output-as-result

Topic		Replies	Views
CrewAI Chatbot Performance - Agent Execution Time General	8	281	May 2, 2025
How does Crew AI compare to other agent frameworks you've tried so far? Crews agent	2	194	April 3, 2025
CrewAI Content generation vs Business workflows CrewAI Community Support agent , crewai , feature	0	77	October 12, 2024
Job Opportunity General agent , crewai	1	155	September 24, 2024
GSoC 2025 Proposal Help – Boosting Gemini in CrewAI! General	2	37	April 3, 2025

Crew AI Performance Issues Compared to Direct GPT 4o-mini for Sales Email Generation. What we are doing wrong?

What We’re Trying to Achieve:

The Issue:

Now compare that to the result from GPT 4o-mini directly:

Related topics