
60% of chats the AI handed off to humans went unanswered — Bella Capri didn't know

Bella Capri Restaurants
Duration: 3 weeks of implementation; pipeline runs continuously
Tags: case-study, restaurants, pizzeria, ai-customer-service, whatsapp
Embed Station

Key Results

The AI was escalating chats to humans. 60% of them disappeared. No one in operations knew.

  • 60%: escalated chats with no human response
  • 1,000+: conversations analyzed in 60 days
  • 55: stores covered

Introduction

Bella Capri thought it had 95% human coverage on customer service. It was 40%. 60% of chats the AI escalated to a human simply disappeared — no one replied, and the sale was lost. No one in operations knew.

Bella Capri is a premium pizza chain with 55 stores in Southern and Southeastern Brazil and an average ticket above R$110 (≈ $20 USD, converted at R$5.50 per US dollar, the exchange rate at the time of the case). More than 80% of its customer service was already automated by AI via WhatsApp and Instagram. They brought in Embed Station to answer one specific question: is the AI converting, or is it losing sales along the way? After 3 weeks of implementation, the pipeline ran for 60+ days over 1,000+ conversations and revealed where the AI was losing sales, and where the human team wasn't covering what the AI escalated. The team acted on each finding. Today the AI handles customer service entirely on its own.

"It's an analysis we always wanted to run but never had the human resources to do. Now it's about applying the solutions to the problems we identified." — Operations Director, Bella Capri


The Challenge

Before the analysis, here's how it went: the AI handled customer service, classified some conversations as "this one a human will resolve better," and pushed them into the team's queue. The team picked them up — when they could. When they couldn't, the customer waited. There was no alarm, no visible SLA, no metric on what went unanswered. Each operator looked at their own panel.

The operations team knew there were gaps. They didn't know where, or how big. The evaluations that did happen were manual — someone opened the history of 10 conversations a day, read each one, took notes in a document. 30 minutes daily for a tiny diagnostic of an operation that processes thousands of messages across 55 stores.

And in parallel, no one knew if the AI was nailing intent, offering the right product, closing the sale, or just pushing customers into the queue — because no one was measuring it in a structured way.

  • Real human coverage (escalated chats): 40% before, 95% after
  • Manual conversation evaluation: 30 min/day covering 10 chats before; 0 min, with every chat evaluated, after
  • Visibility into the gaps: anecdotal before; audit by store, hour, and dimension after

Before: Blind operation. The AI handled, escalated, and no one saw what happened next. 30 minutes of daily manual evaluation covered 10 conversations — across 55 stores processing thousands of messages.

After: Every conversation evaluated across 6 dimensions. Every store, every hour, every operator with their metric. No more "guessing" — now it's "seeing."

Dashboard pointing to conversations where the AI escalated to a human and no one responded

Our Approach

The system reads every conversation that comes into Bella Capri's customer service channels (WhatsApp and Instagram). It deterministically records what's fact (who sent each message, when, how many messages, response time) and uses a language model to evaluate what's subjective: was the customer actually served? Did the AI nail the intent? Did the sale go through? Each conversation becomes a record scored across 6 dimensions, and everything lands in a dashboard that shows, by store and by hour, exactly where the gaps are.

Behind that, Embed Station built a custom pipeline in Node.js, with a database tailored for this case (it's not an off-the-shelf SaaS). The design is vendor-agnostic: it works with any chat platform that has exportable history.
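As a rough illustration of that split between facts and judgment, here is a minimal TypeScript sketch. It is not Embed Station's actual code: every type, field, and function name below is an assumption, since the case does not publish its schema. The language-model call is represented by an injected `Evaluator` function and kept synchronous for brevity (a real model call would be asynchronous). Only dimension names mentioned in this case study appear, with placeholder values.

```typescript
// Hypothetical types and names: the case study does not publish its schema.
type Sender = "customer" | "ai" | "human";

interface Message {
  sender: Sender;
  sentAt: number; // Unix time in milliseconds
}

interface Conversation {
  id: string;
  storeId: string;
  escalatedAt: number | null; // when the AI handed off to a human, if it did
  messages: Message[];
}

// Facts are computed deterministically, with no model involved.
interface Facts {
  messageCount: number;
  escalated: boolean;
  humanReplied: boolean;
  secondsToHumanReply: number | null;
}

// Subjective scores keyed by dimension name.
type Scores = { [dimension: string]: number };
type Evaluator = (c: Conversation) => Scores;

function extractFacts(c: Conversation): Facts {
  const escalatedAt = c.escalatedAt;
  let firstHumanReply: number | null = null;
  if (escalatedAt !== null) {
    // Find the earliest human message sent at or after the handoff.
    for (const m of c.messages) {
      const isReply = m.sender === "human" && m.sentAt >= escalatedAt;
      if (isReply && (firstHumanReply === null || m.sentAt < firstHumanReply)) {
        firstHumanReply = m.sentAt;
      }
    }
  }
  return {
    messageCount: c.messages.length,
    escalated: escalatedAt !== null,
    humanReplied: firstHumanReply !== null,
    secondsToHumanReply:
      escalatedAt !== null && firstHumanReply !== null
        ? (firstHumanReply - escalatedAt) / 1000
        : null,
  };
}

// One record per conversation: deterministic facts plus model scores.
function analyze(c: Conversation, evaluate: Evaluator) {
  return { id: c.id, storeId: c.storeId, facts: extractFacts(c), scores: evaluate(c) };
}

// Stub standing in for the language-model call; the values are placeholders.
const stubEvaluator: Evaluator = () => ({
  serviceQuality: 0.5,
  intentCapture: 0.5,
  conversion: 0.5,
});

const sample: Conversation = {
  id: "c1",
  storeId: "store-01",
  escalatedAt: 1000,
  messages: [
    { sender: "customer", sentAt: 0 },
    { sender: "ai", sentAt: 500 },
    { sender: "human", sentAt: 61000 },
  ],
};

// record.facts.secondsToHumanReply is 60: the human replied 60 s after the handoff.
const record = analyze(sample, stubEvaluator);
```

Keeping the evaluator as a parameter means the deterministic part can be tested without calling any model, which is one plausible reason to separate the two.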

Tech Stack

You don't need to understand the stack — Embed Station handles that. For transparency, here's what runs underneath:

  • Pipeline in [Node.js](https://nodejs.org)
  • Language model (GPT-4o)
  • Custom database
  • Chat platform integration
  • Visualization dashboard
01. Map the operation

Understand how the AI escalated conversations, when, why, and what happened next (or didn't).

02. Build the analysis pipeline

A script that consumes every conversation, mixes deterministic analysis (numbers) with AI evaluation (quality, intent, conversion), and produces a score across 6 dimensions.

03. Dashboard to see where it hurts

Everything in one place: by store, by hour, by problem type. The operation gained eyes where it was blind before.

Diagram of the pipeline that analyzes each conversation and produces a score across 6 dimensions

The Solution

Before, the operations team thought it had visibility — it had visibility into what each operator resolved, not into what disappeared. Now every conversation that comes in is evaluated, classified, and stored with proof.

Picks up every conversation and measures across 6 dimensions

Service quality, intent capture, conversion, response time, and more. Where the AI is weak, it shows up by store.

Points to where humans aren't responding

Of the chats escalated to the team, what percentage was answered within X minutes? Which store, which hour, which operator dropped the ball? No guessing: the data answers.

Becomes input to tune the AI, not a report that sits in a drawer

Every catalogued failure becomes input to refine routing rules, train the team, and fine-tune the model. The analysis became a continuous improvement cycle.
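The coverage question above (what share of escalated chats got a human reply within X minutes, by store and by hour) can be sketched in a few lines of TypeScript. This is an illustrative assumption, not the production query: the `EscalatedChat` shape, the function name, and the use of UTC hours are all hypothetical, and a real deployment would likely bucket by each store's local time.

```typescript
// Hypothetical record shape; field names are illustrative only.
interface EscalatedChat {
  storeId: string;
  escalatedAt: number; // Unix time in milliseconds
  firstHumanReplyAt: number | null; // null means nobody answered
}

interface CoverageBucket {
  storeId: string;
  hour: number; // 0-23, in UTC for this sketch
  escalated: number;
  answeredInTime: number;
  coveragePct: number;
}

// Groups escalated chats by store and hour, then computes the share
// answered within maxMinutes. Worst-covered buckets come first.
function coverageByStoreAndHour(
  chats: EscalatedChat[],
  maxMinutes: number
): CoverageBucket[] {
  const byKey: { [key: string]: CoverageBucket } = {};
  const buckets: CoverageBucket[] = [];
  for (const chat of chats) {
    const hour = new Date(chat.escalatedAt).getUTCHours();
    const key = chat.storeId + "|" + hour;
    let bucket = byKey[key];
    if (bucket === undefined) {
      bucket = {
        storeId: chat.storeId,
        hour: hour,
        escalated: 0,
        answeredInTime: 0,
        coveragePct: 0,
      };
      byKey[key] = bucket;
      buckets.push(bucket);
    }
    bucket.escalated += 1;
    const reply = chat.firstHumanReplyAt;
    if (reply !== null && reply - chat.escalatedAt <= maxMinutes * 60000) {
      bucket.answeredInTime += 1;
    }
  }
  for (const b of buckets) {
    b.coveragePct = (100 * b.answeredInTime) / b.escalated;
  }
  return buckets.sort((a, b) => a.coveragePct - b.coveragePct);
}
```

Sorting ascending puts the weakest store-hour combinations at the top, which matches how the dashboard is described: it points to where it hurts first.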

AI evaluation dashboard — score by store, by hour, and by dimension

Who This Works For

This model fits you if:

  • You operate a chain (multi-store, franchise, or single unit with high volume) with customer service via WhatsApp, Instagram, or any other chat channel.
  • You already have AI or a chatbot in customer service, in production for at least 3 months — long enough to have data, not still "tuning."
  • You're an owner, partner, operations director, or commercial director — and you need to decide with data, not gut feeling, where to invest your team's time.

If that sounds like your operation, this path is your starting point.


Results and Impact

  • AI autonomy: 76% before, ~100% today
  • Human response rate (escalated chats): 40% before, 95% after
  • Manual evaluation time: 30 min/day before, 0 after

The 60% revelation triggered immediate action: training the team, adjusting the rules for when the AI escalates, changing the operator notification protocol (so no one is left out of the loop), and improving internal routing across stores. Within a few weeks, human coverage went from 40% to 95%. In a premium chain with an average ticket above R$110 (≈ $20 USD), every chat that used to evaporate became recovered revenue, or at least a quality response that kept the customer close. From there, each analysis cycle fed fine-tuning of the AI, and autonomy kept growing until the AI could handle customer service entirely, as it does today. This same pattern of "making the invisible visible" also shows up in our notification automation case for an accounting firm: in both, the real gain came from measuring what no one was measuring.

"We discovered our peak sales hour had the worst human coverage — without this analysis, we would never have identified that." — Commercial Director, Bella Capri

Human coverage on AI-escalated chats

Before and after comparison: human coverage jumped from 40% to 95%

Conclusion

60% of escalated chats disappeared. Today, the AI handles customer service entirely on its own. 30 minutes of daily manual evaluation turned into zero. And every operations or commercial decision is based on data, not guesswork.

But here's what's worth taking with you: AI in customer service isn't "set it and forget it." Most AI implementations don't fail because of the model — they fail because no one measures what's happening on the other side of the conversation. Bella Capri only discovered the size of the gap when they started looking. If you've deployed AI and don't have a dashboard showing, by store and by hour, how many chats were escalated to humans and how many got a response — you have exactly the same gap. And it's costing you sales right now. This same principle shows up in our Bella AI case at Bruno's salon, where the AI books appointments straight into the salon's calendar via WhatsApp.

Every chat your AI escalates and no one answers is a sale walking out the door. With an average ticket above R$110 (≈ $20 USD) like Bella Capri's, just 5 abandoned chats per day add up to R$16,500 per month (5 × R$110 × 30 days, ≈ $3,000 USD) in vanishing revenue. This number exists in your operation right now. You're just not measuring it.
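To run the same estimate with your own numbers, the arithmetic fits in one small function. This is a sketch under two stated assumptions: every abandoned chat counts as one lost order at the average ticket, and a month has 30 days. The function name is hypothetical; the R$5.50-per-dollar rate is the one used in this case.

```typescript
// Hypothetical helper: monthly revenue at risk from unanswered escalated chats.
// Assumes each abandoned chat is one lost order at the average ticket,
// and a 30-day month unless another value is passed.
function monthlyRevenueAtRisk(
  abandonedPerDay: number,
  avgTicket: number,
  daysPerMonth: number = 30
): number {
  return abandonedPerDay * avgTicket * daysPerMonth;
}

// Bella Capri's illustration: 5 chats/day at R$110.
const atRiskBrl = monthlyRevenueAtRisk(5, 110); // 16500
const atRiskUsd = atRiskBrl / 5.5; // 3000, at the case's exchange rate
```

Since not every escalated chat would have ended in a sale, treat the result as an upper bound rather than a forecast.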

Next Steps for Bella Capri

  • Expand analysis to new channels as they come online.
  • Integrate real-time analysis for automatic detection and correction of service gaps.

Next Steps for You

30 minutes to see the size of the gap. A few weeks to have the dashboard running. Informed decisions from day one.


Can you tell me right now how many chats your AI escalated to a human in the last 24 hours — and how many got a response?

If you had to open 3 systems to even try, this case study is about you.

A free 30-minute audit. You walk away with (1) where the AI is most likely losing sales in your funnel, (2) what signals to look for in your customer service history, and (3) a concrete next step to investigate.

If in 30 minutes we don't show you failures or breakdowns caused by your AI or by your human team, we can stop right there.

This isn't a pitch — it's a diagnostic. You keep the map either way. The analysis is vendor-agnostic: works with any chat platform or chatbot that has exportable history. No swapping anything you already use.

📩 contact@embedstation.com 🌐 embedstation.com
