Degraded performance

Incident Report for Dixa

Postmortem

Summary
On March 2, 2026, Dixa experienced platform-wide degraded performance lasting approximately 3 hours and 45 minutes. Customers experienced slow or failed conversation loading, as well as timeouts on email sending, conversation transfers, assignments, and flow processing. No data was lost, and there were no security issues at any point.

Impact

  • Availability: Platform-wide slowness and partial inaccessibility for ~3 hours 45 minutes.
  • Affected functionality: Inbound email processing, conversation loading, conversation transfers, conversation assignments, and flow processing — all experienced significant slowness and intermittent failures.
  • Side effect: Some inbound emails resulted in empty, queue-less conversations being created. These are safe to close or merge with the correctly processed follow-up conversation.
  • Data integrity: All emails were fully processed after the fix. No data was lost and no security issues occurred at any point.

Root Cause
The incident was caused by a significant and atypical spike in inbound conversations that fell well outside expected operational parameters. This unexpected volume put pressure on a central database component, which began throttling requests. The throttling then cascaded to other parts of the system, resulting in platform-wide slowness and temporary inaccessibility.

Timeline (CET)
Mar 1, 02:14 Significant and atypical spike in inbound conversations begins
Mar 2, ~17:45 Database throttling begins; connection pools saturate
Mar 2, 19:08 Incident declared; investigation begins
Mar 2, 20:06 Mitigation measures applied; investigation ongoing
Mar 2, 20:18 Root cause identified
Mar 2, 21:09 Fix deployed; error rates drop
Mar 2, 21:17 Resolution confirmed
Mar 2, 21:21 Incident resolved

Resolution
We identified and addressed the source of the abnormal conversation volume; error rates dropped immediately and the platform recovered.

What We're Doing to Prevent Recurrence
We have identified several systemic improvements and are actively working on them:

  1. Detect and suppress atypical inbound volume patterns — Introduce early filtering to prevent abnormal spikes from reaching core platform components.
  2. Improve retry and backoff behaviour — Reduce the risk of compounding load during high-traffic failure scenarios.
  3. Reduce inter-service dependencies — Limit the blast radius of a single overloaded component affecting other parts of the platform.
  4. Improve database resilience — Add circuit breakers and timeouts to prevent database pressure from cascading across services.
  5. Atomic conversation creation — Ensure conversations are never persisted without their initial message, eliminating orphaned empty conversations as a failure side effect.
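To illustrate item 2, the sketch below shows retries with capped exponential backoff and full jitter, the standard way to keep many failing clients from retrying in synchronized waves and compounding load on an already-throttled component. This is a generic, hypothetical example, not Dixa's actual implementation; all names and parameters are illustrative.

```python
import random
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Call `operation`, retrying on failure with capped exponential
    backoff and full jitter.

    Full jitter (sleeping a random duration in [0, backoff]) spreads
    retries out in time, so clients that fail together do not all
    hammer the recovering service at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Exponential backoff capped at max_delay, randomized over
            # the full interval to desynchronize retrying clients.
            backoff = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, backoff))
```

The cap matters as much as the jitter: without it, late retries can sleep so long that callers time out anyway, and without jitter, synchronized retry waves recreate the original spike.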

Closing Note
We sincerely apologize for the disruption this caused. We take platform reliability seriously and are committed to the systemic improvements outlined above. If you have any questions, please reach out to friends@dixa.com.

Posted Mar 04, 2026 - 16:26 CET

Resolved

The incident has been resolved.

During the incident, some inbound emails may not have been processed correctly. You may therefore see some blank, queue-less conversations that didn't properly go through a flow, followed by an inbound email that did go through your email flow and does contain the correct message.

You can safely close these conversations or merge them into the correctly processed inbound email.

We sincerely apologize for the inconvenience and encourage you to contact friends@dixa.com if you have further questions.

A Post Mortem will be posted within 5 business days.
Posted Mar 02, 2026 - 21:21 CET

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Mar 02, 2026 - 21:09 CET

Identified

We've identified the cause of the issues and are taking measures to resolve the problems you're experiencing as soon as possible.

Our sincere apologies for the inconvenience caused.
Posted Mar 02, 2026 - 20:18 CET

Update

We've taken a few measures to improve stability while we investigate the cause of the issues. You may notice slight improvement, but we haven't pinned down the root cause yet.
Posted Mar 02, 2026 - 20:06 CET

Update

We are continuing to investigate this issue.
Posted Mar 02, 2026 - 19:45 CET

Update

We are continuing to investigate this issue.
Posted Mar 02, 2026 - 19:25 CET

Update

We've received reports of slowness and problems with responsiveness across Dixa's agent interface. We're investigating.
Posted Mar 02, 2026 - 19:16 CET

Investigating

We have received reports of instability in the platform. We are investigating the issue. Updates will follow.
Posted Mar 02, 2026 - 19:08 CET
This incident affected: Agent Interface.