Loading conversations not working

Incident Report for Dixa

Postmortem

Incident Date: July 1, 2025
Incident Duration: 10:49–11:15 AM UTC
Severity: Major
Status: Resolved

Summary

On the morning of July 1st, an infrastructure change aimed at improving the scalability of one of our core backend components caused a temporary disruption to key services, including the loading of conversations and analytics for several customers.

Impact

Between 10:49 AM and 11:01 AM UTC, users experienced degraded service, including:

  • Inability to load Conversations and Analytics
  • Increased API latency and error rates
  • Service instability across core business functions

The issue was identified quickly, and a rollback restored full functionality by 11:01 AM UTC, 12 minutes after the change was deployed.

Timeline of Events

Time (UTC)    Event
10:49 AM      A change was deployed to one of our core backend components.
10:53 AM      Analytics service loading issues were reported.
10:54 AM      Multiple customers reported that Conversations were not loading.
10:56 AM      Engineers decided to roll back the change to the previous configuration.
11:01 AM      Rollback completed. Services began functioning normally.
11:15 AM      Incident was marked as resolved. API performance confirmed stable.

Immediate Fix

The engineering team rolled back to the previous provisioned capacity mode, restoring the prior configuration for both the affected table and its index. This resolved the request throttling that followed the change within minutes.

Long-Term Preventive Actions

To prevent similar incidents in the future, we are improving how configuration changes are tested and deployed:

  • Improved Change Validation: We’re enhancing our testing processes to better simulate real-world usage before changes are rolled out to production.
  • Stronger Safeguards on Infrastructure Changes: We’re updating how critical infrastructure components are configured to ensure changes do not unintentionally impact system performance.

These improvements are part of our ongoing commitment to delivering a stable and reliable experience.

Closing Notes

We sincerely apologize for the disruption this caused. We are already acting on the lessons from this incident to further strengthen our systems and processes. Ensuring the reliability and scalability of our platform remains a top priority.

If you have any questions or would like more details, please don’t hesitate to reach out to our support team.

Posted Jul 05, 2025 - 21:09 CEST

Resolved

This incident has been resolved.
Posted Jul 01, 2025 - 11:15 CEST

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Jul 01, 2025 - 11:04 CEST

Investigating

We are currently investigating an issue with loading conversations.
Posted Jul 01, 2025 - 11:01 CEST
This incident affected: Agent Interface (Agent Interface, Dashboard).