Incident Date: July 1, 2025
Incident Duration: 10:49–11:15 AM UTC
Severity: Major
Status: Resolved
On the morning of July 1st, an infrastructure change intended to improve the scalability of one of our core backend components caused request throttling on a database table and its index, temporarily disrupting key services, including the loading of Conversations and analytics, for several customers.
Between 10:49 AM and 11:01 AM UTC, users experienced degraded service, including:

- Conversations not loading for multiple customers
- Analytics data failing to load
The issue was identified quickly, and a rollback restored full functionality within 12 minutes.
| Time (UTC) | Event |
|---|---|
| 10:49 AM | A capacity configuration change was deployed to a core database table and its index. |
| 10:53 AM | Analytics service loading issues were reported. |
| 10:54 AM | Multiple customers reported that Conversations were not loading. |
| 10:56 AM | Engineers decided to roll back the change to the previous configuration. |
| 11:01 AM | Rollback completed. Services began functioning normally. |
| 11:15 AM | Incident was marked as resolved. API performance confirmed stable. |
The engineering team rolled back the table and its index to the previous provisioned capacity mode, restoring the prior configuration. This resolved the throttling within minutes.
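For illustration, a rollback of this kind can typically be applied as a single table update. The sketch below assumes a DynamoDB-style table (the report does not name the specific database); the table name, index name, and capacity values are hypothetical placeholders. When reverting to provisioned mode, throughput must be set for the table and each affected index in the same call.

```python
import boto3

# Minimal sketch: revert a table and its index to provisioned capacity mode.
# Table name, index name, and capacity values are hypothetical placeholders;
# the actual datastore and settings are not named in this report.
dynamodb = boto3.client("dynamodb")

dynamodb.update_table(
    TableName="conversations",                 # hypothetical table name
    BillingMode="PROVISIONED",                 # revert to the previous provisioned capacity mode
    ProvisionedThroughput={
        "ReadCapacityUnits": 500,              # placeholder capacity values
        "WriteCapacityUnits": 500,
    },
    GlobalSecondaryIndexUpdates=[
        {
            "Update": {
                "IndexName": "conversations-by-customer",   # hypothetical index name
                "ProvisionedThroughput": {
                    "ReadCapacityUnits": 500,
                    "WriteCapacityUnits": 500,
                },
            }
        }
    ],
)

# Wait until the table returns to ACTIVE before treating the rollback as complete.
dynamodb.get_waiter("table_exists").wait(TableName="conversations")
```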
To prevent similar incidents in the future, we are improving how configuration changes of this kind are tested, staged, and deployed.
These improvements are part of our ongoing commitment to delivering a stable and reliable experience.
We sincerely apologize for the disruption this caused. The lessons from this incident are already being acted on to further strengthen our systems and processes. Ensuring the reliability and scalability of our platform remains a top priority.
If you have any questions or would like more details, please don’t hesitate to reach out to our support team.