Multiple service disruptions

Incident Report for Dixa

Postmortem

Summary 

On 20 October 2025, starting at 08:58 am CEST, Dixa experienced widespread disruptions caused by issues at third-party services.

Impacted services

  • Telephony inbound and outbound traffic - calls could not be placed for most of the incident. For a period of time, calls were placed, but the connection between agents and end users was not established.
  • WhatsApp channel - conversations were queued but not sent during the incident. All queued messages were sent as soon as the issue was resolved.
  • SMS channel - conversations were queued but not sent during the incident. All queued messages were sent as soon as the issue was resolved.
  • Login to Elevio - login to Elevio was impacted. Help Centers were fully operational. 
  • Login to the Status page was impacted, which caused challenges with communicating incident updates once the incident was raised. 
  • Access to Dixa’s Public API documentation docs.dixa.io was also affected by the incident; Dixa’s API was fully operational.
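The queue-then-deliver behaviour described for the WhatsApp and SMS channels can be illustrated with a minimal store-and-forward sketch. This is not Dixa's implementation: the OutboundQueue class and the send callable are hypothetical names introduced here, and the sketch only assumes that the provider call reports success or failure.

```python
from collections import deque
from typing import Callable, Deque, List


class OutboundQueue:
    """Hold outbound messages while a provider is failing; deliver in order on recovery."""

    def __init__(self, send: Callable[[str], bool]) -> None:
        # `send` is a hypothetical provider call that returns True on success.
        self._send = send
        self._pending: Deque[str] = deque()

    def submit(self, message: str) -> None:
        """Queue a message and attempt delivery straight away."""
        self._pending.append(message)
        self.flush()

    def flush(self) -> List[str]:
        """Deliver queued messages oldest first; stop at the first failure."""
        delivered: List[str] = []
        while self._pending:
            if not self._send(self._pending[0]):
                break  # provider still failing, keep the message queued
            delivered.append(self._pending.popleft())
        return delivered

    @property
    def pending(self) -> int:
        return len(self._pending)
```

Stopping at the first failed send keeps messages in their original order, so a later flush after recovery delivers everything that accumulated during the outage.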

Root cause 

The incident originated from a major AWS service disruption in the us-east-1 region caused by a DNS race condition within Amazon DynamoDB’s endpoint management system. The fault temporarily removed valid DNS records for several AWS services, resulting in widespread API resolution failures across multiple dependent systems.

Dixa uses Twilio’s APIs for its telephony, WhatsApp, and SMS channels. When DynamoDB’s DNS records were invalidated, Twilio’s regional load balancers lost access to key internal routing data, which caused inbound and outbound voice requests to fail. This also impacted dependent APIs for SMS and WhatsApp delivery.

Because Twilio’s APIs are globally routed through Twilio’s us-east-1 infrastructure, Dixa’s platform became unable to establish or maintain voice sessions, send or receive SMS messages, or process WhatsApp traffic. Other parts of the Dixa platform (login, chat, email) remained operational, but all Twilio-dependent communication channels were impacted for most of the duration of the AWS event.

Timeline

At 08:58 am CEST: We started observing errors in the logs coming from third-party systems, and shortly after, we started getting reports from customers about issues.

At 09:16 am CEST: The incident was reported on the status page with status: Investigating. The system we use for the status page was also impacted by the same root cause at AWS, and we were unable to add further details about the incident.

Between 9:00 and 10:00 am CEST: We kept updating the users who reached out to our support team about the status of the issue, while adding more details to the status page was not possible.

At 10:00 am CEST: An updated incident message was posted in the Agent interface to notify users about the status of the incident.

At 10:07 am CEST: We regained access to the status page, updated the incident status to: Identified, and added information about the impacted services.

At 10:20 am CEST: A further status update was posted, with a notification sent to all status page subscribers.

At 10:55 am CEST: The next status page update was posted with reference to the incidents at AWS and Twilio, which were the root cause of the issues with Dixa services. We also reported issues with logging in to Elevio caused by the same incident.

At 11:52 am CEST: An update was posted. The Elevio login issue was resolved. The issue with inbound and outbound telephony, as well as the SMS and WhatsApp services, continued.

At 12:58 pm CEST: The first calls started reaching Dixa infrastructure again, but audio on the calls was still not working.

At 01:05 pm CEST: The connection issue with telephony continued. SMS was confirmed to be fully operational again.

At 01:26 pm CEST: Inbound and outbound telephony was confirmed to be operational again.

At 02:21 pm CEST: WhatsApp was confirmed to be operational as well. All messages that were queued during the incident were successfully delivered. The incident status was changed to: Monitoring.

At 03:19 pm CEST: All systems had been operational since the incident was moved to Monitoring, and the incident status was changed to: Resolved.
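As a worked check of the timeline, the time from the first observed errors to each recovery milestone can be computed directly from the timestamps above. The sketch below only restates those times in 24-hour form (all on 20 October 2025, CEST); the function and variable names are illustrative.

```python
from datetime import datetime

DAY = "2025-10-20"   # all timeline entries fall on this day, in CEST
START = "08:58"      # first errors observed

# Recovery milestones taken from the timeline, in 24-hour form.
MILESTONES = {
    "SMS operational": "13:05",
    "Telephony operational": "13:26",
    "WhatsApp operational (Monitoring)": "14:21",
    "Resolved": "15:19",
}


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two HH:MM times on the same day."""
    fmt = "%Y-%m-%d %H:%M"
    a = datetime.strptime(f"{DAY} {start}", fmt)
    b = datetime.strptime(f"{DAY} {end}", fmt)
    return int((b - a).total_seconds() // 60)


def format_duration(minutes: int) -> str:
    hours, mins = divmod(minutes, 60)
    return f"{hours}h {mins:02d}m"


durations = {
    name: format_duration(minutes_between(START, time))
    for name, time in MILESTONES.items()
}

for name, duration in durations.items():
    print(f"{name}: {duration}")
```

On these figures, SMS recovered after 4h 07m, telephony after 4h 28m, WhatsApp after 5h 23m, and the incident was closed 6h 21m after the first errors.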

Preventive measures 

Dixa will evaluate implementing the following measures to mitigate the risk of similar issues in the future: 

  • Evaluating the feasibility of routing inbound and outbound traffic via multi-regional providers in case the us-east-1 Twilio region fails.
  • Independent external health check monitoring outside of AWS and Twilio.
  • Revisiting service dependency mapping and redundancy.
  • Incident communication redundancy.
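To make the first two measures concrete, here is a minimal sketch of an external health check that drives a region failover decision. It illustrates the general technique only and is not a description of Dixa's or Twilio's systems: the RegionSelector class, the region names, the health URLs, and the threshold of three consecutive failures are all assumptions made for this example.

```python
import urllib.request
from typing import Callable, Dict, Optional


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError):
        # Covers DNS failures, refused connections, timeouts, and HTTP errors.
        return False


class RegionSelector:
    """Pick the first preferred region that has not failed repeatedly."""

    def __init__(
        self,
        regions: Dict[str, str],              # hypothetical: name -> health URL
        probe: Callable[[str], bool] = http_probe,
        threshold: int = 3,                   # assumed consecutive-failure limit
    ) -> None:
        self.regions = regions                # insertion order = preference order
        self.probe = probe
        self.threshold = threshold
        self.failures = {name: 0 for name in regions}

    def check_all(self) -> None:
        """Probe every region once; reset the counter on success."""
        for name, url in self.regions.items():
            if self.probe(url):
                self.failures[name] = 0
            else:
                self.failures[name] += 1

    def active(self) -> Optional[str]:
        """Return the preferred healthy region, or None if all are failing."""
        for name in self.regions:
            if self.failures[name] < self.threshold:
                return name
        return None
```

Requiring several consecutive failures before switching avoids failing over on a single dropped probe. For the check to stay useful during an event like this one, the monitor would need to run on infrastructure that does not share the dependency it is watching.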

We sincerely apologise for the inconvenience this has caused.

Posted Oct 28, 2025 - 15:05 CET

Resolved

All known issues reported as part of this incident have been fully resolved. We have been monitoring the system and all services are operational. We truly apologise for the inconvenience this incident caused and thank you for your patience while this was being solved.

A postmortem for this incident will be posted within 5 business days.
Posted Oct 20, 2025 - 15:19 CEST

Monitoring

We're happy to report that all WhatsApp issues have been resolved, and messages queued during the outage have now been delivered to their recipients.

We also got confirmation that telephony is working as expected.

Again, our sincere apologies for the inconvenience this has caused. If you still experience issues, please reach out to Dixa Support.

We will continue to monitor the situation, but we don't expect any further impact.
Posted Oct 20, 2025 - 14:21 CEST

Update

We are happy to report inbound and outbound calls are operational again. If agents continue to experience problems, ask them to refresh the page before trying again.

We continue to monitor the situation closely, and if anything changes we will inform you via this page. If you continue to experience issues, feel free to reach out to Dixa Support.

Our sincere apologies for the inconvenience you are experiencing today.

Next update in 60 minutes, or earlier if there is an update to share.
Posted Oct 20, 2025 - 13:26 CEST

Update

We continue to see connection issues with phone calls. Calls are getting through to the Dixa platform, but when agents accept a call, they are not connected to customers because Twilio's network services are still experiencing issues.

SMS is fully functional again.

We continue to monitor the situation and will update you again in the next 30 minutes, or sooner if there is news to share.
Posted Oct 20, 2025 - 13:05 CEST

Update

Telephony traffic is slowly recovering. The inbound calls are entering the Dixa platform and agents can accept offers but can't connect to the call. Outbound calls are still affected. We continue to follow the status of this issue with Twilio and we will keep posting updates.

We will post the next update in 30 minutes.
Posted Oct 20, 2025 - 12:38 CEST

Update

We can see that login issues with Elevio and Jira (including the Jira integration) are now resolved; however, we are still experiencing issues with the telephony channel due to the outage at Twilio (https://status.twilio.com/).

We will keep updating the status of this incident, in one hour at the latest.
Posted Oct 20, 2025 - 11:52 CEST

Update

The issues are ongoing and telephony is still down. We are following the issue, which is related to an outage at our telephony network supplier Twilio (https://status.twilio.com/). This in turn is caused by an incident at AWS (https://health.aws.amazon.com/health/status). As a consequence, other services dependent on AWS, such as Elevio and Jira, might experience downtime.

We'll keep monitoring and will keep you posted once we get an update.

Next update in one hour at the latest.
Posted Oct 20, 2025 - 10:55 CEST

Update

We are currently experiencing significant issues with our telephony channel, which is impacting our ability to provide telephony services to our customers.

Both inbound and outbound traffic are being affected, meaning that you will encounter difficulties when trying to make or receive calls. Additionally, our SMS and WhatsApp channels may also experience disruptions as a result of this ongoing issue, which could hinder your ability to communicate via those platforms as well.

The root cause of these problems is related to an incident with our telephony services provider, which is completely outside of our control. We want to assure you that we are actively collaborating with them to address and resolve the issue at the earliest possible opportunity. Our team is working diligently to ensure that normal service is restored as quickly as possible.

We believe that the current issues are likely tied to a larger incident that a major cloud service provider is experiencing in one of its regions.

This situation has unfortunately impacted multiple services, including ours, complicating the process further.

We sincerely apologize for any inconvenience this may cause you and appreciate your patience and understanding during this time. Our priority is to keep you updated as we work through this situation, and we are committed to restoring service as swiftly as possible.
Posted Oct 20, 2025 - 10:20 CEST

Identified

We are currently experiencing significant issues with our telephony channel, which is impacting our ability to provide telephony services to our customers.

Both inbound and outbound traffic are being affected, meaning that you will encounter difficulties when trying to make or receive calls. Additionally, our SMS and WhatsApp channels may also experience disruptions as a result of this ongoing issue, which could hinder your ability to communicate via those platforms as well.

The root cause of these problems is related to an incident with our telephony services provider, which is completely outside of our control. We want to assure you that we are actively collaborating with them to address and resolve the issue at the earliest possible opportunity. Our team is working diligently to ensure that normal service is restored as quickly as possible.

We believe that the current issues are likely tied to a larger incident that a major cloud service provider is experiencing in one of its regions.

This situation has unfortunately impacted multiple services, including ours, complicating the process further.

We sincerely apologize for any inconvenience this may cause you and appreciate your patience and understanding during this time. Our priority is to keep you updated as we work through this situation, and we are committed to restoring service as swiftly as possible.
Posted Oct 20, 2025 - 10:07 CEST

Investigating

We have received reports of instability in the platform. We are investigating the issue. Updates will follow.
Posted Oct 20, 2025 - 09:16 CEST
This incident affected: Telephony & SMS (Inbound, Outbound, WebRTC, SMS) and Other Channels (WhatsApp).