Issue with loading the Dixa Interface
Incident Report for Dixa
Postmortem

What caused the downtime?

An experimental change targeted at a test environment was accidentally sent to the production environment.
This change had the effect of redirecting all traffic for a central part of Dixa to a single local developer machine.
The single machine was unable to cope with this load.

How will we prevent this in the future?

At Dixa we pride ourselves on delivering a stable service, and we are truly sorry for this unnecessary interruption.
We will review our internal processes to make them more resilient to human error.

The more detailed version

We use Kubernetes to deploy and manage our services.
Kubernetes is a great platform that has served us well and continues to serve us well as we scale Dixa.
However, Kubernetes can also introduce some overhead in development deployments.
We are in the process of investigating options for reducing this overhead to speed up development.
While testing one of those options (telepresence.io), a production deployment was inadvertently swapped out.
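One way to make this class of mistake harder is to gate development tooling behind a check of the active kubectl context. The sketch below is a minimal illustration of that idea, not our actual tooling; the context names ("docker-desktop", "prod-cluster") and the intercepted service name are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical guard: only run the wrapped command when the current
# kubectl context is the allowed test context. Context names below are
# illustrative, not real cluster names.
ALLOWED_CONTEXT="docker-desktop"

guard() {
  # First argument is the current context; normally you would obtain it
  # with: current="$(kubectl config current-context)"
  current="$1"
  if [ "$current" != "$ALLOWED_CONTEXT" ]; then
    echo "refusing: context '$current' is not '$ALLOWED_CONTEXT'" >&2
    return 1
  fi
  shift
  # Run the wrapped command (e.g. a telepresence invocation) only after
  # the context check has passed.
  "$@"
}
```

With a wrapper like this, a command such as `guard "$(kubectl config current-context)" telepresence intercept some-service` would fail fast instead of rerouting production traffic when the wrong context is active.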

Posted Sep 13, 2019 - 21:28 CEST

Resolved
The outage has been completely resolved. It lasted from 15:47 to 16:05 CEST. We apologize for the inconvenience this has caused.
Posted Sep 13, 2019 - 16:50 CEST
Monitoring
We have found and solved the problem. We expect regular operations for all customers shortly.
Posted Sep 13, 2019 - 16:08 CEST
Investigating
We are investigating an issue with loading the Dixa interface. Both native applications and browser access are affected. We will update as soon as more information is available.
Posted Sep 13, 2019 - 15:58 CEST
This incident affected: Agent Interface.