Conversations and Search: Degraded performance

Incident Report for Dixa

Postmortem

Summary

Dixa’s Search engine came under heavy load due to an untimely cluster data rebalancing action. Requests routed to one particularly affected node responded slowly or timed out, causing the Conversations overview and Search pages to fail to load for certain users.

Root Cause

Around 10:35 AM (CET), reports of high latency and instability began reaching Dixa Support. Engineers immediately started investigating and quickly identified that one node in our Search engine was overloaded. The overload was caused by the engine automatically starting to move data between nodes to improve the overall data distribution.

Action Items

No immediate action was required, as the data rebalancing completed shortly after the root cause was identified.

To prevent untimely actions like this in the future, we will reassess our engine alerts and configurations to better control how and when these often-automated cluster actions occur. We are also rebuilding certain Search components, paying extra attention to scalability and performance constraints.

Posted Nov 19, 2024 - 09:43 CET

Resolved

The issue affecting the conversation overview and search functionalities has been fully resolved. All services are now operating as expected, and we have confirmed system stability. We are sorry for any inconvenience this has caused and thank you for your patience.

Posted Nov 11, 2024 - 13:14 CET

Monitoring

A fix has been implemented to resolve the issue impacting the dashboard and search functionalities. We are currently monitoring the system to ensure stability and confirm that all services are fully operational. We will provide a final update once we verify that the issue is completely resolved.

Posted Nov 11, 2024 - 11:57 CET

Identified

We have identified an issue affecting the dashboard and search functionalities. These services may be slower than expected or intermittently unresponsive. Our team is actively working to resolve the root cause. Please note that all other services are unaffected and continue to operate normally. We will provide an update as soon as further information is available.

Posted Nov 11, 2024 - 11:45 CET

Investigating

We have received reports of instability on the platform and are investigating the issue. Updates will follow.

Posted Nov 11, 2024 - 11:34 CET

This incident affected: Agent Interface (Search, Dashboard).