Dixa’s Search engine was under heavy load due to an untimely cluster data rebalancing action. As a result, requests routed to one particularly affected node responded slowly or timed out, which caused the Conversations overview and Search pages to fail to load for certain users.
Around 10:35 AM (CET), reports of high latency and instability started coming in to Dixa Support. Engineers immediately began investigating and quickly identified that one node in our Search engine was overloaded. The overload was caused by the engine automatically moving data between nodes to improve overall data distribution.
No immediate action was required, as the data rebalancing completed shortly after the root cause had been identified.
To prevent untimely actions like this in the future, we will reassess our engine alerts and configurations to better control how and when these often automated cluster actions can occur. We are also rebuilding certain Search components, and as part of that work we are paying extra attention to scalability and performance constraints.