Affected
Operational from 7:04 PM to 7:04 PM, Major outage from 7:04 PM to 7:22 PM, Operational from 7:22 PM to 7:22 PM
- Postmortem
During scheduled drive replacement in two of our Moscow data centers, the application cluster responsible for the API service went offline. Redis failed to promote new masters on the remaining healthy node, so the API service repeatedly attempted to connect to an unavailable Redis master. A manual failover resolved the issue and services were restored.
Timeline & Root Cause Analysis
Planned maintenance was performed to replace hard drives in two data centers.
As part of the work, the app server in DC1 was shut down.
Later, the app server in DC2 was also shut down for the same maintenance.
The DC1 app server had not fully come back online before the second shutdown began.
As a result, both app servers in the cluster were down at the same time.
Redis did not switch the master role to the remaining node in the other DC as expected.
The API service failed to start because it kept trying to connect to the Redis master located in DC1, which was unavailable.
The Redis master roles were manually moved to the servers in DC2.
Once Redis topology was corrected, the API and dashboard services recovered and returned to normal operation.
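The core of the outage is that the API service kept retrying one fixed master address. A minimal Python sketch of the alternative, assuming the service is given a list of candidate Redis nodes and checks which one currently holds the master role before connecting. The `RedisNode` type, node names and retry parameters are illustrative placeholders; a real client would query each server (for example with the `ROLE` command) rather than read local fields.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class RedisNode:
    # Hypothetical stand-in for a Redis instance. A real client would ask
    # the server for its role instead of reading these fields.
    name: str
    dc: str
    up: bool
    role: str  # "master" or "replica"


def find_master(nodes: list[RedisNode]) -> Optional[RedisNode]:
    """Return the first reachable node that currently reports the master role."""
    for node in nodes:
        if node.up and node.role == "master":
            return node
    return None


def resolve_master(nodes: list[RedisNode], attempts: int = 3, delay: float = 0.0) -> RedisNode:
    # Re-discover the master on every attempt rather than retrying one fixed
    # address, and stop after a bounded number of attempts so the service
    # reports a clear error instead of hanging at startup.
    for attempt in range(attempts):
        master = find_master(nodes)
        if master is not None:
            return master
        if attempt < attempts - 1:
            time.sleep(delay)
    raise ConnectionError("no reachable Redis master among known nodes")


nodes = [
    RedisNode("redis-dc1", "DC1", up=False, role="master"),
    RedisNode("redis-dc2", "DC2", up=True, role="replica"),
]

# Incident-like state: the only master is down and no replica was promoted.
try:
    resolve_master(nodes)
except ConnectionError as exc:
    print(exc)

# After the DC2 node is promoted, discovery finds it without a config change.
nodes[1].role = "master"
print(resolve_master(nodes).name)  # prints "redis-dc2"
```

Redis Sentinel offers the same idea as a service through `SENTINEL get-master-addr-by-name`; the sketch only shows the client-side behaviour, not how our services are actually configured.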
Resolution
All Redis masters were manually switched to the healthy nodes in DC2.
Application services (API, dashboard) successfully started and functioned as expected.
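The manual fix amounted to promoting, for every Redis master that lived in DC1, its healthy replica in DC2. A sketch of that procedure over a hypothetical in-memory topology (the shard names and the `promote_healthy_replicas` helper are illustrative). On a real Redis replica, the promotion step corresponds to running `REPLICAOF NO ONE`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RedisNode:
    # Hypothetical stand-in for a Redis instance in one shard.
    name: str
    dc: str
    up: bool
    role: str  # "master" or "replica"


def promote_healthy_replicas(shards: Dict[str, List[RedisNode]], target_dc: str) -> List[str]:
    """For every shard without a reachable master, promote one healthy
    replica located in target_dc. Returns the promoted "shard:node" pairs."""
    promoted = []
    for shard, nodes in shards.items():
        if any(n.up and n.role == "master" for n in nodes):
            continue  # shard already has a working master
        for n in nodes:
            if n.up and n.dc == target_dc and n.role == "replica":
                # On a real server this step corresponds to REPLICAOF NO ONE.
                n.role = "master"
                promoted.append(f"{shard}:{n.name}")
                break
    return promoted


shards = {
    "cache": [RedisNode("cache-dc1", "DC1", up=False, role="master"),
              RedisNode("cache-dc2", "DC2", up=True, role="replica")],
    "sessions": [RedisNode("sessions-dc1", "DC1", up=False, role="master"),
                 RedisNode("sessions-dc2", "DC2", up=True, role="replica")],
}
print(promote_healthy_replicas(shards, "DC2"))  # ['cache:cache-dc2', 'sessions:sessions-dc2']
```

The sketch deliberately leaves the old DC1 masters untouched. When those nodes come back they must be reconfigured as replicas of the new DC2 masters (for example with `REPLICAOF <host> <port>`), otherwise a shard can end up with two masters.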
Next Steps / Preventive Actions
Applied corrections to the maintenance procedure so that app servers are never taken down simultaneously and Redis failover is validated before each step.
Review and improve Redis automatic failover configuration.
Add additional health checks and monitoring around Redis master availability and app server readiness.
Adjust maintenance sequencing to guarantee sufficient startup time between operations.
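The sequencing and health-check actions above can be expressed as a pre-flight check run before each shutdown step. This is a sketch under stated assumptions: the `AppServer` fields, the 300-second minimum readiness window and the rule that a server must host no Redis masters before it is stopped are hypothetical choices, not the exact checks in our maintenance procedure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AppServer:
    # Hypothetical view of one app server as seen by a maintenance tool.
    name: str
    dc: str
    ready_since: Optional[float]  # time health checks turned green; None = not ready
    redis_masters: List[str] = field(default_factory=list)  # masters hosted here


def safe_to_shut_down(
    target: str,
    servers: List[AppServer],
    now: float,
    min_ready_s: float = 300.0,  # hypothetical minimum warm-up window
) -> Tuple[bool, str]:
    by_name = {s.name: s for s in servers}
    if target not in by_name:
        raise ValueError(f"unknown server {target!r}")
    others = [s for s in servers if s.name != target]
    if not others:
        return False, "no other app server would remain"
    # Every other server must be healthy, and healthy for long enough.
    for s in others:
        if s.ready_since is None:
            return False, f"{s.name} is not ready"
        if now - s.ready_since < min_ready_s:
            return False, f"{s.name} has only been ready for {now - s.ready_since:.0f}s"
    # Redis masters must be moved off (and the failover verified) first.
    masters = by_name[target].redis_masters
    if masters:
        return False, f"{target} still hosts Redis masters: {', '.join(masters)}"
    return True, "ok"


# Incident-like state: DC1 is still starting up after its drive swap.
servers = [
    AppServer("app-dc1", "DC1", ready_since=None),
    AppServer("app-dc2", "DC2", ready_since=0.0),
]
print(safe_to_shut_down("app-dc2", servers, now=1000.0))  # (False, 'app-dc1 is not ready')

# Once DC1 has been healthy for longer than the warm-up window, DC2 may go.
servers[0].ready_since = 500.0
print(safe_to_shut_down("app-dc2", servers, now=1000.0))  # (True, 'ok')
```

Requiring a minimum time in the ready state, rather than a single green health check, is what enforces "sufficient startup time between operations": a server that has only just come up is not yet counted as a safe fallback.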
- Resolved
- Investigating
We are currently investigating this incident.