[WW] Core systems degradation

Resolved

September 20, 2022 at 12:20:00 PM

Resolved

September 20, 2022 at 12:20:00 PM

Please find here after the root cause analysis.

On 2022-09-20 at 13:37 CEDT, we experienced a major network failure due to OVHCloud, our infrastructure provider in France. In the context of continuous improvement, OVHcloud has migrated a peering configuration from a network equipment to another. During the maintenance window, a configuration change resulted in some external routes not being properly propagated, which resulted in links saturation. We noticed during a short period of time (14min) latencies and packet drops (degradation of service) externally to end users, but also internally between the different Rainbow servers.

Therefore, rainbow users may have experienced following issues:

[EMEA] connect or use rainbow core services, participate to conferences
[WW] Hybrid telephony services (control the PBX phones and make peer to peer calls)
[WW] Use Rainbow hub cloud telephony services

While most of usages were restored at 14:15 CEDT, some rare users may have experienced issues up to 14:45 as some PBX agents took longer to reconnect than others Also, some servers showed an unwanted high error rate and were restarted manually.

Monitoring

September 20, 2022 at 12:15:00 PM

Monitoring

September 20, 2022 at 12:15:00 PM

We implemented a fix and currently monitoring the result.

Identified

September 20, 2022 at 12:03:45 PM

Identified

September 20, 2022 at 12:03:45 PM

We are continuing to work on a fix for this incident.

We see worldwide impacts on all services and core components because of an issue at our provider.

We are currently mitigating and restarting the services

Investigating

September 20, 2022 at 11:37:00 AM

Investigating

September 20, 2022 at 11:37:00 AM

We are currently investigating this incident.

Rainbow - [WW] Core systems degradation – Incident details

All systems operational