Rainbow - [WW] Core systems degradation – Incident details


[WW] Core systems degradation

Resolved
Degraded performance
Started almost 3 years ago; lasted 43 minutes

Affected

Global (WW)

Degraded performance from 11:37 AM to 12:20 PM

[WW] Rainbow Hybrid PBX Telephony

Degraded performance from 11:37 AM to 12:20 PM

[WW] Rainbow Administration & Subscriptions

Degraded performance from 11:37 AM to 12:20 PM

Europe & Middle East (EMEA)

Degraded performance from 11:37 AM to 12:20 PM

[EMEA] Rainbow Core Services

Degraded performance from 11:37 AM to 12:20 PM

[EMEA] Rainbow Hub Voice Services

Degraded performance from 11:37 AM to 12:20 PM

Updates
  • Resolved

    Please find the root cause analysis below.

    On 2022-09-20 at 13:37 CEDT, we experienced a major network failure caused by OVHcloud, our infrastructure provider in France. As part of continuous improvement work, OVHcloud migrated a peering configuration from one piece of network equipment to another. During the maintenance window, a configuration change caused some external routes not to be propagated properly, which led to link saturation. For a short period (14 minutes), we observed latency and packet loss (service degradation) both externally, for end users, and internally, between Rainbow servers.

    As a result, Rainbow users may have experienced issues with the following:

    • [EMEA] Connecting to or using Rainbow core services, and participating in conferences
    • [WW] Hybrid telephony services (controlling PBX phones and making peer-to-peer calls)
    • [WW] Rainbow Hub cloud telephony services

    While most services were restored at 14:15 CEDT, a few users may have experienced issues until 14:45 CEDT, because some PBX agents took longer to reconnect than others. In addition, some servers showed an abnormally high error rate and were restarted manually.

  • Monitoring

    We have implemented a fix and are currently monitoring the results.

  • Identified

    We are continuing to work on a fix for this incident.

    We are seeing a worldwide impact on all services and core components due to an issue at our infrastructure provider.

    We are currently mitigating the issue and restarting the services.

  • Investigating

    We are currently investigating this incident.