[Resolved] Uptime Demystified?
Started on January 1, 2020 at 11:00:00 AM GMT+0. Resolved after less than a minute
- Resolved: January 1, 2020 at 11:00:00 AM GMT+0
Rainbow is a global service operated in multiple data centers around the world. To ensure data privacy, data location, and best-in-class performance, our services are organized into multiple regions, or availability zones:
- North America (NA)
- Caribbean & Latin America (CALA)
- Europe & Middle East (EMEA)
- Germany (DE)
- Asia-Pacific (APAC)
- Australia-New Zealand (ANZ)
- and a Global (WW) zone, providing some key services used by all of the regions above.
Services within these regions are grouped into several categories:
- "Core" services, covering the basic day-to-day features of Rainbow (login, instant messaging, WebRTC P2P calls, presence, search and contact management, file sharing …).
- "Conferencing" services, including Bubbles, Channels, and PSTN-based and WebRTC multi-party audio/video conferences.
- "Media Relays" services, comprising all WebRTC traffic relays available in our various points of presence (PoP), which allow low-latency, high-quality communications.
- "Hybrid PBX Telephony" services, covering PBX and desk phone connectivity to the cloud, with presence and 3PCC services.
- "Admin & Subscriptions" services, covering company administration features, including mass provisioning, license enablement, subscription, billing and invoicing.
Each service in each zone is autonomous and reports its own uptime, in accordance with our contractual Service Level Agreement (SLA).
Only unplanned incidents are reflected on this service. Planned production rollouts are out of scope.
Should incidents occur, they are categorized as:
- Degraded: services remain operational but are slower to respond than usual, which may slightly degrade the overall user experience. Such events do not count toward our uptime calculation.
- Minor: some abnormal system behavior has occurred and may affect a few customers, causing a nuisance without serious consequences.
- Partial: our systems have suffered a more significant issue, preventing a larger subset of our users from using our service.
- Major: this should not happen, but unfortunately such is life. A severe incident has likely disrupted service for a large majority of our users, preventing most if not all access to our services.
For each category of service, our displayed uptime is calculated using the following formulae over the last period (90 days):
Uptime = (Total minutes over the period – Total Incidents minutes over the period) x 100 / Total minutes over the period
Total Incidents minutes = sum("Major" incident minutes) + 30% * sum("Partial" incidents minutes) + 30% * sum("Minor" incidents minutes)
For each incident, the duration is the time elapsed between the occurrence (as reported or, most commonly, as detected by our monitoring systems) and the moment when the incident has been mitigated and is considered resolved, with all services restored to a nominal state.
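The calculation above can be sketched in a few lines of Python. This is a minimal illustration and not our production code: the function names, the tuple layout, and the sample incidents are hypothetical, and only the weights and the 90-day period come from the formulas above.

```python
from datetime import datetime, timedelta

# Weights taken from the formula above. "Degraded" events are excluded
# from the uptime calculation, hence a weight of 0.
SEVERITY_WEIGHTS = {"Major": 1.0, "Partial": 0.3, "Minor": 0.3, "Degraded": 0.0}

PERIOD_MINUTES = 90 * 24 * 60  # 90-day period = 129,600 minutes


def incident_minutes(started: datetime, resolved: datetime) -> float:
    """Duration between occurrence and resolution, in minutes."""
    return (resolved - started).total_seconds() / 60


def uptime_percent(incidents, period_minutes: float = PERIOD_MINUTES) -> float:
    """Compute uptime from an iterable of (severity, started, resolved) tuples.

    The tuple layout is an assumption made for this sketch.
    """
    weighted_minutes = sum(
        SEVERITY_WEIGHTS[severity] * incident_minutes(started, resolved)
        for severity, started, resolved in incidents
    )
    return (period_minutes - weighted_minutes) * 100 / period_minutes


# Hypothetical incidents, for illustration only.
t0 = datetime(2020, 1, 1, 11, 0)
sample = [
    ("Major", t0, t0 + timedelta(minutes=60)),      # counts for 60 minutes
    ("Partial", t0, t0 + timedelta(minutes=100)),   # counts for 30 minutes
    ("Degraded", t0, t0 + timedelta(minutes=500)),  # not counted
]
print(round(uptime_percent(sample), 3))
```

With these sample values, the weighted incident time is 90 minutes out of 129,600, which gives an uptime of roughly 99.931%.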
We also provide a human-friendly, day-by-day view of our systems' health through a color-coded timeline, where:
- Green is for a happy day, with less than 1 minute of cumulative incident downtime.
- Yellow represents less than 20 minutes of cumulative downtime.
- Orange stands for 20 to 40 minutes of cumulative downtime.
- Red calls for more than 40 minutes of cumulative downtime in a given day.
(Degraded states are, once again, outside the scope of the uptime calculation.)
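The color thresholds can be expressed as a small lookup, sketched below. The function name is hypothetical, and the handling of exact boundaries (1 minute counts as yellow, 40 minutes as orange) is an assumption drawn from the wording of the list. The sketch takes an already computed number of downtime minutes for the day and does not assume how that number is derived.

```python
def day_color(downtime_minutes: float) -> str:
    """Map a day's cumulative downtime (in minutes) to a timeline color.

    Boundary handling is an assumption of this sketch:
    exactly 1 minute is yellow, exactly 20 or 40 minutes is orange.
    """
    if downtime_minutes < 0:
        raise ValueError("downtime cannot be negative")
    if downtime_minutes < 1:
        return "green"
    if downtime_minutes < 20:
        return "yellow"
    if downtime_minutes <= 40:
        return "orange"
    return "red"


# Hypothetical daily downtime values, for illustration only.
for minutes in (0, 5, 25, 90):
    print(minutes, day_color(minutes))
```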