Five Nines etc...

All systems will fail in time. It's called "Entropy". Simply, Entropy means that without near-constant input of energy/effort, a system (any collection of components working in a greater whole) will break down and become part of its surroundings. Living things die, cool down and begin to decay until they are nothing but soil - they move into equilibrium with their environment.

Even a wrecked ship will, over the years, become progressively more broken until nothing remains. Its components become smaller and smaller particles, distributed across the sea and beach.

IT services come with Service Level Agreements (SLAs), which include an assurance of availability: the percentage of a year that you can expect the service to be working, called "Uptime". Time outside this is called "Downtime". SLAs deal with unplanned downtime. To keep that downtime low, storage will rarely (and never should) rely on a single disk to store valuable information (see our RAID article), and servers use a form of High Availability such as clustering to ensure that no single device can fail and take down a service entirely.

These SLAs are often expressed as "nines": the number of nines in the uptime percentage, so "four nines" is 99.99%, and so on. Sometimes you'll see the last digit given as well: "Five Nines Five" means a particular service will be available 99.9995% of the time. To human ears, 99.9% sounds like a lot, but in computing terms a few seconds is a large chunk of compute time, so it is important to know how reliable a service will be before you plan to use it. This has to be balanced against how realistic your requirement is. Does your swimming club website justify seven nines, and the high price likely attached to that? If it were down for just under 9 hours over an entire year (three nines), would it really matter? On the other hand, if banking systems can't talk to each other, eight nines (or better) is justified.

To show what the most common ranges mean, it is easier to think of a year in terms of the seconds it contains. We'll assume a year is 365 days just to keep the maths simple. A day contains 86,400 seconds, so multiplying that by 365 gives us 31,536,000 seconds. Each extra nine cuts the permitted downtime by a factor of ten: two nines (99%) allows 1% of the year, or 31,536,000 / 100 = 315,360 seconds, and every further nine divides that figure by ten again.

"Nines"  Percentage    Downtime (seconds)  "Human" Time
2        99%           315360              3.65 days
3        99.9%         31536               8.76 hours
4        99.99%        3153.6              52.56 minutes
5        99.999%       315.36              5.26 minutes
6        99.9999%      31.536              31.5 seconds
7        99.99999%     3.1536              3.2 seconds
8        99.999999%    0.31536             0.3 seconds
9        99.9999999%   0.031536            0.03 seconds
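
As a rough illustration of the arithmetic, the Python sketch below works out the permitted yearly downtime for a given number of nines, using the same 365-day year as above. The function names (`allowed_downtime_seconds`, `downtime_for_nines`, `humanize`) are invented for this example and are not part of any library.

```python
SECONDS_PER_YEAR = 365 * 86_400  # 31,536,000, assuming a 365-day year


def allowed_downtime_seconds(uptime_percent: float) -> float:
    """Seconds of unplanned downtime per year permitted by an uptime percentage."""
    return SECONDS_PER_YEAR * (100.0 - uptime_percent) / 100.0


def downtime_for_nines(nines: int) -> float:
    """Same figure for a whole number of nines, e.g. 3 means 99.9%."""
    return SECONDS_PER_YEAR / 10 ** nines


def humanize(seconds: float) -> str:
    """Express a duration in the largest unit that keeps the value at 1 or more."""
    for unit, size in (("days", 86_400), ("hours", 3_600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.2f} {unit}"
    return f"{seconds:.2f} seconds"


if __name__ == "__main__":
    # Print one row per level, from two nines up to nine nines.
    for n in range(2, 10):
        secs = downtime_for_nines(n)
        print(f"{n} nines: {secs:>12.2f} s  ({humanize(secs)})")
```

For an in-between level such as "Five Nines Five", `allowed_downtime_seconds(99.9995)` gives roughly 157.68 seconds, a little over two and a half minutes a year.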

Remember these figures are unplanned totals over a single year (i.e. those due to failure of some kind), so they don't take into account planned outages for improvements, patching etc. If someone tells you a system will be unavailable for two hours on Sunday night, that is planned downtime and doesn't commonly count towards the SLA.
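
To make the planned/unplanned distinction concrete, here is a small Python sketch that checks a year's outages against an SLA target while ignoring planned maintenance. The outage list and the names `unplanned_seconds`, `measured_uptime_percent` and `meets_sla` are hypothetical, and it assumes (as described above) that planned downtime is excluded; a real SLA may define this differently.

```python
from typing import Iterable, Tuple

SECONDS_PER_YEAR = 365 * 86_400  # assuming a 365-day year

# Each outage is (duration in seconds, True if it was planned).
Outage = Tuple[float, bool]


def unplanned_seconds(outages: Iterable[Outage]) -> float:
    """Total downtime that counts against the SLA (planned outages are skipped)."""
    return sum(duration for duration, planned in outages if not planned)


def measured_uptime_percent(outages: Iterable[Outage],
                            period_s: float = SECONDS_PER_YEAR) -> float:
    """Uptime over the period as a percentage, counting only unplanned downtime."""
    return 100.0 * (period_s - unplanned_seconds(outages)) / period_s


def meets_sla(outages: Iterable[Outage], target_percent: float) -> bool:
    """True if the measured uptime is at least the SLA target."""
    return measured_uptime_percent(outages) >= target_percent


if __name__ == "__main__":
    year = [
        (7_200, True),   # two-hour planned maintenance window: not counted
        (1_200, False),  # 20-minute failure
        (300, False),    # 5-minute failure
    ]
    print(f"Measured uptime: {measured_uptime_percent(year):.5f}%")
    print("Meets four nines:", meets_sla(year, 99.99))
    print("Meets five nines:", meets_sla(year, 99.999))
```

In this made-up year the 25 minutes of unplanned downtime fits inside a four nines allowance (about 52 minutes) but not a five nines one (about 5 minutes).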

No one will promise 100% uptime. Even Amazon S3's well-known "eleven nines" (99.999999999%) is a design target for data durability rather than an availability guarantee. A figure like that effectively means 100% barring catastrophic solar storm, near-Earth supernova, asteroid collision, global EM pulse etc., at which point it probably doesn't matter any more. LogiTEL GreenCloud's lowest SLA is "Four Nines", and higher levels are available with our premium packages. GreenCloud is currently performing at better than Nine Nines (we've had zero unplanned outages in the last 7 years).

 