Redundant Array of Independent Disks - RAID

Ensuring your data survives a disk problem is a great worry for most people and businesses. RAID is all about making sure you can get to your files when you want to and protecting them when something goes wrong with the disks that store them. Unbelievably, some ignore the potential problem and do not perform even the most basic back-ups. Some even store everything on a single USB disk, never contemplating that it is just as likely to fail as the disk in their PC. The author has a friend who lost thousands of photographs for exactly this reason. Only when it was too late did he understand about backing up irreplaceable data. A simple question is: "if I lost this, would it hurt?" If the answer is YES then you need to make it NO. It is that easy.

A good way of safeguarding data is to use several disks to store it, so that you always have the data in more than one place. This might simply be a case of copying one disk to another, but a better way is to spread the data across the disks and use some mathematics and electronics to make sure it stays safe. A RAID array will do this and, as a plus point, it is usually faster than a single disk.

RAID comes in several different flavours called "Levels" and each has strengths and weaknesses. There now follows a discussion of the levels you are most likely to encounter and the pros and cons of each.

The above diagram is called the "CAP Triangle". It may be thought of as representing the extent of three critical requirements of disk storage: Cost, Availability (how likely your data is to survive a problem) and Performance (how quickly you can Read and Write that data). Every disk storage method can be thought of as occupying a point within this triangle: the closer to an edge, the greater the degree of that property it possesses; the further away, the lesser the degree.

As a rule of thumb, each disk mechanism in a RAID set needs to be of the same capacity and preferably the same model. Most RAID controllers will permit some "mix n matching" but the lowest capacity of any single drive will be imposed on all disks in the array, e.g. if a simple RAID5 array is composed of 1x320GB and 3x500GB member disks, the whole group will be treated as 4x320GB disks. Some RAID controllers have a very limited range of disks they will work with, because the low level commands of the disk interface are closely coupled to the firmware of the controller to gain increased performance. In all instances you will need to check that the disks and controllers you plan to use are compatible.
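
The "lowest capacity wins" rule above can be shown with a few lines of Python. This is only a sketch: the function name is made up for illustration, sizes are in GB, and a real controller may also reserve a little space on each disk for its own metadata.

```python
def effective_member_sizes(disk_sizes_gb):
    """Return the per-disk capacities a controller would actually use.

    Assumes the simple rule described above: every member is truncated
    to the size of the smallest disk in the set.
    """
    if not disk_sizes_gb:
        raise ValueError("an array needs at least one disk")
    smallest = min(disk_sizes_gb)
    return [smallest] * len(disk_sizes_gb)


mixed = [320, 500, 500, 500]
used = effective_member_sizes(mixed)
wasted = sum(mixed) - sum(used)
print(used)    # [320, 320, 320, 320]
print(wasted)  # 540
```

In this example 540GB of raw disk is simply never used, which is why matching capacities is the usual advice.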

Software RAID

This article deals mainly with RAID provided by hardware specifically designed for the purpose, but it can also be achieved in software. Software RAID falls into two broadly related categories: it is implemented either by the BIOS on low-end servers, high-end desktop PCs and laptops, or by dedicated software that presents a server as a RAID array across a network. Such machines, loosely termed "NAS Filers" (Network Attached Storage), use common network protocols to provide storage to other machines in the vicinity. These protocols mainly include SMB/CIFS (as implemented by Samba) and NFS. Open source software is available to provide tough solutions for NAS filers, e.g. the FreeBSD based FreeNAS. Software RAID is often seen as a "poor-man's" solution and performance rarely matches that of a dedicated RAID controller. However, it is usually more tolerant of mixed model disks, and when used in NAS filers the lower performance is rarely an issue, as software RAID will usually out-perform the throughput of gigabit Ethernet (10G is still something of a rarity in establishments that might use Software RAID across a network). Thus it makes little sense to buy expensive dedicated RAID hardware for storage arrays accessed purely via Ethernet. (NOTE: NAS is not to be confused with SAN.)
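
The point about gigabit Ethernet can be checked with some back-of-envelope arithmetic. The sketch below converts a link speed to its best-case payload rate and reports which side is the bottleneck. The function names are invented for illustration and the 300 MB/s array figure is a made-up example, not a measurement.

```python
def link_limit_mb_per_s(link_gbit_per_s):
    """Upper bound on payload throughput of a network link in MB/s.

    Ignores protocol overhead, so real figures will be lower.
    """
    return link_gbit_per_s * 1000 / 8


def bottleneck(array_mb_per_s, link_gbit_per_s):
    """Name whichever of the array or the network link is slower."""
    link = link_limit_mb_per_s(link_gbit_per_s)
    return "network" if link < array_mb_per_s else "array"


# 300 MB/s is an illustrative figure for a modest software RAID set.
print(link_limit_mb_per_s(1))   # 125.0
print(bottleneck(300, 1))       # network
print(bottleneck(300, 10))      # array
```

Once the network is the slower party, a faster RAID controller behind it buys nothing.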

RAID Levels

The most common RAID levels, listed below, approach data redundancy by duplicating physical blocks of disk storage (i.e. at a hardware level on the actual disk platters) across multiple mechanisms. This is enhanced further by some methods using mathematical tricks to store a fingerprint of the data (parity) on another disk. This can be used to rebuild any missing data in the event of a failure. Each RAID level exhibits its own unique benefits and drawbacks. This overview will attempt to highlight each and help you find the right RAID level for your particular application. Please note that the numbers assigned to each level of RAID do not indicate superiority; they are merely for differentiation. It is also important to remember that most RAID configurations require all member disks to be the same capacity (if not the same make & model).
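
The parity "fingerprint" mentioned above is, in the common parity-based levels, a byte-by-byte XOR of the data blocks. The sketch below shows the idea on three tiny blocks: XOR them together to make parity, then XOR the survivors with the parity to rebuild a missing block. The block contents and function name are invented for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"RAID", b"disk", b"safe"]      # one block per data disk
parity = xor_blocks(data)               # stored on a further disk

# Pretend the second disk has failed: rebuild it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'disk'
```

The same trick works whichever single disk is lost, including the one holding the parity.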

Level 0

Striped Disk Array without Fault Tolerance. RAID level 0, often called "striping", is a performance-orientated data mapping technique. "Striping" means the data being written to the array is broken down into sections, which are written simultaneously across all member disks of the array. Because the data is not stored contiguously on a single drive, it can be accessed in parallel: the whole is reconstructed from blocks read back simultaneously and presented to the requesting system at full interface speed, with the only seek delay being that required for a single block (as all drives seek together). This provides very high I/O performance (among the best) at low cost but provides no redundancy or Fault Tolerance at all. It is ideally suited to working buffers, where data is held temporarily and rapid access is required for work-in-progress. Because of the very high risk of loss, data should not reside on the array indefinitely but should be moved to more resilient storage or discarded once processed. RAID Level 0 requires a minimum of 2 drives to implement.
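
Striping as described above can be sketched as dealing chunks out to the disks in turn, like cards. The function names and the tiny chunk size are chosen purely for illustration; real arrays use much larger chunks and do the work in the controller.

```python
def stripe(data, n_disks, chunk_size):
    """Deal fixed-size chunks of data out to n_disks in round-robin order."""
    disks = [[] for _ in range(n_disks)]
    for i in range(0, len(data), chunk_size):
        chunk_no = i // chunk_size
        disks[chunk_no % n_disks].append(data[i:i + chunk_size])
    return disks


def unstripe(disks):
    """Reassemble the original data by reading chunks back in the same order."""
    out = []
    longest = max(len(d) for d in disks)
    for row in range(longest):
        for d in disks:
            if row < len(d):
                out.append(d[row])
    return b"".join(out)


layout = stripe(b"ABCDEFGHIJ", n_disks=2, chunk_size=2)
print(layout)            # [[b'AB', b'EF', b'IJ'], [b'CD', b'GH']]
print(unstripe(layout))  # b'ABCDEFGHIJ'
```

Notice that neither disk on its own holds anything usable, which is why losing one loses everything.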

Advantages

  • No parity calculation overhead is involved
  • Very simple design
  • Easy to implement
  • All the array capacity is available for storage so the cheapest $/GB
  • Probably the fastest Read & Write because of parallel access to disks
Disadvantages
  • Not a true RAID because it is not fault tolerant.
  • Data Availability is statistically worse than a simple, single disk - Loss of any one disk will destroy the data on the entire array.
  • Should never be used for permanent storage or in mission critical environments because of the high probability of data loss.
Recommended Applications
  • Unless capacity is key, consider using RAID 10 (to provide data security) which can deliver similar read speeds.
  • Video Production and Editing
  • Image Editing
  • Pre-Press Applications
  • Processing of existing data and text
  • Any application requiring high bandwidth
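
The claim that availability is statistically worse than a single disk follows from simple probability. Assuming each disk fails independently with the same probability over some period, a stripe set is lost if any one member fails. The 3% figure below is purely illustrative and the function name is hypothetical.

```python
def raid0_loss_probability(p_disk_failure, n_disks):
    """Chance of losing a stripe set, given an independent per-disk
    failure probability over some period. Any single failure is fatal."""
    return 1 - (1 - p_disk_failure) ** n_disks


# 3% is an illustrative annual failure rate, not a measured one.
for n in (1, 2, 4):
    print(n, round(raid0_loss_probability(0.03, n), 4))
# 1 0.03
# 2 0.0591
# 4 0.1147
```

Every disk added for speed also adds another way for the whole array to die.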

Level 0+1

High Data Transfer Performance. RAID 0+1 is NOT to be confused with RAID 10. Two sets of striped disks are mirrored and a single drive failure will cause the whole array to become, in essence, a RAID Level 0 array. Requires a minimum of 4 drives to implement.

Advantages

  • High I/O rates are achieved thanks to multiple stripe segments. Large memory buffers on disks and controllers can make RAID0+1 very fast for both read and write accesses.
  • Excellent solution for sites that need high performance but are not concerned with achieving maximum reliability
Disadvantages
  • Expensive due to 100% overhead - Only half the total capacity is available for storage.
  • All drives must move in parallel to give full potential. If the disks or controller cannot provide this function poor sustained I/O performance will result.
  • Limited scalability at high inherent cost. Once defined, a RAID 0+1 array is difficult to expand and must usually be destroyed and rebuilt.

                                                   

Level 1

Mirroring and Duplexing. RAID level 1, or mirroring, has been used longer than any other form of RAID. Level 1 provides redundancy by writing identical data to each member disk of the array, leaving a "mirrored" copy on each disk, thus a second copy of each data block is available should the first become un-usable. Mirroring remains popular due to its simplicity and high level of data availability. Level 1 operates with two or more disks that may use parallel access when reading to improve I/O performance. Level 1 provides very good basic data reliability and improves performance for read intensive applications but at relatively high cost. For best performance, the controller must be able to perform two concurrent separate reads per mirrored pair or two duplicate writes per mirrored pair. RAID Level 1 requires a minimum of 2 drives to implement.

Advantages

  • Two Reads possible per mirrored pair - twice the Read transaction rate of a single disk. Write transaction rates are the same as a single disk
  • 100% redundancy of data means no rebuild is necessary in case of a disk mechanism failure, just copy the data to the replacement disk
  • Under certain circumstances, RAID 1 can sustain multiple simultaneous drive failures
  • Simplest true RAID storage subsystem design
Disadvantages
  • Expensive due to 100% overhead - Only half the capacity is available for storage
  • Typically the RAID function is done by system software, possibly degrading throughput at high activity levels. Hardware implementation is strongly recommended. Also may not support hot swap of failed disk mechanisms when implemented in software
  • Limited scalability at high inherent cost. Once defined, a RAID 1 array is difficult to expand and must usually be destroyed and rebuilt
Recommended Applications
  • Accounting/Payroll/Financial
  • Any application requiring high availability

                                                   

Level 10

High Reliability combined with High Performance. Not to be confused with RAID 0+1, RAID 10 is implemented as a striped array whose segments are RAID 1 arrays. It has the same fault tolerance as RAID level 1 with the same overhead for fault tolerance as mirroring alone. RAID Level 10 requires a minimum of 4 drives to implement. It is arguably the most common RAID level and provides an excellent trade-off between simplicity of implementation and speed in use.

Advantages

  • High I/O rates are achieved by striping RAID 1 segments
  • Under certain circumstances, RAID 10 array can sustain multiple simultaneous drive failures
  • Excellent solution for sites that would have otherwise gone with RAID 1 but need some additional performance boost. Large memory buffers on disks and controllers can make RAID10 very fast for both read and write accesses.
Disadvantages
  • Expensive due to 100% overhead - Only half the capacity is available for storage
  • Ideally all drives must move in parallel; where they cannot, sustained performance is lowered
  • Limited scalability at high inherent cost. Once defined, a RAID 10 array is difficult to expand and must usually be destroyed and rebuilt

Recommended Applications

  • High throughput transactional Databases requiring maximum performance with fault tolerance
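
The difference between RAID 10 and RAID 0+1 shows up when a second disk fails. The sketch below tries every possible double failure on a four-disk array. It assumes a simple RAID 0+1 implementation that takes a whole stripe set offline as soon as one of its members fails; the function names and disk numbering are invented for illustration.

```python
from itertools import combinations


def survives_raid10(failed, mirror_pairs):
    """RAID 10 survives unless both members of some mirrored pair are lost."""
    return all(not set(pair) <= failed for pair in mirror_pairs)


def survives_raid01(failed, stripe_sets):
    """RAID 0+1 survives only while at least one stripe set is fully intact."""
    return any(not (set(s) & failed) for s in stripe_sets)


disks = range(4)
groups = [(0, 1), (2, 3)]  # mirrored pairs for RAID 10, stripe sets for 0+1

double_failures = [set(c) for c in combinations(disks, 2)]
ok10 = sum(survives_raid10(f, groups) for f in double_failures)
ok01 = sum(survives_raid01(f, groups) for f in double_failures)
print(f"RAID 10 survives {ok10} of {len(double_failures)} double failures")
print(f"RAID 0+1 survives {ok01} of {len(double_failures)} double failures")
# RAID 10 survives 4 of 6 double failures
# RAID 0+1 survives 2 of 6 double failures
```

Same disks, same cost, but under these assumptions RAID 10 tolerates twice as many of the possible double failures.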

                                                   

Level 2

Hamming Code ECC. Each bit of a data word is written to a data disk drive. Each data word has a Hamming Code or Error Correction Code (ECC) word recorded on the ECC disks. On Read, the ECC verifies correct data or corrects single disk errors.

Advantages

  • "On the fly" data error correction
  • The higher the data transfer rate required, the better the ratio of data disks to ECC disks
  • Relatively simple controller design compared to RAID levels 3,4 & 5
Disadvantages
  • Expensive due to 100 - 200% overhead
  • Entry level cost prohibitively high for small businesses
  • Low I/O transaction rate. Equal to a single disk at best (with spindle synchronization)
  • Not commercially viable so difficult to find an implementation
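
To give a feel for how a Hamming code corrects errors "on the fly", here is a sketch using the classic Hamming(7,4) code: four data bits protected by three check bits. In RAID 2 terms each of the seven bits would live on its own disk. The choice of (7,4) is an assumption made for illustration; the text above does not say what word size any real RAID 2 product used.

```python
def hamming74_encode(d):
    """Encode 4 data bits as 7 bits laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(word):
    """Fix at most one flipped bit and return (data_bits, error_position).

    error_position is 1-based, or 0 when no error was detected.
    """
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]   # checks positions 1, 3, 5, 7
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]   # checks positions 2, 3, 6, 7
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]   # checks positions 4, 5, 6, 7
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        w[pos - 1] ^= 1
    return [w[2], w[4], w[5], w[6]], pos


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                       # simulate one "disk" returning a bad bit
print(hamming74_correct(word))     # ([1, 0, 1, 1], 5)
```

Three check disks for every four data disks also shows where the high overhead of this level comes from.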

                                                   

Level 3

Parallel transfer with parity. RAID 3 adds redundant information in the form of parity to a parallel access striped array, permitting regeneration and rebuilding in the event of a disk failure. One strip of parity protects corresponding strips of data on the remaining disks. RAID 3 provides a high data transfer rate and high data availability, at an inherently lower cost than mirroring. Its transaction performance is poor, however, because the array member disks operate in lockstep. RAID Level 3 requires a minimum of 3 drives to implement.

Advantages

  • Very high Read data transfer rate
  • Very high Write data transfer rate
  • Disk failure has an insignificant impact on throughput
  • Low ratio of ECC (Parity) disks to data disks means high efficiency
  • Greater proportion of capacity is available for storage than with mirrored RAID levels - roughly 33% overhead with the minimum of 3 disks, and less as disks are added
Disadvantages
  • Transaction rate on many small files is poor, probably no better than that of a single disk drive at best (if spindles are synchronized)
  • Controller design is fairly complex
  • Very difficult and resource intensive to do as a software RAID
Recommended Applications
  • Video streaming
  • Image Editing
  • Video Editing
  • Any application requiring contiguous access to large files

                                                   

Level 30 and 03

See RAID53
 

Level 4

Independent Data disks with shared Parity disk. Like level 3, level 4 uses parity concentrated on a single disk to protect data. Unlike level 3, level 4 member disks are independently accessible making it better suited to transaction I/O rather than large file transfers. Because the dedicated parity disk represents an inherent bottleneck, level 4 is seldom used without accompanying technologies such as write back caching. Each entire block is written onto a data disk. Parity for same rank blocks is generated on Writes, recorded on the parity disk and checked on Reads. RAID Level 4 requires a minimum of 3 drives to implement
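
Why the parity disk becomes a bottleneck is easier to see in code. When one data block changes, the parity can be brought up to date from just the old data, the new data and the old parity, so only two disks are touched. One of them, though, is always the parity disk. The sketch below assumes plain XOR parity; the function names and block contents are invented for illustration.

```python
def xor(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(data_disks, parity, disk_no, new_block):
    """Update one data block and return the new parity block.

    Only the target disk and the parity disk are touched:
    new_parity = old_parity XOR old_data XOR new_data.
    """
    old_block = data_disks[disk_no]
    data_disks[disk_no] = new_block
    return xor(xor(parity, old_block), new_block)


disks = [b"aaaa", b"bbbb", b"cccc"]
parity = xor(xor(disks[0], disks[1]), disks[2])

parity = small_write(disks, parity, 1, b"ZZZZ")
# The shortcut must agree with recomputing parity from scratch.
print(parity == xor(xor(disks[0], disks[1]), disks[2]))  # True
```

Writes to different data disks could otherwise proceed in parallel, but they all queue up behind the single parity disk.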

Advantages

  • Very high Read data transaction rate
  • Low ratio of ECC (Parity) disks to data disks means high efficiency
  • Greater proportion of capacity is available for storage than with mirrored RAID levels - roughly 33% overhead with the minimum of 3 disks, and less as disks are added
  • High aggregate Read transfer rate
Disadvantages
  • Quite complex controller design
  • Worst Write transaction rate and Write aggregate transfer rate on large quantities of small files
  • Difficult and inefficient data rebuild in the event of disk failure
  • Block Read transfer rate equal to that of a single disk
Recommended Applications
  • Video streaming
  • Image Editing
  • Video Editing
  • Any application requiring contiguous access to large files

                                                   

Level 5

Independent Data disks with distributed parity blocks. By distributing parity across some or all of an array's member disks, RAID level 5 reduces (but does not eliminate) the write bottleneck inherent to level 4. As with level 4, the result is asymmetrical performance, with reads substantially outperforming writes. Level 5 is often used with caching to reduce the asymmetry. Each entire data block is written on a data disk; parity for blocks in the same rank is generated on Writes, recorded in a distributed location and checked on Reads. RAID 5 requires a minimum of 3 drives to implement but provides a higher proportion of the array as usable storage than RAID1 and its variants. Certain variants (usually non standard and collectively termed RAID 5+) use a second disk for parity thus permitting multiple simultaneous failures.
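
Distributed parity is easiest to picture as a grid, one column per disk and one row per stripe. The sketch below prints such a grid with the parity block moving one disk to the left on each stripe. This particular rotation is only one of several in use and is an assumption made for illustration, as is the function name.

```python
def raid5_layout(n_disks, n_stripes):
    """Return a grid of labels showing where data and parity blocks sit.

    Parity starts on the last disk and moves one disk to the left on each
    successive stripe. This is one common rotation; controllers differ.
    """
    rows = []
    block = 0
    for stripe in range(n_stripes):
        parity_disk = (n_disks - 1 - stripe) % n_disks
        row = []
        for disk in range(n_disks):
            if disk == parity_disk:
                row.append(f"P{stripe}")
            else:
                row.append(f"D{block}")
                block += 1
        rows.append(row)
    return rows


for row in raid5_layout(n_disks=3, n_stripes=3):
    print(" ".join(row))
# D0 D1 P0
# D2 P1 D3
# P2 D4 D5
```

Because every disk takes its turn holding parity, parity writes are shared out rather than all landing on one mechanism.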

Advantages

  • High Read data transaction rate - may approach striped data speeds
  • Medium Write data transaction rate
  • Greater proportion of capacity is available for storage than with mirrored RAID levels - roughly 33% overhead with the minimum of 3 disks, and less as disks are added
  • Good aggregate transfer rate
  • Possibly the most popular RAID configuration
Disadvantages
  • Disk failure has a medium impact on throughput
  • Complex controller design
  • Intensive rebuild in the event of a disk failure. While suffering a disk failure, simple RAID 5 arrays may have no further fault tolerance until the failed disk has been rebuilt, and the rebuild process itself places increased stress on the remaining members, in turn increasing the chance of further failure
  • Individual block data transfer rate same as single disk
Recommended Applications
  • File and Application servers
  • Low throughput Database servers (high transaction rates on large DBs will be severely impacted by RAID 5)
  • WWW, E-mail, and News servers
  • Intranet servers
  • Live storage and backup
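
The 33% overhead figure quoted above corresponds to a three-disk array, where one disk's worth of space in three goes to parity. The share shrinks as members are added, as this small sketch shows. The function name is invented for illustration and equal-sized disks are assumed.

```python
def parity_overhead(n_disks, parity_disks=1):
    """Fraction of raw capacity given over to parity."""
    if n_disks <= parity_disks:
        raise ValueError("need more disks than parity disks")
    return parity_disks / n_disks


for n in (3, 4, 8):
    print(n, f"{parity_overhead(n):.1%}")
# 3 33.3%
# 4 25.0%
# 8 12.5%
```

Compare this with mirroring, where the overhead stays at half the raw capacity however many disks are used.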

                                                   

Level 53

High I/O Rates and Data Transfer Performance. RAID 53 should really be called RAID 03 because it is implemented as a striped (RAID level 0) array whose segments are RAID 3 arrays. RAID 53 has the same fault tolerance and fault tolerance overhead as RAID 3. RAID 53 requires a minimum of 5 drives to implement

Advantages

  • High data transfer rates are achieved thanks to its RAID 3 array segments
  • High I/O rates for small requests are achieved thanks to its RAID 0 striping
Disadvantages
  • Very expensive to implement
  • All disk spindles must be synchronized, which limits the choice of drives
  • Byte striping results in poor utilization of formatted capacity
Recommended Applications
  • File and Application servers
  • Database servers
  • WWW, E-mail, and News servers
  • Intranet servers
  • Live document storage and backup

                                                   

Level 6

Independent Data disks with two independent distributed parity schemes. RAID 6 is essentially an extension of RAID level 5 which allows for additional fault tolerance by using a second independent distributed parity scheme (two-dimensional parity)
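
The text above does not spell out how the second parity is calculated, and implementations vary. One widely used approach, sketched here as an assumption rather than as the definitive RAID 6 method, keeps the ordinary XOR parity (P) plus a second value (Q) computed with Galois field arithmetic, so that any two lost data blocks can be solved for. All names and block contents below are illustrative.

```python
# GF(2^8) log/antilog tables using the polynomial 0x11d and generator 2.
EXP = [0] * 510
LOG = [0] * 256
value = 1
for i in range(255):
    EXP[i] = value
    LOG[value] = i
    value <<= 1
    if value & 0x100:
        value ^= 0x11D
for i in range(255, 510):
    EXP[i] = EXP[i - 255]


def gf_mul(a, b):
    """Multiply two field elements."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    """Divide a by a non-zero field element b."""
    return 0 if a == 0 else EXP[LOG[a] - LOG[b] + 255]


def pq_parity(data):
    """Return (P, Q) for a list of equal-length data blocks."""
    size = len(data[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(data):
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(EXP[i], byte)
    return bytes(p), bytes(q)


def recover_two(data, p, q, x, y):
    """Rebuild data blocks x and y (both lost) from the survivors, P and Q."""
    size = len(p)
    survivors = [bytes(size) if i in (x, y) else b for i, b in enumerate(data)]
    p_part, q_part = pq_parity(survivors)
    dx, dy = bytearray(size), bytearray(size)
    denom = EXP[x] ^ EXP[y]
    for j in range(size):
        pxy = p[j] ^ p_part[j]            # equals Dx ^ Dy
        qxy = q[j] ^ q_part[j]            # equals g^x*Dx ^ g^y*Dy
        dx[j] = gf_div(qxy ^ gf_mul(EXP[y], pxy), denom)
        dy[j] = pxy ^ dx[j]
    return bytes(dx), bytes(dy)


data = [b"alpha", b"bravo", b"delta", b"gamma"]
p, q = pq_parity(data)
damaged = [data[0], None, data[2], None]   # two disks lost at once
print(recover_two(damaged, p, q, 1, 3))    # (b'bravo', b'gamma')
```

The extra arithmetic on every write is a fair illustration of why controller overhead and write performance appear among the disadvantages below.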

Advantages

  • Data is striped on a block level across a set of drives, just like in RAID 5, and a second set of parity is calculated and written across all the drives and so provides for an extremely high data fault tolerance and can sustain multiple simultaneous drive failures
  • Perfect solution for mission critical applications
Disadvantages
  • Very complex controller design
  • Controller overhead to compute parity addresses is high
  • Very poor write performance
  • Requires N+2 drives to implement because of two-dimensional parity scheme
Recommended Applications
  • File and Application servers
  • Low throughput Database servers (high transaction rates on large DBs will be severely impacted by RAID 6)
  • WWW, E-mail, and News servers
  • Intranet servers
  • Live storage and backup

                                                   

Level 7

Optimized Asynchrony for High I/O Rates as well as High Data Transfer Rates. All I/O transfers are asynchronous, independently controlled and cached including host interface transfers. All reads and writes are centrally cached via the high speed X-Bus. Dedicated parity drive can be on any channel. Fully implemented process oriented real time operating system resident on embedded array control microprocessor. Embedded real time operating system controlled communications channel. Open system uses standard SCSI drives, standard PC buses, motherboards and memory SIMMs. High speed internal cache data transfer bus. Parity generation integrated into cache. Multiple attached drive devices can be declared hot standbys. Manageability: SNMP agent allows for remote monitoring and management

Advantages

  • Overall write performance is around twice as fast as a single spindle
  • Read performance is 1.5 to 6 times better than other array levels
  • Host interfaces are scalable for connectivity or increased bandwidth
  • Small reads in multi-user environment have very high cache hit rate resulting in near zero access times
  • Write performance improves with an increase in the number of drives in the array
  • Access times decrease with each increase in the number of actuators in the array
  • No extra data transfers required for parity manipulation
  • RAID 7 is a registered trademark of Storage Computer Corporation.
Disadvantages
  • One vendor proprietary solution
  • Extremely high $/GB
  • Very short warranty
  • Not user serviceable
  • Power supply must be UPS to prevent loss of cache data
Recommended Applications
  • Corporate data mining
  • Large/High throughput Database servers
  • "Big data" search engines

                                                   

JABOD (or sometimes JBOD)

Not really a RAID configuration, it is included here because many RAID controllers can support hard disks just as a simple controller would. When working in this mode it is termed "Just A Bunch Of Disks". Clearly, the RAID controller is being under-utilized and offers none of the advantages detailed above except that all the disks are available for storage. Some controllers allow the disks to be "Spanned" in JABOD. Subject disks become a single contiguous array, similar in concept to RAID0 except that spanning supports different capacity disks and each disk is filled consecutively.
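
Spanning as described above amounts to a simple address calculation: walk along the disks, subtracting each one's size, until the requested block falls inside one of them. The sketch below uses made-up disk sizes counted in blocks and a hypothetical function name.

```python
def locate_block(logical_block, disk_sizes):
    """Map a logical block number to (disk_index, block_on_that_disk).

    Disks are filled one after another, so sizes need not match.
    """
    offset = logical_block
    for index, size in enumerate(disk_sizes):
        if offset < size:
            return index, offset
        offset -= size
    raise IndexError("block lies beyond the end of the span")


sizes = [100, 250, 50]           # blocks per disk, deliberately unequal
print(locate_block(0, sizes))    # (0, 0)
print(locate_block(120, sizes))  # (1, 20)
print(locate_block(399, sizes))  # (2, 49)
```

Unlike striping, consecutive blocks sit on the same disk, so spanning gains capacity but no speed.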

Proprietary and Modern Filesystems

RAID is getting quite old and has always approached the problem of data redundancy by duplicating disk blocks (which is the main reason behind individual member disks being the same capacity). New filesystems and approaches are producing a raft of generally proprietary systems. These are often given RAID-like names to capitalise on a familiar phrase and concept. Data Robotics use a system called "Beyond RAID" in their Drobo series of filers. This uses a non-block approach to data security, instead splitting data into variable size files on a proprietary filesystem. This provides excellent redundancy and allows for different size disks to be used without compromising either capacity or redundancy. It also supports two parity disks (see RAID 6). Read and Write times are comparable with RAID 5 but deletes can be very slow, especially on larger files - possibly due to the filer being a Linux based system and so the filesystem likely being based on ext3 (on deletion ext3 must walk and clear every indirect block that maps the file rather than just marking the space as free in the directory - large files have lots of these blocks).

New filesystems have RAID-like features built in without the need to apply some other scheme at a hardware level. These are arguably a better path to follow: being either non-proprietary or ubiquitous, there is more chance the disks are transferable between systems while keeping the data intact.

Last Words on Data Security

A RAID array is tough and, with proper maintenance, will provide the best security for your data, but it is not infallible. Most importantly, it is not a replacement for a proper back-up regimen, even though it might form a crucial part of it. Backing up local disk to a RAID array is excellent practice but the array itself needs to be backed up also. If you are serious about irreplaceable data, you have to accept an unpleasant truth: one day it is going to let you down. WHEN that day comes, you need to have your last chance copy of the data to ensure you can rebuild your working data. Computers can be rebuilt and their Operating Systems can be replaced. Last year's accounts, all your live invoices and customer records cannot, at least not without much pain and work. You absolutely must have a secondary copy of all live data and it should not be stored close to the original - if your offices burn down, this second copy is going to be no use if it was kept in the same server room. Be smart, accept this truth and invest in some off-site data storage. At the most basic (as that is what most small businesses can afford) buy some good synchronization software and a large external disk and take it with you when you go home each night. If you never have to use it, be happy!