Redundant Array of Independent Disks - RAID
Ensuring that data survives a disk failure is a major worry for most people and businesses. RAID is about making sure you can get to your files when you want them and protecting them when something goes wrong with the disks that store them. Unbelievably, some people ignore the potential problem and do not perform even the most basic back-ups. Some even store everything on a single USB disk, never considering that it is just as likely to fail as the disk in their PC. The author has a friend who lost thousands of photographs for exactly this reason. Only when it was too late did he understand the importance of backing up irreplaceable data. A simple question to ask is: "If I lost this, would it hurt?" If the answer is YES then you need to make it NO. It is that easy.
A good way of safeguarding data is to use several disks to store it, so that the data always exists in more than one place. This might simply be a case of copying one disk to another, but a better way is to spread the data across the disks and use some mathematics and electronics to make sure it stays safe. A RAID array does this and, as a bonus, is usually faster than a single disk.
RAID comes in several different flavours called "Levels", and each has strengths and weaknesses. What follows is a discussion of the levels you are most likely to encounter, with the pros and cons of each.
[Diagram: the CAP Triangle]
The above diagram is called the "CAP Triangle". It represents the extent of three critical requirements of disk storage: Cost, Availability (how likely your data is to survive a problem) and Performance (how quickly you can read and write that data). Every disk storage method can be thought of as occupying a point within this triangle: the closer it sits to an edge, the more of that property it possesses; the further away, the less.
As a rule of thumb, each disk mechanism in a RAID set needs to be of the same capacity and preferably the same model. Most RAID controllers will permit some "mix n matching", but the lowest capacity of any single drive will be imposed on all disks in the array. For example, if a simple RAID 5 array is composed of 1x320GB and 3x500GB member disks, the whole group will be treated as 4x320GB disks. Some RAID controllers work with only a very limited range of disks, because the low-level commands of the disk interface are closely coupled to the controller firmware to gain performance. In all instances you will need to check that the disks and controllers you plan to use are compatible.
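The mixed-capacity example above can be worked through in a few lines. This is only a sketch: the function names are made up for illustration, and it assumes the usual RAID 5 arithmetic in which one member's worth of space is spent on parity.

```python
def effective_member_size(disk_sizes_gb):
    """Every member is treated as if it were the smallest disk in the set."""
    return min(disk_sizes_gb)


def raid5_usable_gb(disk_sizes_gb):
    """Usable space of a RAID 5 set: (members - 1) x smallest member."""
    if len(disk_sizes_gb) < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return effective_member_size(disk_sizes_gb) * (len(disk_sizes_gb) - 1)


disks = [320, 500, 500, 500]  # the 1x320GB + 3x500GB example
member = effective_member_size(disks)
print(member)                            # 320
print(raid5_usable_gb(disks))            # 960
print(sum(disks) - member * len(disks))  # 540 (GB left unused on the larger disks)
```

The last figure shows the cost of mixing sizes: the extra space on the three larger disks is simply not used by the array.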
Software RAID
This article deals mainly with RAID provided by hardware specifically designed for the purpose, but it can also be achieved in software. Software RAID falls into two broadly related categories. In all cases it is implemented by either the BIOS on
RAID Levels
The most common RAID levels, listed below, approach data redundancy by duplicating physical blocks of disk storage (i.e. at a hardware level on the actual disk platters) across multiple mechanisms. Some methods enhance this further by using mathematical tricks to store a fingerprint of the data (parity) on another disk, which can be used to rebuild any missing data in the event of a failure. Each RAID level has its own benefits and drawbacks. This overview attempts to highlight each and help you find the right RAID level for your particular application. Please note that the numbers assigned to each level of RAID do not indicate superiority; they are merely for differentiation. It is also important to remember that most RAID configurations require all member disks to be the same capacity (if not the same make and model).
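To make the parity "fingerprint" less abstract, here is a minimal sketch. It assumes the parity is a byte-wise XOR of the data blocks, which is the usual choice for single-parity levels; the article itself does not name the calculation, and the helper name `xor_blocks` is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks, each holding one 4-byte block.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)  # stored on a further disk

# Simulate losing the second data disk: XOR everything that is left.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'BBBB'
```

Because XOR is its own inverse, combining the parity with the surviving blocks gives back exactly the block that was lost. It can only do this for one missing block per stripe.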
Level 0
Striped Disk Array without Fault Tolerance. RAID level 0, often called "striping", is a performance-orientated data mapping technique. "Striping" means the data being written to the array is broken down into sections, which are written simultaneously across all member disks of the array. Because the data is not stored contiguously on a single drive, it can be accessed in parallel: the blocks are read back simultaneously, reassembled and presented to the requesting system at full interface speed, with the only seek delay being that required for a single block (as all drives seek together). This provides very high I/O performance (among the best) at low cost but no redundancy or fault tolerance at all. It is ideally suited to working buffers, where data is held temporarily and rapid access is required for work in progress. Because of the very high risk of loss, data is not expected to reside on the array indefinitely but to be moved to more resilient storage or discarded once processed. RAID Level 0 requires a minimum of 2 drives to implement.
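Striping as described for Level 0 can be sketched as a round-robin split. The chunk size and function names here are illustrative assumptions, and a real controller works on disk blocks rather than Python lists.

```python
def stripe(data, num_disks, chunk_size):
    """Deal fixed-size chunks of data out to the disks in round-robin order."""
    disks = [[] for _ in range(num_disks)]
    for start in range(0, len(data), chunk_size):
        chunk_no = start // chunk_size
        disks[chunk_no % num_disks].append(data[start:start + chunk_size])
    return disks


def unstripe(disks):
    """Reassemble the data by reading one chunk from each disk in turn."""
    chunks = []
    for row in range(max(len(d) for d in disks)):
        for disk in disks:
            if row < len(disk):
                chunks.append(disk[row])
    return b"".join(chunks)


disks = stripe(b"ABCDEFGHIJ", num_disks=3, chunk_size=2)
print(disks)            # [[b'AB', b'GH'], [b'CD', b'IJ'], [b'EF']]
print(unstripe(disks))  # b'ABCDEFGHIJ'
```

No disk holds a complete copy of anything, so losing any one of them leaves gaps throughout the data. That is why Level 0 offers no fault tolerance.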
Level 0+1
High Data Transfer Performance. RAID 0+1 is NOT to be confused with RAID 10. Two sets of striped disks are mirrored, and a single drive failure will cause the whole array to become, in essence, a RAID Level 0 array. RAID Level 0+1 requires a minimum of 4 drives to implement.
Level 1
Mirroring and Duplexing. RAID level 1, or mirroring, has been used longer than any other form of RAID. Level 1 provides redundancy by writing identical data to each member disk of the array, leaving a "mirrored" copy on each disk, so a second copy of each data block is available should the first become unusable. Mirroring remains popular due to its simplicity and high level of data availability. Level 1 operates with two or more disks, which may use parallel access when reading to improve I/O performance. Level 1 provides very good basic data reliability and improves performance for read-intensive applications, but at relatively high cost. For best performance, the controller must be able to perform two concurrent separate reads per mirrored pair or two duplicate writes per mirrored pair. RAID Level 1 requires a minimum of 2 drives to implement.
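The mirroring behaviour described for Level 1 can be modelled in a few lines. This is a toy sketch: the `Mirror` class and its methods are hypothetical, and each "disk" is just a dictionary of block number to data.

```python
class Mirror:
    """Toy RAID 1 set: every write goes to all healthy disks."""

    def __init__(self, num_disks=2):
        if num_disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        self.disks = [{} for _ in range(num_disks)]
        self.failed = set()

    def write(self, block_no, data):
        for disk_no, disk in enumerate(self.disks):
            if disk_no not in self.failed:
                disk[block_no] = data

    def fail(self, disk_no):
        """Simulate a dead disk: its contents are gone."""
        self.failed.add(disk_no)
        self.disks[disk_no].clear()

    def read(self, block_no):
        """Return the block from the first disk that is still healthy."""
        for disk_no, disk in enumerate(self.disks):
            if disk_no not in self.failed:
                return disk[block_no]
        raise IOError("all mirrors have failed")


array = Mirror(num_disks=2)
array.write(0, b"invoice data")
array.fail(0)
print(array.read(0))  # b'invoice data'
```

A real controller would also spread reads across the healthy members to gain the read performance mentioned above. This sketch always reads from the first healthy disk to keep the logic short.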
Level 10
High Reliability combined with High Performance. Not to be confused with RAID 0+1, RAID 10 is implemented as a striped array whose segments are RAID 1 arrays. It has the same fault tolerance as RAID level 1, with the same overhead for fault tolerance as mirroring alone. RAID Level 10 requires a minimum of 4 drives to implement. It is arguably the most common RAID level and provides an excellent trade-off between simplicity of implementation and speed in use.
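The difference between RAID 10 and RAID 0+1 is easiest to see by counting which disk failures each layout survives. The sketch below assumes the minimum four-disk arrangement, with disks 0 and 1 forming one group and disks 2 and 3 the other. The function names are illustrative only.

```python
from itertools import combinations

# Hypothetical 4-disk layout: two groups of two disks each.
GROUPS = [{0, 1}, {2, 3}]


def raid10_survives(failed):
    """Groups are mirrored pairs, striped together.
    Each pair needs at least one working disk."""
    return all(not group <= failed for group in GROUPS)


def raid01_survives(failed):
    """Groups are striped sets, mirrored against each other.
    At least one striped set must be completely intact."""
    return any(not (group & failed) for group in GROUPS)


def count_survivable(survives, num_failures):
    """How many combinations of num_failures dead disks the array survives."""
    return sum(survives(set(c)) for c in combinations(range(4), num_failures))


print(count_survivable(raid10_survives, 1))  # 4
print(count_survivable(raid01_survives, 1))  # 4
print(count_survivable(raid10_survives, 2))  # 4 (of 6 possible pairs)
print(count_survivable(raid01_survives, 2))  # 2 (of 6 possible pairs)
```

Both layouts survive any single failure. A second failure is where they differ: RAID 0+1 has already lost one whole striped set, so any further failure in the other set is fatal.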
Level 2
Hamming Code ECC. Each bit of a data word is written to a data disk drive. Each data word has a Hamming Code, or Error Correction Code (ECC), word recorded on the ECC disks. On read, the ECC code verifies correct data or corrects single-disk errors.
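The article does not say which Hamming code a RAID 2 array would use. As an illustration only, the sketch below assumes the classic Hamming(7,4) code: four data bits (think of four data disks) protected by three check bits (three ECC disks). The function names are hypothetical.

```python
def hamming_encode(data_bits):
    """Encode 4 data bits as a 7-bit word (positions 1..7).
    Check bits sit at positions 1, 2 and 4; data at 3, 5, 6 and 7."""
    c = [0] * 8  # index 0 unused so indices match bit positions
    c[3], c[5], c[6], c[7] = data_bits
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    return c[1:]


def hamming_correct(word):
    """Return (data_bits, bad_position). bad_position is 0 if the word was
    clean, otherwise the 1-based position of the single bit that was fixed."""
    c = [0] + list(word)
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    bad = s1 + 2 * s2 + 4 * s4
    if bad:
        c[bad] ^= 1  # flip the faulty bit back
    return [c[3], c[5], c[6], c[7]], bad


word = hamming_encode([1, 0, 1, 1])
word[4] ^= 1                  # corrupt position 5 (one "disk" returns a bad bit)
print(hamming_correct(word))  # ([1, 0, 1, 1], 5)
```

The three check results point directly at the faulty position. That is how the read path can both detect and correct a single bad bit.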
Level 3
Parallel transfer with parity. RAID 3 adds redundant information in the form of parity to a parallel access striped array, permitting regeneration and rebuilding in the event of a disk failure. One strip of parity protects corresponding strips of data on the remaining disks. RAID 3 provides a high data transfer rate and high data availability, at an inherently lower cost than mirroring. Its transaction performance is poor, however, because the array member disks operate in lockstep. RAID Level 3 requires a minimum of 3 drives to implement.
Level 30 and 03
See RAID 53.
Level 4
Independent Data disks with shared Parity disk. Like level 3, level 4 uses parity concentrated on a single disk to protect data. Unlike level 3, level 4 member disks are independently accessible, making it better suited to transaction I/O than to large file transfers. Because the dedicated parity disk represents an inherent bottleneck, level 4 is seldom used without accompanying technologies such as write-back caching. Each entire block is written onto a data disk. Parity for same-rank blocks is generated on writes, recorded on the parity disk and checked on reads. RAID Level 4 requires a minimum of 3 drives to implement.
Level 5
Independent Data disks with distributed parity blocks. By distributing parity across some or all of an array's member disks, RAID level 5 reduces (but does not eliminate) the write bottleneck inherent to level 4. As with level 4, the result is asymmetrical performance, with reads substantially outperforming writes. Level 5 is often used with caching to reduce the asymmetry. Each entire data block is written on a data disk; parity for blocks in the same rank is generated on writes, recorded in a distributed location and checked on reads. RAID 5 requires a minimum of 3 drives to implement but provides a higher proportion of the array as usable storage than RAID 1 and the mirrored variants (RAID 10 and RAID 0+1); RAID 0 offers more usable space still, but with no redundancy. Certain variants (usually non-standard and collectively termed RAID 5+) use a second disk for parity, thus permitting multiple simultaneous failures.
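The difference between the dedicated parity disk of Level 4 and the distributed parity of Level 5 can be pictured as a layout table. The rotation rule used for Level 5 below is just one possible scheme chosen for illustration, since real controllers differ. All names are hypothetical.

```python
def parity_disk_raid4(stripe_no, num_disks):
    """Level 4: parity always lives on the same (here, the last) disk."""
    return num_disks - 1


def parity_disk_raid5(stripe_no, num_disks):
    """Level 5: parity moves to a different disk for each stripe.
    Assumed rotation: start on the last disk and step backwards."""
    return (num_disks - 1 - stripe_no) % num_disks


def layout(parity_disk_fn, num_disks, num_stripes):
    """One row per stripe; 'P' marks the parity block, 'D' a data block."""
    rows = []
    for stripe_no in range(num_stripes):
        p = parity_disk_fn(stripe_no, num_disks)
        rows.append("".join("P" if d == p else "D" for d in range(num_disks)))
    return rows


print(layout(parity_disk_raid4, 4, 4))  # ['DDDP', 'DDDP', 'DDDP', 'DDDP']
print(layout(parity_disk_raid5, 4, 4))  # ['DDDP', 'DDPD', 'DPDD', 'PDDD']
```

In the Level 4 layout every write has to update the same disk, which is the bottleneck described above. In the Level 5 layout the parity updates are shared between all members.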
Level 53
High I/O Rates and Data Transfer Performance. RAID 53 should really be called RAID 03 because it is implemented as a striped (RAID level 0) array whose segments are RAID 3 arrays. RAID 53 has the same fault tolerance and fault tolerance overhead as RAID 3. RAID 53 requires a minimum of 5 drives to implement.
Level 6
Independent Data disks with two independent distributed parity schemes. RAID 6 is essentially an extension of RAID level 5 which allows for additional fault tolerance by using a second independent distributed parity scheme (two-dimensional parity).
Level 7
Optimized Asynchrony for High I/O Rates as well as High Data Transfer Rates. All I/O transfers are asynchronous, independently controlled and cached, including host interface transfers. All reads and writes are centrally cached via the high speed
JABOD (or sometimes JBOD)
Not really a RAID configuration, this is included here because many RAID controllers can drive hard disks just as a simple controller would. When working in this mode it is termed "Just A Bunch Of Disks". Clearly, the RAID controller is being under-utilized and offers none of the advantages detailed above, except that all the disks are available for storage. Some controllers allow the disks to be "Spanned" in JABOD. The disks then become a single contiguous array, similar in concept to RAID 0 except that spanning supports disks of different capacities and each disk is filled consecutively.
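Spanning can be sketched as a simple address lookup: logical blocks fill the first disk completely, then the second, and so on. The function name and the block counts are illustrative assumptions.

```python
def locate_spanned(block_no, disk_sizes):
    """Map a logical block number to (disk index, block offset on that disk)
    for disks that are filled one after another. Sizes are in blocks."""
    if block_no < 0:
        raise IndexError("block number must not be negative")
    remaining = block_no
    for disk_no, size in enumerate(disk_sizes):
        if remaining < size:
            return disk_no, remaining
        remaining -= size
    raise IndexError("block number is beyond the end of the span")


sizes = [100, 250, 50]  # three disks of different capacity
print(locate_spanned(0, sizes))    # (0, 0)
print(locate_spanned(99, sizes))   # (0, 99)
print(locate_spanned(100, sizes))  # (1, 0)
print(locate_spanned(399, sizes))  # (2, 49)
```

Unlike striping, neighbouring blocks usually sit on the same disk. This means there is no parallel-access speed-up, but every disk's full capacity is used whatever its size.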
Proprietary and Modern Filesystems
RAID is getting quite old and has always approached the problem of data redundancy by duplicating disk blocks (which is the main reason individual member disks need to be the same capacity). New filesystems and approaches are producing a raft of generally proprietary systems. These are often given RAID-like names to capitalise on a familiar phrase and concept. Data Robotics use a system called "Beyond RAID" in their DroBo series of filers. This uses a non-block approach to data security, instead splitting data into variable-size files on a proprietary filesystem. This provides excellent redundancy and allows different size disks to be used without compromising either capacity or redundancy. It also supports two parity disks (see RAID 6). Read and write times are comparable with RAID 5 but deletes can be very slow, especially on larger files. This is possibly because the filer is a Linux-based system and its filesystem is likely based on ext3 (on delete, ext3 walks and clears a file's block pointers rather than just marking the space as free in the directory, and large files have many of them).
New filesystems have RAID-like features built in, without the need to apply some other scheme at a hardware level. These are arguably a better path to follow: being either non-proprietary or ubiquitous, there is more chance the disks are transferable between systems while keeping the data intact.
Last Words on Data Security
A RAID array is tough and, with proper maintenance, will provide the best security for your data, but it is not infallible. Most importantly, it is not a replacement for a proper back-up regimen, even though it might form a crucial part of one. Backing up local disks to a RAID array is excellent practice, but the array itself needs to be backed up also. If you are serious about irreplaceable data, you have to accept an unpleasant truth: one day the array is going to let you down. WHEN that day comes, you need to have a last-chance copy of the data so you can rebuild your working data. Computers can be rebuilt and their operating systems can be replaced. Last year's accounts, all your live invoices and customer records cannot, at least not without much pain and work. You absolutely must have a secondary copy of all live data and it should not be stored close to the original: if your offices burn down, this second copy is going to be no use if it was kept in the same server room. Be smart, accept this truth and invest in some off-site data storage. At the most basic (as that is what most small businesses can afford), buy some good synchronization software and a large external disk, and take the disk with you when you go home each night. If you never have to use it, be happy!