3 Ways RAID 5 Can Fail


RAID 5 is one of the most popular ways to set up a server, especially for small businesses. It stripes data across three or more hard disks and, for each stripe, stores a parity block computed by XORing that stripe’s data blocks together. The parity allows the array to regenerate any missing data if one of the hard disks stops working or is removed, no matter how many disks are in the array. Most RAID 5 setups use anywhere from three disks, the bare minimum to run RAID 5, to six disks. Past six hard disks, RAID 5 becomes less efficient compared to other arrays, such as RAID 6 or RAID 10.
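To make the XOR idea concrete, here is a minimal Python sketch of one stripe on a four-disk array. The block contents and the helper name `xor_blocks` are made up for illustration; a real controller does the same arithmetic on much larger blocks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks, as if stored on three disks of a four-disk array.
data = [b"ACCT", b"2024", b"DATA"]
parity = xor_blocks(data)  # the block stored on the fourth disk

# Simulate losing the second disk: XOR the surviving data with the parity.
survivors = [data[0], data[2], parity]
recovered = xor_blocks(survivors)

print(recovered == data[1])  # True
```

The same trick works whichever single disk goes missing, including the one holding the parity block. It stops working the moment two blocks from the same stripe are gone.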

Because of the redundant features in RAID 5 that allow the user’s server to keep running, even after one of its hard drives breaks down, many users mistake RAID 5 for an adequate form of data backup and get lulled into a false sense of security. They think, “Well, I can just replace one hard drive as soon as it fails and never worry about losing any of my data,” so they don’t think about backing up their data to an offsite data center or a cloud-based backup service.

Unfortunately, RAID 5 isn’t backup. And nobody’s data is 100% safe on a RAID 5 array. Gillware’s data recovery lab sees crashed RAID 5 servers on an almost-daily basis. Here are some of the most common ways we’ve seen people’s RAID 5 servers fail.

1. Improper RAID Monitoring

Let’s get a big reason for RAID 5 failure out of the way first. Usually, when a RAID 5 server breaks down, it’s because one hard drive failed months ago and nobody noticed, leaving the server vulnerable and unprotected when a second hard drive eventually stops working and crashes the whole server.

When people buy smaller NAS devices designed for plug-and-play, out-of-the-box use in their small business or freelance operations, they often forget to set up any sort of health monitoring for their new server. Any NAS manufacturer worth its salt lets you set up your NAS device to send you emails regarding its health, especially when a hard drive seems to be failing.

But because so many people just want instant gratification, many users do the bare minimum to get their NAS device up and running, and that means skipping the simple step of having it email the right person whenever the server’s health starts to go south.

Even with larger, enterprise-grade servers, it’s too easy to miss this crucial step in the setup. One green light on your server turns yellow or red and nobody notices until the second one does—and by then, it’s too late.

This particular reason for a RAID 5 failure ties into just about every other way a RAID 5 can fail. For the most part, having proper RAID monitoring set up can stave off a good 95% of all server disasters, save for “acts of god” like flooding or fires. A RAID 5 server that runs with one drive already dead is considered “degraded.” The vast majority of RAID 5 servers Gillware’s data recovery specialists work with were running in a degraded state for weeks or even months before they crashed.
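As a rough illustration of what a degraded-array check can look like, the sketch below assumes a Linux server using software RAID (md), which reports array status in /proc/mdstat. NAS devices and hardware RAID controllers have their own tools and alert settings, so treat this as an example of the idea rather than a recipe for your hardware. The function name `degraded_arrays` and the sample text are hypothetical.

```python
import re

# md prints a summary per array such as "[4/3] [U_UU]":
# 4 member slots expected, 3 active, with "_" marking the missing one.
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

def degraded_arrays(mdstat_text):
    """Return {array_name: (expected, active)} for arrays missing a member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S*)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = STATUS.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded[current] = (expected, active)
    return degraded

# Illustrative sample only; on a real system, read the text from /proc/mdstat.
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[3] sdc1[2] sda1[0]
      2930279808 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [U_UU]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))  # {'md0': (4, 3)}
```

A check like this only helps if something runs it on a schedule and the result reaches a person, which is exactly the step that tends to get skipped.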

2. Power Outage

Let’s say your building loses power, or even someone just trips over the power cord to your server. Most of the time, everything will just come back on, but sometimes it won’t.

You know how every time you unplug a USB flash drive or external hard drive from your computer, your computer tells you to “safely eject” it first? Most of the time you can get away with not doing it, but every once in a while, you’ll unplug a device only to find that it won’t work properly the next time you plug it in, and it might even prompt you to initialize or format the device.

The same thing can happen when any hard disk drive, from the one in your PC to any of the dozen in a massive RAID server, suddenly loses power. There is always a slim chance that the abrupt loss of power will prevent the hard drive from spinning down properly, resulting in damage to its sensitive innards that will prevent the drive from working properly.

The chances of two drives in a RAID 5 array both failing at the same time due to a sudden loss of power are astronomically low, but you’d better believe our lab has seen it happen. More likely, the power outage happens long after one hard drive in the array has already failed without anyone noticing (see above), and this is what brings the whole server down. When this happens, it’s time to scramble and search for a professional data recovery company that can handle the job of salvaging data from the array.

3. Crashing During a Rebuild

The defining feature of RAID 5 is its use of distributed XOR parity to provide fault tolerance. When one hard drive in the array fails, the array uses the parity data to reconstruct all of the missing data from that one drive on the fly; once you replace the failed drive with a fresh one, the array rebuilds all of the missing data and writes it to the new drive so that the whole array can carry on in perfect health.
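The rebuild can be sketched in a few lines of Python. This toy model uses a three-disk array with tiny made-up blocks, and the parity rotation in `build_array` is one hypothetical layout among several that real controllers use. What it shows is that every block on the replacement drive is recomputed from the matching blocks on all of the surviving drives.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def build_array(stripes, n_disks):
    """Lay stripes out across n_disks, rotating the parity block per stripe."""
    disks = [[] for _ in range(n_disks)]
    for i, chunks in enumerate(stripes):
        parity_disk = (n_disks - 1 - i) % n_disks  # hypothetical rotation
        blocks = list(chunks)
        blocks.insert(parity_disk, xor_blocks(chunks))
        for disk, block in zip(disks, blocks):
            disk.append(block)
    return disks

def rebuild(disks, failed):
    """Recompute every block of the failed disk from all surviving disks."""
    survivors = [d for i, d in enumerate(disks) if i != failed]
    return [xor_blocks(row) for row in zip(*survivors)]

stripes = [(b"AAAA", b"BBBB"), (b"CCCC", b"DDDD"), (b"EEEE", b"FFFF")]
disks = build_array(stripes, n_disks=3)

# Pretend disk 1 died and was swapped out; regenerate its contents.
rebuilt = rebuild(disks, failed=1)
print(rebuilt == disks[1])  # True
```

Note that `rebuild` touches every block on every surviving disk. On a real array, that means hours of sustained reading across all of the remaining drives at once.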

The rebuild process is supposed to save a degraded RAID 5 array, but it also puts a lot of stress on the remaining drives, since each of them typically has to be read in full to reconstruct the missing data. All hard drives break down sooner or later, and that stress can push one or more of them into failure if there is a lot of data to rebuild and the drives are already coming close to the end of their life cycle.

When your RAID 5 server crashes, all of the data on it becomes inaccessible. If more than one drive fails, the parity can no longer fill in the gaps, and the resulting holes throughout your array make pretty much all of your data unusable. When the data on your failed RAID 5 server is mission critical and can’t be recreated, and when you don’t have any of the lost data backed up, that’s when you have to reach out and find data recovery services that can get the job done. Look for professional labs with cleanrooms and an experienced staff of computer scientists and mechanical engineers, since only these companies will have the experience and facilities to help you.