“The things you own end up owning you.”
– Tyler Durden from the movie Fight Club
About a week ago, I received a new processor to install in my file/media server. The idea was to replace the quad-core processor (which is overkill for a file server) with a low-power dual-core processor. The price difference between the low-power dual-core and the higher-power quad-core would allow me to make money on the trade (by selling the quad-core) and also benefit from energy savings (since the server is running 24/7).
In the process of swapping parts, I had to remove the large heatsink assembly that I had installed to provide ample airflow for the quad-core processor. The dual-core simply didn’t need that much cooling, so I wanted to just install the stock heatsink that came with the dual-core. Whoever designed the large Typhoon heatsink for my quad-core processor must have thought that no one ever changes their heatsink once installed, because the backplate to the heatsink had an adhesive that bonded very strongly to the backside of my motherboard.
So I carefully pried the large heatsink backplate off, swapped in the new dual-core processor and stock heatsink, and put the rest of the computer back together. Except when I powered it on, it came on for a few seconds, then rebooted. This automatically repeated several times before I powered off the computer. Something was very wrong.
At first, I thought it was just an issue of a bad processor (after all that’s the only part I changed), or maybe I forgot to connect something properly. So I swapped the new processor into my main desktop (which had a compatible motherboard). It worked fine. So I took out the motherboard and carefully inspected the back of it where I removed the heatsink backplate. And I found the problem: a tiny bit of the motherboard had been scratched during the removal process, and shorted out one of the connections.
I was now out a motherboard. I ordered a compatible one, and the last blog entry featured a hopeful me. Well, the replacement has since arrived. I installed it and booted up. But I didn’t have a chance to hit the DEL key and enter the BIOS before Windows started to load. I was actually quite surprised that it completely loaded into Windows (considering it was loading onto a completely different motherboard). That missed DEL key ended up being a crucial mistake.
Well, it turns out that the new motherboard’s BIOS has the ICH10R SATA ports set to IDE mode by default – not RAID. So the computer booted up with 6 individual hard drives visible to Windows, not one big RAID array. And unfortunately, Windows has a tendency to search for newly-installed hardware and automatically detect new hard drive partitions if available. So once the OS loaded, it detected the 6 new hard drives and apparently overwrote the partition table information on two of the drives. Because when I rebooted and set the ICH10R drives to RAID mode, the Intel RAID BIOS detected the array . . . with two non-RAID member drives.
A Titanic Failure
A RAID 5 array can be thought of like the Titanic. It has built-in safety features in the event of a damaged drive, much as the Titanic had watertight compartments. Lose a couple of watertight compartments and the ship will still float (or lose one hard drive, in the case of RAID 5). But if too many compartments are breached (or if two or more hard drives fail), the ship sinks. My RAID 5 array could still float with one failed drive. But with two hard drives showing up as non-RAID members, this ship was sunk.
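For the curious, here is roughly why one lost drive is survivable and two are not. This is only a minimal Python sketch of the XOR parity idea behind RAID 5, not how the Intel controller actually lays out data: the block contents and the `xor_blocks` helper are made up for illustration, and a real array rotates the parity block across the drives from stripe to stripe.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical stripe on a 6-drive array: five data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD", b"EEEE"]
parity = xor_blocks(data)
stripe = data + [parity]

# Lose any ONE block: XOR-ing the five survivors reproduces it.
lost = 2
survivors = stripe[:lost] + stripe[lost + 1:]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[lost])
```

With two blocks missing, the same XOR only yields the two lost blocks mixed together, and there is no second equation to separate them. That is the second breached compartment.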
There was some hope that I could recover the data, as others had done before. But I soon realized that because I didn’t remember exactly in which order the 6 hard drives were connected to the array, I couldn’t recover the array. But why didn’t I just try all the possible combinations of the drives? There are only 6 drives, right? The math, it turns out, was not on my side.
To try every possible order-sensitive arrangement of the hard drives, I would have to work through the factorial of the number of drives: 6! (six factorial):
1 x 2 x 3 x 4 x 5 x 6 = 720 unique orderings
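That number is easy to double-check. Here is a small Python sketch that counts the orderings directly; the drive labels are hypothetical placeholders, not how my drives were actually named.

```python
from itertools import permutations
from math import factorial

# Hypothetical labels for the six drives in the array.
drives = ["disk0", "disk1", "disk2", "disk3", "disk4", "disk5"]

# Every order-sensitive arrangement of the six drives.
orderings = list(permutations(drives))

print(len(orderings))          # number of arrangements generated
print(factorial(len(drives)))  # 6! computed directly
```

Both lines print 720, and each of those arrangements would have meant a separate round of re-cabling and rebooting.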
I have a lot of patience, but that’s a ridiculous amount of time to spend manually swapping SATA cables. And the chances that I would screw up and forget one combination are pretty high. So after trying the two obvious configurations (top drive to bottom, and bottom to top) with no luck, I have decided to give up. That’s right; I’m saying a sad goodbye to 4.2TB of software, documents, pictures, music, movies, TV shows, and video projects. Ouch.
If you squint and strain your eyes, in the blur that forms you can half see / half imagine the hazy image of a good outcome in this whole mess. The first part is pretty obvious: even relatively robust data storage setups can be prone to failure. I was protected against an individual drive failure. And I was even protected against a motherboard failure by the widespread use of the Intel ICH10R chipset. But in the end, it was bad luck and human error that decided the fate of my data.
The second part is less obvious, but I tried to hint at it with the Tyler Durden quote at the beginning of this post: “the things you own end up owning you.” Over time, my hardware and software purchases had been geared towards catering to my massive stockpile of data. Most of it was just movies and TV shows; hardly critical data. But in my haste to “have it all” I had built a system that, while providing some safety nets in terms of hard drive failure, also jeopardized my valuable data (software, documents, pictures, music, video projects) that resided on the same RAID array.
So while the loss is certainly painful, it provides me with an excellent opportunity to do two things: stop obsessing over large quantities of data (especially movies and TV shows) and get rid of my complicated, expensive server. In accordance with the new trend in advertising, it’s time to go “back to basics.” Which for me means trimming down my media library and keeping multiple copies of my data on separate hard drives.
And at least for the foreseeable future, I’m saying goodbye to RAID. It was nice when it worked. But it also lulled me into a false sense of security. And as a result, I lost a lot of data. Much of it can be reconstructed over time, but some of it will remain forever lost. It’s good to be free from the pull of buying more and more hardware to keep up with my ever-expanding data collection, a la Tyler Durden. But it’s also a shame that the important data was lost because of my misplaced desire to keep too much.