Macs and Failing Hard Disks - an early detection tool
The other day I was sitting at my desk when I started to hear a faint clicking sound. I pushed the noise out of my mind for a while and continued to work on the task at hand. Before long the clicking started to get louder and louder; it was clearly a consistent mechanical noise and was coming from under my desk, right where my Mac Pro is parked.
I popped my head down there and sure enough, it sounded like one of my 4 hard drives was starting to go. Usually if you hear a clicking sound coming from a hard drive its demise is imminent. I blasted out a quick note about this on Twitter and my friend Ast recommended that I try running SMART Utility to see where the problem was.
SMART Utility for Mac scans the internal hardware diagnostics of a hard drive to quickly determine its health. Using the data collected on the hard drive itself as well as a custom algorithm it can help predict when a hard drive is starting to have problems and may need to be replaced. It's like an early warning system that can give you a chance to pull the data off a drive before it's too late.
I pulled SMART Utility down and ran it and sure enough a Growl warning popped up:
When I took a look at the main SMART Utility screen there was a failing drive:
My Backup drive was having some issues: 375 errors and a reallocated bad sector. This 1TB drive is my primary backup for Time Machine; with that drive potentially compromised I started to panic. With all the documents, photos and digitized home video I've collected over the last 20+ years I was worried that if one of my primary drives went down I'd be in serious trouble.
Fortunately for me I had an additional drive that I kept in my Mac Pro to serve as a spare. SMART Utility recommended that I replace the drive so I reset Time Machine to point at my spare drive and let it run, backing up the drive overnight.
The next day I shut down the Mac Pro, preparing to pull out the bad drive. It was only after I powered down my Mac that I realized I could still hear that clicking sound that started this little adventure. How was the drive clicking if it didn't have any power?!?
Well, it turns out it wasn't one of my drives that was doing the clicking; it was an older UPS that was also parked right next to my Mac Pro. The fan in it had started clicking—that was the sound I was actually hearing. I proceeded to kick the UPS until it stopped clicking.
(No, really, I did. Kicked it like a soccer ball. It ended up getting quiet for about 10 minutes too. Ultimately I ended up having to replace it anyway. No nasty comments from the People for the Ethical Treatment of UPSs, please.)
An Important Lesson
While the clicking sound wasn't actually the problem, it did prompt me to test my drives. Had I known that a tool like SMART Utility was out there I would have bought and run it a long time ago. Sure, the drive SMART Utility identified hasn't completely failed yet and is technically still serviceable. That said, my data is far too important to store on a drive that shows signs of having problems.
SMART Utility is a nice little app: it can diagnose all of your drives in just a few seconds, and it costs $25. Highly recommended.
Oh yeah, if your drive starts to make a clicking sound I wouldn't recommend kicking it until it quiets down. That's only something you do with a balky UPS. Got a tip for keeping your hard drive healthy? A utility you recommend for ensuring it's safe? Drop a note in the comments.
Comments
I have a (free) SMART checker which didn't pick it up, and DiskWarrior, Disk Utility and TechTool Deluxe also missed it.
Apple Hardware Test takes a long time if you run the extended test, which can be vital for picking up RAM failure.
Time Capsule HD failures are very difficult to pick up as most of the above tools either are not supported or don't work.
http://www.corecode.at/smartreporter/
Just had a buddy tell me it warned him of a failing drive.
-Kevin
A good report though - thanks for the link to it.
@all: I just checked out SMARTReporter. Very cool and has some features (mainly running all the time and notification options) that SMART Utility doesn't have. I still like that the author of SMART Utility has included an algorithm that projects a possible failure rather than simply reporting the raw numbers.
I do agree with Allan Scullion above though - this really should be a part of OS X.
Yes, I too have found that failing drives rarely indicate a failing SMART status in Disk Utility, so I have taken a SMART status of Verified with a grain of salt. Because of this I've totally ignored this kind of utility.
From the SMART Utility web page:
> ...to detect drives failing before SMART
> indicates it has failed
Maybe I've been closed-minded. A utility that checks all the SMART results and makes its own decision might be better than the drive itself making that determination. I wonder whether it's in the drive manufacturer's self-interest to have the drive confess that it is failing.
Anyway, I wish OS X did this kind of thing on its own. Drive failures are way more common than most people think.
Don't take SMART as gospel... just use it as one possible tool to help.
Backups are the only real sure-fire way to protect your data.
-Kevin
Since SMART doesn't catch a lot of failing drives, I would never pay for a SMART utility.
Back up your data and if you really want a useful HD utility, try Drive Genius.
-Blackrockcity
I wanted to respond to some of the comments here and clarify them.
The difference is that SMARTReporter, Disk Utility, and other similar utilities only look at the overall SMART status, which I have found can be very conservative. SMART Utility looks at each attribute and can give a pre-fail warning.
SMART Utility can also be set up to poll every hour, day, or week, and with Growl installed it can post notifications. Version 3.0 will offer a menu item as well.
The last thing is that while SMART is not perfect at alerting for errors, even that Google study says it's better than nothing. If there is an error in SMART, it can indicate a future problem (while if there is no error, that doesn't mean there won't be a problem).
I hope that clears some things up.
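To make the distinction above concrete, here is a minimal Python sketch of the two approaches: an overall status that only flips once an attribute crosses its vendor threshold, and a per-attribute check that warns earlier. This is not SMART Utility's actual algorithm, which isn't described here. The `SmartAttribute` type, the `margin` parameter and the choice of watched attribute IDs are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int     # SMART attribute ID, e.g. 5 = reallocated sectors
    name: str
    value: int       # normalized value reported by the drive (higher is healthier)
    threshold: int   # vendor failure threshold (0 means "informational only")
    raw: int         # raw counter behind the normalized value


# Assumption for this sketch: raw counters that should ideally stay at zero.
# 5 = reallocated sectors, 197 = pending sectors, 198 = offline uncorrectable.
WATCHED_RAW_IDS = {5, 197, 198}


def overall_status(attrs):
    """Coarse view: 'Failing' only once some attribute has reached its threshold."""
    failed = any(a.threshold > 0 and a.value <= a.threshold for a in attrs)
    return "Failing" if failed else "Verified"


def prefail_warnings(attrs, margin=10):
    """Finer view: warn on nonzero watched counters or values near the threshold."""
    warnings = []
    for a in attrs:
        if a.attr_id in WATCHED_RAW_IDS and a.raw > 0:
            warnings.append(f"{a.name}: raw count is {a.raw}, expected 0")
        elif a.threshold > 0 and a.value - a.threshold <= margin:
            warnings.append(f"{a.name}: value {a.value} is close to threshold {a.threshold}")
    return warnings
```

With made-up numbers, a drive reporting one reallocated sector but a healthy normalized value would still come back as "Verified" from `overall_status`, while `prefail_warnings` would flag it. That is the gap the comment above is describing.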
With a total of 7 physical disks (including the internal system disk) but only the internal drive having S.M.A.R.T. capability, the whole thing is sorta useless to me.
(In fact, one of the Seagates inside one of the 2 Venus AMS DS-DS3RPRO RAID units just croaked with no warning - from either the RAID SteelVine software or the OS. I wonder what would've happened if it had S.M.A.R.T. capability ... )
In short, I'd rather lobby for Apple to support S.M.A.R.T. status in external drives (that support it internally), that would be a lot more useful to me.
Having mirrored RAID units makes me sleep better at night, but forewarning would still be good - I could know to order a replacement drive as it was just about to fail, instead of having to order it after the fact and be without the RAID for several days (I'm paranoid and shut the other remaining working drive off when one goes).
Reliance upon a single Time Machine drive (especially an Apple brand one) is not good either.
I would suggest at least two clones made with SuperDuper!, preferably three with at least one off site. One Time Machine drive in addition to the clones will suffice in many circumstances, but two, with one off site, is better.
The price of drives is down, but so is the quality. I have had a number of drives fail zeroing or other processes to validate them prior to placing them in service. You should also be aware that a drive that is three years old is on life support and probably should be replaced for mission critical applications.
There are few things that will give you a worse sinking feeling in your stomach than to learn that your drive has gone haywire and then to discover that the only backup you have is corrupt because the main drive had problems you did not realize and they were transported over to your backup.
P.S. No, I do not sell drives...it only seems like it. :-)
The 2 biggest problems, IMHO, are surface scan defects and mechanical failure. SMART is good at mechanical failure but abysmal in surface scan problems. The reason is that SMART monitors the mechanicals of a drive during start up and use, but most implementations don't scan the drive. SMART won't acknowledge a bad sector until it's recovered.
For this reason, once again, IMHO, you need a tool capable of doing surface scans like TechTool Pro, Drive Genius, or Scannerz. You need to scan the drive for SMART to detect a bad sector or sectors developing. Scannerz, FWIW also detects marginal sectors, and I'm not sure they're even seen as "failures" by SMART.
Best wishes,
Captain Nemo
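For anyone curious what a scan like the one described above amounts to, here is a rough Python sketch of the idea: read everything block by block, note which blocks fail outright and which are unusually slow (a possible sign of a marginal sector). It is not how TechTool Pro, Drive Genius or Scannerz work internally. The function name `surface_scan`, the block size and the `slow_seconds` cutoff are all made up for illustration. It also runs against a regular file rather than a raw disk, so reads may be served from the operating system's cache and never touch the platters.

```python
import os
import time


def surface_scan(path, block_size=1024 * 1024, slow_seconds=0.5):
    """Read `path` block by block and return (bad_offsets, slow_offsets).

    bad_offsets:  blocks where the read raised an I/O error
    slow_offsets: blocks that read successfully but took longer than slow_seconds
    """
    bad, slow = [], []
    size = os.path.getsize(path)
    # buffering=0 avoids Python-level buffering; the OS cache may still apply.
    with open(path, "rb", buffering=0) as f:
        offset = 0
        while offset < size:
            f.seek(offset)
            started = time.monotonic()
            try:
                f.read(min(block_size, size - offset))
            except OSError:
                # Unreadable block: record it and keep scanning.
                bad.append(offset)
            else:
                if time.monotonic() - started > slow_seconds:
                    slow.append(offset)
            offset += block_size
    return bad, slow
```

Calling `surface_scan` on a large file stored on the suspect drive returns two lists of byte offsets. Two empty lists only mean that nothing failed or stalled during that particular pass.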