Macs and Failing Hard Disks - an early detection tool

The other day I was sitting at my desk when I started to hear a faint clicking sound. I pushed the noise out of my mind for a while and continued to work on the task at hand. Before long the clicking started to get louder and louder; it was clearly a consistent mechanical noise and was coming from under my desk, right where my Mac Pro is parked.

I popped my head down there and sure enough, it sounded like one of my 4 hard drives was starting to go. Usually if you hear a clicking sound coming from a hard drive, its demise is imminent. I blasted out a quick note about this on Twitter and my friend Ast recommended that I try running SMART Utility to see where the problem was.

SMART Utility for Mac scans the internal hardware diagnostics of a hard drive to quickly determine its health. Using the data collected on the hard drive itself as well as a custom algorithm it can help predict when a hard drive is starting to have problems and may need to be replaced. It's like an early warning system that can give you a chance to pull the data off a drive before it's too late.
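
To make the idea concrete, here is a minimal Python sketch of the difference between a drive's overall pass/fail SMART verdict and a stricter check that looks at individual raw counters. This is not SMART Utility's actual algorithm, which isn't described here. The attribute values are made up for illustration, and the watch list of attribute IDs (5, 197, 198) and the names `SmartAttribute`, `overall_status` and `prefail_warnings` are assumptions of this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    value: int      # normalized health value reported by the drive (higher is healthier)
    threshold: int  # vendor failure threshold for the normalized value (0 = none)
    raw: int        # raw counter, e.g. number of reallocated sectors


# Assumption for this sketch: raw counters that commonly track bad sectors.
WATCHED_RAW_COUNTERS = {5, 197, 198}


def overall_status(attrs: List[SmartAttribute]) -> str:
    """Conservative verdict: fail only when a normalized value hits its threshold."""
    failed = any(a.threshold > 0 and a.value <= a.threshold for a in attrs)
    return "Failing" if failed else "Verified"


def prefail_warnings(attrs: List[SmartAttribute]) -> List[str]:
    """Stricter check: flag any watched raw counter that is above zero."""
    return [a.name for a in attrs if a.attr_id in WATCHED_RAW_COUNTERS and a.raw > 0]


# Made-up readings for a drive with a single reallocated sector.
drive = [
    SmartAttribute(5, "Reallocated_Sector_Ct", 100, 36, 1),
    SmartAttribute(9, "Power_On_Hours", 95, 0, 4200),
    SmartAttribute(197, "Current_Pending_Sector", 100, 0, 0),
]

print(overall_status(drive))    # Verified
print(prefail_warnings(drive))  # ['Reallocated_Sector_Ct']
```

In this example the drive still passes the overall check, yet the per-attribute check flags the reallocated sector. That is the kind of early hint that gives you time to copy data off the drive.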

I pulled SMART Utility down and ran it and sure enough a Growl warning popped up:


When I took a look at the main SMART Utility screen there was a failing drive:
My Backup drive was having some issues: 375 errors and a reallocated bad sector. This 1TB drive is my primary backup for Time Machine; with that drive potentially compromised I started to panic. With all the documents, photos and digitized home video I've collected over the last 20+ years, I was worried that if one of my primary drives went down I'd be in serious trouble.

Fortunately for me I had an additional drive that I kept in my Mac Pro to serve as a spare. SMART Utility recommended that I replace the drive so I reset Time Machine to point at my spare drive and let it run, backing up the drive overnight.

The next day I shut down the Mac Pro, preparing to pull out the bad drive. It was only after I powered down my Mac that I realized I could still hear that clicking sound that started this little adventure. How was the drive clicking if it didn't have any power?!?

Well, it turns out it wasn't one of my drives that was doing the clicking; it was an older UPS that was also parked right next to my Mac Pro. The fan in it had started clicking—that was the sound I was actually hearing. I proceeded to kick the UPS until it stopped clicking.

(No, really, I did. Kicked it like a soccer ball. It ended up getting quiet for about 10 minutes too. Ultimately I ended up having to replace it anyway. No nasty comments from the People for the Ethical Treatment of UPSs, please.)

An Important Lesson
While the clicking sound wasn't actually the problem, it did prompt me to test my drives. Had I known that a tool like SMART Utility was out there I would have bought and run it a long time ago. Sure, the drive SMART Utility identified hadn't completely failed yet and is technically still serviceable. That said, the data I have is far too important to store on a drive that shows signs of having problems.

SMART Utility is a nice little app that can diagnose all of your drives in just a few seconds, and it costs $25. Highly recommended.

Oh yeah, if your drive starts to make a clicking sound I wouldn't recommend kicking it until it quiets down. That's only something you do with a balky UPS. Got a tip for keeping your hard drive healthy? A utility you recommend for ensuring it's safe? Drop a note in the comments.

22 comments:

Keleko said...

The failing drive also infected your UPS. That must be why the UPS was clicking and made you think a drive was failing. You'll need to add an anti-virus scanner to the new UPS to prevent that from happening again.

Allan Scullion said...

Whilst it is good to know there is a tool out there, I really feel that OSX should be on top of the SMART status and report it to the user as soon as it becomes a problem.

AliBali said...

The only tool that caught my last Hard Drive fail was restarting my iMac and using the Apple Hardware extended test.

I have a (free) SMART checker which didn't pick it up, and DiskWarrior, Disk Utility and TechTool Deluxe also missed it.

Apple Hardware test takes a long time if you run the extended test - which can be vital to pick up RAM failure.

Time Capsule HD failures are very difficult to pick up as most of the above tools either are not supported or don't work.

netnothing said...

I have SMARTReporter installed on all my Macs:

http://www.corecode.at/smartreporter/

Just had a buddy tell me it warned him of a failing drive.

-Kevin

Dan said...

Very interesting post, thanks Dave. Smart Utility sounds like, well, a "smart" app to have. ;)

Anonymous said...

I've been using SMARTReporter for years. It has the advantage that it is always there, checking those error count registers. Much better than having to manually run something when you suspect a problem, and it is free!

phitar said...

SMART is actually a poor predictor of drive failure. http://labs.google.com/papers/disk_failures.html

David Alison said...

@phitar: What I found interesting is that the time that the errors were occurring roughly corresponded with when I was getting Time Machine errors. While there may not be a direct correlation between drive failure rates and SMART errors found I'm using it as a way to suspect that a drive has problems. Better safe than sorry.

A good report though - thanks for the link to it.

@all: I just checked out SMARTReporter. Very cool and has some features (mainly running all the time and notification options) that SMART Utility doesn't have. I still like that the author of SMART Utility has included an algorithm that projects a possible failure rather than simply reporting the raw numbers.

I do agree with Allan Scullion above though - this really should be a part of OS X.

Paul Russo said...

> SMART is actually a poor predictor of drive failure.

Yes, I too have found that failing drives rarely indicate a failing SMART status in Disk Utility, so I have taken a SMART status of Verified with a grain of salt. Because of this I've totally ignored this kind of utility.

From the Smart Utility web page:

> ...to detect drives failing before SMART
> indicates it has failed

Maybe I've been closed-minded. A utility that checks all the SMART results and makes its own decision might be better than the drive itself making that determination. I wonder whether it's in the drive manufacturer's self-interest to have the drive confess that it is failing.

Anyway, I wish OS X did this kind of thing on its own. Drive failures are way more common than most people think.

netnothing said...

While SMART isn't perfect, I know of a few times when having SMARTReporter in the menubar saved some data from a failing drive by catching it early. I've also had drives fail with no warning.

Don't take SMART as Gospel.....just use it as a possible tool to help.

Backups are the only real sure-fire way to protect your data.

-Kevin

James Katt said...

A UPS generally lasts only about a year before needing replacement.

Hendrik said...

I would recommend having a second backup disk, ideally stored off-site.

MacGecko said...

If you have a Mac with easily accessible hard disks and you have a PC, you may also want to look into SpinRite 6 from grc.com

Bradley said...

After my boot drive died without any SMART warning, I upgraded to a RAID 6 array with hot swap enterprise class drives. The RAID controller auto-maps out bad blocks and can send me an email of any failure. It also beeps in case of a real drive failure and a light goes a different color to show which has gone bad. The whole thing is battery backed up. Overall drives are partitioned and Super Duper Cloned from one partition to the other daily. Unless there is a fire or flood or theft, I can't lose data. Also it's damn fast. More pricey than most solutions, but just about bullet proof.

Anonymous said...

In the System Profiler SMART status is reported in OS X (for free).

Since SMART doesn't catch a lot of failing drives, I would never pay for a SMART utility.

Backup your data and if you really want a useful HD utility, try Drive Genius.

Anonymous said...

People! You have been warned! SMART does not test for all possible types of failure. Your drive could die without SMART warning you. Back up your data at all times!

-Blackrockcity

David Alison said...

@Anon: I don't think anyone would think that using a SMART based tool—or any utility for that matter—obviates the need for backups. It's just a tool to help minimize the need to have to use a backup.

mbutch said...

Hi, I'm the developer of SMART Utility. I first want to thank David for his blog post. I'm glad it could help him out.

I wanted to respond to some of the comments here and clarify them.

The difference between SMART Utility and SMARTReporter, Disk Utility, and other similar utilities is that they only look at the overall SMART status, which I found can be very conservative. SMART Utility looks at each attribute and can give a pre-fail warning.

SMART Utility can also be set up to poll every hour, day, or week, and with Growl installed can post notifications. Version 3.0 will offer a menu item as well.

The last thing is that while SMART is not perfect at alerting for errors, even that Google study says it's better than nothing. If there is an error in SMART, it can indicate a future problem (while if there is no error that doesn't mean there won't be a problem).

I hope that clears some things up.

Riot Nrrrd™ said...

While I suppose this is all well and good for those of you with Mac Pros and multiple internal disks, I have a MacBook Pro with 2 Venus RAID1 units hanging off of a dual-port eSATA ExpressCard/34 card as well as 2 FireWire drives.

With a total of 7 physical disks (including the internal system disk) but only the internal drive having S.M.A.R.T. capability, the whole thing is sorta useless to me.

(In fact, one of the Seagates inside one of the 2 Venus AMS DS-DS3RPRO RAID units just croaked with no warning - from either the RAID SteelVine software or the OS. I wonder what would've happened if it had S.M.A.R.T. capability ... )

In short, I'd rather lobby for Apple to support S.M.A.R.T. status in external drives (that support it internally), that would be a lot more useful to me.

Having mirrored RAID units makes me sleep better at night, but forewarning would still be good - I could know to order a replacement drive as it was just about to fail, instead of having to order it after the fact and be without the RAID for several days (I'm paranoid and shut the other remaining working drive off when one goes).

Vito Traino said...

I was very entertained by this post, good luck taking care of your faulty hard drive.

Anonymous said...

In my experience, various and sundry "SMART reporters" seldom catch a failing drive. Reliance upon them is like a suicide pact.

Reliance upon a single Time Machine drive (especially an Apple brand one) is not good either.

I would suggest at least two clones made with SuperDuper!, preferably three with at least one off site. One Time Machine drive in addition to the clones will suffice in many circumstances, but two, with one off site, is better.

The price of drives is down, but so is the quality. I have had a number of drives fail zeroing or other processes to validate them prior to placing them in service. You should also be aware that a drive that is three years old is on life support and probably should be replaced for mission critical applications.

There are few things that will give you a worse sinking feeling in your stomach than to learn that your drive has gone haywire and then to discover that the only backup you have is corrupt because the main drive had problems you did not realize and they were transported over to your backup.

P.S. No, I do not sell drives...it only seems like it. :-)

Anonymous said...

SMART utilities and their effectiveness have been discussed time and time again on the web. The problem with SMART is that it doesn't proactively "hunt" for problems.

The 2 biggest problems, IMHO, are surface scan defects and mechanical failure. SMART is good at mechanical failure but abysmal in surface scan problems. The reason is that SMART monitors the mechanicals of a drive during start up and use, but most implementations don't scan the drive. SMART won't acknowledge a bad sector until it's recovered.

For this reason, once again, IMHO, you need a tool capable of doing surface scans like TechTool Pro, Drive Genius, or Scannerz. You need to scan the drive for SMART to detect a bad sector or sectors developing. Scannerz, FWIW also detects marginal sectors, and I'm not sure they're even seen as "failures" by SMART.

Best wishes,

Captain Nemo
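
The surface scan idea raised in the last comment can be sketched in a few lines of Python: read every block once and note where the operating system reports a read error. This is only an illustration under stated assumptions, not how TechTool Pro, Drive Genius or Scannerz work. It assumes a Unix-like system (for `os.pread`) and a regular file such as a disk image, since raw devices usually need elevated privileges and report their size differently. The function name `scan_for_unreadable_blocks` is hypothetical.

```python
import os
from typing import List


def scan_for_unreadable_blocks(path: str, block_size: int = 1024 * 1024) -> List[int]:
    """Read `path` block by block and return the byte offsets of blocks that could not be read."""
    bad_offsets: List[int] = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Read one block at an explicit offset; the data itself is discarded.
                os.pread(fd, min(block_size, size - offset), offset)
            except OSError:
                # The OS reported an I/O error for this block; record it and keep going.
                bad_offsets.append(offset)
            offset += block_size
    finally:
        os.close(fd)
    return bad_offsets
```

Calling `scan_for_unreadable_blocks("backup.dmg")` on a hypothetical disk image returns an empty list when every block reads cleanly. Reads may be served from the operating system's cache, so a sketch like this is no substitute for a dedicated scanning tool.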