Mac Pro Crash Recovery: A Tale of 36 Hours

Yeah, it was that kind of weekend.

I went to check email Saturday morning, and was greeted with quite a shock.  My Mac Pro was locked in some sort of grey screen.  No icon, no progress, nothing.

A quick press of the power button confirmed it – simple power down.  No real OS boot.

On Friday night, before bed, I had shut down the machine.  Some apps had been misbehaving, and I thought a full shutdown & reboot was in order.  Apparently that reboot had failed.

I don’t know how “normal people” deal with problems like this.  When I say “normal”, I mean people who haven’t actually developed software on the Mac, who haven’t worked repairing Macs, and who haven’t spent countless hours futzing with their own machines.

Just in case it’s useful, here’s what I did.  The good news is that it proves out the benefit of using backup software, like Time Machine.  The bad news is that it also proves that this stuff is still way too hard:

1) Tried to reboot. Yes, I know, not rocket science.  But there is always that hope that just rebooting will magically “fix” the problem.  In this case, rebooting went into an endless loop.  Grey screen, Apple logo, spin icon… then grey screen and reboot.  Kept repeating.  Bad news.

The lack of either the blinking folder or the regular boot sequence told me I was on dangerous ground.  It was either a hardware issue, or the system was corrupted.  In either case, the machine was not getting to the normal boot sequence.

2) Tried to boot off DVD. For those “Dodgeball” movie fans, “If you can dodge a wrench, you can dodge a ball.”  For Macs, “If you can boot off a DVD, then your hardware can boot anything.”  It’s not totally true, but true enough.  In my case, it proved harder than you might think.   The machine wasn’t getting far enough in the boot sequence to load Bluetooth, so my wireless keyboard was useless.  Fortunately, I keep a USB keyboard around.  I plugged it in and held down the “C” key while booting (nostalgia: the “C” is for CD, and they never migrated to “D” for DVD).  In any case, if there is no DVD in the drive with a bootable OS, it opens the tray.  Got the tray opened, popped in the Mac OS X 10.5 Leopard DVD, and began to boot.  Total time spent: 20 minutes.

3) Diagnose the boot drive. On the Mac OS DVD, a little known trick is that while the installer is running, you can go to the Menu Bar, and select “Disk Utility” to run diagnostics on your disk.  I did so, and discovered some bad news.  My main system drive, a 300GB Western Digital, had problems.  Worse, Disk Utility basically told me that I was crazy if I thought it could fix them.  Drat.  Time spent: 10 minutes.

4) Get Mac OS X installed on another hard drive. Running the system off DVD is slow, and you are limited in options without a full Finder.  Fortunately, my iTunes HD had a few hundred GB free.  Installed OS X on that drive and rebooted.  While that happened, I went to have breakfast and actually get productive chores done.  Time spent: 30+ minutes.  Who knows, I didn’t come back to the machine for several hours.

5) Assess whether the system drive is a lost cause.  I was ready to run down to Fry’s to get a new HD (or better yet, a new SSD.  Why not turn tragedy into opportunity?)  Unfortunately, the disk mounted.  Interesting.  I did get a strange system warning that I’d never seen before, telling me the disk had problems and that I should reformat it.  Never waste access to a dying disk.  I immediately tried to use Disk Utility to create a disk image of the disk, but it failed with some cryptic error.  (You do this by dragging the hard drive icon from the desktop over the Disk Utility application.)  Fortunately, a Finder copy of my user directory worked, providing an extra backup of files, just in case.  Time spent: 20 minutes.
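
The “grab what you can while the disk still mounts” idea in step 5 can be sketched in code.  This is only an illustration of the approach, not what the Finder does internally: the function name salvage_copy is made up, and it assumes Python 3.9 or later.  It copies each file individually, so that one unreadable file does not abort the whole rescue.

```python
import shutil
from pathlib import Path


def salvage_copy(src: Path, dst: Path) -> list[Path]:
    """Copy every readable file under src into dst.

    Returns the paths that could not be copied, so they can be
    retried later or recovered from another backup.
    """
    failed = []
    for path in sorted(src.rglob("*")):
        target = dst / path.relative_to(src)
        try:
            if path.is_dir():
                target.mkdir(parents=True, exist_ok=True)
            elif path.is_file():
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(path, target)  # copies contents and timestamps
        except OSError:
            # A read error on a failing disk lands here; note it and move on.
            failed.append(path)
    return failed
```

Calling it with the user folder on the failing drive as src and a folder on a healthy drive as dst gives a list of whatever could not be read, which is a useful shopping list for the Time Machine restore later.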

6) Reformat system drive. Well, Mac OS X told me to, right?  I was surprised, but I tried it.  Disk Utility was able to reformat the drive – I noticed the old formatting was Mac OS Extended, without journaling enabled.  Wow.  Was the drive that old?  In any case, I reformatted with the GUID partition scheme needed for booting Intel Macs, and with journaling enabled.  Afterward, a quick Disk Verify confirmed a shocking outcome… the drive was fine.  Time spent: 10 minutes.
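
For readers who prefer the command line, the reformat and verify in step 6 have rough equivalents in Apple’s diskutil tool.  The sketch below only builds the argument lists and deliberately does not run them, because erasing is destructive.  It assumes the eraseDisk and verifyVolume verbs, the JHFS+ format name and the GPTFormat keyword behave as the diskutil man page describes on your OS version; the disk identifier and volume name are placeholders.

```python
def erase_disk_command(device: str, name: str) -> list[str]:
    """Arguments to erase a whole disk as journaled HFS+ on a GUID partition table."""
    return ["diskutil", "eraseDisk", "JHFS+", name, "GPTFormat", device]


def verify_volume_command(volume: str) -> list[str]:
    """Arguments to run a read-only check of a mounted volume."""
    return ["diskutil", "verifyVolume", volume]


if __name__ == "__main__":
    # Placeholder identifiers: check "diskutil list" for the real ones first.
    print(" ".join(erase_disk_command("disk1", "System")))
    print(" ".join(verify_volume_command("/Volumes/System")))
```

Printing the commands first, and only then pasting them into Terminal, is a cheap guard against erasing the wrong disk.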

7) Reinstalled Mac OS X on System Drive. Tempt fate?  Sure, why not.  This was the first time I had a hard drive crash after using Time Machine, and I was eager to try it out.  When you install Mac OS X 10.5 now, it asks you if you are migrating from another machine.  You can specify a Time Machine backup.  I was pleased to see the last one was from 10:59pm on Friday… less than 1/2 hour before the “great crash of 2009”.  Unfortunately, this process seems to take hours.  160GB of material for some reason took over 3 hours.  No way I’m sitting around for this!  Time spent: 3 hours+
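
As a sanity check on that restore time, the numbers above imply a fairly modest effective transfer rate.  The arithmetic below assumes decimal units (1 GB = 1000 MB) and treats “over 3 hours” as exactly 3, so the real rate was somewhat lower still.

```python
def effective_rate_mb_per_s(gigabytes: float, hours: float) -> float:
    """Average transfer rate in MB/s for a copy of the given size and duration."""
    return (gigabytes * 1000) / (hours * 3600)


def hours_needed(gigabytes: float, rate_mb_per_s: float) -> float:
    """How long a copy of the given size would take at a given average rate."""
    return (gigabytes * 1000) / rate_mb_per_s / 3600


rate = effective_rate_mb_per_s(160, 3)
print(f"{rate:.1f} MB/s")  # 14.8 MB/s
```

The same two functions give a rough estimate of how long a larger restore would take at that rate, which helps when deciding whether to wait or walk away.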

8) Get everything up to date. I came back to the machine that evening.  It booted, seemed fine.  Even had my old accounts on the login screen.  I signed in, and everything looked normal.  All files/folders in the right places… except iPhoto failed to launch, and iTunes complained that it didn’t understand my library.  Whoops.  The DVD installed Mac OS 10.5… but we’re on 10.5.6 these days, and my apps and files had been upgraded.

Brief rant: I’m really wondering why Apple hasn’t tied the update logs from its automatic updates to the restore from Time Machine.  It’s pretty obvious that the Time Machine backup has a system on it that has a series of updates installed – it would not be hard to boot the OS with instructions to download and install those updates.

In any case, Apple Mail tried to “import” my Mail folder.  I cancelled that and quit.  iTunes offered to create a new library, and I declined.  Phew.  Hope I’m safe.   Ran the Software Update system preference, and discovered about 10 updates waiting for me.  Downloading them and installing would take… 2 hours!   Let it run overnight.  Time spent: 2 hours + one night’s sleep.
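
The mismatch in step 8 boils down to comparing two version numbers: the 10.5 that the DVD installed and the 10.5.6 that the backed-up apps and libraries expected.  A minimal sketch of that check follows; the function names are made up for illustration, and nothing here reflects how Time Machine or Migration Assistant actually work.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as "10.5.6" into (10, 5, 6)."""
    return tuple(int(part) for part in text.split("."))


def needs_updates(installed: str, backed_up: str) -> bool:
    """True when the freshly installed OS is older than the one in the backup."""
    a, b = parse_version(installed), parse_version(backed_up)
    # Pad the shorter tuple with zeros so "10.5" compares equal to "10.5.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


print(needs_updates("10.5", "10.5.6"))  # True
```

If a check like this ran right after a restore, the system could hold off launching apps whose data came from a newer OS until the updates had been applied.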

9) Get everything up to date… again. In the morning, after breakfast, checked on the machine.  Was booted, looked fine… except now Apple Mail had lost all of my old mail, and iPhoto still wouldn’t launch.  iTunes was fine, though.   Ran Software Update again… and there were another 8 updates, clearly waiting for the last 10 to run.  Great.  Fine, let’s update some more.  Time spent: 1 hour + me leaving for the morning.

10) Restore Mail. Thank goodness I’m paranoid.  I copied the “Mail” folder in my “Users > adamnash > Library” folder from the “extra” backup I had made to my System drive.  3GB to copy, but hard drive to hard drive over the internal SATA 2 bus is wicked fast.  Time spent: 15 minutes.

11) Everyone lived happily ever after. It was about 11:30am on Sunday, literally about 36 hours since the crash happened.  And everything was back to normal.  Seriously, I doubt you could have easily figured out anything had happened.  Even little details like my browser history were there.  Firefox re-opened with the same 20 tabs I had open on Friday.  It was as if the last 36 hours had been a test, and since I had kept calm and walked through the steps, I had passed.

So, what did I learn from this?  A few things:

  • Keep a Mac OS X boot DVD handy. Most people lose track of this, because it came with their Mac when they bought it.  Don’t lose it.  I prefer the retail disc myself – it’s worth the cost to have one.
  • Disk Utility is your friend. There was a time when Apple utility software sucked, and you had to go third party.  There are still superior third party tools out there (and for serious hard drive crashes, you need them.)  But these days, starting with the standard Apple software is a good bet.
  • Migration Assistant has come into its own. I’ve used it now for work and home.  It’s very good.  Not perfect, but better than hand-crafting system restores.  Very impressed with the Time Machine integration.  If it was smart enough to handle Apple Update history, I’d be truly happy with it.
  • Don’t underestimate the value of an extra hard drive. The reason my restore was relatively painless is that I had another hard drive that I could boot the system off of.  Without that, you have to depend on the DVD.  Ouch.  If you have a tower, an extra hard drive is cheap insurance (and extra storage).   If not, consider a cheap FireWire external drive.
  • Time Machine is good. Look, if you care about your files, you should backup.  Period.  Time Machine makes it painless.  I’m really impressed – backup systems are only really tested when you need them, and I needed Time Machine today.  It came through.
  • Beware of hard reboots.  The reason my system had problems is likely due to a software conflict I had been ignoring – XTorrent and my .Mac screensaver.   I would come home to a locked up machine, and would be forced to hard reboot the system.  Hard reboots = increased risk of file system damage.  I played Russian roulette one too many times, and paid the price.  36 hours of it.

Mostly, however, I discovered that after 18 years of fixing/restoring Macs, it’s still stressful dealing with a crash like this.  I just can’t imagine why any normal human being would know or care about all the steps above, or how they would be expected to keep multiple backups, hard drives, and techniques handy to manage this type of issue.  It’s 2009 for goodness sake.  By now the computers should be taking care of themselves.

In any case, I hope the above proves useful to a reader or two.  If not, maybe the story will prove either entertaining or depressing, depending on your perspective.

8 thoughts on “Mac Pro Crash Recovery: A Tale of 36 Hours”

  1. Agreed – computers should be looking after themselves!

    Excellent write-up. Documenting an incident like this is hard work in itself.

    There’s one step I’d like to know more about – how did you set up the external disk to be bootable? Does it need to be Firewire?

    cheers,
    Con

  2. Sorry to hear about the crash — what a pain (and, having been there myself many times, it’s unfortunate how the “2 minute” computer task can turn into the 36 hour, nail-biting ordeal).

    Out of curiosity, did you try AppleJack? (And if not, it’s definitely worth considering installing it now, as I don’t think it will work post crash.)

    Chris

    • I have DiskWarrior, but not sure how it would have made this better. I tend to use DiskWarrior around recovering files from a disk when I don’t have a good backup. In this case, I had a good backup.

      • Yeah, I’m not sure how DW would have saved you anything, unless perhaps said correspondent meant you should have run DW when you first noticed your software conflict. Perhaps cleaning up the catalog and fixing permissions would have alerted you to the serious nature of the problem? I dunno, I’m a video editor, not a developer, and have been forced by circumstances to learn some of this maintenance and support stuff.

        Anyway, GREAT article, as was the article which brought me to your blog (about NTFS mounters for Mac). I have bookmarked you and will be referring often…it’s refreshing to read someone who writes knowledgeably about something, instead of ranting.

        I notice the following comment mentions my preferred method of backup; a dual-drive setup (I have a MacPro dualcore 2.6), using SuperDuper to clone the drive “often” (should be every night…but sometimes I’m working at night…) As I’m on Tiger still, Time Machine isn’t an option. And, why do I care about Time Machine if I’m religious about cloning my drive every night? Inquiring minds want to know…thanks.

        -Mike

  3. Been there, done that. Well, not exactly that, but similar. I use SuperDuper to clone my boot drive every night, run Time Machine, and use Mozy for off-site backup. Plus, since my desktop has been giving me a hard time, I’ve got an external drive to back up my User Folder, in case I can’t get the desktop to boot properly (recurring issue: it either crashes at the grey screen gears, or within one minute of booting to the desktop), so I’ll have access to all my files on my Mac laptop.

    Not a fun way to spend time, dealing with hardware/software issues, but I much prefer having the ability to troubleshoot/repair my system myself rather than paying someone to do it for me. 😉 Hardware is another story; if it’s software I can fix it.

  4. Just read this – useful reminder about how Time Machine restore works, and why I choose to use SuperDuper + Time Machine.

    I split my 1TB external drive into two partitions, one SD and one TM (500GB each, since my MBP drive is 500GB). I do a SuperDuper backup (incremental; less than 20 mins) every few weeks (whenever I get a bunch of system updates or buy/update a software package). TM is always running. Identical setup at home and work (two separate drives – offsite backup!). External and internal drives all encrypted with PGP WDE.

    When my MBP hard drive failed & I had to send the MBP back to Apple for a week, I had just completed a SD backup. I borrowed our spare Operations MB (13″, 4 yrs old), booted off the SD partition of my external drive, and continued working like nothing had happened (but on a smaller screen), and not touching the internal drive.

    When MBP came back, I restored the SD partition (now changed with a week’s work) back to the hard drive, rebooted, and I was back on my own box. Never lost anything; it was amazing to me. I spent perhaps 15 minutes getting up on the Ops MB, plus whatever the restore time was on the MBP. Ops got their unadulterated MB back.

    If the SD backup was not current, I could always have restored from the latest Time Machine update to the SD partition.
