Arq and Glacier - Affordable Mac Cloud Storage

After my near miss with losing a considerable portion of my personal digital library, I decided to do something about it and look into a cloud-based solution for keeping my files safe. I’m still using Time Machine locally to back up nearly everything; I just wanted a final line of storage, just in case.

I’ve been using Dropbox for years for my documents and miscellaneous files. I have several Google Apps for Business accounts that store my emails and shared docs and spreadsheets. The code I write is versioned and stored in GitHub. For the most part I live off the cloud already; the only thing missing was my large collection of family photos and videos, which totaled nearly 140GB.

iCloud is cool and all, and I love the way it keeps my little iPhone photos synced, but at $100 / year for only 55GB, this is a pretty expensive solution. I looked at a variety of different cloud backup solutions and found them to be ill-fitted to my needs. While many of them have plenty of capacity and are pretty affordable, they usually require a rather heavy backup application to be running in the background monitoring changes.

Amazon to the Rescue

I’ve always been a huge fan of Amazon Web Services and specifically Amazon S3 (Simple Storage Service), a cloud-based storage model. Amazon makes S3 incredibly resilient, providing a 99.999999999% durability rate. Though I’ve used Amazon S3 extensively myself as a software developer for my online services, I had never used it for storing personal data. Since storage costs $0.095 / GB / month for up to 1TB, my 140GB collection would cost me just a little over $13 / month to keep safe. That was a little steeper than I wanted.

Fortunately Amazon introduced another variation of S3 storage called Amazon Glacier. Glacier is designed to be just as durable as S3, but you cannot pull the data out very quickly without incurring some additional costs. This wasn’t an issue for me since this collection was really just a deep archive and a backup to my backup. The advantage is that Glacier is very affordable at $0.01 / GB / month. This meant my 140GB collection would cost me $1.40 / month on Glacier. Perfect!
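
The comparison is simple arithmetic, and a few lines of Python make it easy to redo for a different archive size. This is only a sketch using the per-GB prices quoted in this post; the function name `monthly_cost` and the constants are made up for illustration, and current prices should be checked on Amazon’s pricing pages.

```python
# Monthly storage cost for the archive, using the prices quoted in this post.
ARCHIVE_GB = 140
S3_PRICE_PER_GB_MONTH = 0.095       # quoted S3 price for the first 1TB
GLACIER_PRICE_PER_GB_MONTH = 0.01   # quoted Glacier price


def monthly_cost(size_gb: float, price_per_gb_month: float) -> float:
    """Return the monthly storage bill in dollars, rounded to cents."""
    return round(size_gb * price_per_gb_month, 2)


s3_cost = monthly_cost(ARCHIVE_GB, S3_PRICE_PER_GB_MONTH)
glacier_cost = monthly_cost(ARCHIVE_GB, GLACIER_PRICE_PER_GB_MONTH)
print(f"S3: ${s3_cost:.2f}/month, Glacier: ${glacier_cost:.2f}/month")
```

This covers storage only; retrieval charges are a separate matter.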

Transporting the Files with Arq

The next challenge was getting the files up to Glacier from my Mac Pro. Amazon designs their services as something a programmer or systems administrator would access, not the average end user. I considered writing some scripts that would push my files up to Glacier but that seemed like too much work. Besides, it turns out somebody has already done that: Haystack Software’s Arq.
Arq's Main Window
This $29 download (30 day free trial) makes it really easy to just select what you want pushed up to your Glacier account and let it manage moving it up to the cloud. The interface is very spartan, which is actually great. You just need to have established an Amazon AWS account and signed up for S3, and Arq will take care of the rest.

One of the more reassuring features of Arq is that they have released the restore tool as an open source project on GitHub, providing some peace of mind.
The Restore Tool is available as open source
I set up Arq to run at 3AM, dropped the key photo and video folders on to it and let it run. It took a few days through my Cox Cable connection to get everything up to Glacier. If I add any new photos Arq picks them up on the next sweep and pushes them to Glacier.

Restoring from Glacier Takes Time

If you want the files to be readily accessible, then Glacier probably isn’t your best bet; you should use S3. Restoring even a single file from Glacier can take up to 4 hours before it even starts. This is one of the side effects of Glacier and why it’s best suited for deep, long term archiving, whereas S3 is better for files you need rapid access to.
Glacier Restores require great patience
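
Speed also has a price. The additional cost of pulling data out quickly comes from what the Arq developer, in the comments below, calls Glacier’s “peak hourly request fee”: the faster you request data, the higher the fee. The sketch below is a rough model of that trade-off, not Amazon’s billing formula. It assumes the fee is the peak GB requested in one hour, times a per-GB rate, times the hours in a month; the constants `FEE_PER_GB` and `HOURS_IN_MONTH`, the `free_gb_per_hour` allowance, and the function `restore_estimate` are all assumptions for illustration.

```python
# Rough model of the speed vs. cost trade-off when restoring from Glacier.
# ASSUMPTION: fee = peak GB requested per hour * per-GB rate * hours in month.
# Check Amazon's pricing documentation before relying on any of these numbers.
HOURS_IN_MONTH = 720   # assumption: 30-day billing month
FEE_PER_GB = 0.01      # assumption: retrieval fee rate in dollars per GB


def restore_estimate(archive_gb, gb_per_hour, free_gb_per_hour=0.0):
    """Return (hours needed to request everything, estimated fee in dollars)."""
    if gb_per_hour <= 0:
        raise ValueError("gb_per_hour must be positive")
    hours = archive_gb / gb_per_hour
    # The peak hour can never request more than the whole archive.
    peak = min(gb_per_hour, archive_gb)
    billable_peak = max(0.0, peak - free_gb_per_hour)
    fee = round(billable_peak * FEE_PER_GB * HOURS_IN_MONTH, 2)
    return hours, fee


# Compare a slow, a moderate, and a fast restore of a 140GB archive.
for rate in (0.5, 2, 7):
    hours, fee = restore_estimate(140, rate)
    print(f"{rate} GB/hour: about {hours:.0f} hours, estimated fee ${fee:.2f}")
```

Under these assumptions, a slower transfer rate stretches the restore out over more hours but lowers the peak, and with it the fee.
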
For $29 up front and $1.40 / month, I’ve now got highly durable cloud copies of all my photos and home videos. Sure beats the panic I went through a few weeks ago.

Got a great cloud based backup / archiving solution people should know about? Drop a line in the comments. I’d love to know how others are handling this.

18 comments:

Keleko said...

I've been considering Crashplan. For $4/month you get unlimited storage. A family of up to 10 computers can back up for $9/month with unlimited storage. You can access files from your phone or tablet, too.

paul7 said...

I was going to suggest Crashplan as well. It works pretty well, pretty cheap, keeps versions, unlimited space, etc.

George Entenman said...

Flickr just announced 1TB for free!

Unknown said...

Glad to hear you went with ARQ, I've been using it for a couple of years and it's wonderful. Sometimes heavy on the CPU side, but great otherwise.

Anonymous said...

I have been using Crashplan for several years now and think it is a brilliant service. It seems to get cheaper every year too.

An unintended benefit of Crashplan is it helped me track down a drug dealer who swapped my iMac for drugs after a thief stole my iMac from our house.

The drug dealer or someone reset my account and logged in and deleted all my files. The Crashplan service connected back to its server in the USA and hey presto I had an IP address that the police tracked to Telstra (Aussie telco) and finally the drug dealer's house.

I got my iMac back and the Spectorsoft screen capturing tool recorded the drug dealer's credentials logging into Facebook, Gmail, TAB betting sites and he completed the Australian Government Census form so I know absolutely everything about him.

Still have not been game enough to log into his bank account and make a sizeable (anonymous) donation to a "victims of drugs/crime" charity.

David Alison said...

@Keleko: You should throw Arq into consideration. It really is a nice, lightweight option and the long term storage rates for Amazon Glacier are hard to beat.

@George Entenman: Well now, that's pretty cool, though I don't know if I trust a free service to do anything other than ensure everyone sees every picture I put up there. Still, that is a LOT.

@Anonymous: That is quite possibly the most brilliant comment I've seen on my blog in a very long time. Bravo!

Dan said...

Great post @Anonymous!!

I also use Crashplan, Dave. I used to use Mozy years ago but they screwed me (like a lot of users) by almost quadrupling their prices for the amount of data I back up and giving us a month's notice.

This was back in 2011 and when Crashplan found out about it, they had a "Mozy Switcher" discount. I paid $139 for 4 years of service.

I couldn't even tell you anymore how much Crashplan is because I'm paid up until 2015. It is a great service though.

Dave Deaton said...

I, too, use Crashplan. I actually got fed up with their service at one point due to failures in their Java program, though, and added Backblaze to the mix. Since I had already paid for 3 years of service, I left Crashplan running, but I liked Backblaze better. Some time later, my 6TB external video drives, which I don't access very often when I'm super busy, failed. It took me more than a month before I decided that it was one of the drives and not the housing (an OWC Mercury Elite-AL Pro). So, by the time I went to Backblaze's site to start the arduous task of restoring about 3TB of data, Backblaze had already deleted my data! They only keep it there for 30 days. I was so pissed! All I can say is "Thank God for Crashplan!", because they did have the data and they saved my butt! Backblaze crashed and burned. To make matters worse for Backblaze, several weeks after that, I started getting messages saying that some file had gotten too big and that I had to delete and reinstall the program, which meant reuploading my main 3TB hard drive again. I wouldn't recommend Backblaze to anyone. I like what you are using here so I might add that as my primary video and photo archive methodology…
Thanks for the tip!

David Alison said...

@Dave Deaton: all of the backup providers are in a mad race to the bottom since they offer nearly the same basic products. This is one of the reasons I like the Arq / Glacier model so much. Arq is a small vendor that only sells the software (and released the restore tool as Open Source) and Amazon offers up Glacier in such a way that it's being used by a lot more than just home backup/archival folks.

sfmitch said...

I, too, use crashplan.

It is very reasonably priced and works.

I'd like it more if they didn't use Java.

Crashplan along with Time Machine and Carbon Copy Cloner leaves me comfortable that my data is protected in case of any problems.

David Orriss Jr said...

Hey David,

I use a combination of Time Machine for my critical data (photos, documents etc) along with SuperDuper to make a complete bootable duplicate of my second drive.

After reading about your experience and trying to get your data I thought perhaps I should use an off-site option as well. Initially I was looking at Carbonite (I have friends who use it and swear by it). But I have to say that Glacier and Arq seem to be a far better fit.

I love reading this blog.. I always find good information... :-)

David Alison said...

@David Orriss Jr: Thanks for the kind words David! Arq is very basic but it works well from what I've seen so far and Glacier is about as robust a service as you'll find.

Rafael said...

This is awesome!

stíobhart said...

I'm glad Arq has worked for you. My experience has been the complete opposite:

I recently had a hard drive crash and burn which contained my entire 75GB photographic archive. Thankfully [or so I thought at the time!] I had the archive backed up to Amazon Glacier via Arq. So I set Arq going to retrieve the files. That was about two months ago. Since then, I've been unsuccessfully trying to run the Arq restore about two or three times a week with very little to show for it:

1: The first time I set the Arq download in motion it retrieved only 7GB of the archive and ran up a $42 Amazon bill in the process. Checking my Amazon AWS dashboard showed that Arq had made over 3000000 [yes, that's million!] GB of data requests to download that measly 7GB of files.

2: Subsequent attempts to retrieve the rest of the archive have garnered me about another 2GB over the course of a couple of months, running up another $20 odd in Glacier fees.

3: Several times Arq downloads nothing at all, throwing up an error [after several hours of seeming to do very little], saying that the request has "expired"

4: Other alerts, thrown up [again, after several hours doing seemingly nothing] have included "invalid credentials" and "invalid timestamp".

In other words, Arq has restored about 9GB of a 75GB archive, taken 2 months to do so and run up an AWS bill approaching $100.

There are two problems with trying to get to the bottom of this:

Firstly, the very nature of the Glacier restore process means that there is an expectation that it will take at least 4 hours to initiate, so while Arq sits there doing apparently very little for several hours on end, you're never quite sure if it's just waiting for Amazon to queue up the files, or it's dropped the connection for some spurious reason. [In my experience mostly the latter!].

Secondly, the Arq developer censors any comments pointing out these flaws on his support forums. I've written two or three posts now, politely enquiring as to what's happening here, asking if anyone else has had the same issues and alerting others to the possibility of running up substantial bills while using Glacier restore on Arq. Each time, the posts have been removed by the developer.

It kind of irritates me that whenever I search for help on this issue, all I seem to find are glowing 'reviews' of Arq which seem to be culled from the official press releases and unquestioningly accept the developer's claims for the product, without actually putting them to the test. As I said at the start, I'm glad Arq seems to have worked for you so far but anyone else planning to jump in with both feet should satisfy themselves that they can get their 'stuff' back again, if the ordure hits the fan. I've not been able to, which is incredibly frustrating. Through the Arq interface, I can see my photographic archive sitting there in Amazon Glacier, but it might as well be lost for good, if I can't get it back again.

One more point: you mention the fact that the developer has released a command line restore tool so that people are not 'locked in' to using Arq. The command line tool does not work with Glacier, only with Arq's regular S3 backups so, in the case of Arq's Glacier backups, you ARE unfortunately locked in... and locked into a product which in my opinion isn't fit for purpose.

David Alison said...

@stíobhart: Thanks for the comments - I haven't tried doing anything other than a very quick restore test which seemed to work. This is very disturbing and I wasn't aware that Arq's command line tool only supported S3, not Glacier.

More disturbing is that the developer is deleting critical comments from the support forums. Have you tried reaching them directly via email? Any response?

Best of luck and I encourage you to post any follow up information here.

Stefan Reitshamer said...

Hi, I'm the developer behind Arq.
In response to @stiobhart's comments:
Regarding the forum: I deleted the Arq forum. I've tried 3 times now to run a support forum over the years, and each time the spammers eventually overrun it. I give up on forums.

I've never received an email from anyone named Stiobhart, so this is the first I'm hearing of the problems he's having. I didn't have any contact info and can't find a purchase with that name, so I guess he didn't purchase a license and was using the 30-day free trial? I found him on Twitter and app.net yesterday and reached out to him on both channels but he did not respond.

Hopefully he'll email me at support@haystacksoftware.com so I can help him restore his photos. My contact info has always been on the haystacksoftware.com web site on the Support page, and I strive to answer emails within 1 business day. I know people have restored hundreds of GBs from Glacier using Arq, so I'm pretty confident we can get his photos back if he wants to.

In the absence of actual details of what happened, I can speculate on a possible cause of large Amazon charges: I would guess he started a restore from Glacier, canceled it, and tried a different restore, maybe canceled that, tried again, and so on.
Glacier has a “peak hourly request fee” that basically punishes you for downloading too quickly. This is explained in the app, though I guess not clearly enough for him. When you click the button to restore from Glacier, Arq first asks you what transfer rate to use and estimates the restore time and peak-hourly-request fee; you can edit the transfer rate and see how both the restore time and the fee change as a result, all before actually starting to request from Glacier. If he started a restore, that would start the “transfer” of Glacier data (even though he didn’t download it). If he did that a few times within the same hour, the total amount requested during that hour could be very large, which would cause a large “peak hourly request fee”. That’s the way Glacier works, unfortunately.

If anyone has more questions about any of that, please email me at support@haystacksoftware.com.

Shahana Shafiuddin said...

Good info for file storage.