Real Backups On The Cheap

So you have your data on the “cloud” – on Dropbox or GDrive folders – and you believe you’ve done a decent job of safe-guarding your precious files, while really you’ve only saved your files against total computer or hard-drive loss. I used to be this guy until one day when I discovered a few of my precious files went missing from Dropbox. I searched and searched everywhere to discover I’d truly lost them. Was it an accidental delete, or was it a bad program that deleted it? I’ll never know.

While free-Dropbox comes with free 30days restore, it did not have my precious files, and that made me realize that we all have way too much stored in our cloud-folders and there is absolutely no way one can keep a handle on what was deleted/added/modified and when.

Although this incident taught me a lesson and made me store offline-copies of my data on another hard-drive, I really had to go through another painful loss before I seriously started looking for an alternative. Several albums belonging to my really precious music collection (synced across my computers using Resilio, Play Music, and duct-tape) got corrupt and/or missing at some point and were nicely synced across all devices. Again, the sheer volume of data (10,000 music files) ensured I’d find out much much later after too much damage had been done.

So now, I seriously started looking for a solution that would endure the tests of time, stupidity, bad software, and a drunk-me. A solution that would ensure extreme durability while still maintaining relatively quick access when needed.

Enter: AWS S3

Now, I’m sure you know everything about S3 and how inexpensive it can be (0.023 per GB/month for standard-class) to store multiple GBs or TBs of data. But if you’re not an enterprise user, and you’re like me who likes simple and cheap, you’re probably considering S3’s infrequent-storage class at 0.0125 per GB/month. At around 60GB of potential data to store on the cloud, that is just 9$ USD per year vs S3’s standard-class ~17$. But, wait, there’s something even cheaper.

Enter Glacier – the cloud-tape solution from Amazon. Glacier is cheap, as durable as S3 standard-class (99.999999999% durability) and comes at a dirt cheap 0.004$ per GB/month cost. That really is 0.4cents. For 60GB, my cost for the entire year is 2.88$. I remember paying 100$ for a 4.75GB HDD way back. Today, I pay 4.8$ for 100GB of ultra-durable and multi-AZ-replicated storage on Glacier. Times have changed, indeed!

However, before you close this tab and proceed to backup your files into Glacier using a glacier tool (such as, Freeze or FastGlacier), I’d like to let you know that once you upload to a Glacier vault, you will need to request AWS to have access to your data. This includes listing of the contents, so uploading to raw Glacier is not recommended for this use-case.

Instead, I recommend uploading your data into an S3 bucket with a lifecycle management policy set to move data into Glacier 0-days (zero-days) after upload. This ensures that your objects in S3 are moved to Glacier end-of-zeroth-day so you’re not billed for even a single day of S3 storage. AWS moves your data into what it calls the Glacier-class of S3 storage.

This approach ensures that you always have the ability to list your glacier contents using S3 APIs/AWS console. This also lets you use cheap or free S3 tools (such as CyberDuck or S3browser) to upload your data into Glacier vs. having to spend $30+ on Glacier-specific tools such as Freeze.

I hope this was helpful. So far, I’ve managed to upload 20 of 60GB of my data and have been pleasantly surprised by how easy it has been and how much stress it takes off your mind regarding your backups.

A future blog post will detail out the steps required to download your S3-Glacier backups to your computer in an emergency.

This entry was posted in Amazon Web Services, DevOps and tagged , , , , , , , , , , . Bookmark the permalink.

Leave a comment