Amazon Glacier Backup of Linux Server
If you’re looking for cheap offsite backup storage and are OK waiting up to 24 hours before you can restore, Amazon Glacier seems to be the main contender out there. At $0.01 per GB per month, with no charge for uploading or deleting backups (they do nick you on restores), it’s hard to beat. The biggest problem is the lack of backup software that will write to Glacier, especially if you want to automate the process, and it’s worse again if you’re backing up from a Linux host.

Recently I was tasked with just that: getting backups set up for a Linux server that we host on Amazon, and storing as many of them as we could, for as long as we could, for as cheap as we could. Our return-to-service SLA is around 48 hours. Glacier is the obvious choice, beating out all other cloud storage vendors by almost 10x (Amazon S3 and Windows Azure came in at the same price for the first TB, $0.095 per GB per month), but I was having a hard time finding software to use for the backups. Eventually I came across amazon-glacier-cmd-interface on GitHub. (I know, using development software in a production environment is a big no-no, but I figure as long as I keep an eye on it we should be fine, plus we keep 2 weeks’ worth of backups local anyway.) It provides a relatively easy command line interface to back up to and restore from Amazon Glacier, though the documentation leaves something to be desired.

Thus, my post. After spending quite a bit of time (I’m not terribly Linux savvy) getting everything working on a test server, rolling out to a QA server for verification, and then rolling into production, I’ve worked out almost all of the kinks. Below is my entire process for installing amazon-glacier-cmd-interface along with all of its prerequisites, and getting it configured and working.
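To put the price gap in concrete terms, here is a rough sketch of the storage arithmetic. The two rates are the ones quoted above; the 100 GB figure is purely a hypothetical amount of retained backups, and restore and request fees are not included.

```shell
#!/bin/sh
# Rough monthly storage cost: stored GB times price per GB per month.
# Rates are the ones quoted in this post; check current pricing before relying on them.
monthly_cost() {
    # $1 = stored data in GB, $2 = price in USD per GB per month
    LC_ALL=C awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", gb * rate }'
}

stored_gb=100   # hypothetical amount of retained backups
echo "Glacier:  \$$(monthly_cost "$stored_gb" 0.01) per month"
echo "S3/Azure: \$$(monthly_cost "$stored_gb" 0.095) per month"
```

Change stored_gb to your own retention estimate (backup size times number of backups kept) to see how the two options compare for your case.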
Step 1: Install Git
Git is the program used to clone projects from GitHub to your Linux server. Though it’s not required (you can download the program/script files separately and unpack them on the server), it makes installation much easier.
sudo apt-get install git
Step 2: Clone and install Boto
Boto is the Python library that amazon-glacier-cmd-interface uses as its backend to talk to Amazon Glacier.
cd /opt
sudo git clone https://github.com/boto/boto.git
cd boto
sudo python setup.py install
Step 3: Clone amazon-glacier-cmd-interface, install prerequisites (Glacier Core Calls and Distribute tools) and install amazon-glacier-cmd-interface.
cd /opt
sudo git clone https://github.com/uskudnik/amazon-glacier-cmd-interface.git
cd amazon-glacier-cmd-interface/glacier
sudo python glaciercorecalls.py install
cd /opt
Then:
sudo curl -O http://python-distribute.org/distribute_setup.py
OR (depending on your Linux distro and version; the Distribute utility replaced setuptools):
sudo wget http://python-distribute.org/distribute_setup.py
Then:
sudo python distribute_setup.py
cd /opt/amazon-glacier-cmd-interface
sudo python setup.py install
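Before moving on, it can help to confirm that the tools installed so far are actually on your PATH. This is a minimal sketch; the missing_cmds helper is my own name for illustration, and the list of commands is an assumption based on the steps above.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Tools this walkthrough relies on; adjust the list to your setup.
missing=$(missing_cmds git python glacier-cmd)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All required commands are available."
fi
```

If glacier-cmd shows up as missing, the setup.py install step most likely did not complete, so it is worth rerunning it and reading the output.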
At this point, amazon-glacier-cmd-interface is installed. Now we must create the config file, /etc/glacier-cmd.conf. Here is a sample:
[aws]
access_key=your_aws_access_key
secret_key=your_aws_secret_key
[sdb]
access_key=your_sdb_access_key
secret_key=your_sdb_secret_key
region=us-west-1
[glacier]
region=your_glacier_region
bookkeeping=False
bookkeeping-domain-name=your_simple_db_domain_name
logfile=~/.glacier-cmd.log
loglevel=INFO
output=print
[SNS:<your_sns_name>]
method=email,youremailaddress@domain.com;
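Since this file holds AWS secret keys in plain text, it is probably worth making sure only its owner can read it. The sketch below shows one way to do that; lock_down_conf and conf_mode are hypothetical helper names, and the demo runs on a throwaway temp file rather than the real config.

```shell
#!/bin/sh
# Create the config file if needed and restrict it to owner read/write only.
lock_down_conf() {
    conf=$1
    [ -e "$conf" ] || ( umask 077 && : > "$conf" )
    chmod 600 "$conf"
}

# Print the permission string of a file, e.g. "-rw-------".
conf_mode() {
    ls -ld "$1" | cut -c1-10
}

# Demonstrated on a throwaway file; for real use, point it at
# /etc/glacier-cmd.conf and run it as the user that owns that file.
demo_conf=$(mktemp)
chmod 644 "$demo_conf"
lock_down_conf "$demo_conf"
echo "permissions are now $(conf_mode "$demo_conf")"
rm -f "$demo_conf"
```

Keep in mind that whichever user runs glacier-cmd (root, if you run it from /etc/crontab as shown later) has to be able to read the file, so it should be owned by that user.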
At this point, you’re ready to start backing up to Glacier. The following command is a sample backup command.
glacier-cmd upload <your_vault_name> <your_backup_file>
Also, here is a sample backup script. This script will tar your backup folder, calculate the Amazon tree hash, upload the tar file to Glacier, and delete the tar file (but leave your backup folder alone).
#!/bin/sh
# What to back up
day=$(date +"%Y-%m-%d")
folder="/your_backup_folder"
source_file="$day-daily"
# Echo backup date for logs
echo
echo "***********************************************************************"
echo "Copy $day-daily backup to Amazon Glacier storage"
# Backup file
backup_file="$day-daily.tgz"
# Tar the file
tar czf "$folder/$backup_file" "$folder/$source_file"
# Calculate the tree hash
sudo glacier-cmd treehash "$folder/$backup_file"
# Send file to Glacier
glacier-cmd upload <your_vault> "$folder/$backup_file"
# Delete backup file
rm "$folder/$backup_file"
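One thing to note about the script above is that it deletes the tar file whether or not the upload worked. A possible refinement, sketched here, is to remove the archive only when the upload command exits successfully. The upload_then_delete helper is a hypothetical name, and the sketch assumes glacier-cmd returns a non-zero exit status on failure, which I have not verified.

```shell
#!/bin/sh
# Run an upload command on an archive and remove the local archive only if
# the command exits successfully.
# $1 = archive path; remaining arguments = the upload command to run, which
# receives the archive path as its last argument.
upload_then_delete() {
    archive=$1
    shift
    if "$@" "$archive"; then
        rm -f "$archive"
    else
        echo "upload failed, keeping $archive" >&2
        return 1
    fi
}

# Intended use in the backup script (vault name is a placeholder):
#   upload_then_delete "$folder/$backup_file" glacier-cmd upload your_vault_name
```

With this in place, a failed night leaves the .tgz file behind, which makes the manual re-upload the next day a one-line job.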
After that, all you need to do is add the following line to your /etc/crontab.
00 1 * * * root bash /<script_location>/script.sh >> /<log_location>/log.log 2>&1
And your offsite backup to Glacier will run every morning at 1:00 AM. I’ve had pretty good success: it’s been installed for 2 months backing up our environment, and I think it’s errored out on 3 backups (which I was able to go back and manually upload the next day).
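For the occasional failed upload, a small retry wrapper might save the manual re-upload. This is only a sketch: retry is a hypothetical helper, the attempt count and wait time are arbitrary, and it assumes a failed glacier-cmd upload exits with a non-zero status.

```shell
#!/bin/sh
# Run a command up to max_tries times, pausing between attempts.
# $1 = maximum attempts, $2 = seconds to wait between attempts, rest = command.
retry() {
    max_tries=$1
    wait_secs=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_tries" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$wait_secs"
    done
}

# Intended use in the backup script (vault name is a placeholder):
#   retry 3 300 glacier-cmd upload your_vault_name "$folder/$backup_file"
```

The function returns non-zero once it gives up, so the calling script can still log the failure or skip deleting the archive.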
It’s not the most elegant solution, and I would never consider it to be an enterprise-level solution, but in the absence of an enterprise-grade solution, it will do. A few things that I didn’t cover:
1. Multipart backups – Amazon Glacier requires multipart uploading for anything larger than 4GB, but recommends it for anything larger than 100MB. Our backups are about 2.5GB, so I haven’t had to deal with it yet. I know amazon-glacier-cmd-interface supports it, but I haven’t looked into it.
2. Restoring backups – I’ve attempted several times to restore using amazon-glacier-cmd-interface but have had no luck; it basically blows up on me. What I’ve resorted to as a restore process is using Cloudberry Cloud Explorer (there’s a free version) to pull the backup file down, then FTP it to my Linux server and unpack it. Again, not elegant, but it works.
NOTE: I had some real issues getting the free version of Cloudberry Cloud Explorer to connect to Glacier. It would connect to S3 just fine, but would error out while connecting to Glacier. After working with their support, we were able to figure out that if you install a trial of the Pro version, then uninstall the trial, and then install the free version, it will work just fine. Apparently there’s a registry key somewhere that it needs to connect to Glacier that the free version is missing. Installing the Pro version gets it there and then leaves it there during the uninstall (something I usually hate about programs, but it actually worked out in this case).
3. Catalog – amazon-glacier-cmd-interface does have the ability to keep a catalog of your backups to simplify the restore process. It utilizes an Amazon SimpleDB database, where it stores information about your backup sets. I looked into this a bit, but it seemed to overcomplicate things for what I was looking for (and also for my experience level to be honest). I did not work with it, but I thought it was worth mentioning that it does exist.
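On the multipart point in item 1, a script could check the archive size before uploading and flag when multipart becomes recommended or required. The sketch below uses the 100MB and 4GB thresholds mentioned above, interpreted in binary units (that interpretation is my assumption); upload_mode and file_bytes are hypothetical helper names, and the sketch only classifies the size, it does not perform a multipart upload.

```shell
#!/bin/sh
# Thresholds from the text, interpreted here in binary units (an assumption).
RECOMMEND_BYTES=$((100 * 1024 * 1024))       # above 100 MB: multipart recommended
REQUIRE_BYTES=$((4 * 1024 * 1024 * 1024))    # above 4 GB: multipart required

# Classify an archive size given in bytes.
upload_mode() {
    if [ "$1" -gt "$REQUIRE_BYTES" ]; then
        echo "multipart-required"
    elif [ "$1" -gt "$RECOMMEND_BYTES" ]; then
        echo "multipart-recommended"
    else
        echo "single"
    fi
}

# Size of a file in bytes, using only POSIX tools.
file_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Intended use in the backup script:
#   upload_mode "$(file_bytes "$folder/$backup_file")"
```

A backup of about 2.5GB, like the ones described here, lands in the recommended range but still under the hard limit.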
Download and Docs for Boto: https://github.com/boto/boto
Download and Docs for amazon-glacier-cmd-interface: https://github.com/uskudnik/amazon-glacier-cmd-interface
Download and Site for Cloudberry Cloud Explorer: http://www.cloudberrylab.com
To use this product, your access key and secret key need a subscription to Amazon SimpleDB. Amazon SimpleDB can be useful for those who need a non-relational database for storing smaller, non-structured data. It limits storage to 10 GB per domain, automatically indexes all data, and can store any UTF-8 string data. Amazon SimpleDB pricing is based on your actual box usage.
On a different note, SDB Explorer provides an intuitive graphical user interface (GUI) for exploring the Amazon SimpleDB service thoroughly and efficiently.
http://www.sdbexplorer.com/
You can also try https://github.com/vsespb/mt-aws-glacier
– Multipart supported (enabled by default).
– Multithreaded upload supported.
– Catalog in file system – “Journal” (you might want to sync it to S3 with another tool, or, in case journal is lost you can recover it from Amazon Glacier servers)
– Restore should be possible; there are no open bugs about failing restores. Downloading with the HTTP Range header is implemented (there is a known issue where it’s impossible to download big (2-20 GB) files from Amazon servers with any client that lacks that feature: https://forums.aws.amazon.com/thread.jspa?messageID=454696 )
– For Debian/Ubuntu you can deploy using OS package manager. For other distros deployment is easy too.
To download stored backups from a Windows machine you could also consider using fastglacier.com