Archiving to S3 Glacier
Why S3 Glacier?
I have some nonzero amount of music and old ROMs on my WD My Cloud Home (I think about a terabyte combined). My MCH is now maybe 3 years old and not set up as a RAID. I do have a backup of most of the music on the 2TB hard drive in my Windows laptop. I also want an online backup that would exist as long as the corresponding datacenter is up, even if it would cost me some money. I'm not expecting to use this backup often, only if all my other backups fail, so I just wanted cheap storage. Egress fees are OK. Note that I currently have 5TB of OneDrive storage from MIT, though that will expire when I graduate.
S3 Glacier is the cheapest option for long-term storage. It does have egress fees (unlike Backblaze, which charges more per TB but offers free egress as long as you stay under a monthly bandwidth cap). When I lose the OneDrive storage I might look into Backblaze.
Reading through the S3 Glacier documentation, the main thing is that when you upload your archives you either keep track of the archive ID <-> name correspondence yourself or just accept that your files will come back to you with scrambled names. I thought that was OK because I was zipping everything and the metadata is all contained in my (mostly FLAC) files.
For privacy, I used OpenSSL to encrypt everything. This does risk losing everything if you forget the decryption key or the number of iterations, but I…hope…I won't.
My bill each month will be about \$1, which is great. A 2 TB hard drive (an actual one, not one of those fake ones on Amazon) still goes for about \$80 on Western Digital. To extract my current 600 GB and 1000 files, I'd be paying about \$6.05 (\$0.01 per GB and \$0.05 per 1000 requests). At \$80 for the drive versus roughly \$1 per month for Glacier, the hard drive would need to last about 80 months, close to 7 years, before the costs break even. (If I wanted RAID, I'd have to buy 2 of those drives and set them up, so even more expensive!)
Preparing Files
First, zip the folder or file. Then, use openssl to encrypt the zips:
for file in *.zip; do openssl enc -aes-256-cbc -iter 1000 -salt -in "$file" -out "${file%.*}.enc" -pass file:pass.txt; done
For decryption, run openssl enc -d with the same cipher, the same -iter count, and the same password! Don't forget the password!
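A decryption loop is just the mirror image of the encryption command above; a minimal sketch, assuming the same pass.txt and iteration count (the salt is read back from the file header automatically):

for file in *.enc; do openssl enc -d -aes-256-cbc -iter 1000 -in "$file" -out "${file%.*}.zip" -pass file:pass.txt; done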
Uploading Files
If the file is under 4 GB, a call to aws glacier upload-archive works:
for file in *.enc; do aws glacier upload-archive --vault-name "MCH_Backup" --account-id "-" --body "$file"; done
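If you'd rather keep the ID <-> name correspondence mentioned earlier instead of living with scrambled names, a variation like this records each archiveId next to its file name (manifest.txt is just a name I made up):

for file in *.enc; do id=$(aws glacier upload-archive --vault-name "MCH_Backup" --account-id "-" --body "$file" --query archiveId --output text); printf '%s\t%s\n' "$file" "$id" >> manifest.txt; done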
Multi-part uploads for files > 4 GB
If your file is over 4 GB, you have to tell AWS you're going to split up your file and upload the parts, but that they should all be considered part of the same archive. To confirm that you've uploaded everything correctly, you have to pass AWS a checksum of the whole file. AWS computes this checksum as a SHA-256 tree hash: hash each 1 MiB chunk, then hash pairs of those digests, then pairs of those, and so on until one hash remains. The individual steps are below; a sketch tying them all together follows the list.
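To make the tree hash concrete, here's a minimal shell sketch (call it treehash.sh; the glacier_checksum.py script mentioned below computes the same thing). It writes temporary 1 MiB chunks to disk, so it's slow and space-hungry for multi-GB files; it's here to show the algorithm, not to replace the script.

#!/usr/bin/env bash
# treehash.sh -- SHA-256 tree hash of a file, the way Glacier computes it:
# hash 1 MiB chunks, then repeatedly hash concatenated pairs of digests.
set -euo pipefail
file=$1
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# leaf hashes: one SHA-256 per 1 MiB chunk
split -a 4 -b 1M "$file" "$tmp/chunk_"
hashes=()
for chunk in "$tmp"/chunk_*; do
  hashes+=("$(sha256sum "$chunk" | cut -d' ' -f1)")
done

# hash pairs of digests (as raw bytes) until a single hash remains
while (( ${#hashes[@]} > 1 )); do
  next=()
  for (( i = 0; i < ${#hashes[@]}; i += 2 )); do
    if (( i + 1 < ${#hashes[@]} )); then
      next+=("$(printf '%s%s' "${hashes[i]}" "${hashes[i+1]}" | xxd -r -p | sha256sum | cut -d' ' -f1)")
    else
      next+=("${hashes[i]}")   # odd digest out is carried up unchanged
    fi
  done
  hashes=("${next[@]}")
done

echo "${hashes[0]}"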
- Initiate a multipart upload with a 4 GiB part size:
aws glacier initiate-multipart-upload --account-id - --part-size 4294967296 --vault-name "MCH_Backup"
This returns an uploadId; save it for the later commands, probably with
export UPLOAD_ID=blah
- Split the file into 4 GiB chunks (matching the part size above):
split -b 4G file.enc prefix
- Upload each part to AWS (the --range below is for the first part; later parts use the ranges listed underneath):
aws glacier upload-multipart-part --body part_n.enc --range 'bytes 0-4294967295/*' --account-id - --vault-name "MCH_Backup" --upload-id $UPLOAD_ID
byte ranges for successive parts (hopefully I never go past this):
– 0-4294967295
– 4294967296-8589934591
– 8589934592-12884901887
– use ls -l to get the size of the full (unsplit) file, then export FILE_SIZE=blah
– compute the tree hash of the full file (use the glacier_checksum.py script), then export CHECKSUM=blah
– tell AWS all the parts are uploaded (the checksum is needed for validation):
aws glacier complete-multipart-upload --checksum $CHECKSUM --archive-size $FILE_SIZE --upload-id $UPLOAD_ID --account-id - --vault-name "MCH_Backup"
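Putting those steps together, here's a rough end-to-end sketch of the multipart flow. The file name is made up, the vault name and 4 GiB part size match the commands above, and the checksum line assumes you saved the tree-hash sketch from earlier as treehash.sh (swap in glacier_checksum.py however you normally invoke it):

#!/usr/bin/env bash
# multipart upload of one big .enc file to Glacier, start to finish
set -euo pipefail

FILE=big_file.enc                # hypothetical archive over 4 GB
VAULT=MCH_Backup
PART_SIZE=4294967296             # 4 GiB, must match the split size below

# 1. initiate, capturing the uploadId instead of copy-pasting it
UPLOAD_ID=$(aws glacier initiate-multipart-upload --account-id - \
  --vault-name "$VAULT" --part-size "$PART_SIZE" \
  --query uploadId --output text)

# 2. split into 4 GiB parts (part_aa, part_ab, ...)
split -b 4G "$FILE" part_

# 3. upload each part with its own byte range
offset=0
for part in part_*; do
  size=$(stat -c %s "$part")
  end=$(( offset + size - 1 ))
  aws glacier upload-multipart-part --account-id - --vault-name "$VAULT" \
    --upload-id "$UPLOAD_ID" --body "$part" \
    --range "bytes ${offset}-${end}/*"
  offset=$(( offset + size ))
done

# 4. finish with the total size and the tree hash of the *unsplit* file
FILE_SIZE=$(stat -c %s "$FILE")           # same number ls -l shows
CHECKSUM=$(bash treehash.sh "$FILE")      # or glacier_checksum.py
aws glacier complete-multipart-upload --account-id - --vault-name "$VAULT" \
  --upload-id "$UPLOAD_ID" --archive-size "$FILE_SIZE" --checksum "$CHECKSUM"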