
Re: [SAGE] Long lived, cheap data storage



How about something like JungleDisk? It backs on to Amazon's Simple
Storage Service (S3) and effectively translates from the web service
to locally bound WebDAV, adding encryption on the way using simple and
apparently sensible key management and AES-256; there's even some GPL
sample code if you want to get your data back without using the tool.
Anyway, regardless of the tool you use, paying 15c/GB/month for highly
available storage suits me just fine, especially if it's mostly
archive, in which case you avoid running up bandwidth charges.
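
For a rough sanity check against your budget, here is a minimal Python
sketch of the monthly storage bill. It assumes the 15c/GB/month rate
stays flat, takes the 20GB start and 5GB/year growth from your R10, and
ignores bandwidth and request charges. The function name and the
flat-rate assumption are mine, not Amazon's billing model.

```python
# Rough monthly storage cost projection (a sketch, not Amazon's billing logic).
# Assumptions: flat 15 cents/GB/month, 20 GB today, +5 GB each year,
# no transfer or request charges. Integer cents avoid float rounding.

RATE_CENTS_PER_GB_MONTH = 15
START_GB = 20
GROWTH_GB_PER_YEAR = 5


def monthly_cost_cents(year, start_gb=START_GB,
                       growth_gb=GROWTH_GB_PER_YEAR,
                       rate_cents=RATE_CENTS_PER_GB_MONTH):
    """Return the monthly storage bill in cents after `year` whole years."""
    stored_gb = start_gb + growth_gb * year
    return stored_gb * rate_cents


if __name__ == "__main__":
    for year in (0, 10, 16):
        cents = monthly_cost_cents(year)
        print(f"year {year:>2}: ${cents / 100:.2f}/month")
```

On those assumptions the bill starts at $3.00/month and does not reach
the $15 low end of your budget until year 16 (100GB stored).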

As a bonus, if you want to manipulate the data (say, converting images
or transcoding media) then you can just fire up a virtual machine in
the Elastic Compute Cloud (EC2) from an image (AMI) also stored on S3
and do what you like with it for all of 10c/instance/hr.

This appears to meet all your requirements and covers a number of
options: S1 C1 C2 A1 A2 A3? (EC2 as a bastion host) I1 I2  L1 L3. For
more on the storage subsystem itself, Dynamo, see
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
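
On I2: whichever storage you pick, a checksum manifest kept alongside
the archive lets you tell which copy has gone bad. A minimal Python
sketch, assuming SHA-256 and a plain dict as the manifest; the function
names are hypothetical and this is independent of any particular backup
tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_corrupt(root, manifest):
    """Return relative paths that are missing or whose digest has changed."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad
```

Build the manifest once when you add files, store a copy of it with each
replica, and run find_corrupt() against each copy from time to time. A
checksum only detects damage; you still need a second copy to repair it.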

Here's some info from http://www.jungledisk.com

Jungle Disk is an application that lets you store files and backup
data securely to Amazon.com's S3 ™ Storage Service.

    * Store an unlimited amount of data for only 15¢ per gigabyte
    * No monthly subscription fee, no startup fee, no commitment
    * Your data is fully encrypted at all times
    * Data is stored at multiple Amazon.com datacenters around the
      country for high availability
    * Access files directly from Windows Explorer, Mac OSX Finder, and Linux
    * Automatically backup your important files quickly and easily

Unlike other services, with Amazon S3 ™ there is no minimum and no
maximum amount of data you can store. You pay only for the actual
amount of storage you are using.

On 07/10/2007, Mark R. Lindsey <lindsey@acm.org> wrote:
> How do you archive personal data for a long time, keep it secret,
> keep it safe, and keep it available, and do it affordably? I'd
> welcome your ideas. If we have something interesting enough, maybe we
> could submit a paper somewhere to document the state of the art.
>
> I've started to accumulate personal records that I want safe,
> confidential, and available. These include tax forms, some scanned
> legal documents, software, papers, and research data I've collected,
> 15 years of email, and family photos. The plight of Katrina Victims
> and their difficulty getting back into their homes, and the theft of
> Francis Ford Coppola's backup hard drive [1], have me thinking
> seriously.
>
> ---------------------------------------------------------------------
>
> I. My requirements
>
> I want the files to be available [R1] in case of disaster that
> destroys my house, and [R2] disaster that destroys my company's data
> center, [R3] if I lose my job and thus access to company servers.
>
> [R4] I want the files to be private, so that only I or my wife, or
> maybe another designated family member can read them.
>
> [R5] I want to be able to access the files regardless of OS.
>
> [R6] I'd also like to be able to access the files from elsewhere,
> preferably anywhere over the Internet.
>
> [R7] I want to be able to retrieve a given file in no more than 10
> minutes of work, or add a file to the archive in relatively little work.
>
> [R8] I want the files to be available to my descendants for a really
> long time -- hopefully another 100 years -- because this stuff is
> valuable to my family and they keep raising the life expectancy
> (dangit!).
>
> [R9] I need to minimize costs -- hopefully around $15/month-$25/month
> (using 2007 dollars going forward).
>
> [R10] I need to accommodate 20GB of storage now, plus 5GB additional
> storage per year.
>
> ---------------------------------------------------------------------
>
>
> II. Obvious options
>
> It seems like the problem can be modeled in terms of Storage,
> Confidentiality, Ease of Availability, Integrity (non-corruption),
> and Longevity.
>
> Storage
> ~~~~~~~
> (S-1) Buy service through an online backup service.
>
> (S-2) Find a friend and borrow space on their server, then use FTP or
> SCP to access it.
>
> (S-3) Buy cheap web hosting and get their storage. Access with FTP or
> WebDAV.
>
> (S-4) Use Gmail accounts for storage.
>
> (S-5) Store the files on a hard drive at my house.
>
> (S-6) Store the files on a hard drive at my company's data center.
>
> (S-7) Start a co-op to lease hosted servers from rackspace.com and
> similar.
>
> Confidentiality
> ~~~~~~~~~~~~~~~
> (C-1) Depend on the provider's access-control mechanisms to keep
> intruders out.
>
> (C-2) Use private-key encryption (gpg, for example) using a pass-code
> I memorize.
>
> Availability
> ~~~~~~~~~~~~
> (A-1) Use the backup provider's special tool to access files.
>
> (A-2) Only use a service that lets me download it via the web, ftp,
> scp, or similar standard tool via the Internet.
>
> (A-3) Only access the file from a designated bastion host that's
> accessible via the Internet.
>
> Integrity
> ~~~~~~~~~
> (I-1) Depend on the storage provider to prevent data corruption.
>
> (I-2) Use multiple distributed copies to recover from data
> corruption, possibly with a checksum to detect data corruption.
>
> Longevity
> ~~~~~~~~~
> (L-1) Depend on the storage provider to keep the files available for
> a long time.
>
> (L-2) Use multiple storage providers simultaneously, to defend
> against business failure.
>
> (L-3) Plan to migrate data from provider to provider. (* -- Can we
> infer anything about the stability of hosted storage
>
> ---------------------------------------------------------------------
>
>
> III. My favorite solution
>
> My favorite is (S-3)(C-2)(A-2)(I-1)(L-3). I.e., buy cheap web hosting
> service storing my files in plain sight of Google, use encryption
> with a readily-available tool like GnuPG, expect the web hosting
> folks to keep the files intact, and just plan to migrate to another
> provider when they go under. A key feature here is cheapness -- web
> hosting is very cheap.
>
> [R1, Home Disaster] OK; web hosting doesn't depend on my house.
> [R2, Business disaster] OK, as long as I don't use my company's own
> web hosting.
> [R3, Loss of job & access to servers] OK; web hosting doesn't depend
> on that particular job.
> [R4, Private] OK, assuming GnuPG encryption and our passcode is safe.
> [R5, OS-agnostic access] OK, assuming GnuPG is available on the
> platform I'm using.
> [R6, Access from anywhere] OK, web hosting providers are good at that.
> [R7, Quick access] OK, assuming I have Internet access, and ability
> to download a binary of GnuPG.
> [R8, Century-long access] OK, assuming I actually do keep the data
> with a company that's in business.
> [R9, Cheap] OK, as long as I can use a web hosting provider for this.
> Some already exclude this [2]. Many providers are offering 300GB of
> storage for <$10/month [3].
> [R10, Growth] OK, assuming the cost of hosting storage keeps going
> down. TBH, I'm not banking on fault-tolerant storage.
>
> But risks in this plan:
>
> -> How long would GnuPG encryption last me? After all, single-DES was
> considered useful once. What's the expected lifespan of readily-
> available encryption software?
>
> -> What if something happened and I didn't make the yearly payment?
> All the data would be flushed.
>
> -> Is it reasonable to depend on the provider to keep files intact?
> If not, I've got to do replication of some sort -- probably
> replicating the data to a different storage provider.
>
> -> Will I know to migrate the files soon enough to be useful? Say I
> go with GoDaddy, and they cancel web hosting services 8 years from
> now. Will I know in time? Will I be in a position to migrate the data
> quickly enough?
>
>
> ------------------------
>
> [1] -- "Coppola plea after computer theft",
> http://news.bbc.co.uk/2/hi/entertainment/7019644.stm
>
> [2] -- http://www.websitesource.com/browser_windows/webstorage.html
>