
Re: [SAGE] Long lived, cheap data storage



The problem here is: do you really want to keep all of your eggs in
one basket? What happens in 3 years when Amazon realizes that they're
not a storage company and shuts down EC2? Or when EC2 has an outage
that causes data loss (like the one last weekend), and you're SOL
because Amazon was never a storage company and refused to provide an
SLA?


On Oct 7, 2007, at 3:03 PM, Sam Johnston wrote:

> How about something like JungleDisk? It backs on to Amazon's Simple
> Storage Service (S3) and effectively translates from the web service
> to locally bound WebDAV, adding encryption on the way using simple and
> apparently sensible key management and AES-256; there's even some GPL
> sample code if you want to get your data back without using the tool.
> Anyway regardless of the tool you use, paying 15c/gb/month for highly
> available storage suits me just fine, especially if it's mostly
> archive in which case you avoid running up bandwidth charges.
>
> As a bonus, if you want to manipulate the data (say, converting images
> or transcoding media) then you can just fire up a virtual machine in
> the Elastic Compute Cloud (EC2) from an image (AMI) also stored on S3
> and do what you like with it for all of 10c/instance/hr.
>
> This appears to meet all your requirements and covers a number of
> options: S1 C1 C2 A1 A2 A3? (EC2 as a bastion host) I1 I2  L1 L3. For
> more on the storage subsystem itself, Dynamo, see
> http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
>
> Here's some info from http://www.jungledisk.com
>
> Jungle Disk is an application that lets you store files and backup
> data securely to Amazon.com's S3 ™ Storage Service.
>
>     * Store an unlimited amount of data for only 15¢ per gigabyte
>     * No monthly subscription fee, no startup fee, no commitment
>     * Your data is fully encrypted at all times
>     * Data is stored at multiple Amazon.com datacenters around the
> country for high availability
>     * Access files directly from Windows Explorer, Mac OSX Finder,  
> and Linux
>     * Automatically backup your important files quickly and easily
>
> Unlike other services, with Amazon S3 ™ there is no minimum and no
> maximum amount of data you can store. You pay only for the actual
> amount of storage you are using.
>
> On 07/10/2007, Mark R. Lindsey <lindsey@acm.org> wrote:
>> How do you archive personal data for a long time, keep it secret,
>> keep it safe, and keep it available, and do it affordably? I'd
>> welcome your ideas. If we have something interesting enough, maybe we
>> could submit a paper somewhere to document the state of the art.
>>
>> I've started to accumulate personal records that I want safe,
>> confidential, and available. These include tax forms, some scanned
>> legal documents, software, papers, and research data I've collected,
>> 15 years of email, and family photos. The plight of Katrina Victims
>> and their difficulty getting back into their homes, and the theft of
>> Francis Ford Coppola's backup hard drive [1], have me thinking
>> seriously.
>>
>> ---------------------------------------------------------------------
>>
>> I. My requirements
>>
>> I want the files to be available [R1] in case of disaster that
>> destroys my house, and [R2] disaster that destroys my company's data
>> center, [R3] if I lose my job and thus access to company servers.
>>
>> [R4] I want the files to be private, so that only I or my wife, or
>> maybe another designated family member can read them.
>>
>> [R5] I want to be able to access the files regardless of OS.
>>
>> [R6] I'd also like to be able to access the files from elsewhere,
>> preferably anywhere over the Internet.
>>
>> [R7] I want to be able to retrieve a given file in no more than 10
>> minutes of work, or add a file to the archive in relatively little  
>> work.
>>
>> [R8] I want the files to be available to my descendants for a really
>> long time -- hopefully another 100 years -- because this stuff is
>> valuable to my family and they keep raising the life expectancy
>> (dangit!).
>>
>> [R9] I need to minimize costs -- hopefully around $15/month-$25/month
>> (using 2007 dollars going forward).
>>
>> [R10] I need to accommodate 20GB of storage now, plus 5GB additional
>> storage per year.
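
Taking [R9] and [R10] together with the 15c/GB/month S3 price quoted
earlier in this thread, the storage bill can be estimated. This is only
a rough sketch: it assumes the price stays flat and ignores bandwidth
and request charges, and the function names are made up for
illustration. Amounts are kept in whole cents to avoid rounding noise.

```python
PRICE_CENTS_PER_GB_MONTH = 15  # S3 price quoted in this thread (2007)
INITIAL_GB = 20                # [R10] storage needed now
GROWTH_GB_PER_YEAR = 5         # [R10] additional storage per year


def archive_size_gb(years):
    """Archive size in GB after the given number of whole years."""
    return INITIAL_GB + GROWTH_GB_PER_YEAR * years


def monthly_cost_cents(years, price=PRICE_CENTS_PER_GB_MONTH):
    """Monthly storage cost in cents, assuming a flat per-GB price."""
    return archive_size_gb(years) * price


def first_year_over_budget(budget_cents, price=PRICE_CENTS_PER_GB_MONTH):
    """First whole year in which the monthly cost exceeds the budget."""
    years = 0
    while monthly_cost_cents(years, price) <= budget_cents:
        years += 1
    return years


if __name__ == "__main__":
    for years in (0, 10, 50, 100):
        cents = monthly_cost_cents(years)
        print(f"year {years:3d}: {archive_size_gb(years):4d} GB, "
              f"${cents / 100:.2f}/month")
    print("over $15/month in year", first_year_over_budget(1500))
    print("over $25/month in year", first_year_over_budget(2500))
```

Under these assumptions storage alone starts at $3.00/month and first
exceeds the $15/month end of the [R9] budget in year 17, so the growth
rate in [R10] is not what breaks the plan in the near term.
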
>>
>> ---------------------------------------------------------------------
>>
>>
>> II. Obvious options
>>
>> It seems like the problem can be modeled in terms of Storage,
>> Confidentiality, Ease of Availability, Integrity (non-corruption),
>> and Longevity.
>>
>> Storage
>> ~~~~~~~
>> (S-1) Buy service through an online backup service.
>>
>> (S-2) Find a friend and borrow space on their server, then use FTP or
>> SCP to access it.
>>
>> (S-3) Buy cheap web hosting and get their storage. Access with FTP or
>> WebDAV.
>>
>> (S-4) Use Gmail accounts for storage.
>>
>> (S-5) Store the files on a hard drive at my house.
>>
>> (S-6) Store the files on a hard drive at my company's data center.
>>
>> (S-7) Start a co-op to lease hosted servers from rackspace.com and
>> similar.
>>
>> Confidentiality
>> ~~~~~~~~~~~~~~~
>> (C-1) Depend on the provider's access-control mechanisms to keep
>> intruders out.
>>
>> (C-2) Use private-key encryption (gpg, for example) using a pass-code
>> I memorize.
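
For (C-2), passphrase-based encryption with gpg might look something
like the sketch below. It assumes a gpg binary on the PATH and lets gpg
prompt for the passphrase itself; the helper names and the ".gpg"
output naming are illustrative choices, not anything prescribed in this
thread.

```python
import subprocess
from pathlib import Path


def gpg_encrypt_command(plaintext, ciphertext=None):
    """Build the gpg argv for symmetric (passphrase) AES-256 encryption."""
    plaintext = Path(plaintext)
    if ciphertext is None:
        # Assumed naming convention: append ".gpg" to the original name.
        ciphertext = plaintext.with_name(plaintext.name + ".gpg")
    return ["gpg", "--symmetric", "--cipher-algo", "AES256",
            "--output", str(ciphertext), str(plaintext)]


def gpg_decrypt_command(ciphertext, plaintext):
    """Build the gpg argv that decrypts ciphertext into plaintext."""
    return ["gpg", "--output", str(plaintext), "--decrypt", str(ciphertext)]


def encrypt(plaintext, ciphertext=None):
    """Run gpg; it asks for the passphrase on the terminal."""
    subprocess.run(gpg_encrypt_command(plaintext, ciphertext), check=True)


if __name__ == "__main__":
    # Show the command without running it.
    print(" ".join(gpg_encrypt_command("taxes-2006.pdf")))
```

Only the encrypted file would then be uploaded; the passphrase never
leaves your head (or wherever your family can find it later, which is
its own problem for [R8]).
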
>>
>> Availability
>> ~~~~~~~~~~~~
>> (A-1) Use the backup provider's special tool to access files.
>>
>> (A-2) Only use a service that lets me download it via the web, ftp,
>> scp, or similar standard tool via the Internet.
>>
>> (A-3) Only access the file from a designated bastion host that's
>> accessible via the Internet.
>>
>> Integrity
>> ~~~~~~~~~
>> (I-1) Depend on the storage provider to prevent data corruption.
>>
>> (I-2) Use multiple distributed copies to recover from data
>> corruption, possibly with a checksum to detect data corruption.
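
One way to read (I-2): record a checksum when a file enters the
archive, then periodically check every copy against it and restore from
any copy that still matches. A minimal sketch, assuming the copies are
reachable as local paths (for example after downloading them) and
picking SHA-256 as the checksum; both are assumptions, and the function
names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def intact_copies(expected_digest, copies):
    """Return the copies that exist and still match the recorded digest."""
    return [path for path in copies
            if Path(path).is_file() and sha256_of(path) == expected_digest]


def damaged_copies(expected_digest, copies):
    """Return the copies that are missing or no longer match."""
    good = set(intact_copies(expected_digest, copies))
    return [path for path in copies if path not in good]
```

The checksum only detects corruption; recovery still depends on at
least one copy being intact, which is the argument for spreading the
copies across providers.
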
>>
>> Longevity
>> ~~~~~~~~~
>> (L-1) Depend on the storage provider to keep the files available for
>> a long time.
>>
>> (L-2) Use multiple storage providers simultaneously, to defend
>> against business failure.
>>
>> (L-3) Plan to migrate data from provider to provider. (* -- Can we
>> infer anything about stability of hosted storage
>>
>> ---------------------------------------------------------------------
>>
>>
>> III. My favorite solution
>>
>> My favorite is (S-3)(C-2)(A-2)(I-1)(L-3). I.e., buy cheap web hosting
>> service, store my files in plain sight of Google, use encryption
>> with a readily-available tool like GnuPG, expect the web hosting
>> folks to keep the files intact, and just plan to migrate to another
>> provider when they go under. A key feature here is cheapness -- web
>> hosting is very cheap.
>>
>> [R1, Home Disaster] OK; web hosting doesn't depend on my house.
>> [R2, Business disaster] OK, as long as I don't use my company's own
>> web hosting.
>> [R3, Loss of job & access to servers] OK; web hosting doesn't depend
>> on that particular job.
>> [R4, Private] OK, assuming GnuPG encryption and our passcode is safe.
>> [R5, OS-agnostic access] OK, assuming GnuPG is available on the
>> platform I'm using.
>> [R6, Access from anywhere] OK, web hosting providers are good at  
>> that.
>> [R7, Quick access] OK, assuming I have Internet access, and ability
>> to download a binary of GnuPG.
>> [R8, Century-long access] OK assuming I actually do keep the data
>> with a company that's in business.
>> [R9, Cheap] OK, as long as I can use a web hosting provider for this.
>> Some already exclude this [2]. Many providers are offering 300GB of
>> storage for <$10/month [3].
>> [R10, Growth] OK, assuming the cost of hosting storage keeps going
>> down. TBH, I'm not banking on fault-tolerant storage.
>>
>> But risks in this plan:
>>
>> -> How long would GnuPG encryption last me? After all, single-DES was
>> considered useful once. What's the expected lifespan of readily-
>> available encryption software?
>>
>> -> What if something happened and I didn't make the yearly payment?
>> All the data would be flushed.
>>
>> -> Is it reasonable to depend on the provider to keep files intact?
>> If not, I've got to do replication of some sort -- probably
>> replicating the data to a different storage provider.
>>
>> -> Will I know to migrate the files soon enough to be useful? Say I
>> go with GoDaddy, and they cancel web hosting services 8 years from
>> now. Will I know in time? Will I be in a position to migrate the data
>> quickly enough?
>>
>>
>> ------------------------
>>
>> [1] -- "Coppola plea after computer theft",
>> http://news.bbc.co.uk/2/hi/entertainment/7019644.stm
>>
>> [2] -- http://www.websitesource.com/browser_windows/webstorage.html
>>
>