[SAGE] Long lived, cheap data storage
How do you archive personal data for a long time, keep it secret,
keep it safe, and keep it available, and do it affordably? I'd
welcome your ideas. If we have something interesting enough, maybe we
could submit a paper somewhere to document the state of the art.
I've started to accumulate personal records that I want safe,
confidential, and available. These include tax forms, some scanned
legal documents, software, papers, and research data I've collected,
15 years of email, and family photos. The plight of Hurricane Katrina
victims and their difficulty getting back into their homes, and the
theft of Francis Ford Coppola's backup hard drive [1], have me thinking
seriously about this.
---------------------------------------------------------------------
I. My requirements
I want the files to be available [R1] in case of a disaster that
destroys my house, [R2] in case of a disaster that destroys my
company's data center, and [R3] if I lose my job, and with it my
access to company servers.
[R4] I want the files to be private, so that only I or my wife, or
maybe another designated family member can read them.
[R5] I want to be able to access the files regardless of OS.
[R6] I'd also like to be able to access the files from elsewhere,
preferably anywhere over the Internet.
[R7] I want to be able to retrieve a given file in no more than 10
minutes of work, and to add a file to the archive with relatively
little work.
[R8] I want the files to be available to my descendants for a really
long time -- hopefully another 100 years -- because this stuff is
valuable to my family and they keep raising the life expectancy
(dangit!).
[R9] I need to minimize costs -- hopefully around $15/month-$25/month
(using 2007 dollars going forward).
[R10] I need to accommodate 20GB of storage now, plus 5GB additional
storage per year.
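Putting [R8] and [R10] together gives a rough size target. This is a
minimal sketch that assumes growth stays linear at 5GB/year (photos
and email may well grow faster); the function name is just my own
illustration. On that assumption the archive reaches 520GB by year 100.

```python
# Storage budget from [R10]: 20 GB today plus 5 GB per year.
# ASSUMPTION: growth stays linear for the whole 100-year horizon of [R8].
BASE_GB = 20
GROWTH_GB_PER_YEAR = 5


def storage_needed_gb(years: int) -> int:
    """Archive size in GB after `years` years of linear growth."""
    return BASE_GB + GROWTH_GB_PER_YEAR * years


if __name__ == "__main__":
    for years in (0, 10, 25, 50, 100):
        print(f"year {years:3d}: {storage_needed_gb(years)} GB")
```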
---------------------------------------------------------------------
II. Obvious options
It seems like the problem can be modeled in terms of Storage,
Confidentiality, Availability (ease of access), Integrity
(non-corruption), and Longevity.
Storage
~~~~~~~
(S-1) Buy service through an online backup service.
(S-2) Find a friend and borrow space on their server, then use FTP or
SCP to access it.
(S-3) Buy cheap web hosting and get their storage. Access with FTP or
WebDAV.
(S-4) Use Gmail accounts for storage.
(S-5) Store the files on a hard drive at my house.
(S-6) Store the files on a hard drive at my company's data center.
(S-7) Start a co-op to lease hosted servers from rackspace.com and
similar providers.
Confidentiality
~~~~~~~~~~~~~~~
(C-1) Depend on the provider's access-control mechanisms to keep
intruders out.
(C-2) Encrypt the files with symmetric (secret-key) encryption, using
a passphrase I memorize (gpg's symmetric mode, for example).
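A minimal sketch of (C-2), assuming the gpg command-line tool is
installed and on the PATH. --output, --cipher-algo, --symmetric and
--decrypt are standard gpg options; the Python wrapper and its
function names are only my own illustration. Only the encrypted .gpg
file would ever be uploaded to the provider.

```python
import subprocess
from pathlib import Path


def gpg_encrypt_cmd(src: Path, dest: Path) -> list[str]:
    """Command that encrypts src into dest with a passphrase (symmetric mode).

    gpg prompts for the passphrase itself, so it never lands in shell
    history or the process list.
    """
    return [
        "gpg",
        "--output", str(dest),
        "--cipher-algo", "AES256",  # pick the cipher explicitly rather than the build default
        "--symmetric",              # passphrase-based (secret-key) encryption
        str(src),
    ]


def gpg_decrypt_cmd(src: Path, dest: Path) -> list[str]:
    """Command that decrypts src into dest; gpg prompts for the passphrase."""
    return ["gpg", "--output", str(dest), "--decrypt", str(src)]


def run(cmd: list[str]) -> None:
    """Run a gpg command, raising CalledProcessError if gpg fails."""
    subprocess.run(cmd, check=True)
```

For example, run(gpg_encrypt_cmd(Path("taxes-2006.pdf"),
Path("taxes-2006.pdf.gpg"))), where the file names are hypothetical.
Whoever knows the passphrase (me, my wife, a designated family member)
can decrypt on any OS with a gpg build, which is what [R4] and [R5]
hinge on.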
Availability
~~~~~~~~~~~~
(A-1) Use the backup provider's special tool to access files.
(A-2) Only use a service that lets me download the files over the
Internet via the web, FTP, SCP, or a similar standard tool.
(A-3) Only access the file from a designated bastion host that's
accessible via the Internet.
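As a rough check that (A-2) needs nothing proprietary: a sketch that
uses only Python's standard urllib, which handles http://, https://,
ftp:// and file:// URLs (SCP would still need an external client).
The function name is my own, and the URL layout is whatever the
hosting provider gives you.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch(url: str, dest: Path) -> Path:
    """Stream the object at url into dest using only the standard library."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)  # copies in chunks, so big files don't fill RAM
    return dest
```

A retrieval under [R7] would then be one fetch() of the .gpg file
followed by one gpg decrypt.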
Integrity
~~~~~~~~~
(I-1) Depend on the storage provider to prevent data corruption.
(I-2) Keep multiple distributed copies to recover from data
corruption, possibly with checksums to detect which copies are
corrupted.
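Checksums detect corruption rather than prevent it; the repair comes
from another copy. A minimal sketch of (I-2), assuming a SHA-256
manifest kept alongside each copy (the function names are my own):
build the manifest once, then periodically list files that are
missing or no longer match.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large files are fine."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, '/'-separated) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def corrupted_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """Files listed in the manifest that are now missing or have a different digest."""
    bad = []
    for rel, digest in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != digest:
            bad.append(rel)
    return sorted(bad)
```

Any file this reports would be restored from a copy at another
provider whose digest still matches.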
Longevity
~~~~~~~~~
(L-1) Depend on the storage provider to keep the files available for
a long time.
(L-2) Use multiple storage providers simultaneously, to defend
against business failure.
(L-3) Plan to migrate data from provider to provider. (Can we infer
anything about the stability of hosted storage providers?)
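For (L-3), the old copy should only be deleted once the new one is
verified. A sketch, assuming I save the output of GNU sha256sum run
over each copy (on the host if it gives shell access, otherwise over
a downloaded copy). It ignores the backslash escaping GNU applies to
odd file names, and the function names are hypothetical.

```python
def parse_sha256sum(text: str) -> dict[str, str]:
    """Parse sha256sum output lines: '<digest>  <name>' or '<digest> *<name>'."""
    manifest = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, rest = line.split(" ", 1)
        manifest[rest[1:]] = digest  # drop the ' ' (text) or '*' (binary) mode marker
    return manifest


def migration_problems(
    old: dict[str, str], new: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Files missing from the new copy, and files whose digests differ."""
    missing = sorted(set(old) - set(new))
    changed = sorted(name for name in set(old) & set(new) if old[name] != new[name])
    return missing, changed
```

The migration counts as done only when both lists come back empty.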
---------------------------------------------------------------------
III. My favorite solution
My favorite is (S-3)(C-2)(A-2)(I-1)(L-3). I.e., buy cheap web hosting,
store my files there in plain sight of Google, encrypt them with a
readily available tool like GnuPG, expect the web hosting folks to
keep the files intact, and just plan to migrate to another provider
when they go under. A key feature here is cheapness: web hosting is
very cheap.
[R1, Home Disaster] OK; web hosting doesn't depend on my house.
[R2, Business disaster] OK, as long as I don't use my company's own
web hosting.
[R3, Loss of job & access to servers] OK; web hosting doesn't depend
on that particular job.
[R4, Private] OK, assuming GnuPG encryption holds up and our
passphrase stays safe.
[R5, OS-agnostic access] OK, assuming GnuPG is available on the
platform I'm using.
[R6, Access from anywhere] OK, web hosting providers are good at that.
[R7, Quick access] OK, assuming I have Internet access, and ability
to download a binary of GnuPG.
[R8, Century-long access] OK, assuming I actually keep the data
with a company that's still in business.
[R9, Cheap] OK, as long as I can use a web hosting provider for
this; some already disallow it [2]. Many providers are offering 300GB
of storage for less than $10/month [3].
[R10, Growth] OK, assuming the cost of hosting storage keeps going
down. TBH, I'm not banking on fault-tolerant storage.
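A rough feel for [R9] and [R10] together: a sketch, assuming the
300GB quota [3] never grows and my archive grows linearly at 5GB/year
from 20GB. Under those assumptions a 300GB plan holds through year 56
and is outgrown in year 57, at a price already under the $15-$25/month
budget. The function name is my own.

```python
def first_year_over_quota(
    quota_gb: int, base_gb: int = 20, growth_gb_per_year: int = 5
) -> int:
    """First whole year n in which base_gb + growth_gb_per_year * n exceeds quota_gb.

    ASSUMPTIONS: the quota is fixed and growth is linear, as in [R10].
    """
    if quota_gb < base_gb:
        return 0  # already over quota today
    return (quota_gb - base_gb) // growth_gb_per_year + 1
```

first_year_over_quota(300) gives 57; if hosting storage keeps getting
cheaper, the real answer would be later.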
But there are risks in this plan:
-> How long would GnuPG encryption last me? After all, single-DES was
considered useful once. What's the expected lifespan of readily-
available encryption software?
-> What if something happened and I didn't make the yearly payment?
All the data would be deleted.
-> Is it reasonable to depend on the provider to keep files intact?
If not, I've got to do replication of some sort -- probably
replicating the data to a different storage provider.
-> Will I know to migrate the files soon enough to be useful? Say I
go with GoDaddy, and they cancel web hosting services 8 years from
now. Will I know in time? Will I be in a position to migrate the data
quickly enough?
------------------------
[1] -- "Coppola plea after computer theft",
http://news.bbc.co.uk/2/hi/entertainment/7019644.stm
[2] -- http://www.websitesource.com/browser_windows/webstorage.html