Re: [SAGE] Long lived, cheap data storage
On 10/9/07, Brad Knowles wrote:
> Moreover, just the amount of network capacity required to move that
> much data would be prohibitive, due to the bandwidth-delay product
> nature of TCP/IP.
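To put numbers on the bandwidth-delay product point: a single TCP flow can have at most one receive window of unacknowledged data in flight per round trip, so its throughput is bounded by window / RTT. Here is a small sketch of that arithmetic. The 64 KiB window (no window scaling) and the 100 ms round trip are illustrative assumptions, not measurements.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a link of bandwidth_bps busy at rtt_s."""
    return bandwidth_bps * rtt_s / 8


def max_tcp_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound for one TCP flow: at most one window delivered per round trip."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    # Assumed: 64 KiB window without window scaling, 100 ms round trip.
    print(max_tcp_throughput_bps(65535, 0.100) / 1e6)  # roughly 5.24 Mbit/s
    # Assumed: a 1 Gbit/s path at 100 ms RTT needs about 12.5 MB in flight.
    print(bdp_bytes(1e9, 0.100))
```

Halving the RTT doubles the ceiling for the same window, which is why getting the data to storage that is physically close matters so much.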
And let's not forget your own network issues -- most people have DSL
or cablemodem, if they're lucky. Sure, you may get
kinda-semi-sorta-okay download speeds, but your upload speeds are
almost certainly going to really seriously suck big time, and even if
your provider isn't hopelessly oversubscribed on their disk storage
and their bandwidth, you simply won't be able to upload stuff fast
enough to them.
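A quick back-of-the-envelope sketch of how long an initial upload takes over a consumer uplink. The 1 Mbit/s uplink, the 100 GB data set, and the 80% efficiency factor are all assumptions chosen for illustration; plug in your own numbers.

```python
def upload_hours(data_gb: float, uplink_mbps: float, efficiency: float = 0.8) -> float:
    """Hours to push data_gb (decimal gigabytes) over an uplink of uplink_mbps.

    efficiency is an assumed fraction of the nominal line rate actually
    achieved after protocol overhead and contention on the shared link.
    """
    bits = data_gb * 8e9
    usable_bps = uplink_mbps * 1e6 * efficiency
    return bits / usable_bps / 3600


if __name__ == "__main__":
    # Assumed: 100 GB of data, 1 Mbit/s uplink, 80% efficiency.
    hours = upload_hours(100, 1.0)
    print(f"{hours:.0f} hours, about {hours / 24:.1f} days")
```

With those assumptions the first full backup alone takes on the order of eleven or twelve days of saturating the uplink, before any incremental traffic.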
Ignoring everything else, to make this sort of thing work on any kind
of scale, you'll need to have a highly customized backup client &
server system, one that makes heavy use of both P2P and anycast
network routing, so that you can get your packets off your system as
quickly as possible and written to disk that is relatively closely
located to you (minimizing the problem caused by the bandwidth-delay
product nature of TCP/IP, as previously mentioned).
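Anycast does the "send it somewhere close" step at the routing layer. As a rough application-level approximation, a client could time a TCP handshake to each candidate storage node and send its data to the fastest one. This is only a sketch: the node names are hypothetical, and handshake time is a crude stand-in for RTT.

```python
import socket
import time


def measure_connect_rtt(host: str, port: int, timeout: float = 2.0) -> float:
    """Approximate RTT as the time to complete a TCP handshake; inf on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:  # unreachable, refused, or timed out
        return float("inf")
    return time.monotonic() - start


def pick_nearest(rtts: dict[str, float]) -> str:
    """Return the storage node with the lowest measured RTT."""
    return min(rtts, key=rtts.get)


if __name__ == "__main__":
    # Hypothetical storage nodes; replace with real endpoints.
    nodes = ["store-east.example.net", "store-west.example.net"]
    rtts = {n: measure_connect_rtt(n, 443) for n in nodes}
    print(pick_nearest(rtts), rtts)
```

Picking by lowest RTT also maximizes the per-flow window/RTT ceiling, which is the whole point of writing to nearby disk.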
You would then need some sort of other front-end (maybe web-based)
that takes advantage of that same kind of P2P/anycast back-end
solution, in order to present the stored data in a form that most
ordinary clients could understand.
You'd also want some sort of Freenet-like built-in encryption system.
Sure, quantum cryptanalysis might eventually make such a thing
relatively useless, but there's no sense just giving away the farm and
throwing up our hands in defeat.
More importantly, you want to encrypt the metadata, so that whoever is
snooping can't just go on any old fishing expedition any time they want. No,
they'd have to have specific targets in mind, and they'd only be able
to pull up the encrypted versions of those targets, which they would
then have to decrypt.
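One minimal way to keep file names out of the storage provider's view, assuming a secret key that lives only on the client: store each object under an HMAC of its path instead of the path itself. The client can recompute the name to fetch the object, but the server sees only opaque hex strings. This covers names only; the contents would still need encrypting with a real cipher (the Python standard library does not include one), and the server can still see object sizes and access times. The key handling and example path below are hypothetical.

```python
import hashlib
import hmac
import secrets


def opaque_object_name(client_key: bytes, path: str) -> str:
    """Deterministic storage name for a path, derived with HMAC-SHA256.

    Without client_key the server cannot recover the path or confirm
    guesses about it; with the key, the client can recompute the name.
    """
    return hmac.new(client_key, path.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Hypothetical: a 256-bit key generated and kept only on the client.
    key = secrets.token_bytes(32)
    print(opaque_object_name(key, "/home/user/documents/report.pdf"))
```

Because the mapping is keyed, an outsider cannot precompute the name for an interesting path, so they need a specific target plus the key before the index is any use to them.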
And it would be a considerably more intensive task to pull up one
target after another and decrypt it, just to find out that either
that target doesn't have anything of interest or that it only tells
you about what the next link in the chain is, as opposed to your
being able to trivially access the entire network at once and being
able to scan all links in parallel.
This isn't something that can be trivially solved with a
content distribution network -- you need something that works
bi-directionally, and gets your packets out of "normal" TCP/IP and
into the private network as quickly and directly as possible, so that
they can apply whatever internal network/storage tricks need to be
performed in order to make this thing scale.
Thinking about it, the kinds of groups that either must already have
such systems or must be rapidly working on them are content-heavy sites
like Flickr or YouTube.
If a competitor of Flickr or YouTube decides to partner with Amazon
and make use of the S3 system as a storage back-end, then that would
tell you something about the ability of the S3 system to take that
kind of load. Otherwise, I think that also tells you something about
the Amazon S3 system -- and what it lacks.
--
Brad Knowles <brad@shub-internet.org>
LinkedIn Profile: <http://tinyurl.com/y8kpxu>