
Re: [SAGE] Long lived, cheap data storage



On 10/9/07, Brad Knowles wrote:

>  Moreover, just the amount of network capacity required to move that
>  much data would be prohibitive, due to the bandwidth-delay product
>  nature of TCP/IP.
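To put a rough number on the bandwidth-delay point: a single TCP connection can't move more than one window of data per round trip. The sketch below assumes a fixed 64 KiB window (no window scaling) and a 100 ms round-trip time. Both figures are illustrative, not measurements, and the helper names are made up for this example.

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on one TCP connection: at most one window per round trip."""
    return window_bytes * 8 / rtt_seconds


def bandwidth_delay_product_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep a link of link_bps busy at this RTT."""
    return link_bps * rtt_seconds / 8


# Assumed: a classic 64 KiB window across a 100 ms round trip.
print(max_tcp_throughput_bps(65536, 0.100) / 1e6)  # about 5.24 Mbit/s
# Assumed: keeping a 1 Gbit/s path full at 100 ms needs ~12.5 MB in flight.
print(bandwidth_delay_product_bytes(1e9, 0.100) / 1e6)
```

However fat the pipe, a window that is small relative to the bandwidth-delay product caps what one connection can do.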

And let's not forget your own network issues -- most people have DSL 
or a cable modem, if they're lucky.  Sure, you may get 
kinda-semi-sorta-okay download speeds, but your upload speeds are 
almost certainly going to seriously suck big time, and even if 
your storage provider isn't hopelessly oversubscribed on their disk 
and their bandwidth, you simply won't be able to upload stuff to them 
fast enough.
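Some back-of-the-envelope arithmetic makes the upload problem concrete. The 1 TB and 768 kbit/s figures below are hypothetical, picked only to be in the right ballpark for a home DSL line, and `upload_days()` is a made-up helper, not part of any tool.

```python
def upload_days(data_bytes: float, upload_bps: float, efficiency: float = 1.0) -> float:
    """Days needed to push data_bytes over an uplink of upload_bps.

    efficiency < 1.0 roughly models protocol overhead and sharing the line.
    """
    seconds = data_bytes * 8 / (upload_bps * efficiency)
    return seconds / 86400


# Hypothetical: 1 TB over a 768 kbit/s uplink, running flat out, 24x7.
print(round(upload_days(1e12, 768e3), 1))  # about 120.6 days
# Same job if only ~80% of the uplink is usable in practice (assumed).
print(round(upload_days(1e12, 768e3, efficiency=0.8), 1))
```

That is four months for the initial copy before any overhead, and that's assuming nothing else ever needs the line.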


Ignoring everything else, to make this sort of thing work on any kind 
of scale, you'll need a highly customized backup client and 
server system, one that makes heavy use of both P2P and anycast 
network routing, so that you can get your packets off your system as 
quickly as possible and written to disk located relatively close 
to you (minimizing the problem caused by the bandwidth-delay 
product nature of TCP/IP, as previously mentioned).
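Real anycast happens in the routing layer, so the client never picks a node itself. As a rough application-level stand-in, here's a sketch that picks whichever storage node answers fastest and shows how much a shorter RTT raises the per-connection ceiling. The node names, RTTs, and fixed 64 KiB window are all assumptions for illustration.

```python
from typing import Dict

WINDOW_BYTES = 65536  # assumption: a fixed 64 KiB TCP window


def pick_nearest(rtt_by_node: Dict[str, float]) -> str:
    """Return the storage node with the smallest measured round-trip time."""
    return min(rtt_by_node, key=lambda node: rtt_by_node[node])


def ceiling_mbps(rtt_seconds: float, window_bytes: int = WINDOW_BYTES) -> float:
    """Window-limited throughput ceiling for one connection, in Mbit/s."""
    return window_bytes * 8 / rtt_seconds / 1e6


# Hypothetical RTT measurements (seconds) to three storage nodes.
rtts = {"node-far": 0.180, "node-mid": 0.060, "node-near": 0.012}
best = pick_nearest(rtts)
print(best, round(ceiling_mbps(rtts[best]), 1))
```

With the same window, the nearby node allows roughly fifteen times the throughput of the far one, which is the whole point of landing the data close to the user.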

You would then need some sort of other front-end (maybe web-based) 
that takes advantage of that same kind of P2P/anycast back-end 
solution, in order to turn that into something that could be 
understood by most reasonable clients.


You'd also want some sort of freenet-like built-in encryption system. 
Sure, a practical quantum computer could eventually make such a thing 
relatively useless, but there's no sense just giving away the farm and 
throwing up our hands in defeat.

More importantly, you want to encrypt the meta-data, so that whoever 
gets access to the stored data can't just go on any old fishing 
expedition any time they want.  No, they'd have to have specific 
targets in mind, and they'd only be able to pull up the encrypted 
versions of those targets, which they would then have to decrypt.
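One way to hide at least the file names is to never send them at all: derive each object's storage key from a client-side secret with HMAC-SHA256, so the storage side only ever sees random-looking names. This is a sketch of that one idea, with a hypothetical secret and path. The file contents would still need to be encrypted separately with a real authenticated cipher.

```python
import hashlib
import hmac


def opaque_object_key(secret: bytes, path: str) -> str:
    """Map a plaintext path to an opaque storage key using HMAC-SHA256.

    Only holders of `secret` can recompute the key for a given path, so the
    storage side sees 64 hex characters instead of a directory listing.
    """
    return hmac.new(secret, path.encode("utf-8"), hashlib.sha256).hexdigest()


secret = b"hypothetical-client-secret"  # in practice: random, kept client-side
print(opaque_object_key(secret, "/home/user/taxes/2007.pdf"))
```

Since the mapping is deterministic, the client can find its own objects again without storing a plaintext index anywhere on the server.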

And it would be a considerably more intensive task to pull up one 
target after another and decrypt it, just to find out that either 
that target doesn't have anything of interest or that it only tells 
you about what the next link in the chain is, as opposed to your 
being able to trivially access the entire network at once and being 
able to scan all links in parallel.
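A crude cost model shows why. Assume (hypothetically) that every fetch costs one round trip plus a fixed decryption time. Following an encrypted chain forces those steps to run one after another, while a readable index would let someone fetch everything in parallel batches. The function names and numbers below are made up for illustration.

```python
def sequential_walk_seconds(links: int, fetch_rtt: float, decrypt_time: float) -> float:
    """Each link must be fetched and decrypted before the next one is known."""
    return links * (fetch_rtt + decrypt_time)


def parallel_scan_seconds(links: int, fetch_rtt: float, decrypt_time: float,
                          workers: int) -> float:
    """If every item is addressable up front, `workers` fetches run at once."""
    rounds = -(-links // workers)  # ceiling division
    return rounds * (fetch_rtt + decrypt_time)


# Hypothetical: 1000 links, 50 ms per fetch, 10 ms per decryption, 100 workers.
print(sequential_walk_seconds(1000, 0.050, 0.010))     # about 60 seconds
print(parallel_scan_seconds(1000, 0.050, 0.010, 100))  # about 0.6 seconds
```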


This isn't something that can be easily solved with a 
content distribution network -- you need something that works 
bi-directionally, and gets your packets out of "normal" TCP/IP and 
into the private network as quickly and directly as possible, so that 
they can apply whatever internal network/storage tricks need to be 
performed in order to make this thing scale.


Thinking about it, the kinds of groups that must either already have 
such systems or be rapidly working on them are content-heavy sites 
like Flickr or YouTube.

If a competitor of Flickr or YouTube decides to partner with Amazon 
and make use of the S3 system as a storage back-end, then that would 
tell you something about the ability of the S3 system to take that 
kind of load.  If none of them does, I think that also tells you 
something about the Amazon S3 system -- and what it lacks.

-- 
Brad Knowles <brad@shub-internet.org>
LinkedIn Profile: <http://tinyurl.com/y8kpxu>