
Re: [SAGE] number of eggs in a basket



On Sat, Jan 08, 2005 at 10:28:19AM +0100, Brad Knowles wrote:
> At 9:40 PM -0700 2005-01-07, Ruth Milner wrote:
> 
> > What I said was that it was "10x more likely that some sort of
> > hardware or *system software* problem will take out a service".
> 
> 	I'm not convinced.  If you have N+M load-balanced/fail-over 
> clusters, the probability of the entire service being taken out by a 
> single hardware or system software failure should approach zero.

I think you're both right, you're just talking about different sized
clients.  Few companies are rich enough or large enough to justify the
N+M model for N+M > 3.  AOL's scheme is frigging brilliant, but they're
far and away a different beast from the 100-person programming shop.
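To put a rough number on Brad's "should approach zero" for the big shops, here is a sketch under a strong, purely illustrative assumption: each of the N+M nodes is independently down with probability p at any given moment, and the service only goes dark when more than M of them are down at once. The function name and the example figures are hypothetical, not anything AOL or anyone else actually uses.

```python
from math import comb

def service_outage_probability(n: int, m: int, p: float) -> float:
    """Probability that more than m of the n+m nodes are down simultaneously.

    Hypothetical model: every node is independently unavailable with
    probability p, and the service survives as long as at least n are up.
    """
    total = n + m
    # Sum the binomial tail: k = m+1 .. total nodes down at the same time.
    return sum(comb(total, k) * p**k * (1 - p)**(total - k)
               for k in range(m + 1, total + 1))

# A single box (N=1, M=0) versus a 2+1 cluster, each node down 1% of the time.
print(service_outage_probability(1, 0, 0.01))
print(service_outage_probability(2, 1, 0.01))
```

The independence assumption is the weak spot: a shared power feed or a bad system-software push takes out all the nodes together, which is exactly the kind of failure Ruth was pointing at.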

Very few small shops have services that require absolutely-by-God-must-
be-there-7x24x60x60 availability.  In the few cases where they
do require such a service, I'd bet that the majority are some sort of
web site.  In those cases, they ought to be farming out the hosting
and connectivity to a well-qualified, well-vetted service.  The rest of
the shops can survive 15 minutes of downtime just fine.  Servers that
you manually fail over can give you that in most cases, though file service
is probably an exception.

N+M is great in the right place.  For places that can afford 15 minutes
of downtime 4 times a year, it's overkill.  As Ruth and others have
said (including me), everybody's mileage varies.
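For a sense of scale, "15 minutes of downtime 4 times a year" works out as below. This is a back-of-the-envelope sketch only; the constant and function name are made up for illustration, and leap years are ignored.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years

def availability(outage_minutes: float, outages_per_year: int) -> float:
    """Fraction of the year a service is up, given fixed-length outages."""
    downtime = outage_minutes * outages_per_year
    return 1 - downtime / MINUTES_PER_YEAR

print(f"{availability(15, 4):.4%}")  # prints 99.9886%
```

In other words, an hour a year of manual-failover downtime is already better than "four nines", which is plenty for most 100-person shops.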