
[SAGE] Incident & outage management principles, practices, etc.?



Does anybody know of any good web sites, books, papers, or other 
materials related to outage and incident management principles, 
practices, and so forth?

I'm not talking about security incidents in particular, though those 
are certainly one type; I'm talking about more general types of 
incidents and outages that a service provider (particularly an ASP)
might run into.  Network outages, hardware failures, system 
overloads, cooling/power failures, software meltdowns, database 
debacles, etc.  My audience is the management team of an ASP client, 
who are pretty sharp in their own fields but are mostly fairly 
inexperienced with operations.

I've got a pretty good grasp of the topic in my head, since I've 
been doing this for so long, plus I have quite a bit of 
volunteer emergency services (search and rescue) training and 
experience.  What's the most effective way to get somebody who 
doesn't happen to have that background up to speed relatively 
quickly, though?

By the way, at the BayLISA meeting in October 
(http://www.baylisa.org/) and the LISA conference in early December 
(http://www.usenix.org/events/lisa05/), I'll be doing an invited talk 
on a related topic: how police and fire departments dynamically 
organize to manage emergencies as they develop, and what IT 
professionals can learn about that from them.  That will be part of 
what I'll present to this client, but it's not the whole story; it's 
a narrow look at one particular set of incident management methods. 
I'd like to give this client a broader view, and I'm wondering what 
useful introductory literature and so forth is out there before I 
spend my time (and their money) creating something from scratch.


Thanks!

-Brent
-- 
Brent Chapman <brent@greatcircle.com> -- Great Circle Associates, Inc.
Specializing in network infrastructure for Silicon Valley since 1989
For info about us and our services, please see http://www.greatcircle.com/
Network Automation blog: http://www.greatcircle.com/blog/network_automation