[SAGE] Incident & outage management principles, practices, etc.?
Does anybody know of any good web sites, books, papers, or other
materials related to outage and incident management principles,
practices, and so forth?
I'm not talking about security incidents in particular, though those
are certainly one type; I'm talking about more general types of
incidents and outages that a service provider (particularly an ASP)
might run into. Network outages, hardware failures, system
overloads, cooling/power failures, software meltdowns, database
debacles, etc. My audience is the management team of an ASP client,
who are pretty sharp in their own fields but are mostly fairly
inexperienced with operations.
I've got a pretty good knowledge of the topic in my head, since I've
been doing this for so long, plus I have quite a bit of
volunteer emergency services (search and rescue) training and
experience. What's the most effective way, though, for somebody who
doesn't happen to have that background to get up to speed relatively
quickly?
By the way, at the BayLISA meeting in October
(http://www.baylisa.org/) and the LISA conference in early December
(http://www.usenix.org/events/lisa05/), I'll be doing an invited talk
on a related topic: how police and fire departments dynamically
organize to manage emergencies as they develop, and what IT
professionals can learn about that from them. That will be part of
what I'll present to this client, but it's not the whole story; it's
a narrow look at one particular set of incident management methods.
I'd like to give this client a broader view, and I'm wondering what
useful introductory literature and so forth is out there before I
spend my time (and their money) creating something from scratch.
Thanks!
-Brent
--
Brent Chapman <brent@greatcircle.com> -- Great Circle Associates, Inc.
Specializing in network infrastructure for Silicon Valley since 1989
For info about us and our services, please see http://www.greatcircle.com/
Network Automation blog: http://www.greatcircle.com/blog/network_automation