Re: [SAGE] Sun Enterprise hardware watchdog enable experience



On Wed, 28 Aug 2002, Jeff Mallory wrote:

>
> Hi all,
>
> The Sun Enterprise machines have a hardware watchdog that can reboot the
> server if the OS misses resetting it (presumably because the system has
> crashed or hung). This feature is off by default (enabled with set
> watchdog_enable = 1 in /etc/system ) and my crusty brain cells tell me this
> was once a semi-flaky feature in earlier Sun hardware, hence the
> off-by-default setting.
>
> There is a software watchdog that blinks the front panel operating light
> (among other things) but I believe it can only force the system to the
> OpenBoot prompt when it detects a problem.
>
> Has anyone had experience one way or the other with using (or mis-using)
> this hardware watchdog reset feature?
>
I have not used it on the Enterprise systems, but I have used it
successfully on a worldwide deployment of Netra T1 class systems.
They are part of a redundant, failover, non-time-dependent architecture.
There was/is an issue with the machines locking up occasionally. With
watchdog reset controlled by LOM, automatic reboot on lockup, and
logging on / and /var, I can just forget about them and worry about
more important things. If they get stuck, they fix themselves and I
don't care. On an Enterprise machine I'd have to think more carefully
about it, because presumably I'd care more about the machine getting
stuck and what caused it.
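
For anyone unfamiliar with the mechanism, here is a rough sketch of the
general watchdog idea in Python. It is a toy simulation only, not Sun's
or LOM's actual implementation: the Watchdog class, the simulate()
helper, and the 3-second timeout are all made up for illustration. A
healthy OS keeps "patting" the timer; once the pats stop for longer than
the timeout, the timer fires a reset.

```python
class Watchdog:
    """Toy model of a hardware watchdog timer (hypothetical, for illustration)."""

    def __init__(self, timeout_s, on_expire):
        self.timeout_s = timeout_s      # assumed timeout, not a real Sun default
        self.on_expire = on_expire      # callback standing in for a hardware reset
        self.last_pat = 0.0
        self.fired = False

    def pat(self, now):
        """Called by a healthy OS to reset the countdown."""
        if not self.fired:
            self.last_pat = now

    def tick(self, now):
        """Called by the 'hardware' clock; fires once if the OS went quiet."""
        if not self.fired and now - self.last_pat > self.timeout_s:
            self.fired = True
            self.on_expire(now)


def simulate(hang_at, end=20, timeout_s=3):
    """Run a simulated clock; the OS stops patting the timer at hang_at."""
    events = []
    wd = Watchdog(timeout_s, lambda t: events.append(("reset", t)))
    for now in range(end):
        if now < hang_at:
            wd.pat(now)                 # OS is alive and resets the timer
        wd.tick(now)                    # the watchdog checks independently
    return events
```

Calling simulate(hang_at=5) models a machine that hangs at second 5; a
run where hang_at is at or beyond end models a machine that never hangs
and so never triggers a reset.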
	Doug