Monday, March 29, 2021

Re: The case of the phantom reboot

On 3/28/21 12:13 PM, David Newman wrote:
> On 3/28/21 4:58 AM, Kristjan Komloši wrote:
>
>> On 3/27/21 10:27 PM, David Newman wrote:
>>> OpenBSD 6.8 GENERIC#5 i386
>>>
>>> One of my systems rebooted at 03:01 local time today. I've seen kernel
>>> panics and bad hardware but I've never seen OpenBSD "just reboot" by
>>> itself, ever.

OpenBSD, not usually. Hardware OpenBSD is running on? Sure.

>>> There's no cron job that would do this. last(1) is no help; it shows the
>>> reboot command but not the shutdown that preceded it:
>>>
>>> root@ns ~ 4# last -f /var/log/wtmp.0
>>> reboot    ~                                 Sat Mar 27 03:01
>>> root      ttyp0    192.168.0.132            Wed Mar 24 11:23 - 11:23 (00:00)
>>>
>>> wtmp.0 begins Wed Mar 24 11:23 2021
>>> root@ns ~ 5# last -f /var/log/wtmp.1
>>> root      ttyp0    192.168.0.132            Tue Mar 16 21:30 - 21:30 (00:00)
>>> root      ttyp0    75.82.86.131             Tue Mar 16 13:14 - 21:30 (08:15)
>>> root      ttyp0    75.82.86.131             Sun Mar 14 21:20 - 21:29 (00:08)
>>> root      ttyp0    75.82.86.131             Sat Mar 13 17:42 - 21:13 (03:31)
>>>
>>> The date gaps seem odd. I've ssh'd into this system multiple times
>>> between March 16-27. I don't see other signs of trouble in /var/log.
>>>
>>> I could use some help in looking for evidence of foul play, or "just" a
>>> hardware or software problem.
>>>
>>> Thanks in advance for further troubleshooting clues.
>>>
>>> dn
>>>
>> What kind of a machine is it running on? I remember having reboot
>> problems on certain HP and Supermicro servers with hardware watchdogs.
>
> This is a 10+-year-old Dell 1U server with a 2-GHz Celeron 440, part of
> a pair running CARP. Aside from having to replace spinning disks with
> SSDs a couple of years ago, they've been rock solid.

A basic machine that worked for a long time and then starts giving problems
almost certainly has a hardware problem, unless you can tie the problem to a
recent upgrade. And that's not terribly likely on "basic" hardware.

Every broken device was "rock solid" ... until it wasn't. That's
the definition of "broken".

> I too have seen issues with Supermicros but that's with other OSs. I've
> never had a spontaneous reboot on this system, and am concerned from
> the wtmp stuff above that this *may* have been triggered externally. I
> could use some clues on other things to check. Thanks.

As Stuart pointed out, that "reboot" entry in wtmp comes from the boot
process, not the shutdown.
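
If you want to check the wtmp files more systematically, here is a rough
sketch, not a tested tool. It reads last(1) output and flags every "reboot"
record that has no "shutdown" record just before it in time. It assumes that
a clean halt or reboot logs a "shutdown" pseudo-user record, so a reboot
without one points to a reset, power loss or crash rather than a command
somebody ran. The function name and the script name below are made up for
illustration.

```python
# Sketch: flag "reboot" records in last(1) output that have no "shutdown"
# record directly before them in time.
# Assumption: a clean halt/reboot logs a "shutdown" pseudo-user record.

def unclean_reboots(last_output):
    """Return the reboot lines not preceded in time by a shutdown line."""
    # Keep only the records themselves; drop blank lines, wrapped
    # "(hh:mm)" continuation lines and the "wtmp begins" trailer.
    records = [
        line for line in last_output.splitlines()
        if line.strip()
        and not line.lstrip().startswith("(")
        and " begins " not in line
    ]
    flagged = []
    # last(1) prints newest first, so the record just before a reboot in
    # time is the line that follows it in the output.
    for i, line in enumerate(records):
        if line.split()[0] != "reboot":
            continue
        older = records[i + 1].split()[0] if i + 1 < len(records) else None
        if older != "shutdown":
            flagged.append(line)
    return flagged


if __name__ == "__main__":
    import sys
    for entry in unclean_reboots(sys.stdin.read()):
        print("no shutdown record before:", entry)
```

Usage would be something like "last -f /var/log/wtmp.0 | python3 check_wtmp.py".
A reboot that is the oldest record in a file gets flagged too, because the
matching shutdown record, if any, would be in the next older rotated file.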

If you are really curious, you could put a serial console on it and wait
for the next event. You PROBABLY won't see much, however.

Believe me, I'm all in favor of recycling computers -- in fact, as I
often tell skeptical employers, I'd rather have two ten-year-old systems
than one brand-new system with a service contract. But computers don't
last as long as they used to, and curiously, some big-name servers seem
to have a shorter life than some desktops. A ten-year-old computer that
does the job reliably is good, but not something to expect.

Nick.
