Sunday, March 28, 2021

Re: The case of the phantom reboot

On 2021-03-28, David Newman <dnewman@networktest.com> wrote:
> On 3/28/21 4:58 AM, Kristjan Komloši wrote:
>
>> On 3/27/21 10:27 PM, David Newman wrote:
>>> OpenBSD 6.8 GENERIC#5 i386
>>>
>>> One of my systems rebooted at 03:01 local time today. I've seen kernel
>>> panics and bad hardware but I've never seen OpenBSD "just reboot" by
>>> itself, ever.
>>>
>>> There's no cron job that would do this. last(1) is no help; it shows the
>>> reboot command but not the shutdown that preceded it:
>>>
>>> root@ns ~ 4# last -f /var/log/wtmp.0
>>> reboot    ~                                 Sat Mar 27 03:01
>>> root      ttyp0    192.168.0.132            Wed Mar 24 11:23 - 11:23  (00:00)
>>>
>>> wtmp.0 begins Wed Mar 24 11:23 2021
>>> root@ns ~ 5# last -f /var/log/wtmp.1
>>> root      ttyp0    192.168.0.132            Tue Mar 16 21:30 - 21:30  (00:00)
>>> root      ttyp0    75.82.86.131             Tue Mar 16 13:14 - 21:30  (08:15)
>>> root      ttyp0    75.82.86.131             Sun Mar 14 21:20 - 21:29  (00:08)
>>> root      ttyp0    75.82.86.131             Sat Mar 13 17:42 - 21:13  (03:31)
>>>
>>> The date gaps seem odd. I've ssh'd into this system multiple times
>>> between March 16-27. I don't see other signs of trouble in /var/log.
>>>
>>> I could use some help in looking for evidence of foul play, or "just" a
>>> hardware or software problem.
>>>
>>> Thanks in advance for further troubleshooting clues.
>>>
>>> dn
>>>
>> What kind of a machine is it running on? I remember having reboot
>> problems on certain HP and Supermicro servers with hardware watchdogs.
>
> This is a 10+-year-old Dell 1U server with a 2-GHz Celeron 440, part of
> a pair running CARP. Aside from having to replace spinning disks with
> SSDs a couple of years ago, they've been rock solid.
>
> I too have seen issues with Supermicros but that's with other OSs. I've
> never had a spontaneous reboot on this system, and am concerned from
> the wtmp stuff above that this *may* have been triggered externally. I
> could use some clues in other things to check. Thanks.
>
> dn
>
>

The "reboot" wtmp entry is written by init(8) when the system comes up,
so by itself it does not show that anyone ran reboot(8).

It could be caused by bad hardware or a glitch in the power feed, among
other things (a power glitch may affect some machines differently than
others).

Perhaps it's worth enabling accounting in rc.conf.local so that, if it
happens again, you can see which commands were executed around that
time.
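
A minimal sketch of what that could look like, assuming a stock install
with the default accounting file /var/account/acct and a root shell. The
rotated file name in the last line is an assumption; check what is
actually present in /var/account.

```shell
# Run as root. Assumes the default accounting file, /var/account/acct.

# Enable process accounting at boot via rc.conf.local.
echo 'accounting=YES' >> /etc/rc.conf.local

# Start accounting now, without waiting for a reboot.
touch /var/account/acct
accton /var/account/acct

# After the next incident, page through the recorded commands.
lastcomm | less

# Narrow the listing to one command name or one user.
lastcomm reboot
lastcomm root

# Read an older, rotated accounting file (name assumed; adjust as needed).
lastcomm -f /var/account/acct.0
```

Accounting records are written when a process exits, so after a sudden
power loss the last few commands may be missing. An empty window just
before the reboot would itself hint at a hardware or power cause rather
than a command.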
