Sunday, April 29, 2018

Re: Troubleshooting rl instability on OpenBSD 6.1

On 29/04/18 18:58, Stuart Henderson wrote:
> On 2018-04-29, Stuart Longland <stuartl@longlandclan.id.au> wrote:
>> The rack has 5 servers, an ARM-based PC and the switch, all of which run
>> from a pair of 12V 105Ah AGM batteries, charged from mains power and
>> solar. The switch is a Linksys LGS326-AU. No other devices plugged into
>> this switch have connectivity issues.
>>
>> The port the industrial PC is connected to is a plain access port with
>> no VLAN tagging, trunking or other funny stuff (although all of the
>> above get used elsewhere in the network).
>>
>> When the link drops out, there's nothing in `dmesg`. If I tether my
>> phone and hit the machine via SSH, I find it is unable to ping anything
>> on the internal network via IPv4 or IPv6, or vice versa.
>
> What does "ifconfig rl0" show, both normally and when this happens?
>
> How about "netstat -nI rl0"?

Not sure about netstat, but `ifconfig rl0` shows no difference between
the inaccessibility events and normal operation. Right now, I'm getting this:

> # ifconfig rl0
> rl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>         lladdr 00:d0:c9:e0:f4:75
>         index 1 priority 0 llprio 3
>         media: Ethernet autoselect (100baseTX full-duplex)
>         status: active
>         inet 172.31.249.254 netmask 0xffffff00 broadcast 172.31.249.255
>         inet6 fe80::2d0:c9ff:fee0:f475%rl0 prefixlen 64 scopeid 0x1
>         inet6 2001:44b8:21ac:70f9::fe prefixlen 64
> # netstat -nI rl0
> Name    Mtu   Network     Address              Ipkts Ierrs    Opkts Oerrs Colls
> rl0     1500  <Link>      00:d0:c9:e0:f4:75 13402102     0 16803928     0     0
> rl0     1500  172.31.249/ 172.31.249.254    13402102     0 16803928     0     0
> rl0     1500  fe80::%rl0/ fe80::2d0:c9ff:fe 13402102     0 16803928     0     0
> rl0     1500  2001:44b8:2 2001:44b8:21ac:70 13402102     0 16803928     0     0

I'll run both commands again the next time the link drops out.
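
Since the outages are intermittent, it may be easier to let cron catch the
state at the moment of a dropout. What follows is only a minimal sketch,
assuming a host on the internal network that normally answers pings. The
function name `rl0_watch`, the target host and the log path are hypothetical
placeholders, not anything already set up on this machine.

```sh
#!/bin/sh
# Hypothetical link watchdog for rl0 (a sketch, not an existing tool).
# The target host and log path passed in are placeholders.
#
# Usage: rl0_watch TARGET LOGFILE [IFACE]
# Returns 0 if TARGET answered, 1 if it did not (and the state was logged).
rl0_watch() {
	target=$1
	log=$2
	iface=${3:-rl0}

	# Send a single echo request; if it is answered, log nothing.
	if ping -c 1 "$target" >/dev/null 2>&1; then
		return 0
	fi

	# No reply: append a timestamp plus the interface state and counters,
	# so they can be compared with a snapshot taken while the link is up.
	{
		echo "=== $(date): no reply from $target; $iface state follows ==="
		ifconfig "$iface"
		netstat -nI "$iface"
	} >>"$log" 2>&1
	return 1
}
```

A small wrapper script that calls, say, `rl0_watch 172.31.249.1
/var/log/rl0-watch.log` (both arguments being placeholders) could then be run
from root's crontab once a minute. Comparing the logged Ierrs/Oerrs counters
and flags against the output above should show whether anything on the
interface changes during a dropout.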

> Is anything logged on the switch? This model has things like broadcast
> storm control, I'm wondering if that might have triggered or if it shows
> anything else useful.

I'll have a closer look. There were some messages about the link going
up and down, but it seems I missed turning the SNTP client on (the
setting is in two places), so the log timestamps are years off. I've
reset the logs and will see how we go.

> Can you try a different cable, can you try a different switch port?

Unfortunately, all ports are in use, but I did replace the cable.

>> Is there some sort of debugging flag I can turn on in the kernel to log
>> more detail about what's going on with rl0 when the loss of connectivity
>> is being experienced?
>
> I don't see any extra debugging that can be enabled for rl.
>

No problem. I'll keep an eye on the switch and see what it tells me.
--
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
...it's backed up on a tape somewhere.
