Followup...
On 5/12/23 08:17, Stuart Henderson wrote:
> On 2023-05-12, Nick Holland <nick@holland-consulting.net> wrote:
...
>> I had several other people suggest network problems. I'm not going to
>> say "impossible" or even "unlikely", but my understanding is that the
>> two machines are both plugged into the same switch, in the same rack.
>
I've since had someone more familiar with the physical environment say
my blind trust in their switch hw may be slightly misplaced. :)
> You can also look at
>
> netstat -ni -I ixl0
> netstat -ni -I ixl0 -e
> kstat ixl0:::
>
These looked REALLY clean: no drops, fails or collisions.
> which may give some other clues
>
> even pfctl -si might have something relevant
>
>> Several people pointed out I was using the default advskew of 1 second,
>> which means a small network glitch (or system load? maybe I'm all wrong
>> about this system never breaking a sweat, at least when it comes to
>> network traffic) would flip it, so I've increased it to 10 on both
>> machines (and apparently just induced a flip of my own. oops). By the
>> nature of this system, some people will be annoyed by any flip, so it
>> really doesn't matter if it was a 1 second outage or a 30 second outage,
>> I just want the system available again after an unhappy event (or
>> routine maintenance).
>
> the coarse adjustment in seconds is advbase, advskew is a much smaller
> delay meant for a config with primary/backup where the backup advertises
> just slightly less frequently.
Um, yeah. I set advbase, and typed advskew in the e-mail. My bad.
After setting advbase to 10, I have gone over two weeks without any flips,
so that looks like a pretty good fix.
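For anyone else who mixes up the two knobs, here is a rough sketch of how they
combine. It assumes the rule as I read it in carp(4): advbase is in whole
seconds and advskew is in 1/256ths of a second, added on top. The function
name and the sample values are mine, purely for illustration; this is not
taken from the kernel source.

```python
def carp_advert_interval(advbase: int, advskew: int) -> float:
    """Approximate seconds between CARP advertisements.

    Assumption (my reading of carp(4)): advbase is in seconds and
    advskew is in units of 1/256 second, added to advbase.
    """
    return advbase + advskew / 256


if __name__ == "__main__":
    # Hypothetical settings: the default, the value used here, and a
    # backup host that is skewed to advertise slightly less often.
    for advbase, advskew in [(1, 0), (10, 0), (10, 100)]:
        interval = carp_advert_interval(advbase, advskew)
        print(f"advbase={advbase} advskew={advskew}: {interval:.3f} s")
```

So even a large advskew adds less than one second, while advbase moves the
interval in whole seconds, which is why advbase was the one to change here.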
Thanks for the guidance!
Nick.