Hi,
I have run into the same issue.
It looks like something is wrong with the ipi count:
I have a system running a current OpenBSD kernel dated May 21.
The systat output and the vmstat -i output do not match, and the differences between them are large.
For example, while the ipi value in the vmstat -i output stays below 5000, the ipi value in the systat output can go above 65000.
I don't know if it's a coincidence, but I received complaints from users about a firewall I had upgraded to 7.3, and I downgraded the system when I saw the systat values. Maybe the users' reports were not accurate and I acted in a hurry. It could be either; I am not sure.
On the other hand, once the ix(4) TSO code is fully committed(*), I want to run detailed tests with Cisco TRex and share the results.
(*) I think the ix(4) TSO code is partially committed but not completely finished yet, right?
________________________________
From: owner-misc@openbsd.org <owner-misc@openbsd.org> on behalf of Sven F. <sven.falempin@gmail.com>
Sent: Thursday, June 1, 2023 00:35
To: misc@openbsd.org <misc@openbsd.org>
Subject: Re: High Interrupt After 7.3 Upgrade
On Wed, May 31, 2023 at 5:27 PM Stuart Henderson <stu.lists@spacehopper.org> wrote:
> On 2023-05-31, Mark (obsd) <openbsd-list@nerdish.us> wrote:
> > Hi Chris,
> >
> > On Tue, May 30, 2023 at 8:59 AM Chris Cappuccio <chris@nmedia.net> wrote:
> >
> >> Samuel Jayden [samueljaydan1994@gmail.com] wrote:
> >> > Hi again,
> >> >
> >> > Just for the record:
> >> > I've downgraded to OpenBSD 7.2 (reinstalled) and everything is working like
> >> > a charm again.
> >> > I don't know what is wrong with 7.3 but ipi interrupt rate is too much and
> >> > somehow OpenBSD performance is too bad..
> >> > Thanks for reading.
> >> >
> >>
> >> Sounds like you are using 'systat' to measure interrupts. This is a bug
> >> in systat that was fixed in 7.3. Here is Scott Cheloha's message from that
> >> fix:
> >>
> >> "systat(1): vmstat: measure elapsed time with clock_gettime(2) instead of
> >> ticks
> >>
> >> The vmstat view in systat(1) should not use statclock() ticks to count
> >> elapsed time. First, ticks are low resolution. Second, the statclock
> >> is sometimes randomized, so each tick is not necessarily of equal
> >> length. Third, we're counting ticks from every CPU on the system, so
> >> every rate in the view is divided by the number of CPUs. For example,
> >> on an amd64 system with 8 CPUs you currently see:
> >>
> >> 200 clock
> >>
> >> ... when the true clock interrupt rate on that system is 1600.
> >>
> >> Instead, measure elapsed time with clock_gettime(2). Use CLOCK_UPTIME
> >> here so we exclude time when the system is suspended. With this
> >> change we no longer need "stathz" or "hertz". We can also get rid of
> >> the anachronistic secondary clock failure test.
> >>
> >>
> >>
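[Editorial sketch] The arithmetic in the commit message above can be illustrated with a few lines of Python. This is a simplified model, not systat's actual code: the function names, the 5-second sampling interval, and the 100 Hz statclock frequency are assumptions chosen so that the numbers reproduce the 200 versus 1600 example.

```python
def rate_from_ticks(events, ticks_all_cpus, stathz):
    """Simplified old approach: infer elapsed time from statclock ticks.

    The ticks are summed over every CPU, so the inferred elapsed time is
    inflated by the number of CPUs and the rate shrinks by the same factor.
    """
    inferred_elapsed = ticks_all_cpus / stathz
    return events / inferred_elapsed


def rate_from_clock(events, elapsed_seconds):
    """Approach described in the commit: divide by real elapsed time."""
    return events / elapsed_seconds


ncpu = 8          # CPUs in the commit message's example
stathz = 100      # assumed statclock frequency per CPU
interval = 5.0    # assumed sampling interval in seconds
true_rate = 1600  # true clock interrupt rate from the example

events = true_rate * interval              # interrupts seen in the interval
ticks_all_cpus = ncpu * stathz * interval  # ticks counted across all CPUs

print(rate_from_ticks(events, ticks_all_cpus, stathz))  # 200.0
print(rate_from_clock(events, interval))                # 1600.0
```

Under these assumptions the tick-based figure is the true rate divided by the CPU count, which is why systat values can look much higher after the fix than they did before it.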
> > I'm not the OP, but that's interesting to me because I'm wondering if it's
> > why Prometheus' node_exporter from packages is reporting wildly wrong CPU
> > stats on 7.3 that don't at all match what you'd expect when comparing
> > top/htop output? It was fine prior to upgrading to 7.3, but I've just left
> > digging into it on the back burner due to other priorities.
>
> That's a different issue, it was fixed in -current - I've just merged it to
> -stable so updated packages should show up in a day or two.
>
>
7.3 interrupts (Intel(R) Celeron(R) J6412):
v6-fw# vmstat -i
interrupt               total   rate
irq96/acpi0                 1      0
irq145/inteldrm0          497      0
irq97/xhci0                 3      0
irq98/ahci0           1873806      0
irq114/igc0:0       157799531     50
irq115/igc0:1       194120194     61
irq116/igc0:2       148272908     47
irq117/igc0:3       159077128     50
irq118/igc0                 2      0
irq119/igc1:0       158925348     50
irq120/igc1:1       181916246     58
irq121/igc1:2       155586734     49
irq122/igc1:3       170737329     54
irq123/igc1                 2      0
irq129/igc3:0            2126      0
irq130/igc3:1       540117832    172
irq131/igc3:2          568886      0
irq132/igc3:3       909270099    290
irq133/igc3                13      0
irq0/clock         2505321992    799
irq0/ipi           5601964631   1788
Total             10885555308   3475
I did not notice any performance issue here,
but maybe irq0/ipi 5601964631 1788
is bad.
I did notice some unexpected kernel_lock activity jittering the traffic by ~15ms.
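
[Editorial sketch] To put the irq0/ipi line in context, the counters above can be compared with each other. The Python below is a minimal sketch: it parses a few rows copied from the vmstat -i output in this message and computes the ipi share of all interrupts. It also estimates uptime on the assumption that the "rate" column is the total divided by seconds since boot, meaning a long-term average rather than a current rate. The helper name parse_vmstat_i is hypothetical.

```python
# A few rows copied from the vmstat -i output in this message.
SAMPLE = """\
interrupt               total   rate
irq0/clock         2505321992    799
irq0/ipi           5601964631   1788
Total             10885555308   3475
"""


def parse_vmstat_i(text):
    """Return {name: (total, rate)} for every data row; skip the header."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or not parts[1].isdigit():
            continue  # header line or anything that is not name/total/rate
        rows[parts[0]] = (int(parts[1]), int(parts[2]))
    return rows


rows = parse_vmstat_i(SAMPLE)
total_count, _ = rows["Total"]
ipi_count, ipi_rate = rows["irq0/ipi"]
clock_count, clock_rate = rows["irq0/clock"]

# Fraction of all counted interrupts that were IPIs.
ipi_share = ipi_count / total_count

# Assumption: rate = total / uptime, so uptime is roughly total / rate.
uptime_days = clock_count / clock_rate / 86400

print(f"ipi share of all interrupts: {ipi_share:.1%}")   # 51.5%
print(f"implied uptime: about {uptime_days:.0f} days")   # 36
```

If that assumption about the rate column holds, a short burst of IPIs would barely move the vmstat -i figure on a machine that has been up for weeks, while an interval-based view such as systat would show it immediately.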
--
---------------------------------------------------------------------------------------------------------------------
Knowing is not enough; we must apply. Willing is not enough; we must do