Tuesday, October 01, 2019

Re: wrong memory stats with collectd

--- src/memory.c.orig Tue Oct 23 08:57:09 2018
+++ src/memory.c Mon Sep 30 18:37:17 2019
@@ -412,6 +412,6 @@

#elif HAVE_SYSCTL
- int mib[] = {CTL_VM, VM_METER};
- struct vmtotal vmtotal = {0};
+ int mib[] = {CTL_VM, VM_UVMEXP};
+ struct uvmexp uvmexp;
gauge_t mem_active;
gauge_t mem_inactive;
@@ -419,17 +419,20 @@
size_t size;

- size = sizeof(vmtotal);
+ #define pgtok(a) ((a) * ((unsigned int)uvmexp.pagesize >> 10))

- if (sysctl(mib, 2, &vmtotal, &size, NULL, 0) < 0) {
+ size = sizeof(uvmexp);
+
+ if (sysctl(mib, 2, &uvmexp, &size, NULL, 0) < 0) {
char errbuf[1024];
WARNING("memory plugin: sysctl failed: %s",
sstrerror(errno, errbuf, sizeof(errbuf)));
+ bzero(&uvmexp, sizeof(uvmexp));
return -1;
}

assert(pagesize > 0);
- mem_active = (gauge_t)(vmtotal.t_arm * pagesize);
- mem_inactive = (gauge_t)((vmtotal.t_rm - vmtotal.t_arm) * pagesize);
- mem_free = (gauge_t)(vmtotal.t_free * pagesize);
+ mem_active = pgtok(uvmexp.active);
+ mem_inactive = pgtok(uvmexp.inactive);
+ mem_free = pgtok(uvmexp.free);

MEMORY_SUBMIT("active", mem_active, "inactive", mem_inactive, "free",

On Wed, Sep 25, 2019 at 08:02:26PM -0400, Predrag Punosevac wrote:
> Hi,
>
> I think I can confirm what you see on the bare metal system running 6.5
> with 16GB of RAM. I use Observium to display statistics from all my
> servers including dozen or so OpenBSD servers. I see that the numbers
> recovered by SNMP walk from OpenBSD servers are consistent with vmstat
> numbers. However when I try to override default collectd.conf and report
> absolute number and percentage besides memory used I don't get a
> meaningful number. Stuart Henderson @sthen is port maintainer and knows
> infinitely more about collectd than I. I hope he pitches in on the
> issue.

As the initial searches pointed towards VM_METER vs VM_UVMEXP, I looked
at various sources in the tree (vmstat and top mostly) and tried to
figure out how they get the memory information. I ended up with the
attached patch, which partially works.

If I report the numbers in kilobytes, they match the vmstat/top values.
But as soon as I convert them to bytes (using * 1024), the value seems
to overflow the variable used in collectd. As far as I can tell, the
gauge_t type is defined in src/libcollectdclient/collectd/types.h:
typedef double gauge_t;
I tried using uint64_t instead of double, but that broke a whole lot of
things.
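
One possible explanation, offered as a sketch rather than a confirmed
diagnosis: the overflow may happen before the value ever reaches
gauge_t. Assuming the uvmexp counters and pagesize are 32-bit integers
(an assumption here, not something verified against the headers), the
product is computed in 32-bit arithmetic and only the already-wrapped
result is converted to double. The helper names below are hypothetical
and uint32_t stands in for the counter type:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for collectd's gauge_t (typedef double gauge_t). */
typedef double gauge_t;

/*
 * Hypothetical helper: multiply first, convert afterwards.
 * The product is truncated to 32 bits, so it wraps modulo 2^32
 * before the conversion to double takes place.
 */
static gauge_t pages_to_bytes_wrapping(uint32_t pages, uint32_t pagesize) {
  assert(pagesize > 0);
  uint32_t bytes = pages * pagesize; /* wraps modulo 2^32 */
  return (gauge_t)bytes;
}

/*
 * Hypothetical helper: convert to double first, multiply afterwards.
 * A double represents integers exactly up to 2^53, which is far more
 * than any byte count a memory plugin would report.
 */
static gauge_t pages_to_bytes(uint32_t pages, uint32_t pagesize) {
  assert(pagesize > 0);
  return (gauge_t)pages * (gauge_t)pagesize;
}
```

With 4194304 pages of 4096 bytes (16GB), the first helper wraps while
the second returns the full byte count. If the assumption holds, the
patch could do the same with something like
(gauge_t)uvmexp.active * (gauge_t)uvmexp.pagesize, leaving gauge_t
itself unchanged.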

Any thoughts?
Thanks.
