Thursday, August 31, 2017

OpenBSD 6.1-stable lock up

Hey,
having a dual-node 6.0 setup in prod, I decided to move forward with one of the machines
and upgrade it to 6.1-stable. I ended up with a benchmark tool "locking up" the 6.1 machine.

Background:
Nodes are Xeon E5-2642v3 3.4GHz x12, 16G RAM, 64G DOM modules as disks,
4x X540T (ix) - 2x on-board and 2x on a PCI card.

All 4x X540T are connected to 2x Cisco Nexus 3000-series switches, each pair forming an LACP trunk (1x on-board + 1x PCI):
trunk0 - external (VLAN), 1x NIC connected to switch1 and 1x NIC connected to switch2 (ix0 + ix3)
trunk1 - internal (VLAN), 1x NIC connected to switch1 and 1x NIC connected to switch2 (ix1 + ix2)
As I have 2x Nexus 3000, vPC is configured on their end, sitting on top of the LACP trunk.
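For reference, the trunks are built the usual hostname.if(5) way, roughly like this (the VLAN id and addressing below are made up, not my real ones):

# /etc/hostname.ix0 (likewise for ix3)
up

# /etc/hostname.trunk0
trunkproto lacp trunkport ix0 trunkport ix3 up

# /etc/hostname.vlan100 (hypothetical VLAN id and address)
vlan 100 vlandev trunk0
inet 192.0.2.2 255.255.255.0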

Each obsd node has several carp interfaces configured on top of trunk0,
and only one carp interface on trunk1 - carp1.
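carp1 itself is nothing special; a sketch of its hostname.if (vhid, password and addressing here are hypothetical, and whether carpdev points at trunk1 itself or at a vlan on top of it depends on the setup):

# /etc/hostname.carp1 (hypothetical values)
inet 10.10.10.3 255.255.255.0 10.10.10.255 vhid 10 carpdev trunk1 pass s3cret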

Each switch acts as a default gw (VRRP configured) for every existing VLAN, except the one towards trunk1.
The default gateway for those switches is the IP on carp1.
Those switches run OSPF, as do the obsd nodes.
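The ospfd side on the obsd nodes is minimal; a stripped-down sketch (router-id and interface names are made up):

# /etc/ospfd.conf (sketch, hypothetical values)
router-id 10.10.10.3

area 0.0.0.0 {
	interface vlan100
	interface vlan200 { passive }
}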

The obsd nodes are the front line, facing the Internet (2x uplinks go into the 2x Nexus switches, and traffic is then passed to the 2x obsd nodes).
They run relayd with SSL offload and plain HTTP.
Besides relayd, there are ospfd, ntpd, snmpd, and bgpd (for distributed blacklisting across the other global nodes).
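To give an idea of the relayd side, a minimal sketch of the SSL-offload part (table addresses, names and the listen address are hypothetical; the real config has more relays and options):

# relayd.conf sketch, not the production one
table <webhosts> { 10.10.10.21, 10.10.10.22 }

http protocol "https_offload" {
	match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
}

relay "tls_offload" {
	# relayd picks up /etc/ssl/private/192.0.2.10.key and
	# /etc/ssl/192.0.2.10.crt for the listen address by default
	listen on 192.0.2.10 port 443 tls
	protocol "https_offload"
	forward to <webhosts> port 80 check http "/ping.txt" code 200
}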

The problem:
While doing a bench with https://github.com/wg/wrk from my laptop (OS X, 1Gbps max. pipe) against the environment (HTTPS),
relayd experienced problems handling the traffic.

shell# ./wrk -t16 -c1500 -d90s --latency https://<URL>

wrk hammering Apache 2.4 (behind those nodes), serving a txt file at an average of 7k-10k req/s:

wrk -t16 -c1500 -d90s --latency https://<URL>/ping.txt
Running 2m test @ https://<URL>/ping.txt
  16 threads and 1500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   131.17ms   70.91ms   1.97s    91.70%
    Req/Sec   651.06    135.80     1.09k    84.95%
  Latency Distribution
     50%  131.90ms
     75%  144.63ms
     90%  159.63ms
     99%  230.92ms
  927039 requests in 1.50m, 190.12MB read
  Socket errors: connect 0, read 0, write 0, timeout 1330
Requests/sec:  10290.54
Transfer/sec:      2.11MB

wrk hammering Apache 2.4 with mod_proxy_balancer, with NodeJS nodes behind Apache:

wrk -t16 -c1500 -d90s --latency https://<URL>/nodejs
Running 2m test @ https://<URL>/nodejs
  16 threads and 1500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   445.91ms  518.66ms   2.00s    83.49%
    Req/Sec    56.80     26.89    180.00    68.48%
  Latency Distribution
     50%  217.57ms
     75%  374.15ms
     90%    1.50s
     99%    1.95s
  80673 requests in 1.50m, 1.12GB read
  Socket errors: connect 0, read 5534, write 0, timeout 18099
Requests/sec:    895.42
Transfer/sec:     12.72MB

'top' showed no interrupt load at all, but rather heavy system values and some user values:
20-30% - user
80-90% - system
relayd (12 forks, matching the number of cores) - 99% usage.
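(For anyone reproducing this, the interrupt/system/user split is easy to watch live with top(1) and systat(1):)

shell# top -S -s 1       # -S includes system processes, 1s refresh
shell# systat vmstat 1   # live per-device interrupt counters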

I basically killed both machines while they were running 6.0, hence my decision to upgrade to 6.1.
However, during the tests against 6.0, my ssh session never got terminated ("kicked out"), even under this high load (0% CPU idle).
6.1 showed different symptoms: ssh session termination, login via the web-based IPMI GUI hanging right after the login step,
and ping not responding (from the switches and from node1, which is still on 6.0).
A while after the bench was aborted, 6.1 eventually let me back in via ssh (the terminal via IPMI still hanging).

snmpd, which was running (remember), was being polled by another system drawing graphs.
What those graphs showed was a high rate of output error packets on the trunks, not on the NICs (ix) themselves.
Also, syslog, with 'log all' enabled for relayd, showed a lot of 'buffer timeout event' messages,
and ospfd complained about 'no buffer space available'.
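(The same error counters snmpd graphs can be checked locally on the node, e.g.:)

shell# netstat -i    # Ierrs/Oerrs per interface, trunks included
shell# netstat -id   # same, plus the drops column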

I had to modify relayd.conf to spawn only 8 preforks instead of 12,
and set

kern.maxclusters=24576 #12288
kern.maxfiles=65536 #32768

in order to survive the bench (i.e. keep the ssh session alive).
The values in the comments are from the 6.0 setup.
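(Both can also be bumped at runtime with sysctl(8) before persisting them in /etc/sysctl.conf:)

shell# sysctl kern.maxclusters=24576
shell# sysctl kern.maxfiles=65536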

I'm looking for any advice here, which hopefully will lead to a stable and performant setup.
Configuration follows.

———sysctl.conf (obsd 6.0)————
net.inet.ip.forwarding=1
net.inet.ipcomp.enable=1 # 1=Enable the IPCOMP protocol
net.inet.etherip.allow=1 # 1=Enable the Ethernet-over-IP protocol
net.inet.tcp.ecn=1 # 1=Enable the TCP ECN extension
net.inet.carp.preempt=1 # 1=Enable carp(4) preemption
net.inet.carp.log=3 # log level of carp(4) info, default 2
ddb.panic=0 # 0=Do not drop into ddb on a kernel panic
ddb.console=1 # 1=Permit entry of ddb from the console
kern.pool_debug=0
net.inet.ip.maxqueue=2048
kern.somaxconn=4096
kern.maxclusters=12288
kern.maxfiles=32768
net.inet.ip.ifq.maxlen=2048


————login.conf———————
relayd:\
:maxproc-max=31:\
:openfiles-cur=65536:\
:openfiles-max=65536:\
:tc=daemon:
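(Note: if /etc/login.conf.db exists on the machine, rebuild it after editing, otherwise the new class limits won't be picked up:)

shell# cap_mkdb /etc/login.conf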

—————pf.conf———————
set block-policy drop
set limit { states 3000000, frags 2000, src-nodes 1000000 }
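(State usage against those limits can be watched with pfctl(8):)

shell# pfctl -si | grep current   # current state count
shell# pfctl -sm                  # the configured hard limits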

—————relayd.conf———————
interval 10
timeout 1000
prefork 8 #12
log all # <-- for debugging the situation

shell# netstat -m
1227 mbufs in use:
626 mbufs allocated to data
189 mbufs allocated to packet headers
412 mbufs allocated to socket names and addresses
0/232/64 mbuf 2048 byte clusters in use (current/peak/max)
423/2865/120 mbuf 2112 byte clusters in use (current/peak/max)
0/160/64 mbuf 4096 byte clusters in use (current/peak/max)
0/200/64 mbuf 8192 byte clusters in use (current/peak/max)
0/14/112 mbuf 9216 byte clusters in use (current/peak/max)
0/20/80 mbuf 12288 byte clusters in use (current/peak/max)
0/16/64 mbuf 16384 byte clusters in use (current/peak/max)
0/8/64 mbuf 65536 byte clusters in use (current/peak/max)
23400 Kbytes allocated to network (5% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

Kernel is stock, latest via syspatch.

P.S.
ifq.drops was never observed to increase, nor was
a high ifq.len (max. 5 packets in the queue).
PF had max. 290k states.
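(Those counters live under the net.inet.ip.ifq sysctl node, for anyone who wants to compare:)

shell# sysctl net.inet.ip.ifq    # len, maxlen, drops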

Br
//mxb
