Tuesday, August 03, 2021

Re: WireGuard host crashes roughly every week

(Resending, as I forgot to include the mailing list itself)

> On Aug 1, 2021, at 3:37 AM, Stuart Henderson <stu@spacehopper.org> wrote:
>
> It is always good to include dmesg when reporting a problem.
>
> An outline of the wireguard and other network config would be
> useful too. If you can give instructions to reproduce that would
> be ideal. If not then as much information about the setup as
> possible so we can try to reproduce.
>
> Does anything funny show up in dmesg if you do "ifconfig wg0
> debug"? (replace/repeat wg0 if you have other wg interfaces).


Hi Stuart!

Your advice led me to discover that the issue happens only with the "PersistentKeepalive = 25" option, which I had enabled on each wg-quick peer. It looks like you could recreate it by setting up a few peers with no endpoint address and this option enabled.

In /etc/wireguard/wg0.conf I have a config file for wg-quick:

> [Interface]
> PrivateKey = xxxx
> ListenPort = 55555
> Address = 10.0.166.1/24
> SaveConfig = false
> MTU = 1400
>
> [Peer]
> # ExamplePeer1
> PresharedKey = xxxx
> PublicKey = xxxx
> AllowedIPs = 10.0.166.2/32
> PersistentKeepalive = 25

... And so on.
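The problem seems to need both conditions at once: a peer with PersistentKeepalive set and no Endpoint. Here is a small Python sketch that lists such peers in a wg-quick file. It is a hypothetical helper, not part of wg-quick, and it uses a hand-rolled parser because Python's configparser rejects repeated [Peer] sections by default. The second peer in the sample (HypotheticalLaptop, 192.0.2.10) is invented for illustration.

```python
def keepalive_peers_without_endpoint(text):
    """Return (label, keepalive) for each [Peer] that sets PersistentKeepalive
    but has no Endpoint. The label is the peer's first comment line, if any."""
    peers = []
    current = None  # dict for the [Peer] section being read, else None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Section header: only [Peer] sections are tracked.
            current = {"label": None, "keys": {}} if line.lower() == "[peer]" else None
            if current is not None:
                peers.append(current)
            continue
        if current is None or not line:
            continue
        if line.startswith("#"):
            # Keep the first full-line comment as a human-readable label.
            if current["label"] is None:
                current["label"] = line.lstrip("#").strip()
            continue
        # Drop trailing comments, then split on the first "=" only,
        # because base64 keys may themselves end in "=".
        key, sep, value = line.split("#", 1)[0].partition("=")
        if sep:
            current["keys"][key.strip().lower()] = value.strip()
    result = []
    for peer in peers:
        keepalive = peer["keys"].get("persistentkeepalive", "off").lower()
        if keepalive not in ("off", "0") and "endpoint" not in peer["keys"]:
            result.append((peer["label"], int(keepalive)))
    return result


config = """\
[Interface]
PrivateKey = xxxx
ListenPort = 55555
Address = 10.0.166.1/24

[Peer]
# ExamplePeer1
PublicKey = xxxx
AllowedIPs = 10.0.166.2/32
PersistentKeepalive = 25

[Peer]
# HypotheticalLaptop (invented peer with a fixed endpoint)
PublicKey = yyyy
AllowedIPs = 10.0.166.3/32
Endpoint = 192.0.2.10:55555
PersistentKeepalive = 25
"""
print(keepalive_peers_without_endpoint(config))  # [('ExamplePeer1', 25)]
```

Only ExamplePeer1 is reported, because the second peer has an Endpoint to send its keepalives to.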

With PersistentKeepalive enabled, 'ifconfig wg0 debug' leaves these messages in dmesg:

> wg0: Handshake for peer 6 did not complete after 5 seconds, retrying (try 18)
> wg0: Sending handshake initiation to peer 6
> wg0: Sending handshake initiation to peer 3
> wg0: Sending handshake initiation to peer 7
> wg0: Sending handshake initiation to peer 0
> wg0: Handshake for peer 2 did not complete after 5 seconds, retrying (try 18)
> wg0: Sending handshake initiation to peer 2
> wg0: Sending handshake initiation to peer 1
> wg0: Handshake for peer 4 did not complete after 5 seconds, retrying (try 14)
> wg0: Sending handshake initiation to peer 4
> wg0: Sending handshake initiation to peer 5
> wg0: Handshake for peer 6 did not complete after 5 seconds, retrying (try 19)
> wg0: Sending handshake initiation to peer 6
> wg0: Handshake for peer 3 did not complete after 5 seconds, retrying (try 2)
> wg0: Sending handshake initiation to peer 3
> wg0: Handshake for peer 2 did not complete after 5 seconds, retrying (try 19)
> wg0: Sending handshake initiation to peer 2
> wg0: Handshake for peer 0 did not complete after 5 seconds, retrying (try 2)
> wg0: Sending handshake initiation to peer 0
> wg0: Handshake for peer 7 did not complete after 5 seconds, retrying (try 2)
> wg0: Sending handshake initiation to peer 7
> wg0: Handshake for peer 5 did not complete after 5 seconds, retrying (try 2)
> wg0: Sending handshake initiation to peer 5
> wg0: Handshake for peer 4 did not complete after 5 seconds, retrying (try 15)
> wg0: Sending handshake initiation to peer 4
> wg0: Handshake for peer 1 did not complete after 5 seconds, retrying (try 2)
> wg0: Sending handshake initiation to peer 1
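With many peers the interleaved lines are hard to follow. A short Python sketch can tally the highest retry count per peer from saved dmesg output. The regular expression is written against the lines quoted above and is an assumption about the message format, not a documented interface.

```python
import re
from collections import defaultdict

# Pattern written against the debug lines quoted above, e.g.
# "wg0: Handshake for peer 6 did not complete after 5 seconds, retrying (try 18)".
# It is an assumption about the message format, not a stable interface.
RETRY_RE = re.compile(
    r"^(?P<ifname>\S+): Handshake for peer (?P<peer>\d+) "
    r"did not complete after \d+ seconds, retrying \(try (?P<attempt>\d+)\)"
)


def highest_retry_per_peer(log_lines):
    """Map (interface, peer id) to the highest retry number seen."""
    highest = defaultdict(int)
    for line in log_lines:
        m = RETRY_RE.match(line.strip())
        if m:
            key = (m["ifname"], int(m["peer"]))
            highest[key] = max(highest[key], int(m["attempt"]))
    return dict(highest)


sample = """\
wg0: Handshake for peer 6 did not complete after 5 seconds, retrying (try 18)
wg0: Sending handshake initiation to peer 6
wg0: Handshake for peer 3 did not complete after 5 seconds, retrying (try 2)
wg0: Handshake for peer 6 did not complete after 5 seconds, retrying (try 19)
"""
print(highest_retry_per_peer(sample.splitlines()))
# {('wg0', 6): 19, ('wg0', 3): 2}
```

Lines that do not match, such as "Sending handshake initiation", are skipped, so the whole dmesg can be passed in unfiltered.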

The peers don't have preconfigured endpoint addresses, because they are usually phones that aren't connected. But with PersistentKeepalive enabled, it looks like WireGuard keeps trying to handshake with them despite having no idea where to find them.

After I commented out the PersistentKeepalive lines, the number of mbufs in use stays low, as it should, and the VPN still works fine. PersistentKeepalive is supposed to stop a NAT from dropping the connection after about 30 seconds without traffic. I've never actually seen that happen, but I figured better safe than sorry.
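The NAT reasoning above can be sketched as a toy model in Python. A mapping is dropped once the gap between packets reaches the NAT's idle timeout, so a keepalive interval shorter than that timeout keeps it alive. The 30-second timeout comes from the text. The function names are hypothetical, and the model assumes an otherwise idle tunnel that sends nothing but keepalives.

```python
# Toy model of a NAT that forgets a UDP mapping after idle_timeout seconds
# without traffic. Real NAT timeouts vary; 30 s is the figure from the text.


def mapping_survives(packet_times, idle_timeout):
    """True if no gap between consecutive packets reaches idle_timeout.

    packet_times: sorted send times in seconds. Treating a gap equal to
    the timeout as expiry is a modeling assumption.
    """
    return all(later - earlier < idle_timeout
               for earlier, later in zip(packet_times, packet_times[1:]))


def keepalive_times(duration, interval):
    """Send times for an otherwise idle tunnel that only sends a
    keepalive every `interval` seconds, from 0 up to `duration`."""
    return list(range(0, duration + 1, interval))


NAT_IDLE_TIMEOUT = 30  # seconds, hypothetical router setting

print(mapping_survives(keepalive_times(300, 25), NAT_IDLE_TIMEOUT))  # True
print(mapping_survives([0, 120], NAT_IDLE_TIMEOUT))                   # False
```

With PersistentKeepalive = 25 every gap is 25 seconds, so the mapping survives. Two minutes of silence would exceed the timeout.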

With PersistentKeepalive disabled on the server (but still enabled on the client), if I connect to the server and then disconnect, the server again starts retrying the handshake with the missing peer, but this time the mbuf count _doesn't_ rise:

> wg0: Receiving handshake initiation from peer 0
> wg0: Sending handshake response to peer 0
> wg0: Receiving keepalive packet from peer 0
> wg0: Sending keepalive packet to peer 0
> wg0: Receiving keepalive packet from peer 0
> wg0: Receiving keepalive packet from peer 0
> wg0: Receiving keepalive packet from peer 0
> wg0: Receiving keepalive packet from peer 0
> wg0: Retrying handshake with peer 0 because we stopped hearing back after 15 seconds
> wg0: Sending handshake initiation to peer 0
> wg0: Handshake for peer 0 did not complete after 5 seconds, retrying (try 2)
> wg0: Sending handshake initiation to peer 0
> wg0: Handshake for peer 0 did not complete after 5 seconds, retrying (try 3)
> wg0: Sending handshake initiation to peer 0
> wg0: Handshake for peer 0 did not complete after 5 seconds, retrying (try 4)
> wg0: Sending handshake initiation to peer 0
> wg0: Retrying handshake with peer 0 because we stopped hearing back after 15 seconds
> wg0: Handshake for peer 0 did not complete after 5 seconds, retrying (try 2)
> wg0: Sending handshake initiation to peer 0
> wg0: Handshake for peer 0 did not complete after 5 seconds, retrying (try 3)
> wg0: Sending handshake initiation to peer 0
> wg0: Handshake for peer 0 did not complete after 5 seconds, retrying (try 4)
> wg0: Sending handshake initiation to peer 0
> wg0: Handshake for peer 0 did not complete after 5 seconds, retrying (try 5)
> wg0: Sending handshake initiation to peer 0
> wg0: Retrying handshake with peer 0 because we stopped hearing back after 15 seconds
> wg0: Handshake for peer 0 did not complete after 5 seconds, retrying (try 2)
> wg0: Sending handshake initiation to peer 0
> wg0: Handshake for peer 0 did not complete after 5 seconds, retrying (try 3)
> wg0: Sending handshake initiation to peer 0

Again, the handshakes above don't waste any mbufs. That only happens with PersistentKeepalive enabled.

I think that's the problem found, but here's the other information you requested.

Here is a summary of my setup. The WireGuard server lets my devices connect to my home network over the VPN, both to access local network devices and to reach the internet through the tunnel. My ISP-provided router forwards a public port to the WireGuard server so the devices can connect.

As you can see from the config file (no Endpoint lines), peers can connect from any IP address and are usually not connected.

I have pf configured to forward the traffic:

> # wireguard
> # open wireguard port
> pass in on $ext_if proto udp from any to any port $wg_port
> # allow communication between wireguard peers
> pass on $wg_if
> # allow clients connected to wg0 to tunnel their outside world traffic
> pass out on $ext_if inet from ($wg_if:network) nat-to ($ext_if:0)
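To see which tunnel addresses the last rule translates: pf expands ($wg_if:network) to the network attached to wg0, and with Address = 10.0.166.1/24 that network is 10.0.166.0/24. The hypothetical Python check below uses the standard ipaddress module. It confirms that a peer's AllowedIPs such as 10.0.166.2/32 falls inside that network, while a peer given an address outside the /24 would not match this nat-to rule. It assumes wg0 carries only the one address from wg0.conf.

```python
import ipaddress

# Address line from wg0.conf. pf's ($wg_if:network) stands for the network
# attached to that interface, which for this address is 10.0.166.0/24.
WG0_ADDRESS = ipaddress.ip_interface("10.0.166.1/24")
WG0_NETWORK = WG0_ADDRESS.network


def covered_by_nat_rule(allowed_ips):
    """True if a peer's AllowedIPs entry lies inside wg0's network, so its
    traffic matches the source of the nat-to rule."""
    return ipaddress.ip_network(allowed_ips).subnet_of(WG0_NETWORK)


print(WG0_NETWORK)                            # 10.0.166.0/24
print(covered_by_nat_rule("10.0.166.2/32"))   # True  (ExamplePeer1)
print(covered_by_nat_rule("10.0.167.5/32"))   # False (hypothetical peer)
```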

And below is the rest of my dmesg.

Thanks,
--Matt

> OpenBSD 6.9 (GENERIC.MP) #3: Mon Jun 7 08:21:26 MDT 2021
> root@syspatch-69-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 1996484608 (1903MB)
> avail mem = 1920663552 (1831MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7ee89040 (13 entries)
> bios0: vendor coreboot version "v4.13.0.2" date 12/23/2020
> bios0: PC Engines apu4
> acpi0 at bios0: ACPI 6.0
> acpi0: sleep states S0 S1 S4 S5
> acpi0: tables DSDT FACP SSDT MCFG TPM2 APIC HEST SSDT SSDT DRTM HPET
> acpi0: wakeup devices PBR4(S4) PBR5(S4) PBR6(S4) PBR7(S4) PBR8(S4) UOH1(S3) UOH2(S3) UOH3(S3) UOH4(S3) UOH5(S3) UOH6(S3) XHC0(S4)
> acpitimer0 at acpi0: 3579545 Hz, 32 bits
> acpimcfg0 at acpi0
> acpimcfg0: addr 0xf8000000, bus 0-63
> acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD GX-412TC SOC, 998.27 MHz, 16-30-01
> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PERFTSC,PCTRL3,ITSC,BMI1,XSAVEOPT
> cpu0: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 16-way L2 cache
> cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
> cpu0: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: AMD GX-412TC SOC, 998.14 MHz, 16-30-01
> cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PERFTSC,PCTRL3,ITSC,BMI1,XSAVEOPT
> cpu1: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 16-way L2 cache
> cpu1: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
> cpu1: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: AMD GX-412TC SOC, 998.17 MHz, 16-30-01
> cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PERFTSC,PCTRL3,ITSC,BMI1,XSAVEOPT
> cpu2: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 16-way L2 cache
> cpu2: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
> cpu2: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 3 (application processor)
> cpu3: AMD GX-412TC SOC, 998.14 MHz, 16-30-01
> cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PERFTSC,PCTRL3,ITSC,BMI1,XSAVEOPT
> cpu3: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 2MB 64b/line 16-way L2 cache
> cpu3: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
> cpu3: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
> cpu3: smt 0, core 3, package 0
> ioapic0 at mainbus0: apid 4 pa 0xfec00000, version 21, 24 pins
> ioapic1 at mainbus0: apid 5 pa 0xfec20000, version 21, 32 pins
> acpihpet0 at acpi0: 14318180 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 1 (PBR4)
> acpiprt2 at acpi0: bus 2 (PBR5)
> acpiprt3 at acpi0: bus 3 (PBR6)
> acpiprt4 at acpi0: bus 4 (PBR7)
> acpiprt5 at acpi0: bus -1 (PBR8)
> acpipci0 at acpi0 PCI0: 0x00000000 0x00000011 0x00000001
> acpicmos0 at acpi0
> amdgpio0 at acpi0 GPIO uid 0 addr 0xfed81500/0x300 irq 7, 184 pins
> "PRP0001" at acpi0 not configured
> "PRP0001" at acpi0 not configured
> "PRP0001" at acpi0 not configured
> "PRP0001" at acpi0 not configured
> "PRP0001" at acpi0 not configured
> "PRP0001" at acpi0 not configured
> "BOOT0000" at acpi0 not configured
> acpicpu0 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> acpicpu1 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> acpicpu2 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> acpicpu3 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> acpitz0 at acpi0: critical temperature is 115 degC
> cpu0: 998 MHz: speeds: 1000 800 600MHz
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 "AMD 16h Root Complex" rev 0x00
> vendor "AMD", unknown product 0x1567 (class system subclass IOMMU, rev 0x00) at pci0 dev 0 function 2 not configured
> pchb1 at pci0 dev 2 function 0 "AMD 16h Host" rev 0x00
> ppb0 at pci0 dev 2 function 1 "AMD 16h PCIE" rev 0x00: msi
> pci1 at ppb0 bus 1
> em0 at pci1 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:58:72:bc
> ppb1 at pci0 dev 2 function 2 "AMD 16h PCIE" rev 0x00: msi
> pci2 at ppb1 bus 2
> em1 at pci2 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:58:72:bd
> ppb2 at pci0 dev 2 function 3 "AMD 16h PCIE" rev 0x00: msi
> pci3 at ppb2 bus 3
> em2 at pci3 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:58:72:be
> ppb3 at pci0 dev 2 function 4 "AMD 16h PCIE" rev 0x00: msi
> pci4 at ppb3 bus 4
> em3 at pci4 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:58:72:bf
> ccp0 at pci0 dev 8 function 0 "AMD 16h Crypto" rev 0x00
> xhci0 at pci0 dev 16 function 0 "AMD Bolton xHCI" rev 0x11: msi, xHCI 1.0
> usb0 at xhci0: USB revision 3.0
> uhub0 at usb0 configuration 1 interface 0 "AMD xHCI root hub" rev 3.00/1.00 addr 1
> ahci0 at pci0 dev 17 function 0 "AMD Hudson-2 SATA" rev 0x40: apic 4 int 19, AHCI 1.3
> ahci0: port 0: 6.0Gb/s
> scsibus1 at ahci0: 32 targets
> sd0 at scsibus1 targ 0 lun 0: <ATA, KingFast, Q120> t10.ATA_KingFast_10052220J0323_
> sd0: 114473MB, 512 bytes/sector, 234441648 sectors, thin
> ehci0 at pci0 dev 19 function 0 "AMD Hudson-2 USB2" rev 0x39: apic 4 int 18
> usb1 at ehci0: USB revision 2.0
> uhub1 at usb1 configuration 1 interface 0 "AMD EHCI root hub" rev 2.00/1.00 addr 1
> piixpm0 at pci0 dev 20 function 0 "AMD Hudson-2 SMBus" rev 0x42: SMI
> iic0 at piixpm0
> iic1 at piixpm0
> iic1: addr 0x4c 3e=00 48=00 4a=00 4e=00 fc=00 fe=00 words 00=ffff 01=ffff 02=ffff 03=ffff 04=ffff 05=ffff 06=ffff 07=ffff
> pcib0 at pci0 dev 20 function 3 "AMD Hudson-2 LPC" rev 0x11
> sdhc0 at pci0 dev 20 function 7 "AMD Bolton SD/MMC" rev 0x01: apic 4 int 16
> sdhc0: SDHC 2.0, 50 MHz base clock
> sdmmc0 at sdhc0: 4-bit, sd high-speed, mmc high-speed, dma
> pchb2 at pci0 dev 24 function 0 "AMD 16h Link Cfg" rev 0x00
> pchb3 at pci0 dev 24 function 1 "AMD 16h Address Map" rev 0x00
> pchb4 at pci0 dev 24 function 2 "AMD 16h DRAM Cfg" rev 0x00
> km0 at pci0 dev 24 function 3 "AMD 16h Misc Cfg" rev 0x00
> pchb5 at pci0 dev 24 function 4 "AMD 16h CPU Power" rev 0x00
> pchb6 at pci0 dev 24 function 5 "AMD 16h Misc Cfg" rev 0x00
> isa0 at pcib0
> isadma0 at isa0
> com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> com0: console
> com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
> com2 at isa0 port 0x3e8/8 irq 5: ns16550a, 16 byte fifo
> pcppi0 at isa0 port 0x61
> spkr0 at pcppi0
> lpt0 at isa0 port 0x378/4 irq 7
> intr_establish: pic ioapic0 pin 7: can't share type 3 with 2
> wbsio0 at isa0 port 0x2e/2: NCT5104D rev 0x53
> vmm0 at mainbus0: SVM/RVI
> scsibus2 at sdmmc0: 2 targets, initiator 0
> sd1 at scsibus2 targ 1 lun 0: <SD/MMC, SD04G, 0030> removable
> sd1: 3796MB, 512 bytes/sector, 7774208sectors
> uhub2 at uhub1 port 1 configuration 1 interface 0 "Advanced Micro Devices Hub" rev 2.00/0.18 addr 2
> vscsi0 at root
> scsibus3 at vscsi0: 256 targets
> softraid0 at root
> scsibus4 at softraid0: 256 targets
> root on sd0a (45c496e613922820.a) swap on sd0b dump on sd0b
