Tuesday, June 30, 2020

Re: OpenBGPD fatal in RDE: rde_dispatch_imsg_session: imsg_get error: Cannot allocate memory

On Tue, Jun 30, 2020 at 10:23:07AM +0200, Laurent CARON wrote:
> Hi,
>
>
> I'm running a pretty busy OpenBGPd router (~250 bgp sessions) with 4 IPv4
> and 4 IPv6 full views, plus a few IX sessions.
>
>
> # bgpctl show rib mem
> RDE memory statistics
>     820983 IPv4 unicast network entries using 31.3M of memory
>     203228 IPv6 unicast network entries using 10.9M of memory
>    1935802 rib entries using 118M of memory
>    6348318 prefix entries using 775M of memory
>     728103 BGP path attribute entries using 50.0M of memory
>            and holding 6348318 references
>     464633 BGP AS-PATH attribute entries using 22.3M of memory
>            and holding 728103 references
>      29055 entries for 371905 BGP communities using 8.6M of memory
>            and holding 6348318 references
>      18541 BGP attributes entries using 724K of memory
>            and holding 1618379 references
>      18540 BGP attributes using 145K of memory
>          0 as-set elements in 0 tables using 0B of memory
>         64 prefix-set elements using 3.0K of memory
> RIB using 1008M of memory
> Sets using 3.0K of memory
>
> RDE hash statistics
>         path hash: size 131072, 728103 entries
>             min 0 max 19 avg/std-dev = 5.555/2.268
>         aspath hash: size 131072, 464633 entries
>             min 0 max 17 avg/std-dev = 3.545/1.853
>         comm hash: size 16384, 29055 entries
>             min 0 max 8 avg/std-dev = 1.773/0.925
>         attr hash: size 16384, 18541 entries
>             min 0 max 8 avg/std-dev = 1.132/0.848
>
>
> More often than not the BGPd daemon is crashing (although having plenty of
> RAM (80G) on the server) with: /var/log/messages
>
> fatal in RDE: rde_dispatch_imsg_session: imsg_get error: Cannot allocate
> memory
>
> fatal in RDE: prefix_alloc: Cannot allocate memory
>
> fatal in RDE: communities_copy: Cannot allocate memory
>
> peer closed imsg connection
> main: Lost connection to RDE
> peer closed imsg connection
> SE: Lost connection to RDE
> peer closed imsg connection
> SE: Lost connection to RDE control
> Can't send message 57 to RDE, pipe closed
> last message repeated 12 times
> peer closed imsg connection
> SE: Lost connection to parent
> neighbor A.B.C.D (sas-v4-001): sending notification: Cease, administratively
> down
>
>
> :/etc/login.conf:
>
> default:\
>         :path=/usr/bin /bin /usr/sbin /sbin /usr/X11R6/bin /usr/local/bin
> /usr/local/sbin:\
>         :umask=022:\
>         :datasize-max=768M:\
>         :datasize-cur=768M:\
>         :maxproc-max=256:\
>         :maxproc-cur=128:\
>         :openfiles-max=1024:\
>         :openfiles-cur=512:\
>         :stacksize-cur=4M:\
>         :localcipher=blowfish,a:\
>         :tc=auth-defaults:\
>         :tc=auth-ftp-defaults:
>
> daemon:\
>         :ignorenologin:\
>         :datasize=infinity:\
>         :maxproc=infinity:\
>         :openfiles-max=1024:\
>         :openfiles-cur=128:\
>         :stacksize-cur=8M:\
>         :localcipher=blowfish,a:\
>         :tc=default:
>
> bgpd:\
>         :openfiles=512:\
>         :tc=daemon:
>
> How can I pinpoint the source of the problem ?
>

Can you monitor the VSZ and RSS of the RDE process with ps aux | grep bgpd
and/or top? What is the maximum you see? Also, how do you start bgpd?
Make sure the limits from login.conf are actually applied (starting it
with rcctl start should do that, while running doas bgpd would not).
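
Something along these lines could record the peak values over time. It is
only a rough sketch: the function names (rde_mem, max_of, monitor_rde) are
made up for illustration, and it assumes the RDE shows up in ps with a
title of the form "bgpd: route decision engine", so adjust PATTERN if your
system shows something different. VSZ and RSS are printed as ps reports
them (kilobytes).

```shell
#!/bin/sh
# Sketch: track the peak VSZ and RSS of the bgpd RDE process.
# Assumption: the RDE process title starts with "bgpd:" and contains
# the text below; change PATTERN if ps shows a different title.
PATTERN='route decision engine'

# Read "pid vsz rss command..." lines on stdin and print "vsz rss" for
# the first RDE line. Requiring the command to start with "bgpd:" keeps
# the awk process itself (whose arguments contain PATTERN) from matching.
rde_mem() {
    awk -v pat="$PATTERN" '$4 == "bgpd:" && index($0, pat) { print $2, $3; exit }'
}

# Print the larger of two integers.
max_of() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Sample every INTERVAL seconds, COUNT times, printing the running maximum.
monitor_rde() {
    interval=${1:-60}
    count=${2:-1440}
    max_vsz=0
    max_rss=0
    i=0
    while [ "$i" -lt "$count" ]; do
        sample=$(ps -axo pid,vsz,rss,command | rde_mem)
        if [ -n "$sample" ]; then
            vsz=${sample% *}    # text before the space
            rss=${sample#* }    # text after the space
            max_vsz=$(max_of "$vsz" "$max_vsz")
            max_rss=$(max_of "$rss" "$max_rss")
            echo "$(date '+%Y-%m-%d %H:%M:%S') vsz=$vsz rss=$rss max_vsz=$max_vsz max_rss=$max_rss"
        else
            echo "$(date '+%Y-%m-%d %H:%M:%S') RDE process not found"
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, after loading the functions into a shell, monitor_rde 60 1440
would take one sample per minute for a day. Comparing the last max_vsz
before a crash with the datasize limits in login.conf should show whether
the RDE is running into a limit.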

Cheers
--
:wq Claudio
