Sunday, April 05, 2020

Re: ospfd in 6.6 when dying doesn't recover database before adj timer expires

Hi Tobias,

On Fri, Apr 03, 2020 at 08:39:30AM +0000, Tobias Urdin wrote:
> Hello,
>
>
> We've seen an issue where, if you perform an ospfctl reload with a faulty configuration (for example, an interface
> that doesn't exist), ospfd dies (which is fair in itself), but the seq num for the database never catches up with the DR;
> the adjacency timer expires over and over again, and it can take up to 30 minutes before it's back.
>
>
> I produce a failure with a faulty interface.
>
> Apr 3 10:03:46 router1 ospfd[36062]: fatal in rde: rde_nbr_new: unknown interface
> Apr 3 10:03:46 router1 ospfd[19043]: ospf engine exiting
> Apr 3 10:03:46 router1 ospfd[67917]: kernel routing table decoupled
> Apr 3 10:03:46 router1 ospfd[67917]: terminating

Can you tell us how this failure can be reproduced? ospfd is supposed to log that a
config reload failed and carry on with its old config.


> Upon startup we then get stuck in this loop of trying to get back.
>
> Apr 3 10:04:15 router1 ospfd[91965]: startup
> Apr 3 10:06:22 router1 ospfd[19699]: nbr_adj_timer: failed to form adjacency with x.x.x.1 on interface vmx0
> Apr 3 10:06:42 router1 ospfd[19699]: recv_db_description: neighbor ID x.x.x.1: invalid seq num, mine 27a9fd66 his 27a99b25
> Apr 3 10:08:22 router1 ospfd[19699]: nbr_adj_timer: failed to form adjacency with x.x.x.1 on interface vmx0
> Apr 3 10:09:17 router1 ospfd[19699]: recv_db_description: neighbor ID x.x.x.1: invalid seq num, mine 27aa6475 his 27a9fd69
> Apr 3 10:10:22 router1 ospfd[19699]: nbr_adj_timer: failed to form adjacency with x.x.x.1 on interface vmx0
> Apr 3 10:11:02 router1 ospfd[19699]: recv_db_description: neighbor ID x.x.x.1: invalid seq num, mine 27aa9109 his 27aa6476
> Apr 3 10:11:22 router1 ospfd[19699]: recv_db_description: neighbor ID x.x.x.1: invalid seq num, mine 27aa9109 his 27aa6476
> Apr 3 10:11:27 router1 ospfd[19699]: recv_db_description: neighbor ID x.x.x.1: invalid seq num, mine 27aa9109 his 27aa6476
> Apr 3 10:11:32 router1 ospfd[19699]: recv_db_description: neighbor ID x.x.x.1: invalid seq num, mine 27aa9109 his 27aa6476
> Apr 3 10:11:37 router1 ospfd[19699]: recv_db_description: neighbor ID x.x.x.1: invalid seq num, mine 27aa9109 his 27aa6476
> Apr 3 10:11:42 router1 ospfd[19699]: recv_db_description: neighbor ID x.x.x.1: invalid seq num, mine 27aa9109 his 27aa6476
> Apr 3 10:11:47 router1 ospfd[19699]: recv_db_description: neighbor ID x.x.x.1: invalid seq num, mine 27aa9109 his 27aa6476
> Apr 3 10:12:22 router1 ospfd[19699]: nbr_adj_timer: failed to form adjacency with x.x.x.1 on interface vmx0
> Apr 3 10:12:51 router1 ospfd[19699]: recv_db_description: neighbor ID x.x.x.1: invalid seq num, mine 27ab558d his 27aa910b
> Apr 3 10:12:51 router1 ospfd[19699]: recv_db_description: neighbor ID x.x.x.1: invalid seq num, mine 27ab558d his 27aa910b
>
>

Can you share a pcap file with the OSPF packets during this situation?
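
In the meantime, the sequence numbers in the log lines quoted above can be compared with a small script. This is only a rough sketch and not part of ospfd: the function name parse_seq_mismatches is made up for illustration, and the regular expression assumes the exact recv_db_description message format shown in the quoted log.

```python
import re

# Assumed format, taken from the quoted syslog lines:
#   recv_db_description: neighbor ID <id>: invalid seq num, mine <hex> his <hex>
SEQ_RE = re.compile(
    r"neighbor ID (\S+): invalid seq num, mine ([0-9a-f]+) his ([0-9a-f]+)"
)


def parse_seq_mismatches(lines):
    """Return (neighbor, mine, his, mine - his) for each matching log line."""
    results = []
    for line in lines:
        match = SEQ_RE.search(line)
        if match is None:
            continue  # not a sequence-number mismatch line
        mine = int(match.group(2), 16)
        his = int(match.group(3), 16)
        results.append((match.group(1), mine, his, mine - his))
    return results


if __name__ == "__main__":
    # Two of the lines from the report, used here as sample input.
    sample = [
        "Apr 3 10:06:42 router1 ospfd[19699]: recv_db_description: "
        "neighbor ID x.x.x.1: invalid seq num, mine 27a9fd66 his 27a99b25",
        "Apr 3 10:09:17 router1 ospfd[19699]: recv_db_description: "
        "neighbor ID x.x.x.1: invalid seq num, mine 27aa6475 his 27a9fd69",
    ]
    for nbr, mine, his, delta in parse_seq_mismatches(sample):
        print(f"{nbr}: mine={mine:08x} his={his:08x} delta={delta}")
```

In the quoted log, each "his" value sits just past the "mine" value from the previous attempt (27a9fd66 then 27a9fd69, 27aa6475 then 27aa6476), which a listing like this makes easier to spot. What that means for the exchange is something the pcap should show.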

> It's like it cannot match the database with the DR until the DEFAULT_ADJ_TMOUT (120 sec) timeout occurs and it starts all over again.
>
> Anybody seen this before? Should probably note that the DR in the other end is not a device running OpenOSPFD.

What device / software version is on the other end?

Thank you,
Remi
