Thursday, September 19, 2024

Re: vxlan(4) Between Three Sites

On Thu, Sep 19, 2024 at 09:48:15AM -0700, Bryan Vyhmeister wrote:
> On Wed, Sep 18, 2024 at 11:17:45AM +1000, David Gwynne wrote:
> > On Mon, Sep 16, 2024 at 09:57:18PM -0700, Bryan Vyhmeister wrote:
> > > On Tue, Sep 17, 2024 at 02:31:09PM +1000, David Gwynne wrote:
> > > >
> > > > On Mon, Sep 16, 2024 at 12:25:35PM -0700, Bryan Vyhmeister wrote:
> > > > > I am attempting to build a proof of concept of how to use vxlan(4)
> > > > > on OpenBSD in a fully meshed OSPF network with [wireless] links
> > > > > between sites under my full control so mtu is not an issue (mtu 1550
> > > > > for vxlan0 and mtu 1600 or higher for hardware interfaces). The goal
> > > > > is to bridge a group of VLANs between sites A, B, and C.
> > > <snip>
> > > >
> > > > vxlan(4) in learning mode relies on a single multicast capable
> > > > underlay network between all sites/points. if you are using separate
> > > > interfaces on A to talk to B and C, then this requirement isn't
> > > > satisfied.
> > > >
> > > > i dont know enough about multicast routing to know if or how i should
> > > > support vxlan in learning mode with routes to multiple interfaces.
> > >
> > > Thanks for your response. That makes sense then if that is how things
> > > are underneath. I'm not that familiar with how multicast routing works
> > > either but that does appear to be how commercial vendors'
> > > implementations work from what I have read.
> >
> > they rely on routes?
>
> I think it relies on PIM which I just found out is not supported. Again,
> I'm not too familiar with PIM. I could also use a Juniper or some or
> other switch to do all of the OSPF routing and provide the multicast
> routing environment and then just attach OpenBSD routers for running the
> vxlan(4) only but I would prefer to do everything in OpenBSD.
>
> > > > > I also tried using a WireGuard overlay on top of this network, with
> > > > > wg0 as the parent, but that does not seem to work either in vxlan(4)
> > > > > learning mode unless I am missing something.
> > > >
> > > > wireguard as an underlay for vxlan in learning mode doesn't work
> > > > because wg isn't multicast capable. the cryptokey routing thing doesnt
> > > > support sending a packet destined to a single address (eg, 239.0.0.1)
> > > > to multiple peers (ie, B and C).
> > >
> > > I was testing BGP over tunnels and noticed that ospf6d will not function
> > > over wg(4) either.
> >
> > wg is neither multicast nor point-to-point, and it completely ignored
> > existing point to multipoint semantics. so yeah. it feels pretty clumsy
> > when you try to do interesting stuff beyond what it was specifically
> > created for.
>
> Once I realized wg(4) wouldn't work, my solution was to use a gif(4)
> tunnel or etherip(4) bridged with veb(4) to a vport(4) but I think the
> gif(4) solution is simpler. Either solution worked fine for ospfd and
> ospf6d as well as BGP over IPv4 and IPv6. Is there a performance benefit
> with etherip(4) and vport(4) rather than gif(4)?

gif over dedicated ethernet links seems unnecessary because you should
already have working IP connectivity. how does it help your situation?

> > openbsd lets you combine vlans and bridges/vebs/tpmr and tunnels in
> > pretty arbitrary ways. there's advantages to doing everything in
> > software sometimes.
>
> It's quite nice to have so many flexible options.
>
> > etherip(4) is the lowest overhead ethernet over ip tunnel interface, but
> > you can only have one etherip tunnel between 2 endpoints. you can add
> > vlans on top of etherip, or you can use egre/vxlan/etc with different
> > vnetids instead.
>
> I had not tried using VLANs over etherip(4) but that is a good idea and
> maybe better than trying to get vxlan(4) to do what I want. My plan is
> to feed the site A hardware ethernet interface from a switch with all
> traffic being tagged with VLAN tags. At sites B and C (and D, E, etc.),
> the hardware ethernet interface would plug right into a switch port that
> will be prepared for the tagged traffic as well. I'm essentially
> building a network ring and that's where I thought vxlan(4) would work
> well. Once I have this setup properly, I don't anticipate needing to
> make that many changes to the OpenBSD setup and can just add and remove
> VLANs from the managed switches as needed.
>
> > a couple of notes though:
> >
> > veb (and bridge) are not vlan aware. this means they will not scope the
> > mac addresses they learn by vlan ids, and apart from the link0 flag on
> > veb they don't let you filter vlans. if you want to control individual
> > vlans, create a veb for a specific network and add vlan (or
> > egre/vxlan/etc) interfaces to it.
>
> That will not be necessary in this design but I appreciate that
> explanation. That makes sense. I won't need to filter at this level but
> can leave that to the switch.

ok, if you're just providing transit between switchports then etherip
and veb with link0 enabled should be enough.

> > it can be helpful to know the order of processing in the ethernet stack
> > for packets rxed on an interface, which is currently best documented
> > by comments in ether_input():
> >
> > * Process a received Ethernet packet.
> > *
> > * Ethernet input has several "phases" of filtering packets to
> > * support virtual/pseudo interfaces before actual layer 3 protocol
> > * handling.
> > *
> > * First phase:
> > *
> > * The first phase supports drivers that aggregate multiple Ethernet
> > * ports into a single logical interface, ie, aggr(4) and trunk(4).
> >
> > * Second phase: service delimited packet filtering.
> > *
> > * Let vlan(4) and svlan(4) look at "service delimited"
> > * packets. If a virtual interface does not exist to take
> > * those packets, they're returned to ether_input() so a
> > * bridge can have a go at forwarding them.
> >
> > * Third phase: bridge processing.
> > *
> > * Give the packet to a bridge interface, ie, bridge(4),
> > * veb(4), or tpmr(4), if it is configured. A bridge
> > * may take the packet and forward it to another port, or it
> > * may return it here to ether_input() to support local
> > * delivery to this port.
> >
> > * Fourth phase: drop service delimited packets.
> > *
> > * If the packet has a tag, and a bridge didn't want it,
> > * it's not for this port.
> >
> > * Fifth phase: destination address check.
> > *
> > * Is the packet specifically addressed to this port?
> >
> > * Sixth phase: protocol demux.
> > *
> > * At this point it is known that the packet is destined
> > * for layer 3 protocol handling on the local port.
>
> That's very helpful in understanding how this works. Thank you.
>
> I'm still not clear on exactly what protected accomplishes with veb(4).
> You mentioned that prevents loops but I don't understand how.
>
> Essentially, at this point, I think I can have etherip(4) links between
> each site maybe in a close to fully meshed layout particularly back to
> site A and, as long as I put the etherip(4) interfaces into the veb(4)
> as protected, I will not have loops? Is that a correct understanding of
> what you said?

it's about what happens when you have broadcast/multicast/unknown
unicast traffic in a full mesh topology.

if a broadcast packet enters the veb at site A, it will flood the packet
to the etherip links to both site B and site C. site B will then flood
the broadcast packet to its physical port and the link to site C. site
C will then flood that broadcast packet to its physical port and the
link to site A. site A will then flood the packet to its physical port
and the link to site B, and so on.

putting the etherip links at each site in the same protected domain
prevents the veb from flooding traffic from one etherip link to another
etherip link, which should be unnecessary because the site that got the
original broadcast traffic should have already flooded it to all sites
anyway.
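
to make the difference concrete, here is a rough python sketch of the
flooding described above. it is a toy model, not how veb(4) is
implemented: the three site names, the flood() helper, and the max_hops
cap are all assumptions made up for the illustration, and the cap exists
only so the unprotected case terminates. each site has one physical port
and one etherip link to each other site, and a single broadcast frame
enters on the physical port at site A.

```python
from collections import deque

# Hypothetical three-site full mesh; names are illustrative only.
SITES = ["A", "B", "C"]


def flood(protected, max_hops=6):
    """Count copies of one broadcast frame sent out each site's physical port.

    protected=True models all etherip links at a site sharing one protected
    domain: a frame received on a tunnel is never flooded to another tunnel.
    """
    delivered = {site: 0 for site in SITES}
    # Queue entries are (site, ingress port, hops). A port is either "phys"
    # or the name of the peer site at the far end of an etherip link.
    queue = deque([("A", "phys", 0)])
    while queue:
        site, ingress, hops = queue.popleft()
        if hops >= max_hops:
            continue  # artificial cap so the looping case stops
        ports = ["phys"] + [peer for peer in SITES if peer != site]
        for port in ports:
            if port == ingress:
                continue  # never flood back out the ingress port
            if protected and ingress != "phys" and port != "phys":
                continue  # tunnel to tunnel is blocked by the protected domain
            if port == "phys":
                delivered[site] += 1
            else:
                queue.append((port, site, hops + 1))
    return delivered


if __name__ == "__main__":
    print("protected:  ", flood(protected=True))
    print("unprotected:", flood(protected=False))
```

with protected=True, sites B and C each send exactly one copy out their
physical port and nothing comes back to A. with protected=False the
counts keep growing as max_hops is raised, which is the loop.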
