Tuesday, September 17, 2024

Re: vxlan(4) Between Three Sites

On Mon, Sep 16, 2024 at 09:57:18PM -0700, Bryan Vyhmeister wrote:
> On Tue, Sep 17, 2024 at 02:31:09PM +1000, David Gwynne wrote:
> >
> > On Mon, Sep 16, 2024 at 12:25:35PM -0700, Bryan Vyhmeister wrote:
> > > I am attempting to build a proof of concept of how to use vxlan(4)
> > > on OpenBSD in a fully meshed OSPF network with [wireless] links
> > > between sites under my full control so mtu is not an issue (mtu 1550
> > > for vxlan0 and mtu 1600 or higher for hardware interfaces). The goal
> > > is to bridge a group of VLANs between sites A, B, and C.
> <snip>
> >
> > vxlan(4) in learning mode relies on a single multicast capable
> > underlay network between all sites/points. if you are using separate
> > interfaces on A to talk to B and C, then this requirement isn't
> > satisfied.
> >
> > i dont know enough about multicast routing to know if or how i should
> > support vxlan in learning mode with routes to multiple interfaces.
>
> Thanks for your response. That makes sense then if that is how things
> are underneath. I'm not that familiar with how multicast routing works
> either but that does appear to be how commercial vendors'
> implementations work from what I have read.

they rely on routes?

> > > I also tried using a WireGuard overlay on top of this network with
> > > wg0 as the parent, but that does not seem to work either in vxlan(4)
> > > learning mode unless I am missing something.
> >
> > wireguard as an underlay for vxlan in learning mode doesn't work
> > because wg isn't multicast capable. the cryptokey routing thing doesnt
> > support sending a packet destined to a single address (eg, 239.0.0.1)
> > to multiple peers (ie, B and C).
>
> I was testing BGP over tunnels and noticed that ospf6d will not function
> over wg(4) either.

wg is neither multicast nor point-to-point, and it completely ignored
existing point to multipoint semantics. so yeah. it feels pretty clumsy
when you try to do interesting stuff beyond what it was specifically
created for.
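
The cryptokey routing limitation can be illustrated with a toy model. This is only a sketch, assuming a longest-prefix lookup over an allowed-IPs table; the table contents, the peer names, and the `select_peer` helper are hypothetical and are not taken from wg(4) itself. The point is that a lookup yields at most one peer, so a packet sent to a group address such as 239.0.0.1 cannot fan out to both B and C.

```python
import ipaddress

# Hypothetical allowed-IPs table on site A. Each prefix belongs to
# exactly one peer; the addresses are made up for illustration.
ALLOWED_IPS = {
    "10.0.2.0/24": "B",
    "10.0.3.0/24": "C",
    "239.0.0.1/32": "B",  # a group address can only be claimed by one peer
}


def select_peer(dst, table=ALLOWED_IPS):
    """Return the single peer whose prefix best matches dst, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, peer in table.items():
        net = ipaddress.ip_network(prefix)
        # Keep the most specific (longest) matching prefix.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, peer)
    return best[1] if best else None


if __name__ == "__main__":
    # The vxlan group address resolves to one peer only, never to B and C.
    print(select_peer("239.0.0.1"))
    print(select_peer("10.0.3.7"))
```

However the table is arranged, the result of the lookup is one peer or none, which is why a learning-mode vxlan(4) cannot reach every site through a single wg interface.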

>
> > > The other possible solution that I believe I tested and works is to
> > > have a vxlan0 between sites A and B and then a vxlan1 between sites
> > > A and C and then use veb(4) to bridge vxlan0, vxlan1, and whatever
> > > the hardware interface is together. This seems to defeat the purpose
> > > of using vxlan(4) to begin with and is not ideal for traffic between
> > > sites B and C unless I missed something.
> >
> > this last one is pretty good,
> >
> > veb and vxlan in learning mode actually use the same "etherbridge"
> > code internally, the main difference between them is what endpoints
> > they learn and associate with Ethernet addresses. veb associates
> > Ethernet addresses with the interfaces added as ports to the bridge,
> > while vxlan associates Ethernet addresses with the IP addresses of
> > peers.
> >
> > with veb bridging tunnels together, the tunnel interfaces basically
> > act as proxies for the ip tunnel endpoints in the bridge.
> >
> > i would just add ethernet tunnels between B and C so they can talk
> > directly too. you will probably have to add them to the same protected
> > bridge domain to avoid loops, which is discussed a bit in the mpw
> > manpage examples.
>
> I will test that and see if it works to my satisfaction. I had not come
> across this "protected bridge domain" or at least I ignored it when
> reading through mpw(4). Would it be better to use etherip(4) or egre(4)
> (I want VLAN support) rather than vxlan(4) between the endpoints based
> on what you're saying? Would I add only the vxlan(4) or egre(4)
> interfaces as protected and not the bridged ethernet hardware interface
> to the switch or should all be configured as protected? In the mpw(4)
> example, only the mpw(4) interfaces are added as protected and not the
> ethernet interface itself. Thank you for taking the time to get back to
> me.
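
The protected bridge domain idea raised in the quoted text can be sketched as a toy flooding rule. This is a simplified model, assuming the behaviour described for protected domains (ports that share a domain do not forward to each other) and mirroring the mpw(4) example mentioned above, where only the tunnel ports are protected. The port names and the `flood_ports` helper are hypothetical.

```python
def flood_ports(ports, in_port):
    """Return the ports a frame received on in_port may be flooded to.

    ports maps a port name to its set of protected domain ids.
    Two ports that share any protected domain never forward to each other.
    """
    src_domains = ports[in_port]
    return sorted(
        name
        for name, domains in ports.items()
        if name != in_port and not (src_domains & domains)
    )


# Site A in a full mesh: tunnels to B and C share protected domain 1,
# while the local Ethernet port is left unprotected.
site_a = {"em0": set(), "vxlan0": {1}, "vxlan1": {1}}

if __name__ == "__main__":
    print(flood_ports(site_a, "vxlan0"))  # frame arriving from site B
    print(flood_ports(site_a, "em0"))     # frame arriving from the local LAN
```

In this model a frame arriving over one tunnel is delivered to the local port but is not re-sent over the other tunnel, which is what breaks the loop once B and C also have a direct tunnel between them.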

openbsd lets you combine vlans and bridges/vebs/tpmr and tunnels in
pretty arbitrary ways. there's advantages to doing everything in
software sometimes.

etherip(4) is the lowest overhead ethernet over ip tunnel interface, but
you can only have one etherip tunnel between 2 endpoints. you can add
vlans on top of etherip, or you can use egre/vxlan/etc with different
vnetids instead.

a couple of notes though:

veb (and bridge) are not vlan aware. this means they will not scope the
mac addresses they learn by vlan ids, and apart from the link0 flag on
veb they don't let you filter vlans. if you want to control individual
vlans, create a veb for a specific network and add vlan (or
egre/vxlan/etc) interfaces to it.
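
To make the "not vlan aware" point concrete, here is a toy learning table. It is a sketch only, assuming a table keyed by MAC address alone; the `LearningBridge` class, the MAC address, and the port names are hypothetical and do not reflect the real etherbridge data structures.

```python
class LearningBridge:
    """Toy bridge that learns MAC -> port with no notion of VLAN id."""

    def __init__(self):
        self.table = {}

    def learn(self, src_mac, port):
        # The VLAN tag is not part of the key, so a later sighting of the
        # same MAC simply replaces the earlier entry.
        self.table[src_mac] = port

    def lookup(self, dst_mac):
        return self.table.get(dst_mac)


MAC = "fe:e1:ba:d0:00:01"

# One shared bridge carrying tagged traffic for several vlans.
shared = LearningBridge()
shared.learn(MAC, "em0")     # seen in vlan 10
shared.learn(MAC, "vxlan0")  # same MAC seen in vlan 20

# One bridge per network, each with its own vlan/tunnel ports.
per_vlan = {10: LearningBridge(), 20: LearningBridge()}
per_vlan[10].learn(MAC, "vlan10")
per_vlan[20].learn(MAC, "vxlan20")

if __name__ == "__main__":
    print(shared.lookup(MAC))
    print(per_vlan[10].lookup(MAC), per_vlan[20].lookup(MAC))
```

With a single shared table the second sighting overwrites the first, while one bridge per network keeps the two entries apart.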

it can be helpful to know the order of processing in the ethernet stack
for packets rxed on an interface, which is currently best documented
by comments in ether_input():

* Process a received Ethernet packet.
*
* Ethernet input has several "phases" of filtering packets to
* support virtual/pseudo interfaces before actual layer 3 protocol
* handling.
*
* First phase:
*
* The first phase supports drivers that aggregate multiple Ethernet
* ports into a single logical interface, ie, aggr(4) and trunk(4).

* Second phase: service delimited packet filtering.
*
* Let vlan(4) and svlan(4) look at "service delimited"
* packets. If a virtual interface does not exist to take
* those packets, they're returned to ether_input() so a
* bridge can have a go at forwarding them.

* Third phase: bridge processing.
*
* Give the packet to a bridge interface, ie, bridge(4),
* veb(4), or tpmr(4), if it is configured. A bridge
* may take the packet and forward it to another port, or it
* may return it here to ether_input() to support local
* delivery to this port.

* Fourth phase: drop service delimited packets.
*
* If the packet has a tag, and a bridge didn't want it,
* it's not for this port.

* Fifth phase: destination address check.
*
* Is the packet specifically addressed to this port?

* Sixth phase: protocol demux.
*
* At this point it is known that the packet is destined
* for layer 3 protocol handling on the local port.
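
The phases described in those comments can be summarised as a small decision function. This is a simplified sketch of the ordering only, not the kernel code: the dictionary layout, the `bridge` callback, and the return strings are all hypothetical, and details such as multicast and promiscuous handling are left out.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def ether_input_sketch(port, pkt):
    """Walk a received packet through the six phases, in order."""
    # First phase: aggr(4)/trunk(4) take everything from member ports.
    if port.get("aggregate"):
        return "aggregate:" + port["aggregate"]

    # Second phase: a vlan(4)/svlan(4) interface may claim tagged packets.
    tag = pkt.get("vlan")
    if tag is not None and tag in port.get("vlans", ()):
        return "vlan%d" % tag

    # Third phase: a bridge may forward the packet, or hand it back.
    bridge = port.get("bridge")
    if bridge is not None and bridge(pkt):
        return "bridged"

    # Fourth phase: tagged packets nobody wanted are not for this port.
    if tag is not None:
        return "drop:tagged"

    # Fifth phase: destination address check (simplified).
    if pkt["dst"] not in (port["mac"], BROADCAST):
        return "drop:not-for-us"

    # Sixth phase: layer 3 protocol demux on the local port.
    return "layer3"


if __name__ == "__main__":
    port = {"mac": "fe:e1:ba:d0:00:02", "vlans": {10}}
    print(ether_input_sketch(port, {"dst": port["mac"], "vlan": 10}))
    print(ether_input_sketch(port, {"dst": port["mac"], "vlan": 20}))
    print(ether_input_sketch(port, {"dst": port["mac"]}))
```

The ordering matters for the designs discussed in this thread: a tagged packet is offered to a vlan interface before the bridge sees it, so a veb only forwards tagged traffic for which no vlan interface exists on that port.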
