All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nikolay Aleksandrov <razor@blackwall.org>
To: Jonathan Toppins <jtoppins@redhat.com>, netdev@vger.kernel.org
Cc: toke@redhat.com, Long Xin <lxin@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Jay Vosburgh <j.vosburgh@gmail.com>,
	Veaceslav Falico <vfalico@gmail.com>,
	Andy Gospodarek <andy@greyhouse.net>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next v3] bond: add mac filter option for balance-xor
Date: Sun, 15 May 2022 09:32:01 +0300	[thread overview]
Message-ID: <da6bbb3b-344c-f032-fe03-5e8c8ac3c388@blackwall.org> (raw)
In-Reply-To: <4c9db6ac-aa24-2ca2-3e44-18cfb23ac1bc@blackwall.org>

On 15/05/2022 00:41, Nikolay Aleksandrov wrote:
> On 13/05/2022 20:43, Jonathan Toppins wrote:
>> Implement a MAC filter that prevents duplicate frame delivery when
>> handling BUM traffic. This attempts to partially replicate OvS SLB
>> Bonding[1] like functionality without requiring significant change
>> in the Linux bridging code.
>>
>> A typical network setup for this feature would be:
>>
>>             .--------------------------------------------.
>>             |         .--------------------.             |
>>             |         |                    |             |
>>        .-------------------.               |             |
>>        |    | Bond 0  |    |               |             |
>>        | .--'---. .---'--. |               |             |
>>   .----|-| eth0 |-| eth1 |-|----.    .-----+----.   .----+------.
>>   |    | '------' '------' |    |    | Switch 1 |   | Switch 2  |
>>   |    '---,---------------'    |    |          +---+           |
>>   |       /                     |    '----+-----'   '----+------'
>>   |  .---'---.    .------.      |         |              |
>>   |  |  br0  |----| VM 1 |      |      ~~~~~~~~~~~~~~~~~~~~~
>>   |  '-------'    '------'      |     (                     )
>>   |      |        .------.      |     ( Rest of Network     )
>>   |      '--------| VM # |      |     (_____________________)
>>   |               '------'      |
>>   |  Host 1                     |
>>   '-----------------------------'
>>
>> Where 'VM1' and 'VM#' are hosts connected to a Linux bridge, br0, with
>> bond0 and its associated links, eth0 & eth1, provide ingress/egress. One
>> can assume bond0, br1, and hosts VM1 to VM# are all contained in a
>> single box, as depicted. Interfaces eth0 and eth1 provide redundant
>> connections to the data center with the requirement to use all bandwidth
>> when the system is functioning normally. Switch 1 and Switch 2 are
>> physical switches that do not implement any advanced L2 management
>> features such as MLAG, Cisco's VPC, or LACP.
>>
>> Combining this feature with vlan+srcmac hash policy allows a user to
>> create an access network without the need to use expensive switches that
>> support features like Cisco's VCP.
>>
>> [1] https://docs.openvswitch.org/en/latest/topics/bonding/#slb-bonding
>>
>> Co-developed-by: Long Xin <lxin@redhat.com>
>> Signed-off-by: Long Xin <lxin@redhat.com>
>> Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
>> ---
>>
>> Notes:
>>     v2:
>>      * dropped needless abstraction functions and put code in module init
>>      * renamed variable "rc" to "ret" to stay consistent with most of the
>>        code
>>      * fixed parameter setting management, when arp-monitor is turned on
>>        this feature will be turned off similar to how miimon and arp-monitor
>>        interact
>>      * renamed bond_xor_recv to bond_mac_filter_recv for a little more
>>        clarity
>>      * it appears the implied default return code for any bonding recv probe
>>        must be `RX_HANDLER_ANOTHER`. Changed the default return code of
>>        bond_mac_filter_recv to use this return value to not break skb
>>        processing when the skb dev is switched to the bond dev:
>>          `skb->dev = bond->dev`
>>     
>>     v3: Nik's comments
>>      * clarified documentation
>>      * fixed inline and basic reverse Christmas tree formatting
>>      * zero'ed entry in mac_create
>>      * removed read_lock taking in bond_mac_filter_recv
>>      * made has_expired() atomic and removed critical sections
>>        surrounding calls to has_expired(), this also removed the
>>        use-after-free that would have occurred:
>>            spin_lock_irqsave(&entry->lock, flags);
>>                if (has_expired(bond, entry))
>>                    mac_delete(bond, entry);
>>            spin_unlock_irqrestore(&entry->lock, flags); <---
>>      * moved init/destroy of mac_filter_tbl to bond_open/bond_close
>>        this removed the complex option dependencies, the only behavioural
>>        change the user will see is if the bond is up and mac_filter is
>>        enabled if they try and set arp_interval they will receive -EBUSY
>>      * in bond_changelink moved processing of mac_filter option just below
>>        mode processing
>>
>>  Documentation/networking/bonding.rst  |  20 +++
>>  drivers/net/bonding/Makefile          |   2 +-
>>  drivers/net/bonding/bond_mac_filter.c | 201 ++++++++++++++++++++++++++
>>  drivers/net/bonding/bond_mac_filter.h |  37 +++++
>>  drivers/net/bonding/bond_main.c       |  30 ++++
>>  drivers/net/bonding/bond_netlink.c    |  13 ++
>>  drivers/net/bonding/bond_options.c    |  81 +++++++++--
>>  drivers/net/bonding/bonding_priv.h    |   1 +
>>  include/net/bond_options.h            |   1 +
>>  include/net/bonding.h                 |   3 +
>>  include/uapi/linux/if_link.h          |   1 +
>>  11 files changed, 373 insertions(+), 17 deletions(-)
>>  create mode 100644 drivers/net/bonding/bond_mac_filter.c
>>  create mode 100644 drivers/net/bonding/bond_mac_filter.h
>>
> 
[snip]

The same problem solved using a few nftables rules (in case you don't want to load eBPF):
$ nft 'add table netdev nt'
$ nft 'add chain netdev nt bond0EgressFilter { type filter hook egress device bond0 priority 0; }'
$ nft 'add chain netdev nt bond0IngressFilter { type filter hook ingress device bond0 priority 0; }'
$ nft 'add set netdev nt macset { type ether_addr; flags timeout; }'
$ nft 'add rule netdev nt bond0EgressFilter set update ether saddr timeout 5s @macset'
$ nft 'add rule netdev nt bond0IngressFilter ether saddr @macset counter drop'

Cheers,
 Nik

  reply	other threads:[~2022-05-15  6:32 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-13 17:43 [PATCH net-next v3] bond: add mac filter option for balance-xor Jonathan Toppins
2022-05-14 21:41 ` Nikolay Aleksandrov
2022-05-15  6:32   ` Nikolay Aleksandrov [this message]
2022-05-16 14:06     ` Jonathan Toppins
2022-05-16 17:22       ` Nikolay Aleksandrov
2022-05-16 17:24         ` Nikolay Aleksandrov
2022-05-23 13:35     ` Jonathan Toppins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=da6bbb3b-344c-f032-fe03-5e8c8ac3c388@blackwall.org \
    --to=razor@blackwall.org \
    --cc=andy@greyhouse.net \
    --cc=corbet@lwn.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=j.vosburgh@gmail.com \
    --cc=jtoppins@redhat.com \
    --cc=kuba@kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lxin@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=toke@redhat.com \
    --cc=vfalico@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.