All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tobias Waldekranz <tobias@waldekranz.com>
To: "Huang\, Joseph" <Joseph.Huang@garmin.com>,
	Nikolay Aleksandrov <nikolay@nvidia.com>,
	Roopa Prabhu <roopa@nvidia.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	"bridge\@lists.linux-foundation.org" 
	<bridge@lists.linux-foundation.org>,
	"netdev\@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel\@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Ido Schimmel <idosch@idosch.org>
Subject: Re: [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev
Date: Wed, 05 May 2021 08:52:21 +0200	[thread overview]
Message-ID: <87sg31n04a.fsf@waldekranz.com> (raw)
In-Reply-To: <82693dbedd524f94b5a6223f0287525c@garmin.com>

On Tue, May 04, 2021 at 23:26, "Huang, Joseph" <Joseph.Huang@garmin.com> wrote:
>> If I may make a suggestion: I also work with mv88e6xxx systems, and we
>> have the same issues with known multicast not being flooded to router
>> ports. Knowing that chipset, I see what you are trying to do.
>> 
>> But other chips may work differently. Imagine for example a switch where
>> there is a separate vector of router ports that the hardware can OR in after
>> looking up the group in the ATU. This implementation would render the
>> performance gains possible on that device useless. As another example, you
>> could imagine a device where an ATU operation exists that sets a bit in the
>> vector of every group in a particular database; instead of having to update
>> each entry individually.
>> 
>> I think we (mv88e6xxx) will have to accept that we need to add the proper
>> scaffolding to manage this on the driver side. That way the bridge can stay
>> generic. The bridge could just provide some MDB iterator to save us from
>> having to cache all the configured groups.
>> 
>> So basically:
>> 
>> - In mv88e6xxx, maintain a per-switch vector of router ports.
>> 
>> - When a ports router state is toggled:
>>   1. Update the vector.
>>   2. Ask the bridge to iterate through all applicable groups and update
>>      the corresponding ATU entries.
>> 
>> - When a new MDB entry is updated, make sure to also OR in the current
>>   vector of router ports in the DPV of the ATU entry.
>> 
>> 
>> I would be happy to help out with testing of this!
>
> Thanks for the suggestion/offer!
>
> What patch 0002 does is that:
>
> - When an mrouter port is added/deleted, it iterates over the list of mdb's
>   to add/delete that port to/from the group in the hardware (I think this is
>   what your bullet #2 does as well, except that one is done in the bridge,
>   and the other is done in the driver)
>
> - When a group is added/deleted, it iterates over the list of mrouter ports
>   to add/delete the switchdev programming
>
> I think what Nik is objecting to is that with this approach, there's now
> a for-loop in the call paths (thus it "increases the complexity with 1 order
> of magnitude), however I can't think of a way to avoid the looping (whether
> done inside the bridge or in the driver) but still achieve the same result
> (for Marvell at least).

(I will stop trying to read Nikolay's mind and go forward with my own
opinions now :))

The problem with solving this at the bridge layer is that you miss out
on optimizations that are available at lower layers. As an example:

      br0
    /  |  \
swp0 swp1 swp2
     (R)  (R)

With two router ports, any new group added/removed to/from swp0 would
incur 3 individual ATU operations: First adding swp0, then each router
port individually in your loop. If you have the vector prepared in the
driver, you can batch them together in one operation.

This also atomically transitions the group from unknown to known without
disrupting any streams towards a router. In the bridge-layer solution,
flows will still be blocked in the (admittedly small) window between
adding swp0 and swp{1,2}.

> I suspect that other SOHO switches might have this problem as well (Broadcom
> comes to mind).

I suspect you are right. That is why I suggested implementing the
iterator in the bridge that can be reused by all drivers. Something
along the lines of br_fdb_replay. The rest should mostly be hardware
specific anyway.

WARNING: multiple messages have this Message-ID (diff)
From: Tobias Waldekranz <tobias@waldekranz.com>
To: "Huang, Joseph" <Joseph.Huang@garmin.com>,
	Nikolay Aleksandrov <nikolay@nvidia.com>,
	Roopa Prabhu <roopa@nvidia.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	"bridge@lists.linux-foundation.org"
	<bridge@lists.linux-foundation.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Ido Schimmel <idosch@idosch.org>
Subject: Re: [Bridge] [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev
Date: Wed, 05 May 2021 08:52:21 +0200	[thread overview]
Message-ID: <87sg31n04a.fsf@waldekranz.com> (raw)
In-Reply-To: <82693dbedd524f94b5a6223f0287525c@garmin.com>

On Tue, May 04, 2021 at 23:26, "Huang, Joseph" <Joseph.Huang@garmin.com> wrote:
>> If I may make a suggestion: I also work with mv88e6xxx systems, and we
>> have the same issues with known multicast not being flooded to router
>> ports. Knowing that chipset, I see what you are trying to do.
>> 
>> But other chips may work differently. Imagine for example a switch where
>> there is a separate vector of router ports that the hardware can OR in after
>> looking up the group in the ATU. This implementation would render the
>> performance gains possible on that device useless. As another example, you
>> could imagine a device where an ATU operation exists that sets a bit in the
>> vector of every group in a particular database; instead of having to update
>> each entry individually.
>> 
>> I think we (mv88e6xxx) will have to accept that we need to add the proper
>> scaffolding to manage this on the driver side. That way the bridge can stay
>> generic. The bridge could just provide some MDB iterator to save us from
>> having to cache all the configured groups.
>> 
>> So basically:
>> 
>> - In mv88e6xxx, maintain a per-switch vector of router ports.
>> 
>> - When a ports router state is toggled:
>>   1. Update the vector.
>>   2. Ask the bridge to iterate through all applicable groups and update
>>      the corresponding ATU entries.
>> 
>> - When a new MDB entry is updated, make sure to also OR in the current
>>   vector of router ports in the DPV of the ATU entry.
>> 
>> 
>> I would be happy to help out with testing of this!
>
> Thanks for the suggestion/offer!
>
> What patch 0002 does is that:
>
> - When an mrouter port is added/deleted, it iterates over the list of mdb's
>   to add/delete that port to/from the group in the hardware (I think this is
>   what your bullet #2 does as well, except that one is done in the bridge,
>   and the other is done in the driver)
>
> - When a group is added/deleted, it iterates over the list of mrouter ports
>   to add/delete the switchdev programming
>
> I think what Nik is objecting to is that with this approach, there's now
> a for-loop in the call paths (thus it "increases the complexity with 1 order
> of magnitude), however I can't think of a way to avoid the looping (whether
> done inside the bridge or in the driver) but still achieve the same result
> (for Marvell at least).

(I will stop trying to read Nikolay's mind and go forward with my own
opinions now :))

The problem with solving this at the bridge layer is that you miss out
on optimizations that are available at lower layers. As an example:

      br0
    /  |  \
swp0 swp1 swp2
     (R)  (R)

With two router ports, any new group added/removed to/from swp0 would
incur 3 individual ATU operations: First adding swp0, then each router
port individually in your loop. If you have the vector prepared in the
driver, you can batch them together in one operation.

This also atomically transitions the group from unknown to known without
disrupting any streams towards a router. In the bridge-layer solution,
flows will still be blocked in the (admittedly small) window between
adding swp0 and swp{1,2}.

> I suspect that other SOHO switches might have this problem as well (Broadcom
> comes to mind).

I suspect you are right. That is why I suggested implementing the
iterator in the bridge that can be reused by all drivers. Something
along the lines of br_fdb_replay. The rest should mostly be hardware
specific anyway.

  reply	other threads:[~2021-05-05  6:52 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-04 18:22 [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev Joseph Huang
2021-05-04 18:22 ` [Bridge] " Joseph Huang
2021-05-04 18:22 ` [PATCH 1/6] bridge: Refactor br_mdb_notify Joseph Huang
2021-05-04 18:22   ` [Bridge] " Joseph Huang
2021-05-04 18:22 ` [PATCH 2/6] bridge: Offload mrouter port forwarding to switchdev Joseph Huang
2021-05-04 18:22   ` [Bridge] " Joseph Huang
2021-05-04 18:22 ` [PATCH 3/6] bridge: Avoid traffic disruption when Querier state changes Joseph Huang
2021-05-04 18:22   ` [Bridge] " Joseph Huang
2021-05-04 18:22 ` [PATCH 4/6] bridge: Force mcast_flooding for mrouter ports Joseph Huang
2021-05-04 18:22   ` [Bridge] " Joseph Huang
2021-05-04 18:22 ` [PATCH 5/6] bridge: Flood Queries even when mcast_flood is disabled Joseph Huang
2021-05-04 18:22   ` [Bridge] " Joseph Huang
2021-05-04 18:22 ` [PATCH 6/6] bridge: Always multicast_flood Reports Joseph Huang
2021-05-04 18:22   ` [Bridge] " Joseph Huang
2021-05-04 20:05 ` [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev Nikolay Aleksandrov
2021-05-04 20:05   ` [Bridge] " Nikolay Aleksandrov
2021-05-04 20:37   ` Huang, Joseph
2021-05-04 20:37     ` [Bridge] " Huang, Joseph
2021-05-04 22:29     ` Tobias Waldekranz
2021-05-04 22:29       ` [Bridge] " Tobias Waldekranz
2021-05-04 23:26       ` Huang, Joseph
2021-05-04 23:26         ` [Bridge] " Huang, Joseph
2021-05-05  6:52         ` Tobias Waldekranz [this message]
2021-05-05  6:52           ` Tobias Waldekranz
2021-05-05  6:59         ` Nikolay Aleksandrov
2021-05-05  6:59           ` [Bridge] " Nikolay Aleksandrov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87sg31n04a.fsf@waldekranz.com \
    --to=tobias@waldekranz.com \
    --cc=Joseph.Huang@garmin.com \
    --cc=bridge@lists.linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=idosch@idosch.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=nikolay@nvidia.com \
    --cc=roopa@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.