From: Tobias Waldekranz <tobias@waldekranz.com> To: "Huang\, Joseph" <Joseph.Huang@garmin.com>, Nikolay Aleksandrov <nikolay@nvidia.com>, Roopa Prabhu <roopa@nvidia.com>, "David S. Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, "bridge\@lists.linux-foundation.org" <bridge@lists.linux-foundation.org>, "netdev\@vger.kernel.org" <netdev@vger.kernel.org>, "linux-kernel\@vger.kernel.org" <linux-kernel@vger.kernel.org>, Ido Schimmel <idosch@idosch.org> Subject: Re: [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev Date: Wed, 05 May 2021 08:52:21 +0200 [thread overview] Message-ID: <87sg31n04a.fsf@waldekranz.com> (raw) In-Reply-To: <82693dbedd524f94b5a6223f0287525c@garmin.com> On Tue, May 04, 2021 at 23:26, "Huang, Joseph" <Joseph.Huang@garmin.com> wrote: >> If I may make a suggestion: I also work with mv88e6xxx systems, and we >> have the same issues with known multicast not being flooded to router >> ports. Knowing that chipset, I see what you are trying to do. >> >> But other chips may work differently. Imagine for example a switch where >> there is a separate vector of router ports that the hardware can OR in after >> looking up the group in the ATU. This implementation would render the >> performance gains possible on that device useless. As another example, you >> could imagine a device where an ATU operation exists that sets a bit in the >> vector of every group in a particular database; instead of having to update >> each entry individually. >> >> I think we (mv88e6xxx) will have to accept that we need to add the proper >> scaffolding to manage this on the driver side. That way the bridge can stay >> generic. The bridge could just provide some MDB iterator to save us from >> having to cache all the configured groups. >> >> So basically: >> >> - In mv88e6xxx, maintain a per-switch vector of router ports. >> >> - When a ports router state is toggled: >> 1. Update the vector. >> 2. Ask the bridge to iterate through all applicable groups and update >> the corresponding ATU entries. >> >> - When a new MDB entry is updated, make sure to also OR in the current >> vector of router ports in the DPV of the ATU entry. >> >> >> I would be happy to help out with testing of this! > > Thanks for the suggestion/offer! > > What patch 0002 does is that: > > - When an mrouter port is added/deleted, it iterates over the list of mdb's > to add/delete that port to/from the group in the hardware (I think this is > what your bullet #2 does as well, except that one is done in the bridge, > and the other is done in the driver) > > - When a group is added/deleted, it iterates over the list of mrouter ports > to add/delete the switchdev programming > > I think what Nik is objecting to is that with this approach, there's now > a for-loop in the call paths (thus it "increases the complexity with 1 order > of magnitude), however I can't think of a way to avoid the looping (whether > done inside the bridge or in the driver) but still achieve the same result > (for Marvell at least). (I will stop trying to read Nikolay's mind and go forward with my own opinions now :)) The problem with solving this at the bridge layer is that you miss out on optimizations that are available at lower layers. As an example: br0 / | \ swp0 swp1 swp2 (R) (R) With two router ports, any new group added/removed to/from swp0 would incur 3 individual ATU operations: First adding swp0, then each router port individually in your loop. If you have the vector prepared in the driver, you can batch them together in one operation. This also atomically transitions the group from unknown to known without disrupting any streams towards a router. In the bridge-layer solution, flows will still be blocked in the (admittedly small) window between adding swp0 and swp{1,2}. > I suspect that other SOHO switches might have this problem as well (Broadcom > comes to mind). I suspect you are right. That is why I suggested implementing the iterator in the bridge that can be reused by all drivers. Something along the lines of br_fdb_replay. The rest should mostly be hardware specific anyway.
WARNING: multiple messages have this Message-ID (diff)
From: Tobias Waldekranz <tobias@waldekranz.com> To: "Huang, Joseph" <Joseph.Huang@garmin.com>, Nikolay Aleksandrov <nikolay@nvidia.com>, Roopa Prabhu <roopa@nvidia.com>, "David S. Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, "bridge@lists.linux-foundation.org" <bridge@lists.linux-foundation.org>, "netdev@vger.kernel.org" <netdev@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, Ido Schimmel <idosch@idosch.org> Subject: Re: [Bridge] [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev Date: Wed, 05 May 2021 08:52:21 +0200 [thread overview] Message-ID: <87sg31n04a.fsf@waldekranz.com> (raw) In-Reply-To: <82693dbedd524f94b5a6223f0287525c@garmin.com> On Tue, May 04, 2021 at 23:26, "Huang, Joseph" <Joseph.Huang@garmin.com> wrote: >> If I may make a suggestion: I also work with mv88e6xxx systems, and we >> have the same issues with known multicast not being flooded to router >> ports. Knowing that chipset, I see what you are trying to do. >> >> But other chips may work differently. Imagine for example a switch where >> there is a separate vector of router ports that the hardware can OR in after >> looking up the group in the ATU. This implementation would render the >> performance gains possible on that device useless. As another example, you >> could imagine a device where an ATU operation exists that sets a bit in the >> vector of every group in a particular database; instead of having to update >> each entry individually. >> >> I think we (mv88e6xxx) will have to accept that we need to add the proper >> scaffolding to manage this on the driver side. That way the bridge can stay >> generic. The bridge could just provide some MDB iterator to save us from >> having to cache all the configured groups. >> >> So basically: >> >> - In mv88e6xxx, maintain a per-switch vector of router ports. >> >> - When a ports router state is toggled: >> 1. Update the vector. >> 2. Ask the bridge to iterate through all applicable groups and update >> the corresponding ATU entries. >> >> - When a new MDB entry is updated, make sure to also OR in the current >> vector of router ports in the DPV of the ATU entry. >> >> >> I would be happy to help out with testing of this! > > Thanks for the suggestion/offer! > > What patch 0002 does is that: > > - When an mrouter port is added/deleted, it iterates over the list of mdb's > to add/delete that port to/from the group in the hardware (I think this is > what your bullet #2 does as well, except that one is done in the bridge, > and the other is done in the driver) > > - When a group is added/deleted, it iterates over the list of mrouter ports > to add/delete the switchdev programming > > I think what Nik is objecting to is that with this approach, there's now > a for-loop in the call paths (thus it "increases the complexity with 1 order > of magnitude), however I can't think of a way to avoid the looping (whether > done inside the bridge or in the driver) but still achieve the same result > (for Marvell at least). (I will stop trying to read Nikolay's mind and go forward with my own opinions now :)) The problem with solving this at the bridge layer is that you miss out on optimizations that are available at lower layers. As an example: br0 / | \ swp0 swp1 swp2 (R) (R) With two router ports, any new group added/removed to/from swp0 would incur 3 individual ATU operations: First adding swp0, then each router port individually in your loop. If you have the vector prepared in the driver, you can batch them together in one operation. This also atomically transitions the group from unknown to known without disrupting any streams towards a router. In the bridge-layer solution, flows will still be blocked in the (admittedly small) window between adding swp0 and swp{1,2}. > I suspect that other SOHO switches might have this problem as well (Broadcom > comes to mind). I suspect you are right. That is why I suggested implementing the iterator in the bridge that can be reused by all drivers. Something along the lines of br_fdb_replay. The rest should mostly be hardware specific anyway.
next prev parent reply other threads:[~2021-05-05 6:52 UTC|newest] Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-05-04 18:22 [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev Joseph Huang 2021-05-04 18:22 ` [Bridge] " Joseph Huang 2021-05-04 18:22 ` [PATCH 1/6] bridge: Refactor br_mdb_notify Joseph Huang 2021-05-04 18:22 ` [Bridge] " Joseph Huang 2021-05-04 18:22 ` [PATCH 2/6] bridge: Offload mrouter port forwarding to switchdev Joseph Huang 2021-05-04 18:22 ` [Bridge] " Joseph Huang 2021-05-04 18:22 ` [PATCH 3/6] bridge: Avoid traffic disruption when Querier state changes Joseph Huang 2021-05-04 18:22 ` [Bridge] " Joseph Huang 2021-05-04 18:22 ` [PATCH 4/6] bridge: Force mcast_flooding for mrouter ports Joseph Huang 2021-05-04 18:22 ` [Bridge] " Joseph Huang 2021-05-04 18:22 ` [PATCH 5/6] bridge: Flood Queries even when mcast_flood is disabled Joseph Huang 2021-05-04 18:22 ` [Bridge] " Joseph Huang 2021-05-04 18:22 ` [PATCH 6/6] bridge: Always multicast_flood Reports Joseph Huang 2021-05-04 18:22 ` [Bridge] " Joseph Huang 2021-05-04 20:05 ` [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev Nikolay Aleksandrov 2021-05-04 20:05 ` [Bridge] " Nikolay Aleksandrov 2021-05-04 20:37 ` Huang, Joseph 2021-05-04 20:37 ` [Bridge] " Huang, Joseph 2021-05-04 22:29 ` Tobias Waldekranz 2021-05-04 22:29 ` [Bridge] " Tobias Waldekranz 2021-05-04 23:26 ` Huang, Joseph 2021-05-04 23:26 ` [Bridge] " Huang, Joseph 2021-05-05 6:52 ` Tobias Waldekranz [this message] 2021-05-05 6:52 ` Tobias Waldekranz 2021-05-05 6:59 ` Nikolay Aleksandrov 2021-05-05 6:59 ` [Bridge] " Nikolay Aleksandrov
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=87sg31n04a.fsf@waldekranz.com \ --to=tobias@waldekranz.com \ --cc=Joseph.Huang@garmin.com \ --cc=bridge@lists.linux-foundation.org \ --cc=davem@davemloft.net \ --cc=idosch@idosch.org \ --cc=kuba@kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=netdev@vger.kernel.org \ --cc=nikolay@nvidia.com \ --cc=roopa@nvidia.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.