netdev.vger.kernel.org archive mirror
From: Ido Schimmel <idosch@idosch.org>
To: Vladimir Oltean <olteanv@gmail.com>
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>,
	netdev@vger.kernel.org, Jakub Kicinski <kuba@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Roopa Prabhu <roopa@nvidia.com>,
	Nikolay Aleksandrov <nikolay@nvidia.com>,
	Andrew Lunn <andrew@lunn.ch>,
	Florian Fainelli <f.fainelli@gmail.com>,
	Vivien Didelot <vivien.didelot@gmail.com>,
	Vadym Kochan <vkochan@marvell.com>,
	Taras Chornyi <tchornyi@marvell.com>,
	Jiri Pirko <jiri@nvidia.com>, Ido Schimmel <idosch@nvidia.com>,
	UNGLinuxDriver@microchip.com,
	Grygorii Strashko <grygorii.strashko@ti.com>,
	Marek Behun <kabel@blackhole.sk>,
	DENG Qingfang <dqfext@gmail.com>,
	Kurt Kanzenbach <kurt@linutronix.de>,
	Hauke Mehrtens <hauke@hauke-m.de>,
	Woojung Huh <woojung.huh@microchip.com>,
	Sean Wang <sean.wang@mediatek.com>,
	Landen Chao <Landen.Chao@mediatek.com>,
	Claudiu Manoil <claudiu.manoil@nxp.com>,
	Alexandre Belloni <alexandre.belloni@bootlin.com>,
	George McCollister <george.mccollister@gmail.com>,
	Ioana Ciornei <ioana.ciornei@nxp.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	Lars Povlsen <lars.povlsen@microchip.com>,
	Steen Hegelund <Steen.Hegelund@microchip.com>,
	Julian Wiedmann <jwi@linux.ibm.com>,
	Karsten Graul <kgraul@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Ivan Vecera <ivecera@redhat.com>, Vlad Buslov <vladbu@nvidia.com>,
	Jianbo Liu <jianbol@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
	Roi Dayan <roid@nvidia.com>,
	Tobias Waldekranz <tobias@waldekranz.com>,
	Vignesh Raghavendra <vigneshr@ti.com>,
	Jesse Brandeburg <jesse.brandeburg@intel.com>
Subject: Re: [PATCH v2 net-next 0/5] Make SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE blocking
Date: Sun, 22 Aug 2021 10:19:14 +0300
Message-ID: <YSH6ckM65582PB3P@shredder>
In-Reply-To: <20210821190914.dkrjtcbn277m67bk@skbuf>

On Sat, Aug 21, 2021 at 10:09:14PM +0300, Vladimir Oltean wrote:
> On Fri, Aug 20, 2021 at 07:11:15PM +0300, Ido Schimmel wrote:
> > On Fri, Aug 20, 2021 at 01:49:48PM +0300, Vladimir Oltean wrote:
> > > On Fri, Aug 20, 2021 at 12:16:10PM +0300, Ido Schimmel wrote:
> > > > On Thu, Aug 19, 2021 at 07:07:18PM +0300, Vladimir Oltean wrote:
> > > > > Problem statement:
> > > > >
> > > > > Any time a driver needs to create a private association between a bridge
> > > > > upper interface and use that association within its
> > > > > SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE handler, we have an issue with FDB
> > > > > entries deleted by the bridge when the port leaves. The issue is that
> > > > > all switchdev drivers schedule a work item to have sleepable context,
> > > > > and that work item can be actually scheduled after the port has left the
> > > > > bridge, which means the association might have already been broken by
> > > > > the time the scheduled FDB work item attempts to use it.
> > > >
> > > > This is handled in mlxsw by telling the device to flush the FDB entries
> > > > pointing to the {port, FID} when the VLAN is deleted (synchronously).
> > > 
> > > If you have FDB entries pointing to bridge ports that are foreign
> > > interfaces and you offload them, do you catch the VLAN deletion on the
> > > foreign port and flush your entries towards it at that time?
> > 
> > Yes, that's how VXLAN offload works. VLAN addition is used to determine
> > the mapping between VNI and VLAN.
> 
> I was only able to follow as far as:
> 
> mlxsw_sp_switchdev_blocking_event
> -> mlxsw_sp_switchdev_handle_vxlan_obj_del
>    -> mlxsw_sp_switchdev_vxlan_vlans_del
>       -> mlxsw_sp_switchdev_vxlan_vlan_del
>          -> ??? where are the FDB entries flushed?

 mlxsw_sp_switchdev_blocking_event
 -> mlxsw_sp_switchdev_handle_vxlan_obj_del
    -> mlxsw_sp_switchdev_vxlan_vlans_del
       -> mlxsw_sp_switchdev_vxlan_vlan_del
          -> mlxsw_sp_bridge_vxlan_leave
             -> mlxsw_sp_nve_fid_disable
                -> mlxsw_sp_nve_fdb_flush_by_fid

> 
> I was expecting to see something along the lines of
> 
> mlxsw_sp_switchdev_blocking_event
> -> mlxsw_sp_port_vlans_del
>    -> mlxsw_sp_bridge_port_vlan_del
>       -> mlxsw_sp_port_vlan_bridge_leave
>          -> mlxsw_sp_bridge_port_fdb_flush
> 
> but that is exactly on the other branch of the "if (netif_is_vxlan(dev))"
> condition (and also, mlxsw_sp_bridge_port_fdb_flush flushes an externally-facing
> port, not really what I needed to know, see below).
> 
> Anyway, it also seems to me that we are referring to slightly different
> things by "foreign" interfaces. To me, a "foreign" interface is one
> towards which there is no hardware data path. Like for example if you
> have a mlxsw port in a plain L2 bridge with an Intel card. The data path
> is the CPU and that was my question: do you track FDB entries towards
> those interfaces (implicitly: towards the CPU)? You've answered about
> VXLAN, which is quite not "foreign" in the sense I am thinking about,
> because mlxsw does have a hardware data path towards a VXLAN interface
> (as you've mentioned, it associates a VID with each VNI).
> 
> I've been searching through the mlxsw driver and I don't see that this
> is being done, so I'm guessing you might wonder/ask why you would want
> to do that in the first place. If you bridge a mlxsw port with an Intel
> card, then (from another thread where you've said that mlxsw always
> injects control packets where hardware learning is not performed) my
> guess is that the MAC addresses learned on the Intel bridge port will
> never be learned on the mlxsw device. So every packet that ingresses the
> mlxsw and must egress the Intel card will reach the CPU through flooding
> (and will consequently be flooded in the entire broadcast domain of the
> mlxsw side of the bridge). Right?

I can see how this use case makes sense on systems where the difference
in performance between the ASIC and the CPU is not huge, but it doesn't
make much sense with Spectrum and I have yet to get requests to support
it (might change). Keep in mind that Spectrum is able to forward several
Bpps with a switching capacity of several Tbps. It is usually connected
to a weak CPU (e.g., low-end ARM, Intel Atom) through a PCI bus with a
bandwidth of several Gbps. There is usually one "Intel card" on such
systems which is connected to the management network that is separated
from the data plane network.

If we were to support it, FDB entries towards "foreign" interfaces would
be programmed to trap packets to the CPU. For now, for correctness /
rigor purposes, I would prefer simply returning an error / warning via
extack when such topologies are configured.
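
To make the "return an error via extack" option concrete, here is a minimal
user-space sketch of such a check. It is only a model under stated
assumptions: struct model_port, struct model_extack and
model_bridge_check() are hypothetical names invented for illustration and
are not part of mlxsw or the kernel's extack API. The only idea taken from
the text above is that a bridge mixing offloaded ports with "foreign" ports
(no hardware data path) gets rejected with a message instead of being
silently accepted.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bridge port; not a kernel type. */
struct model_port {
	const char *name;
	bool offloaded;	/* true if the ASIC has a hardware data path to it */
};

/* Hypothetical stand-in for an extack-style error message holder. */
struct model_extack {
	const char *msg;
};

/*
 * Reject a bridge whose members mix offloaded and foreign ports.
 * Returns 0 if the topology is acceptable, -EOPNOTSUPP otherwise,
 * in which case extack->msg explains why.
 */
int model_bridge_check(const struct model_port *ports, size_t n,
		       struct model_extack *extack)
{
	bool has_offloaded = false;
	bool has_foreign = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ports[i].offloaded)
			has_offloaded = true;
		else
			has_foreign = true;
	}

	if (has_offloaded && has_foreign) {
		extack->msg = "Bridging offloaded ports with foreign interfaces is not supported";
		return -EOPNOTSUPP;
	}

	return 0;
}
```

In this model, a bridge with { "swp1" offloaded, "eth0" foreign } is
rejected with -EOPNOTSUPP and a message, while a bridge with only offloaded
ports passes. Whether a real driver should reject or only warn is the open
question above; the sketch picks rejection purely for illustration.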
