All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vladimir Oltean <olteanv@gmail.com>
To: Ido Schimmel <idosch@idosch.org>
Cc: Nikolay Aleksandrov <nikolay@nvidia.com>,
	Vladimir Oltean <vladimir.oltean@nxp.com>,
	netdev@vger.kernel.org, Jakub Kicinski <kuba@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Roopa Prabhu <roopa@nvidia.com>, Andrew Lunn <andrew@lunn.ch>,
	Florian Fainelli <f.fainelli@gmail.com>,
	Vivien Didelot <vivien.didelot@gmail.com>,
	Vadym Kochan <vkochan@marvell.com>,
	Taras Chornyi <tchornyi@marvell.com>,
	Jiri Pirko <jiri@nvidia.com>, Ido Schimmel <idosch@nvidia.com>,
	UNGLinuxDriver@microchip.com,
	Grygorii Strashko <grygorii.strashko@ti.com>,
	Marek Behun <kabel@blackhole.sk>,
	DENG Qingfang <dqfext@gmail.com>,
	Kurt Kanzenbach <kurt@linutronix.de>,
	Hauke Mehrtens <hauke@hauke-m.de>,
	Woojung Huh <woojung.huh@microchip.com>,
	Sean Wang <sean.wang@mediatek.com>,
	Landen Chao <Landen.Chao@mediatek.com>,
	Claudiu Manoil <claudiu.manoil@nxp.com>,
	Alexandre Belloni <alexandre.belloni@bootlin.com>,
	George McCollister <george.mccollister@gmail.com>,
	Ioana Ciornei <ioana.ciornei@nxp.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	Lars Povlsen <lars.povlsen@microchip.com>,
	Steen Hegelund <Steen.Hegelund@microchip.com>,
	Julian Wiedmann <jwi@linux.ibm.com>,
	Karsten Graul <kgraul@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Ivan Vecera <ivecera@redhat.com>, Vlad Buslov <vladbu@nvidia.com>,
	Jianbo Liu <jianbol@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
	Roi Dayan <roid@nvidia.com>,
	Tobias Waldekranz <tobias@waldekranz.com>,
	Vignesh Raghavendra <vigneshr@ti.com>,
	Jesse Brandeburg <jesse.brandeburg@intel.com>
Subject: Re: [PATCH v2 net-next 0/5] Make SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE blocking
Date: Mon, 23 Aug 2021 14:00:46 +0300	[thread overview]
Message-ID: <20210823110046.xuuo37kpsxdbl6c2@skbuf> (raw)
In-Reply-To: <YSN83d+wwLnba349@shredder>

On Mon, Aug 23, 2021 at 01:47:57PM +0300, Ido Schimmel wrote:
> On Sun, Aug 22, 2021 at 08:44:49PM +0300, Vladimir Oltean wrote:
> > On Sun, Aug 22, 2021 at 08:06:00PM +0300, Ido Schimmel wrote:
> > > On Sun, Aug 22, 2021 at 04:31:45PM +0300, Vladimir Oltean wrote:
> > > > 3. There is a larger issue that SWITCHDEV_FDB_ADD_TO_DEVICE events are
> > > >    deferred by drivers even from code paths that are initially blocking
> > > >    (are running in process context):
> > > >
> > > > br_fdb_add
> > > > -> __br_fdb_add
> > > >    -> fdb_add_entry
> > > >       -> fdb_notify
> > > >          -> br_switchdev_fdb_notify
> > > >
> > > >     It seems fairly trivial to move the fdb_notify call outside of the
> > > >     atomic section of fdb_add_entry, but with switchdev offering only an
> > > >     API where the SWITCHDEV_FDB_ADD_TO_DEVICE is atomic, drivers would
> > > >     still have to defer these events and are unable to provide
> > > >     synchronous feedback to user space (error codes, extack).
> > > >
> > > > The above issues would warrant an attempt to fix a central problem, and
> > > > make switchdev expose an API that is easier to consume rather than
> > > > having drivers implement lateral workarounds.
> > > >
> > > > In this case, we must notice that
> > > >
> > > > (a) switchdev already has the concept of notifiers emitted from the fast
> > > >     path that are still processed by drivers from blocking context. This
> > > >     is accomplished through the SWITCHDEV_F_DEFER flag which is used by
> > > >     e.g. SWITCHDEV_OBJ_ID_HOST_MDB.
> > > >
> > > > (b) the bridge del_nbp() function already calls switchdev_deferred_process().
> > > >     So if we could hook into that, we could have a chance that the
> > > >     bridge simply waits for our FDB entry offloading procedure to finish
> > > >     before it calls netdev_upper_dev_unlink() - which is almost
> > > >     immediately afterwards, and also when switchdev drivers typically
> > > >     break their stateful associations between the bridge upper and
> > > >     private data.
> > > >
> > > > So it is in fact possible to use switchdev's generic
> > > > switchdev_deferred_enqueue mechanism to get a sleepable callback, and
> > > > from there we can call_switchdev_blocking_notifiers().
> > > >
> > > > To address all requirements:
> > > >
> > > > - drivers that are unconverted from atomic to blocking still work
> > > > - drivers that currently have a private workqueue are not worse off
> > > > - drivers that want the bridge to wait for their deferred work can use
> > > >   the bridge's defer mechanism
> > > > - a SWITCHDEV_FDB_ADD_TO_DEVICE event which does not have any interested
> > > >   parties does not get deferred for no reason, because this takes the
> > > >   rtnl_mutex and schedules a worker thread for nothing
> > > >
> > > > it looks like we can in fact start off by emitting
> > > > SWITCHDEV_FDB_ADD_TO_DEVICE on the atomic chain. But we add a new bit in
> > > > struct switchdev_notifier_fdb_info called "needs_defer", and any
> > > > interested party can set this to true.
> > > >
> > > > This way:
> > > >
> > > > - unconverted drivers do their work (i.e. schedule their private work
> > > >   item) based on the atomic notifier, and do not set "needs_defer"
> > > > - converted drivers only mark "needs_defer" and treat a separate
> > > >   notifier, on the blocking chain, called SWITCHDEV_FDB_ADD_TO_DEVICE_DEFERRED
> > > > - SWITCHDEV_FDB_ADD_TO_DEVICE events with no interested party do not
> > > >   generate any follow-up SWITCHDEV_FDB_ADD_TO_DEVICE_DEFERRED
> > > >
> > > > Additionally, code paths that are blocking right not, like br_fdb_replay,
> > > > could notify only SWITCHDEV_FDB_ADD_TO_DEVICE_DEFERRED, as long as all
> > > > consumers of the replayed FDB events support that (right now, that is
> > > > DSA and dpaa2-switch).
> > > >
> > > > Once all consumers of SWITCHDEV_FDB_ADD_TO_DEVICE are converted to set
> > > > needs_defer as appropriate, then the notifiers emitted from process
> > > > context by the bridge could call SWITCHDEV_FDB_ADD_TO_DEVICE_DEFERRED
> > > > directly, and we would also have fully blocking context all the way
> > > > down, with the opportunity for error propagation and extack.
> > >
> > > IIUC, at this stage all the FDB notifications drivers get are blocking,
> > > either from the work queue (because they were deferred) or directly from
> > > process context. If so, how do we synchronize the two and ensure drivers
> > > get the notifications at the correct order?
> >
> > What does 'at this stage' mean? Does it mean 'assuming the patch we're
> > discussing now gets accepted'? If that's what it means, then 'at this
> > stage' all drivers would first receive the atomic FDB_ADD_TO_DEVICE,
> > then would set needs_defer, then would receive the blocking
> > FDB_ADD_TO_DEVICE.
>
> I meant after:
>
> "Once all consumers of SWITCHDEV_FDB_ADD_TO_DEVICE are converted to set
> needs_defer as appropriate, then the notifiers emitted from process
> context by the bridge could call SWITCHDEV_FDB_ADD_TO_DEVICE_DEFERRED
> directly, and we would also have fully blocking context all the way
> down, with the opportunity for error propagation and extack."
>
> IIUC, after the conversion the 'needs_defer' is gone and all the FDB
> events are blocking? Either from syscall context or the workqueue.

We would not delete 'needs_defer'. It still offers a useful preliminary
filtering mechanism for the fast path (and for br_fdb_replay). In
retrospect, the SWITCHDEV_OBJ_ID_HOST_MDB would also benefit from 'needs_defer'
instead of jumping to blocking context (if we care so much about performance).

If a FDB event does not need to be processed by anyone (dynamically
learned entry on a switchdev port), the bridge notifies the atomic call
chain for the sake of it, but not the blocking chain.

> If so, I'm not sure how we synchronize the two. That is, making sure
> that an event from syscall context does not reach drivers before an
> earlier event that was added to the 'deferred' list.
>
> I mean, in syscall context we are holding RTNL so whatever is already on
> the 'deferred' list cannot be dequeued and processed.

So switchdev_deferred_process() has ASSERT_RTNL. If we call
switchdev_deferred_process() right before adding the blocking FDB entry
in process context (and we already hold rtnl_mutex), I though that would
be enough to ensure we have a synchronization point: Everything that was
scheduled before is flushed now, everything that is scheduled while we
are running will run after we unlock the rtnl_mutex. Is that not the
order we expect? I mean, if there is a fast path FDB entry being learned
/ deleted while user space say adds that same FDB entry as static, how
is the relative ordering ensured between the two?

  reply	other threads:[~2021-08-23 11:01 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-19 16:07 [PATCH v2 net-next 0/5] Make SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE blocking Vladimir Oltean
2021-08-19 16:07 ` [PATCH v2 net-next 1/5] net: switchdev: move SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE to the blocking notifier chain Vladimir Oltean
2021-08-19 18:15   ` Vlad Buslov
2021-08-19 23:18     ` Vladimir Oltean
2021-08-20  7:36       ` Vlad Buslov
2021-08-19 16:07 ` [PATCH v2 net-next 2/5] net: bridge: switchdev: make br_fdb_replay offer sleepable context to consumers Vladimir Oltean
2021-08-19 16:07 ` [PATCH v2 net-next 3/5] net: switchdev: drop the atomic notifier block from switchdev_bridge_port_{,un}offload Vladimir Oltean
2021-08-19 16:07 ` [PATCH v2 net-next 4/5] net: switchdev: don't assume RCU context in switchdev_handle_fdb_{add,del}_to_device Vladimir Oltean
2021-08-19 16:07 ` [PATCH v2 net-next 5/5] net: dsa: handle SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE synchronously Vladimir Oltean
2021-08-20  9:16 ` [PATCH v2 net-next 0/5] Make SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE blocking Ido Schimmel
2021-08-20  9:37   ` Vladimir Oltean
2021-08-20 16:09     ` Ido Schimmel
2021-08-20 17:06       ` Vladimir Oltean
2021-08-20 23:36         ` Nikolay Aleksandrov
2021-08-21  0:22           ` Vladimir Oltean
2021-08-22  6:48           ` Ido Schimmel
2021-08-22  9:12             ` Nikolay Aleksandrov
2021-08-22 13:31               ` Vladimir Oltean
2021-08-22 17:06                 ` Ido Schimmel
2021-08-22 17:44                   ` Vladimir Oltean
2021-08-23 10:47                     ` Ido Schimmel
2021-08-23 11:00                       ` Vladimir Oltean [this message]
2021-08-23 12:16                         ` Ido Schimmel
2021-08-23 14:29                           ` Vladimir Oltean
2021-08-23 15:18                             ` Ido Schimmel
2021-08-23 15:42                               ` Nikolay Aleksandrov
2021-08-23 15:42                               ` Vladimir Oltean
2021-08-23 16:02                                 ` Ido Schimmel
2021-08-23 16:11                                   ` Vladimir Oltean
2021-08-23 16:23                                   ` Vladimir Oltean
2021-08-20 10:49   ` Vladimir Oltean
2021-08-20 16:11     ` Ido Schimmel
2021-08-21 19:09       ` Vladimir Oltean
2021-08-22  7:19         ` Ido Schimmel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210823110046.xuuo37kpsxdbl6c2@skbuf \
    --to=olteanv@gmail.com \
    --cc=Landen.Chao@mediatek.com \
    --cc=Steen.Hegelund@microchip.com \
    --cc=UNGLinuxDriver@microchip.com \
    --cc=alexandre.belloni@bootlin.com \
    --cc=andrew@lunn.ch \
    --cc=borntraeger@de.ibm.com \
    --cc=claudiu.manoil@nxp.com \
    --cc=davem@davemloft.net \
    --cc=dqfext@gmail.com \
    --cc=f.fainelli@gmail.com \
    --cc=george.mccollister@gmail.com \
    --cc=gor@linux.ibm.com \
    --cc=grygorii.strashko@ti.com \
    --cc=hauke@hauke-m.de \
    --cc=hca@linux.ibm.com \
    --cc=idosch@idosch.org \
    --cc=idosch@nvidia.com \
    --cc=ioana.ciornei@nxp.com \
    --cc=ivecera@redhat.com \
    --cc=jesse.brandeburg@intel.com \
    --cc=jianbol@nvidia.com \
    --cc=jiri@nvidia.com \
    --cc=jwi@linux.ibm.com \
    --cc=kabel@blackhole.sk \
    --cc=kgraul@linux.ibm.com \
    --cc=kuba@kernel.org \
    --cc=kurt@linutronix.de \
    --cc=lars.povlsen@microchip.com \
    --cc=leon@kernel.org \
    --cc=mbloch@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=nikolay@nvidia.com \
    --cc=roid@nvidia.com \
    --cc=roopa@nvidia.com \
    --cc=saeedm@nvidia.com \
    --cc=sean.wang@mediatek.com \
    --cc=tchornyi@marvell.com \
    --cc=tobias@waldekranz.com \
    --cc=vigneshr@ti.com \
    --cc=vivien.didelot@gmail.com \
    --cc=vkochan@marvell.com \
    --cc=vladbu@nvidia.com \
    --cc=vladimir.oltean@nxp.com \
    --cc=woojung.huh@microchip.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.