All of lore.kernel.org
 help / color / mirror / Atom feed
From: Leon Romanovsky <leon@kernel.org>
To: Jakub Kicinski <kuba@kernel.org>
Cc: idosch@idosch.org, edwin.peer@broadcom.com, jiri@resnulli.us,
	netdev@vger.kernel.org
Subject: Re: [RFC 0/5] devlink: add an explicit locking API
Date: Mon, 1 Nov 2021 20:36:15 +0200	[thread overview]
Message-ID: <YYAzn+mtrGp/As74@unreal> (raw)
In-Reply-To: <20211101073259.33406da3@kicinski-fedora-PC1C0HJN>

On Mon, Nov 01, 2021 at 07:32:59AM -0700, Jakub Kicinski wrote:
> On Sun, 31 Oct 2021 09:23:42 +0200 Leon Romanovsky wrote:
> > On Sat, Oct 30, 2021 at 04:12:49PM -0700, Jakub Kicinski wrote:
> > > This implements what I think is the right direction for devlink
> > > API overhaul. It's an early RFC/PoC because the body of code is
> > > rather large so I'd appreciate feedback early... The patches are
> > > very roughly split, the point of the posting is primarily to prove
> > > that from the driver side the conversion is an net improvement
> > > in complexity.
> > > 
> > > IMO the problems with devlink locking are caused by two things:
> > > 
> > >  (1) driver has no way to block devlink calls like it does in case
> > >      of netedev / rtnl_lock, note that for devlink each driver has
> > >      it's own lock so contention is not an issue;
> > >      
> > >  (2) sometimes devlink calls into the driver without holding its lock
> > >      - for operations which may need the driver to call devlink (port
> > >      splitting, switch config, reload etc.), the circular dependency
> > >      is there, the driver can't solve this problem.
> > > 
> > > This set "fixes" the ordering by allowing the driver to participate
> > > in locking. The lock order remains:
> > > 
> > >   device_lock -> [devlink_mutex] -> devlink instance -> rtnl_lock
> > > 
> > > but now driver can take devlink lock itself, and _ALL_ devlink ops
> > > are locked.
> > > 
> > > The expectation is that driver will take the devlink instance lock
> > > on its probe and remove paths, hence protecting all configuration
> > > paths with the devlink instance lock.
> > > 
> > > This is clearly demonstrated by the netdevsim conversion. All entry
> > > points to driver config are protected by devlink instance lock, so
> > > the driver switches to the "I'm already holding the devlink lock" API
> > > when calling devlink. All driver locks and trickery is removed.
> > > 
> > > The part which is slightly more challanging is quiescing async entry
> > > points which need to be closed on the devlink reload path (workqueue,
> > > debugfs etc.) and which also take devlink instance lock. For that we
> > > need to be able to take refs on the devlink instance and let them
> > > clean up after themselves rather than waiting synchronously.
> > > 
> > > That last part is not 100% finished in this patch set - it works but
> > > we need the driver to take devlink_mutex (the global lock) from its
> > > probe/remove path. I hope this is good enough for an RFC, the problem
> > > is easily solved by protecting the devlink XArray with something else
> > > than devlink_mutex and/or not holding devlink_mutex around each op
> > > (so that there is no ordering between the global and instance locks).
> > > Speaking of not holding devlink_mutex around each op this patch set
> > > also opens the path for parallel ops to different devlink instances
> > > which is currently impossible because all user space calls take
> > > devlink_mutex...  
> > 
> > No, please no.

<...>

> How is RW semaphore going to solve the problem that ops are unlocked
> and have to take the instance lock from within to add/remove ports?

This is three step process, but mainly it is first step. We need to make
sure that functions that can re-entry will use nested locks.

Steps:
1. Use proper locking API that supports nesting:
https://lore.kernel.org/netdev/YYABqfFy%2F%2Fg5Gdis@nanopsycho/T/#mf9dc5cac2013abe413545bbe4a09cc231ae209a4
2. Convert devlink->lock to be RW semaphore:
commit 4506dd3a90a82a0b6bde238f507907747ab88407
Author: Leon Romanovsky <leon@kernel.org>
Date:   Sun Oct 24 16:54:16 2021 +0300

    devlink: Convert devlink lock to be RW semaphore

    This is naive conversion of devlink->lock to RW semaphore, so we will be
    able to differentiate commands that require exclusive access vs. parallel
    ready-to-run ones.

    All "set" commands that used devlink->lock are converted to write lock,
    while all "get" commands are marked with read lock.

@@ -578,8 +584,12 @@ static int devlink_nl_pre_doit(const struct genl_ops *ops,
                mutex_unlock(&devlink_mutex);
                return PTR_ERR(devlink);
        }
-       if (~ops->internal_flags & DEVLINK_NL_FLAG_NO_LOCK)
-               mutex_lock(&devlink->lock);
+
+       if (~ops->internal_flags & DEVLINK_NL_FLAG_SHARED_ACCESS)
+               down_write(&devlink->rwsem);
+       else
+               down_read(&devlink->rwsem);
+

3. Drop devlink_mutex:
commit 3177af9971c4cd95f9633aeb9b0434687da62fd0
Author: Leon Romanovsky <leon@kernel.org>
Date:   Sun Oct 31 16:05:40 2021 +0200

    devlink: Use xarray locking mechanism instead big devlink lock

    The conversion to XArray together with devlink reference counting
    allows us reuse the following locking pattern:
     xa_lock()
      xa_for_each() {
       devlink_try_get()
       xa_unlock()
       ....
       xa_lock()
     }

    This pattern gives us a way to run any commands between xa_unlock() and
    xa_lock() without big devlink mutex, while making sure that devlink instance
    won't be released.


Steps 2 and 3 were not posted due to merge window and my desire to get
mileage in our regression.

> 
> > Please, let's not give up on standalone devlink implementation without
> > drivers need to know internal devlink details. It is hard to do but possible.
> 
> We may just disagree on this one. Please answer my question above -
> so far IDK how you're going to fix the problem of re-reg'ing subobjects
> from the reload path.
> 
> My experience writing drivers is that it was painfully unclear what 
> the devlink locking rules are. Sounds like you'd make them even more
> complicated.

How? I have only one rule:
 * Driver author should be aware that between devlink_register() and
   devlink_unregister(), users can send netlink commands.

In your solution, driver authors will need to follow whole devlink
how-to-use book.

> 
> This RFC makes the rules simple - all devlink ops are locked.
> 
> For the convenience of the drivers they can also take the instance lock
> whenever they want to prevent ops from being called. Experience with
> rtnl_lock teaches us that this is very useful for drivers.

Strange, I see completely opposite picture in the git log with so much
changes that have Fixes line and add/remove/mention rtnl_lock. Maybe it
is useful, but almost all if not all drivers full of fixes of rtnl_lock
usage. I don't want same for the devlink.

Thanks

  reply	other threads:[~2021-11-01 18:36 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-30 23:12 [RFC 0/5] devlink: add an explicit locking API Jakub Kicinski
2021-10-30 23:12 ` [RFC 1/5] devlink: add unlocked APIs Jakub Kicinski
2021-10-30 23:12 ` [RFC 2/5] devlink: add API for explicit locking Jakub Kicinski
2021-10-30 23:12 ` [RFC 3/5] devlink: allow locking of all ops Jakub Kicinski
2021-10-30 23:12 ` [RFC 4/5] netdevsim: minor code move Jakub Kicinski
2021-10-30 23:12 ` [RFC 5/5] netdevsim: use devlink locking Jakub Kicinski
2021-10-31  7:23 ` [RFC 0/5] devlink: add an explicit locking API Leon Romanovsky
2021-11-01 14:32   ` Jakub Kicinski
2021-11-01 18:36     ` Leon Romanovsky [this message]
2021-11-01 21:16       ` Jakub Kicinski
2021-11-02  8:08         ` Leon Romanovsky
2021-11-02 15:14           ` Jakub Kicinski
2021-11-02 18:14             ` Leon Romanovsky
2021-11-03  0:05               ` Jakub Kicinski
2021-11-03  7:23                 ` Leon Romanovsky
2021-11-03 14:12                   ` Jakub Kicinski
2021-11-01 20:04   ` Edwin Peer
2021-11-02  7:44     ` Leon Romanovsky
2021-11-02 15:16       ` Jakub Kicinski
2021-11-02 17:50         ` Leon Romanovsky
2021-11-03  9:03 ` Jiri Pirko
2021-11-03 14:52   ` Jakub Kicinski
2021-11-03 19:19     ` Jiri Pirko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YYAzn+mtrGp/As74@unreal \
    --to=leon@kernel.org \
    --cc=edwin.peer@broadcom.com \
    --cc=idosch@idosch.org \
    --cc=jiri@resnulli.us \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.