Re: [RFC 0/5] devlink: add an explicit locking API

From: Leon Romanovsky <leon@kernel.org>
To: Jakub Kicinski <kuba@kernel.org>
Cc: idosch@idosch.org, edwin.peer@broadcom.com, jiri@resnulli.us,
	netdev@vger.kernel.org
Subject: Re: [RFC 0/5] devlink: add an explicit locking API
Date: Mon, 1 Nov 2021 20:36:15 +0200	[thread overview]
Message-ID: <YYAzn+mtrGp/As74@unreal> (raw)
In-Reply-To: <20211101073259.33406da3@kicinski-fedora-PC1C0HJN>

On Mon, Nov 01, 2021 at 07:32:59AM -0700, Jakub Kicinski wrote:
> On Sun, 31 Oct 2021 09:23:42 +0200 Leon Romanovsky wrote:
> > On Sat, Oct 30, 2021 at 04:12:49PM -0700, Jakub Kicinski wrote:
> > > This implements what I think is the right direction for devlink
> > > API overhaul. It's an early RFC/PoC because the body of code is
> > > rather large so I'd appreciate feedback early... The patches are
> > > very roughly split, the point of the posting is primarily to prove
> > > that from the driver side the conversion is an net improvement
> > > in complexity.
> > > 
> > > IMO the problems with devlink locking are caused by two things:
> > > 
> > >  (1) driver has no way to block devlink calls like it does in case
> > >      of netedev / rtnl_lock, note that for devlink each driver has
> > >      it's own lock so contention is not an issue;
> > >      
> > >  (2) sometimes devlink calls into the driver without holding its lock
> > >      - for operations which may need the driver to call devlink (port
> > >      splitting, switch config, reload etc.), the circular dependency
> > >      is there, the driver can't solve this problem.
> > > 
> > > This set "fixes" the ordering by allowing the driver to participate
> > > in locking. The lock order remains:
> > > 
> > >   device_lock -> [devlink_mutex] -> devlink instance -> rtnl_lock
> > > 
> > > but now driver can take devlink lock itself, and _ALL_ devlink ops
> > > are locked.
> > > 
> > > The expectation is that driver will take the devlink instance lock
> > > on its probe and remove paths, hence protecting all configuration
> > > paths with the devlink instance lock.
> > > 
> > > This is clearly demonstrated by the netdevsim conversion. All entry
> > > points to driver config are protected by devlink instance lock, so
> > > the driver switches to the "I'm already holding the devlink lock" API
> > > when calling devlink. All driver locks and trickery is removed.
> > > 
> > > The part which is slightly more challanging is quiescing async entry
> > > points which need to be closed on the devlink reload path (workqueue,
> > > debugfs etc.) and which also take devlink instance lock. For that we
> > > need to be able to take refs on the devlink instance and let them
> > > clean up after themselves rather than waiting synchronously.
> > > 
> > > That last part is not 100% finished in this patch set - it works but
> > > we need the driver to take devlink_mutex (the global lock) from its
> > > probe/remove path. I hope this is good enough for an RFC, the problem
> > > is easily solved by protecting the devlink XArray with something else
> > > than devlink_mutex and/or not holding devlink_mutex around each op
> > > (so that there is no ordering between the global and instance locks).
> > > Speaking of not holding devlink_mutex around each op this patch set
> > > also opens the path for parallel ops to different devlink instances
> > > which is currently impossible because all user space calls take
> > > devlink_mutex...  
> > 
> > No, please no.

<...>

> How is RW semaphore going to solve the problem that ops are unlocked
> and have to take the instance lock from within to add/remove ports?

This is three step process, but mainly it is first step. We need to make
sure that functions that can re-entry will use nested locks.

Steps:
1. Use proper locking API that supports nesting:
https://lore.kernel.org/netdev/YYABqfFy%2F%2Fg5Gdis@nanopsycho/T/#mf9dc5cac2013abe413545bbe4a09cc231ae209a4
2. Convert devlink->lock to be RW semaphore:
commit 4506dd3a90a82a0b6bde238f507907747ab88407
Author: Leon Romanovsky <leon@kernel.org>
Date:   Sun Oct 24 16:54:16 2021 +0300

    devlink: Convert devlink lock to be RW semaphore

    This is naive conversion of devlink->lock to RW semaphore, so we will be
    able to differentiate commands that require exclusive access vs. parallel
    ready-to-run ones.

    All "set" commands that used devlink->lock are converted to write lock,
    while all "get" commands are marked with read lock.

@@ -578,8 +584,12 @@ static int devlink_nl_pre_doit(const struct genl_ops *ops,
                mutex_unlock(&devlink_mutex);
                return PTR_ERR(devlink);
        }
-       if (~ops->internal_flags & DEVLINK_NL_FLAG_NO_LOCK)
-               mutex_lock(&devlink->lock);
+
+       if (~ops->internal_flags & DEVLINK_NL_FLAG_SHARED_ACCESS)
+               down_write(&devlink->rwsem);
+       else
+               down_read(&devlink->rwsem);
+

3. Drop devlink_mutex:
commit 3177af9971c4cd95f9633aeb9b0434687da62fd0
Author: Leon Romanovsky <leon@kernel.org>
Date:   Sun Oct 31 16:05:40 2021 +0200

    devlink: Use xarray locking mechanism instead big devlink lock

    The conversion to XArray together with devlink reference counting
    allows us reuse the following locking pattern:
     xa_lock()
      xa_for_each() {
       devlink_try_get()
       xa_unlock()
       ....
       xa_lock()
     }

    This pattern gives us a way to run any commands between xa_unlock() and
    xa_lock() without big devlink mutex, while making sure that devlink instance
    won't be released.


Steps 2 and 3 were not posted due to merge window and my desire to get
mileage in our regression.

> 
> > Please, let's not give up on standalone devlink implementation without
> > drivers need to know internal devlink details. It is hard to do but possible.
> 
> We may just disagree on this one. Please answer my question above -
> so far IDK how you're going to fix the problem of re-reg'ing subobjects
> from the reload path.
> 
> My experience writing drivers is that it was painfully unclear what 
> the devlink locking rules are. Sounds like you'd make them even more
> complicated.

How? I have only one rule:
 * Driver author should be aware that between devlink_register() and
   devlink_unregister(), users can send netlink commands.

In your solution, driver authors will need to follow whole devlink
how-to-use book.

> 
> This RFC makes the rules simple - all devlink ops are locked.
> 
> For the convenience of the drivers they can also take the instance lock
> whenever they want to prevent ops from being called. Experience with
> rtnl_lock teaches us that this is very useful for drivers.

Strange, I see completely opposite picture in the git log with so much
changes that have Fixes line and add/remove/mention rtnl_lock. Maybe it
is useful, but almost all if not all drivers full of fixes of rtnl_lock
usage. I don't want same for the devlink.

Thanks