linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ido Schimmel <idosch@idosch.org>
To: Leon Romanovsky <leon@kernel.org>
Cc: "David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Ido Schimmel <idosch@nvidia.com>, Ingo Molnar <mingo@redhat.com>,
	Jiri Pirko <jiri@nvidia.com>,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	mlxsw@nvidia.com, Moshe Shemesh <moshe@nvidia.com>,
	netdev@vger.kernel.org, Saeed Mahameed <saeedm@nvidia.com>,
	Salil Mehta <salil.mehta@huawei.com>,
	Shay Drory <shayd@nvidia.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Tariq Toukan <tariqt@nvidia.com>,
	Yisen Zhuang <yisen.zhuang@huawei.com>
Subject: Re: [PATCH net-next v2 5/5] devlink: Delete reload enable/disable interface
Date: Tue, 5 Oct 2021 09:10:15 +0300	[thread overview]
Message-ID: <YVvsR4CxOW09k8KX@shredder> (raw)
In-Reply-To: <YVtPruw9kzOQvhZu@unreal>

On Mon, Oct 04, 2021 at 10:02:06PM +0300, Leon Romanovsky wrote:
> On Mon, Oct 04, 2021 at 07:54:02PM +0300, Ido Schimmel wrote:
> > On Mon, Oct 04, 2021 at 06:45:07PM +0300, Leon Romanovsky wrote:
> > > On Mon, Oct 04, 2021 at 05:19:40PM +0300, Ido Schimmel wrote:
> > > > On Sun, Oct 03, 2021 at 09:12:06PM +0300, Leon Romanovsky wrote:
> > > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > > > 
> > > > > After changes to allow dynamically set the reload_up/_down callbacks,
> > > > > we ensure that properly supported devlink ops are not accessible before
> > > > > devlink_register, which is last command in the initialization sequence.
> > > > > 
> > > > > It makes devlink_reload_enable/_disable not relevant anymore and can be
> > > > > safely deleted.
> > > > > 
> > > > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > > 
> > > > [...]
> > > > 
> > > > > diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> > > > > index cb6645012a30..09e48fb232a9 100644
> > > > > --- a/drivers/net/netdevsim/dev.c
> > > > > +++ b/drivers/net/netdevsim/dev.c
> > > > > @@ -1512,7 +1512,6 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
> > > > >  
> > > > >  	nsim_dev->esw_mode = DEVLINK_ESWITCH_MODE_LEGACY;
> > > > >  	devlink_register(devlink);
> > > > > -	devlink_reload_enable(devlink);
> > > > >  	return 0;
> > > > >  
> > > > >  err_psample_exit:
> > > > > @@ -1566,9 +1565,7 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
> > > > >  	struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
> > > > >  	struct devlink *devlink = priv_to_devlink(nsim_dev);
> > > > >  
> > > > > -	devlink_reload_disable(devlink);
> > > > >  	devlink_unregister(devlink);
> > > > > -
> > > > >  	nsim_dev_reload_destroy(nsim_dev);
> > > > >  
> > > > >  	nsim_bpf_dev_exit(nsim_dev);
> > > > 
> > > > I didn't remember why devlink_reload_{enable,disable}() were added in
> > > > the first place so it was not clear to me from the commit message why
> > > > they can be removed. It is described in commit a0c76345e3d3 ("devlink:
> > > > disallow reload operation during device cleanup") with a reproducer.
> > > 
> > > It was added because devlink ops were accessible by the user space very
> > > early in the driver lifetime. All my latest devlink patches are the
> > > attempt to fix this arch/design/implementation issue.
> > 
> > The reproducer in the commit message executed the reload after the
> > device was fully initialized. IIRC, the problem there was that nothing
> > prevented these two tasks from racing:
> > 
> > devlink dev reload netdevsim/netdevsim10
> > echo 10 > /sys/bus/netdevsim/del_device
> > 
> > The title also talks about forbidding reload during device cleanup.
> 
> It is incomplete title and reproducer.

How can the reproducer be incomplete when it reproduced the issue 100%
of the time?

> In our verification, we observed more than 40 bugs related to devlink
> reload flows and races around it.

I assume these bugs are related to mlx5. syzkaller is familiar with the
devlink messages [1] and we are using it to fuzz over both mlxsw and
netdevsim. syzbot is also fuzzing over netdevsim and I'm not aware of
any open bugs.

[1] https://github.com/google/syzkaller/blob/master/sys/linux/socket_netlink_generic_devlink.txt

> 
> > 
> > > 
> > > > 
> > > > Tried the reproducer with this series and I cannot reproduce the issue.
> > > > Wasn't quite sure why, but it does not seem to be related to "changes to
> > > > allow dynamically set the reload_up/_down callbacks", as this seems to
> > > > be specific to mlx5.
> > > 
> > > You didn't reproduce because of my series that moved
> > > devlink_register()/devlink_unregister() to be last/first commands in
> > > .probe()/.remove() flows.
> > 
> > Agree, that is what I wrote in the next paragraph of my reply.
> > 
> > > 
> > > Patch to allow dynamically set ops was needed because mlx5 had logic
> > > like this:
> > >  if(something)
> > >     devlink_reload_enable()
> > > 
> > > And I needed a way to keep this if ... condition.
> > > 
> > > > 
> > > > IIUC, the reason that the race described in above mentioned commit can
> > > > no longer happen is related to the fact that devlink_unregister() is
> > > > called first in the device dismantle path, after your previous patches.
> > > > Since both the reload operation and devlink_unregister() hold
> > > > 'devlink_mutex', it is not possible for the reload operation to race
> > > > with device dismantle.
> > > > 
> > > > Agree? If so, I think it would be good to explain this in the commit
> > > > message unless it's clear to everyone else.
> > > 
> > > I don't agree for very simple reason that devlink_mutex is going to be
> > > removed very soon and it is really not a reason why devlink reload is
> > > safer now when before.
> > > 
> > > The reload can't race due to:
> > > 1. devlink_unregister(), which works as a barrier to stop accesses
> > > from the user space.
> > > 2. reference counting that ensures that all in-flight commands are counted.
> > > 3. wait_for_completion that blocks till all commands are done.
> > 
> > So the wait_for_completion() is what prevents the race, not
> > 'devlink_mutex' that is taken later. This needs to be explained in the
> > commit message to make it clear why the removal is safe.
> 
> Can you please suggest what exactly should I write in the commit message
> to make it clear?
> 
> I'm too much into this delvink stuff already and for me this patch is
> trivial. IMHO, that change doesn't need an explanation at all because
> coding pattern of refcount + wait_for_completion is pretty common in the
> kernel. So I think that I explained good enough: move of
> devlink_register/devlink_unregister obsoletes the devlink_reload_* APIs.
> 
> I have no problem to update the commit message, just help me with the
> message.

I suggest something like:

"
Commit a0c76345e3d3 ("devlink: disallow reload operation during device
cleanup") added devlink_reload_{enable,disable}() APIs to prevent reload
operation from racing with device probe / dismantle.

After recent changes to move devlink_register() to the end of device
probe and devlink_unregister() to the beginning of device dismantle,
these races can no longer happen. Reload operations will be denied if
the devlink instance is unregistered and devlink_unregister() will block
until all in-flight operations are done.

Therefore, remove these devlink_reload_{enable,disable}() APIs. Tested
with the reproducer mentioned in cited commit.
"

  reply	other threads:[~2021-10-05  6:10 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-03 18:12 [PATCH net-next v2 0/5] devlink reload simplification Leon Romanovsky
2021-10-03 18:12 ` [PATCH net-next v2 1/5] devlink: Reduce struct devlink exposure Leon Romanovsky
2021-10-04 23:38   ` Jakub Kicinski
2021-10-05  6:13     ` Leon Romanovsky
2021-10-03 18:12 ` [PATCH net-next v2 2/5] devlink: Annotate devlink API calls Leon Romanovsky
2021-10-03 18:12 ` [PATCH net-next v2 3/5] devlink: Allow set specific ops callbacks dynamically Leon Romanovsky
2021-10-04 23:44   ` Jakub Kicinski
2021-10-05  7:32     ` Leon Romanovsky
2021-10-05 18:32       ` Jakub Kicinski
2021-10-05 19:15         ` Leon Romanovsky
2021-10-06  0:39           ` Jakub Kicinski
2021-10-06  3:37             ` Leon Romanovsky
2021-10-06 13:35               ` Jakub Kicinski
2021-10-06 14:48                 ` Leon Romanovsky
2021-10-03 18:12 ` [PATCH net-next v2 4/5] net/mlx5: Register separate reload devlink ops for multiport device Leon Romanovsky
2021-10-03 18:12 ` [PATCH net-next v2 5/5] devlink: Delete reload enable/disable interface Leon Romanovsky
2021-10-04 14:19   ` Ido Schimmel
2021-10-04 15:45     ` Leon Romanovsky
2021-10-04 16:54       ` Ido Schimmel
2021-10-04 19:02         ` Leon Romanovsky
2021-10-05  6:10           ` Ido Schimmel [this message]
2021-10-05  7:40             ` Leon Romanovsky
2021-10-05  8:18               ` Ido Schimmel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YVvsR4CxOW09k8KX@shredder \
    --to=idosch@idosch.org \
    --cc=davem@davemloft.net \
    --cc=idosch@nvidia.com \
    --cc=jiri@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=mlxsw@nvidia.com \
    --cc=moshe@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=saeedm@nvidia.com \
    --cc=salil.mehta@huawei.com \
    --cc=shayd@nvidia.com \
    --cc=tariqt@nvidia.com \
    --cc=yisen.zhuang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).