netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Leon Romanovsky <leon@kernel.org>
To: Jakub Kicinski <kuba@kernel.org>
Cc: "David S . Miller" <davem@davemloft.net>,
	Guangbin Huang <huangguangbin2@huawei.com>,
	Ido Schimmel <idosch@nvidia.com>, Jiri Pirko <jiri@nvidia.com>,
	linux-kernel@vger.kernel.org,
	Michael Guralnik <michaelgur@mellanox.com>,
	netdev@vger.kernel.org, Saeed Mahameed <saeedm@nvidia.com>,
	Salil Mehta <salil.mehta@huawei.com>,
	Tariq Toukan <tariqt@nvidia.com>,
	Yisen Zhuang <yisen.zhuang@huawei.com>,
	Yufeng Mo <moyufeng@huawei.com>
Subject: Re: [PATCH net-next 0/5] Move devlink_register to be near devlink_reload_enable
Date: Wed, 11 Aug 2021 17:01:20 +0300	[thread overview]
Message-ID: <YRPYMKxPHUkegEhj@unreal> (raw)
In-Reply-To: <20210811062732.0f569b9a@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On Wed, Aug 11, 2021 at 06:27:32AM -0700, Jakub Kicinski wrote:
> On Wed, 11 Aug 2021 09:10:49 +0300 Leon Romanovsky wrote:
> > On Tue, Aug 10, 2021 at 04:53:18PM -0700, Jakub Kicinski wrote:
> > > On Tue, 10 Aug 2021 16:37:30 +0300 Leon Romanovsky wrote:  
> > > > This series prepares code to remove devlink_reload_enable/_disable API
> > > > and in order to do, we move all devlink_register() calls to be right
> > > > before devlink_reload_enable().
> > > > 
> > > > The best place for such a call should be right before exiting from
> > > > the probe().
> > > > 
> > > > This is done because devlink_register() opens devlink netlink to the
> > > > users and gives them a venue to issue commands before initialization
> > > > is finished.
> > > > 
> > > > 1. Some drivers were aware of such "functionality" and tried to protect
> > > > themselves with extra locks, state machines and devlink_reload_enable().
> > > > Let's assume that it worked for them, but I'm personally skeptical about
> > > > it.
> > > > 
> > > > 2. Some drivers copied that pattern, but without locks and state
> > > > machines. That protected them from reload flows, but not from any _set_
> > > > routines.
> > > > 
> > > > 3. And all other drivers simply didn't understand the implications of early
> > > > devlink_register() and can be seen as "broken".  
> > > 
> > > What are those implications for drivers which don't implement reload?
> > > Depending on which parts of devlink the drivers implement there may well
> > > be nothing to worry about.
> > > 
> > > Plus devlink instances start out with reload disabled. Could you please
> > > take a step back and explain why these changes are needed.  
> > 
> > The problem is that devlink_register() adds new devlink instance to the
> > list of visible devlinks (devlink_list). It means that all devlink_*_dumpit()
> > will try to access devices during their initialization, before they are ready.
> > 
> > The more troublesome case is that devlink_list is iterated in the
> > devlink_get_from_attrs() and it is used in devlink_nl_pre_doit(). The
> > latter function will return to the caller that new devlink is valid and
> > such caller will be able to proceed to *_set_doit() functions.
> > 
> > Just as an example:
> >  * user sends netlink message
> >   * devlink_nl_cmd_eswitch_set_doit()
> >    * ops->eswitch_mode_set()
> >     * Are you sure that all drivers protected here?
> >       I remind that driver is in the middle of its probe().
> > 
> > Someone can argue that drivers and devlink are protected from anything
> > harmful with their global (devlink_mutex and devlink->lock) and internal
> > (device->lock, e.t.c.) locks. However it is impossible to prove for all
> > drivers and prone to errors.
> > 
> > Reload enable/disable gives false impression that the problem exists in
> > that flow only, which is not true.
> > 
> > devlink_reload_enable() is a duct tape because reload flows much easier
> > to hit.
> 
> Right :/
> 
> > > > In this series, we focus on items #1 and #2.
> > > > 
> > > > Please share your opinion if I should change ALL other drivers to make
> > > > sure that devlink_register() is the last command or leave them in an
> > > > as-is state.  
> > > 
> > > Can you please share the output of devlink monitor and ip monitor link
> > > before and after?  The modified drivers will not register ports before
> > > they register the devlink instance itself.  
> > 
> > Not really, they will register but won't be accessible from the user space.
> > The only difference is the location of "[dev,new] ..." notification.
> 
> Is that because of mlx5's use of auxdev, or locking? I don't see
> anything that should prevent the port notification from coming out.

And it is ok, kernel can (and does) send notifications, because we left
devlink_ops assignment to be in devlink_alloc(). It ensures that all
flows that worked before will continue to work without too much changes.

> 
> I think the notifications need to get straightened out, we can't notify
> about sub-objects until the object is registered, since they are
> inaccessible.

I'm not sure about that. You present the case where kernel and user
space races against each other and historically kernel doesn't protect
from such flows. 

For example, you can randomly remove and add kernel modules. At some
point of time, you will get "missing symbols errors", just because
one module tries to load and it depends on already removed one.

We must protect kernel and this is what I do. User shouldn't access
devlink instance before he sees "dev name" notification.

Of course, we can move various iterators to devlink_register(), but it
will make code much complex, because we have objects that can be
registered at any time (IMHO. trap is one of them) and I will need to 
implement notification logic that separate objects that were created
before devlink_register and after.

Thanks

  reply	other threads:[~2021-08-11 14:01 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-10 13:37 [PATCH net-next 0/5] Move devlink_register to be near devlink_reload_enable Leon Romanovsky
2021-08-10 13:37 ` [PATCH net-next 1/5] net: hns3: remove always exist devlink pointer check Leon Romanovsky
2021-08-10 13:37 ` [PATCH net-next 2/5] net/mlx4: Move devlink_register to be the last initialization command Leon Romanovsky
2021-08-10 13:37 ` [PATCH net-next 3/5] mlxsw: core: Refactor code to publish devlink ops when device is ready Leon Romanovsky
2021-08-10 13:37 ` [PATCH net-next 4/5] net/mlx5: Accept devlink user input after driver initialization complete Leon Romanovsky
2021-08-10 13:37 ` [PATCH net-next 5/5] netdevsim: Delay user access till probe is finished Leon Romanovsky
2021-08-10 23:53 ` [PATCH net-next 0/5] Move devlink_register to be near devlink_reload_enable Jakub Kicinski
2021-08-11  6:10   ` Leon Romanovsky
2021-08-11 13:27     ` Jakub Kicinski
2021-08-11 14:01       ` Leon Romanovsky [this message]
2021-08-11 14:15         ` Leon Romanovsky
2021-08-11 14:18         ` Jakub Kicinski
2021-08-11 14:36           ` Leon Romanovsky
2021-08-12  4:10           ` Leon Romanovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YRPYMKxPHUkegEhj@unreal \
    --to=leon@kernel.org \
    --cc=davem@davemloft.net \
    --cc=huangguangbin2@huawei.com \
    --cc=idosch@nvidia.com \
    --cc=jiri@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michaelgur@mellanox.com \
    --cc=moyufeng@huawei.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@nvidia.com \
    --cc=salil.mehta@huawei.com \
    --cc=tariqt@nvidia.com \
    --cc=yisen.zhuang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).