linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>,
	Max Gurtovoy <mgurtovoy@nvidia.com>,
	Alexey Kardashevskiy <aik@ozlabs.ru>,
	cohuck@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, liranl@nvidia.com, oren@nvidia.com,
	tzahio@nvidia.com, leonro@nvidia.com, yarong@nvidia.com,
	aviadye@nvidia.com, shahafs@nvidia.com, artemp@nvidia.com,
	kwankhede@nvidia.com, ACurrid@nvidia.com, cjia@nvidia.com,
	yishaih@nvidia.com, mjrosato@linux.ibm.com
Subject: Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers
Date: Sun, 21 Mar 2021 09:58:18 -0300	[thread overview]
Message-ID: <20210321125818.GM2356281@nvidia.com> (raw)
In-Reply-To: <20210319224028.51b01435@x1.home.shazbot.org>

On Fri, Mar 19, 2021 at 10:40:28PM -0600, Alex Williamson wrote:

> > Well, today we don't, but Max here adds id_table's to the special
> > devices and a MODULE_DEVICE_TABLE would come too if we do the flavours
> > thing below.
> 
> I think the id_tables are the wrong approach for IGD and NVLink
> variants.

I really disagree with this. Checking for some random bits in firmware
and assuming that every device made forever into the future works with
this check is not a good way to do compatibility. Christoph made the
same point.

We have good processes to maintain id tables, I don't see this as a
problem.

> > As-is driver_override seems dangerous as overriding the matching table
> > could surely allow root userspace to crash the machine. In situations
> > with trusted boot/signed modules this shouldn't be.
> 
> When we're dealing with meta-drivers that can bind to anything, we
> shouldn't rely on the match, but should instead verify the driver is
> appropriate in the probe callback.  Even without driver_override,
> there's the new_id mechanism.  Either method allows the root user to
> break driver binding.  Greg has previously stated something to the
> effect that users get to keep all the pieces when they break something
> by manipulating driver binding.

Yes, but that is a view where root is allowed to break the kernel, we
now have this optional other world where that is not allowed and root
access to lots of dangerous things are now disabled.

new_id and driver_override should probably be in that disable list
too..

> > While that might not seem too bad with these simple drivers, at least
> > the mlx5 migration driver will have a large dependency tree and pull
> > in lots of other modules. Even Max's sample from v1 pulls in mlx5_core.ko
> > and a bunch of other stuff in its orbit.
> 
> Luckily the mlx5 driver doesn't need to be covered by compatibility
> support, so we don't need to set a softdep for it and the module could
> be named such that a wildcard driver_override of vfio_pci* shouldn't
> logically include that driver.  Users can manually create their own
> modprobe.d softdep entry if they'd like to include it.  Otherwise
> userspace would need to know to bind to it specifically.

But now you are giving up on the whole point, which was to
automatically load the correct specific module without special admin
involvement!

> > This is why I want to try for fine grained autoloading first. It
> > really is the elegant solution if we can work it out.
> 
> I just don't see how we create a manageable change to userspace.

I'm not sure I understand. Even if we add a new sysfs to set some
flavour then that is a pretty trivial change for userspace to move
from driver_override?

> > I don't think we should over-focus on these two firmware triggered
> > examples. I looked at the Intel GPU driver and it already only reads
> > the firmware thing for certain PCI ID's, we can absolutely generate a
> > narrow match table for it. Same is true for the NVIDIA GPU.
> 
> I'm not sure we can make this assertion, both only care about the type
> of device and existence of associated firmware tables.  

Well, I read through the Intel GPU driver and this is how I felt it
works. It doesn't even check the firmware bit unless certain PCI IDs
are matched first.

For NVIDIA GPU Max checked internally and we saw it looks very much
like how Intel GPU works. Only some PCI IDs trigger checking on the
feature the firmware thing is linked to.

My point is: the actual *drivers* consuming these firmware features do
*not* blindly match every PCI device and check for the firmware
bit. They all have narrow matches and further only try to use the
firmware thing for some subset of PCI IDs that the entire driver
supports.

Given that the actual drivers work this way there is no technical
reason vfio-pci can't do this as well.

We don't have to change them of course, they can stay as is if people
feel really strongly.

> > Even so, I'm not *so* worried about "over matching" - if IGD or the
> > nvidia stuff load on a wide set of devices then they can just not
> > enable their extended stuff. It wastes some kernel memory, but it is
> > OK.
> 
> I'd rather they bind to the base vfio-pci driver if their extended
> features are not available.

Sure it would be nice, but functionally it is no different.

> > And if some driver *really* gets stuck here the true answer is to
> > improve the driver core match capability.
> > 
> > > devices in the deny-list and non-endpoint devices.  Many drivers
> > > clearly place implicit trust in their id_table, others don't.  In the
> > > case of meta drivers, I think it's fair to make use of the latter
> > > approach.  
> > 
> > Well, AFAIK, the driver core doesn't have a 'try probe, if it fails
> > then try another driver' approach. One device, one driver. Am I
> > missing something?
> 
> If the driver probe callback fails, really_probe() returns 0 with the
> comment:
> 
>         /*
>          * Ignore errors returned by ->probe so that the next driver can try
>          * its luck.
>          */
>         ret = 0;
> 
> That allows bus_for_each_drv() to continue to iterate.

Er, but we have no reliable way to order drivers in the list so this
still assumes the system has exactly one driver match (even if some of
the match is now in code).

It won't work with a "universal" driver without more changes.

(and I couldn't find out why Cornelia added this long ago, or how or
even if it actually ended up being used)

> > I also think the softdep/implicit loading/ordering will not be
> > welcomed, it feels weird to me.
> 
> AFAICT, it works within the existing driver-core, it's largely an
> extension to pci-core driver_override support to enable wildcard
> matching, ideally along with adding the same for all buses that support
> driver_override.  Thanks,

It is the implicit ordering of module loading that is trouble.

Regards,
Jason

  reply	other threads:[~2021-03-21 12:59 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-09  8:33 [PATCH v3 0/9] Introduce vfio-pci-core subsystem Max Gurtovoy
2021-03-09  8:33 ` [PATCH 1/9] vfio-pci: rename vfio_pci.c to vfio_pci_core.c Max Gurtovoy
2021-03-09  8:33 ` [PATCH 2/9] vfio-pci: rename vfio_pci_private.h to vfio_pci_core.h Max Gurtovoy
2021-03-09  8:33 ` [PATCH 3/9] vfio-pci: rename vfio_pci_device to vfio_pci_core_device Max Gurtovoy
2021-03-09  8:33 ` [PATCH 4/9] vfio-pci: introduce vfio_pci_core subsystem driver Max Gurtovoy
2021-03-09  8:33 ` [PATCH 5/9] vfio/pci: introduce vfio_pci_device structure Max Gurtovoy
2021-03-09  8:33 ` [PATCH 6/9] vfio-pci-core: export vfio_pci_register_dev_region function Max Gurtovoy
2021-03-09  8:33 ` [PATCH 7/9] vfio/pci_core: split nvlink2 to nvlink2gpu and npu2 Max Gurtovoy
2021-03-10  8:08   ` Christoph Hellwig
2021-03-09  8:33 ` [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers Max Gurtovoy
2021-03-10  6:39   ` Alexey Kardashevskiy
2021-03-10 12:57     ` Max Gurtovoy
2021-03-10 13:02       ` Jason Gunthorpe
2021-03-10 14:24         ` Alexey Kardashevskiy
2021-03-10 19:40           ` Jason Gunthorpe
2021-03-11  1:20             ` Alexey Kardashevskiy
2021-03-11  1:34               ` Jason Gunthorpe
2021-03-11  1:42                 ` Alexey Kardashevskiy
2021-03-11  2:00                   ` Jason Gunthorpe
2021-03-11  7:54                     ` Alexey Kardashevskiy
2021-03-11  9:44                       ` Max Gurtovoy
2021-03-11 16:51                         ` Jason Gunthorpe
2021-03-11 17:01                       ` Jason Gunthorpe
2021-03-10 14:19       ` Alexey Kardashevskiy
2021-03-11  1:10         ` Max Gurtovoy
2021-03-19 15:23       ` Alex Williamson
2021-03-19 16:17         ` Jason Gunthorpe
2021-03-19 16:20           ` Christoph Hellwig
2021-03-19 16:28             ` Jason Gunthorpe
2021-03-19 16:34               ` Christoph Hellwig
2021-03-19 17:36                 ` Alex Williamson
2021-03-19 20:07                   ` Jason Gunthorpe
2021-03-19 21:08                     ` Alex Williamson
2021-03-19 22:59                       ` Jason Gunthorpe
2021-03-20  4:40                         ` Alex Williamson
2021-03-21 12:58                           ` Jason Gunthorpe [this message]
2021-03-22 16:40                             ` Alex Williamson
2021-03-23 19:32                               ` Jason Gunthorpe
2021-03-24  2:39                                 ` Alexey Kardashevskiy
2021-03-29 23:10                                 ` Alex Williamson
2021-04-01 13:04                                   ` Cornelia Huck
2021-04-01 13:12                                   ` Jason Gunthorpe
2021-04-01 21:49                                     ` Alex Williamson
2021-03-22 15:11                     ` Christoph Hellwig
2021-03-22 16:44                       ` Jason Gunthorpe
2021-03-23 13:17                         ` Christoph Hellwig
2021-03-23 13:42                           ` Jason Gunthorpe
2021-03-09  8:33 ` [PATCH 9/9] vfio/pci: export igd support into vendor vfio_pci driver Max Gurtovoy
2021-03-10  8:15   ` Christoph Hellwig
2021-03-10 12:31     ` Jason Gunthorpe
2021-03-11 11:37       ` Christoph Hellwig
2021-03-11 12:09         ` Max Gurtovoy
2021-03-11 15:43         ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210321125818.GM2356281@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=ACurrid@nvidia.com \
    --cc=aik@ozlabs.ru \
    --cc=alex.williamson@redhat.com \
    --cc=artemp@nvidia.com \
    --cc=aviadye@nvidia.com \
    --cc=cjia@nvidia.com \
    --cc=cohuck@redhat.com \
    --cc=hch@lst.de \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=leonro@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=liranl@nvidia.com \
    --cc=mgurtovoy@nvidia.com \
    --cc=mjrosato@linux.ibm.com \
    --cc=oren@nvidia.com \
    --cc=shahafs@nvidia.com \
    --cc=tzahio@nvidia.com \
    --cc=yarong@nvidia.com \
    --cc=yishaih@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).