From: Alex Williamson <alex.williamson@redhat.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: kvm@vger.kernel.org, mst@redhat.com, maorg@nvidia.com,
	virtualization@lists.linux-foundation.org, jiri@nvidia.com,
	leonro@nvidia.com
Subject: Re: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices
Date: Wed, 18 Oct 2023 12:29:25 -0600
Message-ID: <20231018122925.3fde9405.alex.williamson@redhat.com>
In-Reply-To: <20231018163333.GZ3952@nvidia.com>

On Wed, 18 Oct 2023 13:33:33 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Oct 17, 2023 at 02:24:48PM -0600, Alex Williamson wrote:
> > On Tue, 17 Oct 2023 16:42:17 +0300
> > Yishai Hadas <yishaih@nvidia.com> wrote:  
> > > +static int virtiovf_pci_probe(struct pci_dev *pdev,
> > > +			      const struct pci_device_id *id)
> > > +{
> > > +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
> > > +	struct virtiovf_pci_core_device *virtvdev;
> > > +	int ret;
> > > +
> > > +	if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> > > +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> > > +		ops = &virtiovf_acc_vfio_pci_tran_ops;  
> > 
> > This is still an issue for me; it's a very narrow use case where we
> > have a modern device and want to enable legacy support.  Implementing
> > an IO BAR and mangling the device ID seem like they should be opt-in,
> > not standard behavior for any compatible device.  Users should
> > generally expect that the device they see in the host is the device
> > they see in the guest.  They might even rely on that principle.
> 
> I think this should be configured when the VF is provisioned. If the
> user does not want legacy IO bar support then the VFIO VF function
> should not advertise the capability, and they won't get driver
> support.
> 
> I think that is a very reasonable way to approach this - it is how we
> approached similar problems for mlx5. The provisioning interface is
> what "profiles" the VF, regardless of if VFIO is driving it or not.

It seems like a huge assumption that every device is going to allow
this degree of specification in provisioning VFs.  mlx5 is a
vendor-specific driver; it can make such assumptions in its design
philosophy.

> > We can't use the argument that users wanting the default device should
> > use vfio-pci rather than virtio-vfio-pci because we've already defined
> > the algorithm by which libvirt should choose a variant driver for a
> > device.  libvirt will choose this driver for all virtio-net devices.  
> 
> Well, we can if the use case is niche. I think profiling a virtio VF
> to support legacy IO bar emulation and then not wanting to use it is
> a niche case.
> 
> The same argument is going to come with live migration. This same driver
> will still bind and enable live migration if the virtio function is
> profiled to support it. If you don't want that in your system then
> don't profile the VF for migration support.

What in the virtio or SR-IOV spec requires a vendor to make this
configurable?

> > This driver effectively has the option to expose two different profiles
> > for the device, native or transitional.  We've discussed profile
> > support for variant drivers previously as an equivalent functionality
> > to mdev types, but the only use case for this currently is out-of-tree.
> > I think this might be the opportunity to define how device profiles are
> > exposed and selected in a variant driver.  
> 
> Honestly, I've been trying to keep this out of VFIO...
> 
> The function is profiled when it is created, by whatever created
> it. As in the other thread we have a vast amount of variation in what
> is required to provision the function in the first place. "Legacy IO
> BAR emulation support" is just one thing. virtio-net needs to be
> hooked up to real network and get a MAC, virtio-blk needs to be hooked
> up to real storage and get a media. At a minimum. This is big and
> complicated.
> 
> It may not even be the x86 running VFIO that is doing this
> provisioning, the PCI function may come pre-provisioned from a DPU.
> 
> It feels better to keep that all in one place, in whatever external
> thing is preparing the function before giving it to VFIO. VFIO is
> concerned with operating a prepared function.
> 
> When we get to SIOV it should not be VFIO that is
> provisioning/creating functions. The owning driver should be doing
> this and routing the function to VFIO (eg with an aux device or
> otherwise)
> 
> This gets back to the qemu thread on the grace patch where we need to
> ask how the libvirt world sees this, given there is no good way to
> generically handle all scenarios without a userspace driver to operate
> elements.

So nothing here is really "all in one place": it may happen in the
provisioning of the VF, outside the scope of the host OS, or it might
be a collection of scripts or operators with device- or
interface-specific tooling to configure the device.  Sometimes this
configuration will happen before the device is probed by the vfio-pci
variant driver, sometimes between probing and opening the device.

I don't see why it becomes out of scope if the variant driver itself
provides some means for selecting a device profile.  We have evidence
both from mdev vGPUs and here (imo) that we can expect to see this
behavior, so why wouldn't we want to attempt some basic shared
interface for variant drivers to implement for selecting such a
profile, rather than add to this hodgepodge?

> > Jason had previously suggested a devlink interface for this, but I
> > understand that path had been shot down by devlink developers.    
> 
> I think we got support for some things, but supporting all things was
> shot down.
> 
> > Another obvious option is sysfs, where we might imagine an optional
> > "profiles" directory, perhaps under vfio-dev.  Attributes of
> > "available" and "current" could allow discovery and selection of a
> > profile similar to mdev types.  
> 
> IMHO it is a far too complex problem for sysfs.

Isn't it then just like devlink: not a silver bullet, but useful for
some configuration?  AIUI, what the devlink developers shot down was a
means to list the available profiles for a device and a means to
select one of those profiles.  There are already a variety of
attributes in sysfs that provide exactly this sort of behavior.
Specifying a profile through sysfs can be difficult, and I'm not
proposing sysfs profile support as a mandatory feature, but I'm also
not a fan of the vendor-specific sysfs approach that out-of-tree
drivers have taken.
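
Just to make that concrete, here's a minimal sketch of what I have in
mind.  This is purely hypothetical: the "profiles" group, the attribute
names, and the plumbing to hang it off the vfio-dev node are all
assumptions for illustration, not anything that exists today.

/*
 * Hypothetical sketch only: an optional attribute group a variant
 * driver could fill in to expose its profiles under the vfio-dev node.
 */
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/sysfs.h>

static const char * const demo_profiles[] = { "native", "transitional" };
static unsigned int demo_current;	/* per-device state in a real driver */

static ssize_t available_show(struct device *dev,
			      struct device_attribute *attr, char *buf)
{
	unsigned int i;
	ssize_t len = 0;

	/* One profile name per line */
	for (i = 0; i < ARRAY_SIZE(demo_profiles); i++)
		len += sysfs_emit_at(buf, len, "%s\n", demo_profiles[i]);
	return len;
}
static DEVICE_ATTR_RO(available);

static ssize_t cur_profile_show(struct device *dev,
				struct device_attribute *attr, char *buf)
{
	return sysfs_emit(buf, "%s\n", demo_profiles[demo_current]);
}

static ssize_t cur_profile_store(struct device *dev,
				 struct device_attribute *attr,
				 const char *buf, size_t count)
{
	unsigned int i;

	/* A real driver would also refuse changes while the device is open */
	for (i = 0; i < ARRAY_SIZE(demo_profiles); i++) {
		if (sysfs_streq(buf, demo_profiles[i])) {
			demo_current = i;
			return count;
		}
	}
	return -EINVAL;
}

/* Open-coded so the file can be named "current" without colliding with
 * the kernel's current macro */
static struct device_attribute dev_attr_cur_profile = {
	.attr	= { .name = "current", .mode = 0644 },
	.show	= cur_profile_show,
	.store	= cur_profile_store,
};

static struct attribute *demo_profile_attrs[] = {
	&dev_attr_available.attr,
	&dev_attr_cur_profile.attr,
	NULL,
};

/* Registered against the vfio-dev device, e.g. via device_add_group() */
static const struct attribute_group demo_profile_group = {
	.name	= "profiles",
	.attrs	= demo_profile_attrs,
};

Userspace would then see profiles/available and profiles/current under
/sys/class/vfio-dev/vfioX/ and could echo a profile name into "current"
before the device is opened.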

The mdev type interface is certainly not perfect, but from it we've
been able to develop mdevctl to allow persistent and complex
configurations of mdev devices.  I'd like to see the ability to do
something like that with variant drivers that offer multiple profiles
without always depending on vendor-specific interfaces.  Thanks,

Alex

