From: Christoph Hellwig <hch@lst.de>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Christoph Hellwig <hch@lst.de>, Lei Rao <lei.rao@intel.com>,
	kbusch@kernel.org, axboe@fb.com, kch@nvidia.com,
	sagi@grimberg.me, alex.williamson@redhat.com, cohuck@redhat.com,
	yishaih@nvidia.com, shameerali.kolothum.thodi@huawei.com,
	kevin.tian@intel.com, mjrosato@linux.ibm.com,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	kvm@vger.kernel.org, eddie.dong@intel.com, yadong.li@intel.com,
	yi.l.liu@intel.com, Konrad.wilk@oracle.com,
	stephen@eideticom.com, hang.yuan@intel.com
Subject: Re: [RFC PATCH 1/5] nvme-pci: add function nvme_submit_vf_cmd to issue admin commands for VF driver.
Date: Tue, 13 Dec 2022 17:08:07 +0100	[thread overview]
Message-ID: <20221213160807.GA626@lst.de> (raw)
In-Reply-To: <Y5iFnw5i3vI9iMFY@ziepe.ca>

On Tue, Dec 13, 2022 at 10:01:03AM -0400, Jason Gunthorpe wrote:
> > So now we need to write a vfio shim for every function even if there
> > is absolutely nothing special about that function?  Migrating really
> > is the controlling function's behavior, and writing a new vfio bit
> > for every controlled thing just does not scale.
> 
> Huh? "does not scale?" We are looking at boilerplates of around 20-30
> lines to make a VFIO driver for a real PCI device. Why is that even
> something we should worry about optimizing?

But we need a new driver for every controlled function now, which
is very different from the classic VFIO model where we had one
vfio_pci.
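
For reference, the boilerplate being talked about looks roughly like the
sketch below.  It is modeled on the vfio_pci_core helpers as they stand
around v6.1 (the helper names have been shifting between releases, so
treat it as an approximation); the myvf_* names and the PCI IDs are
placeholders, and a real variant driver would wire its migration ops up
in open_device:

#include <linux/module.h>
#include <linux/pci.h>
#include <linux/vfio_pci_core.h>

static int myvf_open_device(struct vfio_device *core_vdev)
{
	struct vfio_pci_core_device *vdev =
		container_of(core_vdev, struct vfio_pci_core_device, vdev);
	int ret;

	ret = vfio_pci_core_enable(vdev);
	if (ret)
		return ret;
	/* a real variant driver would hook up its migration ops here */
	vfio_pci_core_finish_enable(vdev);
	return 0;
}

static const struct vfio_device_ops myvf_vfio_ops = {
	.name		= "myvf-vfio-pci",
	.init		= vfio_pci_core_init_dev,
	.release	= vfio_pci_core_release_dev,
	.open_device	= myvf_open_device,
	.close_device	= vfio_pci_core_close_device,
	.ioctl		= vfio_pci_core_ioctl,
	.device_feature	= vfio_pci_core_ioctl_feature,
	.read		= vfio_pci_core_read,
	.write		= vfio_pci_core_write,
	.mmap		= vfio_pci_core_mmap,
	.request	= vfio_pci_core_request,
	.match		= vfio_pci_core_match,
};

static int myvf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	struct vfio_pci_core_device *vdev;
	int ret;

	vdev = vfio_alloc_device(vfio_pci_core_device, vdev, &pdev->dev,
				 &myvf_vfio_ops);
	if (IS_ERR(vdev))
		return PTR_ERR(vdev);

	dev_set_drvdata(&pdev->dev, vdev);
	ret = vfio_pci_core_register_device(vdev);
	if (ret) {
		vfio_put_device(&vdev->vdev);
		return ret;
	}
	return 0;
}

static void myvf_remove(struct pci_dev *pdev)
{
	struct vfio_pci_core_device *vdev = dev_get_drvdata(&pdev->dev);

	vfio_pci_core_unregister_device(vdev);
	vfio_put_device(&vdev->vdev);
}

static const struct pci_device_id myvf_ids[] = {
	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(0x1234, 0x5678) }, /* placeholder */
	{}
};
MODULE_DEVICE_TABLE(pci, myvf_ids);

static struct pci_driver myvf_driver = {
	.name			= "myvf-vfio-pci",
	.id_table		= myvf_ids,
	.probe			= myvf_probe,
	.remove			= myvf_remove,
	.driver_managed_dma	= true,
};
module_pci_driver(myvf_driver);
MODULE_LICENSE("GPL");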

> And when you get into exciting future devices like SIOV you already
> need to make a special VFIO driver anyhow.

You need special support for it.  It's probably not another
Linux driver but part of the parent one, though.

> So far 100% of the drivers that have been presented, including the two
> RFC ones, have entanglements between live migration and vfio. Shifting
> things to dev/live_migration doesn't make the "communication problem"
> go away, it just shifts it into another subsystem.

The main entanglement seems to be that it needs to support a vfio
interface for live migration while the actual commands go to the
parent device.

> This is my point, I've yet to even see a driver that meets your
> theoretical standard that it can exist without vfio entanglement.

It can't right now due to the VFIO design.

> > While creating the VFs from the PF driver makes a lot more sense,
> > remember that vfio is absolutely not the only use case for VFs.
> > There are plenty use cases where you want to use them with the normal
> > kernel driver as well.  So the interface to create VFs needs a way
> > to decide if it should be vfio exported, or use the normal kernel
> > binding.
> 
> Yes, that is why this problem has been open for so long. Fixing it
> well requires some reconsideration of how the driver core works :(
> 
> It is worse than just VFIO vs one kernel driver, like mlx5 could spawn
> a controlled function that is NVMe, VDPA, mlx5, virtio-net, VFIO,
> etc.

This seems to violate the PCIe spec, which says:

"All VFs associated with a PF must be the same device type as the PF,
(e.g., the same network device type or the same storage device type.)",

which is also enforced by not allowing the vendor/device/class
fields to be read from the VFs themselves.

(not that I'm arguing that this is a good limit, but that's how
PCIe does it).
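
(You can see that in the config space layout: the VF's Device ID isn't
in the VF's own header at all, it comes from the VF Device ID field in
the PF's SR-IOV capability.  A quick userspace sketch that pulls it out
of the PF follows; the sysfs path is a made-up example, reading the
extended config space needs root, and it assumes a little-endian host:)

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PCI_EXT_CAP_ID_SRIOV	0x0010
#define PCI_SRIOV_VF_DID	0x1a	/* VF Device ID inside the capability */

int main(void)
{
	const char *path = "/sys/bus/pci/devices/0000:3b:00.0/config";
	uint8_t cfg[4096];
	size_t n = 0;
	FILE *f = fopen(path, "rb");

	if (f) {
		n = fread(cfg, 1, sizeof(cfg), f);
		fclose(f);
	}
	if (n < 0x100) {
		fprintf(stderr, "cannot read extended config space of %s\n", path);
		return 1;
	}

	/* walk the extended capability list, which starts at offset 0x100 */
	for (size_t pos = 0x100; pos && pos + 4 <= n; ) {
		uint32_t hdr;

		memcpy(&hdr, cfg + pos, 4);
		if ((hdr & 0xffff) == PCI_EXT_CAP_ID_SRIOV &&
		    pos + PCI_SRIOV_VF_DID + 2 <= n) {
			uint16_t vf_did;

			memcpy(&vf_did, cfg + pos + PCI_SRIOV_VF_DID, 2);
			printf("VF Device ID (from the PF's SR-IOV cap): 0x%04x\n",
			       vf_did);
			return 0;
		}
		pos = hdr >> 20;	/* next capability offset, bits 31:20 */
	}
	fprintf(stderr, "no SR-IOV capability found\n");
	return 1;
}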

> When we create the function we really want to tell the device what
> kind of function it is, and that also tells the kernel what driver
> should be bound to it.

I'd rather have different ways to probe by passing a "kind" or "type"
argument along with the device IDs during probing, e.g. "driver" and
"vfio", and then only match for the kind that the creator of the device
added it to the device model for.
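
Something like this in spirit (purely a userspace toy model of the
matching; none of these structures exist in the driver core, and all
names and IDs are made up):

#include <stdio.h>
#include <string.h>
#include <stddef.h>

/* what the creator of the function records about it */
struct toy_device {
	const char *name;
	unsigned int vendor, device;
	const char *kind;	/* "driver" or "vfio", set at creation time */
};

/* one entry of a driver's ID table, now including the kind it binds to */
struct toy_device_id {
	unsigned int vendor, device;
	const char *kind;
};

static int toy_match(const struct toy_device *dev,
		     const struct toy_device_id *ids, size_t nr_ids)
{
	for (size_t i = 0; i < nr_ids; i++) {
		if (ids[i].vendor == dev->vendor &&
		    ids[i].device == dev->device &&
		    strcmp(ids[i].kind, dev->kind) == 0)
			return 1;
	}
	return 0;
}

int main(void)
{
	/* the creator decided vf0 is for normal in-kernel use, vf1 for vfio */
	struct toy_device vf0 = { "vf0", 0x1234, 0x5678, "driver" };
	struct toy_device vf1 = { "vf1", 0x1234, 0x5678, "vfio" };

	/* the normal kernel driver only registers IDs of kind "driver" ... */
	struct toy_device_id kernel_ids[] = { { 0x1234, 0x5678, "driver" } };
	/* ... and the vfio shim only registers IDs of kind "vfio" */
	struct toy_device_id vfio_ids[] = { { 0x1234, 0x5678, "vfio" } };

	printf("kernel driver binds vf0=%d vf1=%d\n",
	       toy_match(&vf0, kernel_ids, 1), toy_match(&vf1, kernel_ids, 1));
	printf("vfio driver   binds vf0=%d vf1=%d\n",
	       toy_match(&vf0, vfio_ids, 1), toy_match(&vf1, vfio_ids, 1));
	return 0;
}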

> mlx5 even has weird limitations, like a controlled function that is
> live migration capable having fewer features than a function that is
> not. So the user must specify what parameters it wants the controlled
> function to have.

I don't think that is weird.  If you want to live migrate, you need to

 a) make sure the feature set is compatible with the other side
 b) make sure there is only state that actually is migratable

so I'd expect that for any other sufficiently complex device.  NVMe
for sure will have limits like this.
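
That is, before allowing a migration the controlling side ends up doing
a check along these lines (a toy sketch; the feature bits are made up
and just stand in for whatever the real device negotiates):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define FEAT_SGLS		(1u << 0)
#define FEAT_WRITE_ZEROES	(1u << 1)
#define FEAT_VENDOR_LOG		(1u << 2)	/* state we can't carry across */

/* features whose device state we actually know how to save and restore */
#define MIGRATABLE_FEATURES	(FEAT_SGLS | FEAT_WRITE_ZEROES)

static bool can_live_migrate(uint32_t src_features, uint32_t dst_supported)
{
	/* b) the function must only carry state that actually is migratable */
	if (src_features & ~MIGRATABLE_FEATURES)
		return false;
	/* a) the destination must offer at least the source's feature set */
	if (src_features & ~dst_supported)
		return false;
	return true;
}

int main(void)
{
	uint32_t dst = FEAT_SGLS | FEAT_WRITE_ZEROES;

	printf("%d\n", can_live_migrate(FEAT_SGLS, dst));                   /* 1 */
	printf("%d\n", can_live_migrate(FEAT_SGLS | FEAT_VENDOR_LOG, dst)); /* 0 */
	printf("%d\n", can_live_migrate(FEAT_WRITE_ZEROES, FEAT_SGLS));     /* 0 */
	return 0;
}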


Thread overview: 65+ messages
2022-12-06  5:58 [RFC PATCH 0/5] Add new VFIO PCI driver for NVMe devices Lei Rao
2022-12-06  5:58 ` [RFC PATCH 1/5] nvme-pci: add function nvme_submit_vf_cmd to issue admin commands for VF driver Lei Rao
2022-12-06  6:19   ` Christoph Hellwig
2022-12-06 13:44     ` Jason Gunthorpe
2022-12-06 13:51       ` Keith Busch
2022-12-06 14:27         ` Jason Gunthorpe
2022-12-06 13:58       ` Christoph Hellwig
2022-12-06 15:22         ` Jason Gunthorpe
2022-12-06 15:38           ` Christoph Hellwig
2022-12-06 15:51             ` Jason Gunthorpe
2022-12-06 16:55               ` Christoph Hellwig
2022-12-06 19:15                 ` Jason Gunthorpe
2022-12-07  2:30                   ` Max Gurtovoy
2022-12-07  7:58                     ` Christoph Hellwig
2022-12-09  2:11                       ` Tian, Kevin
2022-12-12  7:41                         ` Christoph Hellwig
2022-12-07  7:54                   ` Christoph Hellwig
2022-12-07 10:59                     ` Max Gurtovoy
2022-12-07 13:46                       ` Christoph Hellwig
2022-12-07 14:50                         ` Max Gurtovoy
2022-12-07 16:35                           ` Christoph Hellwig
2022-12-07 13:34                     ` Jason Gunthorpe
2022-12-07 13:52                       ` Christoph Hellwig
2022-12-07 15:07                         ` Jason Gunthorpe
2022-12-07 16:38                           ` Christoph Hellwig
2022-12-07 17:31                             ` Jason Gunthorpe
2022-12-07 18:33                               ` Christoph Hellwig
2022-12-07 20:08                                 ` Jason Gunthorpe
2022-12-09  2:50                                   ` Tian, Kevin
2022-12-09 18:56                                     ` Dong, Eddie
2022-12-11 11:39                                   ` Max Gurtovoy
2022-12-12  7:55                                     ` Christoph Hellwig
2022-12-12 14:49                                       ` Max Gurtovoy
2022-12-12  7:50                                   ` Christoph Hellwig
2022-12-13 14:01                                     ` Jason Gunthorpe
2022-12-13 16:08                                       ` Christoph Hellwig [this message]
2022-12-13 17:49                                         ` Jason Gunthorpe
2022-12-06  5:58 ` [RFC PATCH 2/5] nvme-vfio: add new vfio-pci driver for NVMe device Lei Rao
2022-12-06  5:58 ` [RFC PATCH 3/5] nvme-vfio: enable the function of VFIO live migration Lei Rao
2023-01-19 10:21   ` Max Gurtovoy
2023-02-09  9:09     ` Rao, Lei
2022-12-06  5:58 ` [RFC PATCH 4/5] nvme-vfio: check if the hardware supports " Lei Rao
2022-12-06 13:47   ` Keith Busch
2022-12-06  5:58 ` [RFC PATCH 5/5] nvme-vfio: Add a document for the NVMe device Lei Rao
2022-12-06  6:26   ` Christoph Hellwig
2022-12-06 13:05     ` Jason Gunthorpe
2022-12-06 13:09       ` Christoph Hellwig
2022-12-06 13:52         ` Jason Gunthorpe
2022-12-06 14:00           ` Christoph Hellwig
2022-12-06 14:20             ` Jason Gunthorpe
2022-12-06 14:31               ` Christoph Hellwig
2022-12-06 14:48                 ` Jason Gunthorpe
2022-12-06 15:01                   ` Christoph Hellwig
2022-12-06 15:28                     ` Jason Gunthorpe
2022-12-06 15:35                       ` Christoph Hellwig
2022-12-06 18:00                         ` Dong, Eddie
2022-12-12  7:57                           ` Christoph Hellwig
2022-12-11 12:05                     ` Max Gurtovoy
2022-12-11 13:21                       ` Rao, Lei
2022-12-11 14:51                         ` Max Gurtovoy
2022-12-12  1:20                           ` Rao, Lei
2022-12-12  8:09                           ` Christoph Hellwig
2022-12-09  2:05         ` Tian, Kevin
2022-12-09 16:53           ` Li, Yadong
2022-12-12  8:11             ` Christoph Hellwig
