dmaengine Archive on lore.kernel.org
 help / color / Atom feed
From: "Tian, Kevin" <kevin.tian@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>,
	Jason Gunthorpe <jgg@nvidia.com>
Cc: "Jiang, Dave" <dave.jiang@intel.com>,
	"vkoul@kernel.org" <vkoul@kernel.org>,
	"Dey, Megha" <megha.dey@intel.com>,
	"maz@kernel.org" <maz@kernel.org>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"Pan, Jacob jun" <jacob.jun.pan@intel.com>,
	"Raj, Ashok" <ashok.raj@intel.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"Lu, Baolu" <baolu.lu@intel.com>,
	"Kumar, Sanjay K" <sanjay.k.kumar@intel.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	"Lin, Jing" <jing.lin@intel.com>,
	"Williams, Dan J" <dan.j.williams@intel.com>,
	"kwankhede@nvidia.com" <kwankhede@nvidia.com>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"parav@mellanox.com" <parav@mellanox.com>,
	"Hansen, Dave" <dave.hansen@intel.com>,
	"netanelg@mellanox.com" <netanelg@mellanox.com>,
	"shahafs@mellanox.com" <shahafs@mellanox.com>,
	"yan.y.zhao@linux.intel.com" <yan.y.zhao@linux.intel.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"Ortiz, Samuel" <samuel.ortiz@intel.com>,
	"Hossain, Mona" <mona.hossain@intel.com>,
	"dmaengine@vger.kernel.org" <dmaengine@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: RE: [PATCH RFC v2 00/18] Add VFIO mediated device support and DEV-MSI support for the idxd driver
Date: Wed, 12 Aug 2020 01:58:00 +0000
Message-ID: <MWHPR11MB16451E4BEC7BF29C241FD69C8C420@MWHPR11MB1645.namprd11.prod.outlook.com> (raw)
In-Reply-To: <20200811110036.7d337837@x1.home>

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, August 12, 2020 1:01 AM
> 
> On Mon, 10 Aug 2020 07:32:24 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Friday, August 7, 2020 8:20 PM
> > >
> > > On Wed, Aug 05, 2020 at 07:22:58PM -0600, Alex Williamson wrote:
> > >
> > > > If you see this as an abuse of the framework, then let's identify those
> > > > specific issues and come up with a better approach.  As we've discussed
> > > > before, things like basic PCI config space emulation are acceptable
> > > > overhead and low risk (imo) and some degree of register emulation is
> > > > well within the territory of an mdev driver.
> > >
> > > What troubles me is that idxd already has a direct userspace interface
> > > to its HW, and does userspace DMA. The purpose of this mdev is to
> > > provide a second direct userspace interface that is a little different
> > > and trivially plugs into the virtualization stack.
> >
> > No. Userspace DMA and subdevice passthrough (what mdev provides)
> > are two distinct usages IMO (at least in idxd context). and this might
> > be the main divergence between us, thus let me put more words here.
> > If we could reach consensus in this matter, which direction to go
> > would be clearer.
> >
> > First, a passthrough interface requires some unique requirements
> > which are not commonly observed in an userspace DMA interface, e.g.:
> >
> > - Tracking DMA dirty pages for live migration;
> > - A set of interfaces for using SVA inside guest;
> > 	* PASID allocation/free (on some platforms);
> > 	* bind/unbind guest mm/page table (nested translation);
> > 	* invalidate IOMMU cache/iotlb for guest page table changes;
> > 	* report page request from device to guest;
> > 	* forward page response from guest to device;
> > - Configuring irqbypass for posted interrupt;
> > - ...
> >
> > Second, a passthrough interface requires delegating raw controllability
> > of subdevice to guest driver, while the same delegation might not be
> > required for implementing an userspace DMA interface (especially for
> > modern devices which support SVA). For example, idxd allows following
> > setting per wq (guest driver may configure them in any combination):
> > 	- put in dedicated or shared mode;
> > 	- enable/disable SVA;
> > 	- Associate guest-provided PASID to MSI/IMS entry;
> > 	- set threshold;
> > 	- allow/deny privileged access;
> > 	- allocate/free interrupt handle (enlightened for guest);
> > 	- collect error status;
> > 	- ...
> >
> > We plan to support idxd userspace DMA with SVA. The driver just needs
> > to prepare a wq with a predefined configuration (e.g. shared, SVA,
> > etc.), bind the process mm to IOMMU (non-nested) and then map
> > the portal to userspace. The goal that userspace can do DMA to
> > associated wq doesn't change the fact that the wq is still *owned*
> > and *controlled* by kernel driver. However as far as passthrough
> > is concerned, the wq is considered 'owned' by the guest driver thus
> > we need an interface which can support low-level *controllability*
> > from guest driver. It is sort of a mess in uAPI when mixing the
> > two together.
> >
> > Based on above two reasons, we see distinct requirements between
> > userspace DMA and passthrough interfaces, at least in idxd context
> > (though other devices may have less distinction in-between). Therefore,
> > we didn't see the value/necessity of reinventing the wheel that mdev
> > already handles well to evolve an simple application-oriented usespace
> > DMA interface to a complex guest-driver-oriented passthrough interface.
> > The complexity of doing so would incur far more kernel-side changes
> > than the portion of emulation code that you've been concerned about...
> >
> > >
> > > I don't think VFIO should be the only entry point to
> > > virtualization. If we say the universe of devices doing user space DMA
> > > must also implement a VFIO mdev to plug into virtualization then it
> > > will be alot of mdevs.
> >
> > Certainly VFIO will not be the only entry point. and This has to be a
> > case-by-case decision.  If an userspace DMA interface can be easily
> > adapted to be a passthrough one, it might be the choice. But for idxd,
> > we see mdev a much better fit here, given the big difference between
> > what userspace DMA requires and what guest driver requires in this hw.
> >
> > >
> > > I would prefer to see that the existing userspace interface have the
> > > extra needed bits for virtualization (eg by having appropriate
> > > internal kernel APIs to make this easy) and all the emulation to build
> > > the synthetic PCI device be done in userspace.
> >
> > In the end what decides the direction is the amount of changes that
> > we have to put in kernel, not whether we call it 'emulation'. For idxd,
> > adding special passthrough requirements (guest SVA, dirty tracking,
> > etc.) and raw controllability to the simple userspace DMA interface
> > is for sure making kernel more complex than reusing the mdev
> > framework (plus some degree of emulation mockup behind). Not to
> > mention the merit of uAPI compatibility with mdev...
> 
> I agree with a lot of this argument, exposing a device through a
> userspace interface versus allowing user access to a device through a
> userspace interface are different levels of abstraction and control.
> In an ideal world, perhaps we could compose one from the other, but I
> don't think the existence of one is proof that the other is redundant.
> That's not to say that mdev/vfio isn't ripe for abuse in this space,
> but I'm afraid the test for that abuse is probably much more subtle.
> 
> I'll also remind folks that LPC is coming up in just a couple short
> weeks and this might be something we should discuss (virtually)
> in-person.  uconf CfPs are currently open. </plug>   Thanks,
> 

Yes, LPC is a good place to reach consensus. btw I saw there is 
already one VFIO topic called "device assignment/sub-assignment".
Do you think whether this can be covered under that topic, or
makes more sense to be a new one?

Thanks
Kevin

  reply index

Thread overview: 97+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-21 16:02 Dave Jiang
2020-07-21 16:02 ` [PATCH RFC v2 01/18] platform-msi: Introduce platform_msi_ops Dave Jiang
2020-07-21 16:02 ` [PATCH RFC v2 02/18] irq/dev-msi: Add support for a new DEV_MSI irq domain Dave Jiang
2020-07-21 16:13   ` Jason Gunthorpe
2020-07-22 16:50     ` Dey, Megha
2020-07-22 18:52   ` Marc Zyngier
2020-07-22 19:59     ` Jason Gunthorpe
2020-07-23  8:51       ` Marc Zyngier
2020-07-24  0:16         ` Jason Gunthorpe
2020-07-24  0:36           ` Thomas Gleixner
2020-08-05 19:18       ` Dey, Megha
2020-08-05 22:15         ` Jason Gunthorpe
2020-08-05 22:36           ` Dey, Megha
2020-08-05 22:53             ` Jason Gunthorpe
2020-08-06  0:13               ` Dey, Megha
2020-08-06  0:19                 ` Jason Gunthorpe
2020-08-06  0:32                   ` Dey, Megha
2020-08-06  0:46                     ` Jason Gunthorpe
2020-08-06 17:10                     ` Thomas Gleixner
2020-08-06 17:58                       ` Dey, Megha
2020-08-06 20:21                         ` Thomas Gleixner
2020-08-06 22:27                           ` Dey, Megha
2020-08-07  8:48                             ` Thomas Gleixner
2020-08-07 12:06                           ` Jason Gunthorpe
2020-08-07 12:38                             ` gregkh
2020-08-07 13:34                               ` Jason Gunthorpe
2020-08-07 16:47                                 ` Thomas Gleixner
2020-08-07 17:54                                   ` Dey, Megha
2020-08-07 18:39                                     ` Jason Gunthorpe
2020-08-07 20:31                                       ` Dey, Megha
2020-08-08 19:47                                     ` Thomas Gleixner
2020-08-10 21:46                                       ` Thomas Gleixner
2020-08-11  9:53                                         ` Thomas Gleixner
2020-08-11 18:46                                           ` Dey, Megha
2020-08-11 21:25                                             ` Thomas Gleixner
2020-08-11 18:39                                       ` Dey, Megha
2020-08-11 22:39                                         ` Thomas Gleixner
2020-08-07 15:22                             ` Thomas Gleixner
2020-08-05 18:55     ` Dey, Megha
2020-07-21 16:02 ` [PATCH RFC v2 03/18] irq/dev-msi: Create IR-DEV-MSI " Dave Jiang
2020-07-21 16:21   ` Jason Gunthorpe
2020-07-22 17:03     ` Dey, Megha
2020-07-22 17:33       ` Jason Gunthorpe
2020-07-22 20:44   ` Thomas Gleixner
2020-08-05 19:02     ` Dey, Megha
2020-07-21 16:02 ` [PATCH RFC v2 04/18] irq/dev-msi: Introduce APIs to allocate/free dev-msi interrupts Dave Jiang
2020-07-21 16:25   ` Jason Gunthorpe
2020-07-22 17:05     ` Dey, Megha
2020-07-22 17:35       ` Jason Gunthorpe
2020-08-05 20:19         ` Dey, Megha
2020-07-21 16:02 ` [PATCH RFC v2 05/18] dmaengine: idxd: add support for readonly config devices Dave Jiang
2020-07-21 16:02 ` [PATCH RFC v2 06/18] dmaengine: idxd: add interrupt handle request support Dave Jiang
2020-07-21 16:03 ` [PATCH RFC v2 07/18] dmaengine: idxd: add DEV-MSI support in base driver Dave Jiang
2020-07-21 16:03 ` [PATCH RFC v2 08/18] dmaengine: idxd: add device support functions in prep for mdev Dave Jiang
2020-07-21 16:03 ` [PATCH RFC v2 09/18] dmaengine: idxd: add basic mdev registration and helper functions Dave Jiang
2020-07-21 16:03 ` [PATCH RFC v2 10/18] dmaengine: idxd: add emulation rw routines Dave Jiang
2020-07-21 16:03 ` [PATCH RFC v2 11/18] dmaengine: idxd: prep for virtual device commands Dave Jiang
2020-07-21 16:03 ` [PATCH RFC v2 12/18] dmaengine: idxd: virtual device commands emulation Dave Jiang
2020-07-21 16:03 ` [PATCH RFC v2 13/18] dmaengine: idxd: ims setup for the vdcm Dave Jiang
2020-07-21 16:03 ` [PATCH RFC v2 14/18] dmaengine: idxd: add mdev type as a new wq type Dave Jiang
2020-07-21 16:03 ` [PATCH RFC v2 15/18] dmaengine: idxd: add dedicated wq mdev type Dave Jiang
2020-07-21 16:04 ` [PATCH RFC v2 16/18] dmaengine: idxd: add new wq state for mdev Dave Jiang
2020-07-21 16:04 ` [PATCH RFC v2 17/18] dmaengine: idxd: add error notification from host driver to mediated device Dave Jiang
2020-07-21 16:04 ` [PATCH RFC v2 18/18] dmaengine: idxd: add ABI documentation for mediated device support Dave Jiang
2020-07-21 16:28 ` [PATCH RFC v2 00/18] Add VFIO mediated device support and DEV-MSI support for the idxd driver Greg KH
2020-07-21 17:17   ` Dave Jiang
2020-07-21 21:35   ` Dan Williams
2020-07-21 16:45 ` Jason Gunthorpe
2020-07-21 18:00   ` Dave Jiang
2020-07-22 17:31     ` Dey, Megha
2020-07-22 18:16       ` Jason Gunthorpe
2020-07-21 23:54   ` Tian, Kevin
2020-07-24  0:19     ` Jason Gunthorpe
2020-08-06  1:22       ` Alex Williamson
2020-08-07 12:19         ` Jason Gunthorpe
2020-08-10  7:32           ` Tian, Kevin
2020-08-11 17:00             ` Alex Williamson
2020-08-12  1:58               ` Tian, Kevin [this message]
2020-08-12  2:36                 ` Alex Williamson
2020-08-12  3:35                   ` Tian, Kevin
2020-08-12  3:28             ` Jason Wang
2020-08-12  4:05               ` Tian, Kevin
2020-08-13  4:33                 ` Jason Wang
2020-08-13  5:26                   ` Tian, Kevin
2020-08-13  6:01                     ` Jason Wang
2020-08-14 13:23                       ` Jason Gunthorpe
2020-08-17  2:24                         ` Tian, Kevin
2020-08-14 13:35             ` Jason Gunthorpe
2020-08-17  2:12               ` Tian, Kevin
2020-08-18  0:43                 ` Jason Gunthorpe
2020-08-18  1:09                   ` Tian, Kevin
2020-08-18 11:50                     ` Jason Gunthorpe
2020-08-18 16:27                       ` Paolo Bonzini
2020-08-18 16:49                         ` Jason Gunthorpe
2020-08-18 17:05                           ` Paolo Bonzini
2020-08-18 17:18                             ` Jason Gunthorpe
2020-08-19  7:29                       ` Tian, Kevin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MWHPR11MB16451E4BEC7BF29C241FD69C8C420@MWHPR11MB1645.namprd11.prod.outlook.com \
    --to=kevin.tian@intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=baolu.lu@intel.com \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dmaengine@vger.kernel.org \
    --cc=eric.auger@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hpa@zytor.com \
    --cc=jacob.jun.pan@intel.com \
    --cc=jgg@nvidia.com \
    --cc=jing.lin@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=megha.dey@intel.com \
    --cc=mona.hossain@intel.com \
    --cc=netanelg@mellanox.com \
    --cc=parav@mellanox.com \
    --cc=pbonzini@redhat.com \
    --cc=rafael@kernel.org \
    --cc=samuel.ortiz@intel.com \
    --cc=sanjay.k.kumar@intel.com \
    --cc=shahafs@mellanox.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=vkoul@kernel.org \
    --cc=x86@kernel.org \
    --cc=yan.y.zhao@linux.intel.com \
    --cc=yi.l.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

dmaengine Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/dmaengine/0 dmaengine/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dmaengine dmaengine/ https://lore.kernel.org/dmaengine \
		dmaengine@vger.kernel.org
	public-inbox-index dmaengine

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.dmaengine


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git