dmaengine Archive on lore.kernel.org
 help / color / Atom feed
From: "Tian, Kevin" <kevin.tian@intel.com>
To: Jason Wang <jasowang@redhat.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	"Alex Williamson" <alex.williamson@redhat.com>
Cc: "Jiang, Dave" <dave.jiang@intel.com>,
	"vkoul@kernel.org" <vkoul@kernel.org>,
	"Dey, Megha" <megha.dey@intel.com>,
	"maz@kernel.org" <maz@kernel.org>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"Pan, Jacob jun" <jacob.jun.pan@intel.com>,
	"Raj, Ashok" <ashok.raj@intel.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"Lu, Baolu" <baolu.lu@intel.com>,
	"Kumar, Sanjay K" <sanjay.k.kumar@intel.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	"Lin, Jing" <jing.lin@intel.com>,
	"Williams, Dan J" <dan.j.williams@intel.com>,
	"kwankhede@nvidia.com" <kwankhede@nvidia.com>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"parav@mellanox.com" <parav@mellanox.com>,
	"Hansen, Dave" <dave.hansen@intel.com>,
	"netanelg@mellanox.com" <netanelg@mellanox.com>,
	"shahafs@mellanox.com" <shahafs@mellanox.com>,
	"yan.y.zhao@linux.intel.com" <yan.y.zhao@linux.intel.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"Ortiz, Samuel" <samuel.ortiz@intel.com>,
	"Hossain, Mona" <mona.hossain@intel.com>,
	"dmaengine@vger.kernel.org" <dmaengine@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: RE: [PATCH RFC v2 00/18] Add VFIO mediated device support and DEV-MSI support for the idxd driver
Date: Wed, 12 Aug 2020 04:05:23 +0000
Message-ID: <MWHPR11MB1645DDC2C87D533B2A09A2D58C420@MWHPR11MB1645.namprd11.prod.outlook.com> (raw)
In-Reply-To: <b59ce5b0-5530-1f30-9852-409f7c9f630a@redhat.com>

> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, August 12, 2020 11:28 AM
> 
> 
> On 2020/8/10 下午3:32, Tian, Kevin wrote:
> >> From: Jason Gunthorpe <jgg@nvidia.com>
> >> Sent: Friday, August 7, 2020 8:20 PM
> >>
> >> On Wed, Aug 05, 2020 at 07:22:58PM -0600, Alex Williamson wrote:
> >>
> >>> If you see this as an abuse of the framework, then let's identify those
> >>> specific issues and come up with a better approach.  As we've discussed
> >>> before, things like basic PCI config space emulation are acceptable
> >>> overhead and low risk (imo) and some degree of register emulation is
> >>> well within the territory of an mdev driver.
> >> What troubles me is that idxd already has a direct userspace interface
> >> to its HW, and does userspace DMA. The purpose of this mdev is to
> >> provide a second direct userspace interface that is a little different
> >> and trivially plugs into the virtualization stack.
> > No. Userspace DMA and subdevice passthrough (what mdev provides)
> > are two distinct usages IMO (at least in idxd context). and this might
> > be the main divergence between us, thus let me put more words here.
> > If we could reach consensus in this matter, which direction to go
> > would be clearer.
> >
> > First, a passthrough interface requires some unique requirements
> > which are not commonly observed in an userspace DMA interface, e.g.:
> >
> > - Tracking DMA dirty pages for live migration;
> > - A set of interfaces for using SVA inside guest;
> > 	* PASID allocation/free (on some platforms);
> > 	* bind/unbind guest mm/page table (nested translation);
> > 	* invalidate IOMMU cache/iotlb for guest page table changes;
> > 	* report page request from device to guest;
> > 	* forward page response from guest to device;
> > - Configuring irqbypass for posted interrupt;
> > - ...
> >
> > Second, a passthrough interface requires delegating raw controllability
> > of subdevice to guest driver, while the same delegation might not be
> > required for implementing an userspace DMA interface (especially for
> > modern devices which support SVA). For example, idxd allows following
> > setting per wq (guest driver may configure them in any combination):
> > 	- put in dedicated or shared mode;
> > 	- enable/disable SVA;
> > 	- Associate guest-provided PASID to MSI/IMS entry;
> > 	- set threshold;
> > 	- allow/deny privileged access;
> > 	- allocate/free interrupt handle (enlightened for guest);
> > 	- collect error status;
> > 	- ...
> >
> > We plan to support idxd userspace DMA with SVA. The driver just needs
> > to prepare a wq with a predefined configuration (e.g. shared, SVA,
> > etc.), bind the process mm to IOMMU (non-nested) and then map
> > the portal to userspace. The goal that userspace can do DMA to
> > associated wq doesn't change the fact that the wq is still *owned*
> > and *controlled* by kernel driver. However as far as passthrough
> > is concerned, the wq is considered 'owned' by the guest driver thus
> > we need an interface which can support low-level *controllability*
> > from guest driver. It is sort of a mess in uAPI when mixing the
> > two together.
> 
> 
> So for userspace drivers like DPDK, it can use both of the two uAPIs?

yes.

> 
> 
> >
> > Based on above two reasons, we see distinct requirements between
> > userspace DMA and passthrough interfaces, at least in idxd context
> > (though other devices may have less distinction in-between). Therefore,
> > we didn't see the value/necessity of reinventing the wheel that mdev
> > already handles well to evolve an simple application-oriented usespace
> > DMA interface to a complex guest-driver-oriented passthrough interface.
> > The complexity of doing so would incur far more kernel-side changes
> > than the portion of emulation code that you've been concerned about...
> >
> >> I don't think VFIO should be the only entry point to
> >> virtualization. If we say the universe of devices doing user space DMA
> >> must also implement a VFIO mdev to plug into virtualization then it
> >> will be alot of mdevs.
> > Certainly VFIO will not be the only entry point. and This has to be a
> > case-by-case decision.
> 
> 
> The problem is that if we tie all controls via VFIO uAPI, the other
> subsystem like vDPA is likely to duplicate them. I wonder if there is a
> way to decouple the vSVA out of VFIO uAPI?

vSVA is a per-device (either pdev or mdev) feature thus naturally should 
be managed by its device driver (VFIO or vDPA). From this angle some
duplication is inevitable given VFIO and vDPA are orthogonal passthrough
frameworks. Within the kernel the majority of vSVA handling is done by
IOMMU and IOASID modules thus most logic are shared.

> 
> 
> >   If an userspace DMA interface can be easily
> > adapted to be a passthrough one, it might be the choice.
> 
> 
> It's not that easy even for VFIO which requires a lot of new uAPIs and
> infrastructures(e.g mdev) to be invented.
> 
> 
> > But for idxd,
> > we see mdev a much better fit here, given the big difference between
> > what userspace DMA requires and what guest driver requires in this hw.
> 
> 
> A weak point for mdev is that it can't serve kernel subsystem other than
> VFIO. In this case, you need some other infrastructures (like [1]) to do
> this.

mdev is not exclusive from kernel usages. It's perfectly fine for a driver
to reserve some work queues for host usages, while wrapping others
into mdevs.

Thanks
Kevin

> 
> (For idxd, you probably don't need this, but it's pretty common in the
> case of networking or storage device.)
> 
> Thanks
> 
> [1] https://patchwork.kernel.org/patch/11280547/
> 
> 
> >
> >> I would prefer to see that the existing userspace interface have the
> >> extra needed bits for virtualization (eg by having appropriate
> >> internal kernel APIs to make this easy) and all the emulation to build
> >> the synthetic PCI device be done in userspace.
> > In the end what decides the direction is the amount of changes that
> > we have to put in kernel, not whether we call it 'emulation'. For idxd,
> > adding special passthrough requirements (guest SVA, dirty tracking,
> > etc.) and raw controllability to the simple userspace DMA interface
> > is for sure making kernel more complex than reusing the mdev
> > framework (plus some degree of emulation mockup behind). Not to
> > mention the merit of uAPI compatibility with mdev...
> >
> > Thanks
> > Kevin
> >


  reply index

Thread overview: 97+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-21 16:02 Dave Jiang
2020-07-21 16:02 ` [PATCH RFC v2 01/18] platform-msi: Introduce platform_msi_ops Dave Jiang
2020-07-21 16:02 ` [PATCH RFC v2 02/18] irq/dev-msi: Add support for a new DEV_MSI irq domain Dave Jiang
2020-07-21 16:13   ` Jason Gunthorpe
2020-07-22 16:50     ` Dey, Megha
2020-07-22 18:52   ` Marc Zyngier
2020-07-22 19:59     ` Jason Gunthorpe
2020-07-23  8:51       ` Marc Zyngier
2020-07-24  0:16         ` Jason Gunthorpe
2020-07-24  0:36           ` Thomas Gleixner
2020-08-05 19:18       ` Dey, Megha
2020-08-05 22:15         ` Jason Gunthorpe
2020-08-05 22:36           ` Dey, Megha
2020-08-05 22:53             ` Jason Gunthorpe
2020-08-06  0:13               ` Dey, Megha
2020-08-06  0:19                 ` Jason Gunthorpe
2020-08-06  0:32                   ` Dey, Megha
2020-08-06  0:46                     ` Jason Gunthorpe
2020-08-06 17:10                     ` Thomas Gleixner
2020-08-06 17:58                       ` Dey, Megha
2020-08-06 20:21                         ` Thomas Gleixner
2020-08-06 22:27                           ` Dey, Megha
2020-08-07  8:48                             ` Thomas Gleixner
2020-08-07 12:06                           ` Jason Gunthorpe
2020-08-07 12:38                             ` gregkh
2020-08-07 13:34                               ` Jason Gunthorpe
2020-08-07 16:47                                 ` Thomas Gleixner
2020-08-07 17:54                                   ` Dey, Megha
2020-08-07 18:39                                     ` Jason Gunthorpe
2020-08-07 20:31                                       ` Dey, Megha
2020-08-08 19:47                                     ` Thomas Gleixner
2020-08-10 21:46                                       ` Thomas Gleixner
2020-08-11  9:53                                         ` Thomas Gleixner
2020-08-11 18:46                                           ` Dey, Megha
2020-08-11 21:25                                             ` Thomas Gleixner
2020-08-11 18:39                                       ` Dey, Megha
2020-08-11 22:39                                         ` Thomas Gleixner
2020-08-07 15:22                             ` Thomas Gleixner
2020-08-05 18:55     ` Dey, Megha
2020-07-21 16:02 ` [PATCH RFC v2 03/18] irq/dev-msi: Create IR-DEV-MSI " Dave Jiang
2020-07-21 16:21   ` Jason Gunthorpe
2020-07-22 17:03     ` Dey, Megha
2020-07-22 17:33       ` Jason Gunthorpe
2020-07-22 20:44   ` Thomas Gleixner
2020-08-05 19:02     ` Dey, Megha
2020-07-21 16:02 ` [PATCH RFC v2 04/18] irq/dev-msi: Introduce APIs to allocate/free dev-msi interrupts Dave Jiang
2020-07-21 16:25   ` Jason Gunthorpe
2020-07-22 17:05     ` Dey, Megha
2020-07-22 17:35       ` Jason Gunthorpe
2020-08-05 20:19         ` Dey, Megha
2020-07-21 16:02 ` [PATCH RFC v2 05/18] dmaengine: idxd: add support for readonly config devices Dave Jiang
2020-07-21 16:02 ` [PATCH RFC v2 06/18] dmaengine: idxd: add interrupt handle request support Dave Jiang
2020-07-21 16:03 ` [PATCH RFC v2 07/18] dmaengine: idxd: add DEV-MSI support in base driver Dave Jiang
2020-07-21 16:03 ` [PATCH RFC v2 08/18] dmaengine: idxd: add device support functions in prep for mdev Dave Jiang
2020-07-21 16:03 ` [PATCH RFC v2 09/18] dmaengine: idxd: add basic mdev registration and helper functions Dave Jiang
2020-07-21 16:03 ` [PATCH RFC v2 10/18] dmaengine: idxd: add emulation rw routines Dave Jiang
2020-07-21 16:03 ` [PATCH RFC v2 11/18] dmaengine: idxd: prep for virtual device commands Dave Jiang
2020-07-21 16:03 ` [PATCH RFC v2 12/18] dmaengine: idxd: virtual device commands emulation Dave Jiang
2020-07-21 16:03 ` [PATCH RFC v2 13/18] dmaengine: idxd: ims setup for the vdcm Dave Jiang
2020-07-21 16:03 ` [PATCH RFC v2 14/18] dmaengine: idxd: add mdev type as a new wq type Dave Jiang
2020-07-21 16:03 ` [PATCH RFC v2 15/18] dmaengine: idxd: add dedicated wq mdev type Dave Jiang
2020-07-21 16:04 ` [PATCH RFC v2 16/18] dmaengine: idxd: add new wq state for mdev Dave Jiang
2020-07-21 16:04 ` [PATCH RFC v2 17/18] dmaengine: idxd: add error notification from host driver to mediated device Dave Jiang
2020-07-21 16:04 ` [PATCH RFC v2 18/18] dmaengine: idxd: add ABI documentation for mediated device support Dave Jiang
2020-07-21 16:28 ` [PATCH RFC v2 00/18] Add VFIO mediated device support and DEV-MSI support for the idxd driver Greg KH
2020-07-21 17:17   ` Dave Jiang
2020-07-21 21:35   ` Dan Williams
2020-07-21 16:45 ` Jason Gunthorpe
2020-07-21 18:00   ` Dave Jiang
2020-07-22 17:31     ` Dey, Megha
2020-07-22 18:16       ` Jason Gunthorpe
2020-07-21 23:54   ` Tian, Kevin
2020-07-24  0:19     ` Jason Gunthorpe
2020-08-06  1:22       ` Alex Williamson
2020-08-07 12:19         ` Jason Gunthorpe
2020-08-10  7:32           ` Tian, Kevin
2020-08-11 17:00             ` Alex Williamson
2020-08-12  1:58               ` Tian, Kevin
2020-08-12  2:36                 ` Alex Williamson
2020-08-12  3:35                   ` Tian, Kevin
2020-08-12  3:28             ` Jason Wang
2020-08-12  4:05               ` Tian, Kevin [this message]
2020-08-13  4:33                 ` Jason Wang
2020-08-13  5:26                   ` Tian, Kevin
2020-08-13  6:01                     ` Jason Wang
2020-08-14 13:23                       ` Jason Gunthorpe
2020-08-17  2:24                         ` Tian, Kevin
2020-08-14 13:35             ` Jason Gunthorpe
2020-08-17  2:12               ` Tian, Kevin
2020-08-18  0:43                 ` Jason Gunthorpe
2020-08-18  1:09                   ` Tian, Kevin
2020-08-18 11:50                     ` Jason Gunthorpe
2020-08-18 16:27                       ` Paolo Bonzini
2020-08-18 16:49                         ` Jason Gunthorpe
2020-08-18 17:05                           ` Paolo Bonzini
2020-08-18 17:18                             ` Jason Gunthorpe
2020-08-19  7:29                       ` Tian, Kevin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MWHPR11MB1645DDC2C87D533B2A09A2D58C420@MWHPR11MB1645.namprd11.prod.outlook.com \
    --to=kevin.tian@intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=baolu.lu@intel.com \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dmaengine@vger.kernel.org \
    --cc=eric.auger@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hpa@zytor.com \
    --cc=jacob.jun.pan@intel.com \
    --cc=jasowang@redhat.com \
    --cc=jgg@nvidia.com \
    --cc=jing.lin@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=megha.dey@intel.com \
    --cc=mona.hossain@intel.com \
    --cc=netanelg@mellanox.com \
    --cc=parav@mellanox.com \
    --cc=pbonzini@redhat.com \
    --cc=rafael@kernel.org \
    --cc=samuel.ortiz@intel.com \
    --cc=sanjay.k.kumar@intel.com \
    --cc=shahafs@mellanox.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=vkoul@kernel.org \
    --cc=x86@kernel.org \
    --cc=yan.y.zhao@linux.intel.com \
    --cc=yi.l.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

dmaengine Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/dmaengine/0 dmaengine/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dmaengine dmaengine/ https://lore.kernel.org/dmaengine \
		dmaengine@vger.kernel.org
	public-inbox-index dmaengine

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.dmaengine


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git