From: "Tian, Kevin" <kevin.tian@intel.com>
To: Jason Gunthorpe <jgg@mellanox.com>
Cc: "Raj, Ashok" <ashok.raj@intel.com>,
	"Jiang, Dave" <dave.jiang@intel.com>,
	"vkoul@kernel.org" <vkoul@kernel.org>,
	"megha.dey@linux.intel.com" <megha.dey@linux.intel.com>,
	"maz@kernel.org" <maz@kernel.org>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"Pan, Jacob jun" <jacob.jun.pan@intel.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"Lu, Baolu" <baolu.lu@intel.com>,
	"Kumar, Sanjay K" <sanjay.k.kumar@intel.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	"Lin, Jing" <jing.lin@intel.com>,
	"Williams, Dan J" <dan.j.williams@intel.com>,
	"kwankhede@nvidia.com" <kwankhede@nvidia.com>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"parav@mellanox.com" <parav@mellanox.com>,
	"dmaengine@vger.kernel.org" <dmaengine@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: RE: [PATCH RFC 00/15] Add VFIO mediated device support and IMS support for the idxd driver.
Date: Fri, 24 Apr 2020 16:25:56 +0000
Message-ID: <AADFC41AFE54684AB9EE6CBC0274A5D19D8A808B@SHSMSX104.ccr.corp.intel.com>
In-Reply-To: <20200424124444.GJ13640@mellanox.com>

> From: Jason Gunthorpe
> Sent: Friday, April 24, 2020 8:45 PM
> 
> On Fri, Apr 24, 2020 at 03:27:41AM +0000, Tian, Kevin wrote:
> 
> > > > That by itself doesn't translate to what a guest typically does
> > > > with a VDEV. There are other control paths that need to be serviced
> > > > from the kernel code via VFIO. For speed path operations like
> > > > ringing doorbells and such they are directly managed from guest.
> > >
> > > You don't need vfio to mmap BAR pages to userspace. The unique thing
> > > that vfio gives is it provides a way to program the classic non-PASID
> > > iommu, which you are not using here.
> >
> > That unique thing is indeed used here. Please note sharing CPU virtual
> > address space with device (what SVA API is invented for) is not the
> > purpose of this series. We still rely on classic non-PASID iommu
> > programming, i.e. mapping/unmapping IOVA->HPA per iommu_domain.
> > Although
> > we do use PASID to tag ADI, the PASID is contained within iommu_domain
> > and invisible to VFIO. From userspace p.o.v, this is a device passthrough
> > usage instead of PASID-based address space binding.
> 
> So you have PASID support but don't use it? Why? PASID is much better
> than classic VFIO iommu, it doesn't require page pinning...

PASID and I/O page fault (through ATS/PRI) are orthogonal things; don't
conflate the two. The host driver can tag a PASID to an ADI so that
every DMA request out of that ADI carries a PASID prefix, allowing VT-d
to do PASID-granular DMA isolation. However, I/O page fault support
cannot be taken for granted. A Scalable IOV device may support PASID
without supporting ATS/PRI. Even when ATS/PRI is supported, whether I/O
page faults can be tolerated is decided by the work queue mode that the
guest configures. For example, if the guest puts the work queue in
non-faultable transaction mode, the device does not issue PRI and
simply reports an error when no valid IOMMU mapping exists.
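
To make the first point concrete, below is a minimal sketch of that
host-side flow, assuming the auxiliary-domain API in
include/linux/iommu.h as of v5.6 (iommu_dev_enable_feature(),
iommu_aux_attach_device(), iommu_aux_get_pasid());
adi_attach_domain() and its arguments are illustrative names, and
error unwinding is trimmed:

#include <linux/iommu.h>

static int adi_attach_domain(struct device *parent,
			     struct iommu_domain *domain, int *pasid)
{
	int ret;

	/* Opt the parent device into auxiliary (PASID-granular) domains. */
	ret = iommu_dev_enable_feature(parent, IOMMU_DEV_FEAT_AUX);
	if (ret)
		return ret;

	/*
	 * Attach the domain as an aux domain of the parent: only DMA
	 * tagged with the PASID below is translated through this
	 * domain's IOVA->HPA mappings.
	 */
	ret = iommu_aux_attach_device(domain, parent);
	if (ret)
		return ret;

	/* The PASID the host driver then programs into the ADI. */
	*pasid = iommu_aux_get_pasid(domain, parent);
	return *pasid < 0 ? *pasid : 0;
}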

So in this series we support only the basic form, i.e. non-faultable
transactions, using the classic VFIO iommu interface plus
PASID-granular translation. We are working on virtual SVA support in
parallel. Once that feature is ready, I/O page faults can be
conditionally enabled according to the guest vIOMMU setting, e.g. when
the virtual context entry has page requests enabled, we enable nested
translation in the physical PASID entry, with the 1st level linking to
the guest page table (GVA->GPA) and the 2nd level carrying the
GPA->HPA mapping.
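
As a purely hypothetical sketch of that conditional step (none of
these names come from this series or from the VT-d driver; they only
illustrate the decision the host would make when the guest updates
its virtual context entry):

#include <linux/types.h>

/* Hypothetical per-vdev state mirrored from the guest vIOMMU. */
struct vdcm_viommu_state {
	bool page_request_en;	/* PRS bit in the virtual context entry */
	u64  guest_pgd;		/* guest 1st-level page table root (GPA) */
};

/*
 * Hypothetical helper: program nested translation into the physical
 * PASID entry, with the 1st level walking the guest page table
 * (GVA->GPA) and the 2nd level reusing the existing GPA->HPA domain.
 */
int vdcm_setup_nested(u64 guest_pgd);

static int vdcm_update_pasid_entry(struct vdcm_viommu_state *s)
{
	/* Non-faultable mode: keep the classic GPA->HPA mapping only. */
	if (!s->page_request_en)
		return 0;

	return vdcm_setup_nested(s->guest_pgd);
}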

> 
> > > > How do you propose to use the existing SVA api's  to also provide
> > > > full device emulation as opposed to using an existing infrastructure
> > > > that's already in place?
> > >
> > > You'd provide the 'full device emulation' in userspace (eg qemu),
> > > along side all the other device emulation. Device emulation does not
> > > belong in the kernel without a very good reason.
> >
> > The problem is that we are not doing full device emulation. It's based
> > on mediated passthrough. Some emulation logic requires close
> > engagement with kernel device driver, e.g. resource allocation, WQ
> > configuration, fault report, etc., while the detail interface is very vendor/
> > device specific (just like between PF and VF).
> 
> Which sounds like the fairly classic case of device emulation to me.
> 
> > idxd is just the first device that supports Scalable IOV. We have a
> > lot more coming later, in different types. Then putting such
> > emulation in user space means that Qemu needs to support all those
> > vendor specific interfaces for every new device which supports
> 
> It would be very sad to see an endless amount of device emulation code
> crammed into the kernel. Userspace is where device emulation is
> supposed to live. For security

I think providing a unified abstraction to userspace is also
important, which is what VFIO provides today. Using one set of VFIO
APIs to manage all kinds of mediated devices and VF devices is a major
gain. Inventing a new vDPA-like interface for every Scalable IOV or
equivalent device, by contrast, is overkill and doesn't scale. Also,
the actual emulation code in the idxd driver is small, setting aside
the PCI config space part, for which I already explained that most of
the logic could be shared between mdev device drivers.

> 
> qemu is the right place to put this stuff.
> 
> > > > Perhaps Alex can ease Jason's concerns?
> > >
> > > Last we talked Alex also had doubts on what mdev should be used
> > > for. It is a feature that seems to lack boundaries, and I'll note that
> > > when the discussion came up for VDPA, they eventually choose not to
> > > use VFIO.
> > >
> >
> > Is there a link to Alex's doubt? I'm not sure why vDPA didn't go
> > for VFIO, but imho it is a different story.
> 
> No, not at all. VDPA HW today is using what Intel has been calling
> ADI. But qemu already had the device emulation part in userspace, (all
> of the virtio emulation parts are in userspace) so they didn't try to
> put it in the kernel.
> 
> This is the pattern. User space is supposed to do the emulation parts,
> the kernel provides the raw elements to manage queues/etc - and it is
> not done through mdev.
> 
> > efficient for all vDPA type devices. However Scalable IOV is
> > similar to SR-IOV, only for resource partitioning. It doesn't change
> > the device programming interface, which could be in any vendor
> > specific form. Here VFIO mdev is good for providing an unified
> > interface for managing resource multiplexing of all such devices.
> 
> SIOV doesn't have a HW config space, and for some reason in these
> patches there is BAR emulation too. So, no, it is not like SR-IOV at
> all.
> 
> This is more like classic device emulation, presumably with some fast
> path for the data plane. ie just like VDPA :)
> 
> Jason

Thanks
Kevin

