linux-pci.vger.kernel.org archive mirror
From: "Tian, Kevin" <kevin.tian@intel.com>
To: Jason Gunthorpe <jgg@mellanox.com>, "Raj, Ashok" <ashok.raj@intel.com>
Cc: "Jiang, Dave" <dave.jiang@intel.com>,
	"vkoul@kernel.org" <vkoul@kernel.org>,
	"megha.dey@linux.intel.com" <megha.dey@linux.intel.com>,
	"maz@kernel.org" <maz@kernel.org>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"Pan, Jacob jun" <jacob.jun.pan@intel.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"Lu, Baolu" <baolu.lu@intel.com>,
	"Kumar, Sanjay K" <sanjay.k.kumar@intel.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	"Lin, Jing" <jing.lin@intel.com>,
	"Williams, Dan J" <dan.j.williams@intel.com>,
	"kwankhede@nvidia.com" <kwankhede@nvidia.com>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"parav@mellanox.com" <parav@mellanox.com>,
	"dmaengine@vger.kernel.org" <dmaengine@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: RE: [PATCH RFC 00/15] Add VFIO mediated device support and IMS support for the idxd driver.
Date: Fri, 24 Apr 2020 03:27:41 +0000
Message-ID: <AADFC41AFE54684AB9EE6CBC0274A5D19D8960F9@SHSMSX104.ccr.corp.intel.com>
In-Reply-To: <20200423191217.GD13640@mellanox.com>

> From: Jason Gunthorpe <jgg@mellanox.com>
> Sent: Friday, April 24, 2020 3:12 AM
> 
> On Wed, Apr 22, 2020 at 02:14:36PM -0700, Raj, Ashok wrote:
> > Hi Jason
> >
> > > > >
> > > > > I'm feeling really skeptical that adding all this PCI config space and
> > > > > MMIO BAR emulation to the kernel just to cram this into a VFIO
> > > > > interface is a good idea, that kind of stuff is much safer in
> > > > > userspace.
> > > > >
> > > > > Particularly since vfio is not really needed once a driver is using
> > > > > the PASID stuff. We already have general code for drivers to use to
> > > > > attach a PASID to a mm_struct - and using vfio while disabling all the
> > > > > DMA/iommu config really seems like an abuse.
> > > >
> > > > Well, this series is for virtualizing idxd device to VMs, instead of
> > > > supporting SVA for bare metal processes. idxd implements a
> > > > hardware-assisted mediated device technique called Intel Scalable
> > > > I/O Virtualization,
> > >
> > > I'm familiar with the intel naming scheme.
> > >
> > > > which allows each Assignable Device Interface (ADI, e.g. a work
> > > > queue) tagged with a unique PASID to ensure fine-grained DMA
> > > > isolation when those ADIs are assigned to different VMs. For this
> > > > purpose idxd utilizes the VFIO mdev framework and IOMMU aux-domain
> > > > extension. Bare metal SVA will be enabled for idxd later by using
> > > > the general SVA code that you mentioned.  Both paths will co-exist
> > > > in the end so there is no such case of disabling DMA/iommu config.
> > >
> > > Again, if you will have a normal SVA interface, there is no need for a
> > > VFIO version, just use normal SVA for both.
> > >
> > > PCI emulation should try to be in userspace, not the kernel, for
> > > security.
> >
> > Not sure we completely understand your proposal. Mediated devices
> > are software constructed and they have protected resources like
> > interrupts and stuff and VFIO already provides abstractions to export
> > to user space.
> >
> > Native SVA is simply passing the process CR3 handle to IOMMU so
> > IOMMU knows how to walk process page tables, kernel handles things
> > like page-faults, doing device tlb invalidations and such.
> 
> > That by itself doesn't translate to what a guest typically does
> > with a VDEV. There are other control paths that need to be serviced
> > from the kernel code via VFIO. For speed path operations like
> > ringing doorbells and such they are directly managed from guest.
> 
> You don't need vfio to mmap BAR pages to userspace. The unique thing
> that vfio gives is it provides a way to program the classic non-PASID
> iommu, which you are not using here.

That unique thing is indeed used here. Please note that sharing the CPU
virtual address space with the device (what the SVA API was invented for)
is not the purpose of this series. We still rely on classic non-PASID
IOMMU programming, i.e. mapping/unmapping IOVA->HPA per iommu_domain.
Although we do use a PASID to tag each ADI, the PASID is contained within
the iommu_domain and is invisible to VFIO. From the userspace point of
view, this is a device passthrough usage rather than PASID-based address
space binding.
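
To illustrate the model described above, here is a toy sketch in Python.
It is not kernel code and not the VFIO or IOMMU API; ToyIommuDomain and
its methods are hypothetical names. It only shows the shape of the idea:
each domain owns its own IOVA->HPA table, the caller only maps and
unmaps, and the PASID stays an internal detail of the domain.

```python
PAGE_SIZE = 4096


class ToyIommuDomain:
    """Toy stand-in for an iommu_domain: one IOVA->HPA table per domain."""

    def __init__(self, pasid):
        # The PASID tags the ADI's DMA in hardware. In this model it is
        # private to the domain and never appears in map/unmap calls.
        self._pasid = pasid
        self._table = {}  # page-aligned IOVA -> page-aligned HPA

    def map(self, iova, hpa, size):
        if size <= 0 or any(v % PAGE_SIZE for v in (iova, hpa, size)):
            raise ValueError("iova, hpa and size must be page aligned")
        for off in range(0, size, PAGE_SIZE):
            self._table[iova + off] = hpa + off

    def unmap(self, iova, size):
        for off in range(0, size, PAGE_SIZE):
            self._table.pop(iova + off, None)

    def translate(self, iova):
        page = iova - (iova % PAGE_SIZE)
        if page not in self._table:
            raise LookupError(f"unmapped IOVA {iova:#x}")
        return self._table[page] + (iova - page)


# Two ADIs assigned to two VMs: same IOVA, different host pages.
vm1 = ToyIommuDomain(pasid=10)
vm2 = ToyIommuDomain(pasid=11)
vm1.map(0x1000, 0x80000000, 2 * PAGE_SIZE)
vm2.map(0x1000, 0x90000000, PAGE_SIZE)
```

The same IOVA resolves to a different host page in each domain, which is
the isolation property being relied on, and nothing in the calling code
ever names a PASID.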

> 
> > How do you propose to use the existing SVA APIs to also provide
> > full device emulation as opposed to using an existing infrastructure
> > that's already in place?
> 
> You'd provide the 'full device emulation' in userspace (eg qemu),
> along side all the other device emulation. Device emulation does not
> belong in the kernel without a very good reason.

The problem is that we are not doing full device emulation. It's based
on mediated passthrough. Some emulation logic requires close
engagement with the kernel device driver, e.g. resource allocation, WQ
configuration, fault reporting, etc., while the detailed interface is
very vendor/device specific (just like that between a PF and a VF). idxd
is just the first device that supports Scalable IOV. We have a lot more
coming later, of different types. Putting such emulation in user space
then means that Qemu needs to support all those vendor specific
interfaces for every new device which supports Scalable IOV. This is in
contrast to our goal of using Scalable IOV as an alternative to SR-IOV.
For SR-IOV, Qemu only needs to support one VFIO API and then any VF type
simply works. We want to sustain the same user experience through VFIO
mdev.
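
As a rough illustration of the "one API, any device type" argument, here
is a toy sketch in Python. All names are hypothetical and this is not the
actual mdev callback set; it only shows a generic consumer (standing in
for Qemu) talking to a device through one fixed interface while the
vendor specific mediation lives behind it.

```python
from abc import ABC, abstractmethod


class MediatedDevice(ABC):
    """The single interface the generic consumer is written against."""

    @abstractmethod
    def read(self, offset: int, size: int) -> bytes: ...

    @abstractmethod
    def write(self, offset: int, data: bytes) -> None: ...


class ToyWorkQueueDevice(MediatedDevice):
    """Hypothetical vendor driver: 16 bytes of registers, WQ size at 0."""

    def __init__(self, wq_size: int):
        self._regs = bytearray(16)
        self._regs[0:4] = wq_size.to_bytes(4, "little")

    def read(self, offset, size):
        return bytes(self._regs[offset:offset + size])

    def write(self, offset, data):
        if offset < 0 or offset + len(data) > len(self._regs):
            raise ValueError("access outside the register window")
        # Vendor specific mediation: the WQ size register was set by the
        # host driver and is read-only to the guest.
        if offset < 4:
            raise PermissionError("WQ size is owned by the host driver")
        self._regs[offset:offset + len(data)] = data


def generic_read32(dev: MediatedDevice, offset: int) -> int:
    """Generic consumer code: knows the interface, not the device."""
    return int.from_bytes(dev.read(offset, 4), "little")


dev = ToyWorkQueueDevice(wq_size=128)
dev.write(4, (7).to_bytes(4, "little"))
```

A second device type would add another subclass and leave generic_read32
untouched, which is the property we want to keep for userspace.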

Specifically for PCI config space emulation, it is already done in
multiple places in the kernel today, e.g. vfio-pci, kvmgt, etc. We do
plan to consolidate them later.

> 
> You get the doorbell BAR page from your own char dev
> 
> You setup a PASID IOMMU configuration over your own char dev
> 
> Interrupt delivery is triggering a generic event fd
> 
> What is VFIO needed for?

Based on the above explanation, VFIO mdev already meets all of our
requirements, so why bother inventing a new one...

> 
> > Perhaps Alex can ease Jason's concerns?
> 
> Last we talked Alex also had doubts on what mdev should be used
> for. It is a feature that seems to lack boundaries, and I'll note that
> when the discussion came up for VDPA, they eventually chose not to
> use VFIO.
> 

Is there a link to Alex's doubts? I'm not sure why vDPA didn't go
for VFIO, but imho it is a different story. vDPA is specifically for
devices which implement the standard vhost/virtio interface, so it's
reasonable that inventing a new mechanism might be more efficient
for all vDPA type devices. Scalable IOV, however, is similar to
SR-IOV: it is only about resource partitioning. It doesn't change
the device programming interface, which could be in any vendor
specific form. Here VFIO mdev is good for providing a unified
interface for managing resource multiplexing of all such devices.

Thanks
Kevin

