From: Jason Gunthorpe
To: Alex Williamson
Cc: Jean-Philippe Brucker, "Raj, Ashok", Jason Wang, "Michael S. Tsirkin"
Subject: Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Date: Tue, 15 Sep 2020 11:29:06 -0300
In-Reply-To: <20200914163310.450c8d6e@x1.home>

On Mon, Sep 14, 2020 at 04:33:10PM -0600, Alex Williamson wrote:

> Can you explain that further, or spit-ball what you think this /dev/sva
> interface looks like and how a user might interact between vfio and
> this new interface? 

When you open it you get some container, inside the container the
user can create PASIDs. PASIDs outside that container cannot be

Creating a PASID, or the guest PASID range would be the entry point
for doing all the operations against a PASID or range that this patch
series imagines:
 - Map process VA mappings to the PASID's DMA virtual address space
 - Catch faults
 - Set up any special HW stuff like Intel's two level thing, ARM stuff, etc
 - Expose resource controls, cgroup, whatever
 - Migration special stuff (allocate fixed PASIDs)

A PASID is a handle for an IOMMU page table, and the tools to
manipulate it. Within /dev/sva the page table is just 'floating' and
not linked to any PCI functions.

The open /dev/sva FD holding the allocated PASIDs would be passed to a
kernel driver. This is a security authorization that the specified
PASID can be assigned to a PCI device by the kernel.

At this point the kernel driver would have the IOMMU permit its
bus/device/function to use the PASID. The PASID can be passed to
multiple drivers of any driver flavour so table re-use is
possible. Now the IOMMU page table is linked to a device.

The kernel device driver would also do the device specific programming
to set up the PASID in the device, attach it to some device object and
expose the device for user DMA.
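
The flow described above (create PASIDs in a container, hand the
container to a kernel driver as the authorization, have the driver link
the page table to its bus/device/function) can be sketched as a toy
model. This is only an illustration under stated assumptions: /dev/sva
is a proposal in this thread, not an existing interface, and every class
and method name below (Iommu, SvaContainer, KernelDriver, create_pasid,
attach) is hypothetical.

```python
import itertools

# Toy model only: /dev/sva is a proposal in this thread, not an existing
# kernel interface. Every class and method name here is hypothetical.


class Iommu:
    """Tracks which (bus/device/function, PASID) pairs are permitted."""

    def __init__(self):
        self.permitted = set()

    def permit(self, bdf, pasid):
        self.permitted.add((bdf, pasid))

    def allows(self, bdf, pasid):
        return (bdf, pasid) in self.permitted


class SvaContainer:
    """Stands in for one open /dev/sva FD and the PASIDs created in it."""

    def __init__(self, free_pasids):
        self._free_pasids = free_pasids  # shared iterator of unused numbers
        self.pasids = set()

    def create_pasid(self):
        pasid = next(self._free_pasids)
        self.pasids.add(pasid)  # the page table is 'floating': no device yet
        return pasid


class KernelDriver:
    """Stands in for a kernel driver that owns one bus/device/function."""

    def __init__(self, iommu, bdf):
        self.iommu = iommu
        self.bdf = bdf
        self.programmed = set()

    def attach(self, container, pasid):
        # Being handed the container is the authorization; a PASID that was
        # not created inside this container is refused.
        if pasid not in container.pasids:
            raise PermissionError(f"PASID {pasid} is not in this container")
        self.iommu.permit(self.bdf, pasid)  # link the page table to the device
        self.programmed.add(pasid)          # device-specific HW programming


free_pasids = itertools.count(1)
iommu = Iommu()
container = SvaContainer(free_pasids)
other = SvaContainer(free_pasids)

pasid = container.create_pasid()
first = KernelDriver(iommu, "0000:00:01.0")
second = KernelDriver(iommu, "0000:00:02.0")
first.attach(container, pasid)
second.attach(container, pasid)  # same page table re-used by a second driver
```

In this model the driver never allocates or interprets the PASID; it
only checks the authorization and programs the number, which is the
split of concerns argued for further down.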

For instance IDXD's char dev would map the queue memory and associate
the PASID with that queue and set up the HW to be ready for the new
enqueue instruction. The IDXD mdev would link to its emulated PCI BAR
and ensure the guest can only use PASIDs included in the /dev/sva

The qemu control plane for vIOMMU related to PASID would run over

I think the design could go further where a 'PASID' is just an
abstract idea of a page table, then vfio-pci could consume it too as an
IOMMU page table handle even though there is no actual PASID. So qemu
could end up with one API to universally control the vIOMMU, an API
that can be shared between subsystems and is not tied to VFIO.
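
As a rough illustration of the "abstract page table handle" idea, the
sketch below models a handle that may or may not carry a PASID. It is
an assumption-laden toy, not a proposed API: PageTableHandle and
attach_key are hypothetical names, and the tuple returned is only a
stand-in for whatever the IOMMU would match DMA against.

```python
from dataclasses import dataclass
from typing import Optional

# Toy model only; PageTableHandle and attach_key are hypothetical names.


@dataclass(frozen=True)
class PageTableHandle:
    handle_id: int
    pasid: Optional[int] = None  # None: RID-only use, no actual PASID


def attach_key(bdf, table):
    """Return what DMA from this function would be matched against."""
    if table.pasid is None:
        return (bdf,)            # RID based: the whole function, as vfio-pci
    return (bdf, table.pasid)    # PASID based: one mediated context


rid_only = PageTableHandle(handle_id=1)
with_pasid = PageTableHandle(handle_id=2, pasid=42)
```

Both kinds of handle go through the same attach path here, which is the
point of treating the PASID as optional.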

> allocating pasids and associating them with page tables for that
> two-stage IOMMU setup, performing cache invalidations based on page
> table updates, etc.  How does it make more sense for a vIOMMU to
> setup some aspects of the IOMMU through vfio and others through a
> TBD interface?

vfio's IOMMU interface is about RID based full device ownership,
and fixed mappings.

PASID is about mediation, shared ownership and page faulting.

Does PASID overlap with the existing IOMMU RID interface beyond the
fact that both use the IOMMU?

> The IOMMU needs to allocate PASIDs, so in that sense it enforces a
> quota via the architectural limits, but is the IOMMU layer going to
> distinguish in-kernel versus user limits?  A cgroup limit seems like a
> good idea, but that's not really at the IOMMU layer either and I don't
> see that a /dev/sva and vfio interface couldn't both support a cgroup
> type quota.

These are all good questions. PASID is new, this stuff needs to be
sketched out more. A lot of in-kernel users of IOMMU PASID are
probably going to be triggered by userspace actions.

I think a cgroup quota would end up near the IOMMU layer, so vfio,
sva, and any other driver char devs would all be restricted by the
cgroup as peers.
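
One way to picture a quota that sits near the IOMMU layer is a single
budget charged by the allocator itself, so every front end hits the
same check. This is a minimal sketch under that assumption; PasidQuota
and PasidAllocator are hypothetical names and do not correspond to the
kernel's cgroup or IOMMU APIs.

```python
# Toy model only; PasidQuota and PasidAllocator are hypothetical names and
# are not the kernel's cgroup or IOMMU API.


class PasidQuota:
    """One per-cgroup PASID budget, enforced next to the allocator."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def charge(self):
        if self.used >= self.limit:
            raise RuntimeError("PASID quota exceeded for this cgroup")
        self.used += 1

    def uncharge(self):
        if self.used == 0:
            raise RuntimeError("uncharge without a matching charge")
        self.used -= 1


class PasidAllocator:
    """Stands in for the IOMMU-layer allocator shared by every front end."""

    def __init__(self):
        self._next = 1
        self.owner = {}

    def alloc(self, quota, frontend):
        quota.charge()  # the same check applies whoever is asking
        pasid = self._next
        self._next += 1
        self.owner[pasid] = (quota, frontend)
        return pasid

    def free(self, pasid):
        quota, _frontend = self.owner.pop(pasid)
        quota.uncharge()
```

With a limit of 2, an allocation through "vfio" and one through "sva"
would exhaust the budget, and a third request from any other char dev
in the same cgroup would be refused until one is freed.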

> And it's not clear that they'll have compatible requirements.  A
> userspace idxd driver might have limited needs versus a vIOMMU backend.
> Does a single quota model adequately support both or are we back to the
> differences between access to a device and ownership of a device?

At the end of the day a PASID is just a number and the driver's only
use of it is to program it into HW.

All these other differences deal with the IOMMU side of the PASID, how
pages are mapped into it, how page fault works, etc, etc. Keeping the
two concerns separated seems very clean. A device driver shouldn't
care how the PASID is set up.

> > > This series is a blueprint within the context of the ownership and
> > > permission model that VFIO already provides.  It doesn't seem like we
> > > can pluck that out on its own, nor is it necessarily the case that VFIO
> > > wouldn't want to provide PASID services within its own API even if we
> > > did have this undefined /dev/sva interface.  
> > 
> > I don't see what you do - VFIO does not own PASID, and in this
> > vfio-mdev mode it does not own the PCI device/IOMMU either. So why
> > would this need to be part of the VFIO ownership and permission model?
>
> Doesn't the PASID model essentially just augment the requester ID IOMMU
> model so as to manage the IOVAs for a subdevice of a RID?  

I'd say not really.. PASID is very different from RID because PASID
must always be mediated by the kernel. vfio-pci doesn't know how to
use PASID because it doesn't know how to program the PASID into
a specific device. RID, by contrast, is fully self-contained within
vfio-pci.

Further, with the SVA models, the mediated devices are highly likely
to be shared between a vfio-mdev and a normal driver, as IDXD
shows. Userspace will get PASIDs for SVA and share the device equally
with vfio-mdev.

> What elevates a user to be able to allocate such resources in this
> new proposal?

AFAIK the current SVA model aims to impose no such limitation. User
processes can open their devices, establish SVA and go ahead with
their workload.

If you are asking about iommu groups.. For PASID the PCI
bus/device/function that is the 'control point' for PASID must be
secure and owned by the kernel, i.e. only the kernel can program the
device to use a given PASID. P2P access from other devices under
non-kernel control must not be allowed, as they could program a device
to use a PASID the kernel would not authorize.

All of this has to be done regardless of VFIO's involvement..

> Do they need a device at all?  It's not clear to me why RID based
> IOMMU management fits within vfio's scope, but PASID based does not.

In RID mode vfio-pci completely owns the PCI function, so it is more
natural that VFIO, as the sole device owner, would own the DMA mapping
machinery. Further, the RID IOMMU mode is rarely used outside of VFIO
so there is not much reason to try and disaggregate the API.

PASID on the other hand, is shared. vfio-mdev drivers will share the
device with other kernel drivers. PASID and DMA will be concurrent
with VFIO and other kernel drivers/etc.

Thus it makes more sense here to have the control plane for PASID also
be shared and not tied exclusively to VFIO.

