From: Alex Williamson <alex.williamson@redhat.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Jason Wang <jasowang@redhat.com>,
	Kirti Wankhede <kwankhede@nvidia.com>,
	Jean-Philippe Brucker <jean-philippe@linaro.org>,
	"Jiang, Dave" <dave.jiang@intel.com>,
	"Raj, Ashok" <ashok.raj@intel.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Jason Gunthorpe <jgg@nvidia.com>,
	"parav@mellanox.com" <parav@mellanox.com>,
	"Enrico Weigelt, metux IT consult" <lkml@metux.net>,
	David Gibson <david@gibson.dropbear.id.au>,
	Robin Murphy <robin.murphy@arm.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Shenming Lu <lushenming@huawei.com>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	David Woodhouse <dwmw2@infradead.org>
Subject: Re: [RFC v2] /dev/iommu uAPI proposal
Date: Fri, 9 Jul 2021 15:50:52 -0600
Message-ID: <20210709155052.2881f561.alex.williamson@redhat.com>
In-Reply-To: <BN9PR11MB5433B1E4AE5B0480369F97178C189@BN9PR11MB5433.namprd11.prod.outlook.com>

Hi Kevin,

A couple of first-pass comments...

On Fri, 9 Jul 2021 07:48:44 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:
> 2.2. /dev/vfio device uAPI
> ++++++++++++++++++++++++++
> 
> /*
>   * Bind a vfio_device to the specified IOMMU fd
>   *
>   * The user should provide a device cookie when calling this ioctl. The
>   * cookie is later used in the IOMMU fd for capability queries, IOTLB
>   * invalidation and I/O fault handling.
>   *
>   * The user is not allowed to access the device before the binding
>   * operation completes.
>   *
>   * Unbind is performed automatically when the device fd is closed.
>   *
>   * Input parameters:
>   *	- iommu_fd;
>   *	- cookie;
>   *
>   * Return: 0 on success, -errno on failure.
>   */
> #define VFIO_BIND_IOMMU_FD	_IO(VFIO_TYPE, VFIO_BASE + 22)

I believe this is an ioctl on the device fd, therefore it should be
named VFIO_DEVICE_BIND_IOMMU_FD.
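
From userspace I'd expect the call to end up looking something like the
below; the struct layout is only my sketch of how the listed inputs
(iommu_fd, cookie) might be carried, not something taken from the proposal:

	struct vfio_device_bind_iommu_fd {
		__u32	argsz;
		__u32	flags;
		__s32	iommu_fd;
		__u64	cookie;		/* user-chosen, quoted back by the IOMMU fd */
	} bind = {
		.argsz = sizeof(bind),
		.iommu_fd = iommu_fd,
		.cookie = cookie1,
	};

	ioctl(device_fd, VFIO_DEVICE_BIND_IOMMU_FD, &bind);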

> 
> 
> /*
>   * Report vPASID info to userspace via VFIO_DEVICE_GET_INFO
>   *
>   * Add a new device capability. Its presence indicates that the user
>   * is allowed to create multiple I/O address spaces on this device. The
>   * capability further includes the following flags:
>   *
>   *	- PASID_DELEGATED, if clear every vPASID must be registered to
>   *	  the kernel;
>   *	- PASID_CPU, if set a vPASID is allowed to be carried in CPU
>   *	  instructions (e.g. ENQCMD);
>   *	- PASID_CPU_VIRT, if set vPASID translation is required in the CPU;
>   *
>   * The user must check that all devices with PASID_CPU set have the
>   * same setting for PASID_CPU_VIRT. If they mismatch, the user should
>   * enable vPASID only for one category (all set, or all clear).
>   *
>   * When the user enables vPASID on a device with PASID_CPU_VIRT set,
>   * it must enable vPASID CPU translation via the kvm fd before attempting
>   * to use ENQCMD to submit work items. The command portal is blocked
>   * by the kernel until CPU translation is enabled.
>   */
> #define VFIO_DEVICE_INFO_CAP_PASID		5
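
Trying to make the flag semantics concrete for myself, I'd picture the
capability chained into vfio_device_info roughly as below; the struct and
flag names here are purely illustrative, not from the proposal:

	struct vfio_device_info_cap_pasid {
		struct vfio_info_cap_header header;	/* .id = VFIO_DEVICE_INFO_CAP_PASID */
		__u32	flags;
	#define VFIO_PASID_DELEGATED	(1 << 0)	/* clear: every vPASID must be registered */
	#define VFIO_PASID_CPU		(1 << 1)	/* vPASID may appear in CPU instructions */
	#define VFIO_PASID_CPU_VIRT	(1 << 2)	/* CPU-side vPASID translation required */
	};
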
> 
> 
> /*
>   * Attach a vfio device to the specified IOASID
>   *
>   * Multiple vfio devices can be attached to the same IOASID, and one
>   * device can likewise be attached to multiple IOASIDs.
>   *
>   * The user may optionally provide a "virtual PASID" to mark an I/O page
>   * table on this vfio device, if PASID_DELEGATED is not set in the device
>   * info. Whether the virtual PASID is used physically or converted to
>   * another kernel-allocated PASID is a kernel policy.
>   *
>   * Because one device is allowed to bind to multiple IOMMU fds, the
>   * user should provide both iommu_fd and ioasid for this attach operation.
>   *
>   * Input parameters:
>   *	- iommu_fd;
>   *	- ioasid;
>   *	- flag;
>   *	- vpasid (if specified);
>   * 
>   * Return: 0 on success, -errno on failure.
>   */
> #define VFIO_ATTACH_IOASID		_IO(VFIO_TYPE, VFIO_BASE + 23)
> #define VFIO_DETACH_IOASID		_IO(VFIO_TYPE, VFIO_BASE + 24)

Likewise, VFIO_DEVICE_{ATTACH,DETACH}_IOASID
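
For the attach parameters I'd imagine a payload along these lines (field
and flag names are my own guesses, with vpasid only meaningful when the
corresponding flag is set):

	struct vfio_device_attach_ioasid {
		__u32	argsz;
		__u32	flags;
	#define VFIO_ATTACH_IOASID_USER_PASID	(1 << 0)	/* vpasid field is valid */
		__s32	iommu_fd;
		__u32	ioasid;
		__u32	vpasid;
	};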

...
> 3. Sample structures and helper functions
> --------------------------------------------------------
> 
> Three helper functions are provided to support VFIO_BIND_IOMMU_FD:
> 
> 	struct iommu_ctx *iommu_ctx_fdget(int fd);
> 	struct iommu_dev *iommu_register_device(struct iommu_ctx *ctx,
> 		struct device *device, u64 cookie);
> 	int iommu_unregister_device(struct iommu_dev *dev);
> 
> An iommu_ctx is created for each fd:
> 
> 	struct iommu_ctx {
> 		// a list of allocated IOASID objects
> 		struct xarray		ioasid_xa;
> 
> 		// a list of registered devices
> 		struct xarray		dev_xa;
> 	};
> 
> Later some group-tracking fields will also be introduced to support
> multi-device groups.
> 
> Each registered device is represented by iommu_dev:
> 
> 	struct iommu_dev {
> 		struct iommu_ctx	*ctx;
> 		// always the physical device
> 		struct device		*device;
> 		u64			cookie;
> 		struct kref		kref;
> 	};
> 
> A successful binding establishes a security context for the bound
> device and returns a struct iommu_dev pointer to the caller. After
> this point, the user is allowed to query device capabilities via
> IOMMU_DEVICE_GET_INFO.

If we have an initial singleton-group-only restriction, I assume both
that iommu_register_device() would fail for any device that is not in
a singleton group, and that vfio would only expose direct device files
for the devices in singleton groups.  The latter implementation could
change when multi-device group support is added, so that userspace can
assume this interface is available whenever the vfio device file
exists.  I think this is confirmed further below.

> For mdev the struct device should be a pointer to the parent device.

I don't get how iommu_register_device() differentiates an mdev from a
pdev in this case.
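
From the section 3 sketch I'd expect the vfio side of the bind to boil
down to something like the below (purely my sketch; refcounting and
cleanup omitted, and vdev->iommu_dev is a hypothetical field), and nothing
in these arguments tells the IOMMU layer whether it's an mdev or a pdev:

	int vfio_device_bind_iommu_fd(struct vfio_device *vdev, int fd, u64 cookie)
	{
		struct iommu_ctx *ictx = iommu_ctx_fdget(fd);
		struct iommu_dev *idev;
		/* per the proposal, an mdev driver would pass its parent here */
		struct device *dev = vdev->dev;

		if (!ictx)
			return -EBADF;

		idev = iommu_register_device(ictx, dev, cookie);
		if (IS_ERR(idev))
			return PTR_ERR(idev);

		vdev->iommu_dev = idev;		/* hypothetical field */
		return 0;
	}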

...
> 4.3. IOASID nesting (software)
> ++++++++++++++++++++++++++++++
> 
> Same usage scenario as 4.2, but with software-based IOASID nesting
> available. In this mode it is the kernel, instead of the user, that
> creates the shadow mapping.
> 
> The flow before the guest boots is the same as in 4.2, except for one
> point: because giova_ioasid is nested on gpa_ioasid, locked-page
> accounting is only conducted for gpa_ioasid, which becomes the only
> root.
> 
> There could be a case where different gpa_ioasids are created due
> to an incompatible format between dev1/dev2 (e.g. regarding IOMMU
> enforce-snoop). In such a case the user could further create a dummy
> IOASID (HVA->HVA) as the root parent for the two gpa_ioasids to avoid
> duplicated accounting. But this scenario is not covered in the
> following flows.

This use case has been noted several times in the proposal, it probably
deserves an example.
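
Extrapolating from the flows in this section, I'd picture it roughly as
below (the details are my own, not from the proposal):

	/* Dummy HVA->HVA root so locked pages are accounted only once */
	alloc_data = { /* no parent */ };
	hva_ioasid = ioctl(iommu_fd, IOMMU_IOASID_ALLOC, &alloc_data);

	dma_map = {
		.ioasid	= hva_ioasid;
		.iova	= 0x40001000;	// HVA
		.vaddr	= 0x40001000;	// HVA
		.size	= 4KB;
	};
	ioctl(iommu_fd, IOMMU_MAP_DMA, &dma_map);

	/* One GPA IOASID per IOMMU format, nested on the common root */
	alloc_data = {.user_pgtable = false; .parent = hva_ioasid};
	gpa_ioasid1 = ioctl(iommu_fd, IOMMU_IOASID_ALLOC, &alloc_data);
	gpa_ioasid2 = ioctl(iommu_fd, IOMMU_IOASID_ALLOC, &alloc_data);

	/* GPA->HVA mappings in the children reuse the root's pinned pages */
	dma_map = {
		.ioasid	= gpa_ioasid1;
		.iova	= 0x1000;	// GPA
		.vaddr	= 0x40001000;	// HVA
		.size	= 4KB;
	};
	ioctl(iommu_fd, IOMMU_MAP_DMA, &dma_map);
	/* ...likewise for gpa_ioasid2, then attach dev1/dev2 respectively */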

> 
> To save space we only list the steps after the guest boots (i.e. both
> dev1/dev2 have been attached to gpa_ioasid before the guest boots):
> 
> 	/* After boots */
> 	/* Create GIOVA space nested on GPA space
> 	 * Both page tables are managed by the kernel
> 	 */
> 	alloc_data = {.user_pgtable = false; .parent = gpa_ioasid};
> 	giova_ioasid = ioctl(iommu_fd, IOMMU_IOASID_ALLOC, &alloc_data);

So the user would use IOMMU_DEVICE_GET_INFO on the iommu_fd with device
cookie2 after the VFIO_DEVICE_BIND_IOMMU_FD to learn that software
nesting is supported before proceeding down this path?
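
i.e. something like the below, assuming the info structure reports a
nesting capability (the field and flag names here are only my guess):

	info = { .cookie = cookie2 };	// dev2, from the earlier bind
	ioctl(iommu_fd, IOMMU_DEVICE_GET_INFO, &info);
	if (!(info.flags & IOMMU_INFO_SW_NESTING))
		/* fall back to the 4.2 flow, shadowing in userspace */ ;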

> 
> 	/* Attach dev2 to the new address space (child)
> 	 * Note dev2 is still attached to gpa_ioasid (parent)
> 	 */
> 	at_data = { .fd = iommu_fd; .ioasid = giova_ioasid};
> 	ioctl(device_fd2, VFIO_ATTACH_IOASID, &at_data);
> 
> 	/* Setup a GIOVA [0x2000] ->GPA [0x1000] mapping for giova_ioasid, 
> 	 * based on the vIOMMU page table. The kernel is responsible for
> 	 * creating the shadow mapping GIOVA [0x2000] -> HVA [0x40001000]
> 	 * by walking the parent's I/O page table to find out GPA [0x1000] ->
> 	 * HVA [0x40001000].
> 	 */
> 	dma_map = {
> 		.ioasid	= giova_ioasid;
> 		.iova	= 0x2000;	// GIOVA
> 		.vaddr	= 0x1000;	// GPA
> 		.size	= 4KB;
> 	};
> 	ioctl(iommu_fd, IOMMU_MAP_DMA, &dma_map);
> 
> 4.4. IOASID nesting (hardware)
> ++++++++++++++++++++++++++++++
> 
> Same usage scenario as 4.2, but with hardware-based IOASID nesting
> available. In this mode the I/O page table is managed by userspace,
> so an invalidation interface is provided for the user to request
> IOTLB invalidation.
> 
> 	/* After boots */
> 	/* Create GIOVA space nested on GPA space.
> 	 * Claim it is a user-managed I/O page table.
> 	 */
> 	alloc_data = {
> 		.user_pgtable	= true;
> 		.parent		= gpa_ioasid;
> 		.addr		= giova_pgtable;
> 		// and format information;
> 	};
> 	giova_ioasid = ioctl(iommu_fd, IOMMU_IOASID_ALLOC, &alloc_data);
> 
> 	/* Attach dev2 to the new address space (child)
> 	 * Note dev2 is still attached to gpa_ioasid (parent)
> 	 */
> 	at_data = { .fd = iommu_fd; .ioasid = giova_ioasid};
> 	ioctl(device_fd2, VFIO_ATTACH_IOASID, &at_data);
> 
> 	/* Invalidate IOTLB when required */
> 	inv_data = {
> 		.ioasid	= giova_ioasid;
> 		// granular/cache type information
> 	};
> 	ioctl(iommu_fd, IOMMU_INVALIDATE_CACHE, &inv_data);
> 
> 	/* See 4.6 for I/O page fault handling */
> 	
> 4.5. Guest SVA (vSVA)
> +++++++++++++++++++++
> 
> After the guest boots, it further creates a GVA address space (vpasid1)
> on dev1. Dev2 is not affected (still attached to giova_ioasid).
> 
> As explained in section 1.4, the user should check the PASID capability
> exposed via VFIO_DEVICE_GET_INFO and follow the required uAPI
> semantics when making the attach call:
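
Presumably for dev1 that attach ends up looking something like the below,
carrying the guest's vpasid since PASID_DELEGATED is clear (the flag name
is my own placeholder, and gva_ioasid is assumed to have been allocated
for the GVA space as in 4.4):

	at_data = {
		.fd	= iommu_fd;
		.ioasid	= gva_ioasid;
		.flag	= VFIO_ATTACH_USER_PASID;	// placeholder name
		.vpasid	= vpasid1;
	};
	ioctl(device_fd1, VFIO_ATTACH_IOASID, &at_data);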

And this characteristic lives in VFIO_DEVICE_GET_INFO rather than
IOMMU_DEVICE_GET_INFO because this is a characteristic known by the
vfio device driver rather than the system IOMMU, right?  Thanks,

Alex
