iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs
@ 2020-06-11 12:15 Liu Yi L
  2020-06-11 12:15 ` [PATCH v2 01/15] vfio/type1: Refactor vfio_iommu_type1_ioctl() Liu Yi L
                   ` (15 more replies)
  0 siblings, 16 replies; 37+ messages in thread
From: Liu Yi L @ 2020-06-11 12:15 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, jun.j.tian, iommu,
	linux-kernel, yi.y.sun, hao.wu

Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
Intel platforms allows address space sharing between device DMA and
applications. SVA can reduce programming complexity and enhance security.

This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
guest application address space with passthru devices. This is called
vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
changes. For IOMMU and QEMU changes, they are in separate series (listed
in the "Related series").

The high-level architecture for SVA virtualization is as below, the key
design of vSVA support is to utilize the dual-stage IOMMU translation (
also known as IOMMU nesting translation) capability in host IOMMU.


    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |                CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables

Patch Overview:
 1. a refactor to vfio_iommu_type1 ioctl (patch 0001)
 2. reports IOMMU nesting info to userspace ( patch 0002, 0003 and 0015)
 3. vfio support for PASID allocation and free for VMs (patch 0004, 0005, 0006)
 4. vfio support for binding guest page table to host (patch 0007, 0008, 0009, 0010)
 5. vfio support for IOMMU cache invalidation from VMs (patch 0011)
 6. vfio support for vSVA usage on IOMMU-backed mdevs (patch 0012)
 7. expose PASID capability to VM (patch 0013)
 8. add doc for VFIO dual stage control (patch 0014)

The complete vSVA kernel upstream patches are divided into three phases:
    1. Common APIs and PCI device direct assignment
    2. IOMMU-backed Mediated Device assignment
    3. Page Request Services (PRS) support

This patchset is aiming for the phase 1 and phase 2, and based on Jacob's
below series.
[PATCH v13 0/8] Nested Shared Virtual Address (SVA) VT-d support - merged
https://lkml.org/lkml/2020/5/13/1582

[PATCH v2 0/3] IOMMU user API enhancement - wip
https://lkml.org/lkml/2020/6/11/5

[PATCH 00/10] IOASID extensions for guest SVA - wip
https://lkml.org/lkml/2020/3/25/874

The latest IOASID code added below new interface for itertate all PASIDs of an
ioasid_set. The implementation is not sent out yet as Jacob needs some cleanup,
it can be found in branch vsva-linux-5.7-rc4-v2.
 int ioasid_set_for_each_ioasid(int sid, void (*fn)(ioasid_t id, void *data), void *data);

Complete set for current vSVA can be found in below branch.
This branch also includes some extra modifications to IOASID core code and
vt-d iommu driver cleanup patches.
https://github.com/luxis1999/linux-vsva.git:vsva-linux-5.7-rc4-v2

The corresponding QEMU patch series is included in below branch:
https://github.com/luxis1999/qemu.git:vsva_5.7_rc4_qemu_rfcv6


Regards,
Yi Liu

Changelog:
	- Patch v1 -> Patch v2:
	  a) Refactor vfio_iommu_type1_ioctl() per suggestion from Christoph
	     Hellwig.
	  b) Re-sequence the patch series for better bisect support.
	  c) Report IOMMU nesting cap info in detail instead of a format in
	     v1.
	  d) Enforce one group per nesting type container for vfio iommu type1
	     driver.
	  e) Build the vfio_mm related code from vfio.c to be a separate
	     vfio_pasid.ko.
	  f) Add PASID ownership check in IOMMU driver.
	  g) Adopted to latest IOMMU UAPI design. Removed IOMMU UAPI version
	     check. Added iommu_gpasid_unbind_data for unbind requests from
	     userspace.
	  h) Define a single ioctl:VFIO_IOMMU_NESTING_OP for bind/unbind_gtbl
	     and cahce_invld.
	  i) Document dual stage control in vfio.rst.
	  Patch v1: https://lore.kernel.org/linux-iommu/1584880325-10561-1-git-send-email-yi.l.liu@intel.com/

	- RFC v3 -> Patch v1:
	  a) Address comments to the PASID request(alloc/free) path
	  b) Report PASID alloc/free availabitiy to user-space
	  c) Add a vfio_iommu_type1 parameter to support pasid quota tuning
	  d) Adjusted to latest ioasid code implementation. e.g. remove the
	     code for tracking the allocated PASIDs as latest ioasid code
	     will track it, VFIO could use ioasid_free_set() to free all
	     PASIDs.
	  RFC v3: https://lore.kernel.org/linux-iommu/1580299912-86084-1-git-send-email-yi.l.liu@intel.com/

	- RFC v2 -> v3:
	  a) Refine the whole patchset to fit the roughly parts in this series
	  b) Adds complete vfio PASID management framework. e.g. pasid alloc,
	  free, reclaim in VM crash/down and per-VM PASID quota to prevent
	  PASID abuse.
	  c) Adds IOMMU uAPI version check and page table format check to ensure
	  version compatibility and hardware compatibility.
	  d) Adds vSVA vfio support for IOMMU-backed mdevs.
	  RFC v2: https://lore.kernel.org/linux-iommu/1571919983-3231-1-git-send-email-yi.l.liu@intel.com/

	- RFC v1 -> v2:
	  Dropped vfio: VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE.
	  RFC v1: https://lore.kernel.org/linux-iommu/1562324772-3084-1-git-send-email-yi.l.liu@intel.com/


Eric Auger (1):
  vfio: Document dual stage control

Liu Yi L (13):
  vfio/type1: Refactor vfio_iommu_type1_ioctl()
  iommu: Report domain nesting info
  vfio/type1: Report iommu nesting info to userspace
  vfio: Add PASID allocation/free support
  iommu/vt-d: Support setting ioasid set to domain
  vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)
  iommu/uapi: Add iommu_gpasid_unbind_data
  iommu/vt-d: Check ownership for PASIDs from user-space
  vfio/type1: Support binding guest page tables to PASID
  vfio/type1: Allow invalidating first-level/stage IOMMU cache
  vfio/type1: Add vSVA support for IOMMU-backed mdevs
  vfio/pci: Expose PCIe PASID capability to guest
  iommu/vt-d: Support reporting nesting capability info

Yi Sun (1):
  iommu: Pass domain and unbind_data to sva_unbind_gpasid()

 Documentation/driver-api/vfio.rst  |  64 ++++
 drivers/iommu/intel-iommu.c        | 107 ++++++-
 drivers/iommu/intel-svm.c          |  20 +-
 drivers/iommu/iommu.c              |   4 +-
 drivers/vfio/Kconfig               |   6 +
 drivers/vfio/Makefile              |   1 +
 drivers/vfio/pci/vfio_pci_config.c |   2 +-
 drivers/vfio/vfio_iommu_type1.c    | 614 ++++++++++++++++++++++++++++++++-----
 drivers/vfio/vfio_pasid.c          | 191 ++++++++++++
 include/linux/intel-iommu.h        |  23 +-
 include/linux/iommu.h              |  10 +-
 include/linux/vfio.h               |  54 ++++
 include/uapi/linux/iommu.h         |  47 +++
 include/uapi/linux/vfio.h          |  78 +++++
 14 files changed, 1134 insertions(+), 87 deletions(-)
 create mode 100644 drivers/vfio/vfio_pasid.c

-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 01/15] vfio/type1: Refactor vfio_iommu_type1_ioctl()
  2020-06-11 12:15 [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
@ 2020-06-11 12:15 ` Liu Yi L
  2020-06-11 12:15 ` [PATCH v2 02/15] iommu: Report domain nesting info Liu Yi L
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Liu Yi L @ 2020-06-11 12:15 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, jun.j.tian, iommu,
	linux-kernel, yi.y.sun, hao.wu

This patch refactors the vfio_iommu_type1_ioctl() to use switch instead of
if-else, and each cmd got a helper function.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/vfio/vfio_iommu_type1.c | 183 +++++++++++++++++++++++-----------------
 1 file changed, 105 insertions(+), 78 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index cc1d647..402aad3 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2106,6 +2106,23 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
 	return ret;
 }
 
+static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
+					    unsigned long arg)
+{
+	switch (arg) {
+	case VFIO_TYPE1_IOMMU:
+	case VFIO_TYPE1v2_IOMMU:
+	case VFIO_TYPE1_NESTING_IOMMU:
+		return 1;
+	case VFIO_DMA_CC_IOMMU:
+		if (!iommu)
+			return 0;
+		return vfio_domains_have_iommu_cache(iommu);
+	default:
+		return 0;
+	}
+}
+
 static int vfio_iommu_iova_add_cap(struct vfio_info_cap *caps,
 		 struct vfio_iommu_type1_info_cap_iova_range *cap_iovas,
 		 size_t size)
@@ -2173,110 +2190,120 @@ static int vfio_iommu_iova_build_caps(struct vfio_iommu *iommu,
 	return ret;
 }
 
-static long vfio_iommu_type1_ioctl(void *iommu_data,
-				   unsigned int cmd, unsigned long arg)
+static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
+				     unsigned long arg)
 {
-	struct vfio_iommu *iommu = iommu_data;
+	struct vfio_iommu_type1_info info;
 	unsigned long minsz;
+	struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+	unsigned long capsz;
+	int ret;
 
-	if (cmd == VFIO_CHECK_EXTENSION) {
-		switch (arg) {
-		case VFIO_TYPE1_IOMMU:
-		case VFIO_TYPE1v2_IOMMU:
-		case VFIO_TYPE1_NESTING_IOMMU:
-			return 1;
-		case VFIO_DMA_CC_IOMMU:
-			if (!iommu)
-				return 0;
-			return vfio_domains_have_iommu_cache(iommu);
-		default:
-			return 0;
-		}
-	} else if (cmd == VFIO_IOMMU_GET_INFO) {
-		struct vfio_iommu_type1_info info;
-		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
-		unsigned long capsz;
-		int ret;
-
-		minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
+	minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
 
-		/* For backward compatibility, cannot require this */
-		capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
+	/* For backward compatibility, cannot require this */
+	capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
 
-		if (copy_from_user(&info, (void __user *)arg, minsz))
-			return -EFAULT;
+	if (copy_from_user(&info, (void __user *)arg, minsz))
+		return -EFAULT;
 
-		if (info.argsz < minsz)
-			return -EINVAL;
+	if (info.argsz < minsz)
+		return -EINVAL;
 
-		if (info.argsz >= capsz) {
-			minsz = capsz;
-			info.cap_offset = 0; /* output, no-recopy necessary */
-		}
+	if (info.argsz >= capsz) {
+		minsz = capsz;
+		info.cap_offset = 0; /* output, no-recopy necessary */
+	}
 
-		info.flags = VFIO_IOMMU_INFO_PGSIZES;
+	info.flags = VFIO_IOMMU_INFO_PGSIZES;
 
-		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
+	info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
 
-		ret = vfio_iommu_iova_build_caps(iommu, &caps);
-		if (ret)
-			return ret;
+	ret = vfio_iommu_iova_build_caps(iommu, &caps);
+	if (ret)
+		return ret;
 
-		if (caps.size) {
-			info.flags |= VFIO_IOMMU_INFO_CAPS;
+	if (caps.size) {
+		info.flags |= VFIO_IOMMU_INFO_CAPS;
 
-			if (info.argsz < sizeof(info) + caps.size) {
-				info.argsz = sizeof(info) + caps.size;
-			} else {
-				vfio_info_cap_shift(&caps, sizeof(info));
-				if (copy_to_user((void __user *)arg +
-						sizeof(info), caps.buf,
-						caps.size)) {
-					kfree(caps.buf);
-					return -EFAULT;
-				}
-				info.cap_offset = sizeof(info);
+		if (info.argsz < sizeof(info) + caps.size) {
+			info.argsz = sizeof(info) + caps.size;
+		} else {
+			vfio_info_cap_shift(&caps, sizeof(info));
+			if (copy_to_user((void __user *)arg +
+					sizeof(info), caps.buf,
+					caps.size)) {
+				kfree(caps.buf);
+				return -EFAULT;
 			}
-
-			kfree(caps.buf);
+			info.cap_offset = sizeof(info);
 		}
 
-		return copy_to_user((void __user *)arg, &info, minsz) ?
-			-EFAULT : 0;
+		kfree(caps.buf);
+	}
 
-	} else if (cmd == VFIO_IOMMU_MAP_DMA) {
-		struct vfio_iommu_type1_dma_map map;
-		uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
-				VFIO_DMA_MAP_FLAG_WRITE;
+	return copy_to_user((void __user *)arg, &info, minsz) ?
+		-EFAULT : 0;
 
-		minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
+}
 
-		if (copy_from_user(&map, (void __user *)arg, minsz))
-			return -EFAULT;
+static int vfio_iommu_type1_map_dma(struct vfio_iommu *iommu,
+				    unsigned long arg)
+{
+	struct vfio_iommu_type1_dma_map map;
+	unsigned long minsz;
+	uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
+			VFIO_DMA_MAP_FLAG_WRITE;
 
-		if (map.argsz < minsz || map.flags & ~mask)
-			return -EINVAL;
+	minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
 
-		return vfio_dma_do_map(iommu, &map);
+	if (copy_from_user(&map, (void __user *)arg, minsz))
+		return -EFAULT;
 
-	} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
-		struct vfio_iommu_type1_dma_unmap unmap;
-		long ret;
+	if (map.argsz < minsz || map.flags & ~mask)
+		return -EINVAL;
 
-		minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size);
+	return vfio_dma_do_map(iommu, &map);
+}
 
-		if (copy_from_user(&unmap, (void __user *)arg, minsz))
-			return -EFAULT;
+static int vfio_iommu_type1_unmap_dma(struct vfio_iommu *iommu,
+				    unsigned long arg)
+{
+	struct vfio_iommu_type1_dma_unmap unmap;
+	unsigned long minsz;
+	long ret;
 
-		if (unmap.argsz < minsz || unmap.flags)
-			return -EINVAL;
+	minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size);
 
-		ret = vfio_dma_do_unmap(iommu, &unmap);
-		if (ret)
-			return ret;
+	if (copy_from_user(&unmap, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (unmap.argsz < minsz || unmap.flags)
+		return -EINVAL;
+
+	ret = vfio_dma_do_unmap(iommu, &unmap);
+	if (ret)
+		return ret;
+
+	return copy_to_user((void __user *)arg, &unmap, minsz) ?
+		-EFAULT : 0;
+
+}
+
+static long vfio_iommu_type1_ioctl(void *iommu_data,
+				   unsigned int cmd, unsigned long arg)
+{
+	struct vfio_iommu *iommu = iommu_data;
 
-		return copy_to_user((void __user *)arg, &unmap, minsz) ?
-			-EFAULT : 0;
+	switch (cmd) {
+	case VFIO_CHECK_EXTENSION:
+		return vfio_iommu_type1_check_extension(iommu, arg);
+	case VFIO_IOMMU_GET_INFO:
+		return vfio_iommu_type1_get_info(iommu, arg);
+	case VFIO_IOMMU_MAP_DMA:
+		return vfio_iommu_type1_map_dma(iommu, arg);
+	case VFIO_IOMMU_UNMAP_DMA:
+		return vfio_iommu_type1_unmap_dma(iommu, arg);
 	}
 
 	return -ENOTTY;
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 02/15] iommu: Report domain nesting info
  2020-06-11 12:15 [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
  2020-06-11 12:15 ` [PATCH v2 01/15] vfio/type1: Refactor vfio_iommu_type1_ioctl() Liu Yi L
@ 2020-06-11 12:15 ` Liu Yi L
  2020-06-11 19:30   ` Alex Williamson
  2020-06-17 14:39   ` Jean-Philippe Brucker
  2020-06-11 12:15 ` [PATCH v2 03/15] vfio/type1: Report iommu nesting info to userspace Liu Yi L
                   ` (13 subsequent siblings)
  15 siblings, 2 replies; 37+ messages in thread
From: Liu Yi L @ 2020-06-11 12:15 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, jun.j.tian, iommu,
	linux-kernel, yi.y.sun, hao.wu

IOMMUs that support nesting translation needs report the capability info
to userspace, e.g. the format of first level/stage paging structures.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
@Jean, Eric: as nesting was introduced for ARM, but looks like no actual
user of it. right? So I'm wondering if we can reuse DOMAIN_ATTR_NESTING
to retrieve nesting info? how about your opinions?

 include/linux/iommu.h      |  1 +
 include/uapi/linux/iommu.h | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 78a26ae..f6e4b49 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -126,6 +126,7 @@ enum iommu_attr {
 	DOMAIN_ATTR_FSL_PAMUV1,
 	DOMAIN_ATTR_NESTING,	/* two stages of translation */
 	DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+	DOMAIN_ATTR_NESTING_INFO,
 	DOMAIN_ATTR_MAX,
 };
 
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 303f148..02eac73 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -332,4 +332,38 @@ struct iommu_gpasid_bind_data {
 	};
 };
 
+struct iommu_nesting_info {
+	__u32	size;
+	__u32	format;
+	__u32	features;
+#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
+#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
+#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 2)
+	__u32	flags;
+	__u8	data[];
+};
+
+/*
+ * @flags:	VT-d specific flags. Currently reserved for future
+ *		extension.
+ * @addr_width:	The output addr width of first level/stage translation
+ * @pasid_bits:	Maximum supported PASID bits, 0 represents no PASID
+ *		support.
+ * @cap_reg:	Describe basic capabilities as defined in VT-d capability
+ *		register.
+ * @cap_mask:	Mark valid capability bits in @cap_reg.
+ * @ecap_reg:	Describe the extended capabilities as defined in VT-d
+ *		extended capability register.
+ * @ecap_mask:	Mark the valid capability bits in @ecap_reg.
+ */
+struct iommu_nesting_info_vtd {
+	__u32	flags;
+	__u16	addr_width;
+	__u16	pasid_bits;
+	__u64	cap_reg;
+	__u64	cap_mask;
+	__u64	ecap_reg;
+	__u64	ecap_mask;
+};
+
 #endif /* _UAPI_IOMMU_H */
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 03/15] vfio/type1: Report iommu nesting info to userspace
  2020-06-11 12:15 [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
  2020-06-11 12:15 ` [PATCH v2 01/15] vfio/type1: Refactor vfio_iommu_type1_ioctl() Liu Yi L
  2020-06-11 12:15 ` [PATCH v2 02/15] iommu: Report domain nesting info Liu Yi L
@ 2020-06-11 12:15 ` Liu Yi L
  2020-06-11 12:15 ` [PATCH v2 04/15] vfio: Add PASID allocation/free support Liu Yi L
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Liu Yi L @ 2020-06-11 12:15 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, jun.j.tian, iommu,
	linux-kernel, yi.y.sun, hao.wu

This patch exports iommu nesting capability info to user space through
VFIO. User space is expected to check this info for supported uAPIs (e.g.
PASID alloc/free, bind page table, and cache invalidation) and the vendor
specific format information for first level/stage page table that will be
bound to.

The nesting info is available only after the nesting iommu type is set
for a container. Current implementation imposes one limitation - one
nesting container should include at most one group. The philosophy of
vfio container is having all groups/devices within the container share
the same IOMMU context. When vSVA is enabled, one IOMMU context could
include one 2nd-level address space and multiple 1st-level address spaces.
While the 2nd-leve address space is reasonably sharable by multiple groups
, blindly sharing 1st-level address spaces across all groups within the
container might instead break the guest expectation. In the future sub/
super container concept might be introduced to allow partial address space
sharing within an IOMMU context. But for now let's go with this restriction
by requiring singleton container for using nesting iommu features. Below
link has the related discussion about this decision.

https://lkml.org/lkml/2020/5/15/1028

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/vfio/vfio_iommu_type1.c | 73 +++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h       |  9 +++++
 2 files changed, 82 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 402aad3..22432cf 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -71,6 +71,7 @@ struct vfio_iommu {
 	unsigned int		dma_avail;
 	bool			v2;
 	bool			nesting;
+	struct iommu_nesting_info *nesting_info;
 };
 
 struct vfio_domain {
@@ -125,6 +126,9 @@ struct vfio_regions {
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)	\
 					(!list_empty(&iommu->domain_list))
 
+#define IS_DOMAIN_IN_CONTAINER(iommu)	((iommu->external_domain) || \
+					 (!list_empty(&iommu->domain_list)))
+
 static int put_pfn(unsigned long pfn, int prot);
 
 /*
@@ -1641,6 +1645,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 		}
 	}
 
+	/* Nesting type container can include only one group */
+	if (iommu->nesting && IS_DOMAIN_IN_CONTAINER(iommu)) {
+		mutex_unlock(&iommu->lock);
+		return -EINVAL;
+	}
+
 	group = kzalloc(sizeof(*group), GFP_KERNEL);
 	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
 	if (!group || !domain) {
@@ -1700,6 +1710,36 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	if (ret)
 		goto out_domain;
 
+	/* Nesting cap info is available only after attaching */
+	if (iommu->nesting) {
+		struct iommu_nesting_info tmp;
+		struct iommu_nesting_info *info;
+
+		/* First get the size of vendor specific nesting info */
+		ret = iommu_domain_get_attr(domain->domain,
+					    DOMAIN_ATTR_NESTING_INFO,
+					    &tmp);
+		if (ret)
+			goto out_detach;
+
+		info = kzalloc(tmp.size, GFP_KERNEL);
+		if (!info) {
+			ret = -ENOMEM;
+			goto out_detach;
+		}
+
+		/* Now get the nesting info */
+		info->size = tmp.size;
+		ret = iommu_domain_get_attr(domain->domain,
+					    DOMAIN_ATTR_NESTING_INFO,
+					    info);
+		if (ret) {
+			kfree(info);
+			goto out_detach;
+		}
+		iommu->nesting_info = info;
+	}
+
 	/* Get aperture info */
 	iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY, &geo);
 
@@ -1801,6 +1841,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	return 0;
 
 out_detach:
+	kfree(iommu->nesting_info);
 	vfio_iommu_detach_group(domain, group);
 out_domain:
 	iommu_domain_free(domain->domain);
@@ -1998,6 +2039,8 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 					vfio_iommu_unmap_unpin_all(iommu);
 				else
 					vfio_iommu_unmap_unpin_reaccount(iommu);
+
+				kfree(iommu->nesting_info);
 			}
 			iommu_domain_free(domain->domain);
 			list_del(&domain->next);
@@ -2190,6 +2233,30 @@ static int vfio_iommu_iova_build_caps(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
+					   struct vfio_info_cap *caps)
+{
+	struct vfio_info_cap_header *header;
+	struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
+	size_t size;
+
+	size = sizeof(*nesting_cap) + iommu->nesting_info->size;
+
+	header = vfio_info_cap_add(caps, size,
+				   VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
+	if (IS_ERR(header))
+		return PTR_ERR(header);
+
+	nesting_cap = container_of(header,
+				   struct vfio_iommu_type1_info_cap_nesting,
+				   header);
+
+	memcpy(&nesting_cap->info, iommu->nesting_info,
+	       iommu->nesting_info->size);
+
+	return 0;
+}
+
 static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
 				     unsigned long arg)
 {
@@ -2223,6 +2290,12 @@ static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
 	if (ret)
 		return ret;
 
+	if (iommu->nesting_info) {
+		ret = vfio_iommu_info_add_nesting_cap(iommu, &caps);
+		if (ret)
+			return ret;
+	}
+
 	if (caps.size) {
 		info.flags |= VFIO_IOMMU_INFO_CAPS;
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 015516b..26e3dce 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -14,6 +14,7 @@
 
 #include <linux/types.h>
 #include <linux/ioctl.h>
+#include <linux/iommu.h>
 
 #define VFIO_API_VERSION	0
 
@@ -785,6 +786,14 @@ struct vfio_iommu_type1_info_cap_iova_range {
 	struct	vfio_iova_range iova_ranges[];
 };
 
+#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING  2
+
+struct vfio_iommu_type1_info_cap_nesting {
+	struct	vfio_info_cap_header header;
+	__u32	flags;
+	__u8	info[];
+};
+
 #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
 
 /**
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 04/15] vfio: Add PASID allocation/free support
  2020-06-11 12:15 [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (2 preceding siblings ...)
  2020-06-11 12:15 ` [PATCH v2 03/15] vfio/type1: Report iommu nesting info to userspace Liu Yi L
@ 2020-06-11 12:15 ` Liu Yi L
  2020-06-11 12:15 ` [PATCH v2 05/15] iommu/vt-d: Support setting ioasid set to domain Liu Yi L
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Liu Yi L @ 2020-06-11 12:15 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, jun.j.tian, iommu,
	linux-kernel, yi.y.sun, hao.wu

Shared Virtual Addressing (a.k.a Shared Virtual Memory) allows sharing
multiple process virtual address spaces with the device for simplified
programming model. PASID is used to tag an virtual address space in DMA
requests and to identify the related translation structure in IOMMU. When
a PASID-capable device is assigned to a VM, we want the same capability
of using PASID to tag guest process virtual address spaces to achieve
virtual SVA (vSVA).

PASID management for guest is vendor specific. Some vendors (e.g. Intel
VT-d) requires system-wide managed PASIDs cross all devices, regardless
of whether a device is used by host or assigned to guest. Other vendors
(e.g. ARM SMMU) may allow PASIDs managed per-device thus could be fully
delegated to the guest for assigned devices.

For system-wide managed PASIDs, this patch introduces a vfio module to
handle explicit PASID alloc/free requests from guest. Allocated PASIDs
are associated to a process (or, mm_struct) in IOASID core. A vfio_mm
object is introduced to track mm_struct. Multiple VFIO containers within
a process share the same vfio_mm object.

A quota mechanism is provided to prevent malicious user from exhausting
available PASIDs. Currently the quota is a global parameter applied to
all VFIO devices. In the future per-device quota might be supported too.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
v1 -> v2:
*) added in v2, split from the pasid alloc/free support of v1

 drivers/vfio/Kconfig      |   5 ++
 drivers/vfio/Makefile     |   1 +
 drivers/vfio/vfio_pasid.c | 151 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h      |  28 +++++++++
 4 files changed, 185 insertions(+)
 create mode 100644 drivers/vfio/vfio_pasid.c

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index fd17db9..3d8a108 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -19,6 +19,11 @@ config VFIO_VIRQFD
 	depends on VFIO && EVENTFD
 	default n
 
+config VFIO_PASID
+	tristate
+	depends on IOASID && VFIO
+	default n
+
 menuconfig VFIO
 	tristate "VFIO Non-Privileged userspace driver framework"
 	depends on IOMMU_API
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index de67c47..bb836a3 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -3,6 +3,7 @@ vfio_virqfd-y := virqfd.o
 
 obj-$(CONFIG_VFIO) += vfio.o
 obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
+obj-$(CONFIG_VFIO_PASID) += vfio_pasid.o
 obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
 obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
 obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
diff --git a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c
new file mode 100644
index 0000000..dd5b6d1
--- /dev/null
+++ b/drivers/vfio/vfio_pasid.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ *     Author: Liu Yi L <yi.l.liu@intel.com>
+ *
+ */
+
+#include <linux/vfio.h>
+#include <linux/eventfd.h>
+#include <linux/file.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/sched/mm.h>
+
+#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "Liu Yi L <yi.l.liu@intel.com>"
+#define DRIVER_DESC     "PASID management for VFIO bus drivers"
+
+#define VFIO_DEFAULT_PASID_QUOTA	1000
+static int pasid_quota = VFIO_DEFAULT_PASID_QUOTA;
+module_param_named(pasid_quota, pasid_quota, uint, 0444);
+MODULE_PARM_DESC(pasid_quota,
+		 " Set the quota for max number of PASIDs that an application is allowed to request (default 1000)");
+
+struct vfio_mm_token {
+	unsigned long long val;
+};
+
+struct vfio_mm {
+	struct kref		kref;
+	struct vfio_mm_token	token;
+	int			ioasid_sid;
+	int			pasid_quota;
+	struct list_head	next;
+};
+
+static struct vfio_pasid {
+	struct mutex		vfio_mm_lock;
+	struct list_head	vfio_mm_list;
+} vfio_pasid;
+
+/* called with vfio.vfio_mm_lock held */
+static void vfio_mm_release(struct kref *kref)
+{
+	struct vfio_mm *vmm = container_of(kref, struct vfio_mm, kref);
+
+	list_del(&vmm->next);
+	mutex_unlock(&vfio_pasid.vfio_mm_lock);
+	ioasid_free_set(vmm->ioasid_sid, true);
+	kfree(vmm);
+}
+
+void vfio_mm_put(struct vfio_mm *vmm)
+{
+	kref_put_mutex(&vmm->kref, vfio_mm_release, &vfio_pasid.vfio_mm_lock);
+}
+
+static void vfio_mm_get(struct vfio_mm *vmm)
+{
+	kref_get(&vmm->kref);
+}
+
+struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
+{
+	struct mm_struct *mm = get_task_mm(task);
+	struct vfio_mm *vmm;
+	unsigned long long val = (unsigned long long) mm;
+	int ret;
+
+	mutex_lock(&vfio_pasid.vfio_mm_lock);
+	/* Search existing vfio_mm with current mm pointer */
+	list_for_each_entry(vmm, &vfio_pasid.vfio_mm_list, next) {
+		if (vmm->token.val == val) {
+			vfio_mm_get(vmm);
+			goto out;
+		}
+	}
+
+	vmm = kzalloc(sizeof(*vmm), GFP_KERNEL);
+	if (!vmm)
+		return ERR_PTR(-ENOMEM);
+
+	/*
+	 * IOASID core provides a 'IOASID set' concept to track all
+	 * PASIDs associated with a token. Here we use mm_struct as
+	 * the token and create a IOASID set per mm_struct. All the
+	 * containers of the process share the same IOASID set.
+	 */
+	ret = ioasid_alloc_set((struct ioasid_set *) mm, pasid_quota,
+			       &vmm->ioasid_sid);
+	if (ret) {
+		kfree(vmm);
+		return ERR_PTR(ret);
+	}
+
+	kref_init(&vmm->kref);
+	vmm->token.val = (unsigned long long) mm;
+	vmm->pasid_quota = pasid_quota;
+
+	list_add(&vmm->next, &vfio_pasid.vfio_mm_list);
+out:
+	mutex_unlock(&vfio_pasid.vfio_mm_lock);
+	mmput(mm);
+	return vmm;
+}
+
+int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
+{
+	ioasid_t pasid;
+
+	pasid = ioasid_alloc(vmm->ioasid_sid, min, max, NULL);
+
+	return (pasid == INVALID_IOASID) ? -ENOSPC : pasid;
+}
+
+void vfio_pasid_free_range(struct vfio_mm *vmm,
+			    ioasid_t min, ioasid_t max)
+{
+	ioasid_t pasid = min;
+
+	if (min > max)
+		return;
+
+	/*
+	 * IOASID core will notify PASID users (e.g. IOMMU driver) to
+	 * teardown necessary structures depending on the to-be-freed
+	 * PASID.
+	 */
+	for (; pasid <= max; pasid++)
+		ioasid_free(pasid);
+}
+
+static int __init vfio_pasid_init(void)
+{
+	mutex_init(&vfio_pasid.vfio_mm_lock);
+	INIT_LIST_HEAD(&vfio_pasid.vfio_mm_list);
+	return 0;
+}
+
+static void __exit vfio_pasid_exit(void)
+{
+	WARN_ON(!list_empty(&vfio_pasid.vfio_mm_list));
+}
+
+module_init(vfio_pasid_init);
+module_exit(vfio_pasid_exit);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 5d92ee1..104418a 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -95,6 +95,34 @@ extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
 extern void vfio_unregister_iommu_driver(
 				const struct vfio_iommu_driver_ops *ops);
 
+struct vfio_mm;
+#if IS_ENABLED(CONFIG_VFIO_PASID)
+extern struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task);
+extern void vfio_mm_put(struct vfio_mm *vmm);
+extern int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max);
+extern void vfio_pasid_free_range(struct vfio_mm *vmm,
+					ioasid_t min, ioasid_t max);
+#else
+static inline struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
+{
+	return NULL;
+}
+
+static inline void vfio_mm_put(struct vfio_mm *vmm)
+{
+}
+
+static inline int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
+{
+	return -ENOTTY;
+}
+
+static inline void vfio_pasid_free_range(struct vfio_mm *vmm,
+					  ioasid_t min, ioasid_t max)
+{
+}
+#endif /* CONFIG_VFIO_PASID */
+
 /*
  * External user API
  */
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 05/15] iommu/vt-d: Support setting ioasid set to domain
  2020-06-11 12:15 [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (3 preceding siblings ...)
  2020-06-11 12:15 ` [PATCH v2 04/15] vfio: Add PASID allocation/free support Liu Yi L
@ 2020-06-11 12:15 ` Liu Yi L
  2020-06-11 12:15 ` [PATCH v2 06/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free) Liu Yi L
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Liu Yi L @ 2020-06-11 12:15 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, jun.j.tian, iommu,
	linux-kernel, yi.y.sun, hao.wu

From IOMMU p.o.v., PASIDs allocated and managed by external components
(e.g. VFIO) will be passed in for gpasid_bind/unbind operation. IOMMU
needs some knowledge to check the PASID ownership, hence add an interface
for those components to tell the PASID owner.

In latest kernel design, PASID ownership is managed by IOASID set where
the PASID is allocated from. This patch adds support for setting ioasid
set ID to the domains used for nesting/vSVA. Subsequent SVA operations
on the PASID will be checked against its IOASID set for proper ownership.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 16 ++++++++++++++++
 include/linux/intel-iommu.h |  4 ++++
 include/linux/iommu.h       |  1 +
 3 files changed, 21 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5628e4b..2d59a5d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1788,6 +1788,7 @@ static struct dmar_domain *alloc_domain(int flags)
 	if (first_level_by_default())
 		domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
 	domain->has_iotlb_device = false;
+	domain->ioasid_sid = INVALID_IOASID_SET;
 	INIT_LIST_HEAD(&domain->devices);
 
 	return domain;
@@ -6035,6 +6036,21 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain,
 		}
 		spin_unlock_irqrestore(&device_domain_lock, flags);
 		break;
+	case DOMAIN_ATTR_IOASID_SID:
+		if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE)) {
+			ret = -ENODEV;
+			break;
+		}
+		if ((dmar_domain->ioasid_sid != INVALID_IOASID_SET) &&
+		    (dmar_domain->ioasid_sid != (*(int *) data))) {
+			pr_warn_ratelimited("multi ioasid_set (%d:%d) setting",
+					    dmar_domain->ioasid_sid,
+					    (*(int *) data));
+			ret = -EBUSY;
+			break;
+		}
+		dmar_domain->ioasid_sid = *(int *) data;
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 1e02624..29d1c6f 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -548,6 +548,10 @@ struct dmar_domain {
 					   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
 	u64		max_addr;	/* maximum mapped address */
 
+	int		ioasid_sid;	/*
+					 * the ioasid set which tracks all
+					 * PASIDs used by the domain.
+					 */
 	int		default_pasid;	/*
 					 * The default pasid used for non-SVM
 					 * traffic on mediated devices.
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f6e4b49..57c46ae 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -127,6 +127,7 @@ enum iommu_attr {
 	DOMAIN_ATTR_NESTING,	/* two stages of translation */
 	DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
 	DOMAIN_ATTR_NESTING_INFO,
+	DOMAIN_ATTR_IOASID_SID,
 	DOMAIN_ATTR_MAX,
 };
 
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 06/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)
  2020-06-11 12:15 [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (4 preceding siblings ...)
  2020-06-11 12:15 ` [PATCH v2 05/15] iommu/vt-d: Support setting ioasid set to domain Liu Yi L
@ 2020-06-11 12:15 ` Liu Yi L
  2020-06-11 12:15 ` [PATCH v2 07/15] iommu/uapi: Add iommu_gpasid_unbind_data Liu Yi L
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Liu Yi L @ 2020-06-11 12:15 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, jun.j.tian, iommu,
	linux-kernel, yi.y.sun, hao.wu

This patch allows user space to request PASID allocation/free, e.g. when
serving the request from the guest.

PASIDs that are not freed by userspace are automatically freed when the
IOASID set is destroyed when process exits.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v1 -> v2:
*) move the vfio_mm related code to be a seprate module
*) use a single structure for alloc/free, could support a range of PASIDs
*) fetch vfio_mm at group_attach time instead of at iommu driver open time

 drivers/vfio/Kconfig            |  1 +
 drivers/vfio/vfio_iommu_type1.c | 96 ++++++++++++++++++++++++++++++++++++++++-
 drivers/vfio/vfio_pasid.c       | 10 +++++
 include/linux/vfio.h            |  6 +++
 include/uapi/linux/vfio.h       | 36 ++++++++++++++++
 5 files changed, 147 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 3d8a108..95d90c6 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -2,6 +2,7 @@
 config VFIO_IOMMU_TYPE1
 	tristate
 	depends on VFIO
+	select VFIO_PASID if (X86)
 	default n
 
 config VFIO_IOMMU_SPAPR_TCE
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 22432cf..a7b3a83 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -72,6 +72,7 @@ struct vfio_iommu {
 	bool			v2;
 	bool			nesting;
 	struct iommu_nesting_info *nesting_info;
+	struct vfio_mm		*vmm;
 };
 
 struct vfio_domain {
@@ -1615,6 +1616,17 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
 
 	list_splice_tail(iova_copy, iova);
 }
+
+static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
+{
+	if (iommu->vmm) {
+		vfio_mm_put(iommu->vmm);
+		iommu->vmm = NULL;
+	}
+
+	kfree(iommu->nesting_info);
+}
+
 static int vfio_iommu_type1_attach_group(void *iommu_data,
 					 struct iommu_group *iommu_group)
 {
@@ -1738,6 +1750,25 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 			goto out_detach;
 		}
 		iommu->nesting_info = info;
+
+		if (info->features & IOMMU_NESTING_FEAT_SYSWIDE_PASID) {
+			struct vfio_mm *vmm;
+			int sid;
+
+			vmm = vfio_mm_get_from_task(current);
+			if (IS_ERR(vmm)) {
+				ret = PTR_ERR(vmm);
+				goto out_detach;
+			}
+			iommu->vmm = vmm;
+
+			sid = vfio_mm_ioasid_sid(vmm);
+			ret = iommu_domain_set_attr(domain->domain,
+						    DOMAIN_ATTR_IOASID_SID,
+						    &sid);
+			if (ret)
+				goto out_detach;
+		}
 	}
 
 	/* Get aperture info */
@@ -1841,7 +1872,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	return 0;
 
 out_detach:
-	kfree(iommu->nesting_info);
+	if (iommu->nesting_info)
+		vfio_iommu_release_nesting_info(iommu);
 	vfio_iommu_detach_group(domain, group);
 out_domain:
 	iommu_domain_free(domain->domain);
@@ -2040,7 +2072,8 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 				else
 					vfio_iommu_unmap_unpin_reaccount(iommu);
 
-				kfree(iommu->nesting_info);
+				if (iommu->nesting_info)
+					vfio_iommu_release_nesting_info(iommu);
 			}
 			iommu_domain_free(domain->domain);
 			list_del(&domain->next);
@@ -2363,6 +2396,63 @@ static int vfio_iommu_type1_unmap_dma(struct vfio_iommu *iommu,
 
 }
 
+static int vfio_iommu_type1_pasid_alloc(struct vfio_iommu *iommu,
+					unsigned int min,
+					unsigned int max)
+{
+	int ret = -ENOTSUPP;
+
+	mutex_lock(&iommu->lock);
+	if (iommu->vmm)
+		ret = vfio_pasid_alloc(iommu->vmm, min, max);
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
+static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu,
+					unsigned int min,
+					unsigned int max)
+{
+	int ret = -ENOTSUPP;
+
+	mutex_lock(&iommu->lock);
+	if (iommu->vmm) {
+		vfio_pasid_free_range(iommu->vmm, min, max);
+		ret = 0;
+	}
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
+static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu,
+					  unsigned long arg)
+{
+	struct vfio_iommu_type1_pasid_request req;
+	unsigned long minsz;
+
+	minsz = offsetofend(struct vfio_iommu_type1_pasid_request, range);
+
+	if (copy_from_user(&req, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (req.argsz < minsz || (req.flags & ~VFIO_PASID_REQUEST_MASK))
+		return -EINVAL;
+
+	if (req.range.min > req.range.max)
+		return -EINVAL;
+
+	switch (req.flags & VFIO_PASID_REQUEST_MASK) {
+	case VFIO_IOMMU_ALLOC_PASID:
+		return vfio_iommu_type1_pasid_alloc(iommu,
+					req.range.min, req.range.max);
+	case VFIO_IOMMU_FREE_PASID:
+		return vfio_iommu_type1_pasid_free(iommu,
+					req.range.min, req.range.max);
+	default:
+		return -EINVAL;
+	}
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -2377,6 +2467,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		return vfio_iommu_type1_map_dma(iommu, arg);
 	case VFIO_IOMMU_UNMAP_DMA:
 		return vfio_iommu_type1_unmap_dma(iommu, arg);
+	case VFIO_IOMMU_PASID_REQUEST:
+		return vfio_iommu_type1_pasid_request(iommu, arg);
 	}
 
 	return -ENOTTY;
diff --git a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c
index dd5b6d1..2ea9f1a 100644
--- a/drivers/vfio/vfio_pasid.c
+++ b/drivers/vfio/vfio_pasid.c
@@ -54,6 +54,7 @@ void vfio_mm_put(struct vfio_mm *vmm)
 {
 	kref_put_mutex(&vmm->kref, vfio_mm_release, &vfio_pasid.vfio_mm_lock);
 }
+EXPORT_SYMBOL_GPL(vfio_mm_put);
 
 static void vfio_mm_get(struct vfio_mm *vmm)
 {
@@ -103,6 +104,13 @@ struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
 	mmput(mm);
 	return vmm;
 }
+EXPORT_SYMBOL_GPL(vfio_mm_get_from_task);
+
+int vfio_mm_ioasid_sid(struct vfio_mm *vmm)
+{
+	return vmm->ioasid_sid;
+}
+EXPORT_SYMBOL_GPL(vfio_mm_ioasid_sid);
 
 int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
 {
@@ -112,6 +120,7 @@ int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
 
 	return (pasid == INVALID_IOASID) ? -ENOSPC : pasid;
 }
+EXPORT_SYMBOL_GPL(vfio_pasid_alloc);
 
 void vfio_pasid_free_range(struct vfio_mm *vmm,
 			    ioasid_t min, ioasid_t max)
@@ -129,6 +138,7 @@ void vfio_pasid_free_range(struct vfio_mm *vmm,
 	for (; pasid <= max; pasid++)
 		ioasid_free(pasid);
 }
+EXPORT_SYMBOL_GPL(vfio_pasid_free_range);
 
 static int __init vfio_pasid_init(void)
 {
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 104418a..76da56e 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -99,6 +99,7 @@ struct vfio_mm;
 #if IS_ENABLED(CONFIG_VFIO_PASID)
 extern struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task);
 extern void vfio_mm_put(struct vfio_mm *vmm);
+int vfio_mm_ioasid_sid(struct vfio_mm *vmm);
 extern int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max);
 extern void vfio_pasid_free_range(struct vfio_mm *vmm,
 					ioasid_t min, ioasid_t max);
@@ -112,6 +113,11 @@ static inline void vfio_mm_put(struct vfio_mm *vmm)
 {
 }
 
+static inline int vfio_mm_ioasid_sid(struct vfio_mm *vmm)
+{
+	return -ENOTTY;
+}
+
 static inline int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
 {
 	return -ENOTTY;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 26e3dce..24d1992 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -840,6 +840,42 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_IOMMU_PASID_REQUEST - _IOWR(VFIO_TYPE, VFIO_BASE + 22,
+ *				struct vfio_iommu_type1_pasid_request)
+ *
+ * PASID (Processor Address Space ID) is a PCIe concept for tagging
+ * address spaces in DMA requests. When system-wide PASID allocation
+ * is required by underlying iommu driver (e.g. Intel VT-d), this
+ * provides an interface for userspace to request pasid alloc/free
+ * for its assigned devices. Userspace should check the availability
+ * of this API through VFIO_IOMMU_GET_INFO.
+ *
+ * @flags=VFIO_IOMMU_ALLOC_PASID, allocate a single PASID within @range.
+ * @flags=VFIO_IOMMU_FREE_PASID, free the PASIDs within @range.
+ * @range is [min, max], which means both @min and @max are inclusive.
+ * ALLOC_PASID and FREE_PASID are mutually exclusive.
+ *
+ * returns: allocated PASID value on success, -errno on failure for
+ *	     ALLOC_PASID;
+ *	     0 for FREE_PASID operation;
+ */
+struct vfio_iommu_type1_pasid_request {
+	__u32	argsz;
+#define VFIO_IOMMU_ALLOC_PASID	(1 << 0)
+#define VFIO_IOMMU_FREE_PASID	(1 << 1)
+	__u32	flags;
+	struct {
+		__u32	min;
+		__u32	max;
+	} range;
+};
+
+#define VFIO_PASID_REQUEST_MASK	(VFIO_IOMMU_ALLOC_PASID | \
+					 VFIO_IOMMU_FREE_PASID)
+
+#define VFIO_IOMMU_PASID_REQUEST	_IO(VFIO_TYPE, VFIO_BASE + 22)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 07/15] iommu/uapi: Add iommu_gpasid_unbind_data
  2020-06-11 12:15 [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (5 preceding siblings ...)
  2020-06-11 12:15 ` [PATCH v2 06/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free) Liu Yi L
@ 2020-06-11 12:15 ` Liu Yi L
  2020-06-11 12:15 ` [PATCH v2 08/15] iommu: Pass domain and unbind_data to sva_unbind_gpasid() Liu Yi L
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Liu Yi L @ 2020-06-11 12:15 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, jun.j.tian, iommu,
	linux-kernel, yi.y.sun, hao.wu

Existing iommu_gpasid_bind_data is used for binding guest page tables
to a specified PASID. While for unwind it, a unbind_data structure is
needed.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 include/uapi/linux/iommu.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 02eac73..46a7c57 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -332,6 +332,19 @@ struct iommu_gpasid_bind_data {
 	};
 };
 
+/**
+ * struct iommu_gpasid_unbind_data - Information about device and guest PASID
+ *				     unbinding
+ * @argsz:	User filled size of this data
+ * @flags:	Additional information on guest unbind request
+ * @pasid:	Process address space ID used for the guest mm in host IOMMU
+ */
+struct iommu_gpasid_unbind_data {
+	__u32 argsz;
+	__u64 flags;
+	__u64 pasid;
+};
+
 struct iommu_nesting_info {
 	__u32	size;
 	__u32	format;
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 08/15] iommu: Pass domain and unbind_data to sva_unbind_gpasid()
  2020-06-11 12:15 [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (6 preceding siblings ...)
  2020-06-11 12:15 ` [PATCH v2 07/15] iommu/uapi: Add iommu_gpasid_unbind_data Liu Yi L
@ 2020-06-11 12:15 ` Liu Yi L
  2020-06-11 12:15 ` [PATCH v2 09/15] iommu/vt-d: Check ownership for PASIDs from user-space Liu Yi L
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Liu Yi L @ 2020-06-11 12:15 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, jun.j.tian, iommu,
	linux-kernel, yi.y.sun, hao.wu

From: Yi Sun <yi.y.sun@intel.com>

Current interface is good enough for SVA virtualization on an assigned
physical PCI device, but when it comes to mediated devices, a physical
device may attached with multiple aux-domains. So this interface needs
to pass in domain info. Then the iommu driver is able to know which
domain will be used for the 2nd stage translation of the nesting mode.

This patch passes @domain per the above reason. This interface is supposed
to serve the unbind request from user-space, should pass in unbind_data as
well.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/iommu/intel-svm.c   | 14 ++++++++++++--
 drivers/iommu/iommu.c       |  4 ++--
 include/linux/intel-iommu.h |  3 ++-
 include/linux/iommu.h       |  8 +++++---
 4 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 6272da6..bf55e2f 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -445,16 +445,26 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 	return ret;
 }
 
-int intel_svm_unbind_gpasid(struct device *dev, int pasid)
+int intel_svm_unbind_gpasid(struct iommu_domain *domain,
+			    struct device *dev,
+			    struct iommu_gpasid_unbind_data *data)
 {
 	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
 	struct intel_svm_dev *sdev;
 	struct intel_svm *svm;
 	int ret = -EINVAL;
+	unsigned long minsz;
+	int pasid;
+
+	if (WARN_ON(!iommu) || !data)
+		return -EINVAL;
 
-	if (WARN_ON(!iommu))
+	minsz = offsetofend(struct iommu_gpasid_unbind_data, pasid);
+	if (data->argsz < minsz || data->flags)
 		return -EINVAL;
 
+	pasid = data->pasid;
+
 	mutex_lock(&pasid_mutex);
 	svm = ioasid_find(INVALID_IOASID_SET, pasid, NULL);
 	if (!svm) {
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 374b34f..57aac03 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1955,12 +1955,12 @@ int iommu_sva_bind_gpasid(struct iommu_domain *domain,
 EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
 
 int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
-			     ioasid_t pasid)
+			    struct iommu_gpasid_unbind_data *data)
 {
 	if (unlikely(!domain->ops->sva_unbind_gpasid))
 		return -ENODEV;
 
-	return domain->ops->sva_unbind_gpasid(dev, pasid);
+	return domain->ops->sva_unbind_gpasid(domain, dev, data);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
 
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 29d1c6f..0b238c3 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -737,7 +737,8 @@ extern int intel_svm_finish_prq(struct intel_iommu *iommu);
 
 int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 			  struct iommu_gpasid_bind_data *data);
-int intel_svm_unbind_gpasid(struct device *dev, int pasid);
+int intel_svm_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
+			    struct iommu_gpasid_unbind_data *data);
 struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
 				 void *drvdata);
 void intel_svm_unbind(struct iommu_sva *handle);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 57c46ae..a19f063 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -325,7 +325,8 @@ struct iommu_ops {
 	int (*sva_bind_gpasid)(struct iommu_domain *domain,
 			struct device *dev, struct iommu_gpasid_bind_data *data);
 
-	int (*sva_unbind_gpasid)(struct device *dev, int pasid);
+	int (*sva_unbind_gpasid)(struct iommu_domain *domain,
+		struct device *dev, struct iommu_gpasid_unbind_data *data);
 
 	int (*def_domain_type)(struct device *dev);
 
@@ -459,7 +460,7 @@ extern int iommu_cache_invalidate(struct iommu_domain *domain,
 extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
 		struct device *dev, struct iommu_gpasid_bind_data *data);
 extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
-				struct device *dev, ioasid_t pasid);
+		struct device *dev, struct iommu_gpasid_unbind_data *data);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -1084,7 +1085,8 @@ static inline int iommu_sva_bind_gpasid(struct iommu_domain *domain,
 }
 
 static inline int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
-					   struct device *dev, int pasid)
+					  struct device *dev,
+					  struct iommu_gpasid_unbind_data *data)
 {
 	return -ENODEV;
 }
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 09/15] iommu/vt-d: Check ownership for PASIDs from user-space
  2020-06-11 12:15 [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (7 preceding siblings ...)
  2020-06-11 12:15 ` [PATCH v2 08/15] iommu: Pass domain and unbind_data to sva_unbind_gpasid() Liu Yi L
@ 2020-06-11 12:15 ` Liu Yi L
  2020-06-11 12:15 ` [PATCH v2 10/15] vfio/type1: Support binding guest page tables to PASID Liu Yi L
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Liu Yi L @ 2020-06-11 12:15 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, jun.j.tian, iommu,
	linux-kernel, yi.y.sun, hao.wu

When an IOMMU domain with nesting attribute is used for guest SVA, a
system-wide PASID is allocated for binding with the device and the domain.
For security reason, we need to check the PASID passsed from user-space.
e.g. page table bind/unbind and PASID related cache invalidation.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 10 ++++++++++
 drivers/iommu/intel-svm.c   |  6 ++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 2d59a5d..25650ac 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5433,6 +5433,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 		int granu = 0;
 		u64 pasid = 0;
 		u64 addr = 0;
+		void *pdata;
 
 		granu = to_vtd_granularity(cache_type, inv_info->granularity);
 		if (granu == -EINVAL) {
@@ -5452,6 +5453,15 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 			 (inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_PASID))
 			pasid = inv_info->addr_info.pasid;
 
+		pdata = ioasid_find(dmar_domain->ioasid_sid, pasid, NULL);
+		if (!pdata) {
+			ret = -EINVAL;
+			goto out_unlock;
+		} else if (IS_ERR(pdata)) {
+			ret = PTR_ERR(pdata);
+			goto out_unlock;
+		}
+
 		switch (BIT(cache_type)) {
 		case IOMMU_CACHE_INV_TYPE_IOTLB:
 			/* HW will ignore LSB bits based on address mask */
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index bf55e2f..49059c1 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -332,7 +332,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 	dmar_domain = to_dmar_domain(domain);
 
 	mutex_lock(&pasid_mutex);
-	svm = ioasid_find(INVALID_IOASID_SET, data->hpasid, NULL);
+	svm = ioasid_find(dmar_domain->ioasid_sid, data->hpasid, NULL);
 	if (IS_ERR(svm)) {
 		ret = PTR_ERR(svm);
 		goto out;
@@ -450,6 +450,7 @@ int intel_svm_unbind_gpasid(struct iommu_domain *domain,
 			    struct iommu_gpasid_unbind_data *data)
 {
 	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
+	struct dmar_domain *dmar_domain;
 	struct intel_svm_dev *sdev;
 	struct intel_svm *svm;
 	int ret = -EINVAL;
@@ -464,9 +465,10 @@ int intel_svm_unbind_gpasid(struct iommu_domain *domain,
 		return -EINVAL;
 
 	pasid = data->pasid;
+	dmar_domain = to_dmar_domain(domain);
 
 	mutex_lock(&pasid_mutex);
-	svm = ioasid_find(INVALID_IOASID_SET, pasid, NULL);
+	svm = ioasid_find(dmar_domain->ioasid_sid, pasid, NULL);
 	if (!svm) {
 		ret = -EINVAL;
 		goto out;
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 10/15] vfio/type1: Support binding guest page tables to PASID
  2020-06-11 12:15 [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (8 preceding siblings ...)
  2020-06-11 12:15 ` [PATCH v2 09/15] iommu/vt-d: Check ownership for PASIDs from user-space Liu Yi L
@ 2020-06-11 12:15 ` Liu Yi L
  2020-06-11 12:15 ` [PATCH v2 11/15] vfio/type1: Allow invalidating first-level/stage IOMMU cache Liu Yi L
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Liu Yi L @ 2020-06-11 12:15 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, jun.j.tian, iommu,
	linux-kernel, yi.y.sun, hao.wu

Nesting translation allows two-levels/stages page tables, with 1st level
for guest translations (e.g. GVA->GPA), 2nd level for host translations
(e.g. GPA->HPA). This patch adds interface for binding guest page tables
to a PASID. This PASID must have been allocated to user space before the
binding request.

CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v1 -> v2:
*) rename subject from "vfio/type1: Bind guest page tables to host"
*) remove VFIO_IOMMU_BIND, introduce VFIO_IOMMU_NESTING_OP to support bind/
   unbind guet page table
*) replaced vfio_iommu_for_each_dev() with a group level loop since this
   series enforces one group per container w/ nesting type as start.
*) rename vfio_bind/unbind_gpasid_fn() to vfio_dev_bind/unbind_gpasid_fn()
*) vfio_dev_unbind_gpasid() always successful
*) use vfio_mm->pasid_lock to avoid race between PASID free and page table
   bind/unbind

Cc: Kevin Tian <kevin.tian@intel.com>
 drivers/vfio/vfio_iommu_type1.c | 191 ++++++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio_pasid.c       |  30 +++++++
 include/linux/vfio.h            |  20 +++++
 include/uapi/linux/vfio.h       |  30 +++++++
 4 files changed, 271 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index a7b3a83..f1468a0 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -130,6 +130,33 @@ struct vfio_regions {
 #define IS_DOMAIN_IN_CONTAINER(iommu)	((iommu->external_domain) || \
 					 (!list_empty(&iommu->domain_list)))
 
+struct domain_capsule {
+	struct vfio_group *group;
+	struct iommu_domain *domain;
+	void *data;
+};
+
+/* iommu->lock must be held */
+static struct vfio_group *vfio_find_nesting_group(struct vfio_iommu *iommu)
+{
+	struct vfio_domain *d;
+	struct vfio_group *g, *group = NULL;
+
+	if (!iommu->nesting_info)
+		return NULL;
+
+	/* only support singleton container with nesting type */
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		list_for_each_entry(g, &d->group_list, next) {
+			if (!group) {
+				group = g;
+				break;
+			}
+		}
+	}
+	return group;
+}
+
 static int put_pfn(unsigned long pfn, int prot);
 
 /*
@@ -2014,6 +2041,39 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *)data;
+	struct iommu_gpasid_bind_data *bind_data =
+		(struct iommu_gpasid_bind_data *) dc->data;
+
+	return iommu_sva_bind_gpasid(dc->domain, dev, bind_data);
+}
+
+static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *)data;
+	struct iommu_gpasid_unbind_data *unbind_data =
+		(struct iommu_gpasid_unbind_data *) dc->data;
+
+	iommu_sva_unbind_gpasid(dc->domain, dev, unbind_data);
+	return 0;
+}
+
+static void vfio_group_unbind_gpasid_fn(ioasid_t pasid, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *) data;
+	struct iommu_gpasid_unbind_data unbind_data;
+
+	unbind_data.argsz = sizeof(unbind_data);
+	unbind_data.flags = 0;
+	unbind_data.pasid = pasid;
+
+	dc->data = &unbind_data;
+	iommu_group_for_each_dev(dc->group->iommu_group,
+				 dc, vfio_dev_unbind_gpasid_fn);
+}
+
 static void vfio_iommu_type1_detach_group(void *iommu_data,
 					  struct iommu_group *iommu_group)
 {
@@ -2055,6 +2115,21 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 		if (!group)
 			continue;
 
+		if (iommu->nesting_info && iommu->vmm &&
+		    (iommu->nesting_info->features &
+					IOMMU_NESTING_FEAT_BIND_PGTBL)) {
+			struct domain_capsule dc = { .group = group,
+						     .domain = domain->domain,
+						     .data = NULL };
+
+			/*
+			 * Unbind page tables bound with system wide PASIDs
+			 * which are allocated to user space.
+			 */
+			vfio_mm_for_each_pasid(iommu->vmm, &dc,
+					       vfio_group_unbind_gpasid_fn);
+		}
+
 		vfio_iommu_detach_group(domain, group);
 		list_del(&group->next);
 		kfree(group);
@@ -2453,6 +2528,120 @@ static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu,
 	}
 }
 
+static long vfio_iommu_handle_pgtbl_op(struct vfio_iommu *iommu,
+				       bool is_bind, void *data)
+{
+	struct iommu_nesting_info *info;
+	struct domain_capsule dc = { .data = data };
+	struct vfio_group *group;
+	struct vfio_domain *domain;
+	int ret;
+
+	mutex_lock(&iommu->lock);
+
+	info = iommu->nesting_info;
+	if (!info || !(info->features & IOMMU_NESTING_FEAT_BIND_PGTBL)) {
+		ret = -ENOTSUPP;
+		goto out_unlock_iommu;
+	}
+
+	if (!iommu->vmm) {
+		ret = -EINVAL;
+		goto out_unlock_iommu;
+	}
+
+	group = vfio_find_nesting_group(iommu);
+	if (!group) {
+		ret = -EINVAL;
+		goto out_unlock_iommu;
+	}
+
+	domain = list_first_entry(&iommu->domain_list,
+				      struct vfio_domain, next);
+	dc.group = group;
+	dc.domain = domain->domain;
+
+	/* Avoid race with other containers within the same process */
+	vfio_mm_pasid_lock(iommu->vmm);
+
+	if (is_bind) {
+		ret = iommu_group_for_each_dev(group->iommu_group, &dc,
+					       vfio_dev_bind_gpasid_fn);
+		if (ret)
+			iommu_group_for_each_dev(group->iommu_group, &dc,
+						 vfio_dev_unbind_gpasid_fn);
+	} else {
+		iommu_group_for_each_dev(group->iommu_group,
+					 &dc, vfio_dev_unbind_gpasid_fn);
+		ret = 0;
+	}
+
+	vfio_mm_pasid_unlock(iommu->vmm);
+out_unlock_iommu:
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
+static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
+					unsigned long arg)
+{
+	struct vfio_iommu_type1_nesting_op hdr;
+	unsigned int minsz;
+	u8 *data = NULL;
+	size_t data_size;
+	int ret;
+
+	minsz = offsetofend(struct vfio_iommu_type1_nesting_op, flags);
+
+	if (copy_from_user(&hdr, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (hdr.argsz < minsz || hdr.flags & ~VFIO_NESTING_OP_MASK)
+		return -EINVAL;
+
+	/* Get the current IOMMU UAPI data size */
+	switch (hdr.flags & VFIO_NESTING_OP_MASK) {
+	case VFIO_IOMMU_NESTING_OP_BIND_PGTBL:
+		data_size = sizeof(struct iommu_gpasid_bind_data);
+		break;
+	case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL:
+		data_size = sizeof(struct iommu_gpasid_unbind_data);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if ((hdr.argsz - minsz) > data_size) {
+		/* User data > current kernel */
+		return -E2BIG;
+	}
+
+	data = kzalloc(data_size, GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	if (copy_from_user(data, (void __user *)(arg + minsz),
+			   hdr.argsz - minsz)) {
+		ret = -EFAULT;
+		goto out_free;
+	}
+
+	switch (hdr.flags & VFIO_NESTING_OP_MASK) {
+	case VFIO_IOMMU_NESTING_OP_BIND_PGTBL:
+		ret = vfio_iommu_handle_pgtbl_op(iommu, true, data);
+		break;
+	case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL:
+		ret = vfio_iommu_handle_pgtbl_op(iommu, false, data);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+out_free:
+	kfree(data);
+	return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -2469,6 +2658,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		return vfio_iommu_type1_unmap_dma(iommu, arg);
 	case VFIO_IOMMU_PASID_REQUEST:
 		return vfio_iommu_type1_pasid_request(iommu, arg);
+	case VFIO_IOMMU_NESTING_OP:
+		return vfio_iommu_type1_nesting_op(iommu, arg);
 	}
 
 	return -ENOTTY;
diff --git a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c
index 2ea9f1a..20f1e72 100644
--- a/drivers/vfio/vfio_pasid.c
+++ b/drivers/vfio/vfio_pasid.c
@@ -30,6 +30,7 @@ struct vfio_mm {
 	struct kref		kref;
 	struct vfio_mm_token	token;
 	int			ioasid_sid;
+	struct mutex		pasid_lock;
 	int			pasid_quota;
 	struct list_head	next;
 };
@@ -97,6 +98,7 @@ struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
 	kref_init(&vmm->kref);
 	vmm->token.val = (unsigned long long) mm;
 	vmm->pasid_quota = pasid_quota;
+	mutex_init(&vmm->pasid_lock);
 
 	list_add(&vmm->next, &vfio_pasid.vfio_mm_list);
 out:
@@ -134,12 +136,40 @@ void vfio_pasid_free_range(struct vfio_mm *vmm,
 	 * IOASID core will notify PASID users (e.g. IOMMU driver) to
 	 * teardown necessary structures depending on the to-be-freed
 	 * PASID.
+	 * Hold pasid_lock to avoid race with PASID usages like bind/
+	 * unbind page tables to requested PASID.
 	 */
+	mutex_lock(&vmm->pasid_lock);
 	for (; pasid <= max; pasid++)
 		ioasid_free(pasid);
+	mutex_unlock(&vmm->pasid_lock);
 }
 EXPORT_SYMBOL_GPL(vfio_pasid_free_range);
 
+int vfio_mm_for_each_pasid(struct vfio_mm *vmm, void *data,
+			   void (*fn)(ioasid_t id, void *data))
+{
+	int ret;
+
+	mutex_lock(&vmm->pasid_lock);
+	ret = ioasid_set_for_each_ioasid(vmm->ioasid_sid, fn, data);
+	mutex_unlock(&vmm->pasid_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(vfio_mm_for_each_pasid);
+
+void vfio_mm_pasid_lock(struct vfio_mm *vmm)
+{
+	mutex_lock(&vmm->pasid_lock);
+}
+EXPORT_SYMBOL_GPL(vfio_mm_pasid_lock);
+
+void vfio_mm_pasid_unlock(struct vfio_mm *vmm)
+{
+	mutex_unlock(&vmm->pasid_lock);
+}
+EXPORT_SYMBOL_GPL(vfio_mm_pasid_unlock);
+
 static int __init vfio_pasid_init(void)
 {
 	mutex_init(&vfio_pasid.vfio_mm_lock);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 76da56e..66495de 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -103,6 +103,11 @@ int vfio_mm_ioasid_sid(struct vfio_mm *vmm);
 extern int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max);
 extern void vfio_pasid_free_range(struct vfio_mm *vmm,
 					ioasid_t min, ioasid_t max);
+extern int vfio_mm_for_each_pasid(struct vfio_mm *vmm, void *data,
+				  void (*fn)(ioasid_t id, void *data));
+extern void vfio_mm_pasid_lock(struct vfio_mm *vmm);
+extern void vfio_mm_pasid_unlock(struct vfio_mm *vmm);
+
 #else
 static inline struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
 {
@@ -127,6 +132,21 @@ static inline void vfio_pasid_free_range(struct vfio_mm *vmm,
 					  ioasid_t min, ioasid_t max)
 {
 }
+
+static inline int vfio_mm_for_each_pasid(struct vfio_mm *vmm, void *data,
+					 void (*fn)(ioasid_t id, void *data))
+{
+	return -ENOTTY;
+}
+
+static inline void vfio_mm_pasid_lock(struct vfio_mm *vmm)
+{
+}
+
+static inline void vfio_mm_pasid_unlock(struct vfio_mm *vmm)
+{
+}
+
 #endif /* CONFIG_VFIO_PASID */
 
 /*
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 24d1992..e4aa466 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -876,6 +876,36 @@ struct vfio_iommu_type1_pasid_request {
 
 #define VFIO_IOMMU_PASID_REQUEST	_IO(VFIO_TYPE, VFIO_BASE + 22)
 
+/**
+ * VFIO_IOMMU_NESTING_OP - _IOW(VFIO_TYPE, VFIO_BASE + 23,
+ *				struct vfio_iommu_type1_nesting_op)
+ *
+ * This interface allows user space to utilize the nesting IOMMU
+ * capabilities as reported through VFIO_IOMMU_GET_INFO.
+ *
+ * @data[] types defined for each op:
+ * +=================+===============================================+
+ * | NESTING OP      |                  @data[]                      |
+ * +=================+===============================================+
+ * | BIND_PGTBL      |      struct iommu_gpasid_bind_data            |
+ * +-----------------+-----------------------------------------------+
+ * | UNBIND_PGTBL    |      struct iommu_gpasid_unbind_data          |
+ * +-----------------+-----------------------------------------------+
+ *
+ * returns: 0 on success, -errno on failure.
+ */
+struct vfio_iommu_type1_nesting_op {
+	__u32	argsz;
+	__u32	flags;
+#define VFIO_NESTING_OP_MASK	(0xffff) /* lower 16-bits for op */
+	__u8	data[];
+};
+
+#define VFIO_IOMMU_NESTING_OP_BIND_PGTBL	(0)
+#define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL	(1)
+
+#define VFIO_IOMMU_NESTING_OP		_IO(VFIO_TYPE, VFIO_BASE + 23)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 11/15] vfio/type1: Allow invalidating first-level/stage IOMMU cache
  2020-06-11 12:15 [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (9 preceding siblings ...)
  2020-06-11 12:15 ` [PATCH v2 10/15] vfio/type1: Support binding guest page tables to PASID Liu Yi L
@ 2020-06-11 12:15 ` Liu Yi L
  2020-06-11 12:15 ` [PATCH v2 12/15] vfio/type1: Add vSVA support for IOMMU-backed mdevs Liu Yi L
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Liu Yi L @ 2020-06-11 12:15 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, jun.j.tian, iommu,
	linux-kernel, yi.y.sun, hao.wu

This patch provides an interface allowing the userspace to invalidate
IOMMU cache for first-level page table. It is required when the first
level IOMMU page table is not managed by the host kernel in the nested
translation setup.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v1 -> v2:
*) rename from "vfio/type1: Flush stage-1 IOMMU cache for nesting type"
*) rename vfio_cache_inv_fn() to vfio_dev_cache_invalidate_fn()
*) vfio_dev_cache_inv_fn() always successful
*) remove VFIO_IOMMU_CACHE_INVALIDATE, and reuse VFIO_IOMMU_NESTING_OP

 drivers/vfio/vfio_iommu_type1.c | 59 +++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h       |  3 +++
 2 files changed, 62 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index f1468a0..c233c8e 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2582,6 +2582,54 @@ static long vfio_iommu_handle_pgtbl_op(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static int vfio_dev_cache_invalidate_fn(struct device *dev, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *)data;
+	struct iommu_cache_invalidate_info *cache_info =
+		(struct iommu_cache_invalidate_info *) dc->data;
+
+	iommu_cache_invalidate(dc->domain, dev, cache_info);
+	return 0;
+}
+
+static long vfio_iommu_invalidate_cache(struct vfio_iommu *iommu,
+			struct iommu_cache_invalidate_info *cache_info)
+{
+	struct domain_capsule dc = { .data = cache_info };
+	struct vfio_group *group;
+	struct vfio_domain *domain;
+	int ret = 0;
+	struct iommu_nesting_info *info;
+
+	mutex_lock(&iommu->lock);
+	/*
+	 * Cache invalidation is required for any nesting IOMMU,
+	 * so no need to check system-wide PASID support.
+	 */
+	info = iommu->nesting_info;
+	if (!info || !(info->features & IOMMU_NESTING_FEAT_CACHE_INVLD)) {
+		ret = -ENOTSUPP;
+		goto out_unlock;
+	}
+
+	group = vfio_find_nesting_group(iommu);
+	if (!group) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	domain = list_first_entry(&iommu->domain_list,
+				      struct vfio_domain, next);
+	dc.group = group;
+	dc.domain = domain->domain;
+	iommu_group_for_each_dev(group->iommu_group, &dc,
+				 vfio_dev_cache_invalidate_fn);
+
+out_unlock:
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
 static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
 					unsigned long arg)
 {
@@ -2607,6 +2655,9 @@ static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
 	case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL:
 		data_size = sizeof(struct iommu_gpasid_unbind_data);
 		break;
+	case VFIO_IOMMU_NESTING_OP_CACHE_INVLD:
+		data_size = sizeof(struct iommu_cache_invalidate_info);
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -2633,6 +2684,14 @@ static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
 	case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL:
 		ret = vfio_iommu_handle_pgtbl_op(iommu, false, data);
 		break;
+	case VFIO_IOMMU_NESTING_OP_CACHE_INVLD:
+	{
+		struct iommu_cache_invalidate_info *cache_info =
+				(struct iommu_cache_invalidate_info *)data;
+
+		ret = vfio_iommu_invalidate_cache(iommu, cache_info);
+		break;
+	}
 	default:
 		ret = -EINVAL;
 	}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index e4aa466..9a011d6 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -891,6 +891,8 @@ struct vfio_iommu_type1_pasid_request {
  * +-----------------+-----------------------------------------------+
  * | UNBIND_PGTBL    |      struct iommu_gpasid_unbind_data          |
  * +-----------------+-----------------------------------------------+
+ * | CACHE_INVLD     |      struct iommu_cache_invalidate_info       |
+ * +-----------------+-----------------------------------------------+
  *
  * returns: 0 on success, -errno on failure.
  */
@@ -903,6 +905,7 @@ struct vfio_iommu_type1_nesting_op {
 
 #define VFIO_IOMMU_NESTING_OP_BIND_PGTBL	(0)
 #define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL	(1)
+#define VFIO_IOMMU_NESTING_OP_CACHE_INVLD	(2)
 
 #define VFIO_IOMMU_NESTING_OP		_IO(VFIO_TYPE, VFIO_BASE + 23)
 
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 12/15] vfio/type1: Add vSVA support for IOMMU-backed mdevs
  2020-06-11 12:15 [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (10 preceding siblings ...)
  2020-06-11 12:15 ` [PATCH v2 11/15] vfio/type1: Allow invalidating first-level/stage IOMMU cache Liu Yi L
@ 2020-06-11 12:15 ` Liu Yi L
  2020-06-11 12:15 ` [PATCH v2 13/15] vfio/pci: Expose PCIe PASID capability to guest Liu Yi L
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Liu Yi L @ 2020-06-11 12:15 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, jun.j.tian, iommu,
	linux-kernel, yi.y.sun, hao.wu

Recent years, mediated device pass-through framework (e.g. vfio-mdev)
is used to achieve flexible device sharing across domains (e.g. VMs).
Also there are hardware assisted mediated pass-through solutions from
platform vendors. e.g. Intel VT-d scalable mode which supports Intel
Scalable I/O Virtualization technology. Such mdevs are called IOMMU-
backed mdevs as there are IOMMU enforced DMA isolation for such mdevs.
In kernel, IOMMU-backed mdevs are exposed to IOMMU layer by aux-domain
concept, which means mdevs are protected by an iommu domain which is
auxiliary to the domain that the kernel driver primarily uses for DMA
API. Details can be found in the KVM presentation as below:

https://events19.linuxfoundation.org/wp-content/uploads/2017/12/\
Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf

This patch extends NESTING_IOMMU ops to IOMMU-backed mdev devices. The
main requirement is to use the auxiliary domain associated with mdev.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
CC: Jun Tian <jun.j.tian@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
v1 -> v2:
*) check the iommu_device to ensure the handling mdev is IOMMU-backed

 drivers/vfio/vfio_iommu_type1.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c233c8e..bcd7935 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2041,13 +2041,27 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static struct device *vfio_get_iommu_device(struct vfio_group *group,
+					    struct device *dev)
+{
+	if (group->mdev_group)
+		return vfio_mdev_get_iommu_device(dev);
+	else
+		return dev;
+}
+
 static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
 {
 	struct domain_capsule *dc = (struct domain_capsule *)data;
 	struct iommu_gpasid_bind_data *bind_data =
 		(struct iommu_gpasid_bind_data *) dc->data;
+	struct device *iommu_device;
+
+	iommu_device = vfio_get_iommu_device(dc->group, dev);
+	if (!iommu_device)
+		return -EINVAL;
 
-	return iommu_sva_bind_gpasid(dc->domain, dev, bind_data);
+	return iommu_sva_bind_gpasid(dc->domain, iommu_device, bind_data);
 }
 
 static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
@@ -2055,8 +2069,13 @@ static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
 	struct domain_capsule *dc = (struct domain_capsule *)data;
 	struct iommu_gpasid_unbind_data *unbind_data =
 		(struct iommu_gpasid_unbind_data *) dc->data;
+	struct device *iommu_device;
+
+	iommu_device = vfio_get_iommu_device(dc->group, dev);
+	if (!iommu_device)
+		return 0;
 
-	iommu_sva_unbind_gpasid(dc->domain, dev, unbind_data);
+	iommu_sva_unbind_gpasid(dc->domain, iommu_device, unbind_data);
 	return 0;
 }
 
@@ -2587,8 +2606,13 @@ static int vfio_dev_cache_invalidate_fn(struct device *dev, void *data)
 	struct domain_capsule *dc = (struct domain_capsule *)data;
 	struct iommu_cache_invalidate_info *cache_info =
 		(struct iommu_cache_invalidate_info *) dc->data;
+	struct device *iommu_device;
+
+	iommu_device = vfio_get_iommu_device(dc->group, dev);
+	if (!iommu_device)
+		return -EINVAL;
 
-	iommu_cache_invalidate(dc->domain, dev, cache_info);
+	iommu_cache_invalidate(dc->domain, iommu_device, cache_info);
 	return 0;
 }
 
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 13/15] vfio/pci: Expose PCIe PASID capability to guest
  2020-06-11 12:15 [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (11 preceding siblings ...)
  2020-06-11 12:15 ` [PATCH v2 12/15] vfio/type1: Add vSVA support for IOMMU-backed mdevs Liu Yi L
@ 2020-06-11 12:15 ` Liu Yi L
  2020-06-11 12:15 ` [PATCH v2 14/15] vfio: Document dual stage control Liu Yi L
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: Liu Yi L @ 2020-06-11 12:15 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, jun.j.tian, iommu,
	linux-kernel, yi.y.sun, hao.wu

This patch exposes PCIe PASID capability to guest for assigned devices.
Existing vfio_pci driver hides it from guest by setting the capability
length as 0 in pci_ext_cap_length[].

And this patch only exposes PASID capability for devices which has PCIe
PASID extended struture in its configuration space. So VFs, will will
not see PASID capability on VFs as VF doesn't implement PASID extended
structure in its configuration space. For VF, it is a TODO in future.
Related discussion can be found in below link:

https://lkml.org/lkml/2020/4/7/693

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
v1 -> v2:
*) added in v2, but it was sent in a separate patchseries before

 drivers/vfio/pci/vfio_pci_config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 90c0b80..4b9af99 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -95,7 +95,7 @@ static const u16 pci_ext_cap_length[PCI_EXT_CAP_ID_MAX + 1] = {
 	[PCI_EXT_CAP_ID_LTR]	=	PCI_EXT_CAP_LTR_SIZEOF,
 	[PCI_EXT_CAP_ID_SECPCI]	=	0,	/* not yet */
 	[PCI_EXT_CAP_ID_PMUX]	=	0,	/* not yet */
-	[PCI_EXT_CAP_ID_PASID]	=	0,	/* not yet */
+	[PCI_EXT_CAP_ID_PASID]	=	PCI_EXT_CAP_PASID_SIZEOF,
 };
 
 /*
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 14/15] vfio: Document dual stage control
  2020-06-11 12:15 [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (12 preceding siblings ...)
  2020-06-11 12:15 ` [PATCH v2 13/15] vfio/pci: Expose PCIe PASID capability to guest Liu Yi L
@ 2020-06-11 12:15 ` Liu Yi L
  2020-06-15  9:41   ` Stefan Hajnoczi
  2020-06-11 12:15 ` [PATCH v2 15/15] iommu/vt-d: Support reporting nesting capability info Liu Yi L
  2020-06-15 10:02 ` [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Stefan Hajnoczi
  15 siblings, 1 reply; 37+ messages in thread
From: Liu Yi L @ 2020-06-11 12:15 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, jun.j.tian, iommu,
	linux-kernel, yi.y.sun, hao.wu

From: Eric Auger <eric.auger@redhat.com>

The VFIO API was enhanced to support nested stage control: a bunch of
new iotcls and usage guideline.

Let's document the process to follow to set up nested mode.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
v1 -> v2:
*) new in v2, compared with Eric's original version, pasid table bind
   and fault reporting is removed as this series doesn't cover them.
   Original version from Eric.
   https://lkml.org/lkml/2020/3/20/700

 Documentation/driver-api/vfio.rst | 64 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index f1a4d3c..06224bd 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -239,6 +239,70 @@ group and can access them as follows::
 	/* Gratuitous device reset and go... */
 	ioctl(device, VFIO_DEVICE_RESET);
 
+IOMMU Dual Stage Control
+------------------------
+
+Some IOMMUs support 2 stages/levels of translation. Stage corresponds to
+the ARM terminology while level corresponds to Intel's VTD terminology.
+In the following text we use either without distinction.
+
+This is useful when the guest is exposed with a virtual IOMMU and some
+devices are assigned to the guest through VFIO. Then the guest OS can use
+stage 1 (GIOVA -> GPA or GVA->GPA), while the hypervisor uses stage 2 for
+VM isolation (GPA -> HPA).
+
+Under dual stage translation, the guest gets ownership of the stage 1 page
+tables and also owns stage 1 configuration structures. The hypervisor owns
+the root configuration structure (for security reason), including stage 2
+configuration. This works as long configuration structures and page table
+format are compatible between the virtual IOMMU and the physical IOMMU.
+
+Assuming the HW supports it, this nested mode is selected by choosing the
+VFIO_TYPE1_NESTING_IOMMU type through:
+
+    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);
+
+This forces the hypervisor to use the stage 2, leaving stage 1 available
+for guest usage. The guest stage 1 format depends on IOMMU vendor, and
+it is the same with the nesting configuration method. User space should
+check the format and configuration method after setting nesting type by
+using:
+
+    ioctl(container->fd, VFIO_IOMMU_GET_INFO, &nesting_info);
+
+Details can be found in Documentation/userspace-api/iommu.rst. For Intel
+VT-d, each stage 1 page table is bound to host by:
+
+    nesting_op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL;
+    memcpy(&nesting_op->data, &bind_data, sizeof(bind_data));
+    ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
+
+As mentioned above, guest OS may use stage 1 for GIOVA->GPA or GVA->GPA.
+GVA->GPA page tables are available when PASID (Process Address Space ID)
+is exposed to guest. e.g. guest with PASID-capable devices assigned. For
+such page table binding, the bind_data should include PASID info, which
+is allocated by guest itself or by host. This depends on hardware vendor
+e.g. Intel VT-d requires to allocate PASID from host. This requirement is
+available by VFIO_IOMMU_GET_INFO. User space could allocate PASID from
+host by:
+
+    req.flags = VFIO_IOMMU_ALLOC_PASID;
+    ioctl(container, VFIO_IOMMU_PASID_REQUEST, &req);
+
+With first stage/level page table bound to host, it allows to combine the
+guest stage 1 translation along with the hypervisor stage 2 translation to
+get final address.
+
+When the guest invalidates stage 1 related caches, invalidations must be
+forwarded to the host through
+
+    nesting_op->flags = VFIO_IOMMU_NESTING_OP_CACHE_INVLD;
+    memcpy(&nesting_op->data, &inv_data, sizeof(inv_data));
+    ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
+
+Those invalidations can happen at various granularity levels, page, context,
+...
+
 VFIO User API
 -------------------------------------------------------------------------------
 
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 15/15] iommu/vt-d: Support reporting nesting capability info
  2020-06-11 12:15 [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (13 preceding siblings ...)
  2020-06-11 12:15 ` [PATCH v2 14/15] vfio: Document dual stage control Liu Yi L
@ 2020-06-11 12:15 ` Liu Yi L
  2020-06-15 10:02 ` [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Stefan Hajnoczi
  15 siblings, 0 replies; 37+ messages in thread
From: Liu Yi L @ 2020-06-11 12:15 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, jun.j.tian, iommu,
	linux-kernel, yi.y.sun, hao.wu

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 81 +++++++++++++++++++++++++++++++++++++++++++--
 include/linux/intel-iommu.h | 16 +++++++++
 2 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 25650ac..5415dc7 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5655,12 +5655,16 @@ static inline bool iommu_pasid_support(void)
 static inline bool nested_mode_support(void)
 {
 	struct dmar_drhd_unit *drhd;
-	struct intel_iommu *iommu;
+	struct intel_iommu *iommu, *prev = NULL;
 	bool ret = true;
 
 	rcu_read_lock();
 	for_each_active_iommu(iommu, drhd) {
-		if (!sm_supported(iommu) || !ecap_nest(iommu->ecap)) {
+		if (!prev)
+			prev = iommu;
+		if (!sm_supported(iommu) || !ecap_nest(iommu->ecap) ||
+		    (VTD_CAP_MASK & (iommu->cap ^ prev->cap)) ||
+		    (VTD_ECAP_MASK & (iommu->ecap ^ prev->ecap))) {
 			ret = false;
 			break;
 		}
@@ -6069,11 +6073,84 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain,
 	return ret;
 }
 
+static int intel_iommu_get_nesting_info(struct iommu_domain *domain,
+					struct iommu_nesting_info *info)
+{
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	u64 cap = VTD_CAP_MASK, ecap = VTD_ECAP_MASK;
+	struct device_domain_info *domain_info;
+	struct iommu_nesting_info_vtd vtd;
+	unsigned long flags;
+	u32 size;
+
+	if ((domain->type != IOMMU_DOMAIN_UNMANAGED) ||
+	    !(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE))
+		return -ENODEV;
+
+	if (!info)
+		return -EINVAL;
+
+	size = sizeof(struct iommu_nesting_info) +
+		sizeof(struct iommu_nesting_info_vtd);
+	/*
+	 * if provided buffer size is not equal to the size, should
+	 * return 0 and also the expected buffer size to caller.
+	 */
+	if (info->size != size) {
+		info->size = size;
+		return 0;
+	}
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	/*
+	 * arbitrary select the first domain_info as all nesting
+	 * related capabilities should be consistent across iommu
+	 * units.
+	 */
+	domain_info = list_first_entry(&dmar_domain->devices,
+				      struct device_domain_info, link);
+	cap &= domain_info->iommu->cap;
+	ecap &= domain_info->iommu->ecap;
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+
+	info->format = IOMMU_PASID_FORMAT_INTEL_VTD;
+	info->features = IOMMU_NESTING_FEAT_SYSWIDE_PASID |
+			 IOMMU_NESTING_FEAT_BIND_PGTBL |
+			 IOMMU_NESTING_FEAT_CACHE_INVLD;
+	vtd.flags = 0;
+	vtd.addr_width = dmar_domain->gaw;
+	vtd.pasid_bits = ilog2(intel_pasid_max_id);
+	vtd.cap_reg = cap;
+	vtd.cap_mask = VTD_CAP_MASK;
+	vtd.ecap_reg = ecap;
+	vtd.ecap_mask = VTD_ECAP_MASK;
+
+	memcpy(info->data, &vtd, sizeof(vtd));
+	return 0;
+}
+
+static int intel_iommu_domain_get_attr(struct iommu_domain *domain,
+				       enum iommu_attr attr, void *data)
+{
+	switch (attr) {
+	case DOMAIN_ATTR_NESTING_INFO:
+	{
+		struct iommu_nesting_info *info =
+				(struct iommu_nesting_info *) data;
+
+		return intel_iommu_get_nesting_info(domain, info);
+	}
+	default:
+		return -ENODEV;
+	}
+}
+
 const struct iommu_ops intel_iommu_ops = {
 	.capable		= intel_iommu_capable,
 	.domain_alloc		= intel_iommu_domain_alloc,
 	.domain_free		= intel_iommu_domain_free,
 	.domain_set_attr	= intel_iommu_domain_set_attr,
+	.domain_get_attr	= intel_iommu_domain_get_attr,
 	.attach_dev		= intel_iommu_attach_device,
 	.detach_dev		= intel_iommu_detach_device,
 	.aux_attach_dev		= intel_iommu_aux_attach_device,
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 0b238c3..be48f4e 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -196,6 +196,22 @@
 #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
 #define ecap_sc_support(e)	((e >> 7) & 0x1) /* Snooping Control */
 
+/* Nesting Support Capability Alignment */
+#define VTD_CAP_FL1GP		(1ULL << 56)
+#define VTD_CAP_FL5LP		(1ULL << 60)
+#define VTD_ECAP_PRS		(1ULL << 29)
+#define VTD_ECAP_ERS		(1ULL << 30)
+#define VTD_ECAP_SRS		(1ULL << 31)
+#define VTD_ECAP_EAFS		(1ULL << 34)
+#define VTD_ECAP_PASID		(1ULL << 40)
+
+/* Only capabilities marked in below MASKs are reported */
+#define VTD_CAP_MASK		(VTD_CAP_FL1GP | VTD_CAP_FL5LP)
+
+#define VTD_ECAP_MASK		(VTD_ECAP_PRS | VTD_ECAP_ERS | \
+				 VTD_ECAP_SRS | VTD_ECAP_EAFS | \
+				 VTD_ECAP_PASID)
+
 /* Virtual command interface capability */
 #define vccap_pasid(v)		(((v) & DMA_VCS_PAS)) /* PASID allocation */
 
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 02/15] iommu: Report domain nesting info
  2020-06-11 12:15 ` [PATCH v2 02/15] iommu: Report domain nesting info Liu Yi L
@ 2020-06-11 19:30   ` Alex Williamson
  2020-06-12  9:05     ` Liu, Yi L
  2020-06-17 14:39   ` Jean-Philippe Brucker
  1 sibling, 1 reply; 37+ messages in thread
From: Alex Williamson @ 2020-06-11 19:30 UTC (permalink / raw)
  To: Liu Yi L
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, iommu, linux-kernel,
	yi.y.sun, hao.wu, jun.j.tian

On Thu, 11 Jun 2020 05:15:21 -0700
Liu Yi L <yi.l.liu@intel.com> wrote:

> IOMMUs that support nesting translation needs report the capability info
> to userspace, e.g. the format of first level/stage paging structures.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> @Jean, Eric: as nesting was introduced for ARM, but looks like no actual
> user of it. right? So I'm wondering if we can reuse DOMAIN_ATTR_NESTING
> to retrieve nesting info? how about your opinions?
> 
>  include/linux/iommu.h      |  1 +
>  include/uapi/linux/iommu.h | 34 ++++++++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 78a26ae..f6e4b49 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -126,6 +126,7 @@ enum iommu_attr {
>  	DOMAIN_ATTR_FSL_PAMUV1,
>  	DOMAIN_ATTR_NESTING,	/* two stages of translation */
>  	DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
> +	DOMAIN_ATTR_NESTING_INFO,
>  	DOMAIN_ATTR_MAX,
>  };
>  
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 303f148..02eac73 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -332,4 +332,38 @@ struct iommu_gpasid_bind_data {
>  	};
>  };
>  
> +struct iommu_nesting_info {
> +	__u32	size;
> +	__u32	format;
> +	__u32	features;
> +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
> +#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
> +#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 2)
> +	__u32	flags;
> +	__u8	data[];
> +};
> +
> +/*
> + * @flags:	VT-d specific flags. Currently reserved for future
> + *		extension.
> + * @addr_width:	The output addr width of first level/stage translation
> + * @pasid_bits:	Maximum supported PASID bits, 0 represents no PASID
> + *		support.
> + * @cap_reg:	Describe basic capabilities as defined in VT-d capability
> + *		register.
> + * @cap_mask:	Mark valid capability bits in @cap_reg.
> + * @ecap_reg:	Describe the extended capabilities as defined in VT-d
> + *		extended capability register.
> + * @ecap_mask:	Mark the valid capability bits in @ecap_reg.

Please explain this a little further, why do we need to tell userspace
about cap/ecap register bits that aren't valid through this interface?
Thanks,

Alex


> + */
> +struct iommu_nesting_info_vtd {
> +	__u32	flags;
> +	__u16	addr_width;
> +	__u16	pasid_bits;
> +	__u64	cap_reg;
> +	__u64	cap_mask;
> +	__u64	ecap_reg;
> +	__u64	ecap_mask;
> +};
> +
>  #endif /* _UAPI_IOMMU_H */

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH v2 02/15] iommu: Report domain nesting info
  2020-06-11 19:30   ` Alex Williamson
@ 2020-06-12  9:05     ` Liu, Yi L
  2020-06-15  1:22       ` Tian, Kevin
  0 siblings, 1 reply; 37+ messages in thread
From: Liu, Yi L @ 2020-06-12  9:05 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, iommu, linux-kernel,
	Sun, Yi Y, Wu, Hao, Tian, Jun J

Hi Alex,

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, June 12, 2020 3:30 AM
> 
> On Thu, 11 Jun 2020 05:15:21 -0700
> Liu Yi L <yi.l.liu@intel.com> wrote:
> 
> > IOMMUs that support nesting translation needs report the capability
> > info to userspace, e.g. the format of first level/stage paging structures.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> > @Jean, Eric: as nesting was introduced for ARM, but looks like no
> > actual user of it. right? So I'm wondering if we can reuse
> > DOMAIN_ATTR_NESTING to retrieve nesting info? how about your opinions?
> >
> >  include/linux/iommu.h      |  1 +
> >  include/uapi/linux/iommu.h | 34 ++++++++++++++++++++++++++++++++++
> >  2 files changed, 35 insertions(+)
> >
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h index
> > 78a26ae..f6e4b49 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -126,6 +126,7 @@ enum iommu_attr {
> >  	DOMAIN_ATTR_FSL_PAMUV1,
> >  	DOMAIN_ATTR_NESTING,	/* two stages of translation */
> >  	DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
> > +	DOMAIN_ATTR_NESTING_INFO,
> >  	DOMAIN_ATTR_MAX,
> >  };
> >
> > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> > index 303f148..02eac73 100644
> > --- a/include/uapi/linux/iommu.h
> > +++ b/include/uapi/linux/iommu.h
> > @@ -332,4 +332,38 @@ struct iommu_gpasid_bind_data {
> >  	};
> >  };
> >
> > +struct iommu_nesting_info {
> > +	__u32	size;
> > +	__u32	format;
> > +	__u32	features;
> > +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
> > +#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
> > +#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 2)
> > +	__u32	flags;
> > +	__u8	data[];
> > +};
> > +
> > +/*
> > + * @flags:	VT-d specific flags. Currently reserved for future
> > + *		extension.
> > + * @addr_width:	The output addr width of first level/stage translation
> > + * @pasid_bits:	Maximum supported PASID bits, 0 represents no PASID
> > + *		support.
> > + * @cap_reg:	Describe basic capabilities as defined in VT-d capability
> > + *		register.
> > + * @cap_mask:	Mark valid capability bits in @cap_reg.
> > + * @ecap_reg:	Describe the extended capabilities as defined in VT-d
> > + *		extended capability register.
> > + * @ecap_mask:	Mark the valid capability bits in @ecap_reg.
> 
> Please explain this a little further, why do we need to tell userspace about
> cap/ecap register bits that aren't valid through this interface?
> Thanks,

we only want to tell userspace about the bits marked in the cap/ecap_mask.
cap/ecap_mask is kind of white-list of the cap/ecap register. userspace should
only care about the bits in the white-list, for other bits, it should ignore.

Regards,
Yi Liu

> Alex
> 
> 
> > + */
> > +struct iommu_nesting_info_vtd {
> > +	__u32	flags;
> > +	__u16	addr_width;
> > +	__u16	pasid_bits;
> > +	__u64	cap_reg;
> > +	__u64	cap_mask;
> > +	__u64	ecap_reg;
> > +	__u64	ecap_mask;
> > +};
> > +
> >  #endif /* _UAPI_IOMMU_H */

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH v2 02/15] iommu: Report domain nesting info
  2020-06-12  9:05     ` Liu, Yi L
@ 2020-06-15  1:22       ` Tian, Kevin
  2020-06-15  6:04         ` Liu, Yi L
  0 siblings, 1 reply; 37+ messages in thread
From: Tian, Kevin @ 2020-06-15  1:22 UTC (permalink / raw)
  To: Liu, Yi L, Alex Williamson
  Cc: jean-philippe, Raj, Ashok, kvm, iommu, linux-kernel, Sun, Yi Y,
	Wu, Hao, Tian,  Jun J

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Friday, June 12, 2020 5:05 PM
> 
> Hi Alex,
> 
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Friday, June 12, 2020 3:30 AM
> >
> > On Thu, 11 Jun 2020 05:15:21 -0700
> > Liu Yi L <yi.l.liu@intel.com> wrote:
> >
> > > IOMMUs that support nesting translation needs report the capability
> > > info to userspace, e.g. the format of first level/stage paging structures.
> > >
> > > Cc: Kevin Tian <kevin.tian@intel.com>
> > > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > Cc: Alex Williamson <alex.williamson@redhat.com>
> > > Cc: Eric Auger <eric.auger@redhat.com>
> > > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > > Cc: Joerg Roedel <joro@8bytes.org>
> > > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > ---
> > > @Jean, Eric: as nesting was introduced for ARM, but looks like no
> > > actual user of it. right? So I'm wondering if we can reuse
> > > DOMAIN_ATTR_NESTING to retrieve nesting info? how about your
> opinions?
> > >
> > >  include/linux/iommu.h      |  1 +
> > >  include/uapi/linux/iommu.h | 34
> ++++++++++++++++++++++++++++++++++
> > >  2 files changed, 35 insertions(+)
> > >
> > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h index
> > > 78a26ae..f6e4b49 100644
> > > --- a/include/linux/iommu.h
> > > +++ b/include/linux/iommu.h
> > > @@ -126,6 +126,7 @@ enum iommu_attr {
> > >  	DOMAIN_ATTR_FSL_PAMUV1,
> > >  	DOMAIN_ATTR_NESTING,	/* two stages of translation */
> > >  	DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
> > > +	DOMAIN_ATTR_NESTING_INFO,
> > >  	DOMAIN_ATTR_MAX,
> > >  };
> > >
> > > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> > > index 303f148..02eac73 100644
> > > --- a/include/uapi/linux/iommu.h
> > > +++ b/include/uapi/linux/iommu.h
> > > @@ -332,4 +332,38 @@ struct iommu_gpasid_bind_data {
> > >  	};
> > >  };
> > >
> > > +struct iommu_nesting_info {
> > > +	__u32	size;
> > > +	__u32	format;
> > > +	__u32	features;
> > > +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
> > > +#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
> > > +#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 2)
> > > +	__u32	flags;
> > > +	__u8	data[];
> > > +};
> > > +
> > > +/*
> > > + * @flags:	VT-d specific flags. Currently reserved for future
> > > + *		extension.
> > > + * @addr_width:	The output addr width of first level/stage translation
> > > + * @pasid_bits:	Maximum supported PASID bits, 0 represents no
> PASID
> > > + *		support.
> > > + * @cap_reg:	Describe basic capabilities as defined in VT-d
> capability
> > > + *		register.
> > > + * @cap_mask:	Mark valid capability bits in @cap_reg.
> > > + * @ecap_reg:	Describe the extended capabilities as defined in VT-d
> > > + *		extended capability register.
> > > + * @ecap_mask:	Mark the valid capability bits in @ecap_reg.
> >
> > Please explain this a little further, why do we need to tell userspace about
> > cap/ecap register bits that aren't valid through this interface?
> > Thanks,
> 
> we only want to tell userspace about the bits marked in the cap/ecap_mask.
> cap/ecap_mask is kind of white-list of the cap/ecap register. userspace
> should
> only care about the bits in the white-list, for other bits, it should ignore.
> 
> Regards,
> Yi Liu

For invalid bits if kernel just clears them then do we still need additional
mask bits to explicitly mark them out? I guess this might be the point that 
Alex asked...

> 
> > Alex
> >
> >
> > > + */
> > > +struct iommu_nesting_info_vtd {
> > > +	__u32	flags;
> > > +	__u16	addr_width;
> > > +	__u16	pasid_bits;
> > > +	__u64	cap_reg;
> > > +	__u64	cap_mask;
> > > +	__u64	ecap_reg;
> > > +	__u64	ecap_mask;
> > > +};
> > > +
> > >  #endif /* _UAPI_IOMMU_H */

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH v2 02/15] iommu: Report domain nesting info
  2020-06-15  1:22       ` Tian, Kevin
@ 2020-06-15  6:04         ` Liu, Yi L
  2020-06-16  1:56           ` Tian, Kevin
  0 siblings, 1 reply; 37+ messages in thread
From: Liu, Yi L @ 2020-06-15  6:04 UTC (permalink / raw)
  To: Tian, Kevin, Alex Williamson
  Cc: jean-philippe, Raj, Ashok, kvm, iommu, linux-kernel, Sun, Yi Y,
	Wu, Hao, Tian,  Jun J

Hi Kevin,

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Monday, June 15, 2020 9:23 AM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Friday, June 12, 2020 5:05 PM
> >
> > Hi Alex,
> >
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Friday, June 12, 2020 3:30 AM
> > >
> > > On Thu, 11 Jun 2020 05:15:21 -0700
> > > Liu Yi L <yi.l.liu@intel.com> wrote:
> > >
> > > > IOMMUs that support nesting translation needs report the
> > > > capability info to userspace, e.g. the format of first level/stage paging
> structures.
> > > >
> > > > Cc: Kevin Tian <kevin.tian@intel.com>
> > > > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > Cc: Alex Williamson <alex.williamson@redhat.com>
> > > > Cc: Eric Auger <eric.auger@redhat.com>
> > > > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > > > Cc: Joerg Roedel <joro@8bytes.org>
> > > > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > ---
> > > > @Jean, Eric: as nesting was introduced for ARM, but looks like no
> > > > actual user of it. right? So I'm wondering if we can reuse
> > > > DOMAIN_ATTR_NESTING to retrieve nesting info? how about your
> > opinions?
> > > >
> > > >  include/linux/iommu.h      |  1 +
> > > >  include/uapi/linux/iommu.h | 34
> > ++++++++++++++++++++++++++++++++++
> > > >  2 files changed, 35 insertions(+)
> > > >
> > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h index
> > > > 78a26ae..f6e4b49 100644
> > > > --- a/include/linux/iommu.h
> > > > +++ b/include/linux/iommu.h
> > > > @@ -126,6 +126,7 @@ enum iommu_attr {
> > > >  	DOMAIN_ATTR_FSL_PAMUV1,
> > > >  	DOMAIN_ATTR_NESTING,	/* two stages of translation */
> > > >  	DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
> > > > +	DOMAIN_ATTR_NESTING_INFO,
> > > >  	DOMAIN_ATTR_MAX,
> > > >  };
> > > >
> > > > diff --git a/include/uapi/linux/iommu.h
> > > > b/include/uapi/linux/iommu.h index 303f148..02eac73 100644
> > > > --- a/include/uapi/linux/iommu.h
> > > > +++ b/include/uapi/linux/iommu.h
> > > > @@ -332,4 +332,38 @@ struct iommu_gpasid_bind_data {
> > > >  	};
> > > >  };
> > > >
> > > > +struct iommu_nesting_info {
> > > > +	__u32	size;
> > > > +	__u32	format;
> > > > +	__u32	features;
> > > > +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
> > > > +#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
> > > > +#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 2)
> > > > +	__u32	flags;
> > > > +	__u8	data[];
> > > > +};
> > > > +
> > > > +/*
> > > > + * @flags:	VT-d specific flags. Currently reserved for future
> > > > + *		extension.
> > > > + * @addr_width:	The output addr width of first level/stage
> translation
> > > > + * @pasid_bits:	Maximum supported PASID bits, 0 represents no
> > PASID
> > > > + *		support.
> > > > + * @cap_reg:	Describe basic capabilities as defined in VT-d
> > capability
> > > > + *		register.
> > > > + * @cap_mask:	Mark valid capability bits in @cap_reg.
> > > > + * @ecap_reg:	Describe the extended capabilities as defined in VT-d
> > > > + *		extended capability register.
> > > > + * @ecap_mask:	Mark the valid capability bits in @ecap_reg.
> > >
> > > Please explain this a little further, why do we need to tell
> > > userspace about cap/ecap register bits that aren't valid through this interface?
> > > Thanks,
> >
> > we only want to tell userspace about the bits marked in the cap/ecap_mask.
> > cap/ecap_mask is kind of white-list of the cap/ecap register.
> > userspace should only care about the bits in the white-list, for other
> > bits, it should ignore.
> >
> > Regards,
> > Yi Liu
> 
> For invalid bits if kernel just clears them then do we still need additional mask bits
> to explicitly mark them out? I guess this might be the point that Alex asked...

For invalid bits, kernel will clear them. But I think the mask bits is
still necessary. The mask bits tells user space the bits related to
nesting. Without it, user space may have no idea about it.

Maybe talk about QEMU usage of the cap/ecap bits would help. QEMU vIOMMU
decides cap/ecap bits according to QEMU cmdline. But not all of them are
compatible with hardware support. Especially, vIOMMU built on nesting.
So needs to sync the cap/ecap bits with host side. Based on the mask
bits, QEMU can compare the cap/ecap bits configured by QEMU cmdline with
the cap/ecap bits reported by this interface. This comparation is limited
to the nesting related bits in cap/ecap, the other bits are not included
and can use the configuration by QEMU cmdline.

The link below show the current Intel vIOMMU usage on the cap/ecap bits.
For each assigned device, vIOMMU will compare the nesting related bits in
cap/ecap and mask out the bits which hardware doesn't support. After the
machine is intilized, the vIOMMU cap/ecap bits are determined. If user
hot-plug devices to VM, vIOMMU will fail it if the hardware cap/ecap bits
behind hot-plug device are not compatible with determined vIOMMU cap/ecap
bits.

https://www.spinics.net/lists/kvm/msg218294.html

Regards,
Yi Liu

> >
> > > Alex
> > >
> > >
> > > > + */
> > > > +struct iommu_nesting_info_vtd {
> > > > +	__u32	flags;
> > > > +	__u16	addr_width;
> > > > +	__u16	pasid_bits;
> > > > +	__u64	cap_reg;
> > > > +	__u64	cap_mask;
> > > > +	__u64	ecap_reg;
> > > > +	__u64	ecap_mask;
> > > > +};
> > > > +
> > > >  #endif /* _UAPI_IOMMU_H */

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 14/15] vfio: Document dual stage control
  2020-06-11 12:15 ` [PATCH v2 14/15] vfio: Document dual stage control Liu Yi L
@ 2020-06-15  9:41   ` Stefan Hajnoczi
  2020-06-17  6:27     ` Liu, Yi L
  0 siblings, 1 reply; 37+ messages in thread
From: Stefan Hajnoczi @ 2020-06-15  9:41 UTC (permalink / raw)
  To: Liu Yi L
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, yi.y.sun,
	linux-kernel, alex.williamson, iommu, hao.wu, jun.j.tian


[-- Attachment #1.1: Type: text/plain, Size: 4316 bytes --]

On Thu, Jun 11, 2020 at 05:15:33AM -0700, Liu Yi L wrote:
> From: Eric Auger <eric.auger@redhat.com>
> 
> The VFIO API was enhanced to support nested stage control: a bunch of
> new iotcls and usage guideline.
> 
> Let's document the process to follow to set up nested mode.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
> v1 -> v2:
> *) new in v2, compared with Eric's original version, pasid table bind
>    and fault reporting is removed as this series doesn't cover them.
>    Original version from Eric.
>    https://lkml.org/lkml/2020/3/20/700
> 
>  Documentation/driver-api/vfio.rst | 64 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
> 
> diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
> index f1a4d3c..06224bd 100644
> --- a/Documentation/driver-api/vfio.rst
> +++ b/Documentation/driver-api/vfio.rst
> @@ -239,6 +239,70 @@ group and can access them as follows::
>  	/* Gratuitous device reset and go... */
>  	ioctl(device, VFIO_DEVICE_RESET);
>  
> +IOMMU Dual Stage Control
> +------------------------
> +
> +Some IOMMUs support 2 stages/levels of translation. Stage corresponds to
> +the ARM terminology while level corresponds to Intel's VTD terminology.
> +In the following text we use either without distinction.
> +
> +This is useful when the guest is exposed with a virtual IOMMU and some
> +devices are assigned to the guest through VFIO. Then the guest OS can use
> +stage 1 (GIOVA -> GPA or GVA->GPA), while the hypervisor uses stage 2 for
> +VM isolation (GPA -> HPA).
> +
> +Under dual stage translation, the guest gets ownership of the stage 1 page
> +tables and also owns stage 1 configuration structures. The hypervisor owns
> +the root configuration structure (for security reason), including stage 2
> +configuration. This works as long configuration structures and page table

s/as long configuration/as long as configuration/

> +format are compatible between the virtual IOMMU and the physical IOMMU.

s/format/formats/

> +
> +Assuming the HW supports it, this nested mode is selected by choosing the
> +VFIO_TYPE1_NESTING_IOMMU type through:
> +
> +    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);
> +
> +This forces the hypervisor to use the stage 2, leaving stage 1 available
> +for guest usage. The guest stage 1 format depends on IOMMU vendor, and
> +it is the same with the nesting configuration method. User space should
> +check the format and configuration method after setting nesting type by
> +using:
> +
> +    ioctl(container->fd, VFIO_IOMMU_GET_INFO, &nesting_info);
> +
> +Details can be found in Documentation/userspace-api/iommu.rst. For Intel
> +VT-d, each stage 1 page table is bound to host by:
> +
> +    nesting_op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL;
> +    memcpy(&nesting_op->data, &bind_data, sizeof(bind_data));
> +    ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
> +
> +As mentioned above, guest OS may use stage 1 for GIOVA->GPA or GVA->GPA.
> +GVA->GPA page tables are available when PASID (Process Address Space ID)
> +is exposed to guest. e.g. guest with PASID-capable devices assigned. For
> +such page table binding, the bind_data should include PASID info, which
> +is allocated by guest itself or by host. This depends on hardware vendor
> +e.g. Intel VT-d requires to allocate PASID from host. This requirement is
> +available by VFIO_IOMMU_GET_INFO. User space could allocate PASID from
> +host by:
> +
> +    req.flags = VFIO_IOMMU_ALLOC_PASID;
> +    ioctl(container, VFIO_IOMMU_PASID_REQUEST, &req);

It is not clear how the userspace application determines whether PASIDs
must be allocated from the host via VFIO_IOMMU_PASID_REQUEST or if the
guest itself can allocate PASIDs. The text mentions VFIO_IOMMU_GET_INFO
but what exactly should the userspace application check?

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs
  2020-06-11 12:15 [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (14 preceding siblings ...)
  2020-06-11 12:15 ` [PATCH v2 15/15] iommu/vt-d: Support reporting nesting capability info Liu Yi L
@ 2020-06-15 10:02 ` Stefan Hajnoczi
  2020-06-15 12:39   ` Liu, Yi L
  2020-06-16  2:26   ` Tian, Kevin
  15 siblings, 2 replies; 37+ messages in thread
From: Stefan Hajnoczi @ 2020-06-15 10:02 UTC (permalink / raw)
  To: Liu Yi L
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, yi.y.sun,
	linux-kernel, alex.williamson, iommu, hao.wu, jun.j.tian


[-- Attachment #1.1: Type: text/plain, Size: 2381 bytes --]

On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote:
> Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> Intel platforms allows address space sharing between device DMA and
> applications. SVA can reduce programming complexity and enhance security.
> 
> This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
> guest application address space with passthru devices. This is called
> vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> changes. For IOMMU and QEMU changes, they are in separate series (listed
> in the "Related series").
> 
> The high-level architecture for SVA virtualization is as below, the key
> design of vSVA support is to utilize the dual-stage IOMMU translation (
> also known as IOMMU nesting translation) capability in host IOMMU.
> 
> 
>     .-------------.  .---------------------------.
>     |   vIOMMU    |  | Guest process CR3, FL only|
>     |             |  '---------------------------'
>     .----------------/
>     | PASID Entry |--- PASID cache flush -
>     '-------------'                       |
>     |             |                       V
>     |             |                CR3 in GPA
>     '-------------'
> Guest
> ------| Shadow |--------------------------|--------
>       v        v                          v
> Host
>     .-------------.  .----------------------.
>     |   pIOMMU    |  | Bind FL for GVA-GPA  |
>     |             |  '----------------------'
>     .----------------/  |
>     | PASID Entry |     V (Nested xlate)
>     '----------------\.------------------------------.
>     |             |   |SL for GPA-HPA, default domain|
>     |             |   '------------------------------'
>     '-------------'
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables

Hi,
Looks like an interesting feature!

To check I understand this feature: can applications now pass virtual
addresses to devices instead of translating to IOVAs?

If yes, can guest applications restrict the vSVA address space so the
device only has access to certain regions?

On one hand replacing IOVA translation with virtual addresses simplifies
the application programming model, but does it give up isolation if the
device can now access all application memory?

Thanks,
Stefan

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs
  2020-06-15 10:02 ` [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Stefan Hajnoczi
@ 2020-06-15 12:39   ` Liu, Yi L
  2020-06-16 15:34     ` Stefan Hajnoczi
  2020-06-16  2:26   ` Tian, Kevin
  1 sibling, 1 reply; 37+ messages in thread
From: Liu, Yi L @ 2020-06-15 12:39 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, Sun, Yi Y,
	linux-kernel, alex.williamson, iommu, Wu, Hao, Tian, Jun J

> From: Stefan Hajnoczi <stefanha@gmail.com>
> Sent: Monday, June 15, 2020 6:02 PM
> 
> On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote:
> > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > Intel platforms allows address space sharing between device DMA and
> > applications. SVA can reduce programming complexity and enhance security.
> >
> > This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
> > guest application address space with passthru devices. This is called
> > vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> > changes. For IOMMU and QEMU changes, they are in separate series (listed
> > in the "Related series").
> >
> > The high-level architecture for SVA virtualization is as below, the key
> > design of vSVA support is to utilize the dual-stage IOMMU translation (
> > also known as IOMMU nesting translation) capability in host IOMMU.
> >
> >
> >     .-------------.  .---------------------------.
> >     |   vIOMMU    |  | Guest process CR3, FL only|
> >     |             |  '---------------------------'
> >     .----------------/
> >     | PASID Entry |--- PASID cache flush -
> >     '-------------'                       |
> >     |             |                       V
> >     |             |                CR3 in GPA
> >     '-------------'
> > Guest
> > ------| Shadow |--------------------------|--------
> >       v        v                          v
> > Host
> >     .-------------.  .----------------------.
> >     |   pIOMMU    |  | Bind FL for GVA-GPA  |
> >     |             |  '----------------------'
> >     .----------------/  |
> >     | PASID Entry |     V (Nested xlate)
> >     '----------------\.------------------------------.
> >     |             |   |SL for GPA-HPA, default domain|
> >     |             |   '------------------------------'
> >     '-------------'
> > Where:
> >  - FL = First level/stage one page tables
> >  - SL = Second level/stage two page tables
> 
> Hi,
> Looks like an interesting feature!

thanks for the interest. Stefan :-)

> To check I understand this feature: can applications now pass virtual
> addresses to devices instead of translating to IOVAs?

yes, application could pass virtual addresses to device directly. As
long as the virtual address is mapped in cpu page table, then IOMMU
would get it translated to physical address.

> If yes, can guest applications restrict the vSVA address space so the
> device only has access to certain regions?

do you mean restrict the access of certain virtual address regions of
guest application ? or certain guest memory? :-)

> On one hand replacing IOVA translation with virtual addresses simplifies
> the application programming model, but does it give up isolation if the
> device can now access all application memory?

yeah, you are right, SVA simplifies application programming model. And
today, we do allow access all application memory by SVA. this is also
another benefit of SVA. e.g. say an accelerator gets a copy of data from
a buffer written by cpu. If there is some other data which is directed
by a pointer (a virtual address) within the data got from memory, accelerator
could do another DMA to fetch it without cpu's involvement.

Regards,
Yi Liu

> Thanks,
> Stefan
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH v2 02/15] iommu: Report domain nesting info
  2020-06-15  6:04         ` Liu, Yi L
@ 2020-06-16  1:56           ` Tian, Kevin
  2020-06-16  2:24             ` Liu, Yi L
  0 siblings, 1 reply; 37+ messages in thread
From: Tian, Kevin @ 2020-06-16  1:56 UTC (permalink / raw)
  To: Liu, Yi L, Alex Williamson
  Cc: jean-philippe, Raj, Ashok, kvm, iommu, linux-kernel, Sun, Yi Y,
	Wu, Hao, Tian,  Jun J

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, June 15, 2020 2:05 PM
> 
> Hi Kevin,
> 
> > From: Tian, Kevin <kevin.tian@intel.com>
> > Sent: Monday, June 15, 2020 9:23 AM
> >
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Friday, June 12, 2020 5:05 PM
> > >
> > > Hi Alex,
> > >
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Friday, June 12, 2020 3:30 AM
> > > >
> > > > On Thu, 11 Jun 2020 05:15:21 -0700
> > > > Liu Yi L <yi.l.liu@intel.com> wrote:
> > > >
> > > > > IOMMUs that support nesting translation needs report the
> > > > > capability info to userspace, e.g. the format of first level/stage paging
> > structures.
> > > > >
> > > > > Cc: Kevin Tian <kevin.tian@intel.com>
> > > > > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > > Cc: Alex Williamson <alex.williamson@redhat.com>
> > > > > Cc: Eric Auger <eric.auger@redhat.com>
> > > > > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > > > > Cc: Joerg Roedel <joro@8bytes.org>
> > > > > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > > > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > > > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > > ---
> > > > > @Jean, Eric: as nesting was introduced for ARM, but looks like no
> > > > > actual user of it. right? So I'm wondering if we can reuse
> > > > > DOMAIN_ATTR_NESTING to retrieve nesting info? how about your
> > > opinions?
> > > > >
> > > > >  include/linux/iommu.h      |  1 +
> > > > >  include/uapi/linux/iommu.h | 34
> > > ++++++++++++++++++++++++++++++++++
> > > > >  2 files changed, 35 insertions(+)
> > > > >
> > > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h index
> > > > > 78a26ae..f6e4b49 100644
> > > > > --- a/include/linux/iommu.h
> > > > > +++ b/include/linux/iommu.h
> > > > > @@ -126,6 +126,7 @@ enum iommu_attr {
> > > > >  	DOMAIN_ATTR_FSL_PAMUV1,
> > > > >  	DOMAIN_ATTR_NESTING,	/* two stages of translation */
> > > > >  	DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
> > > > > +	DOMAIN_ATTR_NESTING_INFO,
> > > > >  	DOMAIN_ATTR_MAX,
> > > > >  };
> > > > >
> > > > > diff --git a/include/uapi/linux/iommu.h
> > > > > b/include/uapi/linux/iommu.h index 303f148..02eac73 100644
> > > > > --- a/include/uapi/linux/iommu.h
> > > > > +++ b/include/uapi/linux/iommu.h
> > > > > @@ -332,4 +332,38 @@ struct iommu_gpasid_bind_data {
> > > > >  	};
> > > > >  };
> > > > >
> > > > > +struct iommu_nesting_info {
> > > > > +	__u32	size;
> > > > > +	__u32	format;
> > > > > +	__u32	features;
> > > > > +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
> > > > > +#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
> > > > > +#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 <<
> 2)
> > > > > +	__u32	flags;
> > > > > +	__u8	data[];
> > > > > +};
> > > > > +
> > > > > +/*
> > > > > + * @flags:	VT-d specific flags. Currently reserved for future
> > > > > + *		extension.
> > > > > + * @addr_width:	The output addr width of first level/stage
> > translation
> > > > > + * @pasid_bits:	Maximum supported PASID bits, 0 represents
> no
> > > PASID
> > > > > + *		support.
> > > > > + * @cap_reg:	Describe basic capabilities as defined in VT-d
> > > capability
> > > > > + *		register.
> > > > > + * @cap_mask:	Mark valid capability bits in @cap_reg.
> > > > > + * @ecap_reg:	Describe the extended capabilities as defined in VT-d
> > > > > + *		extended capability register.
> > > > > + * @ecap_mask:	Mark the valid capability bits in @ecap_reg.
> > > >
> > > > Please explain this a little further, why do we need to tell
> > > > userspace about cap/ecap register bits that aren't valid through this
> interface?
> > > > Thanks,
> > >
> > > we only want to tell userspace about the bits marked in the
> cap/ecap_mask.
> > > cap/ecap_mask is kind of white-list of the cap/ecap register.
> > > userspace should only care about the bits in the white-list, for other
> > > bits, it should ignore.
> > >
> > > Regards,
> > > Yi Liu
> >
> > For invalid bits if kernel just clears them then do we still need additional
> mask bits
> > to explicitly mark them out? I guess this might be the point that Alex asked...
> 
> For invalid bits, kernel will clear them. But I think the mask bits is
> still necessary. The mask bits tells user space the bits related to
> nesting. Without it, user space may have no idea about it.

userspace should know which bit is related to nesting and then should
check that bit explicitly...

> 
> Maybe talk about QEMU usage of the cap/ecap bits would help. QEMU
> vIOMMU
> decides cap/ecap bits according to QEMU cmdline. But not all of them are
> compatible with hardware support. Especially, vIOMMU built on nesting.
> So needs to sync the cap/ecap bits with host side. Based on the mask
> bits, QEMU can compare the cap/ecap bits configured by QEMU cmdline with
> the cap/ecap bits reported by this interface. This comparation is limited
> to the nesting related bits in cap/ecap, the other bits are not included
> and can use the configuration by QEMU cmdline.

I didn't get this explanation. Based on patch [15/15], nesting capabilities
are defined as:
+/* Nesting Support Capability Alignment */
+#define VTD_CAP_FL1GP		(1ULL << 56)
+#define VTD_CAP_FL5LP		(1ULL << 60)
+#define VTD_ECAP_PRS		(1ULL << 29)
+#define VTD_ECAP_ERS		(1ULL << 30)
+#define VTD_ECAP_SRS		(1ULL << 31)
+#define VTD_ECAP_EAFS		(1ULL << 34)
+#define VTD_ECAP_PASID		(1ULL << 40)

When Qemu gets an cmdline option it knows which bit out of above
list should be checked against hardware capability. Then just do the
check bit-by-bit. Why do we need mask bit in uapi to tell which bits
are valid? Unless 0/1 doesn't represent validity of some bit. Do we
have such example?

> 
> The link below show the current Intel vIOMMU usage on the cap/ecap bits.
> For each assigned device, vIOMMU will compare the nesting related bits in
> cap/ecap and mask out the bits which hardware doesn't support. After the
> machine is intilized, the vIOMMU cap/ecap bits are determined. If user
> hot-plug devices to VM, vIOMMU will fail it if the hardware cap/ecap bits
> behind hot-plug device are not compatible with determined vIOMMU
> cap/ecap
> bits.
> 
> https://www.spinics.net/lists/kvm/msg218294.html
> 
> Regards,
> Yi Liu
> 
> > >
> > > > Alex
> > > >
> > > >
> > > > > + */
> > > > > +struct iommu_nesting_info_vtd {
> > > > > +	__u32	flags;
> > > > > +	__u16	addr_width;
> > > > > +	__u16	pasid_bits;
> > > > > +	__u64	cap_reg;
> > > > > +	__u64	cap_mask;
> > > > > +	__u64	ecap_reg;
> > > > > +	__u64	ecap_mask;
> > > > > +};
> > > > > +
> > > > >  #endif /* _UAPI_IOMMU_H */

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH v2 02/15] iommu: Report domain nesting info
  2020-06-16  1:56           ` Tian, Kevin
@ 2020-06-16  2:24             ` Liu, Yi L
  0 siblings, 0 replies; 37+ messages in thread
From: Liu, Yi L @ 2020-06-16  2:24 UTC (permalink / raw)
  To: Tian, Kevin, Alex Williamson
  Cc: jean-philippe, Raj, Ashok, kvm, iommu, linux-kernel, Sun, Yi Y,
	Wu, Hao, Tian,  Jun J

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Tuesday, June 16, 2020 9:56 AM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Monday, June 15, 2020 2:05 PM
> >
> > Hi Kevin,
> >
> > > From: Tian, Kevin <kevin.tian@intel.com>
> > > Sent: Monday, June 15, 2020 9:23 AM
> > >
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Friday, June 12, 2020 5:05 PM
> > > >
> > > > Hi Alex,
> > > >
> > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > Sent: Friday, June 12, 2020 3:30 AM
> > > > >
> > > > > On Thu, 11 Jun 2020 05:15:21 -0700
> > > > > Liu Yi L <yi.l.liu@intel.com> wrote:
> > > > >
> > > > > > IOMMUs that support nesting translation needs report the
> > > > > > capability info to userspace, e.g. the format of first level/stage paging
> > > structures.
> > > > > >
> > > > > > Cc: Kevin Tian <kevin.tian@intel.com>
> > > > > > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > > > Cc: Alex Williamson <alex.williamson@redhat.com>
> > > > > > Cc: Eric Auger <eric.auger@redhat.com>
> > > > > > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > > > > > Cc: Joerg Roedel <joro@8bytes.org>
> > > > > > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > > > > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > > > > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > > > ---
> > > > > > @Jean, Eric: as nesting was introduced for ARM, but looks like no
> > > > > > actual user of it. right? So I'm wondering if we can reuse
> > > > > > DOMAIN_ATTR_NESTING to retrieve nesting info? how about your
> > > > opinions?
> > > > > >
> > > > > >  include/linux/iommu.h      |  1 +
> > > > > >  include/uapi/linux/iommu.h | 34
> > > > ++++++++++++++++++++++++++++++++++
> > > > > >  2 files changed, 35 insertions(+)
> > > > > >
> > > > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h index
> > > > > > 78a26ae..f6e4b49 100644
> > > > > > --- a/include/linux/iommu.h
> > > > > > +++ b/include/linux/iommu.h
> > > > > > @@ -126,6 +126,7 @@ enum iommu_attr {
> > > > > >  	DOMAIN_ATTR_FSL_PAMUV1,
> > > > > >  	DOMAIN_ATTR_NESTING,	/* two stages of translation */
> > > > > >  	DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
> > > > > > +	DOMAIN_ATTR_NESTING_INFO,
> > > > > >  	DOMAIN_ATTR_MAX,
> > > > > >  };
> > > > > >
> > > > > > diff --git a/include/uapi/linux/iommu.h
> > > > > > b/include/uapi/linux/iommu.h index 303f148..02eac73 100644
> > > > > > --- a/include/uapi/linux/iommu.h
> > > > > > +++ b/include/uapi/linux/iommu.h
> > > > > > @@ -332,4 +332,38 @@ struct iommu_gpasid_bind_data {
> > > > > >  	};
> > > > > >  };
> > > > > >
> > > > > > +struct iommu_nesting_info {
> > > > > > +	__u32	size;
> > > > > > +	__u32	format;
> > > > > > +	__u32	features;
> > > > > > +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
> > > > > > +#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
> > > > > > +#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 <<
> > 2)
> > > > > > +	__u32	flags;
> > > > > > +	__u8	data[];
> > > > > > +};
> > > > > > +
> > > > > > +/*
> > > > > > + * @flags:	VT-d specific flags. Currently reserved for future
> > > > > > + *		extension.
> > > > > > + * @addr_width:	The output addr width of first level/stage
> > > translation
> > > > > > + * @pasid_bits:	Maximum supported PASID bits, 0 represents
> > no
> > > > PASID
> > > > > > + *		support.
> > > > > > + * @cap_reg:	Describe basic capabilities as defined in VT-d
> > > > capability
> > > > > > + *		register.
> > > > > > + * @cap_mask:	Mark valid capability bits in @cap_reg.
> > > > > > + * @ecap_reg:	Describe the extended capabilities as defined in
> VT-d
> > > > > > + *		extended capability register.
> > > > > > + * @ecap_mask:	Mark the valid capability bits in @ecap_reg.
> > > > >
> > > > > Please explain this a little further, why do we need to tell
> > > > > userspace about cap/ecap register bits that aren't valid through this
> > interface?
> > > > > Thanks,
> > > >
> > > > we only want to tell userspace about the bits marked in the
> > cap/ecap_mask.
> > > > cap/ecap_mask is kind of white-list of the cap/ecap register.
> > > > userspace should only care about the bits in the white-list, for other
> > > > bits, it should ignore.
> > > >
> > > > Regards,
> > > > Yi Liu
> > >
> > > For invalid bits if kernel just clears them then do we still need additional
> > mask bits
> > > to explicitly mark them out? I guess this might be the point that Alex asked...
> >
> > For invalid bits, kernel will clear them. But I think the mask bits is
> > still necessary. The mask bits tells user space the bits related to
> > nesting. Without it, user space may have no idea about it.
> 
> userspace should know which bit is related to nesting and then should
> check that bit explicitly...

ok, so userspace could get such info by the understanding of spec, right?
if user space could get it, then I think it's uncessary to have cap/ecap mask
bits.

> >
> > Maybe talk about QEMU usage of the cap/ecap bits would help. QEMU
> > vIOMMU
> > decides cap/ecap bits according to QEMU cmdline. But not all of them are
> > compatible with hardware support. Especially, vIOMMU built on nesting.
> > So needs to sync the cap/ecap bits with host side. Based on the mask
> > bits, QEMU can compare the cap/ecap bits configured by QEMU cmdline with
> > the cap/ecap bits reported by this interface. This comparation is limited
> > to the nesting related bits in cap/ecap, the other bits are not included
> > and can use the configuration by QEMU cmdline.
> 
> I didn't get this explanation. Based on patch [15/15], nesting capabilities
> are defined as:
> +/* Nesting Support Capability Alignment */
> +#define VTD_CAP_FL1GP		(1ULL << 56)
> +#define VTD_CAP_FL5LP		(1ULL << 60)
> +#define VTD_ECAP_PRS		(1ULL << 29)
> +#define VTD_ECAP_ERS		(1ULL << 30)
> +#define VTD_ECAP_SRS		(1ULL << 31)
> +#define VTD_ECAP_EAFS		(1ULL << 34)
> +#define VTD_ECAP_PASID		(1ULL << 40)
> 
> When Qemu gets an cmdline option it knows which bit out of above
> list should be checked against hardware capability. Then just do the
> check bit-by-bit. Why do we need mask bit in uapi to tell which bits
> are valid?

as above reply, if userspace has the check list for the cap/ecap bits,
then it's not necessary to use mask bit.

> Unless 0/1 doesn't represent validity of some bit. Do we
> have such example?

yes, like the pasid bits. it's 20 bits. but we already got pasid_bits
in the iommu_nesting_info_vtd structure. so it's not covered in the
ecap_bits.

Regards,
Yi Liu

> >
> > The link below show the current Intel vIOMMU usage on the cap/ecap bits.
> > For each assigned device, vIOMMU will compare the nesting related bits in
> > cap/ecap and mask out the bits which hardware doesn't support. After the
> > machine is intilized, the vIOMMU cap/ecap bits are determined. If user
> > hot-plug devices to VM, vIOMMU will fail it if the hardware cap/ecap bits
> > behind hot-plug device are not compatible with determined vIOMMU
> > cap/ecap
> > bits.
> >
> > https://www.spinics.net/lists/kvm/msg218294.html
> >
> > Regards,
> > Yi Liu
> >
> > > >
> > > > > Alex
> > > > >
> > > > >
> > > > > > + */
> > > > > > +struct iommu_nesting_info_vtd {
> > > > > > +	__u32	flags;
> > > > > > +	__u16	addr_width;
> > > > > > +	__u16	pasid_bits;
> > > > > > +	__u64	cap_reg;
> > > > > > +	__u64	cap_mask;
> > > > > > +	__u64	ecap_reg;
> > > > > > +	__u64	ecap_mask;
> > > > > > +};
> > > > > > +
> > > > > >  #endif /* _UAPI_IOMMU_H */

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs
  2020-06-15 10:02 ` [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Stefan Hajnoczi
  2020-06-15 12:39   ` Liu, Yi L
@ 2020-06-16  2:26   ` Tian, Kevin
  2020-06-16 15:49     ` Stefan Hajnoczi
  1 sibling, 1 reply; 37+ messages in thread
From: Tian, Kevin @ 2020-06-16  2:26 UTC (permalink / raw)
  To: Stefan Hajnoczi, Liu, Yi L
  Cc: jean-philippe, Raj, Ashok, kvm, Sun, Yi Y, linux-kernel,
	alex.williamson, iommu, Wu, Hao, Tian, Jun J

> From: Stefan Hajnoczi <stefanha@gmail.com>
> Sent: Monday, June 15, 2020 6:02 PM
> 
> On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote:
> > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > Intel platforms allows address space sharing between device DMA and
> > applications. SVA can reduce programming complexity and enhance
> security.
> >
> > This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
> > guest application address space with passthru devices. This is called
> > vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> > changes. For IOMMU and QEMU changes, they are in separate series (listed
> > in the "Related series").
> >
> > The high-level architecture for SVA virtualization is as below, the key
> > design of vSVA support is to utilize the dual-stage IOMMU translation (
> > also known as IOMMU nesting translation) capability in host IOMMU.
> >
> >
> >     .-------------.  .---------------------------.
> >     |   vIOMMU    |  | Guest process CR3, FL only|
> >     |             |  '---------------------------'
> >     .----------------/
> >     | PASID Entry |--- PASID cache flush -
> >     '-------------'                       |
> >     |             |                       V
> >     |             |                CR3 in GPA
> >     '-------------'
> > Guest
> > ------| Shadow |--------------------------|--------
> >       v        v                          v
> > Host
> >     .-------------.  .----------------------.
> >     |   pIOMMU    |  | Bind FL for GVA-GPA  |
> >     |             |  '----------------------'
> >     .----------------/  |
> >     | PASID Entry |     V (Nested xlate)
> >     '----------------\.------------------------------.
> >     |             |   |SL for GPA-HPA, default domain|
> >     |             |   '------------------------------'
> >     '-------------'
> > Where:
> >  - FL = First level/stage one page tables
> >  - SL = Second level/stage two page tables
> 
> Hi,
> Looks like an interesting feature!
> 
> To check I understand this feature: can applications now pass virtual
> addresses to devices instead of translating to IOVAs?
> 
> If yes, can guest applications restrict the vSVA address space so the
> device only has access to certain regions?
> 
> On one hand replacing IOVA translation with virtual addresses simplifies
> the application programming model, but does it give up isolation if the
> device can now access all application memory?
> 

with SVA each application is allocated with a unique PASID to tag its
virtual address space. The device that claims SVA support must guarantee 
that one application can only program the device to access its own virtual
address space (i.e. all DMAs triggered by this application are tagged with
the application's PASID, and are translated by IOMMU's PASID-granular
page table). So, isolation is not sacrificed in SVA.

Thanks
Kevin
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs
  2020-06-15 12:39   ` Liu, Yi L
@ 2020-06-16 15:34     ` Stefan Hajnoczi
  0 siblings, 0 replies; 37+ messages in thread
From: Stefan Hajnoczi @ 2020-06-16 15:34 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, Sun, Yi Y,
	linux-kernel, alex.williamson, iommu, Wu, Hao, Tian, Jun J


[-- Attachment #1.1: Type: text/plain, Size: 3240 bytes --]

On Mon, Jun 15, 2020 at 12:39:40PM +0000, Liu, Yi L wrote:
> > From: Stefan Hajnoczi <stefanha@gmail.com>
> > Sent: Monday, June 15, 2020 6:02 PM
> > 
> > On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote:
> > > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > > Intel platforms allows address space sharing between device DMA and
> > > applications. SVA can reduce programming complexity and enhance security.
> > >
> > > This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
> > > guest application address space with passthru devices. This is called
> > > vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> > > changes. For IOMMU and QEMU changes, they are in separate series (listed
> > > in the "Related series").
> > >
> > > The high-level architecture for SVA virtualization is as below, the key
> > > design of vSVA support is to utilize the dual-stage IOMMU translation (
> > > also known as IOMMU nesting translation) capability in host IOMMU.
> > >
> > >
> > >     .-------------.  .---------------------------.
> > >     |   vIOMMU    |  | Guest process CR3, FL only|
> > >     |             |  '---------------------------'
> > >     .----------------/
> > >     | PASID Entry |--- PASID cache flush -
> > >     '-------------'                       |
> > >     |             |                       V
> > >     |             |                CR3 in GPA
> > >     '-------------'
> > > Guest
> > > ------| Shadow |--------------------------|--------
> > >       v        v                          v
> > > Host
> > >     .-------------.  .----------------------.
> > >     |   pIOMMU    |  | Bind FL for GVA-GPA  |
> > >     |             |  '----------------------'
> > >     .----------------/  |
> > >     | PASID Entry |     V (Nested xlate)
> > >     '----------------\.------------------------------.
> > >     |             |   |SL for GPA-HPA, default domain|
> > >     |             |   '------------------------------'
> > >     '-------------'
> > > Where:
> > >  - FL = First level/stage one page tables
> > >  - SL = Second level/stage two page tables
> > 
> > Hi,
> > Looks like an interesting feature!
> 
> thanks for the interest. Stefan :-)
> 
> > To check I understand this feature: can applications now pass virtual
> > addresses to devices instead of translating to IOVAs?
> 
> yes, application could pass virtual addresses to device directly. As
> long as the virtual address is mapped in cpu page table, then IOMMU
> would get it translated to physical address.
> 
> > If yes, can guest applications restrict the vSVA address space so the
> > device only has access to certain regions?
> 
> do you mean restrict the access of certain virtual address regions of
> guest application ? or certain guest memory? :-)

Your reply below answered my question. I was wondering if applications
can protect parts of their virtual memory space that should not be
accessed by the device. It makes sense that there is a trade-off to
simplify the programming model and performance might also be better if
the application doesn't need to DMA map/unmap buffers frequently.

Stefan

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs
  2020-06-16  2:26   ` Tian, Kevin
@ 2020-06-16 15:49     ` Stefan Hajnoczi
  2020-06-16 16:09       ` Peter Xu
  2020-06-16 17:00       ` Raj, Ashok
  0 siblings, 2 replies; 37+ messages in thread
From: Stefan Hajnoczi @ 2020-06-16 15:49 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: jean-philippe, Raj, Ashok, kvm, iommu, Sun, Yi Y, linux-kernel,
	alex.williamson, Wu, Hao, Tian, Jun J


[-- Attachment #1.1: Type: text/plain, Size: 3646 bytes --]

On Tue, Jun 16, 2020 at 02:26:38AM +0000, Tian, Kevin wrote:
> > From: Stefan Hajnoczi <stefanha@gmail.com>
> > Sent: Monday, June 15, 2020 6:02 PM
> > 
> > On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote:
> > > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > > Intel platforms allows address space sharing between device DMA and
> > > applications. SVA can reduce programming complexity and enhance
> > security.
> > >
> > > This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
> > > guest application address space with passthru devices. This is called
> > > vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> > > changes. For IOMMU and QEMU changes, they are in separate series (listed
> > > in the "Related series").
> > >
> > > The high-level architecture for SVA virtualization is as below, the key
> > > design of vSVA support is to utilize the dual-stage IOMMU translation (
> > > also known as IOMMU nesting translation) capability in host IOMMU.
> > >
> > >
> > >     .-------------.  .---------------------------.
> > >     |   vIOMMU    |  | Guest process CR3, FL only|
> > >     |             |  '---------------------------'
> > >     .----------------/
> > >     | PASID Entry |--- PASID cache flush -
> > >     '-------------'                       |
> > >     |             |                       V
> > >     |             |                CR3 in GPA
> > >     '-------------'
> > > Guest
> > > ------| Shadow |--------------------------|--------
> > >       v        v                          v
> > > Host
> > >     .-------------.  .----------------------.
> > >     |   pIOMMU    |  | Bind FL for GVA-GPA  |
> > >     |             |  '----------------------'
> > >     .----------------/  |
> > >     | PASID Entry |     V (Nested xlate)
> > >     '----------------\.------------------------------.
> > >     |             |   |SL for GPA-HPA, default domain|
> > >     |             |   '------------------------------'
> > >     '-------------'
> > > Where:
> > >  - FL = First level/stage one page tables
> > >  - SL = Second level/stage two page tables
> > 
> > Hi,
> > Looks like an interesting feature!
> > 
> > To check I understand this feature: can applications now pass virtual
> > addresses to devices instead of translating to IOVAs?
> > 
> > If yes, can guest applications restrict the vSVA address space so the
> > device only has access to certain regions?
> > 
> > On one hand replacing IOVA translation with virtual addresses simplifies
> > the application programming model, but does it give up isolation if the
> > device can now access all application memory?
> > 
> 
> with SVA each application is allocated with a unique PASID to tag its
> virtual address space. The device that claims SVA support must guarantee 
> that one application can only program the device to access its own virtual
> address space (i.e. all DMAs triggered by this application are tagged with
> the application's PASID, and are translated by IOMMU's PASID-granular
> page table). So, isolation is not sacrificed in SVA.

Isolation between applications is preserved but there is no isolation
between the device and the application itself. The application needs to
trust the device.

Examples:

1. The device can snoop secret data from readable pages in the
   application's virtual memory space.

2. The device can gain arbitrary execution on the CPU by overwriting
   control flow addresses (e.g. function pointers, stack return
   addresses) in writable pages.

Stefan

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs
  2020-06-16 15:49     ` Stefan Hajnoczi
@ 2020-06-16 16:09       ` Peter Xu
  2020-06-22 12:49         ` Stefan Hajnoczi
  2020-06-16 17:00       ` Raj, Ashok
  1 sibling, 1 reply; 37+ messages in thread
From: Peter Xu @ 2020-06-16 16:09 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, Sun, Yi Y, iommu,
	linux-kernel, alex.williamson, Wu, Hao, Tian, Jun J

On Tue, Jun 16, 2020 at 04:49:28PM +0100, Stefan Hajnoczi wrote:
> Isolation between applications is preserved but there is no isolation
> between the device and the application itself. The application needs to
> trust the device.
> 
> Examples:
> 
> 1. The device can snoop secret data from readable pages in the
>    application's virtual memory space.
> 
> 2. The device can gain arbitrary execution on the CPU by overwriting
>    control flow addresses (e.g. function pointers, stack return
>    addresses) in writable pages.

To me, SVA seems to be that "middle layer" of secure where it's not as safe as
VFIO_IOMMU_MAP_DMA which has buffer level granularity of control (but of course
we pay overhead on buffer setups and on-the-fly translations), however it's far
better than DMA with no IOMMU which can ruin the whole host/guest, because
after all we do a lot of isolations as process based.

IMHO it's the same as when we see a VM (or the QEMU process) as a whole along
with the guest code.  In some cases we don't care if the guest did some bad
things to mess up with its own QEMU process.  It is still ideal if we can even
stop the guest from doing so, but when it's not easy to do it the ideal way, we
just lower the requirement to not spread the influence to the host and other
VMs.

Thanks,

-- 
Peter Xu

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs
  2020-06-16 15:49     ` Stefan Hajnoczi
  2020-06-16 16:09       ` Peter Xu
@ 2020-06-16 17:00       ` Raj, Ashok
  2020-06-22 12:49         ` Stefan Hajnoczi
  1 sibling, 1 reply; 37+ messages in thread
From: Raj, Ashok @ 2020-06-16 17:00 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: jean-philippe, Tian, Kevin, Ashok Raj, kvm, iommu, Sun, Yi Y,
	linux-kernel, alex.williamson, Wu, Hao, Tian, Jun J

On Tue, Jun 16, 2020 at 04:49:28PM +0100, Stefan Hajnoczi wrote:
> On Tue, Jun 16, 2020 at 02:26:38AM +0000, Tian, Kevin wrote:
> > > From: Stefan Hajnoczi <stefanha@gmail.com>
> > > Sent: Monday, June 15, 2020 6:02 PM
> > > 
> > > On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote:
> > > > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > > > Intel platforms allows address space sharing between device DMA and
> > > > applications. SVA can reduce programming complexity and enhance
> > > security.
> > > >
> > > > This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
> > > > guest application address space with passthru devices. This is called
> > > > vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> > > > changes. For IOMMU and QEMU changes, they are in separate series (listed
> > > > in the "Related series").
> > > >
> > > > The high-level architecture for SVA virtualization is as below, the key
> > > > design of vSVA support is to utilize the dual-stage IOMMU translation (
> > > > also known as IOMMU nesting translation) capability in host IOMMU.
> > > >
> > > >
> > > >     .-------------.  .---------------------------.
> > > >     |   vIOMMU    |  | Guest process CR3, FL only|
> > > >     |             |  '---------------------------'
> > > >     .----------------/
> > > >     | PASID Entry |--- PASID cache flush -
> > > >     '-------------'                       |
> > > >     |             |                       V
> > > >     |             |                CR3 in GPA
> > > >     '-------------'
> > > > Guest
> > > > ------| Shadow |--------------------------|--------
> > > >       v        v                          v
> > > > Host
> > > >     .-------------.  .----------------------.
> > > >     |   pIOMMU    |  | Bind FL for GVA-GPA  |
> > > >     |             |  '----------------------'
> > > >     .----------------/  |
> > > >     | PASID Entry |     V (Nested xlate)
> > > >     '----------------\.------------------------------.
> > > >     |             |   |SL for GPA-HPA, default domain|
> > > >     |             |   '------------------------------'
> > > >     '-------------'
> > > > Where:
> > > >  - FL = First level/stage one page tables
> > > >  - SL = Second level/stage two page tables
> > > 
> > > Hi,
> > > Looks like an interesting feature!
> > > 
> > > To check I understand this feature: can applications now pass virtual
> > > addresses to devices instead of translating to IOVAs?
> > > 
> > > If yes, can guest applications restrict the vSVA address space so the
> > > device only has access to certain regions?
> > > 
> > > On one hand replacing IOVA translation with virtual addresses simplifies
> > > the application programming model, but does it give up isolation if the
> > > device can now access all application memory?
> > > 
> > 
> > with SVA each application is allocated with a unique PASID to tag its
> > virtual address space. The device that claims SVA support must guarantee 
> > that one application can only program the device to access its own virtual
> > address space (i.e. all DMAs triggered by this application are tagged with
> > the application's PASID, and are translated by IOMMU's PASID-granular
> > page table). So, isolation is not sacrificed in SVA.
> 
> Isolation between applications is preserved but there is no isolation
> between the device and the application itself. The application needs to
> trust the device.

Right. With all convenience comes security trust. With SVA there is an
expectation that the device has the required security boundaries properly
implemented. FWIW, what is our guarantee today that VF's are secure from
one another or even its own PF? They can also generate transactions with
any of its peer id's and there is nothing an IOMMU can do today. Other than
rely on ACS. Even BusMaster enable can be ignored and devices (malicious
or otherwise) can generate after the BM=0. With SVM you get the benefits of

* Not having to register regions
* Don't need to pin application space for DMA.

> 
> Examples:
> 
> 1. The device can snoop secret data from readable pages in the
>    application's virtual memory space.

Aren't there other security technologies that can address this?

> 
> 2. The device can gain arbitrary execution on the CPU by overwriting
>    control flow addresses (e.g. function pointers, stack return
>    addresses) in writable pages.

I suppose technology like CET might be able to guard. The general
expectation is code pages and anything that needs to be protected should be
mapped nor writable.

Cheers,
Ashok
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH v2 14/15] vfio: Document dual stage control
  2020-06-15  9:41   ` Stefan Hajnoczi
@ 2020-06-17  6:27     ` Liu, Yi L
  2020-06-22 12:51       ` Stefan Hajnoczi
  0 siblings, 1 reply; 37+ messages in thread
From: Liu, Yi L @ 2020-06-17  6:27 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, Sun, Yi Y,
	linux-kernel, alex.williamson, iommu, Wu, Hao, Tian, Jun J

> From: Stefan Hajnoczi <stefanha@gmail.com>
> Sent: Monday, June 15, 2020 5:41 PM
> On Thu, Jun 11, 2020 at 05:15:33AM -0700, Liu Yi L wrote:
>
> > From: Eric Auger <eric.auger@redhat.com>
> >
> > The VFIO API was enhanced to support nested stage control: a bunch of
> > new iotcls and usage guideline.
> >
> > Let's document the process to follow to set up nested mode.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Eric Auger <eric.auger@redhat.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> > v1 -> v2:
> > *) new in v2, compared with Eric's original version, pasid table bind
> >    and fault reporting is removed as this series doesn't cover them.
> >    Original version from Eric.
> >    https://lkml.org/lkml/2020/3/20/700
> >
> >  Documentation/driver-api/vfio.rst | 64
> > +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 64 insertions(+)
> >
> > diff --git a/Documentation/driver-api/vfio.rst
> > b/Documentation/driver-api/vfio.rst
> > index f1a4d3c..06224bd 100644
> > --- a/Documentation/driver-api/vfio.rst
> > +++ b/Documentation/driver-api/vfio.rst
> > @@ -239,6 +239,70 @@ group and can access them as follows::
> >  	/* Gratuitous device reset and go... */
> >  	ioctl(device, VFIO_DEVICE_RESET);
> >
> > +IOMMU Dual Stage Control
> > +------------------------
> > +
> > +Some IOMMUs support 2 stages/levels of translation. Stage corresponds
> > +to the ARM terminology while level corresponds to Intel's VTD terminology.
> > +In the following text we use either without distinction.
> > +
> > +This is useful when the guest is exposed with a virtual IOMMU and
> > +some devices are assigned to the guest through VFIO. Then the guest
> > +OS can use stage 1 (GIOVA -> GPA or GVA->GPA), while the hypervisor
> > +uses stage 2 for VM isolation (GPA -> HPA).
> > +
> > +Under dual stage translation, the guest gets ownership of the stage 1
> > +page tables and also owns stage 1 configuration structures. The
> > +hypervisor owns the root configuration structure (for security
> > +reason), including stage 2 configuration. This works as long
> > +configuration structures and page table
> 
> s/as long configuration/as long as configuration/

got it.

> 
> > +format are compatible between the virtual IOMMU and the physical IOMMU.
> 
> s/format/formats/

I see.

> > +
> > +Assuming the HW supports it, this nested mode is selected by choosing
> > +the VFIO_TYPE1_NESTING_IOMMU type through:
> > +
> > +    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);
> > +
> > +This forces the hypervisor to use the stage 2, leaving stage 1
> > +available for guest usage. The guest stage 1 format depends on IOMMU
> > +vendor, and it is the same with the nesting configuration method.
> > +User space should check the format and configuration method after
> > +setting nesting type by
> > +using:
> > +
> > +    ioctl(container->fd, VFIO_IOMMU_GET_INFO, &nesting_info);
> > +
> > +Details can be found in Documentation/userspace-api/iommu.rst. For
> > +Intel VT-d, each stage 1 page table is bound to host by:
> > +
> > +    nesting_op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL;
> > +    memcpy(&nesting_op->data, &bind_data, sizeof(bind_data));
> > +    ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
> > +
> > +As mentioned above, guest OS may use stage 1 for GIOVA->GPA or GVA->GPA.
> > +GVA->GPA page tables are available when PASID (Process Address Space
> > +GVA->ID)
> > +is exposed to guest. e.g. guest with PASID-capable devices assigned.
> > +For such page table binding, the bind_data should include PASID info,
> > +which is allocated by guest itself or by host. This depends on
> > +hardware vendor e.g. Intel VT-d requires to allocate PASID from host.
> > +This requirement is available by VFIO_IOMMU_GET_INFO. User space
> > +could allocate PASID from host by:
> > +
> > +    req.flags = VFIO_IOMMU_ALLOC_PASID;
> > +    ioctl(container, VFIO_IOMMU_PASID_REQUEST, &req);
> 
> It is not clear how the userspace application determines whether PASIDs must be
> allocated from the host via VFIO_IOMMU_PASID_REQUEST or if the guest itself can
> allocate PASIDs. The text mentions VFIO_IOMMU_GET_INFO but what exactly
> should the userspace application check?

For VT-d, spec 3.0 introduced Virtual Cmd interface for PASID allocation,
guest request PASID from host if it detects the interface. Application
should check the IOMMU_NESTING_FEAT_SYSWIDE_PASID setting in the below
info reported by VFIO_IOMMU_GET_INFO. And virtual VT-d should not report
SVA related capabilities to guest if  SYSWIDE_PASID is not supported by
kernel.

+struct iommu_nesting_info {
+	__u32	size;
+	__u32	format;
+	__u32	features;
+#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
+#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
+#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 2)
+	__u32	flags;
+	__u8	data[];
+};
https://lore.kernel.org/linux-iommu/1591877734-66527-3-git-send-email-yi.l.liu@intel.com/

Regards,
Yi Liu
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 02/15] iommu: Report domain nesting info
  2020-06-11 12:15 ` [PATCH v2 02/15] iommu: Report domain nesting info Liu Yi L
  2020-06-11 19:30   ` Alex Williamson
@ 2020-06-17 14:39   ` Jean-Philippe Brucker
  2020-06-18 11:46     ` Liu, Yi L
  1 sibling, 1 reply; 37+ messages in thread
From: Jean-Philippe Brucker @ 2020-06-17 14:39 UTC (permalink / raw)
  To: Liu Yi L
  Cc: kevin.tian, ashok.raj, kvm, robin.murphy, yi.y.sun, linux-kernel,
	alex.williamson, iommu, hao.wu, will, jun.j.tian

[+ Will and Robin]

Hi Yi,

On Thu, Jun 11, 2020 at 05:15:21AM -0700, Liu Yi L wrote:
> IOMMUs that support nesting translation needs report the capability info
> to userspace, e.g. the format of first level/stage paging structures.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> @Jean, Eric: as nesting was introduced for ARM, but looks like no actual
> user of it. right? So I'm wondering if we can reuse DOMAIN_ATTR_NESTING
> to retrieve nesting info? how about your opinions?

Sure, I think we could rework the getters for DOMAIN_ATTR_NESTING since
they aren't used, but we do need to keep the setters as is.

Before attaching a domain, VFIO sets DOMAIN_ATTR_NESTING if userspace
requested a VFIO_TYPE1_NESTING_IOMMU container. This is necessary for the
SMMU driver to know how to attach later, but at that point we don't know
whether the SMMU does support nesting (since the domain isn't attached to
any endpoint). During attach, the SMMU driver adapts to the SMMU's
capabilities, and may well fallback to one stage if the SMMU doesn't
support nesting.

VFIO should check after attaching that the nesting attribute held, by
calling iommu_domain_get_attr(NESTING). At the moment it does not, and
since your 03/15 patch does that with additional info, I agree with
reusing DOMAIN_ATTR_NESTING instead of adding DOMAIN_ATTR_NESTING_INFO.

However it requires changing the get_attr(NESTING) implementations in both
SMMU drivers as a precursor of this series, to avoid breaking
VFIO_TYPE1_NESTING_IOMMU on Arm. Since we haven't yet defined the
nesting_info structs for SMMUv2 and v3, I suppose we could return an empty
struct iommu_nesting_info for now?

> 
>  include/linux/iommu.h      |  1 +
>  include/uapi/linux/iommu.h | 34 ++++++++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 78a26ae..f6e4b49 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -126,6 +126,7 @@ enum iommu_attr {
>  	DOMAIN_ATTR_FSL_PAMUV1,
>  	DOMAIN_ATTR_NESTING,	/* two stages of translation */
>  	DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
> +	DOMAIN_ATTR_NESTING_INFO,
>  	DOMAIN_ATTR_MAX,
>  };
>  
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 303f148..02eac73 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -332,4 +332,38 @@ struct iommu_gpasid_bind_data {
>  	};
>  };
>  
> +struct iommu_nesting_info {
> +	__u32	size;
> +	__u32	format;

What goes into format? And flags? This structure needs some documentation.

Thanks,
Jean

> +	__u32	features;
> +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
> +#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
> +#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 2)
> +	__u32	flags;
> +	__u8	data[];
> +};
> +
> +/*
> + * @flags:	VT-d specific flags. Currently reserved for future
> + *		extension.
> + * @addr_width:	The output addr width of first level/stage translation
> + * @pasid_bits:	Maximum supported PASID bits, 0 represents no PASID
> + *		support.
> + * @cap_reg:	Describe basic capabilities as defined in VT-d capability
> + *		register.
> + * @cap_mask:	Mark valid capability bits in @cap_reg.
> + * @ecap_reg:	Describe the extended capabilities as defined in VT-d
> + *		extended capability register.
> + * @ecap_mask:	Mark the valid capability bits in @ecap_reg.
> + */
> +struct iommu_nesting_info_vtd {
> +	__u32	flags;
> +	__u16	addr_width;
> +	__u16	pasid_bits;
> +	__u64	cap_reg;
> +	__u64	cap_mask;
> +	__u64	ecap_reg;
> +	__u64	ecap_mask;
> +};
> +
>  #endif /* _UAPI_IOMMU_H */
> -- 
> 2.7.4
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH v2 02/15] iommu: Report domain nesting info
  2020-06-17 14:39   ` Jean-Philippe Brucker
@ 2020-06-18 11:46     ` Liu, Yi L
  0 siblings, 0 replies; 37+ messages in thread
From: Liu, Yi L @ 2020-06-18 11:46 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Tian, Kevin, Raj, Ashok, kvm, robin.murphy, Sun, Yi Y,
	linux-kernel, alex.williamson, iommu, Wu, Hao, will, Tian,
	 Jun J

Hi Jean,

> From: Jean-Philippe Brucker < jean-philippe@linaro.org>
> Sent: Wednesday, June 17, 2020 10:39 PM
> 
> [+ Will and Robin]
> 
> Hi Yi,
> 
> On Thu, Jun 11, 2020 at 05:15:21AM -0700, Liu Yi L wrote:
> > IOMMUs that support nesting translation needs report the capability
> > info to userspace, e.g. the format of first level/stage paging structures.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> > @Jean, Eric: as nesting was introduced for ARM, but looks like no
> > actual user of it. right? So I'm wondering if we can reuse
> > DOMAIN_ATTR_NESTING to retrieve nesting info? how about your opinions?
> 
> Sure, I think we could rework the getters for DOMAIN_ATTR_NESTING since they
> aren't used, but we do need to keep the setters as is.
> 
> Before attaching a domain, VFIO sets DOMAIN_ATTR_NESTING if userspace
> requested a VFIO_TYPE1_NESTING_IOMMU container. This is necessary for the
> SMMU driver to know how to attach later, but at that point we don't know whether
> the SMMU does support nesting (since the domain isn't attached to any endpoint).
> During attach, the SMMU driver adapts to the SMMU's capabilities, and may well
> fallback to one stage if the SMMU doesn't support nesting.

got you. so even VFIO sets DOMAIN_ATTR_NESTING successfully, it doesn't mean
the nesting will be used. yeah, it's a little bit different with VT-d side. intel iommu
driver will fail ATT_NESTING setting if it found not all iommu units in the system
are nesting capable.

> VFIO should check after attaching that the nesting attribute held, by calling
> iommu_domain_get_attr(NESTING). At the moment it does not, and since your
> 03/15 patch does that with additional info, I agree with reusing
> DOMAIN_ATTR_NESTING instead of adding DOMAIN_ATTR_NESTING_INFO.
>
> However it requires changing the get_attr(NESTING) implementations in both SMMU
> drivers as a precursor of this series, to avoid breaking
> VFIO_TYPE1_NESTING_IOMMU on Arm. Since we haven't yet defined the
> nesting_info structs for SMMUv2 and v3, I suppose we could return an empty struct
> iommu_nesting_info for now?

got you. I think it works. So far, I didn't see any getter for ATTR_NESTING, once
SMMU drivers return empty struct iommu_nesting_info, VFIO won't fail. will
do it when switching to reuse ATTR_NESTING for getting nesting info.

> >
> >  include/linux/iommu.h      |  1 +
> >  include/uapi/linux/iommu.h | 34 ++++++++++++++++++++++++++++++++++
> >  2 files changed, 35 insertions(+)
> >
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h index
> > 78a26ae..f6e4b49 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -126,6 +126,7 @@ enum iommu_attr {
> >  	DOMAIN_ATTR_FSL_PAMUV1,
> >  	DOMAIN_ATTR_NESTING,	/* two stages of translation */
> >  	DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
> > +	DOMAIN_ATTR_NESTING_INFO,
> >  	DOMAIN_ATTR_MAX,
> >  };
> >
> > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> > index 303f148..02eac73 100644
> > --- a/include/uapi/linux/iommu.h
> > +++ b/include/uapi/linux/iommu.h
> > @@ -332,4 +332,38 @@ struct iommu_gpasid_bind_data {
> >  	};
> >  };
> >
> > +struct iommu_nesting_info {
> > +	__u32	size;
> > +	__u32	format;
> 
> What goes into format? And flags? This structure needs some documentation.

format will be the same with the definition of @format in iommu_gpasid_bind_data.
flags is reserved for future extension. will add description in next version. :-)

struct iommu_gpasid_bind_data {
        __u32 argsz;
#define IOMMU_GPASID_BIND_VERSION_1     1
        __u32 version;
#define IOMMU_PASID_FORMAT_INTEL_VTD    1
        __u32 format;
#define IOMMU_SVA_GPASID_VAL    (1 << 0) /* guest PASID valid */
        __u64 flags;
        __u64 gpgd;
        __u64 hpasid;
        __u64 gpasid;
        __u32 addr_width;
        __u8  padding[12];
        /* Vendor specific data */
        union {
                struct iommu_gpasid_bind_data_vtd vtd;
        } vendor;
};

Regards,
Yi Liu

> Thanks,
> Jean
> 
> > +	__u32	features;
> > +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
> > +#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
> > +#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 2)
> > +	__u32	flags;
> > +	__u8	data[];
> > +};
> > +
> > +/*
> > + * @flags:	VT-d specific flags. Currently reserved for future
> > + *		extension.
> > + * @addr_width:	The output addr width of first level/stage translation
> > + * @pasid_bits:	Maximum supported PASID bits, 0 represents no PASID
> > + *		support.
> > + * @cap_reg:	Describe basic capabilities as defined in VT-d capability
> > + *		register.
> > + * @cap_mask:	Mark valid capability bits in @cap_reg.
> > + * @ecap_reg:	Describe the extended capabilities as defined in VT-d
> > + *		extended capability register.
> > + * @ecap_mask:	Mark the valid capability bits in @ecap_reg.
> > + */
> > +struct iommu_nesting_info_vtd {
> > +	__u32	flags;
> > +	__u16	addr_width;
> > +	__u16	pasid_bits;
> > +	__u64	cap_reg;
> > +	__u64	cap_mask;
> > +	__u64	ecap_reg;
> > +	__u64	ecap_mask;
> > +};
> > +
> >  #endif /* _UAPI_IOMMU_H */
> > --
> > 2.7.4
> >
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs
  2020-06-16 17:00       ` Raj, Ashok
@ 2020-06-22 12:49         ` Stefan Hajnoczi
  0 siblings, 0 replies; 37+ messages in thread
From: Stefan Hajnoczi @ 2020-06-22 12:49 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: jean-philippe, Tian, Kevin, kvm, iommu, Sun, Yi Y, linux-kernel,
	alex.williamson, Wu, Hao, Tian, Jun J


[-- Attachment #1.1: Type: text/plain, Size: 5729 bytes --]

On Tue, Jun 16, 2020 at 10:00:16AM -0700, Raj, Ashok wrote:
> On Tue, Jun 16, 2020 at 04:49:28PM +0100, Stefan Hajnoczi wrote:
> > On Tue, Jun 16, 2020 at 02:26:38AM +0000, Tian, Kevin wrote:
> > > > From: Stefan Hajnoczi <stefanha@gmail.com>
> > > > Sent: Monday, June 15, 2020 6:02 PM
> > > > 
> > > > On Thu, Jun 11, 2020 at 05:15:19AM -0700, Liu Yi L wrote:
> > > > > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > > > > Intel platforms allows address space sharing between device DMA and
> > > > > applications. SVA can reduce programming complexity and enhance
> > > > security.
> > > > >
> > > > > This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
> > > > > guest application address space with passthru devices. This is called
> > > > > vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> > > > > changes. For IOMMU and QEMU changes, they are in separate series (listed
> > > > > in the "Related series").
> > > > >
> > > > > The high-level architecture for SVA virtualization is as below, the key
> > > > > design of vSVA support is to utilize the dual-stage IOMMU translation (
> > > > > also known as IOMMU nesting translation) capability in host IOMMU.
> > > > >
> > > > >
> > > > >     .-------------.  .---------------------------.
> > > > >     |   vIOMMU    |  | Guest process CR3, FL only|
> > > > >     |             |  '---------------------------'
> > > > >     .----------------/
> > > > >     | PASID Entry |--- PASID cache flush -
> > > > >     '-------------'                       |
> > > > >     |             |                       V
> > > > >     |             |                CR3 in GPA
> > > > >     '-------------'
> > > > > Guest
> > > > > ------| Shadow |--------------------------|--------
> > > > >       v        v                          v
> > > > > Host
> > > > >     .-------------.  .----------------------.
> > > > >     |   pIOMMU    |  | Bind FL for GVA-GPA  |
> > > > >     |             |  '----------------------'
> > > > >     .----------------/  |
> > > > >     | PASID Entry |     V (Nested xlate)
> > > > >     '----------------\.------------------------------.
> > > > >     |             |   |SL for GPA-HPA, default domain|
> > > > >     |             |   '------------------------------'
> > > > >     '-------------'
> > > > > Where:
> > > > >  - FL = First level/stage one page tables
> > > > >  - SL = Second level/stage two page tables
> > > > 
> > > > Hi,
> > > > Looks like an interesting feature!
> > > > 
> > > > To check I understand this feature: can applications now pass virtual
> > > > addresses to devices instead of translating to IOVAs?
> > > > 
> > > > If yes, can guest applications restrict the vSVA address space so the
> > > > device only has access to certain regions?
> > > > 
> > > > On one hand replacing IOVA translation with virtual addresses simplifies
> > > > the application programming model, but does it give up isolation if the
> > > > device can now access all application memory?
> > > > 
> > > 
> > > with SVA each application is allocated with a unique PASID to tag its
> > > virtual address space. The device that claims SVA support must guarantee 
> > > that one application can only program the device to access its own virtual
> > > address space (i.e. all DMAs triggered by this application are tagged with
> > > the application's PASID, and are translated by IOMMU's PASID-granular
> > > page table). So, isolation is not sacrificed in SVA.
> > 
> > Isolation between applications is preserved but there is no isolation
> > between the device and the application itself. The application needs to
> > trust the device.
> 
> Right. With all convenience comes security trust. With SVA there is an
> expectation that the device has the required security boundaries properly
> implemented. FWIW, what is our guarantee today that VF's are secure from
> one another or even its own PF? They can also generate transactions with
> any of its peer id's and there is nothing an IOMMU can do today. Other than
> rely on ACS. Even BusMaster enable can be ignored and devices (malicious
> or otherwise) can generate after the BM=0. With SVM you get the benefits of
> 
> * Not having to register regions
> * Don't need to pin application space for DMA.

As along as the security model is clearly documented users can decide
whether or not SVA meets their requirements. I just wanted to clarify
what the security model is.

> 
> > 
> > Examples:
> > 
> > 1. The device can snoop secret data from readable pages in the
> >    application's virtual memory space.
> 
> Aren't there other security technologies that can address this?

Maybe the IOMMU could enforce Memory Protection Keys? Imagine each
device is assigned a subset of memory protection keys and the IOMMU
checks them on each device access. This would allow the application to
mark certain pages off-limits to the device but the IOMMU could still
walk the full process page table (no need to construct a special device
page table for the IOMMU).

> > 
> > 2. The device can gain arbitrary execution on the CPU by overwriting
> >    control flow addresses (e.g. function pointers, stack return
> >    addresses) in writable pages.
> 
> I suppose technology like CET might be able to guard. The general
> expectation is code pages and anything that needs to be protected should be
> mapped nor writable.

Function pointers are a common exception to this. They are often located
in writable heap or stack pages.

There might also be dynamic linker memory structures that are easy to
hijack.

Stefan

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs
  2020-06-16 16:09       ` Peter Xu
@ 2020-06-22 12:49         ` Stefan Hajnoczi
  0 siblings, 0 replies; 37+ messages in thread
From: Stefan Hajnoczi @ 2020-06-22 12:49 UTC (permalink / raw)
  To: Peter Xu
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, Sun, Yi Y, iommu,
	linux-kernel, alex.williamson, Wu, Hao, Tian, Jun J


[-- Attachment #1.1: Type: text/plain, Size: 1458 bytes --]

On Tue, Jun 16, 2020 at 12:09:16PM -0400, Peter Xu wrote:
> On Tue, Jun 16, 2020 at 04:49:28PM +0100, Stefan Hajnoczi wrote:
> > Isolation between applications is preserved but there is no isolation
> > between the device and the application itself. The application needs to
> > trust the device.
> > 
> > Examples:
> > 
> > 1. The device can snoop secret data from readable pages in the
> >    application's virtual memory space.
> > 
> > 2. The device can gain arbitrary execution on the CPU by overwriting
> >    control flow addresses (e.g. function pointers, stack return
> >    addresses) in writable pages.
> 
> To me, SVA seems to be that "middle layer" of secure where it's not as safe as
> VFIO_IOMMU_MAP_DMA which has buffer level granularity of control (but of course
> we pay overhead on buffer setups and on-the-fly translations), however it's far
> better than DMA with no IOMMU which can ruin the whole host/guest, because
> after all we do a lot of isolations as process based.
> 
> IMHO it's the same as when we see a VM (or the QEMU process) as a whole along
> with the guest code.  In some cases we don't care if the guest did some bad
> things to mess up with its own QEMU process.  It is still ideal if we can even
> stop the guest from doing so, but when it's not easy to do it the ideal way, we
> just lower the requirement to not spread the influence to the host and other
> VMs.

Makes sense.

Stefan

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 14/15] vfio: Document dual stage control
  2020-06-17  6:27     ` Liu, Yi L
@ 2020-06-22 12:51       ` Stefan Hajnoczi
  2020-06-23  6:43         ` Liu, Yi L
  0 siblings, 1 reply; 37+ messages in thread
From: Stefan Hajnoczi @ 2020-06-22 12:51 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, Sun, Yi Y,
	linux-kernel, alex.williamson, iommu, Wu, Hao, Tian, Jun J


[-- Attachment #1.1: Type: text/plain, Size: 5855 bytes --]

On Wed, Jun 17, 2020 at 06:27:27AM +0000, Liu, Yi L wrote:
> > From: Stefan Hajnoczi <stefanha@gmail.com>
> > Sent: Monday, June 15, 2020 5:41 PM
> > On Thu, Jun 11, 2020 at 05:15:33AM -0700, Liu Yi L wrote:
> >
> > > From: Eric Auger <eric.auger@redhat.com>
> > >
> > > The VFIO API was enhanced to support nested stage control: a bunch of
> > > new iotcls and usage guideline.
> > >
> > > Let's document the process to follow to set up nested mode.
> > >
> > > Cc: Kevin Tian <kevin.tian@intel.com>
> > > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > Cc: Alex Williamson <alex.williamson@redhat.com>
> > > Cc: Eric Auger <eric.auger@redhat.com>
> > > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > > Cc: Joerg Roedel <joro@8bytes.org>
> > > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > > Signed-off-by: Eric Auger <eric.auger@redhat.com>
> > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > ---
> > > v1 -> v2:
> > > *) new in v2, compared with Eric's original version, pasid table bind
> > >    and fault reporting is removed as this series doesn't cover them.
> > >    Original version from Eric.
> > >    https://lkml.org/lkml/2020/3/20/700
> > >
> > >  Documentation/driver-api/vfio.rst | 64
> > > +++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 64 insertions(+)
> > >
> > > diff --git a/Documentation/driver-api/vfio.rst
> > > b/Documentation/driver-api/vfio.rst
> > > index f1a4d3c..06224bd 100644
> > > --- a/Documentation/driver-api/vfio.rst
> > > +++ b/Documentation/driver-api/vfio.rst
> > > @@ -239,6 +239,70 @@ group and can access them as follows::
> > >  	/* Gratuitous device reset and go... */
> > >  	ioctl(device, VFIO_DEVICE_RESET);
> > >
> > > +IOMMU Dual Stage Control
> > > +------------------------
> > > +
> > > +Some IOMMUs support 2 stages/levels of translation. Stage corresponds
> > > +to the ARM terminology while level corresponds to Intel's VTD terminology.
> > > +In the following text we use either without distinction.
> > > +
> > > +This is useful when the guest is exposed with a virtual IOMMU and
> > > +some devices are assigned to the guest through VFIO. Then the guest
> > > +OS can use stage 1 (GIOVA -> GPA or GVA->GPA), while the hypervisor
> > > +uses stage 2 for VM isolation (GPA -> HPA).
> > > +
> > > +Under dual stage translation, the guest gets ownership of the stage 1
> > > +page tables and also owns stage 1 configuration structures. The
> > > +hypervisor owns the root configuration structure (for security
> > > +reason), including stage 2 configuration. This works as long
> > > +configuration structures and page table
> > 
> > s/as long configuration/as long as configuration/
> 
> got it.
> 
> > 
> > > +format are compatible between the virtual IOMMU and the physical IOMMU.
> > 
> > s/format/formats/
> 
> I see.
> 
> > > +
> > > +Assuming the HW supports it, this nested mode is selected by choosing
> > > +the VFIO_TYPE1_NESTING_IOMMU type through:
> > > +
> > > +    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);
> > > +
> > > +This forces the hypervisor to use the stage 2, leaving stage 1
> > > +available for guest usage. The guest stage 1 format depends on IOMMU
> > > +vendor, and it is the same with the nesting configuration method.
> > > +User space should check the format and configuration method after
> > > +setting nesting type by
> > > +using:
> > > +
> > > +    ioctl(container->fd, VFIO_IOMMU_GET_INFO, &nesting_info);
> > > +
> > > +Details can be found in Documentation/userspace-api/iommu.rst. For
> > > +Intel VT-d, each stage 1 page table is bound to host by:
> > > +
> > > +    nesting_op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL;
> > > +    memcpy(&nesting_op->data, &bind_data, sizeof(bind_data));
> > > +    ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
> > > +
> > > +As mentioned above, guest OS may use stage 1 for GIOVA->GPA or GVA->GPA.
> > > +GVA->GPA page tables are available when PASID (Process Address Space
> > > +GVA->ID)
> > > +is exposed to guest. e.g. guest with PASID-capable devices assigned.
> > > +For such page table binding, the bind_data should include PASID info,
> > > +which is allocated by guest itself or by host. This depends on
> > > +hardware vendor e.g. Intel VT-d requires to allocate PASID from host.
> > > +This requirement is available by VFIO_IOMMU_GET_INFO. User space
> > > +could allocate PASID from host by:
> > > +
> > > +    req.flags = VFIO_IOMMU_ALLOC_PASID;
> > > +    ioctl(container, VFIO_IOMMU_PASID_REQUEST, &req);
> > 
> > It is not clear how the userspace application determines whether PASIDs must be
> > allocated from the host via VFIO_IOMMU_PASID_REQUEST or if the guest itself can
> > allocate PASIDs. The text mentions VFIO_IOMMU_GET_INFO but what exactly
> > should the userspace application check?
> 
> For VT-d, spec 3.0 introduced Virtual Cmd interface for PASID allocation,
> guest request PASID from host if it detects the interface. Application
> should check the IOMMU_NESTING_FEAT_SYSWIDE_PASID setting in the below
> info reported by VFIO_IOMMU_GET_INFO. And virtual VT-d should not report
> SVA related capabilities to guest if  SYSWIDE_PASID is not supported by
> kernel.
> 
> +struct iommu_nesting_info {
> +	__u32	size;
> +	__u32	format;
> +	__u32	features;
> +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
> +#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
> +#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 2)
> +	__u32	flags;
> +	__u8	data[];
> +};
> https://lore.kernel.org/linux-iommu/1591877734-66527-3-git-send-email-yi.l.liu@intel.com/

I see. Is it possible to add this information into this patch or at
least a reference so readers know where to find out exactly how to do
this?

Stefan

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH v2 14/15] vfio: Document dual stage control
  2020-06-22 12:51       ` Stefan Hajnoczi
@ 2020-06-23  6:43         ` Liu, Yi L
  0 siblings, 0 replies; 37+ messages in thread
From: Liu, Yi L @ 2020-06-23  6:43 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, Sun, Yi Y,
	linux-kernel, alex.williamson, iommu, Wu, Hao, Tian, Jun J

> From: Stefan Hajnoczi <stefanha@gmail.com>
> Sent: Monday, June 22, 2020 8:51 PM
> 
> On Wed, Jun 17, 2020 at 06:27:27AM +0000, Liu, Yi L wrote:
> > > From: Stefan Hajnoczi <stefanha@gmail.com>
> > > Sent: Monday, June 15, 2020 5:41 PM
> > > On Thu, Jun 11, 2020 at 05:15:33AM -0700, Liu Yi L wrote:
> > >
> > > > From: Eric Auger <eric.auger@redhat.com>
> > > >
> > > > The VFIO API was enhanced to support nested stage control: a bunch of
> > > > new iotcls and usage guideline.
> > > >
> > > > Let's document the process to follow to set up nested mode.
> > > >
> > > > Cc: Kevin Tian <kevin.tian@intel.com>
> > > > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > Cc: Alex Williamson <alex.williamson@redhat.com>
> > > > Cc: Eric Auger <eric.auger@redhat.com>
> > > > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > > > Cc: Joerg Roedel <joro@8bytes.org>
> > > > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > > > Signed-off-by: Eric Auger <eric.auger@redhat.com>
> > > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > > ---
> > > > v1 -> v2:
> > > > *) new in v2, compared with Eric's original version, pasid table bind
> > > >    and fault reporting is removed as this series doesn't cover them.
> > > >    Original version from Eric.
> > > >    https://lkml.org/lkml/2020/3/20/700
> > > >
> > > >  Documentation/driver-api/vfio.rst | 64
> > > > +++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 64 insertions(+)
> > > >
> > > > diff --git a/Documentation/driver-api/vfio.rst
> > > > b/Documentation/driver-api/vfio.rst
> > > > index f1a4d3c..06224bd 100644
> > > > --- a/Documentation/driver-api/vfio.rst
> > > > +++ b/Documentation/driver-api/vfio.rst
> > > > @@ -239,6 +239,70 @@ group and can access them as follows::
> > > >  	/* Gratuitous device reset and go... */
> > > >  	ioctl(device, VFIO_DEVICE_RESET);
> > > >
> > > > +IOMMU Dual Stage Control
> > > > +------------------------
> > > > +
> > > > +Some IOMMUs support 2 stages/levels of translation. Stage corresponds
> > > > +to the ARM terminology while level corresponds to Intel's VTD terminology.
> > > > +In the following text we use either without distinction.
> > > > +
> > > > +This is useful when the guest is exposed with a virtual IOMMU and
> > > > +some devices are assigned to the guest through VFIO. Then the guest
> > > > +OS can use stage 1 (GIOVA -> GPA or GVA->GPA), while the hypervisor
> > > > +uses stage 2 for VM isolation (GPA -> HPA).
> > > > +
> > > > +Under dual stage translation, the guest gets ownership of the stage 1
> > > > +page tables and also owns stage 1 configuration structures. The
> > > > +hypervisor owns the root configuration structure (for security
> > > > +reason), including stage 2 configuration. This works as long
> > > > +configuration structures and page table
> > >
> > > s/as long configuration/as long as configuration/
> >
> > got it.
> >
> > >
> > > > +format are compatible between the virtual IOMMU and the physical IOMMU.
> > >
> > > s/format/formats/
> >
> > I see.
> >
> > > > +
> > > > +Assuming the HW supports it, this nested mode is selected by choosing
> > > > +the VFIO_TYPE1_NESTING_IOMMU type through:
> > > > +
> > > > +    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);
> > > > +
> > > > +This forces the hypervisor to use the stage 2, leaving stage 1
> > > > +available for guest usage. The guest stage 1 format depends on IOMMU
> > > > +vendor, and it is the same with the nesting configuration method.
> > > > +User space should check the format and configuration method after
> > > > +setting nesting type by
> > > > +using:
> > > > +
> > > > +    ioctl(container->fd, VFIO_IOMMU_GET_INFO, &nesting_info);
> > > > +
> > > > +Details can be found in Documentation/userspace-api/iommu.rst. For
> > > > +Intel VT-d, each stage 1 page table is bound to host by:
> > > > +
> > > > +    nesting_op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL;
> > > > +    memcpy(&nesting_op->data, &bind_data, sizeof(bind_data));
> > > > +    ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
> > > > +
> > > > +As mentioned above, guest OS may use stage 1 for GIOVA->GPA or GVA->GPA.
> > > > +GVA->GPA page tables are available when PASID (Process Address Space
> > > > +GVA->ID)
> > > > +is exposed to guest. e.g. guest with PASID-capable devices assigned.
> > > > +For such page table binding, the bind_data should include PASID info,
> > > > +which is allocated by guest itself or by host. This depends on
> > > > +hardware vendor e.g. Intel VT-d requires to allocate PASID from host.
> > > > +This requirement is available by VFIO_IOMMU_GET_INFO. User space
> > > > +could allocate PASID from host by:
> > > > +
> > > > +    req.flags = VFIO_IOMMU_ALLOC_PASID;
> > > > +    ioctl(container, VFIO_IOMMU_PASID_REQUEST, &req);
> > >
> > > It is not clear how the userspace application determines whether PASIDs must be
> > > allocated from the host via VFIO_IOMMU_PASID_REQUEST or if the guest itself
> can
> > > allocate PASIDs. The text mentions VFIO_IOMMU_GET_INFO but what exactly
> > > should the userspace application check?
> >
> > For VT-d, spec 3.0 introduced Virtual Cmd interface for PASID allocation,
> > guest request PASID from host if it detects the interface. Application
> > should check the IOMMU_NESTING_FEAT_SYSWIDE_PASID setting in the below
> > info reported by VFIO_IOMMU_GET_INFO. And virtual VT-d should not report
> > SVA related capabilities to guest if  SYSWIDE_PASID is not supported by
> > kernel.
> >
> > +struct iommu_nesting_info {
> > +	__u32	size;
> > +	__u32	format;
> > +	__u32	features;
> > +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
> > +#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
> > +#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 2)
> > +	__u32	flags;
> > +	__u8	data[];
> > +};
> > https://lore.kernel.org/linux-iommu/1591877734-66527-3-git-send-email-
> yi.l.liu@intel.com/
> 
> I see. Is it possible to add this information into this patch or at
> least a reference so readers know where to find out exactly how to do
> this?

oh, yes. this would help a lot. will add it.

Regards,
Yi Liu
> Stefan
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2020-06-23  6:44 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-11 12:15 [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
2020-06-11 12:15 ` [PATCH v2 01/15] vfio/type1: Refactor vfio_iommu_type1_ioctl() Liu Yi L
2020-06-11 12:15 ` [PATCH v2 02/15] iommu: Report domain nesting info Liu Yi L
2020-06-11 19:30   ` Alex Williamson
2020-06-12  9:05     ` Liu, Yi L
2020-06-15  1:22       ` Tian, Kevin
2020-06-15  6:04         ` Liu, Yi L
2020-06-16  1:56           ` Tian, Kevin
2020-06-16  2:24             ` Liu, Yi L
2020-06-17 14:39   ` Jean-Philippe Brucker
2020-06-18 11:46     ` Liu, Yi L
2020-06-11 12:15 ` [PATCH v2 03/15] vfio/type1: Report iommu nesting info to userspace Liu Yi L
2020-06-11 12:15 ` [PATCH v2 04/15] vfio: Add PASID allocation/free support Liu Yi L
2020-06-11 12:15 ` [PATCH v2 05/15] iommu/vt-d: Support setting ioasid set to domain Liu Yi L
2020-06-11 12:15 ` [PATCH v2 06/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free) Liu Yi L
2020-06-11 12:15 ` [PATCH v2 07/15] iommu/uapi: Add iommu_gpasid_unbind_data Liu Yi L
2020-06-11 12:15 ` [PATCH v2 08/15] iommu: Pass domain and unbind_data to sva_unbind_gpasid() Liu Yi L
2020-06-11 12:15 ` [PATCH v2 09/15] iommu/vt-d: Check ownership for PASIDs from user-space Liu Yi L
2020-06-11 12:15 ` [PATCH v2 10/15] vfio/type1: Support binding guest page tables to PASID Liu Yi L
2020-06-11 12:15 ` [PATCH v2 11/15] vfio/type1: Allow invalidating first-level/stage IOMMU cache Liu Yi L
2020-06-11 12:15 ` [PATCH v2 12/15] vfio/type1: Add vSVA support for IOMMU-backed mdevs Liu Yi L
2020-06-11 12:15 ` [PATCH v2 13/15] vfio/pci: Expose PCIe PASID capability to guest Liu Yi L
2020-06-11 12:15 ` [PATCH v2 14/15] vfio: Document dual stage control Liu Yi L
2020-06-15  9:41   ` Stefan Hajnoczi
2020-06-17  6:27     ` Liu, Yi L
2020-06-22 12:51       ` Stefan Hajnoczi
2020-06-23  6:43         ` Liu, Yi L
2020-06-11 12:15 ` [PATCH v2 15/15] iommu/vt-d: Support reporting nesting capability info Liu Yi L
2020-06-15 10:02 ` [PATCH v2 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Stefan Hajnoczi
2020-06-15 12:39   ` Liu, Yi L
2020-06-16 15:34     ` Stefan Hajnoczi
2020-06-16  2:26   ` Tian, Kevin
2020-06-16 15:49     ` Stefan Hajnoczi
2020-06-16 16:09       ` Peter Xu
2020-06-22 12:49         ` Stefan Hajnoczi
2020-06-16 17:00       ` Raj, Ashok
2020-06-22 12:49         ` Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).