iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 00/15] vfio: expose virtual Shared Virtual Addressing to VMs
@ 2020-07-12 11:20 Liu Yi L
  2020-07-12 11:20 ` [PATCH v5 01/15] vfio/type1: Refactor vfio_iommu_type1_ioctl() Liu Yi L
                   ` (14 more replies)
  0 siblings, 15 replies; 58+ messages in thread
From: Liu Yi L @ 2020-07-12 11:20 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
Intel platforms allows address space sharing between device DMA and
applications. SVA can reduce programming complexity and enhance security.

This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
guest application address space with passthru devices. This is called
vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
changes. For IOMMU and QEMU changes, they are in separate series (listed
in the "Related series").

The high-level architecture for SVA virtualization is as below, the key
design of vSVA support is to utilize the dual-stage IOMMU translation (
also known as IOMMU nesting translation) capability in host IOMMU.


    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |                CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables

Patch Overview:
 1. a refactor to vfio_iommu_type1 ioctl (patch 0001)
 2. reports IOMMU nesting info to userspace ( patch 0002, 0003, 0004 and 0015)
 3. vfio support for PASID allocation and free for VMs (patch 0005, 0006, 0007)
 4. vfio support for binding guest page table to host (patch 0008, 0009, 0010)
 5. vfio support for IOMMU cache invalidation from VMs (patch 0011)
 6. vfio support for vSVA usage on IOMMU-backed mdevs (patch 0012)
 7. expose PASID capability to VM (patch 0013)
 8. add doc for VFIO dual stage control (patch 0014)

The complete vSVA kernel upstream patches are divided into three phases:
    1. Common APIs and PCI device direct assignment
    2. IOMMU-backed Mediated Device assignment
    3. Page Request Services (PRS) support

This patchset is aiming for the phase 1 and phase 2, and based on Jacob's
below series.
*) [PATCH v4 0/5] IOMMU user API enhancement - wip
   https://lore.kernel.org/linux-iommu/1594165429-20075-1-git-send-email-jacob.jun.pan@linux.intel.com/

*) [PATCH 00/10] IOASID extensions for guest SVA - wip
   https://lkml.org/lkml/2020/3/25/874

The latest IOASID code added below new interface for itertate all PASIDs of an
ioasid_set. The implementation is not sent out yet as Jacob needs some cleanup,
it can be found in branch vsva-linux-5.8-rc3-v5 on github (mentioned below):
 int ioasid_set_for_each_ioasid(int sid, void (*fn)(ioasid_t id, void *data), void *data);

Complete set for current vSVA can be found in below branch.
https://github.com/luxis1999/linux-vsva.git: vsva-linux-5.8-rc3-v5

The corresponding QEMU patch series is included in below branch:
https://github.com/luxis1999/qemu.git: vsva_5.8_rc3_qemu_rfcv8


Regards,
Yi Liu

Changelog:
	- Patch v4 -> Patch v5:
	  a) Address comments against v4
	  Patch v4: https://lore.kernel.org/kvm/1593861989-35920-1-git-send-email-yi.l.liu@intel.com/

	- Patch v3 -> Patch v4:
	  a) Address comments against v3
	  b) Add rb from Stefan on patch 14/15
	  Patch v3: https://lore.kernel.org/linux-iommu/1592988927-48009-1-git-send-email-yi.l.liu@intel.com/

	- Patch v2 -> Patch v3:
	  a) Rebase on top of Jacob's v3 iommu uapi patchset
	  b) Address comments from Kevin and Stefan Hajnoczi
	  c) Reuse DOMAIN_ATTR_NESTING to get iommu nesting info
	  d) Drop [PATCH v2 07/15] iommu/uapi: Add iommu_gpasid_unbind_data
	  Patch v2: https://lore.kernel.org/linux-iommu/1591877734-66527-1-git-send-email-yi.l.liu@intel.com/#r

	- Patch v1 -> Patch v2:
	  a) Refactor vfio_iommu_type1_ioctl() per suggestion from Christoph
	     Hellwig.
	  b) Re-sequence the patch series for better bisect support.
	  c) Report IOMMU nesting cap info in detail instead of a format in
	     v1.
	  d) Enforce one group per nesting type container for vfio iommu type1
	     driver.
	  e) Build the vfio_mm related code from vfio.c to be a separate
	     vfio_pasid.ko.
	  f) Add PASID ownership check in IOMMU driver.
	  g) Adopted to latest IOMMU UAPI design. Removed IOMMU UAPI version
	     check. Added iommu_gpasid_unbind_data for unbind requests from
	     userspace.
	  h) Define a single ioctl:VFIO_IOMMU_NESTING_OP for bind/unbind_gtbl
	     and cahce_invld.
	  i) Document dual stage control in vfio.rst.
	  Patch v1: https://lore.kernel.org/linux-iommu/1584880325-10561-1-git-send-email-yi.l.liu@intel.com/

	- RFC v3 -> Patch v1:
	  a) Address comments to the PASID request(alloc/free) path
	  b) Report PASID alloc/free availabitiy to user-space
	  c) Add a vfio_iommu_type1 parameter to support pasid quota tuning
	  d) Adjusted to latest ioasid code implementation. e.g. remove the
	     code for tracking the allocated PASIDs as latest ioasid code
	     will track it, VFIO could use ioasid_free_set() to free all
	     PASIDs.
	  RFC v3: https://lore.kernel.org/linux-iommu/1580299912-86084-1-git-send-email-yi.l.liu@intel.com/

	- RFC v2 -> v3:
	  a) Refine the whole patchset to fit the roughly parts in this series
	  b) Adds complete vfio PASID management framework. e.g. pasid alloc,
	  free, reclaim in VM crash/down and per-VM PASID quota to prevent
	  PASID abuse.
	  c) Adds IOMMU uAPI version check and page table format check to ensure
	  version compatibility and hardware compatibility.
	  d) Adds vSVA vfio support for IOMMU-backed mdevs.
	  RFC v2: https://lore.kernel.org/linux-iommu/1571919983-3231-1-git-send-email-yi.l.liu@intel.com/

	- RFC v1 -> v2:
	  Dropped vfio: VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE.
	  RFC v1: https://lore.kernel.org/linux-iommu/1562324772-3084-1-git-send-email-yi.l.liu@intel.com/

---
Eric Auger (1):
  vfio: Document dual stage control

Liu Yi L (13):
  vfio/type1: Refactor vfio_iommu_type1_ioctl()
  iommu: Report domain nesting info
  iommu/smmu: Report empty domain nesting info
  vfio/type1: Report iommu nesting info to userspace
  vfio: Add PASID allocation/free support
  iommu/vt-d: Support setting ioasid set to domain
  vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)
  iommu/vt-d: Check ownership for PASIDs from user-space
  vfio/type1: Support binding guest page tables to PASID
  vfio/type1: Allow invalidating first-level/stage IOMMU cache
  vfio/type1: Add vSVA support for IOMMU-backed mdevs
  vfio/pci: Expose PCIe PASID capability to guest
  iommu/vt-d: Support reporting nesting capability info

Yi Sun (1):
  iommu: Pass domain to sva_unbind_gpasid()

 Documentation/driver-api/vfio.rst  |  67 +++
 drivers/iommu/arm-smmu-v3.c        |  29 +-
 drivers/iommu/arm-smmu.c           |  29 +-
 drivers/iommu/intel/iommu.c        | 113 ++++-
 drivers/iommu/intel/svm.c          |  10 +-
 drivers/iommu/iommu.c              |   2 +-
 drivers/vfio/Kconfig               |   6 +
 drivers/vfio/Makefile              |   1 +
 drivers/vfio/pci/vfio_pci_config.c |   2 +-
 drivers/vfio/vfio_iommu_type1.c    | 818 ++++++++++++++++++++++++++++---------
 drivers/vfio/vfio_pasid.c          | 271 ++++++++++++
 include/linux/intel-iommu.h        |  23 +-
 include/linux/iommu.h              |   4 +-
 include/linux/vfio.h               |  54 +++
 include/uapi/linux/iommu.h         |  77 ++++
 include/uapi/linux/vfio.h          |  90 ++++
 16 files changed, 1395 insertions(+), 201 deletions(-)
 create mode 100644 drivers/vfio/vfio_pasid.c

-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v5 01/15] vfio/type1: Refactor vfio_iommu_type1_ioctl()
  2020-07-12 11:20 [PATCH v5 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
@ 2020-07-12 11:20 ` Liu Yi L
  2020-07-12 11:20 ` [PATCH v5 02/15] iommu: Report domain nesting info Liu Yi L
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 58+ messages in thread
From: Liu Yi L @ 2020-07-12 11:20 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

This patch refactors the vfio_iommu_type1_ioctl() to use switch instead of
if-else, and each command got a helper function.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
v4 -> v5:
*) address comments from Eric Auger, add r-b from Eric.
---
 drivers/vfio/vfio_iommu_type1.c | 394 ++++++++++++++++++++++------------------
 1 file changed, 213 insertions(+), 181 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5e556ac..3bd70ff 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2453,6 +2453,23 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
 	return ret;
 }
 
+static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
+					    unsigned long arg)
+{
+	switch (arg) {
+	case VFIO_TYPE1_IOMMU:
+	case VFIO_TYPE1v2_IOMMU:
+	case VFIO_TYPE1_NESTING_IOMMU:
+		return 1;
+	case VFIO_DMA_CC_IOMMU:
+		if (!iommu)
+			return 0;
+		return vfio_domains_have_iommu_cache(iommu);
+	default:
+		return 0;
+	}
+}
+
 static int vfio_iommu_iova_add_cap(struct vfio_info_cap *caps,
 		 struct vfio_iommu_type1_info_cap_iova_range *cap_iovas,
 		 size_t size)
@@ -2529,241 +2546,256 @@ static int vfio_iommu_migration_build_caps(struct vfio_iommu *iommu,
 	return vfio_info_add_capability(caps, &cap_mig.header, sizeof(cap_mig));
 }
 
-static long vfio_iommu_type1_ioctl(void *iommu_data,
-				   unsigned int cmd, unsigned long arg)
+static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
+				     unsigned long arg)
 {
-	struct vfio_iommu *iommu = iommu_data;
+	struct vfio_iommu_type1_info info;
 	unsigned long minsz;
+	struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+	unsigned long capsz;
+	int ret;
 
-	if (cmd == VFIO_CHECK_EXTENSION) {
-		switch (arg) {
-		case VFIO_TYPE1_IOMMU:
-		case VFIO_TYPE1v2_IOMMU:
-		case VFIO_TYPE1_NESTING_IOMMU:
-			return 1;
-		case VFIO_DMA_CC_IOMMU:
-			if (!iommu)
-				return 0;
-			return vfio_domains_have_iommu_cache(iommu);
-		default:
-			return 0;
-		}
-	} else if (cmd == VFIO_IOMMU_GET_INFO) {
-		struct vfio_iommu_type1_info info;
-		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
-		unsigned long capsz;
-		int ret;
-
-		minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
+	minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
 
-		/* For backward compatibility, cannot require this */
-		capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
+	/* For backward compatibility, cannot require this */
+	capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
 
-		if (copy_from_user(&info, (void __user *)arg, minsz))
-			return -EFAULT;
+	if (copy_from_user(&info, (void __user *)arg, minsz))
+		return -EFAULT;
 
-		if (info.argsz < minsz)
-			return -EINVAL;
+	if (info.argsz < minsz)
+		return -EINVAL;
 
-		if (info.argsz >= capsz) {
-			minsz = capsz;
-			info.cap_offset = 0; /* output, no-recopy necessary */
-		}
+	if (info.argsz >= capsz) {
+		minsz = capsz;
+		info.cap_offset = 0; /* output, no-recopy necessary */
+	}
 
-		mutex_lock(&iommu->lock);
-		info.flags = VFIO_IOMMU_INFO_PGSIZES;
+	mutex_lock(&iommu->lock);
+	info.flags = VFIO_IOMMU_INFO_PGSIZES;
 
-		info.iova_pgsizes = iommu->pgsize_bitmap;
+	info.iova_pgsizes = iommu->pgsize_bitmap;
 
-		ret = vfio_iommu_migration_build_caps(iommu, &caps);
+	ret = vfio_iommu_migration_build_caps(iommu, &caps);
 
-		if (!ret)
-			ret = vfio_iommu_iova_build_caps(iommu, &caps);
+	if (!ret)
+		ret = vfio_iommu_iova_build_caps(iommu, &caps);
 
-		mutex_unlock(&iommu->lock);
+	mutex_unlock(&iommu->lock);
 
-		if (ret)
-			return ret;
+	if (ret)
+		return ret;
 
-		if (caps.size) {
-			info.flags |= VFIO_IOMMU_INFO_CAPS;
+	if (caps.size) {
+		info.flags |= VFIO_IOMMU_INFO_CAPS;
 
-			if (info.argsz < sizeof(info) + caps.size) {
-				info.argsz = sizeof(info) + caps.size;
-			} else {
-				vfio_info_cap_shift(&caps, sizeof(info));
-				if (copy_to_user((void __user *)arg +
-						sizeof(info), caps.buf,
-						caps.size)) {
-					kfree(caps.buf);
-					return -EFAULT;
-				}
-				info.cap_offset = sizeof(info);
+		if (info.argsz < sizeof(info) + caps.size) {
+			info.argsz = sizeof(info) + caps.size;
+		} else {
+			vfio_info_cap_shift(&caps, sizeof(info));
+			if (copy_to_user((void __user *)arg +
+					sizeof(info), caps.buf,
+					caps.size)) {
+				kfree(caps.buf);
+				return -EFAULT;
 			}
-
-			kfree(caps.buf);
+			info.cap_offset = sizeof(info);
 		}
 
-		return copy_to_user((void __user *)arg, &info, minsz) ?
-			-EFAULT : 0;
+		kfree(caps.buf);
+	}
 
-	} else if (cmd == VFIO_IOMMU_MAP_DMA) {
-		struct vfio_iommu_type1_dma_map map;
-		uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
-				VFIO_DMA_MAP_FLAG_WRITE;
+	return copy_to_user((void __user *)arg, &info, minsz) ?
+			-EFAULT : 0;
+}
 
-		minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
+static int vfio_iommu_type1_map_dma(struct vfio_iommu *iommu,
+				    unsigned long arg)
+{
+	struct vfio_iommu_type1_dma_map map;
+	unsigned long minsz;
+	uint32_t mask = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
 
-		if (copy_from_user(&map, (void __user *)arg, minsz))
-			return -EFAULT;
+	minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
 
-		if (map.argsz < minsz || map.flags & ~mask)
-			return -EINVAL;
+	if (copy_from_user(&map, (void __user *)arg, minsz))
+		return -EFAULT;
 
-		return vfio_dma_do_map(iommu, &map);
+	if (map.argsz < minsz || map.flags & ~mask)
+		return -EINVAL;
 
-	} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
-		struct vfio_iommu_type1_dma_unmap unmap;
-		struct vfio_bitmap bitmap = { 0 };
-		int ret;
+	return vfio_dma_do_map(iommu, &map);
+}
 
-		minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size);
+static int vfio_iommu_type1_unmap_dma(struct vfio_iommu *iommu,
+				      unsigned long arg)
+{
+	struct vfio_iommu_type1_dma_unmap unmap;
+	struct vfio_bitmap bitmap = { 0 };
+	unsigned long minsz;
+	int ret;
 
-		if (copy_from_user(&unmap, (void __user *)arg, minsz))
-			return -EFAULT;
+	minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size);
 
-		if (unmap.argsz < minsz ||
-		    unmap.flags & ~VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP)
-			return -EINVAL;
+	if (copy_from_user(&unmap, (void __user *)arg, minsz))
+		return -EFAULT;
 
-		if (unmap.flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) {
-			unsigned long pgshift;
+	if (unmap.argsz < minsz ||
+	    unmap.flags & ~VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP)
+		return -EINVAL;
 
-			if (unmap.argsz < (minsz + sizeof(bitmap)))
-				return -EINVAL;
+	if (unmap.flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) {
+		unsigned long pgshift;
 
-			if (copy_from_user(&bitmap,
-					   (void __user *)(arg + minsz),
-					   sizeof(bitmap)))
-				return -EFAULT;
+		if (unmap.argsz < (minsz + sizeof(bitmap)))
+			return -EINVAL;
 
-			if (!access_ok((void __user *)bitmap.data, bitmap.size))
-				return -EINVAL;
+		if (copy_from_user(&bitmap,
+				   (void __user *)(arg + minsz),
+				   sizeof(bitmap)))
+			return -EFAULT;
 
-			pgshift = __ffs(bitmap.pgsize);
-			ret = verify_bitmap_size(unmap.size >> pgshift,
-						 bitmap.size);
-			if (ret)
-				return ret;
-		}
+		if (!access_ok((void __user *)bitmap.data, bitmap.size))
+			return -EINVAL;
 
-		ret = vfio_dma_do_unmap(iommu, &unmap, &bitmap);
+		pgshift = __ffs(bitmap.pgsize);
+		ret = verify_bitmap_size(unmap.size >> pgshift,
+					 bitmap.size);
 		if (ret)
 			return ret;
+	}
+
+	ret = vfio_dma_do_unmap(iommu, &unmap, &bitmap);
+	if (ret)
+		return ret;
 
-		return copy_to_user((void __user *)arg, &unmap, minsz) ?
+	return copy_to_user((void __user *)arg, &unmap, minsz) ?
 			-EFAULT : 0;
-	} else if (cmd == VFIO_IOMMU_DIRTY_PAGES) {
-		struct vfio_iommu_type1_dirty_bitmap dirty;
-		uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START |
-				VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP |
-				VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
-		int ret = 0;
+}
 
-		if (!iommu->v2)
-			return -EACCES;
+static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
+					unsigned long arg)
+{
+	struct vfio_iommu_type1_dirty_bitmap dirty;
+	uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START |
+			VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP |
+			VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
+	unsigned long minsz;
+	int ret = 0;
 
-		minsz = offsetofend(struct vfio_iommu_type1_dirty_bitmap,
-				    flags);
+	if (!iommu->v2)
+		return -EACCES;
 
-		if (copy_from_user(&dirty, (void __user *)arg, minsz))
-			return -EFAULT;
+	minsz = offsetofend(struct vfio_iommu_type1_dirty_bitmap, flags);
 
-		if (dirty.argsz < minsz || dirty.flags & ~mask)
-			return -EINVAL;
+	if (copy_from_user(&dirty, (void __user *)arg, minsz))
+		return -EFAULT;
 
-		/* only one flag should be set at a time */
-		if (__ffs(dirty.flags) != __fls(dirty.flags))
-			return -EINVAL;
+	if (dirty.argsz < minsz || dirty.flags & ~mask)
+		return -EINVAL;
 
-		if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_START) {
-			size_t pgsize;
+	/* only one flag should be set at a time */
+	if (__ffs(dirty.flags) != __fls(dirty.flags))
+		return -EINVAL;
 
-			mutex_lock(&iommu->lock);
-			pgsize = 1 << __ffs(iommu->pgsize_bitmap);
-			if (!iommu->dirty_page_tracking) {
-				ret = vfio_dma_bitmap_alloc_all(iommu, pgsize);
-				if (!ret)
-					iommu->dirty_page_tracking = true;
-			}
-			mutex_unlock(&iommu->lock);
-			return ret;
-		} else if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP) {
-			mutex_lock(&iommu->lock);
-			if (iommu->dirty_page_tracking) {
-				iommu->dirty_page_tracking = false;
-				vfio_dma_bitmap_free_all(iommu);
-			}
-			mutex_unlock(&iommu->lock);
-			return 0;
-		} else if (dirty.flags &
-				 VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP) {
-			struct vfio_iommu_type1_dirty_bitmap_get range;
-			unsigned long pgshift;
-			size_t data_size = dirty.argsz - minsz;
-			size_t iommu_pgsize;
-
-			if (!data_size || data_size < sizeof(range))
-				return -EINVAL;
-
-			if (copy_from_user(&range, (void __user *)(arg + minsz),
-					   sizeof(range)))
-				return -EFAULT;
+	if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_START) {
+		size_t pgsize;
 
-			if (range.iova + range.size < range.iova)
-				return -EINVAL;
-			if (!access_ok((void __user *)range.bitmap.data,
-				       range.bitmap.size))
-				return -EINVAL;
+		mutex_lock(&iommu->lock);
+		pgsize = 1 << __ffs(iommu->pgsize_bitmap);
+		if (!iommu->dirty_page_tracking) {
+			ret = vfio_dma_bitmap_alloc_all(iommu, pgsize);
+			if (!ret)
+				iommu->dirty_page_tracking = true;
+		}
+		mutex_unlock(&iommu->lock);
+		return ret;
+	} else if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP) {
+		mutex_lock(&iommu->lock);
+		if (iommu->dirty_page_tracking) {
+			iommu->dirty_page_tracking = false;
+			vfio_dma_bitmap_free_all(iommu);
+		}
+		mutex_unlock(&iommu->lock);
+		return 0;
+	} else if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP) {
+		struct vfio_iommu_type1_dirty_bitmap_get range;
+		unsigned long pgshift;
+		size_t data_size = dirty.argsz - minsz;
+		size_t iommu_pgsize;
 
-			pgshift = __ffs(range.bitmap.pgsize);
-			ret = verify_bitmap_size(range.size >> pgshift,
-						 range.bitmap.size);
-			if (ret)
-				return ret;
+		if (!data_size || data_size < sizeof(range))
+			return -EINVAL;
 
-			mutex_lock(&iommu->lock);
+		if (copy_from_user(&range, (void __user *)(arg + minsz),
+				   sizeof(range)))
+			return -EFAULT;
 
-			iommu_pgsize = (size_t)1 << __ffs(iommu->pgsize_bitmap);
+		if (range.iova + range.size < range.iova)
+			return -EINVAL;
+		if (!access_ok((void __user *)range.bitmap.data,
+			       range.bitmap.size))
+			return -EINVAL;
 
-			/* allow only smallest supported pgsize */
-			if (range.bitmap.pgsize != iommu_pgsize) {
-				ret = -EINVAL;
-				goto out_unlock;
-			}
-			if (range.iova & (iommu_pgsize - 1)) {
-				ret = -EINVAL;
-				goto out_unlock;
-			}
-			if (!range.size || range.size & (iommu_pgsize - 1)) {
-				ret = -EINVAL;
-				goto out_unlock;
-			}
+		pgshift = __ffs(range.bitmap.pgsize);
+		ret = verify_bitmap_size(range.size >> pgshift,
+					 range.bitmap.size);
+		if (ret)
+			return ret;
 
-			if (iommu->dirty_page_tracking)
-				ret = vfio_iova_dirty_bitmap(range.bitmap.data,
-						iommu, range.iova, range.size,
-						range.bitmap.pgsize);
-			else
-				ret = -EINVAL;
-out_unlock:
-			mutex_unlock(&iommu->lock);
+		mutex_lock(&iommu->lock);
 
-			return ret;
+		iommu_pgsize = (size_t)1 << __ffs(iommu->pgsize_bitmap);
+
+		/* allow only smallest supported pgsize */
+		if (range.bitmap.pgsize != iommu_pgsize) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+		if (range.iova & (iommu_pgsize - 1)) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+		if (!range.size || range.size & (iommu_pgsize - 1)) {
+			ret = -EINVAL;
+			goto out_unlock;
 		}
+
+		if (iommu->dirty_page_tracking)
+			ret = vfio_iova_dirty_bitmap(range.bitmap.data,
+						     iommu, range.iova,
+						     range.size,
+						     range.bitmap.pgsize);
+		else
+			ret = -EINVAL;
+out_unlock:
+		mutex_unlock(&iommu->lock);
+
+		return ret;
 	}
 
-	return -ENOTTY;
+	return -EINVAL;
+}
+
+static long vfio_iommu_type1_ioctl(void *iommu_data,
+				   unsigned int cmd, unsigned long arg)
+{
+	struct vfio_iommu *iommu = iommu_data;
+
+	switch (cmd) {
+	case VFIO_CHECK_EXTENSION:
+		return vfio_iommu_type1_check_extension(iommu, arg);
+	case VFIO_IOMMU_GET_INFO:
+		return vfio_iommu_type1_get_info(iommu, arg);
+	case VFIO_IOMMU_MAP_DMA:
+		return vfio_iommu_type1_map_dma(iommu, arg);
+	case VFIO_IOMMU_UNMAP_DMA:
+		return vfio_iommu_type1_unmap_dma(iommu, arg);
+	case VFIO_IOMMU_DIRTY_PAGES:
+		return vfio_iommu_type1_dirty_pages(iommu, arg);
+	default:
+		return -ENOTTY;
+	}
 }
 
 static int vfio_iommu_type1_register_notifier(void *iommu_data,
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 02/15] iommu: Report domain nesting info
  2020-07-12 11:20 [PATCH v5 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
  2020-07-12 11:20 ` [PATCH v5 01/15] vfio/type1: Refactor vfio_iommu_type1_ioctl() Liu Yi L
@ 2020-07-12 11:20 ` Liu Yi L
  2020-07-17 16:29   ` Auger Eric
  2020-07-12 11:20 ` [PATCH v5 03/15] iommu/smmu: Report empty " Liu Yi L
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2020-07-12 11:20 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

IOMMUs that support nesting translation needs report the capability info
to userspace, e.g. the format of first level/stage paging structures.

This patch reports nesting info by DOMAIN_ATTR_NESTING. Caller can get
nesting info after setting DOMAIN_ATTR_NESTING.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v4 -> v5:
*) address comments from Eric Auger.

v3 -> v4:
*) split the SMMU driver changes to be a separate patch
*) move the @addr_width and @pasid_bits from vendor specific
   part to generic part.
*) tweak the description for the @features field of struct
   iommu_nesting_info.
*) add description on the @data[] field of struct iommu_nesting_info

v2 -> v3:
*) remvoe cap/ecap_mask in iommu_nesting_info.
*) reuse DOMAIN_ATTR_NESTING to get nesting info.
*) return an empty iommu_nesting_info for SMMU drivers per Jean'
   suggestion.
---
 include/uapi/linux/iommu.h | 77 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 1afc661..d2a47c4 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -332,4 +332,81 @@ struct iommu_gpasid_bind_data {
 	} vendor;
 };
 
+/*
+ * struct iommu_nesting_info - Information for nesting-capable IOMMU.
+ *			       user space should check it before using
+ *			       nesting capability.
+ *
+ * @size:	size of the whole structure
+ * @format:	PASID table entry format, the same definition as struct
+ *		iommu_gpasid_bind_data @format.
+ * @features:	supported nesting features.
+ * @flags:	currently reserved for future extension.
+ * @addr_width:	The output addr width of first level/stage translation
+ * @pasid_bits:	Maximum supported PASID bits, 0 represents no PASID
+ *		support.
+ * @data:	vendor specific cap info. data[] structure type can be deduced
+ *		from @format field.
+ *
+ * +===============+======================================================+
+ * | feature       |  Notes                                               |
+ * +===============+======================================================+
+ * | SYSWIDE_PASID |  PASIDs are managed in system-wide, instead of per   |
+ * |               |  device. When a device is assigned to userspace or   |
+ * |               |  VM, proper uAPI (userspace driver framework uAPI,   |
+ * |               |  e.g. VFIO) must be used to allocate/free PASIDs for |
+ * |               |  the assigned device.                                |
+ * +---------------+------------------------------------------------------+
+ * | BIND_PGTBL    |  The owner of the first level/stage page table must  |
+ * |               |  explicitly bind the page table to associated PASID  |
+ * |               |  (either the one specified in bind request or the    |
+ * |               |  default PASID of iommu domain), through userspace   |
+ * |               |  driver framework uAPI (e.g. VFIO_IOMMU_NESTING_OP). |
+ * +---------------+------------------------------------------------------+
+ * | CACHE_INVLD   |  The owner of the first level/stage page table must  |
+ * |               |  explicitly invalidate the IOMMU cache through uAPI  |
+ * |               |  provided by userspace driver framework (e.g. VFIO)  |
+ * |               |  according to vendor-specific requirement when       |
+ * |               |  changing the page table.                            |
+ * +---------------+------------------------------------------------------+
+ *
+ * @data[] types defined for @format:
+ * +================================+=====================================+
+ * | @format                        | @data[]                             |
+ * +================================+=====================================+
+ * | IOMMU_PASID_FORMAT_INTEL_VTD   | struct iommu_nesting_info_vtd       |
+ * +--------------------------------+-------------------------------------+
+ *
+ */
+struct iommu_nesting_info {
+	__u32	size;
+	__u32	format;
+#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
+#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
+#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 2)
+	__u32	features;
+	__u32	flags;
+	__u16	addr_width;
+	__u16	pasid_bits;
+	__u32	padding;
+	__u8	data[];
+};
+
+/*
+ * struct iommu_nesting_info_vtd - Intel VT-d specific nesting info
+ *
+ * @flags:	VT-d specific flags. Currently reserved for future
+ *		extension.
+ * @cap_reg:	Describe basic capabilities as defined in VT-d capability
+ *		register.
+ * @ecap_reg:	Describe the extended capabilities as defined in VT-d
+ *		extended capability register.
+ */
+struct iommu_nesting_info_vtd {
+	__u32	flags;
+	__u32	padding;
+	__u64	cap_reg;
+	__u64	ecap_reg;
+};
+
 #endif /* _UAPI_IOMMU_H */
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 03/15] iommu/smmu: Report empty domain nesting info
  2020-07-12 11:20 [PATCH v5 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
  2020-07-12 11:20 ` [PATCH v5 01/15] vfio/type1: Refactor vfio_iommu_type1_ioctl() Liu Yi L
  2020-07-12 11:20 ` [PATCH v5 02/15] iommu: Report domain nesting info Liu Yi L
@ 2020-07-12 11:20 ` Liu Yi L
  2020-07-13 13:14   ` Will Deacon
  2020-07-17 17:18   ` Auger Eric
  2020-07-12 11:20 ` [PATCH v5 04/15] vfio/type1: Report iommu nesting info to userspace Liu Yi L
                   ` (11 subsequent siblings)
  14 siblings, 2 replies; 58+ messages in thread
From: Liu Yi L @ 2020-07-12 11:20 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, Robin Murphy, Will Deacon, hao.wu

This patch is added as instead of returning a boolean for DOMAIN_ATTR_NESTING,
iommu_domain_get_attr() should return an iommu_nesting_info handle.

Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Suggested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v4 -> v5:
*) address comments from Eric Auger.
---
 drivers/iommu/arm-smmu-v3.c | 29 +++++++++++++++++++++++++++--
 drivers/iommu/arm-smmu.c    | 29 +++++++++++++++++++++++++++--
 2 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index f578677..ec815d7 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -3019,6 +3019,32 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
 	return group;
 }
 
+static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
+					void *data)
+{
+	struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
+	unsigned int size;
+
+	if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+		return -ENODEV;
+
+	size = sizeof(struct iommu_nesting_info);
+
+	/*
+	 * if provided buffer size is smaller than expected, should
+	 * return 0 and also the expected buffer size to caller.
+	 */
+	if (info->size < size) {
+		info->size = size;
+		return 0;
+	}
+
+	/* report an empty iommu_nesting_info for now */
+	memset(info, 0x0, size);
+	info->size = size;
+	return 0;
+}
+
 static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 				    enum iommu_attr attr, void *data)
 {
@@ -3028,8 +3054,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 	case IOMMU_DOMAIN_UNMANAGED:
 		switch (attr) {
 		case DOMAIN_ATTR_NESTING:
-			*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
-			return 0;
+			return arm_smmu_domain_nesting_info(smmu_domain, data);
 		default:
 			return -ENODEV;
 		}
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4c..09e2f1b 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1506,6 +1506,32 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
 	return group;
 }
 
+static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
+					void *data)
+{
+	struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
+	unsigned int size;
+
+	if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+		return -ENODEV;
+
+	size = sizeof(struct iommu_nesting_info);
+
+	/*
+	 * if provided buffer size is smaller than expected, should
+	 * return 0 and also the expected buffer size to caller.
+	 */
+	if (info->size < size) {
+		info->size = size;
+		return 0;
+	}
+
+	/* report an empty iommu_nesting_info for now */
+	memset(info, 0x0, size);
+	info->size = size;
+	return 0;
+}
+
 static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 				    enum iommu_attr attr, void *data)
 {
@@ -1515,8 +1541,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 	case IOMMU_DOMAIN_UNMANAGED:
 		switch (attr) {
 		case DOMAIN_ATTR_NESTING:
-			*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
-			return 0;
+			return arm_smmu_domain_nesting_info(smmu_domain, data);
 		default:
 			return -ENODEV;
 		}
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 04/15] vfio/type1: Report iommu nesting info to userspace
  2020-07-12 11:20 [PATCH v5 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (2 preceding siblings ...)
  2020-07-12 11:20 ` [PATCH v5 03/15] iommu/smmu: Report empty " Liu Yi L
@ 2020-07-12 11:20 ` Liu Yi L
  2020-07-17 17:34   ` Auger Eric
  2020-07-12 11:21 ` [PATCH v5 05/15] vfio: Add PASID allocation/free support Liu Yi L
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2020-07-12 11:20 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

This patch exports iommu nesting capability info to user space through
VFIO. User space is expected to check this info for supported uAPIs (e.g.
PASID alloc/free, bind page table, and cache invalidation) and the vendor
specific format information for first level/stage page table that will be
bound to.

The nesting info is available only after the nesting iommu type is set
for a container. Current implementation imposes one limitation - one
nesting container should include at most one group. The philosophy of
vfio container is having all groups/devices within the container share
the same IOMMU context. When vSVA is enabled, one IOMMU context could
include one 2nd-level address space and multiple 1st-level address spaces.
While the 2nd-level address space is reasonably sharable by multiple groups
, blindly sharing 1st-level address spaces across all groups within the
container might instead break the guest expectation. In the future sub/
super container concept might be introduced to allow partial address space
sharing within an IOMMU context. But for now let's go with this restriction
by requiring singleton container for using nesting iommu features. Below
link has the related discussion about this decision.

https://lkml.org/lkml/2020/5/15/1028

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
v4 -> v5:
*) address comments from Eric Auger.
*) return struct iommu_nesting_info for VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as
   cap is much "cheap", if needs extension in future, just define another cap.
   https://lore.kernel.org/kvm/20200708132947.5b7ee954@x1.home/

v3 -> v4:
*) address comments against v3.

v1 -> v2:
*) added in v2
---
 drivers/vfio/vfio_iommu_type1.c | 102 +++++++++++++++++++++++++++++++++++-----
 include/uapi/linux/vfio.h       |  19 ++++++++
 2 files changed, 109 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 3bd70ff..ed80104 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit,
 		 "Maximum number of user DMA mappings per container (65535).");
 
 struct vfio_iommu {
-	struct list_head	domain_list;
-	struct list_head	iova_list;
-	struct vfio_domain	*external_domain; /* domain for external user */
-	struct mutex		lock;
-	struct rb_root		dma_list;
-	struct blocking_notifier_head notifier;
-	unsigned int		dma_avail;
-	uint64_t		pgsize_bitmap;
-	bool			v2;
-	bool			nesting;
-	bool			dirty_page_tracking;
-	bool			pinned_page_dirty_scope;
+	struct list_head		domain_list;
+	struct list_head		iova_list;
+	/* domain for external user */
+	struct vfio_domain		*external_domain;
+	struct mutex			lock;
+	struct rb_root			dma_list;
+	struct blocking_notifier_head	notifier;
+	unsigned int			dma_avail;
+	uint64_t			pgsize_bitmap;
+	bool				v2;
+	bool				nesting;
+	bool				dirty_page_tracking;
+	bool				pinned_page_dirty_scope;
+	struct iommu_nesting_info	*nesting_info;
 };
 
 struct vfio_domain {
@@ -130,6 +132,9 @@ struct vfio_regions {
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)	\
 					(!list_empty(&iommu->domain_list))
 
+#define CONTAINER_HAS_DOMAIN(iommu)	(((iommu)->external_domain) || \
+					 (!list_empty(&(iommu)->domain_list)))
+
 #define DIRTY_BITMAP_BYTES(n)	(ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)
 
 /*
@@ -1929,6 +1934,13 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
 
 	list_splice_tail(iova_copy, iova);
 }
+
+static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
+{
+	kfree(iommu->nesting_info);
+	iommu->nesting_info = NULL;
+}
+
 static int vfio_iommu_type1_attach_group(void *iommu_data,
 					 struct iommu_group *iommu_group)
 {
@@ -1959,6 +1971,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 		}
 	}
 
+	/* Nesting type container can include only one group */
+	if (iommu->nesting && CONTAINER_HAS_DOMAIN(iommu)) {
+		mutex_unlock(&iommu->lock);
+		return -EINVAL;
+	}
+
 	group = kzalloc(sizeof(*group), GFP_KERNEL);
 	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
 	if (!group || !domain) {
@@ -2029,6 +2047,32 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	if (ret)
 		goto out_domain;
 
+	/* Nesting cap info is available only after attaching */
+	if (iommu->nesting) {
+		struct iommu_nesting_info tmp = { .size = 0, };
+
+		/* First get the size of vendor specific nesting info */
+		ret = iommu_domain_get_attr(domain->domain,
+					    DOMAIN_ATTR_NESTING,
+					    &tmp);
+		if (ret)
+			goto out_detach;
+
+		iommu->nesting_info = kzalloc(tmp.size, GFP_KERNEL);
+		if (!iommu->nesting_info) {
+			ret = -ENOMEM;
+			goto out_detach;
+		}
+
+		/* Now get the nesting info */
+		iommu->nesting_info->size = tmp.size;
+		ret = iommu_domain_get_attr(domain->domain,
+					    DOMAIN_ATTR_NESTING,
+					    iommu->nesting_info);
+		if (ret)
+			goto out_detach;
+	}
+
 	/* Get aperture info */
 	iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY, &geo);
 
@@ -2138,6 +2182,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	return 0;
 
 out_detach:
+	vfio_iommu_release_nesting_info(iommu);
 	vfio_iommu_detach_group(domain, group);
 out_domain:
 	iommu_domain_free(domain->domain);
@@ -2338,6 +2383,8 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 					vfio_iommu_unmap_unpin_all(iommu);
 				else
 					vfio_iommu_unmap_unpin_reaccount(iommu);
+
+				vfio_iommu_release_nesting_info(iommu);
 			}
 			iommu_domain_free(domain->domain);
 			list_del(&domain->next);
@@ -2546,6 +2593,31 @@ static int vfio_iommu_migration_build_caps(struct vfio_iommu *iommu,
 	return vfio_info_add_capability(caps, &cap_mig.header, sizeof(cap_mig));
 }
 
+static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
+					   struct vfio_info_cap *caps)
+{
+	struct vfio_info_cap_header *header;
+	struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
+	size_t size;
+
+	size = offsetof(struct vfio_iommu_type1_info_cap_nesting, info) +
+		iommu->nesting_info->size;
+
+	header = vfio_info_cap_add(caps, size,
+				   VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
+	if (IS_ERR(header))
+		return PTR_ERR(header);
+
+	nesting_cap = container_of(header,
+				   struct vfio_iommu_type1_info_cap_nesting,
+				   header);
+
+	memcpy(&nesting_cap->info, iommu->nesting_info,
+	       iommu->nesting_info->size);
+
+	return 0;
+}
+
 static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
 				     unsigned long arg)
 {
@@ -2581,6 +2653,12 @@ static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
 	if (!ret)
 		ret = vfio_iommu_iova_build_caps(iommu, &caps);
 
+	if (iommu->nesting_info) {
+		ret = vfio_iommu_info_add_nesting_cap(iommu, &caps);
+		if (ret)
+			return ret;
+	}
+
 	mutex_unlock(&iommu->lock);
 
 	if (ret)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9204705..46a78af 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -14,6 +14,7 @@
 
 #include <linux/types.h>
 #include <linux/ioctl.h>
+#include <linux/iommu.h>
 
 #define VFIO_API_VERSION	0
 
@@ -1039,6 +1040,24 @@ struct vfio_iommu_type1_info_cap_migration {
 	__u64	max_dirty_bitmap_size;		/* in bytes */
 };
 
+/*
+ * The nesting capability allows to report the related capability
+ * and info for nesting iommu type.
+ *
+ * The structures below define version 1 of this capability.
+ *
+ * User space selected VFIO_TYPE1_NESTING_IOMMU type should check
+ * this capto get supported features.
+ *
+ * @info: the nesting info provided by IOMMU driver.
+ */
+#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING  3
+
+struct vfio_iommu_type1_info_cap_nesting {
+	struct	vfio_info_cap_header header;
+	struct iommu_nesting_info info;
+};
+
 #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
 
 /**
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 05/15] vfio: Add PASID allocation/free support
  2020-07-12 11:20 [PATCH v5 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (3 preceding siblings ...)
  2020-07-12 11:20 ` [PATCH v5 04/15] vfio/type1: Report iommu nesting info to userspace Liu Yi L
@ 2020-07-12 11:21 ` Liu Yi L
  2020-07-19 15:38   ` Auger Eric
  2020-07-12 11:21 ` [PATCH v5 06/15] iommu/vt-d: Support setting ioasid set to domain Liu Yi L
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2020-07-12 11:21 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

Shared Virtual Addressing (a.k.a Shared Virtual Memory) allows sharing
multiple process virtual address spaces with the device for simplified
programming model. PASID is used to tag an virtual address space in DMA
requests and to identify the related translation structure in IOMMU. When
a PASID-capable device is assigned to a VM, we want the same capability
of using PASID to tag guest process virtual address spaces to achieve
virtual SVA (vSVA).

PASID management for guest is vendor specific. Some vendors (e.g. Intel
VT-d) requires system-wide managed PASIDs cross all devices, regardless
of whether a device is used by host or assigned to guest. Other vendors
(e.g. ARM SMMU) may allow PASIDs managed per-device thus could be fully
delegated to the guest for assigned devices.

For system-wide managed PASIDs, this patch introduces a vfio module to
handle explicit PASID alloc/free requests from guest. Allocated PASIDs
are associated to a process (or, mm_struct) in IOASID core. A vfio_mm
object is introduced to track mm_struct. Multiple VFIO containers within
a process share the same vfio_mm object.

A quota mechanism is provided to prevent malicious user from exhausting
available PASIDs. Currently the quota is a global parameter applied to
all VFIO devices. In the future per-device quota might be supported too.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
v4 -> v5:
*) address comments from Eric Auger.
*) address the comments from Alex on the pasid free range support. Added
   per vfio_mm pasid r-b tree.
   https://lore.kernel.org/kvm/20200709082751.320742ab@x1.home/

v3 -> v4:
*) fix lock leam in vfio_mm_get_from_task()
*) drop pasid_quota field in struct vfio_mm
*) vfio_mm_get_from_task() returns ERR_PTR(-ENOTTY) when !CONFIG_VFIO_PASID

v1 -> v2:
*) added in v2, split from the pasid alloc/free support of v1
---
 drivers/vfio/Kconfig      |   5 +
 drivers/vfio/Makefile     |   1 +
 drivers/vfio/vfio_pasid.c | 235 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h      |  28 ++++++
 4 files changed, 269 insertions(+)
 create mode 100644 drivers/vfio/vfio_pasid.c

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index fd17db9..3d8a108 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -19,6 +19,11 @@ config VFIO_VIRQFD
 	depends on VFIO && EVENTFD
 	default n
 
+config VFIO_PASID
+	tristate
+	depends on IOASID && VFIO
+	default n
+
 menuconfig VFIO
 	tristate "VFIO Non-Privileged userspace driver framework"
 	depends on IOMMU_API
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index de67c47..bb836a3 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -3,6 +3,7 @@ vfio_virqfd-y := virqfd.o
 
 obj-$(CONFIG_VFIO) += vfio.o
 obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
+obj-$(CONFIG_VFIO_PASID) += vfio_pasid.o
 obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
 obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
 obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
diff --git a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c
new file mode 100644
index 0000000..66e6054e
--- /dev/null
+++ b/drivers/vfio/vfio_pasid.c
@@ -0,0 +1,235 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ *     Author: Liu Yi L <yi.l.liu@intel.com>
+ *
+ */
+
+#include <linux/vfio.h>
+#include <linux/eventfd.h>
+#include <linux/file.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/sched/mm.h>
+
+#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "Liu Yi L <yi.l.liu@intel.com>"
+#define DRIVER_DESC     "PASID management for VFIO bus drivers"
+
+#define VFIO_DEFAULT_PASID_QUOTA	1000
+static int pasid_quota = VFIO_DEFAULT_PASID_QUOTA;
+module_param_named(pasid_quota, pasid_quota, uint, 0444);
+MODULE_PARM_DESC(pasid_quota,
+		 " Set the quota for max number of PASIDs that an application is allowed to request (default 1000)");
+
+struct vfio_mm_token {
+	unsigned long long val;
+};
+
+struct vfio_mm {
+	struct kref		kref;
+	int			ioasid_sid;
+	struct mutex		pasid_lock;
+	struct rb_root		pasid_list;
+	struct list_head	next;
+	struct vfio_mm_token	token;
+};
+
+static struct mutex		vfio_mm_lock;
+static struct list_head		vfio_mm_list;
+
+struct vfio_pasid {
+	struct rb_node		node;
+	ioasid_t		pasid;
+};
+
+static void vfio_remove_all_pasids(struct vfio_mm *vmm);
+
+/* called with vfio.vfio_mm_lock held */
+static void vfio_mm_release(struct kref *kref)
+{
+	struct vfio_mm *vmm = container_of(kref, struct vfio_mm, kref);
+
+	list_del(&vmm->next);
+	mutex_unlock(&vfio_mm_lock);
+	vfio_remove_all_pasids(vmm);
+	ioasid_free_set(vmm->ioasid_sid, true);
+	kfree(vmm);
+}
+
+void vfio_mm_put(struct vfio_mm *vmm)
+{
+	kref_put_mutex(&vmm->kref, vfio_mm_release, &vfio_mm_lock);
+}
+
+static void vfio_mm_get(struct vfio_mm *vmm)
+{
+	kref_get(&vmm->kref);
+}
+
+struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
+{
+	struct mm_struct *mm = get_task_mm(task);
+	struct vfio_mm *vmm;
+	unsigned long long val = (unsigned long long)mm;
+	int ret;
+
+	mutex_lock(&vfio_mm_lock);
+	/* Search existing vfio_mm with current mm pointer */
+	list_for_each_entry(vmm, &vfio_mm_list, next) {
+		if (vmm->token.val == val) {
+			vfio_mm_get(vmm);
+			goto out;
+		}
+	}
+
+	vmm = kzalloc(sizeof(*vmm), GFP_KERNEL);
+	if (!vmm) {
+		vmm = ERR_PTR(-ENOMEM);
+		goto out;
+	}
+
+	/*
+	 * IOASID core provides a 'IOASID set' concept to track all
+	 * PASIDs associated with a token. Here we use mm_struct as
+	 * the token and create a IOASID set per mm_struct. All the
+	 * containers of the process share the same IOASID set.
+	 */
+	ret = ioasid_alloc_set((struct ioasid_set *)mm, pasid_quota,
+			       &vmm->ioasid_sid);
+	if (ret) {
+		kfree(vmm);
+		vmm = ERR_PTR(ret);
+		goto out;
+	}
+
+	kref_init(&vmm->kref);
+	vmm->token.val = val;
+	mutex_init(&vmm->pasid_lock);
+	vmm->pasid_list = RB_ROOT;
+
+	list_add(&vmm->next, &vfio_mm_list);
+out:
+	mutex_unlock(&vfio_mm_lock);
+	mmput(mm);
+	return vmm;
+}
+
+/*
+ * Find PASID within @min and @max
+ */
+static struct vfio_pasid *vfio_find_pasid(struct vfio_mm *vmm,
+					  ioasid_t min, ioasid_t max)
+{
+	struct rb_node *node = vmm->pasid_list.rb_node;
+
+	while (node) {
+		struct vfio_pasid *vid = rb_entry(node,
+						struct vfio_pasid, node);
+
+		if (max < vid->pasid)
+			node = node->rb_left;
+		else if (min > vid->pasid)
+			node = node->rb_right;
+		else
+			return vid;
+	}
+
+	return NULL;
+}
+
+static void vfio_link_pasid(struct vfio_mm *vmm, struct vfio_pasid *new)
+{
+	struct rb_node **link = &vmm->pasid_list.rb_node, *parent = NULL;
+	struct vfio_pasid *vid;
+
+	while (*link) {
+		parent = *link;
+		vid = rb_entry(parent, struct vfio_pasid, node);
+
+		if (new->pasid <= vid->pasid)
+			link = &(*link)->rb_left;
+		else
+			link = &(*link)->rb_right;
+	}
+
+	rb_link_node(&new->node, parent, link);
+	rb_insert_color(&new->node, &vmm->pasid_list);
+}
+
+static void vfio_remove_pasid(struct vfio_mm *vmm, struct vfio_pasid *vid)
+{
+	rb_erase(&vid->node, &vmm->pasid_list); /* unlink pasid */
+	ioasid_free(vid->pasid);
+	kfree(vid);
+}
+
+static void vfio_remove_all_pasids(struct vfio_mm *vmm)
+{
+	struct rb_node *node;
+
+	mutex_lock(&vmm->pasid_lock);
+	while ((node = rb_first(&vmm->pasid_list)))
+		vfio_remove_pasid(vmm, rb_entry(node, struct vfio_pasid, node));
+	mutex_unlock(&vmm->pasid_lock);
+}
+
+int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
+{
+	ioasid_t pasid;
+	struct vfio_pasid *vid;
+
+	pasid = ioasid_alloc(vmm->ioasid_sid, min, max, NULL);
+	if (pasid == INVALID_IOASID)
+		return -ENOSPC;
+
+	vid = kzalloc(sizeof(*vid), GFP_KERNEL);
+	if (!vid) {
+		ioasid_free(pasid);
+		return -ENOMEM;
+	}
+
+	vid->pasid = pasid;
+
+	mutex_lock(&vmm->pasid_lock);
+	vfio_link_pasid(vmm, vid);
+	mutex_unlock(&vmm->pasid_lock);
+
+	return pasid;
+}
+
+void vfio_pasid_free_range(struct vfio_mm *vmm,
+			   ioasid_t min, ioasid_t max)
+{
+	struct vfio_pasid *vid = NULL;
+
+	/*
+	 * IOASID core will notify PASID users (e.g. IOMMU driver) to
+	 * teardown necessary structures depending on the to-be-freed
+	 * PASID.
+	 */
+	mutex_lock(&vmm->pasid_lock);
+	while ((vid = vfio_find_pasid(vmm, min, max)) != NULL)
+		vfio_remove_pasid(vmm, vid);
+	mutex_unlock(&vmm->pasid_lock);
+}
+
+static int __init vfio_pasid_init(void)
+{
+	mutex_init(&vfio_mm_lock);
+	INIT_LIST_HEAD(&vfio_mm_list);
+	return 0;
+}
+
+static void __exit vfio_pasid_exit(void)
+{
+	WARN_ON(!list_empty(&vfio_mm_list));
+}
+
+module_init(vfio_pasid_init);
+module_exit(vfio_pasid_exit);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 38d3c6a..31472a9 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -97,6 +97,34 @@ extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
 extern void vfio_unregister_iommu_driver(
 				const struct vfio_iommu_driver_ops *ops);
 
+struct vfio_mm;
+#if IS_ENABLED(CONFIG_VFIO_PASID)
+extern struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task);
+extern void vfio_mm_put(struct vfio_mm *vmm);
+extern int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max);
+extern void vfio_pasid_free_range(struct vfio_mm *vmm,
+				  ioasid_t min, ioasid_t max);
+#else
+static inline struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
+{
+	return ERR_PTR(-ENOTTY);
+}
+
+static inline void vfio_mm_put(struct vfio_mm *vmm)
+{
+}
+
+static inline int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
+{
+	return -ENOTTY;
+}
+
+static inline void vfio_pasid_free_range(struct vfio_mm *vmm,
+					  ioasid_t min, ioasid_t max)
+{
+}
+#endif /* CONFIG_VFIO_PASID */
+
 /*
  * External user API
  */
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 06/15] iommu/vt-d: Support setting ioasid set to domain
  2020-07-12 11:20 [PATCH v5 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (4 preceding siblings ...)
  2020-07-12 11:21 ` [PATCH v5 05/15] vfio: Add PASID allocation/free support Liu Yi L
@ 2020-07-12 11:21 ` Liu Yi L
  2020-07-19 15:38   ` Auger Eric
  2020-07-12 11:21 ` [PATCH v5 07/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free) Liu Yi L
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2020-07-12 11:21 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

From IOMMU p.o.v., PASIDs allocated and managed by external components
(e.g. VFIO) will be passed in for gpasid_bind/unbind operation. IOMMU
needs some knowledge to check the PASID ownership, hence add an interface
for those components to tell the PASID owner.

In latest kernel design, PASID ownership is managed by IOASID set where
the PASID is allocated from. This patch adds support for setting ioasid
set ID to the domains used for nesting/vSVA. Subsequent SVA operations
on the PASID will be checked against its IOASID set for proper ownership.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v4 -> v5:
*) address comments from Eric Auger.
---
 drivers/iommu/intel/iommu.c | 22 ++++++++++++++++++++++
 include/linux/intel-iommu.h |  4 ++++
 include/linux/iommu.h       |  1 +
 3 files changed, 27 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 72ae6a2..4d54198 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1793,6 +1793,7 @@ static struct dmar_domain *alloc_domain(int flags)
 	if (first_level_by_default())
 		domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
 	domain->has_iotlb_device = false;
+	domain->ioasid_sid = INVALID_IOASID_SET;
 	INIT_LIST_HEAD(&domain->devices);
 
 	return domain;
@@ -6039,6 +6040,27 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain,
 		}
 		spin_unlock_irqrestore(&device_domain_lock, flags);
 		break;
+	case DOMAIN_ATTR_IOASID_SID:
+	{
+		int sid = *(int *)data;
+
+		if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE)) {
+			ret = -ENODEV;
+			break;
+		}
+		spin_lock_irqsave(&device_domain_lock, flags);
+		if (dmar_domain->ioasid_sid != INVALID_IOASID_SET &&
+		    dmar_domain->ioasid_sid != sid) {
+			pr_warn_ratelimited("multi ioasid_set (%d:%d) setting",
+					    dmar_domain->ioasid_sid, sid);
+			ret = -EBUSY;
+			spin_unlock_irqrestore(&device_domain_lock, flags);
+			break;
+		}
+		dmar_domain->ioasid_sid = sid;
+		spin_unlock_irqrestore(&device_domain_lock, flags);
+		break;
+	}
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 3f23c26..0d0ab32 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -549,6 +549,10 @@ struct dmar_domain {
 					   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
 	u64		max_addr;	/* maximum mapped address */
 
+	int		ioasid_sid;	/*
+					 * the ioasid set which tracks all
+					 * PASIDs used by the domain.
+					 */
 	int		default_pasid;	/*
 					 * The default pasid used for non-SVM
 					 * traffic on mediated devices.
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7ca9d48..e84a1d5 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -124,6 +124,7 @@ enum iommu_attr {
 	DOMAIN_ATTR_FSL_PAMUV1,
 	DOMAIN_ATTR_NESTING,	/* two stages of translation */
 	DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+	DOMAIN_ATTR_IOASID_SID,
 	DOMAIN_ATTR_MAX,
 };
 
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 07/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)
  2020-07-12 11:20 [PATCH v5 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (5 preceding siblings ...)
  2020-07-12 11:21 ` [PATCH v5 06/15] iommu/vt-d: Support setting ioasid set to domain Liu Yi L
@ 2020-07-12 11:21 ` Liu Yi L
  2020-07-19 15:38   ` Auger Eric
  2020-07-12 11:21 ` [PATCH v5 08/15] iommu: Pass domain to sva_unbind_gpasid() Liu Yi L
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2020-07-12 11:21 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

This patch allows user space to request PASID allocation/free, e.g. when
serving the request from the guest.

PASIDs that are not freed by userspace are automatically freed when the
IOASID set is destroyed when process exits.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v4 -> v5:
*) address comments from Eric Auger.
*) the comments for the PASID_FREE request is addressed in patch 5/15 of
   this series.

v3 -> v4:
*) address comments from v3, except the below comment against the range
   of PASID_FREE request. needs more help on it.
    "> +if (req.range.min > req.range.max)

    Is it exploitable that a user can spin the kernel for a long time in
    the case of a free by calling this with [0, MAX_UINT] regardless of
    their actual allocations?"
    https://lore.kernel.org/linux-iommu/20200702151832.048b44d1@x1.home/

v1 -> v2:
*) move the vfio_mm related code to be a seprate module
*) use a single structure for alloc/free, could support a range of PASIDs
*) fetch vfio_mm at group_attach time instead of at iommu driver open time
---
 drivers/vfio/Kconfig            |  1 +
 drivers/vfio/vfio_iommu_type1.c | 85 +++++++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio_pasid.c       | 10 +++++
 include/linux/vfio.h            |  6 +++
 include/uapi/linux/vfio.h       | 37 ++++++++++++++++++
 5 files changed, 139 insertions(+)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 3d8a108..95d90c6 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -2,6 +2,7 @@
 config VFIO_IOMMU_TYPE1
 	tristate
 	depends on VFIO
+	select VFIO_PASID if (X86)
 	default n
 
 config VFIO_IOMMU_SPAPR_TCE
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index ed80104..55b4065 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -76,6 +76,7 @@ struct vfio_iommu {
 	bool				dirty_page_tracking;
 	bool				pinned_page_dirty_scope;
 	struct iommu_nesting_info	*nesting_info;
+	struct vfio_mm			*vmm;
 };
 
 struct vfio_domain {
@@ -1937,6 +1938,11 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
 
 static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
 {
+	if (iommu->vmm) {
+		vfio_mm_put(iommu->vmm);
+		iommu->vmm = NULL;
+	}
+
 	kfree(iommu->nesting_info);
 	iommu->nesting_info = NULL;
 }
@@ -2071,6 +2077,26 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 					    iommu->nesting_info);
 		if (ret)
 			goto out_detach;
+
+		if (iommu->nesting_info->features &
+					IOMMU_NESTING_FEAT_SYSWIDE_PASID) {
+			struct vfio_mm *vmm;
+			int sid;
+
+			vmm = vfio_mm_get_from_task(current);
+			if (IS_ERR(vmm)) {
+				ret = PTR_ERR(vmm);
+				goto out_detach;
+			}
+			iommu->vmm = vmm;
+
+			sid = vfio_mm_ioasid_sid(vmm);
+			ret = iommu_domain_set_attr(domain->domain,
+						    DOMAIN_ATTR_IOASID_SID,
+						    &sid);
+			if (ret)
+				goto out_detach;
+		}
 	}
 
 	/* Get aperture info */
@@ -2855,6 +2881,63 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 	return -EINVAL;
 }
 
+static int vfio_iommu_type1_pasid_alloc(struct vfio_iommu *iommu,
+					unsigned int min,
+					unsigned int max)
+{
+	int ret = -EOPNOTSUPP;
+
+	mutex_lock(&iommu->lock);
+	if (iommu->vmm)
+		ret = vfio_pasid_alloc(iommu->vmm, min, max);
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
+static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu,
+				       unsigned int min,
+				       unsigned int max)
+{
+	int ret = -EOPNOTSUPP;
+
+	mutex_lock(&iommu->lock);
+	if (iommu->vmm) {
+		vfio_pasid_free_range(iommu->vmm, min, max);
+		ret = 0;
+	}
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
+static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu,
+					  unsigned long arg)
+{
+	struct vfio_iommu_type1_pasid_request req;
+	unsigned long minsz;
+
+	minsz = offsetofend(struct vfio_iommu_type1_pasid_request, range);
+
+	if (copy_from_user(&req, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (req.argsz < minsz || (req.flags & ~VFIO_PASID_REQUEST_MASK))
+		return -EINVAL;
+
+	if (req.range.min > req.range.max)
+		return -EINVAL;
+
+	switch (req.flags & VFIO_PASID_REQUEST_MASK) {
+	case VFIO_IOMMU_FLAG_ALLOC_PASID:
+		return vfio_iommu_type1_pasid_alloc(iommu,
+					req.range.min, req.range.max);
+	case VFIO_IOMMU_FLAG_FREE_PASID:
+		return vfio_iommu_type1_pasid_free(iommu,
+					req.range.min, req.range.max);
+	default:
+		return -EINVAL;
+	}
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -2871,6 +2954,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		return vfio_iommu_type1_unmap_dma(iommu, arg);
 	case VFIO_IOMMU_DIRTY_PAGES:
 		return vfio_iommu_type1_dirty_pages(iommu, arg);
+	case VFIO_IOMMU_PASID_REQUEST:
+		return vfio_iommu_type1_pasid_request(iommu, arg);
 	default:
 		return -ENOTTY;
 	}
diff --git a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c
index 66e6054e..ebec244 100644
--- a/drivers/vfio/vfio_pasid.c
+++ b/drivers/vfio/vfio_pasid.c
@@ -61,6 +61,7 @@ void vfio_mm_put(struct vfio_mm *vmm)
 {
 	kref_put_mutex(&vmm->kref, vfio_mm_release, &vfio_mm_lock);
 }
+EXPORT_SYMBOL_GPL(vfio_mm_put);
 
 static void vfio_mm_get(struct vfio_mm *vmm)
 {
@@ -114,6 +115,13 @@ struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
 	mmput(mm);
 	return vmm;
 }
+EXPORT_SYMBOL_GPL(vfio_mm_get_from_task);
+
+int vfio_mm_ioasid_sid(struct vfio_mm *vmm)
+{
+	return vmm->ioasid_sid;
+}
+EXPORT_SYMBOL_GPL(vfio_mm_ioasid_sid);
 
 /*
  * Find PASID within @min and @max
@@ -197,6 +205,7 @@ int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
 
 	return pasid;
 }
+EXPORT_SYMBOL_GPL(vfio_pasid_alloc);
 
 void vfio_pasid_free_range(struct vfio_mm *vmm,
 			   ioasid_t min, ioasid_t max)
@@ -213,6 +222,7 @@ void vfio_pasid_free_range(struct vfio_mm *vmm,
 		vfio_remove_pasid(vmm, vid);
 	mutex_unlock(&vmm->pasid_lock);
 }
+EXPORT_SYMBOL_GPL(vfio_pasid_free_range);
 
 static int __init vfio_pasid_init(void)
 {
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 31472a9..a111108 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -101,6 +101,7 @@ struct vfio_mm;
 #if IS_ENABLED(CONFIG_VFIO_PASID)
 extern struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task);
 extern void vfio_mm_put(struct vfio_mm *vmm);
+int vfio_mm_ioasid_sid(struct vfio_mm *vmm);
 extern int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max);
 extern void vfio_pasid_free_range(struct vfio_mm *vmm,
 				  ioasid_t min, ioasid_t max);
@@ -114,6 +115,11 @@ static inline void vfio_mm_put(struct vfio_mm *vmm)
 {
 }
 
+static inline int vfio_mm_ioasid_sid(struct vfio_mm *vmm)
+{
+	return -ENOTTY;
+}
+
 static inline int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
 {
 	return -ENOTTY;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 46a78af..96a115f 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1172,6 +1172,43 @@ struct vfio_iommu_type1_dirty_bitmap_get {
 
 #define VFIO_IOMMU_DIRTY_PAGES             _IO(VFIO_TYPE, VFIO_BASE + 17)
 
+/**
+ * VFIO_IOMMU_PASID_REQUEST - _IOWR(VFIO_TYPE, VFIO_BASE + 18,
+ *				struct vfio_iommu_type1_pasid_request)
+ *
+ * PASID (Processor Address Space ID) is a PCIe concept for tagging
+ * address spaces in DMA requests. When system-wide PASID allocation
+ * is required by underlying iommu driver (e.g. Intel VT-d), this
+ * provides an interface for userspace to request pasid alloc/free
+ * for its assigned devices. Userspace should check the availability
+ * of this API by checking VFIO_IOMMU_TYPE1_INFO_CAP_NESTING through
+ * VFIO_IOMMU_GET_INFO.
+ *
+ * @flags=VFIO_IOMMU_FLAG_ALLOC_PASID, allocate a single PASID within @range.
+ * @flags=VFIO_IOMMU_FLAG_FREE_PASID, free the PASIDs within @range.
+ * @range is [min, max], which means both @min and @max are inclusive.
+ * ALLOC_PASID and FREE_PASID are mutually exclusive.
+ *
+ * returns: allocated PASID value on success, -errno on failure for
+ *	     ALLOC_PASID;
+ *	     0 for FREE_PASID operation;
+ */
+struct vfio_iommu_type1_pasid_request {
+	__u32	argsz;
+#define VFIO_IOMMU_FLAG_ALLOC_PASID	(1 << 0)
+#define VFIO_IOMMU_FLAG_FREE_PASID	(1 << 1)
+	__u32	flags;
+	struct {
+		__u32	min;
+		__u32	max;
+	} range;
+};
+
+#define VFIO_PASID_REQUEST_MASK	(VFIO_IOMMU_FLAG_ALLOC_PASID | \
+					 VFIO_IOMMU_FLAG_FREE_PASID)
+
+#define VFIO_IOMMU_PASID_REQUEST	_IO(VFIO_TYPE, VFIO_BASE + 18)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 08/15] iommu: Pass domain to sva_unbind_gpasid()
  2020-07-12 11:20 [PATCH v5 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (6 preceding siblings ...)
  2020-07-12 11:21 ` [PATCH v5 07/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free) Liu Yi L
@ 2020-07-12 11:21 ` Liu Yi L
  2020-07-19 15:37   ` Auger Eric
  2020-07-12 11:21 ` [PATCH v5 09/15] iommu/vt-d: Check ownership for PASIDs from user-space Liu Yi L
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2020-07-12 11:21 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

From: Yi Sun <yi.y.sun@intel.com>

Current interface is good enough for SVA virtualization on an assigned
physical PCI device, but when it comes to mediated devices, a physical
device may attached with multiple aux-domains. Also, for guest unbind,
the PASID to be unbind should be allocated to the VM. This check requires
to know the ioasid_set which is associated with the domain.

So this interface needs to pass in domain info. Then the iommu driver is
able to know which domain will be used for the 2nd stage translation of
the nesting mode and also be able to do PASID ownership check. This patch
passes @domain per the above reason.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
v2 -> v3:
*) pass in domain info only
*) use ioasid_t for pasid instead of int type

v1 -> v2:
*) added in v2.
---
 drivers/iommu/intel/svm.c   | 3 ++-
 drivers/iommu/iommu.c       | 2 +-
 include/linux/intel-iommu.h | 3 ++-
 include/linux/iommu.h       | 3 ++-
 4 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index b9a9c55..d2c0e1a 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -432,7 +432,8 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 	return ret;
 }
 
-int intel_svm_unbind_gpasid(struct device *dev, int pasid)
+int intel_svm_unbind_gpasid(struct iommu_domain *domain,
+			    struct device *dev, ioasid_t pasid)
 {
 	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
 	struct intel_svm_dev *sdev;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7910249..d3e554c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2151,7 +2151,7 @@ int __iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
 	if (unlikely(!domain->ops->sva_unbind_gpasid))
 		return -ENODEV;
 
-	return domain->ops->sva_unbind_gpasid(dev, data->hpasid);
+	return domain->ops->sva_unbind_gpasid(domain, dev, data->hpasid);
 }
 EXPORT_SYMBOL_GPL(__iommu_sva_unbind_gpasid);
 
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 0d0ab32..18f292e 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -738,7 +738,8 @@ extern int intel_svm_enable_prq(struct intel_iommu *iommu);
 extern int intel_svm_finish_prq(struct intel_iommu *iommu);
 int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 			  struct iommu_gpasid_bind_data *data);
-int intel_svm_unbind_gpasid(struct device *dev, int pasid);
+int intel_svm_unbind_gpasid(struct iommu_domain *domain,
+			    struct device *dev, ioasid_t pasid);
 struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
 				 void *drvdata);
 void intel_svm_unbind(struct iommu_sva *handle);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e84a1d5..ca5edd8 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -303,7 +303,8 @@ struct iommu_ops {
 	int (*sva_bind_gpasid)(struct iommu_domain *domain,
 			struct device *dev, struct iommu_gpasid_bind_data *data);
 
-	int (*sva_unbind_gpasid)(struct device *dev, int pasid);
+	int (*sva_unbind_gpasid)(struct iommu_domain *domain,
+				 struct device *dev, ioasid_t pasid);
 
 	int (*def_domain_type)(struct device *dev);
 
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 09/15] iommu/vt-d: Check ownership for PASIDs from user-space
  2020-07-12 11:20 [PATCH v5 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (7 preceding siblings ...)
  2020-07-12 11:21 ` [PATCH v5 08/15] iommu: Pass domain to sva_unbind_gpasid() Liu Yi L
@ 2020-07-12 11:21 ` Liu Yi L
  2020-07-19 16:06   ` Auger Eric
  2020-07-12 11:21 ` [PATCH v5 10/15] vfio/type1: Support binding guest page tables to PASID Liu Yi L
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2020-07-12 11:21 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

When an IOMMU domain with nesting attribute is used for guest SVA, a
system-wide PASID is allocated for binding with the device and the domain.
For security reason, we need to check the PASID passsed from user-space.
e.g. page table bind/unbind and PASID related cache invalidation.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 10 ++++++++++
 drivers/iommu/intel/svm.c   |  7 +++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 4d54198..a9504cb 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5436,6 +5436,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 		int granu = 0;
 		u64 pasid = 0;
 		u64 addr = 0;
+		void *pdata;
 
 		granu = to_vtd_granularity(cache_type, inv_info->granularity);
 		if (granu == -EINVAL) {
@@ -5456,6 +5457,15 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 			 (inv_info->granu.addr_info.flags & IOMMU_INV_ADDR_FLAGS_PASID))
 			pasid = inv_info->granu.addr_info.pasid;
 
+		pdata = ioasid_find(dmar_domain->ioasid_sid, pasid, NULL);
+		if (!pdata) {
+			ret = -EINVAL;
+			goto out_unlock;
+		} else if (IS_ERR(pdata)) {
+			ret = PTR_ERR(pdata);
+			goto out_unlock;
+		}
+
 		switch (BIT(cache_type)) {
 		case IOMMU_CACHE_INV_TYPE_IOTLB:
 			/* HW will ignore LSB bits based on address mask */
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index d2c0e1a..212dee0 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -319,7 +319,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 	dmar_domain = to_dmar_domain(domain);
 
 	mutex_lock(&pasid_mutex);
-	svm = ioasid_find(INVALID_IOASID_SET, data->hpasid, NULL);
+	svm = ioasid_find(dmar_domain->ioasid_sid, data->hpasid, NULL);
 	if (IS_ERR(svm)) {
 		ret = PTR_ERR(svm);
 		goto out;
@@ -436,6 +436,7 @@ int intel_svm_unbind_gpasid(struct iommu_domain *domain,
 			    struct device *dev, ioasid_t pasid)
 {
 	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
+	struct dmar_domain *dmar_domain;
 	struct intel_svm_dev *sdev;
 	struct intel_svm *svm;
 	int ret = -EINVAL;
@@ -443,8 +444,10 @@ int intel_svm_unbind_gpasid(struct iommu_domain *domain,
 	if (WARN_ON(!iommu))
 		return -EINVAL;
 
+	dmar_domain = to_dmar_domain(domain);
+
 	mutex_lock(&pasid_mutex);
-	svm = ioasid_find(INVALID_IOASID_SET, pasid, NULL);
+	svm = ioasid_find(dmar_domain->ioasid_sid, pasid, NULL);
 	if (!svm) {
 		ret = -EINVAL;
 		goto out;
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 10/15] vfio/type1: Support binding guest page tables to PASID
  2020-07-12 11:20 [PATCH v5 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (8 preceding siblings ...)
  2020-07-12 11:21 ` [PATCH v5 09/15] iommu/vt-d: Check ownership for PASIDs from user-space Liu Yi L
@ 2020-07-12 11:21 ` Liu Yi L
  2020-07-20  9:37   ` Auger Eric
  2020-07-12 11:21 ` [PATCH v5 11/15] vfio/type1: Allow invalidating first-level/stage IOMMU cache Liu Yi L
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2020-07-12 11:21 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

Nesting translation allows two-levels/stages page tables, with 1st level
for guest translations (e.g. GVA->GPA), 2nd level for host translations
(e.g. GPA->HPA). This patch adds interface for binding guest page tables
to a PASID. This PASID must have been allocated to user space before the
binding request.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v3 -> v4:
*) address comments from Alex on v3

v2 -> v3:
*) use __iommu_sva_unbind_gpasid() for unbind call issued by VFIO
   https://lore.kernel.org/linux-iommu/1592931837-58223-6-git-send-email-jacob.jun.pan@linux.intel.com/

v1 -> v2:
*) rename subject from "vfio/type1: Bind guest page tables to host"
*) remove VFIO_IOMMU_BIND, introduce VFIO_IOMMU_NESTING_OP to support bind/
   unbind guet page table
*) replaced vfio_iommu_for_each_dev() with a group level loop since this
   series enforces one group per container w/ nesting type as start.
*) rename vfio_bind/unbind_gpasid_fn() to vfio_dev_bind/unbind_gpasid_fn()
*) vfio_dev_unbind_gpasid() always successful
*) use vfio_mm->pasid_lock to avoid race between PASID free and page table
   bind/unbind
---
 drivers/vfio/vfio_iommu_type1.c | 166 ++++++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio_pasid.c       |  26 +++++++
 include/linux/vfio.h            |  20 +++++
 include/uapi/linux/vfio.h       |  31 ++++++++
 4 files changed, 243 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 55b4065..f0f21ff 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -149,6 +149,30 @@ struct vfio_regions {
 #define DIRTY_BITMAP_PAGES_MAX	 ((u64)INT_MAX)
 #define DIRTY_BITMAP_SIZE_MAX	 DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
 
+struct domain_capsule {
+	struct vfio_group *group;
+	struct iommu_domain *domain;
+	void *data;
+};
+
+/* iommu->lock must be held */
+static struct vfio_group *vfio_find_nesting_group(struct vfio_iommu *iommu)
+{
+	struct vfio_domain *d;
+	struct vfio_group *group = NULL;
+
+	if (!iommu->nesting_info)
+		return NULL;
+
+	/* only support singleton container with nesting type */
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		list_for_each_entry(group, &d->group_list, next) {
+			break;
+		}
+	}
+	return group;
+}
+
 static int put_pfn(unsigned long pfn, int prot);
 
 static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu *iommu,
@@ -2349,6 +2373,48 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *)data;
+	unsigned long arg = *(unsigned long *)dc->data;
+
+	return iommu_sva_bind_gpasid(dc->domain, dev, (void __user *)arg);
+}
+
+static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *)data;
+	unsigned long arg = *(unsigned long *)dc->data;
+
+	iommu_sva_unbind_gpasid(dc->domain, dev, (void __user *)arg);
+	return 0;
+}
+
+static int __vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *)data;
+	struct iommu_gpasid_bind_data *unbind_data =
+				(struct iommu_gpasid_bind_data *)dc->data;
+
+	__iommu_sva_unbind_gpasid(dc->domain, dev, unbind_data);
+	return 0;
+}
+
+static void vfio_group_unbind_gpasid_fn(ioasid_t pasid, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *)data;
+	struct iommu_gpasid_bind_data unbind_data;
+
+	unbind_data.argsz = offsetof(struct iommu_gpasid_bind_data, vendor);
+	unbind_data.flags = 0;
+	unbind_data.hpasid = pasid;
+
+	dc->data = &unbind_data;
+
+	iommu_group_for_each_dev(dc->group->iommu_group,
+				 dc, __vfio_dev_unbind_gpasid_fn);
+}
+
 static void vfio_iommu_type1_detach_group(void *iommu_data,
 					  struct iommu_group *iommu_group)
 {
@@ -2392,6 +2458,21 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 		if (!group)
 			continue;
 
+		if (iommu->nesting_info && iommu->vmm &&
+		    (iommu->nesting_info->features &
+					IOMMU_NESTING_FEAT_BIND_PGTBL)) {
+			struct domain_capsule dc = { .group = group,
+						     .domain = domain->domain,
+						     .data = NULL };
+
+			/*
+			 * Unbind page tables bound with system wide PASIDs
+			 * which are allocated to user space.
+			 */
+			vfio_mm_for_each_pasid(iommu->vmm, &dc,
+					       vfio_group_unbind_gpasid_fn);
+		}
+
 		vfio_iommu_detach_group(domain, group);
 		update_dirty_scope = !group->pinned_page_dirty_scope;
 		list_del(&group->next);
@@ -2938,6 +3019,89 @@ static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu,
 	}
 }
 
+static long vfio_iommu_handle_pgtbl_op(struct vfio_iommu *iommu,
+				       bool is_bind, unsigned long arg)
+{
+	struct iommu_nesting_info *info;
+	struct domain_capsule dc = { .data = &arg };
+	struct vfio_group *group;
+	struct vfio_domain *domain;
+	int ret;
+
+	mutex_lock(&iommu->lock);
+
+	info = iommu->nesting_info;
+	if (!info || !(info->features & IOMMU_NESTING_FEAT_BIND_PGTBL)) {
+		ret = -EOPNOTSUPP;
+		goto out_unlock_iommu;
+	}
+
+	if (!iommu->vmm) {
+		ret = -EINVAL;
+		goto out_unlock_iommu;
+	}
+
+	group = vfio_find_nesting_group(iommu);
+	if (!group) {
+		ret = -EINVAL;
+		goto out_unlock_iommu;
+	}
+
+	domain = list_first_entry(&iommu->domain_list,
+				  struct vfio_domain, next);
+	dc.group = group;
+	dc.domain = domain->domain;
+
+	/* Avoid race with other containers within the same process */
+	vfio_mm_pasid_lock(iommu->vmm);
+
+	if (is_bind) {
+		ret = iommu_group_for_each_dev(group->iommu_group, &dc,
+					       vfio_dev_bind_gpasid_fn);
+		if (ret)
+			iommu_group_for_each_dev(group->iommu_group, &dc,
+						 vfio_dev_unbind_gpasid_fn);
+	} else {
+		iommu_group_for_each_dev(group->iommu_group,
+					 &dc, vfio_dev_unbind_gpasid_fn);
+		ret = 0;
+	}
+
+	vfio_mm_pasid_unlock(iommu->vmm);
+out_unlock_iommu:
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
+static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
+					unsigned long arg)
+{
+	struct vfio_iommu_type1_nesting_op hdr;
+	unsigned int minsz;
+	int ret;
+
+	minsz = offsetofend(struct vfio_iommu_type1_nesting_op, flags);
+
+	if (copy_from_user(&hdr, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (hdr.argsz < minsz || hdr.flags & ~VFIO_NESTING_OP_MASK)
+		return -EINVAL;
+
+	switch (hdr.flags & VFIO_NESTING_OP_MASK) {
+	case VFIO_IOMMU_NESTING_OP_BIND_PGTBL:
+		ret = vfio_iommu_handle_pgtbl_op(iommu, true, arg + minsz);
+		break;
+	case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL:
+		ret = vfio_iommu_handle_pgtbl_op(iommu, false, arg + minsz);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -2956,6 +3120,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		return vfio_iommu_type1_dirty_pages(iommu, arg);
 	case VFIO_IOMMU_PASID_REQUEST:
 		return vfio_iommu_type1_pasid_request(iommu, arg);
+	case VFIO_IOMMU_NESTING_OP:
+		return vfio_iommu_type1_nesting_op(iommu, arg);
 	default:
 		return -ENOTTY;
 	}
diff --git a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c
index ebec244..ecabaaa 100644
--- a/drivers/vfio/vfio_pasid.c
+++ b/drivers/vfio/vfio_pasid.c
@@ -216,6 +216,8 @@ void vfio_pasid_free_range(struct vfio_mm *vmm,
 	 * IOASID core will notify PASID users (e.g. IOMMU driver) to
 	 * teardown necessary structures depending on the to-be-freed
 	 * PASID.
+	 * Hold pasid_lock also avoids race with PASID usages like bind/
+	 * unbind page tables to requested PASID.
 	 */
 	mutex_lock(&vmm->pasid_lock);
 	while ((vid = vfio_find_pasid(vmm, min, max)) != NULL)
@@ -224,6 +226,30 @@ void vfio_pasid_free_range(struct vfio_mm *vmm,
 }
 EXPORT_SYMBOL_GPL(vfio_pasid_free_range);
 
+int vfio_mm_for_each_pasid(struct vfio_mm *vmm, void *data,
+			   void (*fn)(ioasid_t id, void *data))
+{
+	int ret;
+
+	mutex_lock(&vmm->pasid_lock);
+	ret = ioasid_set_for_each_ioasid(vmm->ioasid_sid, fn, data);
+	mutex_unlock(&vmm->pasid_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(vfio_mm_for_each_pasid);
+
+void vfio_mm_pasid_lock(struct vfio_mm *vmm)
+{
+	mutex_lock(&vmm->pasid_lock);
+}
+EXPORT_SYMBOL_GPL(vfio_mm_pasid_lock);
+
+void vfio_mm_pasid_unlock(struct vfio_mm *vmm)
+{
+	mutex_unlock(&vmm->pasid_lock);
+}
+EXPORT_SYMBOL_GPL(vfio_mm_pasid_unlock);
+
 static int __init vfio_pasid_init(void)
 {
 	mutex_init(&vfio_mm_lock);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index a111108..2bc8347 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -105,6 +105,11 @@ int vfio_mm_ioasid_sid(struct vfio_mm *vmm);
 extern int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max);
 extern void vfio_pasid_free_range(struct vfio_mm *vmm,
 				  ioasid_t min, ioasid_t max);
+extern int vfio_mm_for_each_pasid(struct vfio_mm *vmm, void *data,
+				  void (*fn)(ioasid_t id, void *data));
+extern void vfio_mm_pasid_lock(struct vfio_mm *vmm);
+extern void vfio_mm_pasid_unlock(struct vfio_mm *vmm);
+
 #else
 static inline struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
 {
@@ -129,6 +134,21 @@ static inline void vfio_pasid_free_range(struct vfio_mm *vmm,
 					  ioasid_t min, ioasid_t max)
 {
 }
+
+static inline int vfio_mm_for_each_pasid(struct vfio_mm *vmm, void *data,
+					 void (*fn)(ioasid_t id, void *data))
+{
+	return -ENOTTY;
+}
+
+static inline void vfio_mm_pasid_lock(struct vfio_mm *vmm)
+{
+}
+
+static inline void vfio_mm_pasid_unlock(struct vfio_mm *vmm)
+{
+}
+
 #endif /* CONFIG_VFIO_PASID */
 
 /*
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 96a115f..a8ad786 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1209,6 +1209,37 @@ struct vfio_iommu_type1_pasid_request {
 
 #define VFIO_IOMMU_PASID_REQUEST	_IO(VFIO_TYPE, VFIO_BASE + 18)
 
+/**
+ * VFIO_IOMMU_NESTING_OP - _IOW(VFIO_TYPE, VFIO_BASE + 19,
+ *				struct vfio_iommu_type1_nesting_op)
+ *
+ * This interface allows user space to utilize the nesting IOMMU
+ * capabilities as reported in VFIO_IOMMU_TYPE1_INFO_CAP_NESTING
+ * cap through VFIO_IOMMU_GET_INFO.
+ *
+ * @data[] types defined for each op:
+ * +=================+===============================================+
+ * | NESTING OP      |      @data[]                                  |
+ * +=================+===============================================+
+ * | BIND_PGTBL      |      struct iommu_gpasid_bind_data            |
+ * +-----------------+-----------------------------------------------+
+ * | UNBIND_PGTBL    |      struct iommu_gpasid_bind_data            |
+ * +-----------------+-----------------------------------------------+
+ *
+ * returns: 0 on success, -errno on failure.
+ */
+struct vfio_iommu_type1_nesting_op {
+	__u32	argsz;
+	__u32	flags;
+#define VFIO_NESTING_OP_MASK	(0xffff) /* lower 16-bits for op */
+	__u8	data[];
+};
+
+#define VFIO_IOMMU_NESTING_OP_BIND_PGTBL	(0)
+#define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL	(1)
+
+#define VFIO_IOMMU_NESTING_OP		_IO(VFIO_TYPE, VFIO_BASE + 19)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 11/15] vfio/type1: Allow invalidating first-level/stage IOMMU cache
  2020-07-12 11:20 [PATCH v5 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (9 preceding siblings ...)
  2020-07-12 11:21 ` [PATCH v5 10/15] vfio/type1: Support binding guest page tables to PASID Liu Yi L
@ 2020-07-12 11:21 ` Liu Yi L
  2020-07-20  9:41   ` Auger Eric
  2020-07-12 11:21 ` [PATCH v5 12/15] vfio/type1: Add vSVA support for IOMMU-backed mdevs Liu Yi L
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2020-07-12 11:21 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

This patch provides an interface allowing the userspace to invalidate
IOMMU cache for first-level page table. It is required when the first
level IOMMU page table is not managed by the host kernel in the nested
translation setup.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v1 -> v2:
*) rename from "vfio/type1: Flush stage-1 IOMMU cache for nesting type"
*) rename vfio_cache_inv_fn() to vfio_dev_cache_invalidate_fn()
*) vfio_dev_cache_inv_fn() always successful
*) remove VFIO_IOMMU_CACHE_INVALIDATE, and reuse VFIO_IOMMU_NESTING_OP
---
 drivers/vfio/vfio_iommu_type1.c | 50 +++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h       |  3 +++
 2 files changed, 53 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index f0f21ff..960cc59 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -3073,6 +3073,53 @@ static long vfio_iommu_handle_pgtbl_op(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static int vfio_dev_cache_invalidate_fn(struct device *dev, void *data)
+{
+	struct domain_capsule *dc = (struct domain_capsule *)data;
+	unsigned long arg = *(unsigned long *)dc->data;
+
+	iommu_cache_invalidate(dc->domain, dev, (void __user *)arg);
+	return 0;
+}
+
+static long vfio_iommu_invalidate_cache(struct vfio_iommu *iommu,
+					unsigned long arg)
+{
+	struct domain_capsule dc = { .data = &arg };
+	struct vfio_group *group;
+	struct vfio_domain *domain;
+	int ret = 0;
+	struct iommu_nesting_info *info;
+
+	mutex_lock(&iommu->lock);
+	/*
+	 * Cache invalidation is required for any nesting IOMMU,
+	 * so no need to check system-wide PASID support.
+	 */
+	info = iommu->nesting_info;
+	if (!info || !(info->features & IOMMU_NESTING_FEAT_CACHE_INVLD)) {
+		ret = -EOPNOTSUPP;
+		goto out_unlock;
+	}
+
+	group = vfio_find_nesting_group(iommu);
+	if (!group) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	domain = list_first_entry(&iommu->domain_list,
+				  struct vfio_domain, next);
+	dc.group = group;
+	dc.domain = domain->domain;
+	iommu_group_for_each_dev(group->iommu_group, &dc,
+				 vfio_dev_cache_invalidate_fn);
+
+out_unlock:
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
 static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
 					unsigned long arg)
 {
@@ -3095,6 +3142,9 @@ static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
 	case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL:
 		ret = vfio_iommu_handle_pgtbl_op(iommu, false, arg + minsz);
 		break;
+	case VFIO_IOMMU_NESTING_OP_CACHE_INVLD:
+		ret = vfio_iommu_invalidate_cache(iommu, arg + minsz);
+		break;
 	default:
 		ret = -EINVAL;
 	}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index a8ad786..845a5800 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1225,6 +1225,8 @@ struct vfio_iommu_type1_pasid_request {
  * +-----------------+-----------------------------------------------+
  * | UNBIND_PGTBL    |      struct iommu_gpasid_bind_data            |
  * +-----------------+-----------------------------------------------+
+ * | CACHE_INVLD     |      struct iommu_cache_invalidate_info       |
+ * +-----------------+-----------------------------------------------+
  *
  * returns: 0 on success, -errno on failure.
  */
@@ -1237,6 +1239,7 @@ struct vfio_iommu_type1_nesting_op {
 
 #define VFIO_IOMMU_NESTING_OP_BIND_PGTBL	(0)
 #define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL	(1)
+#define VFIO_IOMMU_NESTING_OP_CACHE_INVLD	(2)
 
 #define VFIO_IOMMU_NESTING_OP		_IO(VFIO_TYPE, VFIO_BASE + 19)
 
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 12/15] vfio/type1: Add vSVA support for IOMMU-backed mdevs
  2020-07-12 11:20 [PATCH v5 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (10 preceding siblings ...)
  2020-07-12 11:21 ` [PATCH v5 11/15] vfio/type1: Allow invalidating first-level/stage IOMMU cache Liu Yi L
@ 2020-07-12 11:21 ` Liu Yi L
  2020-07-20 12:22   ` Auger Eric
  2020-07-12 11:21 ` [PATCH v5 13/15] vfio/pci: Expose PCIe PASID capability to guest Liu Yi L
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2020-07-12 11:21 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

Recent years, mediated device pass-through framework (e.g. vfio-mdev)
is used to achieve flexible device sharing across domains (e.g. VMs).
Also there are hardware assisted mediated pass-through solutions from
platform vendors. e.g. Intel VT-d scalable mode which supports Intel
Scalable I/O Virtualization technology. Such mdevs are called IOMMU-
backed mdevs as there are IOMMU enforced DMA isolation for such mdevs.
In kernel, IOMMU-backed mdevs are exposed to IOMMU layer by aux-domain
concept, which means mdevs are protected by an iommu domain which is
auxiliary to the domain that the kernel driver primarily uses for DMA
API. Details can be found in the KVM presentation as below:

https://events19.linuxfoundation.org/wp-content/uploads/2017/12/\
Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf

This patch extends NESTING_IOMMU ops to IOMMU-backed mdev devices. The
main requirement is to use the auxiliary domain associated with mdev.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
CC: Jun Tian <jun.j.tian@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
v1 -> v2:
*) check the iommu_device to ensure the handling mdev is IOMMU-backed
---
 drivers/vfio/vfio_iommu_type1.c | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 960cc59..f1f1ae2 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2373,20 +2373,41 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static struct device *vfio_get_iommu_device(struct vfio_group *group,
+					    struct device *dev)
+{
+	if (group->mdev_group)
+		return vfio_mdev_get_iommu_device(dev);
+	else
+		return dev;
+}
+
 static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
 {
 	struct domain_capsule *dc = (struct domain_capsule *)data;
 	unsigned long arg = *(unsigned long *)dc->data;
+	struct device *iommu_device;
+
+	iommu_device = vfio_get_iommu_device(dc->group, dev);
+	if (!iommu_device)
+		return -EINVAL;
 
-	return iommu_sva_bind_gpasid(dc->domain, dev, (void __user *)arg);
+	return iommu_sva_bind_gpasid(dc->domain, iommu_device,
+				     (void __user *)arg);
 }
 
 static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
 {
 	struct domain_capsule *dc = (struct domain_capsule *)data;
 	unsigned long arg = *(unsigned long *)dc->data;
+	struct device *iommu_device;
 
-	iommu_sva_unbind_gpasid(dc->domain, dev, (void __user *)arg);
+	iommu_device = vfio_get_iommu_device(dc->group, dev);
+	if (!iommu_device)
+		return -EINVAL;
+
+	iommu_sva_unbind_gpasid(dc->domain, iommu_device,
+				(void __user *)arg);
 	return 0;
 }
 
@@ -2395,8 +2416,13 @@ static int __vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
 	struct domain_capsule *dc = (struct domain_capsule *)data;
 	struct iommu_gpasid_bind_data *unbind_data =
 				(struct iommu_gpasid_bind_data *)dc->data;
+	struct device *iommu_device;
+
+	iommu_device = vfio_get_iommu_device(dc->group, dev);
+	if (!iommu_device)
+		return -EINVAL;
 
-	__iommu_sva_unbind_gpasid(dc->domain, dev, unbind_data);
+	__iommu_sva_unbind_gpasid(dc->domain, iommu_device, unbind_data);
 	return 0;
 }
 
@@ -3077,8 +3103,13 @@ static int vfio_dev_cache_invalidate_fn(struct device *dev, void *data)
 {
 	struct domain_capsule *dc = (struct domain_capsule *)data;
 	unsigned long arg = *(unsigned long *)dc->data;
+	struct device *iommu_device;
+
+	iommu_device = vfio_get_iommu_device(dc->group, dev);
+	if (!iommu_device)
+		return -EINVAL;
 
-	iommu_cache_invalidate(dc->domain, dev, (void __user *)arg);
+	iommu_cache_invalidate(dc->domain, iommu_device, (void __user *)arg);
 	return 0;
 }
 
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 13/15] vfio/pci: Expose PCIe PASID capability to guest
  2020-07-12 11:20 [PATCH v5 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (11 preceding siblings ...)
  2020-07-12 11:21 ` [PATCH v5 12/15] vfio/type1: Add vSVA support for IOMMU-backed mdevs Liu Yi L
@ 2020-07-12 11:21 ` Liu Yi L
  2020-07-20 12:35   ` Auger Eric
  2020-07-12 11:21 ` [PATCH v5 14/15] vfio: Document dual stage control Liu Yi L
  2020-07-12 11:21 ` [PATCH v5 15/15] iommu/vt-d: Support reporting nesting capability info Liu Yi L
  14 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2020-07-12 11:21 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

This patch exposes PCIe PASID capability to guest for assigned devices.
Existing vfio_pci driver hides it from guest by setting the capability
length as 0 in pci_ext_cap_length[].

And this patch only exposes PASID capability for devices which has PCIe
PASID extended struture in its configuration space. So VFs, will will
not see PASID capability on VFs as VF doesn't implement PASID extended
structure in its configuration space. For VF, it is a TODO in future.
Related discussion can be found in below link:

https://lkml.org/lkml/2020/4/7/693

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
v1 -> v2:
*) added in v2, but it was sent in a separate patchseries before
---
 drivers/vfio/pci/vfio_pci_config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index d98843f..07ff2e6 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -95,7 +95,7 @@ static const u16 pci_ext_cap_length[PCI_EXT_CAP_ID_MAX + 1] = {
 	[PCI_EXT_CAP_ID_LTR]	=	PCI_EXT_CAP_LTR_SIZEOF,
 	[PCI_EXT_CAP_ID_SECPCI]	=	0,	/* not yet */
 	[PCI_EXT_CAP_ID_PMUX]	=	0,	/* not yet */
-	[PCI_EXT_CAP_ID_PASID]	=	0,	/* not yet */
+	[PCI_EXT_CAP_ID_PASID]	=	PCI_EXT_CAP_PASID_SIZEOF,
 };
 
 /*
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 14/15] vfio: Document dual stage control
  2020-07-12 11:20 [PATCH v5 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (12 preceding siblings ...)
  2020-07-12 11:21 ` [PATCH v5 13/15] vfio/pci: Expose PCIe PASID capability to guest Liu Yi L
@ 2020-07-12 11:21 ` Liu Yi L
  2020-07-18 13:39   ` Auger Eric
  2020-07-12 11:21 ` [PATCH v5 15/15] iommu/vt-d: Support reporting nesting capability info Liu Yi L
  14 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2020-07-12 11:21 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

From: Eric Auger <eric.auger@redhat.com>

The VFIO API was enhanced to support nested stage control: a bunch of
new iotcls and usage guideline.

Let's document the process to follow to set up nested mode.

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
v3 -> v4:
*) add review-by from Stefan Hajnoczi

v2 -> v3:
*) address comments from Stefan Hajnoczi

v1 -> v2:
*) new in v2, compared with Eric's original version, pasid table bind
   and fault reporting is removed as this series doesn't cover them.
   Original version from Eric.
   https://lkml.org/lkml/2020/3/20/700
---
 Documentation/driver-api/vfio.rst | 67 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index f1a4d3c..0672c45 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -239,6 +239,73 @@ group and can access them as follows::
 	/* Gratuitous device reset and go... */
 	ioctl(device, VFIO_DEVICE_RESET);
 
+IOMMU Dual Stage Control
+------------------------
+
+Some IOMMUs support 2 stages/levels of translation. Stage corresponds to
+the ARM terminology while level corresponds to Intel's VTD terminology.
+In the following text we use either without distinction.
+
+This is useful when the guest is exposed with a virtual IOMMU and some
+devices are assigned to the guest through VFIO. Then the guest OS can use
+stage 1 (GIOVA -> GPA or GVA->GPA), while the hypervisor uses stage 2 for
+VM isolation (GPA -> HPA).
+
+Under dual stage translation, the guest gets ownership of the stage 1 page
+tables and also owns stage 1 configuration structures. The hypervisor owns
+the root configuration structure (for security reason), including stage 2
+configuration. This works as long as configuration structures and page table
+formats are compatible between the virtual IOMMU and the physical IOMMU.
+
+Assuming the HW supports it, this nested mode is selected by choosing the
+VFIO_TYPE1_NESTING_IOMMU type through:
+
+    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);
+
+This forces the hypervisor to use the stage 2, leaving stage 1 available
+for guest usage. The guest stage 1 format depends on IOMMU vendor, and
+it is the same with the nesting configuration method. User space should
+check the format and configuration method after setting nesting type by
+using:
+
+    ioctl(container->fd, VFIO_IOMMU_GET_INFO, &nesting_info);
+
+Details can be found in Documentation/userspace-api/iommu.rst. For Intel
+VT-d, each stage 1 page table is bound to host by:
+
+    nesting_op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL;
+    memcpy(&nesting_op->data, &bind_data, sizeof(bind_data));
+    ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
+
+As mentioned above, guest OS may use stage 1 for GIOVA->GPA or GVA->GPA.
+GVA->GPA page tables are available when PASID (Process Address Space ID)
+is exposed to guest. e.g. guest with PASID-capable devices assigned. For
+such page table binding, the bind_data should include PASID info, which
+is allocated by guest itself or by host. This depends on hardware vendor.
+e.g. Intel VT-d requires to allocate PASID from host. This requirement is
+defined by the Virtual Command Support in VT-d 3.0 spec, guest software
+running on VT-d should allocate PASID from host kernel. To allocate PASID
+from host, user space should check the IOMMU_NESTING_FEAT_SYSWIDE_PASID
+bit of the nesting info reported from host kernel. VFIO reports the nesting
+info by VFIO_IOMMU_GET_INFO. User space could allocate PASID from host by:
+
+    req.flags = VFIO_IOMMU_ALLOC_PASID;
+    ioctl(container, VFIO_IOMMU_PASID_REQUEST, &req);
+
+With first stage/level page table bound to host, it allows to combine the
+guest stage 1 translation along with the hypervisor stage 2 translation to
+get final address.
+
+When the guest invalidates stage 1 related caches, invalidations must be
+forwarded to the host through
+
+    nesting_op->flags = VFIO_IOMMU_NESTING_OP_CACHE_INVLD;
+    memcpy(&nesting_op->data, &inv_data, sizeof(inv_data));
+    ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
+
+Those invalidations can happen at various granularity levels, page, context,
+...
+
 VFIO User API
 -------------------------------------------------------------------------------
 
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v5 15/15] iommu/vt-d: Support reporting nesting capability info
  2020-07-12 11:20 [PATCH v5 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
                   ` (13 preceding siblings ...)
  2020-07-12 11:21 ` [PATCH v5 14/15] vfio: Document dual stage control Liu Yi L
@ 2020-07-12 11:21 ` Liu Yi L
  2020-07-17 17:14   ` Auger Eric
  14 siblings, 1 reply; 58+ messages in thread
From: Liu Yi L @ 2020-07-12 11:21 UTC (permalink / raw)
  To: alex.williamson, eric.auger, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v2 -> v3:
*) remove cap/ecap_mask in iommu_nesting_info.
---
 drivers/iommu/intel/iommu.c | 81 +++++++++++++++++++++++++++++++++++++++++++--
 include/linux/intel-iommu.h | 16 +++++++++
 2 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index a9504cb..9f7ad1a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5659,12 +5659,16 @@ static inline bool iommu_pasid_support(void)
 static inline bool nested_mode_support(void)
 {
 	struct dmar_drhd_unit *drhd;
-	struct intel_iommu *iommu;
+	struct intel_iommu *iommu, *prev = NULL;
 	bool ret = true;
 
 	rcu_read_lock();
 	for_each_active_iommu(iommu, drhd) {
-		if (!sm_supported(iommu) || !ecap_nest(iommu->ecap)) {
+		if (!prev)
+			prev = iommu;
+		if (!sm_supported(iommu) || !ecap_nest(iommu->ecap) ||
+		    (VTD_CAP_MASK & (iommu->cap ^ prev->cap)) ||
+		    (VTD_ECAP_MASK & (iommu->ecap ^ prev->ecap))) {
 			ret = false;
 			break;
 		}
@@ -6079,6 +6083,78 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain,
 	return ret;
 }
 
+static int intel_iommu_get_nesting_info(struct iommu_domain *domain,
+					struct iommu_nesting_info *info)
+{
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	u64 cap = VTD_CAP_MASK, ecap = VTD_ECAP_MASK;
+	struct device_domain_info *domain_info;
+	struct iommu_nesting_info_vtd vtd;
+	unsigned long flags;
+	unsigned int size;
+
+	if (domain->type != IOMMU_DOMAIN_UNMANAGED ||
+	    !(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE))
+		return -ENODEV;
+
+	if (!info)
+		return -EINVAL;
+
+	size = sizeof(struct iommu_nesting_info) +
+		sizeof(struct iommu_nesting_info_vtd);
+	/*
+	 * if provided buffer size is smaller than expected, should
+	 * return 0 and also the expected buffer size to caller.
+	 */
+	if (info->size < size) {
+		info->size = size;
+		return 0;
+	}
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	/*
+	 * arbitrary select the first domain_info as all nesting
+	 * related capabilities should be consistent across iommu
+	 * units.
+	 */
+	domain_info = list_first_entry(&dmar_domain->devices,
+				       struct device_domain_info, link);
+	cap &= domain_info->iommu->cap;
+	ecap &= domain_info->iommu->ecap;
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+
+	info->format = IOMMU_PASID_FORMAT_INTEL_VTD;
+	info->features = IOMMU_NESTING_FEAT_SYSWIDE_PASID |
+			 IOMMU_NESTING_FEAT_BIND_PGTBL |
+			 IOMMU_NESTING_FEAT_CACHE_INVLD;
+	info->addr_width = dmar_domain->gaw;
+	info->pasid_bits = ilog2(intel_pasid_max_id);
+	info->padding = 0;
+	vtd.flags = 0;
+	vtd.padding = 0;
+	vtd.cap_reg = cap;
+	vtd.ecap_reg = ecap;
+
+	memcpy(info->data, &vtd, sizeof(vtd));
+	return 0;
+}
+
+static int intel_iommu_domain_get_attr(struct iommu_domain *domain,
+				       enum iommu_attr attr, void *data)
+{
+	switch (attr) {
+	case DOMAIN_ATTR_NESTING:
+	{
+		struct iommu_nesting_info *info =
+				(struct iommu_nesting_info *)data;
+
+		return intel_iommu_get_nesting_info(domain, info);
+	}
+	default:
+		return -ENODEV;
+	}
+}
+
 /*
  * Check that the device does not live on an external facing PCI port that is
  * marked as untrusted. Such devices should not be able to apply quirks and
@@ -6101,6 +6177,7 @@ const struct iommu_ops intel_iommu_ops = {
 	.domain_alloc		= intel_iommu_domain_alloc,
 	.domain_free		= intel_iommu_domain_free,
 	.domain_set_attr	= intel_iommu_domain_set_attr,
+	.domain_get_attr	= intel_iommu_domain_get_attr,
 	.attach_dev		= intel_iommu_attach_device,
 	.detach_dev		= intel_iommu_detach_device,
 	.aux_attach_dev		= intel_iommu_aux_attach_device,
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 18f292e..c4ed0d4 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -197,6 +197,22 @@
 #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
 #define ecap_sc_support(e)	((e >> 7) & 0x1) /* Snooping Control */
 
+/* Nesting Support Capability Alignment */
+#define VTD_CAP_FL1GP		BIT_ULL(56)
+#define VTD_CAP_FL5LP		BIT_ULL(60)
+#define VTD_ECAP_PRS		BIT_ULL(29)
+#define VTD_ECAP_ERS		BIT_ULL(30)
+#define VTD_ECAP_SRS		BIT_ULL(31)
+#define VTD_ECAP_EAFS		BIT_ULL(34)
+#define VTD_ECAP_PASID		BIT_ULL(40)
+
+/* Only capabilities marked in below MASKs are reported */
+#define VTD_CAP_MASK		(VTD_CAP_FL1GP | VTD_CAP_FL5LP)
+
+#define VTD_ECAP_MASK		(VTD_ECAP_PRS | VTD_ECAP_ERS | \
+				 VTD_ECAP_SRS | VTD_ECAP_EAFS | \
+				 VTD_ECAP_PASID)
+
 /* Virtual command interface capability */
 #define vccap_pasid(v)		(((v) & DMA_VCS_PAS)) /* PASID allocation */
 
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 03/15] iommu/smmu: Report empty domain nesting info
  2020-07-12 11:20 ` [PATCH v5 03/15] iommu/smmu: Report empty " Liu Yi L
@ 2020-07-13 13:14   ` Will Deacon
  2020-07-14 10:12     ` Liu, Yi L
  2020-07-17 17:18   ` Auger Eric
  1 sibling, 1 reply; 58+ messages in thread
From: Will Deacon @ 2020-07-13 13:14 UTC (permalink / raw)
  To: Liu Yi L
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, yi.y.sun,
	linux-kernel, alex.williamson, iommu, hao.wu, Robin Murphy,
	jun.j.tian

On Sun, Jul 12, 2020 at 04:20:58AM -0700, Liu Yi L wrote:
> This patch is added as instead of returning a boolean for DOMAIN_ATTR_NESTING,
> iommu_domain_get_attr() should return an iommu_nesting_info handle.
> 
> Cc: Will Deacon <will@kernel.org>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Suggested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> v4 -> v5:
> *) address comments from Eric Auger.
> ---
>  drivers/iommu/arm-smmu-v3.c | 29 +++++++++++++++++++++++++++--
>  drivers/iommu/arm-smmu.c    | 29 +++++++++++++++++++++++++++--
>  2 files changed, 54 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index f578677..ec815d7 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -3019,6 +3019,32 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
>  	return group;
>  }
>  
> +static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
> +					void *data)
> +{
> +	struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
> +	unsigned int size;
> +
> +	if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> +		return -ENODEV;
> +
> +	size = sizeof(struct iommu_nesting_info);
> +
> +	/*
> +	 * if provided buffer size is smaller than expected, should
> +	 * return 0 and also the expected buffer size to caller.
> +	 */
> +	if (info->size < size) {
> +		info->size = size;
> +		return 0;
> +	}
> +
> +	/* report an empty iommu_nesting_info for now */
> +	memset(info, 0x0, size);
> +	info->size = size;
> +	return 0;
> +}

Have you verified that this doesn't break the existing usage of
DOMAIN_ATTR_NESTING in drivers/vfio/vfio_iommu_type1.c?

Will
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 03/15] iommu/smmu: Report empty domain nesting info
  2020-07-13 13:14   ` Will Deacon
@ 2020-07-14 10:12     ` Liu, Yi L
  2020-07-16 15:39       ` Jean-Philippe Brucker
  0 siblings, 1 reply; 58+ messages in thread
From: Liu, Yi L @ 2020-07-14 10:12 UTC (permalink / raw)
  To: Will Deacon
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Sun, Yi Y,
	linux-kernel, alex.williamson, iommu, Wu, Hao, Robin Murphy,
	Tian,  Jun J

Hi Will,

> From: Will Deacon <will@kernel.org>
> Sent: Monday, July 13, 2020 9:15 PM
> 
> On Sun, Jul 12, 2020 at 04:20:58AM -0700, Liu Yi L wrote:
> > This patch is added as instead of returning a boolean for DOMAIN_ATTR_NESTING,
> > iommu_domain_get_attr() should return an iommu_nesting_info handle.
> >
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Robin Murphy <robin.murphy@arm.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Suggested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> > v4 -> v5:
> > *) address comments from Eric Auger.
> > ---
> >  drivers/iommu/arm-smmu-v3.c | 29 +++++++++++++++++++++++++++--
> >  drivers/iommu/arm-smmu.c    | 29 +++++++++++++++++++++++++++--
> >  2 files changed, 54 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > index f578677..ec815d7 100644
> > --- a/drivers/iommu/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm-smmu-v3.c
> > @@ -3019,6 +3019,32 @@ static struct iommu_group
> *arm_smmu_device_group(struct device *dev)
> >  	return group;
> >  }
> >
> > +static int arm_smmu_domain_nesting_info(struct arm_smmu_domain
> *smmu_domain,
> > +					void *data)
> > +{
> > +	struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
> > +	unsigned int size;
> > +
> > +	if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> > +		return -ENODEV;
> > +
> > +	size = sizeof(struct iommu_nesting_info);
> > +
> > +	/*
> > +	 * if provided buffer size is smaller than expected, should
> > +	 * return 0 and also the expected buffer size to caller.
> > +	 */
> > +	if (info->size < size) {
> > +		info->size = size;
> > +		return 0;
> > +	}
> > +
> > +	/* report an empty iommu_nesting_info for now */
> > +	memset(info, 0x0, size);
> > +	info->size = size;
> > +	return 0;
> > +}
> 
> Have you verified that this doesn't break the existing usage of
> DOMAIN_ATTR_NESTING in drivers/vfio/vfio_iommu_type1.c?

I didn't have ARM machine on my hand. But I contacted with Jean
Philippe, he confirmed no compiling issue. I didn't see any code
getting DOMAIN_ATTR_NESTING attr in current drivers/vfio/vfio_iommu_type1.c.
What I'm adding is to call iommu_domai_get_attr(, DOMAIN_ATTR_NESTIN)
and won't fail if the iommu_domai_get_attr() returns 0. This patch
returns an empty nesting info for DOMAIN_ATTR_NESTIN and return
value is 0 if no error. So I guess it won't fail nesting for ARM.

@Eric, how about your opinion? your dual-stage vSMMU support may
also share the vfio_iommu_type1.c code.

Regards,
Yi Liu

> Will
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 03/15] iommu/smmu: Report empty domain nesting info
  2020-07-14 10:12     ` Liu, Yi L
@ 2020-07-16 15:39       ` Jean-Philippe Brucker
  2020-07-16 20:38         ` Auger Eric
  2020-07-23  9:40         ` Liu, Yi L
  0 siblings, 2 replies; 58+ messages in thread
From: Jean-Philippe Brucker @ 2020-07-16 15:39 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, Raj, Ashok, kvm, stefanha, Sun, Yi Y, linux-kernel,
	alex.williamson, iommu, Robin Murphy, Wu, Hao, Will Deacon, Tian,
	Jun J

On Tue, Jul 14, 2020 at 10:12:49AM +0000, Liu, Yi L wrote:
> > Have you verified that this doesn't break the existing usage of
> > DOMAIN_ATTR_NESTING in drivers/vfio/vfio_iommu_type1.c?
> 
> I didn't have ARM machine on my hand. But I contacted with Jean
> Philippe, he confirmed no compiling issue. I didn't see any code
> getting DOMAIN_ATTR_NESTING attr in current drivers/vfio/vfio_iommu_type1.c.
> What I'm adding is to call iommu_domai_get_attr(, DOMAIN_ATTR_NESTIN)
> and won't fail if the iommu_domai_get_attr() returns 0. This patch
> returns an empty nesting info for DOMAIN_ATTR_NESTIN and return
> value is 0 if no error. So I guess it won't fail nesting for ARM.

I confirm that this series doesn't break the current support for
VFIO_IOMMU_TYPE1_NESTING with an SMMUv3. That said...

If the SMMU does not support stage-2 then there is a change in behavior
(untested): after the domain is silently switched to stage-1 by the SMMU
driver, VFIO will now query nesting info and obtain -ENODEV. Instead of
succeding as before, the VFIO ioctl will now fail. I believe that's a fix
rather than a regression, it should have been like this since the
beginning. No known userspace has been using VFIO_IOMMU_TYPE1_NESTING so
far, so I don't think it should be a concern.

And if userspace queries the nesting properties using the new ABI
introduced in this patchset, it will obtain an empty struct. I think
that's acceptable, but it may be better to avoid adding the nesting cap if
@format is 0?

Thanks,
Jean

> 
> @Eric, how about your opinion? your dual-stage vSMMU support may
> also share the vfio_iommu_type1.c code.
> 
> Regards,
> Yi Liu
> 
> > Will
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 03/15] iommu/smmu: Report empty domain nesting info
  2020-07-16 15:39       ` Jean-Philippe Brucker
@ 2020-07-16 20:38         ` Auger Eric
  2020-07-17  9:09           ` Jean-Philippe Brucker
  2020-07-23  9:40         ` Liu, Yi L
  1 sibling, 1 reply; 58+ messages in thread
From: Auger Eric @ 2020-07-16 20:38 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Liu, Yi L
  Cc: Tian, Kevin, Raj, Ashok, kvm, stefanha, Sun, Yi Y, linux-kernel,
	alex.williamson, iommu, Robin Murphy, Wu, Hao, Will Deacon, Tian,
	Jun J

Hi Jean,

On 7/16/20 5:39 PM, Jean-Philippe Brucker wrote:
> On Tue, Jul 14, 2020 at 10:12:49AM +0000, Liu, Yi L wrote:
>>> Have you verified that this doesn't break the existing usage of
>>> DOMAIN_ATTR_NESTING in drivers/vfio/vfio_iommu_type1.c?
>>
>> I didn't have ARM machine on my hand. But I contacted with Jean
>> Philippe, he confirmed no compiling issue. I didn't see any code
>> getting DOMAIN_ATTR_NESTING attr in current drivers/vfio/vfio_iommu_type1.c.
>> What I'm adding is to call iommu_domai_get_attr(, DOMAIN_ATTR_NESTIN)
>> and won't fail if the iommu_domai_get_attr() returns 0. This patch
>> returns an empty nesting info for DOMAIN_ATTR_NESTIN and return
>> value is 0 if no error. So I guess it won't fail nesting for ARM.
> 
> I confirm that this series doesn't break the current support for
> VFIO_IOMMU_TYPE1_NESTING with an SMMUv3. That said...
> 
> If the SMMU does not support stage-2 then there is a change in behavior
> (untested): after the domain is silently switched to stage-1 by the SMMU
> driver, VFIO will now query nesting info and obtain -ENODEV. Instead of
> succeding as before, the VFIO ioctl will now fail. I believe that's a fix
> rather than a regression, it should have been like this since the
> beginning. No known userspace has been using VFIO_IOMMU_TYPE1_NESTING so
> far, so I don't think it should be a concern.
But as Yi mentioned ealier, in the current vfio code there is no
DOMAIN_ATTR_NESTING query yet. In my SMMUV3 nested stage series, I added
such a query in vfio-pci.c to detect if I need to expose a fault region
but I already test both the returned value and the output arg. So to me
there is no issue with that change.
> 
> And if userspace queries the nesting properties using the new ABI
> introduced in this patchset, it will obtain an empty struct. I think
> that's acceptable, but it may be better to avoid adding the nesting cap if
> @format is 0?
agreed

Thanks

Eric
> 
> Thanks,
> Jean
> 
>>
>> @Eric, how about your opinion? your dual-stage vSMMU support may
>> also share the vfio_iommu_type1.c code.
>>
>> Regards,
>> Yi Liu
>>
>>> Will
> 

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 03/15] iommu/smmu: Report empty domain nesting info
  2020-07-16 20:38         ` Auger Eric
@ 2020-07-17  9:09           ` Jean-Philippe Brucker
  2020-07-17 10:28             ` Auger Eric
  2020-07-23  9:44             ` Liu, Yi L
  0 siblings, 2 replies; 58+ messages in thread
From: Jean-Philippe Brucker @ 2020-07-17  9:09 UTC (permalink / raw)
  To: Auger Eric
  Cc: Tian, Kevin, Raj, Ashok, kvm, iommu, stefanha, Sun, Yi Y,
	linux-kernel, alex.williamson, Robin Murphy, Wu, Hao,
	Will Deacon, Tian, Jun J

On Thu, Jul 16, 2020 at 10:38:17PM +0200, Auger Eric wrote:
> Hi Jean,
> 
> On 7/16/20 5:39 PM, Jean-Philippe Brucker wrote:
> > On Tue, Jul 14, 2020 at 10:12:49AM +0000, Liu, Yi L wrote:
> >>> Have you verified that this doesn't break the existing usage of
> >>> DOMAIN_ATTR_NESTING in drivers/vfio/vfio_iommu_type1.c?
> >>
> >> I didn't have ARM machine on my hand. But I contacted with Jean
> >> Philippe, he confirmed no compiling issue. I didn't see any code
> >> getting DOMAIN_ATTR_NESTING attr in current drivers/vfio/vfio_iommu_type1.c.
> >> What I'm adding is to call iommu_domai_get_attr(, DOMAIN_ATTR_NESTIN)
> >> and won't fail if the iommu_domai_get_attr() returns 0. This patch
> >> returns an empty nesting info for DOMAIN_ATTR_NESTIN and return
> >> value is 0 if no error. So I guess it won't fail nesting for ARM.
> > 
> > I confirm that this series doesn't break the current support for
> > VFIO_IOMMU_TYPE1_NESTING with an SMMUv3. That said...
> > 
> > If the SMMU does not support stage-2 then there is a change in behavior
> > (untested): after the domain is silently switched to stage-1 by the SMMU
> > driver, VFIO will now query nesting info and obtain -ENODEV. Instead of
> > succeding as before, the VFIO ioctl will now fail. I believe that's a fix
> > rather than a regression, it should have been like this since the
> > beginning. No known userspace has been using VFIO_IOMMU_TYPE1_NESTING so
> > far, so I don't think it should be a concern.
> But as Yi mentioned ealier, in the current vfio code there is no
> DOMAIN_ATTR_NESTING query yet.

That's why something that would have succeeded before will now fail:
Before this series, if user asked for a VFIO_IOMMU_TYPE1_NESTING, it would
have succeeded even if the SMMU didn't support stage-2, as the driver
would have silently fallen back on stage-1 mappings (which work exactly
the same as stage-2-only since there was no nesting supported). After the
series, we do check for DOMAIN_ATTR_NESTING so if user asks for
VFIO_IOMMU_TYPE1_NESTING and the SMMU doesn't support stage-2, the ioctl
fails.

I believe it's a good fix and completely harmless, but wanted to make sure
no one objects because it's an ABI change.

Thanks,
Jean

> In my SMMUV3 nested stage series, I added
> such a query in vfio-pci.c to detect if I need to expose a fault region
> but I already test both the returned value and the output arg. So to me
> there is no issue with that change.
> > 
> > And if userspace queries the nesting properties using the new ABI
> > introduced in this patchset, it will obtain an empty struct. I think
> > that's acceptable, but it may be better to avoid adding the nesting cap if
> > @format is 0?
> agreed
> 
> Thanks
> 
> Eric
> > 
> > Thanks,
> > Jean
> > 
> >>
> >> @Eric, how about your opinion? your dual-stage vSMMU support may
> >> also share the vfio_iommu_type1.c code.
> >>
> >> Regards,
> >> Yi Liu
> >>
> >>> Will
> > 
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 03/15] iommu/smmu: Report empty domain nesting info
  2020-07-17  9:09           ` Jean-Philippe Brucker
@ 2020-07-17 10:28             ` Auger Eric
  2020-07-23  9:44             ` Liu, Yi L
  1 sibling, 0 replies; 58+ messages in thread
From: Auger Eric @ 2020-07-17 10:28 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Tian, Kevin, Raj, Ashok, kvm, iommu, stefanha, Sun, Yi Y,
	linux-kernel, alex.williamson, Robin Murphy, Wu, Hao,
	Will Deacon, Tian, Jun J

Hi Jean,

On 7/17/20 11:09 AM, Jean-Philippe Brucker wrote:
> On Thu, Jul 16, 2020 at 10:38:17PM +0200, Auger Eric wrote:
>> Hi Jean,
>>
>> On 7/16/20 5:39 PM, Jean-Philippe Brucker wrote:
>>> On Tue, Jul 14, 2020 at 10:12:49AM +0000, Liu, Yi L wrote:
>>>>> Have you verified that this doesn't break the existing usage of
>>>>> DOMAIN_ATTR_NESTING in drivers/vfio/vfio_iommu_type1.c?
>>>>
>>>> I didn't have ARM machine on my hand. But I contacted with Jean
>>>> Philippe, he confirmed no compiling issue. I didn't see any code
>>>> getting DOMAIN_ATTR_NESTING attr in current drivers/vfio/vfio_iommu_type1.c.
>>>> What I'm adding is to call iommu_domai_get_attr(, DOMAIN_ATTR_NESTIN)
>>>> and won't fail if the iommu_domai_get_attr() returns 0. This patch
>>>> returns an empty nesting info for DOMAIN_ATTR_NESTIN and return
>>>> value is 0 if no error. So I guess it won't fail nesting for ARM.
>>>
>>> I confirm that this series doesn't break the current support for
>>> VFIO_IOMMU_TYPE1_NESTING with an SMMUv3. That said...
>>>
>>> If the SMMU does not support stage-2 then there is a change in behavior
>>> (untested): after the domain is silently switched to stage-1 by the SMMU
>>> driver, VFIO will now query nesting info and obtain -ENODEV. Instead of
>>> succeding as before, the VFIO ioctl will now fail. I believe that's a fix
>>> rather than a regression, it should have been like this since the
>>> beginning. No known userspace has been using VFIO_IOMMU_TYPE1_NESTING so
>>> far, so I don't think it should be a concern.
>> But as Yi mentioned ealier, in the current vfio code there is no
>> DOMAIN_ATTR_NESTING query yet.
> 
> That's why something that would have succeeded before will now fail:
> Before this series, if user asked for a VFIO_IOMMU_TYPE1_NESTING, it would
> have succeeded even if the SMMU didn't support stage-2, as the driver
> would have silently fallen back on stage-1 mappings (which work exactly
> the same as stage-2-only since there was no nesting supported). After the
> series, we do check for DOMAIN_ATTR_NESTING so if user asks for
> VFIO_IOMMU_TYPE1_NESTING and the SMMU doesn't support stage-2, the ioctl
> fails.
OK I now understand what you meant. Yes this actual change is brought by
the next patch, ie.
[PATCH v5 04/15] vfio/type1: Report iommu nesting info to userspace

Thanks

Eric
> 
> I believe it's a good fix and completely harmless, but wanted to make sure
> no one objects because it's an ABI change.
> 
> Thanks,
> Jean
> 
>> In my SMMUV3 nested stage series, I added
>> such a query in vfio-pci.c to detect if I need to expose a fault region
>> but I already test both the returned value and the output arg. So to me
>> there is no issue with that change.
>>>
>>> And if userspace queries the nesting properties using the new ABI
>>> introduced in this patchset, it will obtain an empty struct. I think
>>> that's acceptable, but it may be better to avoid adding the nesting cap if
>>> @format is 0?
>> agreed
>>
>> Thanks
>>
>> Eric
>>>
>>> Thanks,
>>> Jean
>>>
>>>>
>>>> @Eric, how about your opinion? your dual-stage vSMMU support may
>>>> also share the vfio_iommu_type1.c code.
>>>>
>>>> Regards,
>>>> Yi Liu
>>>>
>>>>> Will
>>>
>>
> 

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 02/15] iommu: Report domain nesting info
  2020-07-12 11:20 ` [PATCH v5 02/15] iommu: Report domain nesting info Liu Yi L
@ 2020-07-17 16:29   ` Auger Eric
  2020-07-20  7:20     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Auger Eric @ 2020-07-17 16:29 UTC (permalink / raw)
  To: Liu Yi L, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

Hi Yi,

On 7/12/20 1:20 PM, Liu Yi L wrote:
> IOMMUs that support nesting translation needs report the capability info
s/needs/need to report
> to userspace, e.g. the format of first level/stage paging structures.
It gives information about requirements the userspace needs to implement
plus other features characterizing the physical implementation.
> 
> This patch reports nesting info by DOMAIN_ATTR_NESTING. Caller can get
> nesting info after setting DOMAIN_ATTR_NESTING.
I guess you meant after selecting VFIO_TYPE1_NESTING_IOMMU?
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> v4 -> v5:
> *) address comments from Eric Auger.
> 
> v3 -> v4:
> *) split the SMMU driver changes to be a separate patch
> *) move the @addr_width and @pasid_bits from vendor specific
>    part to generic part.
> *) tweak the description for the @features field of struct
>    iommu_nesting_info.
> *) add description on the @data[] field of struct iommu_nesting_info
> 
> v2 -> v3:
> *) remvoe cap/ecap_mask in iommu_nesting_info.
> *) reuse DOMAIN_ATTR_NESTING to get nesting info.
> *) return an empty iommu_nesting_info for SMMU drivers per Jean'
>    suggestion.
> ---
>  include/uapi/linux/iommu.h | 77 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 77 insertions(+)
> 
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 1afc661..d2a47c4 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -332,4 +332,81 @@ struct iommu_gpasid_bind_data {
>  	} vendor;
>  };
>  
> +/*
> + * struct iommu_nesting_info - Information for nesting-capable IOMMU.
> + *			       user space should check it before using
> + *			       nesting capability.
> + *
> + * @size:	size of the whole structure
> + * @format:	PASID table entry format, the same definition as struct
> + *		iommu_gpasid_bind_data @format.
> + * @features:	supported nesting features.
> + * @flags:	currently reserved for future extension.
> + * @addr_width:	The output addr width of first level/stage translation
> + * @pasid_bits:	Maximum supported PASID bits, 0 represents no PASID
> + *		support.
> + * @data:	vendor specific cap info. data[] structure type can be deduced
> + *		from @format field.
> + *
> + * +===============+======================================================+
> + * | feature       |  Notes                                               |
> + * +===============+======================================================+
> + * | SYSWIDE_PASID |  PASIDs are managed in system-wide, instead of per   |
s/in system-wide/system-wide ?
> + * |               |  device. When a device is assigned to userspace or   |
> + * |               |  VM, proper uAPI (userspace driver framework uAPI,   |
> + * |               |  e.g. VFIO) must be used to allocate/free PASIDs for |
> + * |               |  the assigned device.          
Isn't it possible to be more explicit, something like:
                      |
System-wide PASID management is mandated by the physical IOMMU. All
PASIDs allocation must be mediated through the TBD API.
> + * +---------------+------------------------------------------------------+
> + * | BIND_PGTBL    |  The owner of the first level/stage page table must  |
> + * |               |  explicitly bind the page table to associated PASID  |
> + * |               |  (either the one specified in bind request or the    |
> + * |               |  default PASID of iommu domain), through userspace   |
> + * |               |  driver framework uAPI (e.g. VFIO_IOMMU_NESTING_OP). |
As per your answer in https://lkml.org/lkml/2020/7/6/383, I now
understand ARM would not expose that BIND_PGTBL nesting feature, I still
think the above wording is a bit confusing. Maybe you may explicitly
talk about the PASID *entry* that needs to be passed from guest to host.
On ARM we directly pass the PASID table but when reading the above
description I fail to determine if this does not fit that description.
> + * +---------------+------------------------------------------------------+
> + * | CACHE_INVLD   |  The owner of the first level/stage page table must  |
> + * |               |  explicitly invalidate the IOMMU cache through uAPI  |
> + * |               |  provided by userspace driver framework (e.g. VFIO)  |
> + * |               |  according to vendor-specific requirement when       |
> + * |               |  changing the page table.                            |
> + * +---------------+------------------------------------------------------+

instead of using the "uAPI provided by userspace driver framework (e.g.
VFIO)", can't we use the so-called IOMMU UAPI terminology which now has
a userspace documentation?

> + *
> + * @data[] types defined for @format:
> + * +================================+=====================================+
> + * | @format                        | @data[]                             |
> + * +================================+=====================================+
> + * | IOMMU_PASID_FORMAT_INTEL_VTD   | struct iommu_nesting_info_vtd       |
> + * +--------------------------------+-------------------------------------+
> + *
> + */
> +struct iommu_nesting_info {
> +	__u32	size;
shouldn't it be @argsz to fit the iommu uapi convention and take benefit
to put the flags field just below?
> +	__u32	format;
> +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
> +#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
> +#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 2)
> +	__u32	features;
> +	__u32	flags;
> +	__u16	addr_width;
> +	__u16	pasid_bits;
> +	__u32	padding;
> +	__u8	data[];
> +};
> +
> +/*
> + * struct iommu_nesting_info_vtd - Intel VT-d specific nesting info
> + *
> + * @flags:	VT-d specific flags. Currently reserved for future
> + *		extension.
must be set to 0?
> + * @cap_reg:	Describe basic capabilities as defined in VT-d capability
> + *		register.
> + * @ecap_reg:	Describe the extended capabilities as defined in VT-d
> + *		extended capability register.
> + */
> +struct iommu_nesting_info_vtd {
> +	__u32	flags;
> +	__u32	padding;
> +	__u64	cap_reg;
> +	__u64	ecap_reg;
> +};
> +
>  #endif /* _UAPI_IOMMU_H */
Thanks

Eric
> 

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 15/15] iommu/vt-d: Support reporting nesting capability info
  2020-07-12 11:21 ` [PATCH v5 15/15] iommu/vt-d: Support reporting nesting capability info Liu Yi L
@ 2020-07-17 17:14   ` Auger Eric
  2020-07-20 13:33     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Auger Eric @ 2020-07-17 17:14 UTC (permalink / raw)
  To: Liu Yi L, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

Hi Yi,

Missing a proper commit message. You can comment on the fact you only
support the case where all the physical iomms have the same CAP/ECAP MASKS

On 7/12/20 1:21 PM, Liu Yi L wrote:
> Cc: Kevin Tian <kevin.tian@intel.com>
> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> v2 -> v3:
> *) remove cap/ecap_mask in iommu_nesting_info.
> ---
>  drivers/iommu/intel/iommu.c | 81 +++++++++++++++++++++++++++++++++++++++++++--
>  include/linux/intel-iommu.h | 16 +++++++++
>  2 files changed, 95 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index a9504cb..9f7ad1a 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -5659,12 +5659,16 @@ static inline bool iommu_pasid_support(void)
>  static inline bool nested_mode_support(void)
>  {
>  	struct dmar_drhd_unit *drhd;
> -	struct intel_iommu *iommu;
> +	struct intel_iommu *iommu, *prev = NULL;
>  	bool ret = true;
>  
>  	rcu_read_lock();
>  	for_each_active_iommu(iommu, drhd) {
> -		if (!sm_supported(iommu) || !ecap_nest(iommu->ecap)) {
> +		if (!prev)
> +			prev = iommu;
> +		if (!sm_supported(iommu) || !ecap_nest(iommu->ecap) ||
> +		    (VTD_CAP_MASK & (iommu->cap ^ prev->cap)) ||
> +		    (VTD_ECAP_MASK & (iommu->ecap ^ prev->ecap))) {
>  			ret = false;
>  			break;>  		}
> @@ -6079,6 +6083,78 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain,
>  	return ret;
>  }
>  
> +static int intel_iommu_get_nesting_info(struct iommu_domain *domain,
> +					struct iommu_nesting_info *info)
> +{
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +	u64 cap = VTD_CAP_MASK, ecap = VTD_ECAP_MASK;
> +	struct device_domain_info *domain_info;
> +	struct iommu_nesting_info_vtd vtd;
> +	unsigned long flags;
> +	unsigned int size;
> +
> +	if (domain->type != IOMMU_DOMAIN_UNMANAGED ||
> +	    !(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE))
> +		return -ENODEV;
> +
> +	if (!info)
> +		return -EINVAL;
> +
> +	size = sizeof(struct iommu_nesting_info) +
> +		sizeof(struct iommu_nesting_info_vtd);
> +	/*
> +	 * if provided buffer size is smaller than expected, should
> +	 * return 0 and also the expected buffer size to caller.
> +	 */
> +	if (info->size < size) {
> +		info->size = size;
> +		return 0;
> +	}
> +
> +	spin_lock_irqsave(&device_domain_lock, flags);
> +	/*
> +	 * arbitrary select the first domain_info as all nesting
> +	 * related capabilities should be consistent across iommu
> +	 * units.
> +	 */
> +	domain_info = list_first_entry(&dmar_domain->devices,
> +				       struct device_domain_info, link);
> +	cap &= domain_info->iommu->cap;
> +	ecap &= domain_info->iommu->ecap;
> +	spin_unlock_irqrestore(&device_domain_lock, flags);
> +
> +	info->format = IOMMU_PASID_FORMAT_INTEL_VTD;
> +	info->features = IOMMU_NESTING_FEAT_SYSWIDE_PASID |
> +			 IOMMU_NESTING_FEAT_BIND_PGTBL |
> +			 IOMMU_NESTING_FEAT_CACHE_INVLD;
> +	info->addr_width = dmar_domain->gaw;
> +	info->pasid_bits = ilog2(intel_pasid_max_id);
> +	info->padding = 0;
> +	vtd.flags = 0;
> +	vtd.padding = 0;
> +	vtd.cap_reg = cap;
> +	vtd.ecap_reg = ecap;
> +
> +	memcpy(info->data, &vtd, sizeof(vtd));
> +	return 0;
> +}
> +
> +static int intel_iommu_domain_get_attr(struct iommu_domain *domain,
> +				       enum iommu_attr attr, void *data)
> +{
> +	switch (attr) {
> +	case DOMAIN_ATTR_NESTING:
> +	{
> +		struct iommu_nesting_info *info =
> +				(struct iommu_nesting_info *)data;
> +
> +		return intel_iommu_get_nesting_info(domain, info);
> +	}
> +	default:
> +		return -ENODEV;
-ENOENT?
> +	}
> +}
> +
>  /*
>   * Check that the device does not live on an external facing PCI port that is
>   * marked as untrusted. Such devices should not be able to apply quirks and
> @@ -6101,6 +6177,7 @@ const struct iommu_ops intel_iommu_ops = {
>  	.domain_alloc		= intel_iommu_domain_alloc,
>  	.domain_free		= intel_iommu_domain_free,
>  	.domain_set_attr	= intel_iommu_domain_set_attr,
> +	.domain_get_attr	= intel_iommu_domain_get_attr,
>  	.attach_dev		= intel_iommu_attach_device,
>  	.detach_dev		= intel_iommu_detach_device,
>  	.aux_attach_dev		= intel_iommu_aux_attach_device,
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 18f292e..c4ed0d4 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -197,6 +197,22 @@
>  #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
>  #define ecap_sc_support(e)	((e >> 7) & 0x1) /* Snooping Control */
>  
> +/* Nesting Support Capability Alignment */
> +#define VTD_CAP_FL1GP		BIT_ULL(56)
> +#define VTD_CAP_FL5LP		BIT_ULL(60)
> +#define VTD_ECAP_PRS		BIT_ULL(29)
> +#define VTD_ECAP_ERS		BIT_ULL(30)
> +#define VTD_ECAP_SRS		BIT_ULL(31)
> +#define VTD_ECAP_EAFS		BIT_ULL(34)
> +#define VTD_ECAP_PASID		BIT_ULL(40)

> +
> +/* Only capabilities marked in below MASKs are reported */
> +#define VTD_CAP_MASK		(VTD_CAP_FL1GP | VTD_CAP_FL5LP)
> +
> +#define VTD_ECAP_MASK		(VTD_ECAP_PRS | VTD_ECAP_ERS | \
> +				 VTD_ECAP_SRS | VTD_ECAP_EAFS | \
> +				 VTD_ECAP_PASID)
> +
>  /* Virtual command interface capability */
>  #define vccap_pasid(v)		(((v) & DMA_VCS_PAS)) /* PASID allocation */
>  
> 
Thanks

Eric


_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 03/15] iommu/smmu: Report empty domain nesting info
  2020-07-12 11:20 ` [PATCH v5 03/15] iommu/smmu: Report empty " Liu Yi L
  2020-07-13 13:14   ` Will Deacon
@ 2020-07-17 17:18   ` Auger Eric
  2020-07-20  7:26     ` Liu, Yi L
  1 sibling, 1 reply; 58+ messages in thread
From: Auger Eric @ 2020-07-17 17:18 UTC (permalink / raw)
  To: Liu Yi L, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, Robin Murphy, Will Deacon, hao.wu

Yi,

On 7/12/20 1:20 PM, Liu Yi L wrote:
> This patch is added as instead of returning a boolean for DOMAIN_ATTR_NESTING,
> iommu_domain_get_attr() should return an iommu_nesting_info handle.

you may add in the commit message you return an empty nesting info
struct for now as true nesting is not yet supported by the SMMUs.

Besides:
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric
> 
> Cc: Will Deacon <will@kernel.org>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Suggested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> v4 -> v5:
> *) address comments from Eric Auger.
> ---
>  drivers/iommu/arm-smmu-v3.c | 29 +++++++++++++++++++++++++++--
>  drivers/iommu/arm-smmu.c    | 29 +++++++++++++++++++++++++++--
>  2 files changed, 54 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index f578677..ec815d7 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -3019,6 +3019,32 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
>  	return group;
>  }
>  
> +static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
> +					void *data)
> +{
> +	struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
> +	unsigned int size;
> +
> +	if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> +		return -ENODEV;
> +
> +	size = sizeof(struct iommu_nesting_info);
> +
> +	/*
> +	 * if provided buffer size is smaller than expected, should
> +	 * return 0 and also the expected buffer size to caller.
> +	 */
> +	if (info->size < size) {
> +		info->size = size;
> +		return 0;
> +	}
> +
> +	/* report an empty iommu_nesting_info for now */
> +	memset(info, 0x0, size);
> +	info->size = size;
> +	return 0;
> +}
> +
>  static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
>  				    enum iommu_attr attr, void *data)
>  {
> @@ -3028,8 +3054,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
>  	case IOMMU_DOMAIN_UNMANAGED:
>  		switch (attr) {
>  		case DOMAIN_ATTR_NESTING:
> -			*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
> -			return 0;
> +			return arm_smmu_domain_nesting_info(smmu_domain, data);
>  		default:
>  			return -ENODEV;
>  		}
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 243bc4c..09e2f1b 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -1506,6 +1506,32 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
>  	return group;
>  }
>  
> +static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
> +					void *data)
> +{
> +	struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
> +	unsigned int size;
> +
> +	if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> +		return -ENODEV;
> +
> +	size = sizeof(struct iommu_nesting_info);
> +
> +	/*
> +	 * if provided buffer size is smaller than expected, should
> +	 * return 0 and also the expected buffer size to caller.
> +	 */
> +	if (info->size < size) {
> +		info->size = size;
> +		return 0;
> +	}
> +
> +	/* report an empty iommu_nesting_info for now */
> +	memset(info, 0x0, size);
> +	info->size = size;
> +	return 0;
> +}
> +
>  static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
>  				    enum iommu_attr attr, void *data)
>  {
> @@ -1515,8 +1541,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
>  	case IOMMU_DOMAIN_UNMANAGED:
>  		switch (attr) {
>  		case DOMAIN_ATTR_NESTING:
> -			*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
> -			return 0;
> +			return arm_smmu_domain_nesting_info(smmu_domain, data);
>  		default:
>  			return -ENODEV;
>  		}
> 

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 04/15] vfio/type1: Report iommu nesting info to userspace
  2020-07-12 11:20 ` [PATCH v5 04/15] vfio/type1: Report iommu nesting info to userspace Liu Yi L
@ 2020-07-17 17:34   ` Auger Eric
  2020-07-20  7:51     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Auger Eric @ 2020-07-17 17:34 UTC (permalink / raw)
  To: Liu Yi L, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

Yi,

On 7/12/20 1:20 PM, Liu Yi L wrote:
> This patch exports iommu nesting capability info to user space through
> VFIO. User space is expected to check this info for supported uAPIs (e.g.
it is not only to check the supported uAPIS but rather to know which
callbacks it must call upon vIOMMU events and which features are
supported by the physical IOMMU.
> PASID alloc/free, bind page table, and cache invalidation) and the vendor
> specific format information for first level/stage page table that will be
> bound to.
> 
> The nesting info is available only after the nesting iommu type is set
> for a container.
to NESTED type
 Current implementation imposes one limitation - one
> nesting container should include at most one group. The philosophy of
> vfio container is having all groups/devices within the container share
> the same IOMMU context. When vSVA is enabled, one IOMMU context could
> include one 2nd-level address space and multiple 1st-level address spaces.
> While the 2nd-level address space is reasonably sharable by multiple groups
> , blindly sharing 1st-level address spaces across all groups within the
> container might instead break the guest expectation. In the future sub/
> super container concept might be introduced to allow partial address space
> sharing within an IOMMU context. But for now let's go with this restriction
> by requiring singleton container for using nesting iommu features. Below
> link has the related discussion about this decision.

Maybe add a note about SMMU related changes spotted by Jean.
> 
> https://lkml.org/lkml/2020/5/15/1028
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
> v4 -> v5:
> *) address comments from Eric Auger.
> *) return struct iommu_nesting_info for VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as
>    cap is much "cheap", if needs extension in future, just define another cap.
>    https://lore.kernel.org/kvm/20200708132947.5b7ee954@x1.home/
> 
> v3 -> v4:
> *) address comments against v3.
> 
> v1 -> v2:
> *) added in v2
> ---
>  drivers/vfio/vfio_iommu_type1.c | 102 +++++++++++++++++++++++++++++++++++-----
>  include/uapi/linux/vfio.h       |  19 ++++++++
>  2 files changed, 109 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 3bd70ff..ed80104 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit,
>  		 "Maximum number of user DMA mappings per container (65535).");
>  
>  struct vfio_iommu {
> -	struct list_head	domain_list;
> -	struct list_head	iova_list;
> -	struct vfio_domain	*external_domain; /* domain for external user */
> -	struct mutex		lock;
> -	struct rb_root		dma_list;
> -	struct blocking_notifier_head notifier;
> -	unsigned int		dma_avail;
> -	uint64_t		pgsize_bitmap;
> -	bool			v2;
> -	bool			nesting;
> -	bool			dirty_page_tracking;
> -	bool			pinned_page_dirty_scope;
> +	struct list_head		domain_list;
> +	struct list_head		iova_list;
> +	/* domain for external user */
> +	struct vfio_domain		*external_domain;
> +	struct mutex			lock;
> +	struct rb_root			dma_list;
> +	struct blocking_notifier_head	notifier;
> +	unsigned int			dma_avail;
> +	uint64_t			pgsize_bitmap;
> +	bool				v2;
> +	bool				nesting;
> +	bool				dirty_page_tracking;
> +	bool				pinned_page_dirty_scope;
> +	struct iommu_nesting_info	*nesting_info;
>  };
>  
>  struct vfio_domain {
> @@ -130,6 +132,9 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)	\
>  					(!list_empty(&iommu->domain_list))
>  
> +#define CONTAINER_HAS_DOMAIN(iommu)	(((iommu)->external_domain) || \
> +					 (!list_empty(&(iommu)->domain_list)))
> +
>  #define DIRTY_BITMAP_BYTES(n)	(ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)
>  
>  /*
> @@ -1929,6 +1934,13 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
>  
>  	list_splice_tail(iova_copy, iova);
>  }
> +
> +static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
> +{
> +	kfree(iommu->nesting_info);
> +	iommu->nesting_info = NULL;
> +}
> +
>  static int vfio_iommu_type1_attach_group(void *iommu_data,
>  					 struct iommu_group *iommu_group)
>  {
> @@ -1959,6 +1971,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  		}
>  	}
>  
> +	/* Nesting type container can include only one group */
> +	if (iommu->nesting && CONTAINER_HAS_DOMAIN(iommu)) {
> +		mutex_unlock(&iommu->lock);
> +		return -EINVAL;
> +	}
> +
>  	group = kzalloc(sizeof(*group), GFP_KERNEL);
>  	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
>  	if (!group || !domain) {
> @@ -2029,6 +2047,32 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  	if (ret)
>  		goto out_domain;
>  
> +	/* Nesting cap info is available only after attaching */
> +	if (iommu->nesting) {
> +		struct iommu_nesting_info tmp = { .size = 0, };
> +
> +		/* First get the size of vendor specific nesting info */
> +		ret = iommu_domain_get_attr(domain->domain,
> +					    DOMAIN_ATTR_NESTING,
> +					    &tmp);
> +		if (ret)
> +			goto out_detach;
> +
> +		iommu->nesting_info = kzalloc(tmp.size, GFP_KERNEL);
> +		if (!iommu->nesting_info) {
> +			ret = -ENOMEM;
> +			goto out_detach;
> +		}
> +
> +		/* Now get the nesting info */
> +		iommu->nesting_info->size = tmp.size;
> +		ret = iommu_domain_get_attr(domain->domain,
> +					    DOMAIN_ATTR_NESTING,
> +					    iommu->nesting_info);
> +		if (ret)
> +			goto out_detach;
> +	}
> +
>  	/* Get aperture info */
>  	iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY, &geo);
>  
> @@ -2138,6 +2182,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  	return 0;
>  
>  out_detach:
> +	vfio_iommu_release_nesting_info(iommu);
>  	vfio_iommu_detach_group(domain, group);
>  out_domain:
>  	iommu_domain_free(domain->domain);
> @@ -2338,6 +2383,8 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
>  					vfio_iommu_unmap_unpin_all(iommu);
>  				else
>  					vfio_iommu_unmap_unpin_reaccount(iommu);
> +
> +				vfio_iommu_release_nesting_info(iommu);
>  			}
>  			iommu_domain_free(domain->domain);
>  			list_del(&domain->next);
> @@ -2546,6 +2593,31 @@ static int vfio_iommu_migration_build_caps(struct vfio_iommu *iommu,
>  	return vfio_info_add_capability(caps, &cap_mig.header, sizeof(cap_mig));
>  }
>  
> +static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
> +					   struct vfio_info_cap *caps)
> +{
> +	struct vfio_info_cap_header *header;
> +	struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
> +	size_t size;
> +
> +	size = offsetof(struct vfio_iommu_type1_info_cap_nesting, info) +
> +		iommu->nesting_info->size;
> +
> +	header = vfio_info_cap_add(caps, size,
> +				   VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> +	if (IS_ERR(header))
> +		return PTR_ERR(header);
> +
> +	nesting_cap = container_of(header,
> +				   struct vfio_iommu_type1_info_cap_nesting,
> +				   header);
> +
> +	memcpy(&nesting_cap->info, iommu->nesting_info,
> +	       iommu->nesting_info->size);
you must check whether nesting_info is non NULL before doing that.

Besides I agree with Jean on the fact it may be better to not report the
capability if nested is not supported.
> +
> +	return 0;
> +}
> +
>  static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
>  				     unsigned long arg)
>  {
> @@ -2581,6 +2653,12 @@ static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
>  	if (!ret)
>  		ret = vfio_iommu_iova_build_caps(iommu, &caps);
>  
> +	if (iommu->nesting_info) {
> +		ret = vfio_iommu_info_add_nesting_cap(iommu, &caps);
> +		if (ret)
> +			return ret;
> +	}
> +
>  	mutex_unlock(&iommu->lock);
>  
>  	if (ret)
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 9204705..46a78af 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -14,6 +14,7 @@
>  
>  #include <linux/types.h>
>  #include <linux/ioctl.h>
> +#include <linux/iommu.h>
>  
>  #define VFIO_API_VERSION	0
>  
> @@ -1039,6 +1040,24 @@ struct vfio_iommu_type1_info_cap_migration {
>  	__u64	max_dirty_bitmap_size;		/* in bytes */
>  };
>  
> +/*
> + * The nesting capability allows to report the related capability
> + * and info for nesting iommu type.
> + *
> + * The structures below define version 1 of this capability.
> + *
> + * User space selected VFIO_TYPE1_NESTING_IOMMU type should check
> + * this capto get supported features.
s/capto/capability to get
> + *
> + * @info: the nesting info provided by IOMMU driver.
> + */
> +#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING  3
> +
> +struct vfio_iommu_type1_info_cap_nesting {
> +	struct	vfio_info_cap_header header;
> +	struct iommu_nesting_info info;
> +};
> +
>  #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>  
>  /**
> 

Thanks

Eric

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 14/15] vfio: Document dual stage control
  2020-07-12 11:21 ` [PATCH v5 14/15] vfio: Document dual stage control Liu Yi L
@ 2020-07-18 13:39   ` Auger Eric
  2020-07-25  8:54     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Auger Eric @ 2020-07-18 13:39 UTC (permalink / raw)
  To: Liu Yi L, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

Hi Yi,

On 7/12/20 1:21 PM, Liu Yi L wrote:
> From: Eric Auger <eric.auger@redhat.com>
> 
> The VFIO API was enhanced to support nested stage control: a bunch of
> new iotcls and usage guideline.
ioctls
> 
> Let's document the process to follow to set up nested mode.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
> v3 -> v4:
> *) add review-by from Stefan Hajnoczi
> 
> v2 -> v3:
> *) address comments from Stefan Hajnoczi
> 
> v1 -> v2:
> *) new in v2, compared with Eric's original version, pasid table bind
>    and fault reporting is removed as this series doesn't cover them.
>    Original version from Eric.
>    https://lkml.org/lkml/2020/3/20/700
> ---
>  Documentation/driver-api/vfio.rst | 67 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 67 insertions(+)
> 
> diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
> index f1a4d3c..0672c45 100644
> --- a/Documentation/driver-api/vfio.rst
> +++ b/Documentation/driver-api/vfio.rst
> @@ -239,6 +239,73 @@ group and can access them as follows::
>  	/* Gratuitous device reset and go... */
>  	ioctl(device, VFIO_DEVICE_RESET);
>  
> +IOMMU Dual Stage Control
> +------------------------
> +
> +Some IOMMUs support 2 stages/levels of translation. Stage corresponds to
> +the ARM terminology while level corresponds to Intel's VTD terminology.
> +In the following text we use either without distinction.
> +
> +This is useful when the guest is exposed with a virtual IOMMU and some
> +devices are assigned to the guest through VFIO. Then the guest OS can use
> +stage 1 (GIOVA -> GPA or GVA->GPA), while the hypervisor uses stage 2 for
> +VM isolation (GPA -> HPA).
> +
> +Under dual stage translation, the guest gets ownership of the stage 1 page
> +tables and also owns stage 1 configuration structures. The hypervisor owns
> +the root configuration structure (for security reason), including stage 2
> +configuration. This works as long as configuration structures and page table
> +formats are compatible between the virtual IOMMU and the physical IOMMU.
> +
> +Assuming the HW supports it, this nested mode is selected by choosing the
> +VFIO_TYPE1_NESTING_IOMMU type through:
> +
> +    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);
> +
> +This forces the hypervisor to use the stage 2, leaving stage 1 available
> +for guest usage. The guest stage 1 format depends on IOMMU vendor, and
> +it is the same with the nesting configuration method. User space should
> +check the format and configuration method after setting nesting type by
> +using:
> +
> +    ioctl(container->fd, VFIO_IOMMU_GET_INFO, &nesting_info);
> +
> +Details can be found in Documentation/userspace-api/iommu.rst. For Intel
> +VT-d, each stage 1 page table is bound to host by:
> +
> +    nesting_op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL;
> +    memcpy(&nesting_op->data, &bind_data, sizeof(bind_data));
> +    ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
> +
> +As mentioned above, guest OS may use stage 1 for GIOVA->GPA or GVA->GPA.
the guest OS, here and below?
> +GVA->GPA page tables are available when PASID (Process Address Space ID)
> +is exposed to guest. e.g. guest with PASID-capable devices assigned. For
> +such page table binding, the bind_data should include PASID info, which
> +is allocated by guest itself or by host. This depends on hardware vendor.
> +e.g. Intel VT-d requires to allocate PASID from host. This requirement is

> +defined by the Virtual Command Support in VT-d 3.0 spec, guest software
> +running on VT-d should allocate PASID from host kernel.
because VTD 3.0 requires the unicity of the PASID, system wide, instead
of the above repetition.

 To allocate PASID
> +from host, user space should check the IOMMU_NESTING_FEAT_SYSWIDE_PASID
> +bit of the nesting info reported from host kernel. VFIO reports the nesting
> +info by VFIO_IOMMU_GET_INFO. User space could allocate PASID from host by:
if SYSWIDE_PASID requirement is exposed, the userspace *must* allocate ...
> +
> +    req.flags = VFIO_IOMMU_ALLOC_PASID;
> +    ioctl(container, VFIO_IOMMU_PASID_REQUEST, &req);
> +
> +With first stage/level page table bound to host, it allows to combine the
> +guest stage 1 translation along with the hypervisor stage 2 translation to
> +get final address.
> +
> +When the guest invalidates stage 1 related caches, invalidations must be
> +forwarded to the host through
> +
> +    nesting_op->flags = VFIO_IOMMU_NESTING_OP_CACHE_INVLD;
> +    memcpy(&nesting_op->data, &inv_data, sizeof(inv_data));
> +    ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
> +
> +Those invalidations can happen at various granularity levels, page, context,
> +...
> +
>  VFIO User API
>  -------------------------------------------------------------------------------
I see you dropped the unrecoverable error reporting part of the original
contribution. By the way don't you need any error handling for either of
the use cases on vtd?
>  
> 
Thanks

Eric

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 08/15] iommu: Pass domain to sva_unbind_gpasid()
  2020-07-12 11:21 ` [PATCH v5 08/15] iommu: Pass domain to sva_unbind_gpasid() Liu Yi L
@ 2020-07-19 15:37   ` Auger Eric
  2020-07-20  9:06     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Auger Eric @ 2020-07-19 15:37 UTC (permalink / raw)
  To: Liu Yi L, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

Yi,

On 7/12/20 1:21 PM, Liu Yi L wrote:
> From: Yi Sun <yi.y.sun@intel.com>
> 
> Current interface is good enough for SVA virtualization on an assigned
> physical PCI device, but when it comes to mediated devices, a physical
> device may attached with multiple aux-domains. Also, for guest unbind,
> the PASID to be unbind should be allocated to the VM. This check requires
> to know the ioasid_set which is associated with the domain.
> 
> So this interface needs to pass in domain info. Then the iommu driver is
> able to know which domain will be used for the 2nd stage translation of
> the nesting mode and also be able to do PASID ownership check. This patch
> passes @domain per the above reason.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
> v2 -> v3:
> *) pass in domain info only
> *) use ioasid_t for pasid instead of int type
> 
> v1 -> v2:
> *) added in v2.
> ---
>  drivers/iommu/intel/svm.c   | 3 ++-
>  drivers/iommu/iommu.c       | 2 +-
>  include/linux/intel-iommu.h | 3 ++-
>  include/linux/iommu.h       | 3 ++-
>  4 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index b9a9c55..d2c0e1a 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -432,7 +432,8 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
>  	return ret;
>  }
>  
> -int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> +int intel_svm_unbind_gpasid(struct iommu_domain *domain,
> +			    struct device *dev, ioasid_t pasid)
int -> ioasid_t proto change is not described in the commit message,
>  {
>  	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
>  	struct intel_svm_dev *sdev;
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 7910249..d3e554c 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2151,7 +2151,7 @@ int __iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
>  	if (unlikely(!domain->ops->sva_unbind_gpasid))
>  		return -ENODEV;
>  
> -	return domain->ops->sva_unbind_gpasid(dev, data->hpasid);
> +	return domain->ops->sva_unbind_gpasid(domain, dev, data->hpasid);
>  }
>  EXPORT_SYMBOL_GPL(__iommu_sva_unbind_gpasid);
>  
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 0d0ab32..18f292e 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -738,7 +738,8 @@ extern int intel_svm_enable_prq(struct intel_iommu *iommu);
>  extern int intel_svm_finish_prq(struct intel_iommu *iommu);
>  int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
>  			  struct iommu_gpasid_bind_data *data);
> -int intel_svm_unbind_gpasid(struct device *dev, int pasid);
> +int intel_svm_unbind_gpasid(struct iommu_domain *domain,
> +			    struct device *dev, ioasid_t pasid);
>  struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
>  				 void *drvdata);
>  void intel_svm_unbind(struct iommu_sva *handle);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index e84a1d5..ca5edd8 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -303,7 +303,8 @@ struct iommu_ops {
>  	int (*sva_bind_gpasid)(struct iommu_domain *domain,
>  			struct device *dev, struct iommu_gpasid_bind_data *data);
>  
> -	int (*sva_unbind_gpasid)(struct device *dev, int pasid);
> +	int (*sva_unbind_gpasid)(struct iommu_domain *domain,
> +				 struct device *dev, ioasid_t pasid);
>  
>  	int (*def_domain_type)(struct device *dev);
>  
> 
Besides
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 06/15] iommu/vt-d: Support setting ioasid set to domain
  2020-07-12 11:21 ` [PATCH v5 06/15] iommu/vt-d: Support setting ioasid set to domain Liu Yi L
@ 2020-07-19 15:38   ` Auger Eric
  2020-07-20  8:11     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Auger Eric @ 2020-07-19 15:38 UTC (permalink / raw)
  To: Liu Yi L, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

Yi,

On 7/12/20 1:21 PM, Liu Yi L wrote:
> From IOMMU p.o.v., PASIDs allocated and managed by external components
> (e.g. VFIO) will be passed in for gpasid_bind/unbind operation. IOMMU
> needs some knowledge to check the PASID ownership, hence add an interface
> for those components to tell the PASID owner.
> 
> In latest kernel design, PASID ownership is managed by IOASID set where
> the PASID is allocated from. This patch adds support for setting ioasid
> set ID to the domains used for nesting/vSVA. Subsequent SVA operations
> on the PASID will be checked against its IOASID set for proper ownership.
Subsequent SVA operations will check the PASID against its IOASID set
for proper ownership.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> v4 -> v5:
> *) address comments from Eric Auger.
> ---
>  drivers/iommu/intel/iommu.c | 22 ++++++++++++++++++++++
>  include/linux/intel-iommu.h |  4 ++++
>  include/linux/iommu.h       |  1 +
>  3 files changed, 27 insertions(+)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 72ae6a2..4d54198 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1793,6 +1793,7 @@ static struct dmar_domain *alloc_domain(int flags)
>  	if (first_level_by_default())
>  		domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
>  	domain->has_iotlb_device = false;
> +	domain->ioasid_sid = INVALID_IOASID_SET;
>  	INIT_LIST_HEAD(&domain->devices);
>  
>  	return domain;
> @@ -6039,6 +6040,27 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain,
>  		}
>  		spin_unlock_irqrestore(&device_domain_lock, flags);
>  		break;
> +	case DOMAIN_ATTR_IOASID_SID:
> +	{
> +		int sid = *(int *)data;
> +
> +		if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE)) {
> +			ret = -ENODEV;
> +			break;
> +		}
> +		spin_lock_irqsave(&device_domain_lock, flags);
I think the lock should be taken before the DOMAIN_FLAG_NESTING_MODE
check. Otherwise, the flags can be theretically changed inbetween the
check and the test below?

Thanks

Eric
> +		if (dmar_domain->ioasid_sid != INVALID_IOASID_SET &&
> +		    dmar_domain->ioasid_sid != sid) {
> +			pr_warn_ratelimited("multi ioasid_set (%d:%d) setting",
> +					    dmar_domain->ioasid_sid, sid);
> +			ret = -EBUSY;
> +			spin_unlock_irqrestore(&device_domain_lock, flags);
> +			break;
> +		}
> +		dmar_domain->ioasid_sid = sid;
> +		spin_unlock_irqrestore(&device_domain_lock, flags);
> +		break;
> +	}
>  	default:
>  		ret = -EINVAL;
>  		break;
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 3f23c26..0d0ab32 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -549,6 +549,10 @@ struct dmar_domain {
>  					   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
>  	u64		max_addr;	/* maximum mapped address */
>  
> +	int		ioasid_sid;	/*
> +					 * the ioasid set which tracks all
> +					 * PASIDs used by the domain.
> +					 */
>  	int		default_pasid;	/*
>  					 * The default pasid used for non-SVM
>  					 * traffic on mediated devices.
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 7ca9d48..e84a1d5 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -124,6 +124,7 @@ enum iommu_attr {
>  	DOMAIN_ATTR_FSL_PAMUV1,
>  	DOMAIN_ATTR_NESTING,	/* two stages of translation */
>  	DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
> +	DOMAIN_ATTR_IOASID_SID,
>  	DOMAIN_ATTR_MAX,
>  };
>  
> 

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 05/15] vfio: Add PASID allocation/free support
  2020-07-12 11:21 ` [PATCH v5 05/15] vfio: Add PASID allocation/free support Liu Yi L
@ 2020-07-19 15:38   ` Auger Eric
  2020-07-20  8:03     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Auger Eric @ 2020-07-19 15:38 UTC (permalink / raw)
  To: Liu Yi L, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

Yi,

On 7/12/20 1:21 PM, Liu Yi L wrote:
> Shared Virtual Addressing (a.k.a Shared Virtual Memory) allows sharing
> multiple process virtual address spaces with the device for simplified
> programming model. PASID is used to tag an virtual address space in DMA
> requests and to identify the related translation structure in IOMMU. When
> a PASID-capable device is assigned to a VM, we want the same capability
> of using PASID to tag guest process virtual address spaces to achieve
> virtual SVA (vSVA).
> 
> PASID management for guest is vendor specific. Some vendors (e.g. Intel
> VT-d) requires system-wide managed PASIDs cross all devices, regardless
across?
> of whether a device is used by host or assigned to guest. Other vendors
> (e.g. ARM SMMU) may allow PASIDs managed per-device thus could be fully
> delegated to the guest for assigned devices.
> 
> For system-wide managed PASIDs, this patch introduces a vfio module to
> handle explicit PASID alloc/free requests from guest. Allocated PASIDs
> are associated to a process (or, mm_struct) in IOASID core. A vfio_mm
> object is introduced to track mm_struct. Multiple VFIO containers within
> a process share the same vfio_mm object.
> 
> A quota mechanism is provided to prevent malicious user from exhausting
> available PASIDs. Currently the quota is a global parameter applied to
> all VFIO devices. In the future per-device quota might be supported too.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
> v4 -> v5:
> *) address comments from Eric Auger.
> *) address the comments from Alex on the pasid free range support. Added
>    per vfio_mm pasid r-b tree.
>    https://lore.kernel.org/kvm/20200709082751.320742ab@x1.home/
> 
> v3 -> v4:
> *) fix lock leam in vfio_mm_get_from_task()
> *) drop pasid_quota field in struct vfio_mm
> *) vfio_mm_get_from_task() returns ERR_PTR(-ENOTTY) when !CONFIG_VFIO_PASID
> 
> v1 -> v2:
> *) added in v2, split from the pasid alloc/free support of v1
> ---
>  drivers/vfio/Kconfig      |   5 +
>  drivers/vfio/Makefile     |   1 +
>  drivers/vfio/vfio_pasid.c | 235 ++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/vfio.h      |  28 ++++++
>  4 files changed, 269 insertions(+)
>  create mode 100644 drivers/vfio/vfio_pasid.c
> 
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index fd17db9..3d8a108 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -19,6 +19,11 @@ config VFIO_VIRQFD
>  	depends on VFIO && EVENTFD
>  	default n
>  
> +config VFIO_PASID
> +	tristate
> +	depends on IOASID && VFIO
> +	default n
> +
>  menuconfig VFIO
>  	tristate "VFIO Non-Privileged userspace driver framework"
>  	depends on IOMMU_API
> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> index de67c47..bb836a3 100644
> --- a/drivers/vfio/Makefile
> +++ b/drivers/vfio/Makefile
> @@ -3,6 +3,7 @@ vfio_virqfd-y := virqfd.o
>  
>  obj-$(CONFIG_VFIO) += vfio.o
>  obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
> +obj-$(CONFIG_VFIO_PASID) += vfio_pasid.o
>  obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
>  obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
>  obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
> diff --git a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c
> new file mode 100644
> index 0000000..66e6054e
> --- /dev/null
> +++ b/drivers/vfio/vfio_pasid.c
> @@ -0,0 +1,235 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2020 Intel Corporation.
> + *     Author: Liu Yi L <yi.l.liu@intel.com>
> + *
> + */
> +
> +#include <linux/vfio.h>
> +#include <linux/eventfd.h>
> +#include <linux/file.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/sched/mm.h>
> +
> +#define DRIVER_VERSION  "0.1"
> +#define DRIVER_AUTHOR   "Liu Yi L <yi.l.liu@intel.com>"
> +#define DRIVER_DESC     "PASID management for VFIO bus drivers"
> +
> +#define VFIO_DEFAULT_PASID_QUOTA	1000
> +static int pasid_quota = VFIO_DEFAULT_PASID_QUOTA;
> +module_param_named(pasid_quota, pasid_quota, uint, 0444);
> +MODULE_PARM_DESC(pasid_quota,
> +		 " Set the quota for max number of PASIDs that an application is allowed to request (default 1000)");
s/ Set/Set
> +
> +struct vfio_mm_token {
> +	unsigned long long val;
> +};
> +
> +struct vfio_mm {
> +	struct kref		kref;
> +	int			ioasid_sid;
> +	struct mutex		pasid_lock;
> +	struct rb_root		pasid_list;
> +	struct list_head	next;
> +	struct vfio_mm_token	token;
> +};
> +
> +static struct mutex		vfio_mm_lock;
> +static struct list_head		vfio_mm_list;
> +
> +struct vfio_pasid {
> +	struct rb_node		node;
> +	ioasid_t		pasid;
> +};
> +
> +static void vfio_remove_all_pasids(struct vfio_mm *vmm);
> +
> +/* called with vfio.vfio_mm_lock held */
> +static void vfio_mm_release(struct kref *kref)
> +{
> +	struct vfio_mm *vmm = container_of(kref, struct vfio_mm, kref);
> +
> +	list_del(&vmm->next);
> +	mutex_unlock(&vfio_mm_lock);
> +	vfio_remove_all_pasids(vmm);
> +	ioasid_free_set(vmm->ioasid_sid, true);
> +	kfree(vmm);
> +}
> +
> +void vfio_mm_put(struct vfio_mm *vmm)
> +{
> +	kref_put_mutex(&vmm->kref, vfio_mm_release, &vfio_mm_lock);
> +}
> +
> +static void vfio_mm_get(struct vfio_mm *vmm)
> +{
> +	kref_get(&vmm->kref);
> +}
> +
> +struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
> +{
> +	struct mm_struct *mm = get_task_mm(task);
> +	struct vfio_mm *vmm;
> +	unsigned long long val = (unsigned long long)mm;
> +	int ret;
> +
> +	mutex_lock(&vfio_mm_lock);
> +	/* Search existing vfio_mm with current mm pointer */
> +	list_for_each_entry(vmm, &vfio_mm_list, next) {
> +		if (vmm->token.val == val) {
> +			vfio_mm_get(vmm);
> +			goto out;
> +		}
> +	}
> +
> +	vmm = kzalloc(sizeof(*vmm), GFP_KERNEL);
> +	if (!vmm) {
> +		vmm = ERR_PTR(-ENOMEM);
> +		goto out;
> +	}
> +
> +	/*
> +	 * IOASID core provides a 'IOASID set' concept to track all
> +	 * PASIDs associated with a token. Here we use mm_struct as
> +	 * the token and create a IOASID set per mm_struct. All the
> +	 * containers of the process share the same IOASID set.
> +	 */
> +	ret = ioasid_alloc_set((struct ioasid_set *)mm, pasid_quota,
> +			       &vmm->ioasid_sid);
> +	if (ret) {
> +		kfree(vmm);
> +		vmm = ERR_PTR(ret);
> +		goto out;
> +	}
> +
> +	kref_init(&vmm->kref);
> +	vmm->token.val = val;
> +	mutex_init(&vmm->pasid_lock);
> +	vmm->pasid_list = RB_ROOT;
> +
> +	list_add(&vmm->next, &vfio_mm_list);
> +out:
> +	mutex_unlock(&vfio_mm_lock);
> +	mmput(mm);
> +	return vmm;
> +}
> +
> +/*
> + * Find PASID within @min and @max
> + */
> +static struct vfio_pasid *vfio_find_pasid(struct vfio_mm *vmm,
> +					  ioasid_t min, ioasid_t max)
> +{
> +	struct rb_node *node = vmm->pasid_list.rb_node;
> +
> +	while (node) {
> +		struct vfio_pasid *vid = rb_entry(node,
> +						struct vfio_pasid, node);
> +
> +		if (max < vid->pasid)
> +			node = node->rb_left;
> +		else if (min > vid->pasid)
> +			node = node->rb_right;
> +		else
> +			return vid;
> +	}
> +
> +	return NULL;
> +}
> +
> +static void vfio_link_pasid(struct vfio_mm *vmm, struct vfio_pasid *new)
> +{
> +	struct rb_node **link = &vmm->pasid_list.rb_node, *parent = NULL;
> +	struct vfio_pasid *vid;
> +
> +	while (*link) {
> +		parent = *link;
> +		vid = rb_entry(parent, struct vfio_pasid, node);
> +
> +		if (new->pasid <= vid->pasid)
> +			link = &(*link)->rb_left;
> +		else
> +			link = &(*link)->rb_right;
> +	}
> +
> +	rb_link_node(&new->node, parent, link);
> +	rb_insert_color(&new->node, &vmm->pasid_list);
> +}
> +
> +static void vfio_remove_pasid(struct vfio_mm *vmm, struct vfio_pasid *vid)
> +{
> +	rb_erase(&vid->node, &vmm->pasid_list); /* unlink pasid */
nit: to be consistent with vfio_unlink_dma, introduce vfio_unlink_pasid
> +	ioasid_free(vid->pasid);
> +	kfree(vid);
> +}
> +
> +static void vfio_remove_all_pasids(struct vfio_mm *vmm)
> +{
> +	struct rb_node *node;
> +
> +	mutex_lock(&vmm->pasid_lock);
> +	while ((node = rb_first(&vmm->pasid_list)))
> +		vfio_remove_pasid(vmm, rb_entry(node, struct vfio_pasid, node));
> +	mutex_unlock(&vmm->pasid_lock);
> +}
> +
> +int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
> +{
> +	ioasid_t pasid;
> +	struct vfio_pasid *vid;
> +
> +	pasid = ioasid_alloc(vmm->ioasid_sid, min, max, NULL);
> +	if (pasid == INVALID_IOASID)
> +		return -ENOSPC;
> +
> +	vid = kzalloc(sizeof(*vid), GFP_KERNEL);
> +	if (!vid) {
> +		ioasid_free(pasid);
> +		return -ENOMEM;
> +	}
> +
> +	vid->pasid = pasid;
> +
> +	mutex_lock(&vmm->pasid_lock);
> +	vfio_link_pasid(vmm, vid);
> +	mutex_unlock(&vmm->pasid_lock);
> +
> +	return pasid;
> +}
I am not totally convinced by your previous reply on EXPORT_SYMBOL_GP()
irrelevance in this patch. But well ;-)


> +
> +void vfio_pasid_free_range(struct vfio_mm *vmm,
> +			   ioasid_t min, ioasid_t max)
> +{
> +	struct vfio_pasid *vid = NULL;
> +
> +	/*
> +	 * IOASID core will notify PASID users (e.g. IOMMU driver) to
> +	 * teardown necessary structures depending on the to-be-freed
> +	 * PASID.
> +	 */
> +	mutex_lock(&vmm->pasid_lock);
> +	while ((vid = vfio_find_pasid(vmm, min, max)) != NULL)
> +		vfio_remove_pasid(vmm, vid);
> +	mutex_unlock(&vmm->pasid_lock);
> +}
> +
> +static int __init vfio_pasid_init(void)
> +{
> +	mutex_init(&vfio_mm_lock);
> +	INIT_LIST_HEAD(&vfio_mm_list);
> +	return 0;
> +}
> +
> +static void __exit vfio_pasid_exit(void)
> +{
> +	WARN_ON(!list_empty(&vfio_mm_list));
> +}
In your previous reply, ie. https://lkml.org/lkml/2020/7/7/273
you said:
"
I guess yes. VFIO_PASID is supposed to be referenced by VFIO_IOMMU_TYPE1
and may be other module. once vfio_pasid_exit() is triggered, that means
its user (VFIO_IOMMU_TYPE1) has been removed. Should all the vfio_mm
instances should have been released. If not, means there is vfio_mm
leak, should be a bug of user module."

if I am not wrong this dependency is not yet known at this stage of the
series? I would rather add this comment either in the commit message or
here.

Thanks

Eric

> +
> +module_init(vfio_pasid_init);
> +module_exit(vfio_pasid_exit);
> +
> +MODULE_VERSION(DRIVER_VERSION);
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 38d3c6a..31472a9 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -97,6 +97,34 @@ extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
>  extern void vfio_unregister_iommu_driver(
>  				const struct vfio_iommu_driver_ops *ops);
>  
> +struct vfio_mm;
> +#if IS_ENABLED(CONFIG_VFIO_PASID)
> +extern struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task);
> +extern void vfio_mm_put(struct vfio_mm *vmm);
> +extern int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max);
> +extern void vfio_pasid_free_range(struct vfio_mm *vmm,
> +				  ioasid_t min, ioasid_t max);
> +#else
> +static inline struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
> +{
> +	return ERR_PTR(-ENOTTY);
> +}
> +
> +static inline void vfio_mm_put(struct vfio_mm *vmm)
> +{
> +}
> +
> +static inline int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
> +{
> +	return -ENOTTY;
> +}
> +
> +static inline void vfio_pasid_free_range(struct vfio_mm *vmm,
> +					  ioasid_t min, ioasid_t max)
> +{
> +}
> +#endif /* CONFIG_VFIO_PASID */
> +
>  /*
>   * External user API
>   */
> 

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 07/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)
  2020-07-12 11:21 ` [PATCH v5 07/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free) Liu Yi L
@ 2020-07-19 15:38   ` Auger Eric
  2020-07-20  8:56     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Auger Eric @ 2020-07-19 15:38 UTC (permalink / raw)
  To: Liu Yi L, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

Yi,

On 7/12/20 1:21 PM, Liu Yi L wrote:
> This patch allows user space to request PASID allocation/free, e.g. when
> serving the request from the guest.
> 
> PASIDs that are not freed by userspace are automatically freed when the
> IOASID set is destroyed when process exits.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> v4 -> v5:
> *) address comments from Eric Auger.
> *) the comments for the PASID_FREE request is addressed in patch 5/15 of
>    this series.
> 
> v3 -> v4:
> *) address comments from v3, except the below comment against the range
>    of PASID_FREE request. needs more help on it.
>     "> +if (req.range.min > req.range.max)
> 
>     Is it exploitable that a user can spin the kernel for a long time in
>     the case of a free by calling this with [0, MAX_UINT] regardless of
>     their actual allocations?"
>     https://lore.kernel.org/linux-iommu/20200702151832.048b44d1@x1.home/
> 
> v1 -> v2:
> *) move the vfio_mm related code to be a seprate module
> *) use a single structure for alloc/free, could support a range of PASIDs
> *) fetch vfio_mm at group_attach time instead of at iommu driver open time
> ---
>  drivers/vfio/Kconfig            |  1 +
>  drivers/vfio/vfio_iommu_type1.c | 85 +++++++++++++++++++++++++++++++++++++++++
>  drivers/vfio/vfio_pasid.c       | 10 +++++
>  include/linux/vfio.h            |  6 +++
>  include/uapi/linux/vfio.h       | 37 ++++++++++++++++++
>  5 files changed, 139 insertions(+)
> 
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index 3d8a108..95d90c6 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -2,6 +2,7 @@
>  config VFIO_IOMMU_TYPE1
>  	tristate
>  	depends on VFIO
> +	select VFIO_PASID if (X86)
>  	default n
>  
>  config VFIO_IOMMU_SPAPR_TCE
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index ed80104..55b4065 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -76,6 +76,7 @@ struct vfio_iommu {
>  	bool				dirty_page_tracking;
>  	bool				pinned_page_dirty_scope;
>  	struct iommu_nesting_info	*nesting_info;
> +	struct vfio_mm			*vmm;
>  };
>  
>  struct vfio_domain {
> @@ -1937,6 +1938,11 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
>  
>  static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
>  {
> +	if (iommu->vmm) {
> +		vfio_mm_put(iommu->vmm);
> +		iommu->vmm = NULL;
> +	}
> +
>  	kfree(iommu->nesting_info);
>  	iommu->nesting_info = NULL;
>  }
> @@ -2071,6 +2077,26 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  					    iommu->nesting_info);
>  		if (ret)
>  			goto out_detach;
> +
> +		if (iommu->nesting_info->features &
> +					IOMMU_NESTING_FEAT_SYSWIDE_PASID) {
> +			struct vfio_mm *vmm;
> +			int sid;
> +
> +			vmm = vfio_mm_get_from_task(current);
> +			if (IS_ERR(vmm)) {
> +				ret = PTR_ERR(vmm);
> +				goto out_detach;
> +			}
> +			iommu->vmm = vmm;
> +
> +			sid = vfio_mm_ioasid_sid(vmm);
> +			ret = iommu_domain_set_attr(domain->domain,
> +						    DOMAIN_ATTR_IOASID_SID,
> +						    &sid);
> +			if (ret)
> +				goto out_detach;
> +		}
>  	}
>  
>  	/* Get aperture info */
> @@ -2855,6 +2881,63 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
>  	return -EINVAL;
>  }
>  
> +static int vfio_iommu_type1_pasid_alloc(struct vfio_iommu *iommu,
> +					unsigned int min,
> +					unsigned int max)
> +{
> +	int ret = -EOPNOTSUPP;
> +
> +	mutex_lock(&iommu->lock);
> +	if (iommu->vmm)
> +		ret = vfio_pasid_alloc(iommu->vmm, min, max);
> +	mutex_unlock(&iommu->lock);
> +	return ret;
> +}
> +
> +static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu,
> +				       unsigned int min,
> +				       unsigned int max)
> +{
> +	int ret = -EOPNOTSUPP;
> +
> +	mutex_lock(&iommu->lock);
> +	if (iommu->vmm) {
> +		vfio_pasid_free_range(iommu->vmm, min, max);
> +		ret = 0;
> +	}
> +	mutex_unlock(&iommu->lock);
> +	return ret;
> +}
> +
> +static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu,
> +					  unsigned long arg)
> +{
> +	struct vfio_iommu_type1_pasid_request req;
> +	unsigned long minsz;
> +
> +	minsz = offsetofend(struct vfio_iommu_type1_pasid_request, range);
> +
> +	if (copy_from_user(&req, (void __user *)arg, minsz))
> +		return -EFAULT;
> +
> +	if (req.argsz < minsz || (req.flags & ~VFIO_PASID_REQUEST_MASK))
> +		return -EINVAL;
> +
> +	if (req.range.min > req.range.max)
> +		return -EINVAL;
> +
> +	switch (req.flags & VFIO_PASID_REQUEST_MASK) {
> +	case VFIO_IOMMU_FLAG_ALLOC_PASID:not sure it is worth to introduce both vfio_iommu_type1_pasid_free/alloc
helpers. You could take the lock above, and test iommu->vmm as well above.
> +		return vfio_iommu_type1_pasid_alloc(iommu,
> +					req.range.min, req.range.max);
> +	case VFIO_IOMMU_FLAG_FREE_PASID:
> +		return vfio_iommu_type1_pasid_free(iommu,
> +					req.range.min, req.range.max);
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  				   unsigned int cmd, unsigned long arg)
>  {
> @@ -2871,6 +2954,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  		return vfio_iommu_type1_unmap_dma(iommu, arg);
>  	case VFIO_IOMMU_DIRTY_PAGES:
>  		return vfio_iommu_type1_dirty_pages(iommu, arg);
> +	case VFIO_IOMMU_PASID_REQUEST:
> +		return vfio_iommu_type1_pasid_request(iommu, arg);
>  	default:
>  		return -ENOTTY;
>  	}
> diff --git a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c
> index 66e6054e..ebec244 100644
> --- a/drivers/vfio/vfio_pasid.c
> +++ b/drivers/vfio/vfio_pasid.c
> @@ -61,6 +61,7 @@ void vfio_mm_put(struct vfio_mm *vmm)
>  {
>  	kref_put_mutex(&vmm->kref, vfio_mm_release, &vfio_mm_lock);
>  }
> +EXPORT_SYMBOL_GPL(vfio_mm_put);
>  
>  static void vfio_mm_get(struct vfio_mm *vmm)
>  {
> @@ -114,6 +115,13 @@ struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
>  	mmput(mm);
>  	return vmm;
>  }
> +EXPORT_SYMBOL_GPL(vfio_mm_get_from_task);
> +
> +int vfio_mm_ioasid_sid(struct vfio_mm *vmm)
> +{
> +	return vmm->ioasid_sid;
> +}
> +EXPORT_SYMBOL_GPL(vfio_mm_ioasid_sid);
>  
>  /*
>   * Find PASID within @min and @max
> @@ -197,6 +205,7 @@ int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
>  
>  	return pasid;
>  }
> +EXPORT_SYMBOL_GPL(vfio_pasid_alloc);
>  
>  void vfio_pasid_free_range(struct vfio_mm *vmm,
>  			   ioasid_t min, ioasid_t max)
> @@ -213,6 +222,7 @@ void vfio_pasid_free_range(struct vfio_mm *vmm,
>  		vfio_remove_pasid(vmm, vid);
>  	mutex_unlock(&vmm->pasid_lock);
>  }
> +EXPORT_SYMBOL_GPL(vfio_pasid_free_range);
>  
>  static int __init vfio_pasid_init(void)
>  {
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 31472a9..a111108 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -101,6 +101,7 @@ struct vfio_mm;
>  #if IS_ENABLED(CONFIG_VFIO_PASID)
>  extern struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task);
>  extern void vfio_mm_put(struct vfio_mm *vmm);
> +int vfio_mm_ioasid_sid(struct vfio_mm *vmm);
I still don't get why it does not take an extern as the other functions
implemented in vfio_pasid.c and used in other modules. what's the
difference?
>  extern int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max);
>  extern void vfio_pasid_free_range(struct vfio_mm *vmm,
>  				  ioasid_t min, ioasid_t max);
> @@ -114,6 +115,11 @@ static inline void vfio_mm_put(struct vfio_mm *vmm)
>  {
>  }
>  
> +static inline int vfio_mm_ioasid_sid(struct vfio_mm *vmm)
> +{
> +	return -ENOTTY;
> +}
> +
>  static inline int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
>  {
>  	return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 46a78af..96a115f 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1172,6 +1172,43 @@ struct vfio_iommu_type1_dirty_bitmap_get {
>  
>  #define VFIO_IOMMU_DIRTY_PAGES             _IO(VFIO_TYPE, VFIO_BASE + 17)
>  
> +/**
> + * VFIO_IOMMU_PASID_REQUEST - _IOWR(VFIO_TYPE, VFIO_BASE + 18,
> + *				struct vfio_iommu_type1_pasid_request)
> + *
> + * PASID (Processor Address Space ID) is a PCIe concept for tagging
> + * address spaces in DMA requests. When system-wide PASID allocation
> + * is required by underlying iommu driver (e.g. Intel VT-d), this
by the underlying
> + * provides an interface for userspace to request pasid alloc/free
> + * for its assigned devices. Userspace should check the availability
> + * of this API by checking VFIO_IOMMU_TYPE1_INFO_CAP_NESTING through
> + * VFIO_IOMMU_GET_INFO.
> + *
> + * @flags=VFIO_IOMMU_FLAG_ALLOC_PASID, allocate a single PASID within @range.
> + * @flags=VFIO_IOMMU_FLAG_FREE_PASID, free the PASIDs within @range.
> + * @range is [min, max], which means both @min and @max are inclusive.
> + * ALLOC_PASID and FREE_PASID are mutually exclusive.
> + *
> + * returns: allocated PASID value on success, -errno on failure for
> + *	     ALLOC_PASID;
> + *	     0 for FREE_PASID operation;
> + */
> +struct vfio_iommu_type1_pasid_request {
> +	__u32	argsz;
> +#define VFIO_IOMMU_FLAG_ALLOC_PASID	(1 << 0)
> +#define VFIO_IOMMU_FLAG_FREE_PASID	(1 << 1)
> +	__u32	flags;
> +	struct {
> +		__u32	min;
> +		__u32	max;
> +	} range;
> +};
> +
> +#define VFIO_PASID_REQUEST_MASK	(VFIO_IOMMU_FLAG_ALLOC_PASID | \
> +					 VFIO_IOMMU_FLAG_FREE_PASID)
> +
> +#define VFIO_IOMMU_PASID_REQUEST	_IO(VFIO_TYPE, VFIO_BASE + 18)
> +
>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>  
>  /*
> 
Otherwise, besides the pieces I would have put directly in the
vfio_pasid module patch (EXPORT_SYMBOL_GPL and vfio_mm_ioasid_sid),
looks good.

Thanks

Eric

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 09/15] iommu/vt-d: Check ownership for PASIDs from user-space
  2020-07-12 11:21 ` [PATCH v5 09/15] iommu/vt-d: Check ownership for PASIDs from user-space Liu Yi L
@ 2020-07-19 16:06   ` Auger Eric
  2020-07-20 10:18     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Auger Eric @ 2020-07-19 16:06 UTC (permalink / raw)
  To: Liu Yi L, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

Hi Yi,

On 7/12/20 1:21 PM, Liu Yi L wrote:
> When an IOMMU domain with nesting attribute is used for guest SVA, a
> system-wide PASID is allocated for binding with the device and the domain.
> For security reason, we need to check the PASID passsed from user-space.
passed
> e.g. page table bind/unbind and PASID related cache invalidation.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>  drivers/iommu/intel/iommu.c | 10 ++++++++++
>  drivers/iommu/intel/svm.c   |  7 +++++--
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 4d54198..a9504cb 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -5436,6 +5436,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
>  		int granu = 0;
>  		u64 pasid = 0;
>  		u64 addr = 0;
> +		void *pdata;
>  
>  		granu = to_vtd_granularity(cache_type, inv_info->granularity);
>  		if (granu == -EINVAL) {
> @@ -5456,6 +5457,15 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
>  			 (inv_info->granu.addr_info.flags & IOMMU_INV_ADDR_FLAGS_PASID))
>  			pasid = inv_info->granu.addr_info.pasid;
>  
> +		pdata = ioasid_find(dmar_domain->ioasid_sid, pasid, NULL);
> +		if (!pdata) {
> +			ret = -EINVAL;
> +			goto out_unlock;
> +		} else if (IS_ERR(pdata)) {
> +			ret = PTR_ERR(pdata);
> +			goto out_unlock;
> +		}
> +
>  		switch (BIT(cache_type)) {
>  		case IOMMU_CACHE_INV_TYPE_IOTLB:
>  			/* HW will ignore LSB bits based on address mask */
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index d2c0e1a..212dee0 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -319,7 +319,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
>  	dmar_domain = to_dmar_domain(domain);
>  
>  	mutex_lock(&pasid_mutex);
> -	svm = ioasid_find(INVALID_IOASID_SET, data->hpasid, NULL);
I do not get what the call was supposed to do before that patch?
> +	svm = ioasid_find(dmar_domain->ioasid_sid, data->hpasid, NULL);
>  	if (IS_ERR(svm)) {
>  		ret = PTR_ERR(svm);
>  		goto out;
> @@ -436,6 +436,7 @@ int intel_svm_unbind_gpasid(struct iommu_domain *domain,
>  			    struct device *dev, ioasid_t pasid)
>  {
>  	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> +	struct dmar_domain *dmar_domain;
>  	struct intel_svm_dev *sdev;
>  	struct intel_svm *svm;
>  	int ret = -EINVAL;
> @@ -443,8 +444,10 @@ int intel_svm_unbind_gpasid(struct iommu_domain *domain,
>  	if (WARN_ON(!iommu))
>  		return -EINVAL;
>  
> +	dmar_domain = to_dmar_domain(domain);
> +
>  	mutex_lock(&pasid_mutex);
> -	svm = ioasid_find(INVALID_IOASID_SET, pasid, NULL);
> +	svm = ioasid_find(dmar_domain->ioasid_sid, pasid, NULL);
just to make sure, about the locking, can't domain->ioasid_sid change
under the hood?

Thanks

Eric
>  	if (!svm) {
>  		ret = -EINVAL;
>  		goto out;
> 

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 02/15] iommu: Report domain nesting info
  2020-07-17 16:29   ` Auger Eric
@ 2020-07-20  7:20     ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2020-07-20  7:20 UTC (permalink / raw)
  To: Auger Eric, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	 Jun J, iommu, linux-kernel, Sun, Yi Y, Wu, Hao

Hi Eric,

> From: Auger Eric <eric.auger@redhat.com>
> Sent: Saturday, July 18, 2020 12:29 AM
> 
> Hi Yi,
> 
> On 7/12/20 1:20 PM, Liu Yi L wrote:
> > IOMMUs that support nesting translation needs report the capability info
> s/needs/need to report

yep.

> > to userspace, e.g. the format of first level/stage paging structures.
> It gives information about requirements the userspace needs to implement
> plus other features characterizing the physical implementation.

got it. will add it in next version.

> >
> > This patch reports nesting info by DOMAIN_ATTR_NESTING. Caller can get
> > nesting info after setting DOMAIN_ATTR_NESTING.
> I guess you meant after selecting VFIO_TYPE1_NESTING_IOMMU?

yes, it is. ok, perhaps, it's better to say get nesting info after selecting
VFIO_TYPE1_NESTING_IOMMU.

> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> > v4 -> v5:
> > *) address comments from Eric Auger.
> >
> > v3 -> v4:
> > *) split the SMMU driver changes to be a separate patch
> > *) move the @addr_width and @pasid_bits from vendor specific
> >    part to generic part.
> > *) tweak the description for the @features field of struct
> >    iommu_nesting_info.
> > *) add description on the @data[] field of struct iommu_nesting_info
> >
> > v2 -> v3:
> > *) remvoe cap/ecap_mask in iommu_nesting_info.
> > *) reuse DOMAIN_ATTR_NESTING to get nesting info.
> > *) return an empty iommu_nesting_info for SMMU drivers per Jean'
> >    suggestion.
> > ---
> >  include/uapi/linux/iommu.h | 77
> ++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 77 insertions(+)
> >
> > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> > index 1afc661..d2a47c4 100644
> > --- a/include/uapi/linux/iommu.h
> > +++ b/include/uapi/linux/iommu.h
> > @@ -332,4 +332,81 @@ struct iommu_gpasid_bind_data {
> >  	} vendor;
> >  };
> >
> > +/*
> > + * struct iommu_nesting_info - Information for nesting-capable IOMMU.
> > + *			       user space should check it before using
> > + *			       nesting capability.
> > + *
> > + * @size:	size of the whole structure
> > + * @format:	PASID table entry format, the same definition as struct
> > + *		iommu_gpasid_bind_data @format.
> > + * @features:	supported nesting features.
> > + * @flags:	currently reserved for future extension.
> > + * @addr_width:	The output addr width of first level/stage translation
> > + * @pasid_bits:	Maximum supported PASID bits, 0 represents no PASID
> > + *		support.
> > + * @data:	vendor specific cap info. data[] structure type can be deduced
> > + *		from @format field.
> > + *
> > + *
> +===============+===================================================
> ===+
> > + * | feature       |  Notes                                               |
> > + *
> +===============+===================================================
> ===+
> > + * | SYSWIDE_PASID |  PASIDs are managed in system-wide, instead of per   |
> s/in system-wide/system-wide ?

got it.

> > + * |               |  device. When a device is assigned to userspace or   |
> > + * |               |  VM, proper uAPI (userspace driver framework uAPI,   |
> > + * |               |  e.g. VFIO) must be used to allocate/free PASIDs for |
> > + * |               |  the assigned device.
> Isn't it possible to be more explicit, something like:
>                       |
> System-wide PASID management is mandated by the physical IOMMU. All
> PASIDs allocation must be mediated through the TBD API.

yep, I can add it.

> > + * +---------------+------------------------------------------------------+
> > + * | BIND_PGTBL    |  The owner of the first level/stage page table must  |
> > + * |               |  explicitly bind the page table to associated PASID  |
> > + * |               |  (either the one specified in bind request or the    |
> > + * |               |  default PASID of iommu domain), through userspace   |
> > + * |               |  driver framework uAPI (e.g. VFIO_IOMMU_NESTING_OP). |
> As per your answer in https://lkml.org/lkml/2020/7/6/383, I now
> understand ARM would not expose that BIND_PGTBL nesting feature,

yes, that's my point.

> I still
> think the above wording is a bit confusing. Maybe you may explicitly
> talk about the PASID *entry* that needs to be passed from guest to host.
> On ARM we directly pass the PASID table but when reading the above
> description I fail to determine if this does not fit that description.

yes, I can do it.

> > + * +---------------+------------------------------------------------------+
> > + * | CACHE_INVLD   |  The owner of the first level/stage page table must  |
> > + * |               |  explicitly invalidate the IOMMU cache through uAPI  |
> > + * |               |  provided by userspace driver framework (e.g. VFIO)  |
> > + * |               |  according to vendor-specific requirement when       |
> > + * |               |  changing the page table.                            |
> > + * +---------------+------------------------------------------------------+
> 
> instead of using the "uAPI provided by userspace driver framework (e.g.
> VFIO)", can't we use the so-called IOMMU UAPI terminology which now has
> a userspace documentation?

the problem is current IOMMU UAPI definitions is actually embedded in
other VFIO UAPI. if it can make the description more clear, I can follow
your suggestion. :-)

> 
> > + *
> > + * @data[] types defined for @format:
> > + *
> +================================+==================================
> ===+
> > + * | @format                        | @data[]                             |
> > + *
> +================================+==================================
> ===+
> > + * | IOMMU_PASID_FORMAT_INTEL_VTD   | struct iommu_nesting_info_vtd       |
> > + * +--------------------------------+-------------------------------------+
> > + *
> > + */
> > +struct iommu_nesting_info {
> > +	__u32	size;
> shouldn't it be @argsz to fit the iommu uapi convention and take benefit
> to put the flags field just below?

make sense.

> > +	__u32	format;
> > +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
> > +#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
> > +#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 2)
> > +	__u32	features;
> > +	__u32	flags;
> > +	__u16	addr_width;
> > +	__u16	pasid_bits;
> > +	__u32	padding;
> > +	__u8	data[];
> > +};
> > +
> > +/*
> > + * struct iommu_nesting_info_vtd - Intel VT-d specific nesting info
> > + *
> > + * @flags:	VT-d specific flags. Currently reserved for future
> > + *		extension.
> must be set to 0?

yes. will add it.

Thanks,
Yi Liu

> > + * @cap_reg:	Describe basic capabilities as defined in VT-d capability
> > + *		register.
> > + * @ecap_reg:	Describe the extended capabilities as defined in VT-d
> > + *		extended capability register.
> > + */
> > +struct iommu_nesting_info_vtd {
> > +	__u32	flags;
> > +	__u32	padding;
> > +	__u64	cap_reg;
> > +	__u64	ecap_reg;
> > +};
> > +
> >  #endif /* _UAPI_IOMMU_H */
> Thanks
> 
> Eric
> >

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 03/15] iommu/smmu: Report empty domain nesting info
  2020-07-17 17:18   ` Auger Eric
@ 2020-07-20  7:26     ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2020-07-20  7:26 UTC (permalink / raw)
  To: Auger Eric, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	 Jun J, iommu, linux-kernel, Sun, Yi Y, Robin Murphy,
	Will Deacon, Wu, Hao

Hi Eric,

> From: Auger Eric <eric.auger@redhat.com>
> 
> Yi,
> 
> On 7/12/20 1:20 PM, Liu Yi L wrote:
> > This patch is added as instead of returning a boolean for
> > DOMAIN_ATTR_NESTING,
> > iommu_domain_get_attr() should return an iommu_nesting_info handle.
> 
> you may add in the commit message you return an empty nesting info struct for now
> as true nesting is not yet supported by the SMMUs.

will do.

> Besides:
> Reviewed-by: Eric Auger <eric.auger@redhat.com>

thanks.

Regards,
Yi Liu

> Thanks
> 
> Eric
> >
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Robin Murphy <robin.murphy@arm.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Suggested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> > v4 -> v5:
> > *) address comments from Eric Auger.
> > ---
> >  drivers/iommu/arm-smmu-v3.c | 29 +++++++++++++++++++++++++++--
> >  drivers/iommu/arm-smmu.c    | 29 +++++++++++++++++++++++++++--
> >  2 files changed, 54 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > index f578677..ec815d7 100644
> > --- a/drivers/iommu/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm-smmu-v3.c
> > @@ -3019,6 +3019,32 @@ static struct iommu_group
> *arm_smmu_device_group(struct device *dev)
> >  	return group;
> >  }
> >
> > +static int arm_smmu_domain_nesting_info(struct arm_smmu_domain
> *smmu_domain,
> > +					void *data)
> > +{
> > +	struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
> > +	unsigned int size;
> > +
> > +	if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> > +		return -ENODEV;
> > +
> > +	size = sizeof(struct iommu_nesting_info);
> > +
> > +	/*
> > +	 * if provided buffer size is smaller than expected, should
> > +	 * return 0 and also the expected buffer size to caller.
> > +	 */
> > +	if (info->size < size) {
> > +		info->size = size;
> > +		return 0;
> > +	}
> > +
> > +	/* report an empty iommu_nesting_info for now */
> > +	memset(info, 0x0, size);
> > +	info->size = size;
> > +	return 0;
> > +}
> > +
> >  static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
> >  				    enum iommu_attr attr, void *data)  { @@ -
> 3028,8 +3054,7 @@
> > static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
> >  	case IOMMU_DOMAIN_UNMANAGED:
> >  		switch (attr) {
> >  		case DOMAIN_ATTR_NESTING:
> > -			*(int *)data = (smmu_domain->stage ==
> ARM_SMMU_DOMAIN_NESTED);
> > -			return 0;
> > +			return arm_smmu_domain_nesting_info(smmu_domain,
> data);
> >  		default:
> >  			return -ENODEV;
> >  		}
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index
> > 243bc4c..09e2f1b 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -1506,6 +1506,32 @@ static struct iommu_group
> *arm_smmu_device_group(struct device *dev)
> >  	return group;
> >  }
> >
> > +static int arm_smmu_domain_nesting_info(struct arm_smmu_domain
> *smmu_domain,
> > +					void *data)
> > +{
> > +	struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
> > +	unsigned int size;
> > +
> > +	if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> > +		return -ENODEV;
> > +
> > +	size = sizeof(struct iommu_nesting_info);
> > +
> > +	/*
> > +	 * if provided buffer size is smaller than expected, should
> > +	 * return 0 and also the expected buffer size to caller.
> > +	 */
> > +	if (info->size < size) {
> > +		info->size = size;
> > +		return 0;
> > +	}
> > +
> > +	/* report an empty iommu_nesting_info for now */
> > +	memset(info, 0x0, size);
> > +	info->size = size;
> > +	return 0;
> > +}
> > +
> >  static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
> >  				    enum iommu_attr attr, void *data)  { @@ -
> 1515,8 +1541,7 @@
> > static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
> >  	case IOMMU_DOMAIN_UNMANAGED:
> >  		switch (attr) {
> >  		case DOMAIN_ATTR_NESTING:
> > -			*(int *)data = (smmu_domain->stage ==
> ARM_SMMU_DOMAIN_NESTED);
> > -			return 0;
> > +			return arm_smmu_domain_nesting_info(smmu_domain,
> data);
> >  		default:
> >  			return -ENODEV;
> >  		}
> >

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 04/15] vfio/type1: Report iommu nesting info to userspace
  2020-07-17 17:34   ` Auger Eric
@ 2020-07-20  7:51     ` Liu, Yi L
  2020-07-20  8:33       ` Auger Eric
  0 siblings, 1 reply; 58+ messages in thread
From: Liu, Yi L @ 2020-07-20  7:51 UTC (permalink / raw)
  To: Auger Eric, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	 Jun J, iommu, linux-kernel, Sun, Yi Y, Wu, Hao

Hi Eric,

> From: Auger Eric <eric.auger@redhat.com>
> Sent: Saturday, July 18, 2020 1:34 AM
> 
> Yi,
> 
> On 7/12/20 1:20 PM, Liu Yi L wrote:
> > This patch exports iommu nesting capability info to user space through
> > VFIO. User space is expected to check this info for supported uAPIs (e.g.
> it is not only to check the supported uAPIS but rather to know which
> callbacks it must call upon vIOMMU events and which features are
> supported by the physical IOMMU.

yes, will refine the description per your comment.

> > PASID alloc/free, bind page table, and cache invalidation) and the vendor
> > specific format information for first level/stage page table that will be
> > bound to.
> >
> > The nesting info is available only after the nesting iommu type is set
> > for a container.
> to NESTED type

you mean "The nesting info is available only after container set to be NESTED type."

right?

>  Current implementation imposes one limitation - one
> > nesting container should include at most one group. The philosophy of
> > vfio container is having all groups/devices within the container share
> > the same IOMMU context. When vSVA is enabled, one IOMMU context could
> > include one 2nd-level address space and multiple 1st-level address spaces.
> > While the 2nd-level address space is reasonably sharable by multiple groups
> > , blindly sharing 1st-level address spaces across all groups within the
> > container might instead break the guest expectation. In the future sub/
> > super container concept might be introduced to allow partial address space
> > sharing within an IOMMU context. But for now let's go with this restriction
> > by requiring singleton container for using nesting iommu features. Below
> > link has the related discussion about this decision.
> 
> Maybe add a note about SMMU related changes spotted by Jean.

I guess you mean the comments Jean gave in patch 3/15, right? I'll
copy his comments and also give the below link as well.

https://lore.kernel.org/kvm/20200717090900.GC4850@myrica/

> >
> > https://lkml.org/lkml/2020/5/15/1028
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> > v4 -> v5:
> > *) address comments from Eric Auger.
> > *) return struct iommu_nesting_info for
> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as
> >    cap is much "cheap", if needs extension in future, just define another cap.
> >    https://lore.kernel.org/kvm/20200708132947.5b7ee954@x1.home/
> >
> > v3 -> v4:
> > *) address comments against v3.
> >
> > v1 -> v2:
> > *) added in v2
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 102
> +++++++++++++++++++++++++++++++++++-----
> >  include/uapi/linux/vfio.h       |  19 ++++++++
> >  2 files changed, 109 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index 3bd70ff..ed80104 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit,
> >  		 "Maximum number of user DMA mappings per container (65535).");
> >
> >  struct vfio_iommu {
> > -	struct list_head	domain_list;
> > -	struct list_head	iova_list;
> > -	struct vfio_domain	*external_domain; /* domain for external user */
> > -	struct mutex		lock;
> > -	struct rb_root		dma_list;
> > -	struct blocking_notifier_head notifier;
> > -	unsigned int		dma_avail;
> > -	uint64_t		pgsize_bitmap;
> > -	bool			v2;
> > -	bool			nesting;
> > -	bool			dirty_page_tracking;
> > -	bool			pinned_page_dirty_scope;
> > +	struct list_head		domain_list;
> > +	struct list_head		iova_list;
> > +	/* domain for external user */
> > +	struct vfio_domain		*external_domain;
> > +	struct mutex			lock;
> > +	struct rb_root			dma_list;
> > +	struct blocking_notifier_head	notifier;
> > +	unsigned int			dma_avail;
> > +	uint64_t			pgsize_bitmap;
> > +	bool				v2;
> > +	bool				nesting;
> > +	bool				dirty_page_tracking;
> > +	bool				pinned_page_dirty_scope;
> > +	struct iommu_nesting_info	*nesting_info;
> >  };
> >
> >  struct vfio_domain {
> > @@ -130,6 +132,9 @@ struct vfio_regions {
> >  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)	\
> >  					(!list_empty(&iommu->domain_list))
> >
> > +#define CONTAINER_HAS_DOMAIN(iommu)	(((iommu)->external_domain) || \
> > +					 (!list_empty(&(iommu)->domain_list)))
> > +
> >  #define DIRTY_BITMAP_BYTES(n)	(ALIGN(n, BITS_PER_TYPE(u64)) /
> BITS_PER_BYTE)
> >
> >  /*
> > @@ -1929,6 +1934,13 @@ static void vfio_iommu_iova_insert_copy(struct
> vfio_iommu *iommu,
> >
> >  	list_splice_tail(iova_copy, iova);
> >  }
> > +
> > +static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
> > +{
> > +	kfree(iommu->nesting_info);
> > +	iommu->nesting_info = NULL;
> > +}
> > +
> >  static int vfio_iommu_type1_attach_group(void *iommu_data,
> >  					 struct iommu_group *iommu_group)
> >  {
> > @@ -1959,6 +1971,12 @@ static int vfio_iommu_type1_attach_group(void
> *iommu_data,
> >  		}
> >  	}
> >
> > +	/* Nesting type container can include only one group */
> > +	if (iommu->nesting && CONTAINER_HAS_DOMAIN(iommu)) {
> > +		mutex_unlock(&iommu->lock);
> > +		return -EINVAL;
> > +	}
> > +
> >  	group = kzalloc(sizeof(*group), GFP_KERNEL);
> >  	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
> >  	if (!group || !domain) {
> > @@ -2029,6 +2047,32 @@ static int vfio_iommu_type1_attach_group(void
> *iommu_data,
> >  	if (ret)
> >  		goto out_domain;
> >
> > +	/* Nesting cap info is available only after attaching */
> > +	if (iommu->nesting) {
> > +		struct iommu_nesting_info tmp = { .size = 0, };
> > +
> > +		/* First get the size of vendor specific nesting info */
> > +		ret = iommu_domain_get_attr(domain->domain,
> > +					    DOMAIN_ATTR_NESTING,
> > +					    &tmp);
> > +		if (ret)
> > +			goto out_detach;
> > +
> > +		iommu->nesting_info = kzalloc(tmp.size, GFP_KERNEL);
> > +		if (!iommu->nesting_info) {
> > +			ret = -ENOMEM;
> > +			goto out_detach;
> > +		}
> > +
> > +		/* Now get the nesting info */
> > +		iommu->nesting_info->size = tmp.size;
> > +		ret = iommu_domain_get_attr(domain->domain,
> > +					    DOMAIN_ATTR_NESTING,
> > +					    iommu->nesting_info);
> > +		if (ret)
> > +			goto out_detach;

get nesting_info failure will result in group_attach failure.[1]

> > +	}
> > +
> >  	/* Get aperture info */
> >  	iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY,
> &geo);
> >
> > @@ -2138,6 +2182,7 @@ static int vfio_iommu_type1_attach_group(void
> *iommu_data,
> >  	return 0;
> >
> >  out_detach:
> > +	vfio_iommu_release_nesting_info(iommu);
> >  	vfio_iommu_detach_group(domain, group);
> >  out_domain:
> >  	iommu_domain_free(domain->domain);
> > @@ -2338,6 +2383,8 @@ static void vfio_iommu_type1_detach_group(void
> *iommu_data,
> >  					vfio_iommu_unmap_unpin_all(iommu);
> >  				else
> >
> 	vfio_iommu_unmap_unpin_reaccount(iommu);
> > +
> > +				vfio_iommu_release_nesting_info(iommu);
> >  			}
> >  			iommu_domain_free(domain->domain);
> >  			list_del(&domain->next);
> > @@ -2546,6 +2593,31 @@ static int vfio_iommu_migration_build_caps(struct
> vfio_iommu *iommu,
> >  	return vfio_info_add_capability(caps, &cap_mig.header, sizeof(cap_mig));
> >  }
> >
> > +static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
> > +					   struct vfio_info_cap *caps)
> > +{
> > +	struct vfio_info_cap_header *header;
> > +	struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
> > +	size_t size;
> > +
> > +	size = offsetof(struct vfio_iommu_type1_info_cap_nesting, info) +
> > +		iommu->nesting_info->size;
> > +
> > +	header = vfio_info_cap_add(caps, size,
> > +				   VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> > +	if (IS_ERR(header))
> > +		return PTR_ERR(header);
> > +
> > +	nesting_cap = container_of(header,
> > +				   struct vfio_iommu_type1_info_cap_nesting,
> > +				   header);
> > +
> > +	memcpy(&nesting_cap->info, iommu->nesting_info,
> > +	       iommu->nesting_info->size);
> you must check whether nesting_info is non NULL before doing that.

it's now validated in the caller of this function. :-)
 
> Besides I agree with Jean on the fact it may be better to not report the
> capability if nested is not supported.

I see. in this patch, just few lines below [2], vfio only reports nesting
cap when iommu->nesting_info is non-null. so even if userspace selected
nesting type, it may fail at the VFIO_SET_IOMMU phase since group_attach
will be failed for NESTED type container if host iommu doesn’t support
nesting. the code is marked as [1] in this email.

> > +
> > +	return 0;
> > +}
> > +
> >  static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
> >  				     unsigned long arg)
> >  {
> > @@ -2581,6 +2653,12 @@ static int vfio_iommu_type1_get_info(struct
> vfio_iommu *iommu,
> >  	if (!ret)
> >  		ret = vfio_iommu_iova_build_caps(iommu, &caps);
> >
> > +	if (iommu->nesting_info) {
> > +		ret = vfio_iommu_info_add_nesting_cap(iommu, &caps);
> > +		if (ret)
> > +			return ret;

here checks nesting_info before reporting nesting cap. [2]

> > +	}
> > +
> >  	mutex_unlock(&iommu->lock);
> >
> >  	if (ret)
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 9204705..46a78af 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -14,6 +14,7 @@
> >
> >  #include <linux/types.h>
> >  #include <linux/ioctl.h>
> > +#include <linux/iommu.h>
> >
> >  #define VFIO_API_VERSION	0
> >
> > @@ -1039,6 +1040,24 @@ struct vfio_iommu_type1_info_cap_migration {
> >  	__u64	max_dirty_bitmap_size;		/* in bytes */
> >  };
> >
> > +/*
> > + * The nesting capability allows to report the related capability
> > + * and info for nesting iommu type.
> > + *
> > + * The structures below define version 1 of this capability.
> > + *
> > + * User space selected VFIO_TYPE1_NESTING_IOMMU type should check
> > + * this capto get supported features.
> s/capto/capability to get

got it.

Regards,
Yi Liu

> > + *
> > + * @info: the nesting info provided by IOMMU driver.
> > + */
> > +#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING  3
> > +
> > +struct vfio_iommu_type1_info_cap_nesting {
> > +	struct	vfio_info_cap_header header;
> > +	struct iommu_nesting_info info;
> > +};
> > +
> >  #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> >
> >  /**
> >
> 
> Thanks
> 
> Eric

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 05/15] vfio: Add PASID allocation/free support
  2020-07-19 15:38   ` Auger Eric
@ 2020-07-20  8:03     ` Liu, Yi L
  2020-07-20  8:26       ` Auger Eric
  0 siblings, 1 reply; 58+ messages in thread
From: Liu, Yi L @ 2020-07-20  8:03 UTC (permalink / raw)
  To: Auger Eric, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	 Jun J, iommu, linux-kernel, Sun, Yi Y, Wu, Hao

Hi Eric,

> From: Auger Eric <eric.auger@redhat.com>
> Sent: Sunday, July 19, 2020 11:39 PM
> 
> Yi,
> 
> On 7/12/20 1:21 PM, Liu Yi L wrote:
> > Shared Virtual Addressing (a.k.a Shared Virtual Memory) allows sharing
> > multiple process virtual address spaces with the device for simplified
> > programming model. PASID is used to tag an virtual address space in
> > DMA requests and to identify the related translation structure in
> > IOMMU. When a PASID-capable device is assigned to a VM, we want the
> > same capability of using PASID to tag guest process virtual address
> > spaces to achieve virtual SVA (vSVA).
> >
> > PASID management for guest is vendor specific. Some vendors (e.g.
> > Intel
> > VT-d) requires system-wide managed PASIDs cross all devices,
> > regardless
> across?

yep. will correct it.

> > of whether a device is used by host or assigned to guest. Other
> > vendors (e.g. ARM SMMU) may allow PASIDs managed per-device thus could
> > be fully delegated to the guest for assigned devices.
> >
> > For system-wide managed PASIDs, this patch introduces a vfio module to
> > handle explicit PASID alloc/free requests from guest. Allocated PASIDs
> > are associated to a process (or, mm_struct) in IOASID core. A vfio_mm
> > object is introduced to track mm_struct. Multiple VFIO containers
> > within a process share the same vfio_mm object.
> >
> > A quota mechanism is provided to prevent malicious user from
> > exhausting available PASIDs. Currently the quota is a global parameter
> > applied to all VFIO devices. In the future per-device quota might be
> > supported too.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> > v4 -> v5:
> > *) address comments from Eric Auger.
> > *) address the comments from Alex on the pasid free range support. Added
> >    per vfio_mm pasid r-b tree.
> >    https://lore.kernel.org/kvm/20200709082751.320742ab@x1.home/
> >
> > v3 -> v4:
> > *) fix lock leam in vfio_mm_get_from_task()
> > *) drop pasid_quota field in struct vfio_mm
> > *) vfio_mm_get_from_task() returns ERR_PTR(-ENOTTY) when
> > !CONFIG_VFIO_PASID
> >
> > v1 -> v2:
> > *) added in v2, split from the pasid alloc/free support of v1
> > ---
> >  drivers/vfio/Kconfig      |   5 +
> >  drivers/vfio/Makefile     |   1 +
> >  drivers/vfio/vfio_pasid.c | 235
> ++++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/vfio.h      |  28 ++++++
> >  4 files changed, 269 insertions(+)
> >  create mode 100644 drivers/vfio/vfio_pasid.c
> >
> > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index
> > fd17db9..3d8a108 100644
> > --- a/drivers/vfio/Kconfig
> > +++ b/drivers/vfio/Kconfig
> > @@ -19,6 +19,11 @@ config VFIO_VIRQFD
> >  	depends on VFIO && EVENTFD
> >  	default n
> >
> > +config VFIO_PASID
> > +	tristate
> > +	depends on IOASID && VFIO
> > +	default n
> > +
> >  menuconfig VFIO
> >  	tristate "VFIO Non-Privileged userspace driver framework"
> >  	depends on IOMMU_API
> > diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index
> > de67c47..bb836a3 100644
> > --- a/drivers/vfio/Makefile
> > +++ b/drivers/vfio/Makefile
> > @@ -3,6 +3,7 @@ vfio_virqfd-y := virqfd.o
> >
> >  obj-$(CONFIG_VFIO) += vfio.o
> >  obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
> > +obj-$(CONFIG_VFIO_PASID) += vfio_pasid.o
> >  obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
> >  obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
> >  obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o diff --git
> > a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c new file mode
> > 100644 index 0000000..66e6054e
> > --- /dev/null
> > +++ b/drivers/vfio/vfio_pasid.c
> > @@ -0,0 +1,235 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (C) 2020 Intel Corporation.
> > + *     Author: Liu Yi L <yi.l.liu@intel.com>
> > + *
> > + */
> > +
> > +#include <linux/vfio.h>
> > +#include <linux/eventfd.h>
> > +#include <linux/file.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/sched/mm.h>
> > +
> > +#define DRIVER_VERSION  "0.1"
> > +#define DRIVER_AUTHOR   "Liu Yi L <yi.l.liu@intel.com>"
> > +#define DRIVER_DESC     "PASID management for VFIO bus drivers"
> > +
> > +#define VFIO_DEFAULT_PASID_QUOTA	1000
> > +static int pasid_quota = VFIO_DEFAULT_PASID_QUOTA;
> > +module_param_named(pasid_quota, pasid_quota, uint, 0444);
> > +MODULE_PARM_DESC(pasid_quota,
> > +		 " Set the quota for max number of PASIDs that an application is
> > +allowed to request (default 1000)");
> s/ Set/Set

got it.

> > +
> > +struct vfio_mm_token {
> > +	unsigned long long val;
> > +};
> > +
> > +struct vfio_mm {
> > +	struct kref		kref;
> > +	int			ioasid_sid;
> > +	struct mutex		pasid_lock;
> > +	struct rb_root		pasid_list;
> > +	struct list_head	next;
> > +	struct vfio_mm_token	token;
> > +};
> > +
> > +static struct mutex		vfio_mm_lock;
> > +static struct list_head		vfio_mm_list;
> > +
> > +struct vfio_pasid {
> > +	struct rb_node		node;
> > +	ioasid_t		pasid;
> > +};
> > +
> > +static void vfio_remove_all_pasids(struct vfio_mm *vmm);
> > +
> > +/* called with vfio.vfio_mm_lock held */ static void
> > +vfio_mm_release(struct kref *kref) {
> > +	struct vfio_mm *vmm = container_of(kref, struct vfio_mm, kref);
> > +
> > +	list_del(&vmm->next);
> > +	mutex_unlock(&vfio_mm_lock);
> > +	vfio_remove_all_pasids(vmm);
> > +	ioasid_free_set(vmm->ioasid_sid, true);
> > +	kfree(vmm);
> > +}
> > +
> > +void vfio_mm_put(struct vfio_mm *vmm) {
> > +	kref_put_mutex(&vmm->kref, vfio_mm_release, &vfio_mm_lock); }
> > +
> > +static void vfio_mm_get(struct vfio_mm *vmm) {
> > +	kref_get(&vmm->kref);
> > +}
> > +
> > +struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task) {
> > +	struct mm_struct *mm = get_task_mm(task);
> > +	struct vfio_mm *vmm;
> > +	unsigned long long val = (unsigned long long)mm;
> > +	int ret;
> > +
> > +	mutex_lock(&vfio_mm_lock);
> > +	/* Search existing vfio_mm with current mm pointer */
> > +	list_for_each_entry(vmm, &vfio_mm_list, next) {
> > +		if (vmm->token.val == val) {
> > +			vfio_mm_get(vmm);
> > +			goto out;
> > +		}
> > +	}
> > +
> > +	vmm = kzalloc(sizeof(*vmm), GFP_KERNEL);
> > +	if (!vmm) {
> > +		vmm = ERR_PTR(-ENOMEM);
> > +		goto out;
> > +	}
> > +
> > +	/*
> > +	 * IOASID core provides a 'IOASID set' concept to track all
> > +	 * PASIDs associated with a token. Here we use mm_struct as
> > +	 * the token and create a IOASID set per mm_struct. All the
> > +	 * containers of the process share the same IOASID set.
> > +	 */
> > +	ret = ioasid_alloc_set((struct ioasid_set *)mm, pasid_quota,
> > +			       &vmm->ioasid_sid);
> > +	if (ret) {
> > +		kfree(vmm);
> > +		vmm = ERR_PTR(ret);
> > +		goto out;
> > +	}
> > +
> > +	kref_init(&vmm->kref);
> > +	vmm->token.val = val;
> > +	mutex_init(&vmm->pasid_lock);
> > +	vmm->pasid_list = RB_ROOT;
> > +
> > +	list_add(&vmm->next, &vfio_mm_list);
> > +out:
> > +	mutex_unlock(&vfio_mm_lock);
> > +	mmput(mm);
> > +	return vmm;
> > +}
> > +
> > +/*
> > + * Find PASID within @min and @max
> > + */
> > +static struct vfio_pasid *vfio_find_pasid(struct vfio_mm *vmm,
> > +					  ioasid_t min, ioasid_t max)
> > +{
> > +	struct rb_node *node = vmm->pasid_list.rb_node;
> > +
> > +	while (node) {
> > +		struct vfio_pasid *vid = rb_entry(node,
> > +						struct vfio_pasid, node);
> > +
> > +		if (max < vid->pasid)
> > +			node = node->rb_left;
> > +		else if (min > vid->pasid)
> > +			node = node->rb_right;
> > +		else
> > +			return vid;
> > +	}
> > +
> > +	return NULL;
> > +}
> > +
> > +static void vfio_link_pasid(struct vfio_mm *vmm, struct vfio_pasid
> > +*new) {
> > +	struct rb_node **link = &vmm->pasid_list.rb_node, *parent = NULL;
> > +	struct vfio_pasid *vid;
> > +
> > +	while (*link) {
> > +		parent = *link;
> > +		vid = rb_entry(parent, struct vfio_pasid, node);
> > +
> > +		if (new->pasid <= vid->pasid)
> > +			link = &(*link)->rb_left;
> > +		else
> > +			link = &(*link)->rb_right;
> > +	}
> > +
> > +	rb_link_node(&new->node, parent, link);
> > +	rb_insert_color(&new->node, &vmm->pasid_list); }
> > +
> > +static void vfio_remove_pasid(struct vfio_mm *vmm, struct vfio_pasid
> > +*vid) {
> > +	rb_erase(&vid->node, &vmm->pasid_list); /* unlink pasid */
> nit: to be consistent with vfio_unlink_dma, introduce vfio_unlink_pasid

aha, I added it but removed it in the middle of cooking.:-)

> > +	ioasid_free(vid->pasid);
> > +	kfree(vid);
> > +}
> > +
> > +static void vfio_remove_all_pasids(struct vfio_mm *vmm) {
> > +	struct rb_node *node;
> > +
> > +	mutex_lock(&vmm->pasid_lock);
> > +	while ((node = rb_first(&vmm->pasid_list)))
> > +		vfio_remove_pasid(vmm, rb_entry(node, struct vfio_pasid, node));
> > +	mutex_unlock(&vmm->pasid_lock);
> > +}
> > +
> > +int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max) {
> > +	ioasid_t pasid;
> > +	struct vfio_pasid *vid;
> > +
> > +	pasid = ioasid_alloc(vmm->ioasid_sid, min, max, NULL);
> > +	if (pasid == INVALID_IOASID)
> > +		return -ENOSPC;
> > +
> > +	vid = kzalloc(sizeof(*vid), GFP_KERNEL);
> > +	if (!vid) {
> > +		ioasid_free(pasid);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	vid->pasid = pasid;
> > +
> > +	mutex_lock(&vmm->pasid_lock);
> > +	vfio_link_pasid(vmm, vid);
> > +	mutex_unlock(&vmm->pasid_lock);
> > +
> > +	return pasid;
> > +}
> I am not totally convinced by your previous reply on EXPORT_SYMBOL_GP()
> irrelevance in this patch. But well ;-)

I recalled my memory, I think it's made per a comment from Chris.
I guess it may make sense to export symbols together with the exact
driver user of it in this patch as well :-) but maybe I misunderstood
him. if yes, I may add the symbol export back in this patch.

https://lore.kernel.org/linux-iommu/20200331075331.GA26583@infradead.org/#t

> 
> > +
> > +void vfio_pasid_free_range(struct vfio_mm *vmm,
> > +			   ioasid_t min, ioasid_t max)
> > +{
> > +	struct vfio_pasid *vid = NULL;
> > +
> > +	/*
> > +	 * IOASID core will notify PASID users (e.g. IOMMU driver) to
> > +	 * teardown necessary structures depending on the to-be-freed
> > +	 * PASID.
> > +	 */
> > +	mutex_lock(&vmm->pasid_lock);
> > +	while ((vid = vfio_find_pasid(vmm, min, max)) != NULL)
> > +		vfio_remove_pasid(vmm, vid);
> > +	mutex_unlock(&vmm->pasid_lock);
> > +}
> > +
> > +static int __init vfio_pasid_init(void) {
> > +	mutex_init(&vfio_mm_lock);
> > +	INIT_LIST_HEAD(&vfio_mm_list);
> > +	return 0;
> > +}
> > +
> > +static void __exit vfio_pasid_exit(void) {
> > +	WARN_ON(!list_empty(&vfio_mm_list));
> > +}
> In your previous reply, ie. https://lkml.org/lkml/2020/7/7/273
> you said:
> "
> I guess yes. VFIO_PASID is supposed to be referenced by VFIO_IOMMU_TYPE1 and
> may be other module. once vfio_pasid_exit() is triggered, that means its user
> (VFIO_IOMMU_TYPE1) has been removed. Should all the vfio_mm instances should
> have been released. If not, means there is vfio_mm leak, should be a bug of user
> module."
> 
> if I am not wrong this dependency is not yet known at this stage of
> the series? I
> would rather add this comment either in the commit message or here.

make sense, so far it's not known. I can add a comment here. :-)

Regards,
Yi Liu

> Thanks
> 
> Eric
> 
> > +
> > +module_init(vfio_pasid_init);
> > +module_exit(vfio_pasid_exit);
> > +
> > +MODULE_VERSION(DRIVER_VERSION);
> > +MODULE_LICENSE("GPL v2");
> > +MODULE_AUTHOR(DRIVER_AUTHOR);
> > +MODULE_DESCRIPTION(DRIVER_DESC);
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h index
> > 38d3c6a..31472a9 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -97,6 +97,34 @@ extern int vfio_register_iommu_driver(const struct
> > vfio_iommu_driver_ops *ops);  extern void vfio_unregister_iommu_driver(
> >  				const struct vfio_iommu_driver_ops *ops);
> >
> > +struct vfio_mm;
> > +#if IS_ENABLED(CONFIG_VFIO_PASID)
> > +extern struct vfio_mm *vfio_mm_get_from_task(struct task_struct
> > +*task); extern void vfio_mm_put(struct vfio_mm *vmm); extern int
> > +vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max); extern void
> > +vfio_pasid_free_range(struct vfio_mm *vmm,
> > +				  ioasid_t min, ioasid_t max);
> > +#else
> > +static inline struct vfio_mm *vfio_mm_get_from_task(struct
> > +task_struct *task) {
> > +	return ERR_PTR(-ENOTTY);
> > +}
> > +
> > +static inline void vfio_mm_put(struct vfio_mm *vmm) { }
> > +
> > +static inline int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int
> > +max) {
> > +	return -ENOTTY;
> > +}
> > +
> > +static inline void vfio_pasid_free_range(struct vfio_mm *vmm,
> > +					  ioasid_t min, ioasid_t max)
> > +{
> > +}
> > +#endif /* CONFIG_VFIO_PASID */
> > +
> >  /*
> >   * External user API
> >   */
> >

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 06/15] iommu/vt-d: Support setting ioasid set to domain
  2020-07-19 15:38   ` Auger Eric
@ 2020-07-20  8:11     ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2020-07-20  8:11 UTC (permalink / raw)
  To: Auger Eric, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	 Jun J, iommu, linux-kernel, Sun, Yi Y, Wu, Hao

Hi Eric,

> From: Auger Eric <eric.auger@redhat.com>
> Sent: Sunday, July 19, 2020 11:38 PM
> 
> Yi,
> 
> On 7/12/20 1:21 PM, Liu Yi L wrote:
> > From IOMMU p.o.v., PASIDs allocated and managed by external components
> > (e.g. VFIO) will be passed in for gpasid_bind/unbind operation. IOMMU
> > needs some knowledge to check the PASID ownership, hence add an
> > interface for those components to tell the PASID owner.
> >
> > In latest kernel design, PASID ownership is managed by IOASID set
> > where the PASID is allocated from. This patch adds support for setting
> > ioasid set ID to the domains used for nesting/vSVA. Subsequent SVA
> > operations on the PASID will be checked against its IOASID set for proper
> ownership.
> Subsequent SVA operations will check the PASID against its IOASID set for proper
> ownership.

got it.

> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> > v4 -> v5:
> > *) address comments from Eric Auger.
> > ---
> >  drivers/iommu/intel/iommu.c | 22 ++++++++++++++++++++++
> > include/linux/intel-iommu.h |  4 ++++
> >  include/linux/iommu.h       |  1 +
> >  3 files changed, 27 insertions(+)
> >
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 72ae6a2..4d54198 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -1793,6 +1793,7 @@ static struct dmar_domain *alloc_domain(int flags)
> >  	if (first_level_by_default())
> >  		domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
> >  	domain->has_iotlb_device = false;
> > +	domain->ioasid_sid = INVALID_IOASID_SET;
> >  	INIT_LIST_HEAD(&domain->devices);
> >
> >  	return domain;
> > @@ -6039,6 +6040,27 @@ intel_iommu_domain_set_attr(struct iommu_domain
> *domain,
> >  		}
> >  		spin_unlock_irqrestore(&device_domain_lock, flags);
> >  		break;
> > +	case DOMAIN_ATTR_IOASID_SID:
> > +	{
> > +		int sid = *(int *)data;
> > +
> > +		if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE)) {
> > +			ret = -ENODEV;
> > +			break;
> > +		}
> > +		spin_lock_irqsave(&device_domain_lock, flags);
> I think the lock should be taken before the DOMAIN_FLAG_NESTING_MODE check.
> Otherwise, the flags can be theretically changed inbetween the check and the test
> below?

I see. will correct it.

Thanks,
Yi Liu

> Thanks
> 
> Eric
> > +		if (dmar_domain->ioasid_sid != INVALID_IOASID_SET &&
> > +		    dmar_domain->ioasid_sid != sid) {
> > +			pr_warn_ratelimited("multi ioasid_set (%d:%d) setting",
> > +					    dmar_domain->ioasid_sid, sid);
> > +			ret = -EBUSY;
> > +			spin_unlock_irqrestore(&device_domain_lock, flags);
> > +			break;
> > +		}
> > +		dmar_domain->ioasid_sid = sid;
> > +		spin_unlock_irqrestore(&device_domain_lock, flags);
> > +		break;
> > +	}
> >  	default:
> >  		ret = -EINVAL;
> >  		break;
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index 3f23c26..0d0ab32 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -549,6 +549,10 @@ struct dmar_domain {
> >  					   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
> >  	u64		max_addr;	/* maximum mapped address */
> >
> > +	int		ioasid_sid;	/*
> > +					 * the ioasid set which tracks all
> > +					 * PASIDs used by the domain.
> > +					 */
> >  	int		default_pasid;	/*
> >  					 * The default pasid used for non-SVM
> >  					 * traffic on mediated devices.
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h index
> > 7ca9d48..e84a1d5 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -124,6 +124,7 @@ enum iommu_attr {
> >  	DOMAIN_ATTR_FSL_PAMUV1,
> >  	DOMAIN_ATTR_NESTING,	/* two stages of translation */
> >  	DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
> > +	DOMAIN_ATTR_IOASID_SID,
> >  	DOMAIN_ATTR_MAX,
> >  };
> >
> >

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 05/15] vfio: Add PASID allocation/free support
  2020-07-20  8:03     ` Liu, Yi L
@ 2020-07-20  8:26       ` Auger Eric
  2020-07-20  8:49         ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Auger Eric @ 2020-07-20  8:26 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	Jun J, iommu, linux-kernel, Sun, Yi Y, Wu, Hao

Hi Yi,

On 7/20/20 10:03 AM, Liu, Yi L wrote:
> Hi Eric,
> 
>> From: Auger Eric <eric.auger@redhat.com>
>> Sent: Sunday, July 19, 2020 11:39 PM
>>
>> Yi,
>>
>> On 7/12/20 1:21 PM, Liu Yi L wrote:
>>> Shared Virtual Addressing (a.k.a Shared Virtual Memory) allows sharing
>>> multiple process virtual address spaces with the device for simplified
>>> programming model. PASID is used to tag an virtual address space in
>>> DMA requests and to identify the related translation structure in
>>> IOMMU. When a PASID-capable device is assigned to a VM, we want the
>>> same capability of using PASID to tag guest process virtual address
>>> spaces to achieve virtual SVA (vSVA).
>>>
>>> PASID management for guest is vendor specific. Some vendors (e.g.
>>> Intel
>>> VT-d) requires system-wide managed PASIDs cross all devices,
>>> regardless
>> across?
> 
> yep. will correct it.
> 
>>> of whether a device is used by host or assigned to guest. Other
>>> vendors (e.g. ARM SMMU) may allow PASIDs managed per-device thus could
>>> be fully delegated to the guest for assigned devices.
>>>
>>> For system-wide managed PASIDs, this patch introduces a vfio module to
>>> handle explicit PASID alloc/free requests from guest. Allocated PASIDs
>>> are associated to a process (or, mm_struct) in IOASID core. A vfio_mm
>>> object is introduced to track mm_struct. Multiple VFIO containers
>>> within a process share the same vfio_mm object.
>>>
>>> A quota mechanism is provided to prevent malicious user from
>>> exhausting available PASIDs. Currently the quota is a global parameter
>>> applied to all VFIO devices. In the future per-device quota might be
>>> supported too.
>>>
>>> Cc: Kevin Tian <kevin.tian@intel.com>
>>> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
>>> Cc: Eric Auger <eric.auger@redhat.com>
>>> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
>>> Cc: Joerg Roedel <joro@8bytes.org>
>>> Cc: Lu Baolu <baolu.lu@linux.intel.com>
>>> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
>>> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
>>> ---
>>> v4 -> v5:
>>> *) address comments from Eric Auger.
>>> *) address the comments from Alex on the pasid free range support. Added
>>>    per vfio_mm pasid r-b tree.
>>>    https://lore.kernel.org/kvm/20200709082751.320742ab@x1.home/
>>>
>>> v3 -> v4:
>>> *) fix lock leam in vfio_mm_get_from_task()
>>> *) drop pasid_quota field in struct vfio_mm
>>> *) vfio_mm_get_from_task() returns ERR_PTR(-ENOTTY) when
>>> !CONFIG_VFIO_PASID
>>>
>>> v1 -> v2:
>>> *) added in v2, split from the pasid alloc/free support of v1
>>> ---
>>>  drivers/vfio/Kconfig      |   5 +
>>>  drivers/vfio/Makefile     |   1 +
>>>  drivers/vfio/vfio_pasid.c | 235
>> ++++++++++++++++++++++++++++++++++++++++++++++
>>>  include/linux/vfio.h      |  28 ++++++
>>>  4 files changed, 269 insertions(+)
>>>  create mode 100644 drivers/vfio/vfio_pasid.c
>>>
>>> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index
>>> fd17db9..3d8a108 100644
>>> --- a/drivers/vfio/Kconfig
>>> +++ b/drivers/vfio/Kconfig
>>> @@ -19,6 +19,11 @@ config VFIO_VIRQFD
>>>  	depends on VFIO && EVENTFD
>>>  	default n
>>>
>>> +config VFIO_PASID
>>> +	tristate
>>> +	depends on IOASID && VFIO
>>> +	default n
>>> +
>>>  menuconfig VFIO
>>>  	tristate "VFIO Non-Privileged userspace driver framework"
>>>  	depends on IOMMU_API
>>> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index
>>> de67c47..bb836a3 100644
>>> --- a/drivers/vfio/Makefile
>>> +++ b/drivers/vfio/Makefile
>>> @@ -3,6 +3,7 @@ vfio_virqfd-y := virqfd.o
>>>
>>>  obj-$(CONFIG_VFIO) += vfio.o
>>>  obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
>>> +obj-$(CONFIG_VFIO_PASID) += vfio_pasid.o
>>>  obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
>>>  obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
>>>  obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o diff --git
>>> a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c new file mode
>>> 100644 index 0000000..66e6054e
>>> --- /dev/null
>>> +++ b/drivers/vfio/vfio_pasid.c
>>> @@ -0,0 +1,235 @@
>>> +// SPDX-License-Identifier: GPL-2.0-only
>>> +/*
>>> + * Copyright (C) 2020 Intel Corporation.
>>> + *     Author: Liu Yi L <yi.l.liu@intel.com>
>>> + *
>>> + */
>>> +
>>> +#include <linux/vfio.h>
>>> +#include <linux/eventfd.h>
>>> +#include <linux/file.h>
>>> +#include <linux/module.h>
>>> +#include <linux/slab.h>
>>> +#include <linux/sched/mm.h>
>>> +
>>> +#define DRIVER_VERSION  "0.1"
>>> +#define DRIVER_AUTHOR   "Liu Yi L <yi.l.liu@intel.com>"
>>> +#define DRIVER_DESC     "PASID management for VFIO bus drivers"
>>> +
>>> +#define VFIO_DEFAULT_PASID_QUOTA	1000
>>> +static int pasid_quota = VFIO_DEFAULT_PASID_QUOTA;
>>> +module_param_named(pasid_quota, pasid_quota, uint, 0444);
>>> +MODULE_PARM_DESC(pasid_quota,
>>> +		 " Set the quota for max number of PASIDs that an application is
>>> +allowed to request (default 1000)");
>> s/ Set/Set
> 
> got it.
> 
>>> +
>>> +struct vfio_mm_token {
>>> +	unsigned long long val;
>>> +};
>>> +
>>> +struct vfio_mm {
>>> +	struct kref		kref;
>>> +	int			ioasid_sid;
>>> +	struct mutex		pasid_lock;
>>> +	struct rb_root		pasid_list;
>>> +	struct list_head	next;
>>> +	struct vfio_mm_token	token;
>>> +};
>>> +
>>> +static struct mutex		vfio_mm_lock;
>>> +static struct list_head		vfio_mm_list;
>>> +
>>> +struct vfio_pasid {
>>> +	struct rb_node		node;
>>> +	ioasid_t		pasid;
>>> +};
>>> +
>>> +static void vfio_remove_all_pasids(struct vfio_mm *vmm);
>>> +
>>> +/* called with vfio.vfio_mm_lock held */ static void
>>> +vfio_mm_release(struct kref *kref) {
>>> +	struct vfio_mm *vmm = container_of(kref, struct vfio_mm, kref);
>>> +
>>> +	list_del(&vmm->next);
>>> +	mutex_unlock(&vfio_mm_lock);
>>> +	vfio_remove_all_pasids(vmm);
>>> +	ioasid_free_set(vmm->ioasid_sid, true);
>>> +	kfree(vmm);
>>> +}
>>> +
>>> +void vfio_mm_put(struct vfio_mm *vmm) {
>>> +	kref_put_mutex(&vmm->kref, vfio_mm_release, &vfio_mm_lock); }
>>> +
>>> +static void vfio_mm_get(struct vfio_mm *vmm) {
>>> +	kref_get(&vmm->kref);
>>> +}
>>> +
>>> +struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task) {
>>> +	struct mm_struct *mm = get_task_mm(task);
>>> +	struct vfio_mm *vmm;
>>> +	unsigned long long val = (unsigned long long)mm;
>>> +	int ret;
>>> +
>>> +	mutex_lock(&vfio_mm_lock);
>>> +	/* Search existing vfio_mm with current mm pointer */
>>> +	list_for_each_entry(vmm, &vfio_mm_list, next) {
>>> +		if (vmm->token.val == val) {
>>> +			vfio_mm_get(vmm);
>>> +			goto out;
>>> +		}
>>> +	}
>>> +
>>> +	vmm = kzalloc(sizeof(*vmm), GFP_KERNEL);
>>> +	if (!vmm) {
>>> +		vmm = ERR_PTR(-ENOMEM);
>>> +		goto out;
>>> +	}
>>> +
>>> +	/*
>>> +	 * IOASID core provides a 'IOASID set' concept to track all
>>> +	 * PASIDs associated with a token. Here we use mm_struct as
>>> +	 * the token and create a IOASID set per mm_struct. All the
>>> +	 * containers of the process share the same IOASID set.
>>> +	 */
>>> +	ret = ioasid_alloc_set((struct ioasid_set *)mm, pasid_quota,
>>> +			       &vmm->ioasid_sid);
>>> +	if (ret) {
>>> +		kfree(vmm);
>>> +		vmm = ERR_PTR(ret);
>>> +		goto out;
>>> +	}
>>> +
>>> +	kref_init(&vmm->kref);
>>> +	vmm->token.val = val;
>>> +	mutex_init(&vmm->pasid_lock);
>>> +	vmm->pasid_list = RB_ROOT;
>>> +
>>> +	list_add(&vmm->next, &vfio_mm_list);
>>> +out:
>>> +	mutex_unlock(&vfio_mm_lock);
>>> +	mmput(mm);
>>> +	return vmm;
>>> +}
>>> +
>>> +/*
>>> + * Find PASID within @min and @max
>>> + */
>>> +static struct vfio_pasid *vfio_find_pasid(struct vfio_mm *vmm,
>>> +					  ioasid_t min, ioasid_t max)
>>> +{
>>> +	struct rb_node *node = vmm->pasid_list.rb_node;
>>> +
>>> +	while (node) {
>>> +		struct vfio_pasid *vid = rb_entry(node,
>>> +						struct vfio_pasid, node);
>>> +
>>> +		if (max < vid->pasid)
>>> +			node = node->rb_left;
>>> +		else if (min > vid->pasid)
>>> +			node = node->rb_right;
>>> +		else
>>> +			return vid;
>>> +	}
>>> +
>>> +	return NULL;
>>> +}
>>> +
>>> +static void vfio_link_pasid(struct vfio_mm *vmm, struct vfio_pasid
>>> +*new) {
>>> +	struct rb_node **link = &vmm->pasid_list.rb_node, *parent = NULL;
>>> +	struct vfio_pasid *vid;
>>> +
>>> +	while (*link) {
>>> +		parent = *link;
>>> +		vid = rb_entry(parent, struct vfio_pasid, node);
>>> +
>>> +		if (new->pasid <= vid->pasid)
>>> +			link = &(*link)->rb_left;
>>> +		else
>>> +			link = &(*link)->rb_right;
>>> +	}
>>> +
>>> +	rb_link_node(&new->node, parent, link);
>>> +	rb_insert_color(&new->node, &vmm->pasid_list); }
>>> +
>>> +static void vfio_remove_pasid(struct vfio_mm *vmm, struct vfio_pasid
>>> +*vid) {
>>> +	rb_erase(&vid->node, &vmm->pasid_list); /* unlink pasid */
>> nit: to be consistent with vfio_unlink_dma, introduce vfio_unlink_pasid
> 
> aha, I added it but removed it in the middle of cooking.:-)
> 
>>> +	ioasid_free(vid->pasid);
>>> +	kfree(vid);
>>> +}
>>> +
>>> +static void vfio_remove_all_pasids(struct vfio_mm *vmm) {
>>> +	struct rb_node *node;
>>> +
>>> +	mutex_lock(&vmm->pasid_lock);
>>> +	while ((node = rb_first(&vmm->pasid_list)))
>>> +		vfio_remove_pasid(vmm, rb_entry(node, struct vfio_pasid, node));
>>> +	mutex_unlock(&vmm->pasid_lock);
>>> +}
>>> +
>>> +int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max) {
>>> +	ioasid_t pasid;
>>> +	struct vfio_pasid *vid;
>>> +
>>> +	pasid = ioasid_alloc(vmm->ioasid_sid, min, max, NULL);
>>> +	if (pasid == INVALID_IOASID)
>>> +		return -ENOSPC;
>>> +
>>> +	vid = kzalloc(sizeof(*vid), GFP_KERNEL);
>>> +	if (!vid) {
>>> +		ioasid_free(pasid);
>>> +		return -ENOMEM;
>>> +	}
>>> +
>>> +	vid->pasid = pasid;
>>> +
>>> +	mutex_lock(&vmm->pasid_lock);
>>> +	vfio_link_pasid(vmm, vid);
>>> +	mutex_unlock(&vmm->pasid_lock);
>>> +
>>> +	return pasid;
>>> +}
>> I am not totally convinced by your previous reply on EXPORT_SYMBOL_GP()
>> irrelevance in this patch. But well ;-)
> 
> I recalled my memory, I think it's made per a comment from Chris.
> I guess it may make sense to export symbols together with the exact
> driver user of it in this patch as well :-) but maybe I misunderstood
> him. if yes, I may add the symbol export back in this patch.
> 
> https://lore.kernel.org/linux-iommu/20200331075331.GA26583@infradead.org/#t
OK I don't know the best practice here. Anyway there is no caller at
this stage either. I think you may describe in the commit message the
vfio_iommu_type1 will be the eventual user of the exported functions in
this module, what are the exact exported functions here. You may also
explain the motivation behind creating a separate module.

Thanks

Eric
> 
>>
>>> +
>>> +void vfio_pasid_free_range(struct vfio_mm *vmm,
>>> +			   ioasid_t min, ioasid_t max)
>>> +{
>>> +	struct vfio_pasid *vid = NULL;
>>> +
>>> +	/*
>>> +	 * IOASID core will notify PASID users (e.g. IOMMU driver) to
>>> +	 * teardown necessary structures depending on the to-be-freed
>>> +	 * PASID.
>>> +	 */
>>> +	mutex_lock(&vmm->pasid_lock);
>>> +	while ((vid = vfio_find_pasid(vmm, min, max)) != NULL)
>>> +		vfio_remove_pasid(vmm, vid);
>>> +	mutex_unlock(&vmm->pasid_lock);
>>> +}
>>> +
>>> +static int __init vfio_pasid_init(void) {
>>> +	mutex_init(&vfio_mm_lock);
>>> +	INIT_LIST_HEAD(&vfio_mm_list);
>>> +	return 0;
>>> +}
>>> +
>>> +static void __exit vfio_pasid_exit(void) {
>>> +	WARN_ON(!list_empty(&vfio_mm_list));
>>> +}
>> In your previous reply, ie. https://lkml.org/lkml/2020/7/7/273
>> you said:
>> "
>> I guess yes. VFIO_PASID is supposed to be referenced by VFIO_IOMMU_TYPE1 and
>> may be other module. once vfio_pasid_exit() is triggered, that means its user
>> (VFIO_IOMMU_TYPE1) has been removed. Should all the vfio_mm instances should
>> have been released. If not, means there is vfio_mm leak, should be a bug of user
>> module."
>>
>> if I am not wrong this dependency is not yet known at this stage of
>> the series? I
>> would rather add this comment either in the commit message or here.
> 
> make sense, so far it's not known. I can add a comment here. :-)
> 
> Regards,
> Yi Liu
> 
>> Thanks
>>
>> Eric
>>
>>> +
>>> +module_init(vfio_pasid_init);
>>> +module_exit(vfio_pasid_exit);
>>> +
>>> +MODULE_VERSION(DRIVER_VERSION);
>>> +MODULE_LICENSE("GPL v2");
>>> +MODULE_AUTHOR(DRIVER_AUTHOR);
>>> +MODULE_DESCRIPTION(DRIVER_DESC);
>>> diff --git a/include/linux/vfio.h b/include/linux/vfio.h index
>>> 38d3c6a..31472a9 100644
>>> --- a/include/linux/vfio.h
>>> +++ b/include/linux/vfio.h
>>> @@ -97,6 +97,34 @@ extern int vfio_register_iommu_driver(const struct
>>> vfio_iommu_driver_ops *ops);  extern void vfio_unregister_iommu_driver(
>>>  				const struct vfio_iommu_driver_ops *ops);
>>>
>>> +struct vfio_mm;
>>> +#if IS_ENABLED(CONFIG_VFIO_PASID)
>>> +extern struct vfio_mm *vfio_mm_get_from_task(struct task_struct
>>> +*task); extern void vfio_mm_put(struct vfio_mm *vmm); extern int
>>> +vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max); extern void
>>> +vfio_pasid_free_range(struct vfio_mm *vmm,
>>> +				  ioasid_t min, ioasid_t max);
>>> +#else
>>> +static inline struct vfio_mm *vfio_mm_get_from_task(struct
>>> +task_struct *task) {
>>> +	return ERR_PTR(-ENOTTY);
>>> +}
>>> +
>>> +static inline void vfio_mm_put(struct vfio_mm *vmm) { }
>>> +
>>> +static inline int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int
>>> +max) {
>>> +	return -ENOTTY;
>>> +}
>>> +
>>> +static inline void vfio_pasid_free_range(struct vfio_mm *vmm,
>>> +					  ioasid_t min, ioasid_t max)
>>> +{
>>> +}
>>> +#endif /* CONFIG_VFIO_PASID */
>>> +
>>>  /*
>>>   * External user API
>>>   */
>>>
> 

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 04/15] vfio/type1: Report iommu nesting info to userspace
  2020-07-20  7:51     ` Liu, Yi L
@ 2020-07-20  8:33       ` Auger Eric
  2020-07-20  8:51         ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Auger Eric @ 2020-07-20  8:33 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	Jun J, iommu, linux-kernel, Sun, Yi Y, Wu, Hao

Yi,

On 7/20/20 9:51 AM, Liu, Yi L wrote:
> Hi Eric,
> 
>> From: Auger Eric <eric.auger@redhat.com>
>> Sent: Saturday, July 18, 2020 1:34 AM
>>
>> Yi,
>>
>> On 7/12/20 1:20 PM, Liu Yi L wrote:
>>> This patch exports iommu nesting capability info to user space through
>>> VFIO. User space is expected to check this info for supported uAPIs (e.g.
>> it is not only to check the supported uAPIS but rather to know which
>> callbacks it must call upon vIOMMU events and which features are
>> supported by the physical IOMMU.
> 
> yes, will refine the description per your comment.
> 
>>> PASID alloc/free, bind page table, and cache invalidation) and the vendor
>>> specific format information for first level/stage page table that will be
>>> bound to.
>>>
>>> The nesting info is available only after the nesting iommu type is set
>>> for a container.
>> to NESTED type
> 
> you mean "The nesting info is available only after container set to be NESTED type."
> 
> right?
correct
> 
>>  Current implementation imposes one limitation - one
>>> nesting container should include at most one group. The philosophy of
>>> vfio container is having all groups/devices within the container share
>>> the same IOMMU context. When vSVA is enabled, one IOMMU context could
>>> include one 2nd-level address space and multiple 1st-level address spaces.
>>> While the 2nd-level address space is reasonably sharable by multiple groups
>>> , blindly sharing 1st-level address spaces across all groups within the
>>> container might instead break the guest expectation. In the future sub/
>>> super container concept might be introduced to allow partial address space
>>> sharing within an IOMMU context. But for now let's go with this restriction
>>> by requiring singleton container for using nesting iommu features. Below
>>> link has the related discussion about this decision.
>>
>> Maybe add a note about SMMU related changes spotted by Jean.
> 
> I guess you mean the comments Jean gave in patch 3/15, right? I'll
> copy his comments and also give the below link as well.
> 
> https://lore.kernel.org/kvm/20200717090900.GC4850@myrica/
correct

Thanks

Eric
> 
>>>
>>> https://lkml.org/lkml/2020/5/15/1028
>>>
>>> Cc: Kevin Tian <kevin.tian@intel.com>
>>> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
>>> Cc: Alex Williamson <alex.williamson@redhat.com>
>>> Cc: Eric Auger <eric.auger@redhat.com>
>>> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
>>> Cc: Joerg Roedel <joro@8bytes.org>
>>> Cc: Lu Baolu <baolu.lu@linux.intel.com>
>>> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
>>> ---
>>> v4 -> v5:
>>> *) address comments from Eric Auger.
>>> *) return struct iommu_nesting_info for
>> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as
>>>    cap is much "cheap", if needs extension in future, just define another cap.
>>>    https://lore.kernel.org/kvm/20200708132947.5b7ee954@x1.home/
>>>
>>> v3 -> v4:
>>> *) address comments against v3.
>>>
>>> v1 -> v2:
>>> *) added in v2
>>> ---
>>>  drivers/vfio/vfio_iommu_type1.c | 102
>> +++++++++++++++++++++++++++++++++++-----
>>>  include/uapi/linux/vfio.h       |  19 ++++++++
>>>  2 files changed, 109 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>> index 3bd70ff..ed80104 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit,
>>>  		 "Maximum number of user DMA mappings per container (65535).");
>>>
>>>  struct vfio_iommu {
>>> -	struct list_head	domain_list;
>>> -	struct list_head	iova_list;
>>> -	struct vfio_domain	*external_domain; /* domain for external user */
>>> -	struct mutex		lock;
>>> -	struct rb_root		dma_list;
>>> -	struct blocking_notifier_head notifier;
>>> -	unsigned int		dma_avail;
>>> -	uint64_t		pgsize_bitmap;
>>> -	bool			v2;
>>> -	bool			nesting;
>>> -	bool			dirty_page_tracking;
>>> -	bool			pinned_page_dirty_scope;
>>> +	struct list_head		domain_list;
>>> +	struct list_head		iova_list;
>>> +	/* domain for external user */
>>> +	struct vfio_domain		*external_domain;
>>> +	struct mutex			lock;
>>> +	struct rb_root			dma_list;
>>> +	struct blocking_notifier_head	notifier;
>>> +	unsigned int			dma_avail;
>>> +	uint64_t			pgsize_bitmap;
>>> +	bool				v2;
>>> +	bool				nesting;
>>> +	bool				dirty_page_tracking;
>>> +	bool				pinned_page_dirty_scope;
>>> +	struct iommu_nesting_info	*nesting_info;
>>>  };
>>>
>>>  struct vfio_domain {
>>> @@ -130,6 +132,9 @@ struct vfio_regions {
>>>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)	\
>>>  					(!list_empty(&iommu->domain_list))
>>>
>>> +#define CONTAINER_HAS_DOMAIN(iommu)	(((iommu)->external_domain) || \
>>> +					 (!list_empty(&(iommu)->domain_list)))
>>> +
>>>  #define DIRTY_BITMAP_BYTES(n)	(ALIGN(n, BITS_PER_TYPE(u64)) /
>> BITS_PER_BYTE)
>>>
>>>  /*
>>> @@ -1929,6 +1934,13 @@ static void vfio_iommu_iova_insert_copy(struct
>> vfio_iommu *iommu,
>>>
>>>  	list_splice_tail(iova_copy, iova);
>>>  }
>>> +
>>> +static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
>>> +{
>>> +	kfree(iommu->nesting_info);
>>> +	iommu->nesting_info = NULL;
>>> +}
>>> +
>>>  static int vfio_iommu_type1_attach_group(void *iommu_data,
>>>  					 struct iommu_group *iommu_group)
>>>  {
>>> @@ -1959,6 +1971,12 @@ static int vfio_iommu_type1_attach_group(void
>> *iommu_data,
>>>  		}
>>>  	}
>>>
>>> +	/* Nesting type container can include only one group */
>>> +	if (iommu->nesting && CONTAINER_HAS_DOMAIN(iommu)) {
>>> +		mutex_unlock(&iommu->lock);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>>  	group = kzalloc(sizeof(*group), GFP_KERNEL);
>>>  	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
>>>  	if (!group || !domain) {
>>> @@ -2029,6 +2047,32 @@ static int vfio_iommu_type1_attach_group(void
>> *iommu_data,
>>>  	if (ret)
>>>  		goto out_domain;
>>>
>>> +	/* Nesting cap info is available only after attaching */
>>> +	if (iommu->nesting) {
>>> +		struct iommu_nesting_info tmp = { .size = 0, };
>>> +
>>> +		/* First get the size of vendor specific nesting info */
>>> +		ret = iommu_domain_get_attr(domain->domain,
>>> +					    DOMAIN_ATTR_NESTING,
>>> +					    &tmp);
>>> +		if (ret)
>>> +			goto out_detach;
>>> +
>>> +		iommu->nesting_info = kzalloc(tmp.size, GFP_KERNEL);
>>> +		if (!iommu->nesting_info) {
>>> +			ret = -ENOMEM;
>>> +			goto out_detach;
>>> +		}
>>> +
>>> +		/* Now get the nesting info */
>>> +		iommu->nesting_info->size = tmp.size;
>>> +		ret = iommu_domain_get_attr(domain->domain,
>>> +					    DOMAIN_ATTR_NESTING,
>>> +					    iommu->nesting_info);
>>> +		if (ret)
>>> +			goto out_detach;
> 
> get nesting_info failure will result in group_attach failure.[1]
> 
>>> +	}
>>> +
>>>  	/* Get aperture info */
>>>  	iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY,
>> &geo);
>>>
>>> @@ -2138,6 +2182,7 @@ static int vfio_iommu_type1_attach_group(void
>> *iommu_data,
>>>  	return 0;
>>>
>>>  out_detach:
>>> +	vfio_iommu_release_nesting_info(iommu);
>>>  	vfio_iommu_detach_group(domain, group);
>>>  out_domain:
>>>  	iommu_domain_free(domain->domain);
>>> @@ -2338,6 +2383,8 @@ static void vfio_iommu_type1_detach_group(void
>> *iommu_data,
>>>  					vfio_iommu_unmap_unpin_all(iommu);
>>>  				else
>>>
>> 	vfio_iommu_unmap_unpin_reaccount(iommu);
>>> +
>>> +				vfio_iommu_release_nesting_info(iommu);
>>>  			}
>>>  			iommu_domain_free(domain->domain);
>>>  			list_del(&domain->next);
>>> @@ -2546,6 +2593,31 @@ static int vfio_iommu_migration_build_caps(struct
>> vfio_iommu *iommu,
>>>  	return vfio_info_add_capability(caps, &cap_mig.header, sizeof(cap_mig));
>>>  }
>>>
>>> +static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
>>> +					   struct vfio_info_cap *caps)
>>> +{
>>> +	struct vfio_info_cap_header *header;
>>> +	struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
>>> +	size_t size;
>>> +
>>> +	size = offsetof(struct vfio_iommu_type1_info_cap_nesting, info) +
>>> +		iommu->nesting_info->size;
>>> +
>>> +	header = vfio_info_cap_add(caps, size,
>>> +				   VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
>>> +	if (IS_ERR(header))
>>> +		return PTR_ERR(header);
>>> +
>>> +	nesting_cap = container_of(header,
>>> +				   struct vfio_iommu_type1_info_cap_nesting,
>>> +				   header);
>>> +
>>> +	memcpy(&nesting_cap->info, iommu->nesting_info,
>>> +	       iommu->nesting_info->size);
>> you must check whether nesting_info is non NULL before doing that.
> 
> it's now validated in the caller of this function. :-)
ah ok sorry I missed that.
>  
>> Besides I agree with Jean on the fact it may be better to not report the
>> capability if nested is not supported.
> 
> I see. in this patch, just few lines below [2], vfio only reports nesting
> cap when iommu->nesting_info is non-null. so even if userspace selected
> nesting type, it may fail at the VFIO_SET_IOMMU phase since group_attach
> will be failed for NESTED type container if host iommu doesn’t support
> nesting. the code is marked as [1] in this email.
OK that's already there then.

Thanks

Eric
> 
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>  static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
>>>  				     unsigned long arg)
>>>  {
>>> @@ -2581,6 +2653,12 @@ static int vfio_iommu_type1_get_info(struct
>> vfio_iommu *iommu,
>>>  	if (!ret)
>>>  		ret = vfio_iommu_iova_build_caps(iommu, &caps);
>>>
>>> +	if (iommu->nesting_info) {
>>> +		ret = vfio_iommu_info_add_nesting_cap(iommu, &caps);
>>> +		if (ret)
>>> +			return ret;
> 
> here checks nesting_info before reporting nesting cap. [2]
> 
>>> +	}
>>> +
>>>  	mutex_unlock(&iommu->lock);
>>>
>>>  	if (ret)
>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>> index 9204705..46a78af 100644
>>> --- a/include/uapi/linux/vfio.h
>>> +++ b/include/uapi/linux/vfio.h
>>> @@ -14,6 +14,7 @@
>>>
>>>  #include <linux/types.h>
>>>  #include <linux/ioctl.h>
>>> +#include <linux/iommu.h>
>>>
>>>  #define VFIO_API_VERSION	0
>>>
>>> @@ -1039,6 +1040,24 @@ struct vfio_iommu_type1_info_cap_migration {
>>>  	__u64	max_dirty_bitmap_size;		/* in bytes */
>>>  };
>>>
>>> +/*
>>> + * The nesting capability allows to report the related capability
>>> + * and info for nesting iommu type.
>>> + *
>>> + * The structures below define version 1 of this capability.
>>> + *
>>> + * User space selected VFIO_TYPE1_NESTING_IOMMU type should check
>>> + * this capto get supported features.
>> s/capto/capability to get
> 
> got it.
> 
> Regards,
> Yi Liu
> 
>>> + *
>>> + * @info: the nesting info provided by IOMMU driver.
>>> + */
>>> +#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING  3
>>> +
>>> +struct vfio_iommu_type1_info_cap_nesting {
>>> +	struct	vfio_info_cap_header header;
>>> +	struct iommu_nesting_info info;
>>> +};
>>> +
>>>  #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>>>
>>>  /**
>>>
>>
>> Thanks
>>
>> Eric
> 

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 05/15] vfio: Add PASID allocation/free support
  2020-07-20  8:26       ` Auger Eric
@ 2020-07-20  8:49         ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2020-07-20  8:49 UTC (permalink / raw)
  To: Auger Eric, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	 Jun J, iommu, linux-kernel, Sun, Yi Y, Wu, Hao

Hi Eric,

> From: Auger Eric <eric.auger@redhat.com>
> Sent: Monday, July 20, 2020 4:26 PM
[...]
> >>> +int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max) {
> >>> +	ioasid_t pasid;
> >>> +	struct vfio_pasid *vid;
> >>> +
> >>> +	pasid = ioasid_alloc(vmm->ioasid_sid, min, max, NULL);
> >>> +	if (pasid == INVALID_IOASID)
> >>> +		return -ENOSPC;
> >>> +
> >>> +	vid = kzalloc(sizeof(*vid), GFP_KERNEL);
> >>> +	if (!vid) {
> >>> +		ioasid_free(pasid);
> >>> +		return -ENOMEM;
> >>> +	}
> >>> +
> >>> +	vid->pasid = pasid;
> >>> +
> >>> +	mutex_lock(&vmm->pasid_lock);
> >>> +	vfio_link_pasid(vmm, vid);
> >>> +	mutex_unlock(&vmm->pasid_lock);
> >>> +
> >>> +	return pasid;
> >>> +}
> >> I am not totally convinced by your previous reply on EXPORT_SYMBOL_GP()
> >> irrelevance in this patch. But well ;-)
> >
> > I recalled my memory, I think it's made per a comment from Chris.
> > I guess it may make sense to export symbols together with the exact
> > driver user of it in this patch as well :-) but maybe I misunderstood
> > him. if yes, I may add the symbol export back in this patch.
> >
> > https://lore.kernel.org/linux-iommu/20200331075331.GA26583@infradead.org/#t
> OK I don't know the best practice here. Anyway there is no caller at
> this stage either. I think you may describe in the commit message the
> vfio_iommu_type1 will be the eventual user of the exported functions in
> this module, what are the exact exported functions here. You may also
> explain the motivation behind creating a separate module.

got it. will add it.

Regards,
Yi Liu


_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 04/15] vfio/type1: Report iommu nesting info to userspace
  2020-07-20  8:33       ` Auger Eric
@ 2020-07-20  8:51         ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2020-07-20  8:51 UTC (permalink / raw)
  To: Auger Eric, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	 Jun J, iommu, linux-kernel, Sun, Yi Y, Wu, Hao

Hi Eric,

> From: Auger Eric <eric.auger@redhat.com>
> Sent: Monday, July 20, 2020 4:33 PM
> 
> Yi,
> 
> On 7/20/20 9:51 AM, Liu, Yi L wrote:
> > Hi Eric,
> >
> >> From: Auger Eric <eric.auger@redhat.com>
> >> Sent: Saturday, July 18, 2020 1:34 AM
> >>
> >> Yi,
> >>
> >> On 7/12/20 1:20 PM, Liu Yi L wrote:
> >>> This patch exports iommu nesting capability info to user space
> >>> through VFIO. User space is expected to check this info for supported uAPIs (e.g.
> >> it is not only to check the supported uAPIS but rather to know which
> >> callbacks it must call upon vIOMMU events and which features are
> >> supported by the physical IOMMU.
> >
> > yes, will refine the description per your comment.
> >
> >>> PASID alloc/free, bind page table, and cache invalidation) and the
> >>> vendor specific format information for first level/stage page table
> >>> that will be bound to.
> >>>
> >>> The nesting info is available only after the nesting iommu type is
> >>> set for a container.
> >> to NESTED type
> >
> > you mean "The nesting info is available only after container set to be NESTED
> type."
> >
> > right?
> correct

got you.

> >
> >>  Current implementation imposes one limitation - one
> >>> nesting container should include at most one group. The philosophy
> >>> of vfio container is having all groups/devices within the container
> >>> share the same IOMMU context. When vSVA is enabled, one IOMMU
> >>> context could include one 2nd-level address space and multiple 1st-level address
> spaces.
> >>> While the 2nd-level address space is reasonably sharable by multiple
> >>> groups , blindly sharing 1st-level address spaces across all groups
> >>> within the container might instead break the guest expectation. In
> >>> the future sub/ super container concept might be introduced to allow
> >>> partial address space sharing within an IOMMU context. But for now
> >>> let's go with this restriction by requiring singleton container for
> >>> using nesting iommu features. Below link has the related discussion about this
> decision.
> >>
> >> Maybe add a note about SMMU related changes spotted by Jean.
> >
> > I guess you mean the comments Jean gave in patch 3/15, right? I'll
> > copy his comments and also give the below link as well.
> >
> > https://lore.kernel.org/kvm/20200717090900.GC4850@myrica/
> correct

I see.

Regards,
Yi Liu

> Thanks
> 
> Eric
> >
> >>>
> >>> https://lkml.org/lkml/2020/5/15/1028
> >>>
> >>> Cc: Kevin Tian <kevin.tian@intel.com>
> >>> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> >>> Cc: Alex Williamson <alex.williamson@redhat.com>
> >>> Cc: Eric Auger <eric.auger@redhat.com>
> >>> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> >>> Cc: Joerg Roedel <joro@8bytes.org>
> >>> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> >>> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> >>> ---
> >>> v4 -> v5:
> >>> *) address comments from Eric Auger.
> >>> *) return struct iommu_nesting_info for
> >> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as
> >>>    cap is much "cheap", if needs extension in future, just define another cap.
> >>>    https://lore.kernel.org/kvm/20200708132947.5b7ee954@x1.home/
> >>>
> >>> v3 -> v4:
> >>> *) address comments against v3.
> >>>
> >>> v1 -> v2:
> >>> *) added in v2
> >>> ---
> >>>  drivers/vfio/vfio_iommu_type1.c | 102
> >> +++++++++++++++++++++++++++++++++++-----
> >>>  include/uapi/linux/vfio.h       |  19 ++++++++
> >>>  2 files changed, 109 insertions(+), 12 deletions(-)
> >>>
> >>> diff --git a/drivers/vfio/vfio_iommu_type1.c
> >>> b/drivers/vfio/vfio_iommu_type1.c index 3bd70ff..ed80104 100644
> >>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>> @@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit,
> >>>  		 "Maximum number of user DMA mappings per container (65535).");
> >>>
> >>>  struct vfio_iommu {
> >>> -	struct list_head	domain_list;
> >>> -	struct list_head	iova_list;
> >>> -	struct vfio_domain	*external_domain; /* domain for external user */
> >>> -	struct mutex		lock;
> >>> -	struct rb_root		dma_list;
> >>> -	struct blocking_notifier_head notifier;
> >>> -	unsigned int		dma_avail;
> >>> -	uint64_t		pgsize_bitmap;
> >>> -	bool			v2;
> >>> -	bool			nesting;
> >>> -	bool			dirty_page_tracking;
> >>> -	bool			pinned_page_dirty_scope;
> >>> +	struct list_head		domain_list;
> >>> +	struct list_head		iova_list;
> >>> +	/* domain for external user */
> >>> +	struct vfio_domain		*external_domain;
> >>> +	struct mutex			lock;
> >>> +	struct rb_root			dma_list;
> >>> +	struct blocking_notifier_head	notifier;
> >>> +	unsigned int			dma_avail;
> >>> +	uint64_t			pgsize_bitmap;
> >>> +	bool				v2;
> >>> +	bool				nesting;
> >>> +	bool				dirty_page_tracking;
> >>> +	bool				pinned_page_dirty_scope;
> >>> +	struct iommu_nesting_info	*nesting_info;
> >>>  };
> >>>
> >>>  struct vfio_domain {
> >>> @@ -130,6 +132,9 @@ struct vfio_regions {
> >>>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)	\
> >>>  					(!list_empty(&iommu->domain_list))
> >>>
> >>> +#define CONTAINER_HAS_DOMAIN(iommu)	(((iommu)-
> >external_domain) || \
> >>> +					 (!list_empty(&(iommu)->domain_list)))
> >>> +
> >>>  #define DIRTY_BITMAP_BYTES(n)	(ALIGN(n, BITS_PER_TYPE(u64)) /
> >> BITS_PER_BYTE)
> >>>
> >>>  /*
> >>> @@ -1929,6 +1934,13 @@ static void
> >>> vfio_iommu_iova_insert_copy(struct
> >> vfio_iommu *iommu,
> >>>
> >>>  	list_splice_tail(iova_copy, iova);  }
> >>> +
> >>> +static void vfio_iommu_release_nesting_info(struct vfio_iommu
> >>> +*iommu) {
> >>> +	kfree(iommu->nesting_info);
> >>> +	iommu->nesting_info = NULL;
> >>> +}
> >>> +
> >>>  static int vfio_iommu_type1_attach_group(void *iommu_data,
> >>>  					 struct iommu_group *iommu_group)
> { @@ -1959,6 +1971,12 @@
> >>> static int vfio_iommu_type1_attach_group(void
> >> *iommu_data,
> >>>  		}
> >>>  	}
> >>>
> >>> +	/* Nesting type container can include only one group */
> >>> +	if (iommu->nesting && CONTAINER_HAS_DOMAIN(iommu)) {
> >>> +		mutex_unlock(&iommu->lock);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>>  	group = kzalloc(sizeof(*group), GFP_KERNEL);
> >>>  	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
> >>>  	if (!group || !domain) {
> >>> @@ -2029,6 +2047,32 @@ static int vfio_iommu_type1_attach_group(void
> >> *iommu_data,
> >>>  	if (ret)
> >>>  		goto out_domain;
> >>>
> >>> +	/* Nesting cap info is available only after attaching */
> >>> +	if (iommu->nesting) {
> >>> +		struct iommu_nesting_info tmp = { .size = 0, };
> >>> +
> >>> +		/* First get the size of vendor specific nesting info */
> >>> +		ret = iommu_domain_get_attr(domain->domain,
> >>> +					    DOMAIN_ATTR_NESTING,
> >>> +					    &tmp);
> >>> +		if (ret)
> >>> +			goto out_detach;
> >>> +
> >>> +		iommu->nesting_info = kzalloc(tmp.size, GFP_KERNEL);
> >>> +		if (!iommu->nesting_info) {
> >>> +			ret = -ENOMEM;
> >>> +			goto out_detach;
> >>> +		}
> >>> +
> >>> +		/* Now get the nesting info */
> >>> +		iommu->nesting_info->size = tmp.size;
> >>> +		ret = iommu_domain_get_attr(domain->domain,
> >>> +					    DOMAIN_ATTR_NESTING,
> >>> +					    iommu->nesting_info);
> >>> +		if (ret)
> >>> +			goto out_detach;
> >
> > get nesting_info failure will result in group_attach failure.[1]
> >
> >>> +	}
> >>> +
> >>>  	/* Get aperture info */
> >>>  	iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY,
> >> &geo);
> >>>
> >>> @@ -2138,6 +2182,7 @@ static int vfio_iommu_type1_attach_group(void
> >> *iommu_data,
> >>>  	return 0;
> >>>
> >>>  out_detach:
> >>> +	vfio_iommu_release_nesting_info(iommu);
> >>>  	vfio_iommu_detach_group(domain, group);
> >>>  out_domain:
> >>>  	iommu_domain_free(domain->domain);
> >>> @@ -2338,6 +2383,8 @@ static void vfio_iommu_type1_detach_group(void
> >> *iommu_data,
> >>>  					vfio_iommu_unmap_unpin_all(iommu);
> >>>  				else
> >>>
> >> 	vfio_iommu_unmap_unpin_reaccount(iommu);
> >>> +
> >>> +				vfio_iommu_release_nesting_info(iommu);
> >>>  			}
> >>>  			iommu_domain_free(domain->domain);
> >>>  			list_del(&domain->next);
> >>> @@ -2546,6 +2593,31 @@ static int
> >>> vfio_iommu_migration_build_caps(struct
> >> vfio_iommu *iommu,
> >>>  	return vfio_info_add_capability(caps, &cap_mig.header,
> >>> sizeof(cap_mig));  }
> >>>
> >>> +static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
> >>> +					   struct vfio_info_cap *caps) {
> >>> +	struct vfio_info_cap_header *header;
> >>> +	struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
> >>> +	size_t size;
> >>> +
> >>> +	size = offsetof(struct vfio_iommu_type1_info_cap_nesting, info) +
> >>> +		iommu->nesting_info->size;
> >>> +
> >>> +	header = vfio_info_cap_add(caps, size,
> >>> +				   VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> >>> +	if (IS_ERR(header))
> >>> +		return PTR_ERR(header);
> >>> +
> >>> +	nesting_cap = container_of(header,
> >>> +				   struct vfio_iommu_type1_info_cap_nesting,
> >>> +				   header);
> >>> +
> >>> +	memcpy(&nesting_cap->info, iommu->nesting_info,
> >>> +	       iommu->nesting_info->size);
> >> you must check whether nesting_info is non NULL before doing that.
> >
> > it's now validated in the caller of this function. :-)
> ah ok sorry I missed that.
> >
> >> Besides I agree with Jean on the fact it may be better to not report
> >> the capability if nested is not supported.
> >
> > I see. in this patch, just few lines below [2], vfio only reports
> > nesting cap when iommu->nesting_info is non-null. so even if userspace
> > selected nesting type, it may fail at the VFIO_SET_IOMMU phase since
> > group_attach will be failed for NESTED type container if host iommu
> > doesn’t support nesting. the code is marked as [1] in this email.
> OK that's already there then.
>
> Thanks
> 
> Eric
> >
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>>  static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
> >>>  				     unsigned long arg)
> >>>  {
> >>> @@ -2581,6 +2653,12 @@ static int vfio_iommu_type1_get_info(struct
> >> vfio_iommu *iommu,
> >>>  	if (!ret)
> >>>  		ret = vfio_iommu_iova_build_caps(iommu, &caps);
> >>>
> >>> +	if (iommu->nesting_info) {
> >>> +		ret = vfio_iommu_info_add_nesting_cap(iommu, &caps);
> >>> +		if (ret)
> >>> +			return ret;
> >
> > here checks nesting_info before reporting nesting cap. [2]
> >
> >>> +	}
> >>> +
> >>>  	mutex_unlock(&iommu->lock);
> >>>
> >>>  	if (ret)
> >>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >>> index 9204705..46a78af 100644
> >>> --- a/include/uapi/linux/vfio.h
> >>> +++ b/include/uapi/linux/vfio.h
> >>> @@ -14,6 +14,7 @@
> >>>
> >>>  #include <linux/types.h>
> >>>  #include <linux/ioctl.h>
> >>> +#include <linux/iommu.h>
> >>>
> >>>  #define VFIO_API_VERSION	0
> >>>
> >>> @@ -1039,6 +1040,24 @@ struct vfio_iommu_type1_info_cap_migration {
> >>>  	__u64	max_dirty_bitmap_size;		/* in bytes */
> >>>  };
> >>>
> >>> +/*
> >>> + * The nesting capability allows to report the related capability
> >>> + * and info for nesting iommu type.
> >>> + *
> >>> + * The structures below define version 1 of this capability.
> >>> + *
> >>> + * User space selected VFIO_TYPE1_NESTING_IOMMU type should check
> >>> + * this capto get supported features.
> >> s/capto/capability to get
> >
> > got it.
> >
> > Regards,
> > Yi Liu
> >
> >>> + *
> >>> + * @info: the nesting info provided by IOMMU driver.
> >>> + */
> >>> +#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING  3
> >>> +
> >>> +struct vfio_iommu_type1_info_cap_nesting {
> >>> +	struct	vfio_info_cap_header header;
> >>> +	struct iommu_nesting_info info;
> >>> +};
> >>> +
> >>>  #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> >>>
> >>>  /**
> >>>
> >>
> >> Thanks
> >>
> >> Eric
> >

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 07/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)
  2020-07-19 15:38   ` Auger Eric
@ 2020-07-20  8:56     ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2020-07-20  8:56 UTC (permalink / raw)
  To: Auger Eric, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	 Jun J, iommu, linux-kernel, Sun, Yi Y, Wu, Hao

Hi Eric,

> From: Auger Eric <eric.auger@redhat.com>
> Sent: Sunday, July 19, 2020 11:39 PM
> 
> Yi,
> 
> On 7/12/20 1:21 PM, Liu Yi L wrote:
> > This patch allows user space to request PASID allocation/free, e.g.
> > when serving the request from the guest.
> >
> > PASIDs that are not freed by userspace are automatically freed when
> > the IOASID set is destroyed when process exits.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> > v4 -> v5:
> > *) address comments from Eric Auger.
> > *) the comments for the PASID_FREE request is addressed in patch 5/15 of
> >    this series.
> >
> > v3 -> v4:
> > *) address comments from v3, except the below comment against the range
> >    of PASID_FREE request. needs more help on it.
> >     "> +if (req.range.min > req.range.max)
> >
> >     Is it exploitable that a user can spin the kernel for a long time in
> >     the case of a free by calling this with [0, MAX_UINT] regardless of
> >     their actual allocations?"
> >
> > https://lore.kernel.org/linux-iommu/20200702151832.048b44d1@x1.home/
> >
> > v1 -> v2:
> > *) move the vfio_mm related code to be a seprate module
> > *) use a single structure for alloc/free, could support a range of
> > PASIDs
> > *) fetch vfio_mm at group_attach time instead of at iommu driver open
> > time
> > ---
> >  drivers/vfio/Kconfig            |  1 +
> >  drivers/vfio/vfio_iommu_type1.c | 85
> +++++++++++++++++++++++++++++++++++++++++
> >  drivers/vfio/vfio_pasid.c       | 10 +++++
> >  include/linux/vfio.h            |  6 +++
> >  include/uapi/linux/vfio.h       | 37 ++++++++++++++++++
> >  5 files changed, 139 insertions(+)
> >
> > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index
> > 3d8a108..95d90c6 100644
> > --- a/drivers/vfio/Kconfig
> > +++ b/drivers/vfio/Kconfig
> > @@ -2,6 +2,7 @@
> >  config VFIO_IOMMU_TYPE1
> >  	tristate
> >  	depends on VFIO
> > +	select VFIO_PASID if (X86)
> >  	default n
> >
> >  config VFIO_IOMMU_SPAPR_TCE
> > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > b/drivers/vfio/vfio_iommu_type1.c index ed80104..55b4065 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -76,6 +76,7 @@ struct vfio_iommu {
> >  	bool				dirty_page_tracking;
> >  	bool				pinned_page_dirty_scope;
> >  	struct iommu_nesting_info	*nesting_info;
> > +	struct vfio_mm			*vmm;
> >  };
> >
> >  struct vfio_domain {
> > @@ -1937,6 +1938,11 @@ static void vfio_iommu_iova_insert_copy(struct
> > vfio_iommu *iommu,
> >
> >  static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
> > {
> > +	if (iommu->vmm) {
> > +		vfio_mm_put(iommu->vmm);
> > +		iommu->vmm = NULL;
> > +	}
> > +
> >  	kfree(iommu->nesting_info);
> >  	iommu->nesting_info = NULL;
> >  }
> > @@ -2071,6 +2077,26 @@ static int vfio_iommu_type1_attach_group(void
> *iommu_data,
> >  					    iommu->nesting_info);
> >  		if (ret)
> >  			goto out_detach;
> > +
> > +		if (iommu->nesting_info->features &
> > +					IOMMU_NESTING_FEAT_SYSWIDE_PASID)
> {
> > +			struct vfio_mm *vmm;
> > +			int sid;
> > +
> > +			vmm = vfio_mm_get_from_task(current);
> > +			if (IS_ERR(vmm)) {
> > +				ret = PTR_ERR(vmm);
> > +				goto out_detach;
> > +			}
> > +			iommu->vmm = vmm;
> > +
> > +			sid = vfio_mm_ioasid_sid(vmm);
> > +			ret = iommu_domain_set_attr(domain->domain,
> > +						    DOMAIN_ATTR_IOASID_SID,
> > +						    &sid);
> > +			if (ret)
> > +				goto out_detach;
> > +		}
> >  	}
> >
> >  	/* Get aperture info */
> > @@ -2855,6 +2881,63 @@ static int vfio_iommu_type1_dirty_pages(struct
> vfio_iommu *iommu,
> >  	return -EINVAL;
> >  }
> >
> > +static int vfio_iommu_type1_pasid_alloc(struct vfio_iommu *iommu,
> > +					unsigned int min,
> > +					unsigned int max)
> > +{
> > +	int ret = -EOPNOTSUPP;
> > +
> > +	mutex_lock(&iommu->lock);
> > +	if (iommu->vmm)
> > +		ret = vfio_pasid_alloc(iommu->vmm, min, max);
> > +	mutex_unlock(&iommu->lock);
> > +	return ret;
> > +}
> > +
> > +static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu,
> > +				       unsigned int min,
> > +				       unsigned int max)
> > +{
> > +	int ret = -EOPNOTSUPP;
> > +
> > +	mutex_lock(&iommu->lock);
> > +	if (iommu->vmm) {
> > +		vfio_pasid_free_range(iommu->vmm, min, max);
> > +		ret = 0;
> > +	}
> > +	mutex_unlock(&iommu->lock);
> > +	return ret;
> > +}
> > +
> > +static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu,
> > +					  unsigned long arg)
> > +{
> > +	struct vfio_iommu_type1_pasid_request req;
> > +	unsigned long minsz;
> > +
> > +	minsz = offsetofend(struct vfio_iommu_type1_pasid_request, range);
> > +
> > +	if (copy_from_user(&req, (void __user *)arg, minsz))
> > +		return -EFAULT;
> > +
> > +	if (req.argsz < minsz || (req.flags & ~VFIO_PASID_REQUEST_MASK))
> > +		return -EINVAL;
> > +
> > +	if (req.range.min > req.range.max)
> > +		return -EINVAL;
> > +
> > +	switch (req.flags & VFIO_PASID_REQUEST_MASK) {
> > +	case VFIO_IOMMU_FLAG_ALLOC_PASID:
> not sure it is worth to introduce both vfio_iommu_type1_pasid_free/alloc
> helpers. You could take the lock above, and test iommu->vmm as well above.

nice suggestion. thanks.

> > +		return vfio_iommu_type1_pasid_alloc(iommu,
> > +					req.range.min, req.range.max);
> > +	case VFIO_IOMMU_FLAG_FREE_PASID:
> > +		return vfio_iommu_type1_pasid_free(iommu,
> > +					req.range.min, req.range.max);
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +}
> > +
> >  static long vfio_iommu_type1_ioctl(void *iommu_data,
> >  				   unsigned int cmd, unsigned long arg)  { @@ -
> 2871,6 +2954,8 @@
> > static long vfio_iommu_type1_ioctl(void *iommu_data,
> >  		return vfio_iommu_type1_unmap_dma(iommu, arg);
> >  	case VFIO_IOMMU_DIRTY_PAGES:
> >  		return vfio_iommu_type1_dirty_pages(iommu, arg);
> > +	case VFIO_IOMMU_PASID_REQUEST:
> > +		return vfio_iommu_type1_pasid_request(iommu, arg);
> >  	default:
> >  		return -ENOTTY;
> >  	}
> > diff --git a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c
> > index 66e6054e..ebec244 100644
> > --- a/drivers/vfio/vfio_pasid.c
> > +++ b/drivers/vfio/vfio_pasid.c
> > @@ -61,6 +61,7 @@ void vfio_mm_put(struct vfio_mm *vmm)  {
> >  	kref_put_mutex(&vmm->kref, vfio_mm_release, &vfio_mm_lock);  }
> > +EXPORT_SYMBOL_GPL(vfio_mm_put);
> >
> >  static void vfio_mm_get(struct vfio_mm *vmm)
> >  {
> > @@ -114,6 +115,13 @@ struct vfio_mm *vfio_mm_get_from_task(struct
> task_struct *task)
> >  	mmput(mm);
> >  	return vmm;
> >  }
> > +EXPORT_SYMBOL_GPL(vfio_mm_get_from_task);
> > +
> > +int vfio_mm_ioasid_sid(struct vfio_mm *vmm)
> > +{
> > +	return vmm->ioasid_sid;
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_mm_ioasid_sid);
> >
> >  /*
> >   * Find PASID within @min and @max
> > @@ -197,6 +205,7 @@ int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
> >
> >  	return pasid;
> >  }
> > +EXPORT_SYMBOL_GPL(vfio_pasid_alloc);
> >
> >  void vfio_pasid_free_range(struct vfio_mm *vmm,
> >  			   ioasid_t min, ioasid_t max)
> > @@ -213,6 +222,7 @@ void vfio_pasid_free_range(struct vfio_mm *vmm,
> >  		vfio_remove_pasid(vmm, vid);
> >  	mutex_unlock(&vmm->pasid_lock);
> >  }
> > +EXPORT_SYMBOL_GPL(vfio_pasid_free_range);
> >
> >  static int __init vfio_pasid_init(void)
> >  {
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index 31472a9..a111108 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -101,6 +101,7 @@ struct vfio_mm;
> >  #if IS_ENABLED(CONFIG_VFIO_PASID)
> >  extern struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task);
> >  extern void vfio_mm_put(struct vfio_mm *vmm);
> > +int vfio_mm_ioasid_sid(struct vfio_mm *vmm);
> I still don't get why it does not take an extern as the other functions
> implemented in vfio_pasid.c and used in other modules. what's the
> difference?

it's a mistake from me. should have "extern" :-)

> >  extern int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max);
> >  extern void vfio_pasid_free_range(struct vfio_mm *vmm,
> >  				  ioasid_t min, ioasid_t max);
> > @@ -114,6 +115,11 @@ static inline void vfio_mm_put(struct vfio_mm *vmm)
> >  {
> >  }
> >
> > +static inline int vfio_mm_ioasid_sid(struct vfio_mm *vmm)
> > +{
> > +	return -ENOTTY;
> > +}
> > +
> >  static inline int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
> >  {
> >  	return -ENOTTY;
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 46a78af..96a115f 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -1172,6 +1172,43 @@ struct vfio_iommu_type1_dirty_bitmap_get {
> >
> >  #define VFIO_IOMMU_DIRTY_PAGES             _IO(VFIO_TYPE, VFIO_BASE + 17)
> >
> > +/**
> > + * VFIO_IOMMU_PASID_REQUEST - _IOWR(VFIO_TYPE, VFIO_BASE + 18,
> > + *				struct vfio_iommu_type1_pasid_request)
> > + *
> > + * PASID (Processor Address Space ID) is a PCIe concept for tagging
> > + * address spaces in DMA requests. When system-wide PASID allocation
> > + * is required by underlying iommu driver (e.g. Intel VT-d), this
> by the underlying

got it.

> > + * provides an interface for userspace to request pasid alloc/free
> > + * for its assigned devices. Userspace should check the availability
> > + * of this API by checking VFIO_IOMMU_TYPE1_INFO_CAP_NESTING through
> > + * VFIO_IOMMU_GET_INFO.
> > + *
> > + * @flags=VFIO_IOMMU_FLAG_ALLOC_PASID, allocate a single PASID within
> @range.
> > + * @flags=VFIO_IOMMU_FLAG_FREE_PASID, free the PASIDs within @range.
> > + * @range is [min, max], which means both @min and @max are inclusive.
> > + * ALLOC_PASID and FREE_PASID are mutually exclusive.
> > + *
> > + * returns: allocated PASID value on success, -errno on failure for
> > + *	     ALLOC_PASID;
> > + *	     0 for FREE_PASID operation;
> > + */
> > +struct vfio_iommu_type1_pasid_request {
> > +	__u32	argsz;
> > +#define VFIO_IOMMU_FLAG_ALLOC_PASID	(1 << 0)
> > +#define VFIO_IOMMU_FLAG_FREE_PASID	(1 << 1)
> > +	__u32	flags;
> > +	struct {
> > +		__u32	min;
> > +		__u32	max;
> > +	} range;
> > +};
> > +
> > +#define VFIO_PASID_REQUEST_MASK	(VFIO_IOMMU_FLAG_ALLOC_PASID | \
> > +					 VFIO_IOMMU_FLAG_FREE_PASID)
> > +
> > +#define VFIO_IOMMU_PASID_REQUEST	_IO(VFIO_TYPE, VFIO_BASE + 18)
> > +
> >  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
> >
> >  /*
> >
> Otherwise, besides the pieces I would have put directly in the
> vfio_pasid module patch (EXPORT_SYMBOL_GPL and vfio_mm_ioasid_sid),
> looks good.

we discussed in another thread:-).

Regards,
Yi Liu

> 
> Thanks
> 
> Eric

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 08/15] iommu: Pass domain to sva_unbind_gpasid()
  2020-07-19 15:37   ` Auger Eric
@ 2020-07-20  9:06     ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2020-07-20  9:06 UTC (permalink / raw)
  To: Auger Eric, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	 Jun J, iommu, linux-kernel, Sun, Yi Y, Wu, Hao

Hi Eric,

> From: Auger Eric <eric.auger@redhat.com>
> Sent: Sunday, July 19, 2020 11:38 PM
> 
> Yi,
> 
> On 7/12/20 1:21 PM, Liu Yi L wrote:
> > From: Yi Sun <yi.y.sun@intel.com>
> >
> > Current interface is good enough for SVA virtualization on an assigned
> > physical PCI device, but when it comes to mediated devices, a physical
> > device may attached with multiple aux-domains. Also, for guest unbind,
> > the PASID to be unbind should be allocated to the VM. This check
> > requires to know the ioasid_set which is associated with the domain.
> >
> > So this interface needs to pass in domain info. Then the iommu driver
> > is able to know which domain will be used for the 2nd stage
> > translation of the nesting mode and also be able to do PASID ownership
> > check. This patch passes @domain per the above reason.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Yi Sun <yi.y.sun@intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> > v2 -> v3:
> > *) pass in domain info only
> > *) use ioasid_t for pasid instead of int type
> >
> > v1 -> v2:
> > *) added in v2.
> > ---
> >  drivers/iommu/intel/svm.c   | 3 ++-
> >  drivers/iommu/iommu.c       | 2 +-
> >  include/linux/intel-iommu.h | 3 ++-
> >  include/linux/iommu.h       | 3 ++-
> >  4 files changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> > index b9a9c55..d2c0e1a 100644
> > --- a/drivers/iommu/intel/svm.c
> > +++ b/drivers/iommu/intel/svm.c
> > @@ -432,7 +432,8 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain,
> struct device *dev,
> >  	return ret;
> >  }
> >
> > -int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> > +int intel_svm_unbind_gpasid(struct iommu_domain *domain,
> > +			    struct device *dev, ioasid_t pasid)
> int -> ioasid_t proto change is not described in the commit message,

oops, yes. btw. I noticed there is another thread which is going to use
u32 for pasid. perhaps I need to drop such change.

https://lore.kernel.org/linux-iommu/1594684087-61184-2-git-send-email-fenghua.yu@intel.com/#Z30drivers:iommu:iommu.c

> >  {
> >  	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> >  	struct intel_svm_dev *sdev;
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index
> > 7910249..d3e554c 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -2151,7 +2151,7 @@ int __iommu_sva_unbind_gpasid(struct iommu_domain
> *domain, struct device *dev,
> >  	if (unlikely(!domain->ops->sva_unbind_gpasid))
> >  		return -ENODEV;
> >
> > -	return domain->ops->sva_unbind_gpasid(dev, data->hpasid);
> > +	return domain->ops->sva_unbind_gpasid(domain, dev, data->hpasid);
> >  }
> >  EXPORT_SYMBOL_GPL(__iommu_sva_unbind_gpasid);
> >
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index 0d0ab32..18f292e 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -738,7 +738,8 @@ extern int intel_svm_enable_prq(struct intel_iommu
> > *iommu);  extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> > int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
> >  			  struct iommu_gpasid_bind_data *data); -int
> > intel_svm_unbind_gpasid(struct device *dev, int pasid);
> > +int intel_svm_unbind_gpasid(struct iommu_domain *domain,
> > +			    struct device *dev, ioasid_t pasid);
> >  struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
> >  				 void *drvdata);
> >  void intel_svm_unbind(struct iommu_sva *handle); diff --git
> > a/include/linux/iommu.h b/include/linux/iommu.h index e84a1d5..ca5edd8
> > 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -303,7 +303,8 @@ struct iommu_ops {
> >  	int (*sva_bind_gpasid)(struct iommu_domain *domain,
> >  			struct device *dev, struct iommu_gpasid_bind_data *data);
> >
> > -	int (*sva_unbind_gpasid)(struct device *dev, int pasid);
> > +	int (*sva_unbind_gpasid)(struct iommu_domain *domain,
> > +				 struct device *dev, ioasid_t pasid);
> >
> >  	int (*def_domain_type)(struct device *dev);
> >
> >
> Besides
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
>

thanks.

Regards,
Yi Liu

> Eric

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 10/15] vfio/type1: Support binding guest page tables to PASID
  2020-07-12 11:21 ` [PATCH v5 10/15] vfio/type1: Support binding guest page tables to PASID Liu Yi L
@ 2020-07-20  9:37   ` Auger Eric
  2020-07-20 10:37     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Auger Eric @ 2020-07-20  9:37 UTC (permalink / raw)
  To: Liu Yi L, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

Yi,

On 7/12/20 1:21 PM, Liu Yi L wrote:
> Nesting translation allows two-levels/stages page tables, with 1st level
> for guest translations (e.g. GVA->GPA), 2nd level for host translations
> (e.g. GPA->HPA). This patch adds interface for binding guest page tables
> to a PASID. This PASID must have been allocated to user space before the
by the userspace?
> binding request.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> v3 -> v4:
> *) address comments from Alex on v3
> 
> v2 -> v3:
> *) use __iommu_sva_unbind_gpasid() for unbind call issued by VFIO
>    https://lore.kernel.org/linux-iommu/1592931837-58223-6-git-send-email-jacob.jun.pan@linux.intel.com/
> 
> v1 -> v2:
> *) rename subject from "vfio/type1: Bind guest page tables to host"
> *) remove VFIO_IOMMU_BIND, introduce VFIO_IOMMU_NESTING_OP to support bind/
>    unbind guet page table
> *) replaced vfio_iommu_for_each_dev() with a group level loop since this
>    series enforces one group per container w/ nesting type as start.
> *) rename vfio_bind/unbind_gpasid_fn() to vfio_dev_bind/unbind_gpasid_fn()
> *) vfio_dev_unbind_gpasid() always successful
> *) use vfio_mm->pasid_lock to avoid race between PASID free and page table
>    bind/unbind
> ---
>  drivers/vfio/vfio_iommu_type1.c | 166 ++++++++++++++++++++++++++++++++++++++++
>  drivers/vfio/vfio_pasid.c       |  26 +++++++
>  include/linux/vfio.h            |  20 +++++
>  include/uapi/linux/vfio.h       |  31 ++++++++
>  4 files changed, 243 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 55b4065..f0f21ff 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -149,6 +149,30 @@ struct vfio_regions {
>  #define DIRTY_BITMAP_PAGES_MAX	 ((u64)INT_MAX)
>  #define DIRTY_BITMAP_SIZE_MAX	 DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
>  
> +struct domain_capsule {
> +	struct vfio_group *group;
> +	struct iommu_domain *domain;
> +	void *data;
> +};
> +
> +/* iommu->lock must be held */
> +static struct vfio_group *vfio_find_nesting_group(struct vfio_iommu *iommu)
> +{
> +	struct vfio_domain *d;
> +	struct vfio_group *group = NULL;
> +
> +	if (!iommu->nesting_info)
> +		return NULL;
> +
> +	/* only support singleton container with nesting type */
> +	list_for_each_entry(d, &iommu->domain_list, next) {
> +		list_for_each_entry(group, &d->group_list, next) {
> +			break;
use list_first_entry?
> +		}
> +	}
> +	return group;
> +}
> +
>  static int put_pfn(unsigned long pfn, int prot);
>  
>  static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu *iommu,
> @@ -2349,6 +2373,48 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu,
>  	return ret;
>  }
>  
> +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> +{
> +	struct domain_capsule *dc = (struct domain_capsule *)data;
> +	unsigned long arg = *(unsigned long *)dc->data;
> +
> +	return iommu_sva_bind_gpasid(dc->domain, dev, (void __user *)arg);
> +}
> +
> +static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
> +{
> +	struct domain_capsule *dc = (struct domain_capsule *)data;
> +	unsigned long arg = *(unsigned long *)dc->data;
> +
> +	iommu_sva_unbind_gpasid(dc->domain, dev, (void __user *)arg);
> +	return 0;
> +}
> +
> +static int __vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
> +{
> +	struct domain_capsule *dc = (struct domain_capsule *)data;
> +	struct iommu_gpasid_bind_data *unbind_data =
> +				(struct iommu_gpasid_bind_data *)dc->data;
> +
> +	__iommu_sva_unbind_gpasid(dc->domain, dev, unbind_data);
> +	return 0;
> +}
> +
> +static void vfio_group_unbind_gpasid_fn(ioasid_t pasid, void *data)
> +{
> +	struct domain_capsule *dc = (struct domain_capsule *)data;
> +	struct iommu_gpasid_bind_data unbind_data;
> +
> +	unbind_data.argsz = offsetof(struct iommu_gpasid_bind_data, vendor);
> +	unbind_data.flags = 0;
> +	unbind_data.hpasid = pasid;
> +
> +	dc->data = &unbind_data;
> +
> +	iommu_group_for_each_dev(dc->group->iommu_group,
> +				 dc, __vfio_dev_unbind_gpasid_fn);
> +}
> +
>  static void vfio_iommu_type1_detach_group(void *iommu_data,
>  					  struct iommu_group *iommu_group)
>  {
> @@ -2392,6 +2458,21 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
>  		if (!group)
>  			continue;
>  
> +		if (iommu->nesting_info && iommu->vmm &&
> +		    (iommu->nesting_info->features &
> +					IOMMU_NESTING_FEAT_BIND_PGTBL)) {
> +			struct domain_capsule dc = { .group = group,
> +						     .domain = domain->domain,
> +						     .data = NULL };
> +
> +			/*
> +			 * Unbind page tables bound with system wide PASIDs
> +			 * which are allocated to user space.
> +			 */
> +			vfio_mm_for_each_pasid(iommu->vmm, &dc,
> +					       vfio_group_unbind_gpasid_fn);
> +		}
> +
>  		vfio_iommu_detach_group(domain, group);
>  		update_dirty_scope = !group->pinned_page_dirty_scope;
>  		list_del(&group->next);
> @@ -2938,6 +3019,89 @@ static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu,
>  	}
>  }
>  
> +static long vfio_iommu_handle_pgtbl_op(struct vfio_iommu *iommu,
> +				       bool is_bind, unsigned long arg)
> +{
> +	struct iommu_nesting_info *info;
> +	struct domain_capsule dc = { .data = &arg };
> +	struct vfio_group *group;
> +	struct vfio_domain *domain;
> +	int ret;
> +
> +	mutex_lock(&iommu->lock);
> +
> +	info = iommu->nesting_info;
> +	if (!info || !(info->features & IOMMU_NESTING_FEAT_BIND_PGTBL)) {
> +		ret = -EOPNOTSUPP;
> +		goto out_unlock_iommu;
> +	}
> +
> +	if (!iommu->vmm) {
> +		ret = -EINVAL;
> +		goto out_unlock_iommu;
> +	}
> +
> +	group = vfio_find_nesting_group(iommu);
is it realy useful to introduce vfio_find_nesting_group?
in this function you already test info, you fetch the first domain
below. So you would only need to fetch the 1st group?
> +	if (!group) {
> +		ret = -EINVAL;
can it happen?
> +		goto out_unlock_iommu;
> +	}
> +
> +	domain = list_first_entry(&iommu->domain_list,
> +				  struct vfio_domain, next);
> +	dc.group = group;
> +	dc.domain = domain->domain;
> +
> +	/* Avoid race with other containers within the same process */
> +	vfio_mm_pasid_lock(iommu->vmm);
> +
> +	if (is_bind) {
> +		ret = iommu_group_for_each_dev(group->iommu_group, &dc,
> +					       vfio_dev_bind_gpasid_fn);
> +		if (ret)
> +			iommu_group_for_each_dev(group->iommu_group, &dc,
> +						 vfio_dev_unbind_gpasid_fn);
> +	} else {
> +		iommu_group_for_each_dev(group->iommu_group,
> +					 &dc, vfio_dev_unbind_gpasid_fn);
> +		ret = 0;

int ret = 0;

if (is_bind) {
ret = iommu_group_for_each_dev(group->iommu_group, &dc,
 			       vfio_dev_bind_gpasid_fn);
}
if (ret || !is_bind) {
	iommu_group_for_each_dev(group->iommu_group,
			&dc, vfio_dev_unbind_gpasid_fn);
}
> +	}
> +
> +	vfio_mm_pasid_unlock(iommu->vmm);
> +out_unlock_iommu:
> +	mutex_unlock(&iommu->lock);
> +	return ret;
> +}
> +
> +static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
> +					unsigned long arg)
> +{
> +	struct vfio_iommu_type1_nesting_op hdr;
> +	unsigned int minsz;
> +	int ret;
> +
> +	minsz = offsetofend(struct vfio_iommu_type1_nesting_op, flags);
> +
> +	if (copy_from_user(&hdr, (void __user *)arg, minsz))
> +		return -EFAULT;
> +
> +	if (hdr.argsz < minsz || hdr.flags & ~VFIO_NESTING_OP_MASK)
> +		return -EINVAL;
> +
> +	switch (hdr.flags & VFIO_NESTING_OP_MASK) {
> +	case VFIO_IOMMU_NESTING_OP_BIND_PGTBL:
> +		ret = vfio_iommu_handle_pgtbl_op(iommu, true, arg + minsz);
> +		break;
> +	case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL:
> +		ret = vfio_iommu_handle_pgtbl_op(iommu, false, arg + minsz);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +	}
> +
> +	return ret;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  				   unsigned int cmd, unsigned long arg)
>  {
> @@ -2956,6 +3120,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  		return vfio_iommu_type1_dirty_pages(iommu, arg);
>  	case VFIO_IOMMU_PASID_REQUEST:
>  		return vfio_iommu_type1_pasid_request(iommu, arg);
> +	case VFIO_IOMMU_NESTING_OP:
> +		return vfio_iommu_type1_nesting_op(iommu, arg);
>  	default:
>  		return -ENOTTY;
>  	}
> diff --git a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c
> index ebec244..ecabaaa 100644
> --- a/drivers/vfio/vfio_pasid.c
> +++ b/drivers/vfio/vfio_pasid.c
> @@ -216,6 +216,8 @@ void vfio_pasid_free_range(struct vfio_mm *vmm,
>  	 * IOASID core will notify PASID users (e.g. IOMMU driver) to
>  	 * teardown necessary structures depending on the to-be-freed
>  	 * PASID.
> +	 * Hold pasid_lock also avoids race with PASID usages like bind/
> +	 * unbind page tables to requested PASID.
>  	 */
>  	mutex_lock(&vmm->pasid_lock);
>  	while ((vid = vfio_find_pasid(vmm, min, max)) != NULL)
> @@ -224,6 +226,30 @@ void vfio_pasid_free_range(struct vfio_mm *vmm,
>  }
>  EXPORT_SYMBOL_GPL(vfio_pasid_free_range);
>  
> +int vfio_mm_for_each_pasid(struct vfio_mm *vmm, void *data,
> +			   void (*fn)(ioasid_t id, void *data))
> +{
> +	int ret;
> +
> +	mutex_lock(&vmm->pasid_lock);
> +	ret = ioasid_set_for_each_ioasid(vmm->ioasid_sid, fn, data);
> +	mutex_unlock(&vmm->pasid_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(vfio_mm_for_each_pasid);
> +
> +void vfio_mm_pasid_lock(struct vfio_mm *vmm)
> +{
> +	mutex_lock(&vmm->pasid_lock);
> +}
> +EXPORT_SYMBOL_GPL(vfio_mm_pasid_lock);
> +
> +void vfio_mm_pasid_unlock(struct vfio_mm *vmm)
> +{
> +	mutex_unlock(&vmm->pasid_lock);
> +}
> +EXPORT_SYMBOL_GPL(vfio_mm_pasid_unlock);
> +
>  static int __init vfio_pasid_init(void)
>  {
>  	mutex_init(&vfio_mm_lock);
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index a111108..2bc8347 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -105,6 +105,11 @@ int vfio_mm_ioasid_sid(struct vfio_mm *vmm);
>  extern int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max);
>  extern void vfio_pasid_free_range(struct vfio_mm *vmm,
>  				  ioasid_t min, ioasid_t max);
> +extern int vfio_mm_for_each_pasid(struct vfio_mm *vmm, void *data,
> +				  void (*fn)(ioasid_t id, void *data));
> +extern void vfio_mm_pasid_lock(struct vfio_mm *vmm);
> +extern void vfio_mm_pasid_unlock(struct vfio_mm *vmm);
> +
>  #else
>  static inline struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
>  {
> @@ -129,6 +134,21 @@ static inline void vfio_pasid_free_range(struct vfio_mm *vmm,
>  					  ioasid_t min, ioasid_t max)
>  {
>  }
> +
> +static inline int vfio_mm_for_each_pasid(struct vfio_mm *vmm, void *data,
> +					 void (*fn)(ioasid_t id, void *data))
> +{
> +	return -ENOTTY;
> +}
> +
> +static inline void vfio_mm_pasid_lock(struct vfio_mm *vmm)
> +{
> +}
> +
> +static inline void vfio_mm_pasid_unlock(struct vfio_mm *vmm)
> +{
> +}
> +
>  #endif /* CONFIG_VFIO_PASID */
>  
>  /*
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 96a115f..a8ad786 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1209,6 +1209,37 @@ struct vfio_iommu_type1_pasid_request {
>  
>  #define VFIO_IOMMU_PASID_REQUEST	_IO(VFIO_TYPE, VFIO_BASE + 18)
>  
> +/**
> + * VFIO_IOMMU_NESTING_OP - _IOW(VFIO_TYPE, VFIO_BASE + 19,
> + *				struct vfio_iommu_type1_nesting_op)
> + *
> + * This interface allows user space to utilize the nesting IOMMU
> + * capabilities as reported in VFIO_IOMMU_TYPE1_INFO_CAP_NESTING
> + * cap through VFIO_IOMMU_GET_INFO.
> + *
> + * @data[] types defined for each op:
> + * +=================+===============================================+
> + * | NESTING OP      |      @data[]                                  |
> + * +=================+===============================================+
> + * | BIND_PGTBL      |      struct iommu_gpasid_bind_data            |
> + * +-----------------+-----------------------------------------------+
> + * | UNBIND_PGTBL    |      struct iommu_gpasid_bind_data            |
> + * +-----------------+-----------------------------------------------+
> + *
> + * returns: 0 on success, -errno on failure.
> + */
> +struct vfio_iommu_type1_nesting_op {
> +	__u32	argsz;
> +	__u32	flags;
> +#define VFIO_NESTING_OP_MASK	(0xffff) /* lower 16-bits for op */
> +	__u8	data[];
> +};
> +
> +#define VFIO_IOMMU_NESTING_OP_BIND_PGTBL	(0)
> +#define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL	(1)
> +
> +#define VFIO_IOMMU_NESTING_OP		_IO(VFIO_TYPE, VFIO_BASE + 19)
> +
>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>  
>  /*
> 
Thanks

Eric

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 11/15] vfio/type1: Allow invalidating first-level/stage IOMMU cache
  2020-07-12 11:21 ` [PATCH v5 11/15] vfio/type1: Allow invalidating first-level/stage IOMMU cache Liu Yi L
@ 2020-07-20  9:41   ` Auger Eric
  2020-07-20 10:42     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Auger Eric @ 2020-07-20  9:41 UTC (permalink / raw)
  To: Liu Yi L, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

Yi,

On 7/12/20 1:21 PM, Liu Yi L wrote:
> This patch provides an interface allowing the userspace to invalidate
> IOMMU cache for first-level page table. It is required when the first
> level IOMMU page table is not managed by the host kernel in the nested
> translation setup.
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> v1 -> v2:
> *) rename from "vfio/type1: Flush stage-1 IOMMU cache for nesting type"
> *) rename vfio_cache_inv_fn() to vfio_dev_cache_invalidate_fn()
> *) vfio_dev_cache_inv_fn() always successful
> *) remove VFIO_IOMMU_CACHE_INVALIDATE, and reuse VFIO_IOMMU_NESTING_OP
> ---
>  drivers/vfio/vfio_iommu_type1.c | 50 +++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h       |  3 +++
>  2 files changed, 53 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index f0f21ff..960cc59 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -3073,6 +3073,53 @@ static long vfio_iommu_handle_pgtbl_op(struct vfio_iommu *iommu,
>  	return ret;
>  }
>  
> +static int vfio_dev_cache_invalidate_fn(struct device *dev, void *data)
> +{
> +	struct domain_capsule *dc = (struct domain_capsule *)data;
> +	unsigned long arg = *(unsigned long *)dc->data;
> +
> +	iommu_cache_invalidate(dc->domain, dev, (void __user *)arg);
> +	return 0;
> +}
> +
> +static long vfio_iommu_invalidate_cache(struct vfio_iommu *iommu,
> +					unsigned long arg)
> +{
> +	struct domain_capsule dc = { .data = &arg };
> +	struct vfio_group *group;
> +	struct vfio_domain *domain;
> +	int ret = 0;
> +	struct iommu_nesting_info *info;
> +
> +	mutex_lock(&iommu->lock);
> +	/*
> +	 * Cache invalidation is required for any nesting IOMMU,
> +	 * so no need to check system-wide PASID support.
> +	 */
> +	info = iommu->nesting_info;
> +	if (!info || !(info->features & IOMMU_NESTING_FEAT_CACHE_INVLD)) {
> +		ret = -EOPNOTSUPP;
> +		goto out_unlock;
> +	}
> +
> +	group = vfio_find_nesting_group(iommu);
so I see you reuse it here. But still wondering if you cant't directly
set dc.domain and dc.group group below using list_firt_entry?
> +	if (!group) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	domain = list_first_entry(&iommu->domain_list,
> +				  struct vfio_domain, next);
> +	dc.group = group;
> +	dc.domain = domain->domain;
> +	iommu_group_for_each_dev(group->iommu_group, &dc,
> +				 vfio_dev_cache_invalidate_fn);
> +
> +out_unlock:
> +	mutex_unlock(&iommu->lock);
> +	return ret;
> +}
> +
>  static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
>  					unsigned long arg)
>  {
> @@ -3095,6 +3142,9 @@ static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
>  	case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL:
>  		ret = vfio_iommu_handle_pgtbl_op(iommu, false, arg + minsz);
>  		break;
> +	case VFIO_IOMMU_NESTING_OP_CACHE_INVLD:
> +		ret = vfio_iommu_invalidate_cache(iommu, arg + minsz);
> +		break;
>  	default:
>  		ret = -EINVAL;
>  	}
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index a8ad786..845a5800 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1225,6 +1225,8 @@ struct vfio_iommu_type1_pasid_request {
>   * +-----------------+-----------------------------------------------+
>   * | UNBIND_PGTBL    |      struct iommu_gpasid_bind_data            |
>   * +-----------------+-----------------------------------------------+
> + * | CACHE_INVLD     |      struct iommu_cache_invalidate_info       |
> + * +-----------------+-----------------------------------------------+
>   *
>   * returns: 0 on success, -errno on failure.
>   */
> @@ -1237,6 +1239,7 @@ struct vfio_iommu_type1_nesting_op {
>  
>  #define VFIO_IOMMU_NESTING_OP_BIND_PGTBL	(0)
>  #define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL	(1)
> +#define VFIO_IOMMU_NESTING_OP_CACHE_INVLD	(2)
>  
>  #define VFIO_IOMMU_NESTING_OP		_IO(VFIO_TYPE, VFIO_BASE + 19)
>  
> 
Otherwise looks good to me

Thanks

Eric

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 09/15] iommu/vt-d: Check ownership for PASIDs from user-space
  2020-07-19 16:06   ` Auger Eric
@ 2020-07-20 10:18     ` Liu, Yi L
  2020-07-20 12:37       ` Auger Eric
  0 siblings, 1 reply; 58+ messages in thread
From: Liu, Yi L @ 2020-07-20 10:18 UTC (permalink / raw)
  To: Auger Eric, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	 Jun J, iommu, linux-kernel, Sun, Yi Y, Wu, Hao

Hi Eric,

> From: Auger Eric <eric.auger@redhat.com>
> Sent: Monday, July 20, 2020 12:06 AM
> 
> Hi Yi,
> 
> On 7/12/20 1:21 PM, Liu Yi L wrote:
> > When an IOMMU domain with nesting attribute is used for guest SVA, a
> > system-wide PASID is allocated for binding with the device and the domain.
> > For security reason, we need to check the PASID passsed from user-space.
> passed

got it.

> > e.g. page table bind/unbind and PASID related cache invalidation.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >  drivers/iommu/intel/iommu.c | 10 ++++++++++
> >  drivers/iommu/intel/svm.c   |  7 +++++--
> >  2 files changed, 15 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 4d54198..a9504cb 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -5436,6 +5436,7 @@ intel_iommu_sva_invalidate(struct iommu_domain
> *domain, struct device *dev,
> >  		int granu = 0;
> >  		u64 pasid = 0;
> >  		u64 addr = 0;
> > +		void *pdata;
> >
> >  		granu = to_vtd_granularity(cache_type, inv_info->granularity);
> >  		if (granu == -EINVAL) {
> > @@ -5456,6 +5457,15 @@ intel_iommu_sva_invalidate(struct iommu_domain
> *domain, struct device *dev,
> >  			 (inv_info->granu.addr_info.flags &
> IOMMU_INV_ADDR_FLAGS_PASID))
> >  			pasid = inv_info->granu.addr_info.pasid;
> >
> > +		pdata = ioasid_find(dmar_domain->ioasid_sid, pasid, NULL);
> > +		if (!pdata) {
> > +			ret = -EINVAL;
> > +			goto out_unlock;
> > +		} else if (IS_ERR(pdata)) {
> > +			ret = PTR_ERR(pdata);
> > +			goto out_unlock;
> > +		}
> > +
> >  		switch (BIT(cache_type)) {
> >  		case IOMMU_CACHE_INV_TYPE_IOTLB:
> >  			/* HW will ignore LSB bits based on address mask */
> > diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> > index d2c0e1a..212dee0 100644
> > --- a/drivers/iommu/intel/svm.c
> > +++ b/drivers/iommu/intel/svm.c
> > @@ -319,7 +319,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain,
> struct device *dev,
> >  	dmar_domain = to_dmar_domain(domain);
> >
> >  	mutex_lock(&pasid_mutex);
> > -	svm = ioasid_find(INVALID_IOASID_SET, data->hpasid, NULL);
> I do not get what the call was supposed to do before that patch?

you mean patch 10/15 by "that patch", right? the ownership check should
be done as to prevent illegal bind request from userspace. before patch
10/15, it should be added.

> > +	svm = ioasid_find(dmar_domain->ioasid_sid, data->hpasid, NULL);
> >  	if (IS_ERR(svm)) {
> >  		ret = PTR_ERR(svm);
> >  		goto out;
> > @@ -436,6 +436,7 @@ int intel_svm_unbind_gpasid(struct iommu_domain
> *domain,
> >  			    struct device *dev, ioasid_t pasid)
> >  {
> >  	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > +	struct dmar_domain *dmar_domain;
> >  	struct intel_svm_dev *sdev;
> >  	struct intel_svm *svm;
> >  	int ret = -EINVAL;
> > @@ -443,8 +444,10 @@ int intel_svm_unbind_gpasid(struct iommu_domain
> *domain,
> >  	if (WARN_ON(!iommu))
> >  		return -EINVAL;
> >
> > +	dmar_domain = to_dmar_domain(domain);
> > +
> >  	mutex_lock(&pasid_mutex);
> > -	svm = ioasid_find(INVALID_IOASID_SET, pasid, NULL);
> > +	svm = ioasid_find(dmar_domain->ioasid_sid, pasid, NULL);
> just to make sure, about the locking, can't domain->ioasid_sid change
> under the hood?

I guess not. intel_svm_unbind_gpasid() and iommu_domain_set_attr()
is called by vfio today, and within vfio, there is vfio_iommu->lock.

Regards,
Yi Liu

> 
> Thanks
> 
> Eric
> >  	if (!svm) {
> >  		ret = -EINVAL;
> >  		goto out;
> >

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 10/15] vfio/type1: Support binding guest page tables to PASID
  2020-07-20  9:37   ` Auger Eric
@ 2020-07-20 10:37     ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2020-07-20 10:37 UTC (permalink / raw)
  To: Auger Eric, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	 Jun J, iommu, linux-kernel, Sun, Yi Y, Wu, Hao

Hi Eric,

> From: Auger Eric <eric.auger@redhat.com>
> Sent: Monday, July 20, 2020 5:37 PM
> 
> Yi,
> 
> On 7/12/20 1:21 PM, Liu Yi L wrote:
> > Nesting translation allows two-levels/stages page tables, with 1st level
> > for guest translations (e.g. GVA->GPA), 2nd level for host translations
> > (e.g. GPA->HPA). This patch adds interface for binding guest page tables
> > to a PASID. This PASID must have been allocated to user space before the
> by the userspace?

yes, it is. will modify it.

> > binding request.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> > v3 -> v4:
> > *) address comments from Alex on v3
> >
> > v2 -> v3:
> > *) use __iommu_sva_unbind_gpasid() for unbind call issued by VFIO
> >    https://lore.kernel.org/linux-iommu/1592931837-58223-6-git-send-email-
> jacob.jun.pan@linux.intel.com/
> >
> > v1 -> v2:
> > *) rename subject from "vfio/type1: Bind guest page tables to host"
> > *) remove VFIO_IOMMU_BIND, introduce VFIO_IOMMU_NESTING_OP to support
> bind/
> >    unbind guet page table
> > *) replaced vfio_iommu_for_each_dev() with a group level loop since this
> >    series enforces one group per container w/ nesting type as start.
> > *) rename vfio_bind/unbind_gpasid_fn() to vfio_dev_bind/unbind_gpasid_fn()
> > *) vfio_dev_unbind_gpasid() always successful
> > *) use vfio_mm->pasid_lock to avoid race between PASID free and page table
> >    bind/unbind
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 166
> ++++++++++++++++++++++++++++++++++++++++
> >  drivers/vfio/vfio_pasid.c       |  26 +++++++
> >  include/linux/vfio.h            |  20 +++++
> >  include/uapi/linux/vfio.h       |  31 ++++++++
> >  4 files changed, 243 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index 55b4065..f0f21ff 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -149,6 +149,30 @@ struct vfio_regions {
> >  #define DIRTY_BITMAP_PAGES_MAX	 ((u64)INT_MAX)
> >  #define DIRTY_BITMAP_SIZE_MAX
> DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
> >
> > +struct domain_capsule {
> > +	struct vfio_group *group;
> > +	struct iommu_domain *domain;
> > +	void *data;
> > +};
> > +
> > +/* iommu->lock must be held */
> > +static struct vfio_group *vfio_find_nesting_group(struct vfio_iommu *iommu)
> > +{
> > +	struct vfio_domain *d;
> > +	struct vfio_group *group = NULL;
> > +
> > +	if (!iommu->nesting_info)
> > +		return NULL;
> > +
> > +	/* only support singleton container with nesting type */
> > +	list_for_each_entry(d, &iommu->domain_list, next) {
> > +		list_for_each_entry(group, &d->group_list, next) {
> > +			break;
> use list_first_entry?

yeah, based on the discussion in below link, we only support singleton
container with nesting type, also if no group in container, the nesting_info
will be cleared. so yes, list_first_entry will work as well.

https://lkml.org/lkml/2020/5/15/1028

> > +		}
> > +	}
> > +	return group;
> > +}
> > +
> >  static int put_pfn(unsigned long pfn, int prot);
> >
> >  static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu
> *iommu,
> > @@ -2349,6 +2373,48 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu
> *iommu,
> >  	return ret;
> >  }
> >
> > +static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
> > +{
> > +	struct domain_capsule *dc = (struct domain_capsule *)data;
> > +	unsigned long arg = *(unsigned long *)dc->data;
> > +
> > +	return iommu_sva_bind_gpasid(dc->domain, dev, (void __user *)arg);
> > +}
> > +
> > +static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
> > +{
> > +	struct domain_capsule *dc = (struct domain_capsule *)data;
> > +	unsigned long arg = *(unsigned long *)dc->data;
> > +
> > +	iommu_sva_unbind_gpasid(dc->domain, dev, (void __user *)arg);
> > +	return 0;
> > +}
> > +
> > +static int __vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
> > +{
> > +	struct domain_capsule *dc = (struct domain_capsule *)data;
> > +	struct iommu_gpasid_bind_data *unbind_data =
> > +				(struct iommu_gpasid_bind_data *)dc->data;
> > +
> > +	__iommu_sva_unbind_gpasid(dc->domain, dev, unbind_data);
> > +	return 0;
> > +}
> > +
> > +static void vfio_group_unbind_gpasid_fn(ioasid_t pasid, void *data)
> > +{
> > +	struct domain_capsule *dc = (struct domain_capsule *)data;
> > +	struct iommu_gpasid_bind_data unbind_data;
> > +
> > +	unbind_data.argsz = offsetof(struct iommu_gpasid_bind_data, vendor);
> > +	unbind_data.flags = 0;
> > +	unbind_data.hpasid = pasid;
> > +
> > +	dc->data = &unbind_data;
> > +
> > +	iommu_group_for_each_dev(dc->group->iommu_group,
> > +				 dc, __vfio_dev_unbind_gpasid_fn);
> > +}
> > +
> >  static void vfio_iommu_type1_detach_group(void *iommu_data,
> >  					  struct iommu_group *iommu_group)
> >  {
> > @@ -2392,6 +2458,21 @@ static void vfio_iommu_type1_detach_group(void
> *iommu_data,
> >  		if (!group)
> >  			continue;
> >
> > +		if (iommu->nesting_info && iommu->vmm &&
> > +		    (iommu->nesting_info->features &
> > +					IOMMU_NESTING_FEAT_BIND_PGTBL)) {
> > +			struct domain_capsule dc = { .group = group,
> > +						     .domain = domain->domain,
> > +						     .data = NULL };
> > +
> > +			/*
> > +			 * Unbind page tables bound with system wide PASIDs
> > +			 * which are allocated to user space.
> > +			 */
> > +			vfio_mm_for_each_pasid(iommu->vmm, &dc,
> > +					       vfio_group_unbind_gpasid_fn);
> > +		}
> > +
> >  		vfio_iommu_detach_group(domain, group);
> >  		update_dirty_scope = !group->pinned_page_dirty_scope;
> >  		list_del(&group->next);
> > @@ -2938,6 +3019,89 @@ static int vfio_iommu_type1_pasid_request(struct
> vfio_iommu *iommu,
> >  	}
> >  }
> >
> > +static long vfio_iommu_handle_pgtbl_op(struct vfio_iommu *iommu,
> > +				       bool is_bind, unsigned long arg)
> > +{
> > +	struct iommu_nesting_info *info;
> > +	struct domain_capsule dc = { .data = &arg };
> > +	struct vfio_group *group;
> > +	struct vfio_domain *domain;
> > +	int ret;
> > +
> > +	mutex_lock(&iommu->lock);
> > +
> > +	info = iommu->nesting_info;
> > +	if (!info || !(info->features & IOMMU_NESTING_FEAT_BIND_PGTBL)) {
> > +		ret = -EOPNOTSUPP;
> > +		goto out_unlock_iommu;
> > +	}
> > +
> > +	if (!iommu->vmm) {
> > +		ret = -EINVAL;
> > +		goto out_unlock_iommu;
> > +	}
> > +
> > +	group = vfio_find_nesting_group(iommu);
> is it realy useful to introduce vfio_find_nesting_group?
> in this function you already test info, you fetch the first domain
> below. So you would only need to fetch the 1st group?

this function was added to highlight the singleton container support
limitation. perhaps, it can be replaced with list_first_entry.

> > +	if (!group) {
> > +		ret = -EINVAL;
> can it happen?

guess not. may drop if vfio_find_nesting_group() is dropped.

> > +		goto out_unlock_iommu;
> > +	}
> > +
> > +	domain = list_first_entry(&iommu->domain_list,
> > +				  struct vfio_domain, next);
> > +	dc.group = group;
> > +	dc.domain = domain->domain;
> > +
> > +	/* Avoid race with other containers within the same process */
> > +	vfio_mm_pasid_lock(iommu->vmm);
> > +
> > +	if (is_bind) {
> > +		ret = iommu_group_for_each_dev(group->iommu_group, &dc,
> > +					       vfio_dev_bind_gpasid_fn);
> > +		if (ret)
> > +			iommu_group_for_each_dev(group->iommu_group, &dc,
> > +						 vfio_dev_unbind_gpasid_fn);
> > +	} else {
> > +		iommu_group_for_each_dev(group->iommu_group,
> > +					 &dc, vfio_dev_unbind_gpasid_fn);
> > +		ret = 0;
> 
> int ret = 0;
> 
> if (is_bind) {
> ret = iommu_group_for_each_dev(group->iommu_group, &dc,
>  			       vfio_dev_bind_gpasid_fn);
> }
> if (ret || !is_bind) {
> 	iommu_group_for_each_dev(group->iommu_group,
> 			&dc, vfio_dev_unbind_gpasid_fn);
> }

got it. :-)

Regards,
Yi Liu

> > +	}
> > +
> > +	vfio_mm_pasid_unlock(iommu->vmm);
> > +out_unlock_iommu:
> > +	mutex_unlock(&iommu->lock);
> > +	return ret;
> > +}
> > +
> > +static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
> > +					unsigned long arg)
> > +{
> > +	struct vfio_iommu_type1_nesting_op hdr;
> > +	unsigned int minsz;
> > +	int ret;
> > +
> > +	minsz = offsetofend(struct vfio_iommu_type1_nesting_op, flags);
> > +
> > +	if (copy_from_user(&hdr, (void __user *)arg, minsz))
> > +		return -EFAULT;
> > +
> > +	if (hdr.argsz < minsz || hdr.flags & ~VFIO_NESTING_OP_MASK)
> > +		return -EINVAL;
> > +
> > +	switch (hdr.flags & VFIO_NESTING_OP_MASK) {
> > +	case VFIO_IOMMU_NESTING_OP_BIND_PGTBL:
> > +		ret = vfio_iommu_handle_pgtbl_op(iommu, true, arg + minsz);
> > +		break;
> > +	case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL:
> > +		ret = vfio_iommu_handle_pgtbl_op(iommu, false, arg + minsz);
> > +		break;
> > +	default:
> > +		ret = -EINVAL;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> >  static long vfio_iommu_type1_ioctl(void *iommu_data,
> >  				   unsigned int cmd, unsigned long arg)
> >  {
> > @@ -2956,6 +3120,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> >  		return vfio_iommu_type1_dirty_pages(iommu, arg);
> >  	case VFIO_IOMMU_PASID_REQUEST:
> >  		return vfio_iommu_type1_pasid_request(iommu, arg);
> > +	case VFIO_IOMMU_NESTING_OP:
> > +		return vfio_iommu_type1_nesting_op(iommu, arg);
> >  	default:
> >  		return -ENOTTY;
> >  	}
> > diff --git a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c
> > index ebec244..ecabaaa 100644
> > --- a/drivers/vfio/vfio_pasid.c
> > +++ b/drivers/vfio/vfio_pasid.c
> > @@ -216,6 +216,8 @@ void vfio_pasid_free_range(struct vfio_mm *vmm,
> >  	 * IOASID core will notify PASID users (e.g. IOMMU driver) to
> >  	 * teardown necessary structures depending on the to-be-freed
> >  	 * PASID.
> > +	 * Hold pasid_lock also avoids race with PASID usages like bind/
> > +	 * unbind page tables to requested PASID.
> >  	 */
> >  	mutex_lock(&vmm->pasid_lock);
> >  	while ((vid = vfio_find_pasid(vmm, min, max)) != NULL)
> > @@ -224,6 +226,30 @@ void vfio_pasid_free_range(struct vfio_mm *vmm,
> >  }
> >  EXPORT_SYMBOL_GPL(vfio_pasid_free_range);
> >
> > +int vfio_mm_for_each_pasid(struct vfio_mm *vmm, void *data,
> > +			   void (*fn)(ioasid_t id, void *data))
> > +{
> > +	int ret;
> > +
> > +	mutex_lock(&vmm->pasid_lock);
> > +	ret = ioasid_set_for_each_ioasid(vmm->ioasid_sid, fn, data);
> > +	mutex_unlock(&vmm->pasid_lock);
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_mm_for_each_pasid);
> > +
> > +void vfio_mm_pasid_lock(struct vfio_mm *vmm)
> > +{
> > +	mutex_lock(&vmm->pasid_lock);
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_mm_pasid_lock);
> > +
> > +void vfio_mm_pasid_unlock(struct vfio_mm *vmm)
> > +{
> > +	mutex_unlock(&vmm->pasid_lock);
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_mm_pasid_unlock);
> > +
> >  static int __init vfio_pasid_init(void)
> >  {
> >  	mutex_init(&vfio_mm_lock);
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index a111108..2bc8347 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -105,6 +105,11 @@ int vfio_mm_ioasid_sid(struct vfio_mm *vmm);
> >  extern int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max);
> >  extern void vfio_pasid_free_range(struct vfio_mm *vmm,
> >  				  ioasid_t min, ioasid_t max);
> > +extern int vfio_mm_for_each_pasid(struct vfio_mm *vmm, void *data,
> > +				  void (*fn)(ioasid_t id, void *data));
> > +extern void vfio_mm_pasid_lock(struct vfio_mm *vmm);
> > +extern void vfio_mm_pasid_unlock(struct vfio_mm *vmm);
> > +
> >  #else
> >  static inline struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
> >  {
> > @@ -129,6 +134,21 @@ static inline void vfio_pasid_free_range(struct vfio_mm
> *vmm,
> >  					  ioasid_t min, ioasid_t max)
> >  {
> >  }
> > +
> > +static inline int vfio_mm_for_each_pasid(struct vfio_mm *vmm, void *data,
> > +					 void (*fn)(ioasid_t id, void *data))
> > +{
> > +	return -ENOTTY;
> > +}
> > +
> > +static inline void vfio_mm_pasid_lock(struct vfio_mm *vmm)
> > +{
> > +}
> > +
> > +static inline void vfio_mm_pasid_unlock(struct vfio_mm *vmm)
> > +{
> > +}
> > +
> >  #endif /* CONFIG_VFIO_PASID */
> >
> >  /*
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 96a115f..a8ad786 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -1209,6 +1209,37 @@ struct vfio_iommu_type1_pasid_request {
> >
> >  #define VFIO_IOMMU_PASID_REQUEST	_IO(VFIO_TYPE, VFIO_BASE + 18)
> >
> > +/**
> > + * VFIO_IOMMU_NESTING_OP - _IOW(VFIO_TYPE, VFIO_BASE + 19,
> > + *				struct vfio_iommu_type1_nesting_op)
> > + *
> > + * This interface allows user space to utilize the nesting IOMMU
> > + * capabilities as reported in VFIO_IOMMU_TYPE1_INFO_CAP_NESTING
> > + * cap through VFIO_IOMMU_GET_INFO.
> > + *
> > + * @data[] types defined for each op:
> > + *
> +=================+===============================================+
> > + * | NESTING OP      |      @data[]                                  |
> > + *
> +=================+===============================================+
> > + * | BIND_PGTBL      |      struct iommu_gpasid_bind_data            |
> > + * +-----------------+-----------------------------------------------+
> > + * | UNBIND_PGTBL    |      struct iommu_gpasid_bind_data            |
> > + * +-----------------+-----------------------------------------------+
> > + *
> > + * returns: 0 on success, -errno on failure.
> > + */
> > +struct vfio_iommu_type1_nesting_op {
> > +	__u32	argsz;
> > +	__u32	flags;
> > +#define VFIO_NESTING_OP_MASK	(0xffff) /* lower 16-bits for op */
> > +	__u8	data[];
> > +};
> > +
> > +#define VFIO_IOMMU_NESTING_OP_BIND_PGTBL	(0)
> > +#define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL	(1)
> > +
> > +#define VFIO_IOMMU_NESTING_OP		_IO(VFIO_TYPE, VFIO_BASE + 19)
> > +
> >  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
> >
> >  /*
> >
> Thanks
> 
> Eric

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 11/15] vfio/type1: Allow invalidating first-level/stage IOMMU cache
  2020-07-20  9:41   ` Auger Eric
@ 2020-07-20 10:42     ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2020-07-20 10:42 UTC (permalink / raw)
  To: Auger Eric, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	 Jun J, iommu, linux-kernel, Sun, Yi Y, Wu, Hao

Hi Eric,

> From: Auger Eric <eric.auger@redhat.com>
> Sent: Monday, July 20, 2020 5:42 PM
> 
> Yi,
> 
> On 7/12/20 1:21 PM, Liu Yi L wrote:
> > This patch provides an interface allowing the userspace to invalidate
> > IOMMU cache for first-level page table. It is required when the first
> > level IOMMU page table is not managed by the host kernel in the nested
> > translation setup.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Eric Auger <eric.auger@redhat.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> > v1 -> v2:
> > *) rename from "vfio/type1: Flush stage-1 IOMMU cache for nesting type"
> > *) rename vfio_cache_inv_fn() to vfio_dev_cache_invalidate_fn()
> > *) vfio_dev_cache_inv_fn() always successful
> > *) remove VFIO_IOMMU_CACHE_INVALIDATE, and reuse
> VFIO_IOMMU_NESTING_OP
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 50
> +++++++++++++++++++++++++++++++++++++++++
> >  include/uapi/linux/vfio.h       |  3 +++
> >  2 files changed, 53 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index f0f21ff..960cc59 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -3073,6 +3073,53 @@ static long vfio_iommu_handle_pgtbl_op(struct
> vfio_iommu *iommu,
> >  	return ret;
> >  }
> >
> > +static int vfio_dev_cache_invalidate_fn(struct device *dev, void *data)
> > +{
> > +	struct domain_capsule *dc = (struct domain_capsule *)data;
> > +	unsigned long arg = *(unsigned long *)dc->data;
> > +
> > +	iommu_cache_invalidate(dc->domain, dev, (void __user *)arg);
> > +	return 0;
> > +}
> > +
> > +static long vfio_iommu_invalidate_cache(struct vfio_iommu *iommu,
> > +					unsigned long arg)
> > +{
> > +	struct domain_capsule dc = { .data = &arg };
> > +	struct vfio_group *group;
> > +	struct vfio_domain *domain;
> > +	int ret = 0;
> > +	struct iommu_nesting_info *info;
> > +
> > +	mutex_lock(&iommu->lock);
> > +	/*
> > +	 * Cache invalidation is required for any nesting IOMMU,
> > +	 * so no need to check system-wide PASID support.
> > +	 */
> > +	info = iommu->nesting_info;
> > +	if (!info || !(info->features & IOMMU_NESTING_FEAT_CACHE_INVLD)) {
> > +		ret = -EOPNOTSUPP;
> > +		goto out_unlock;
> > +	}
> > +
> > +	group = vfio_find_nesting_group(iommu);
> so I see you reuse it here. But still wondering if you cant't directly
> set dc.domain and dc.group group below using list_firt_entry?

I guess yes for current implementation. I also considered if I can
get a helper function to retrun a dc with group and domain field
initialized as it is common code used by both bind/unbind and cache_inv
path. perhaps something like get_domain_capsule_for_nesting()

> > +	if (!group) {
> > +		ret = -EINVAL;
> > +		goto out_unlock;
> > +	}
> > +
> > +	domain = list_first_entry(&iommu->domain_list,
> > +				  struct vfio_domain, next);
> > +	dc.group = group;
> > +	dc.domain = domain->domain;
> > +	iommu_group_for_each_dev(group->iommu_group, &dc,
> > +				 vfio_dev_cache_invalidate_fn);
> > +
> > +out_unlock:
> > +	mutex_unlock(&iommu->lock);
> > +	return ret;
> > +}
> > +
> >  static long vfio_iommu_type1_nesting_op(struct vfio_iommu *iommu,
> >  					unsigned long arg)
> >  {
> > @@ -3095,6 +3142,9 @@ static long vfio_iommu_type1_nesting_op(struct
> vfio_iommu *iommu,
> >  	case VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL:
> >  		ret = vfio_iommu_handle_pgtbl_op(iommu, false, arg + minsz);
> >  		break;
> > +	case VFIO_IOMMU_NESTING_OP_CACHE_INVLD:
> > +		ret = vfio_iommu_invalidate_cache(iommu, arg + minsz);
> > +		break;
> >  	default:
> >  		ret = -EINVAL;
> >  	}
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index a8ad786..845a5800 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -1225,6 +1225,8 @@ struct vfio_iommu_type1_pasid_request {
> >   * +-----------------+-----------------------------------------------+
> >   * | UNBIND_PGTBL    |      struct iommu_gpasid_bind_data            |
> >   * +-----------------+-----------------------------------------------+
> > + * | CACHE_INVLD     |      struct iommu_cache_invalidate_info       |
> > + * +-----------------+-----------------------------------------------+
> >   *
> >   * returns: 0 on success, -errno on failure.
> >   */
> > @@ -1237,6 +1239,7 @@ struct vfio_iommu_type1_nesting_op {
> >
> >  #define VFIO_IOMMU_NESTING_OP_BIND_PGTBL	(0)
> >  #define VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL	(1)
> > +#define VFIO_IOMMU_NESTING_OP_CACHE_INVLD	(2)
> >
> >  #define VFIO_IOMMU_NESTING_OP		_IO(VFIO_TYPE, VFIO_BASE + 19)
> >
> >
> Otherwise looks good to me

thanks,

Regards,
Yi Liu

> Thanks
> 
> Eric

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 12/15] vfio/type1: Add vSVA support for IOMMU-backed mdevs
  2020-07-12 11:21 ` [PATCH v5 12/15] vfio/type1: Add vSVA support for IOMMU-backed mdevs Liu Yi L
@ 2020-07-20 12:22   ` Auger Eric
  2020-07-23 12:05     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Auger Eric @ 2020-07-20 12:22 UTC (permalink / raw)
  To: Liu Yi L, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

Yi,

On 7/12/20 1:21 PM, Liu Yi L wrote:
> Recent years, mediated device pass-through framework (e.g. vfio-mdev)
> is used to achieve flexible device sharing across domains (e.g. VMs).
> Also there are hardware assisted mediated pass-through solutions from
> platform vendors. e.g. Intel VT-d scalable mode which supports Intel
> Scalable I/O Virtualization technology. Such mdevs are called IOMMU-
> backed mdevs as there are IOMMU enforced DMA isolation for such mdevs.
there is IOMMU enforced DMA isolation
> In kernel, IOMMU-backed mdevs are exposed to IOMMU layer by aux-domain
> concept, which means mdevs are protected by an iommu domain which is
> auxiliary to the domain that the kernel driver primarily uses for DMA
> API. Details can be found in the KVM presentation as below:
> 
> https://events19.linuxfoundation.org/wp-content/uploads/2017/12/\
> Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf
> 
> This patch extends NESTING_IOMMU ops to IOMMU-backed mdev devices. The
> main requirement is to use the auxiliary domain associated with mdev.

So as a result vSVM becomes functional for scalable mode mediated
devices, right?
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> CC: Jun Tian <jun.j.tian@intel.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
> v1 -> v2:
> *) check the iommu_device to ensure the handling mdev is IOMMU-backed
> ---
>  drivers/vfio/vfio_iommu_type1.c | 39 +++++++++++++++++++++++++++++++++++----
>  1 file changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 960cc59..f1f1ae2 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -2373,20 +2373,41 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu,
>  	return ret;
>  }
>  
> +static struct device *vfio_get_iommu_device(struct vfio_group *group,
> +					    struct device *dev)
> +{
> +	if (group->mdev_group)
> +		return vfio_mdev_get_iommu_device(dev);
> +	else
> +		return dev;
> +}
> +
>  static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
>  {
>  	struct domain_capsule *dc = (struct domain_capsule *)data;
>  	unsigned long arg = *(unsigned long *)dc->data;
> +	struct device *iommu_device;
> +
> +	iommu_device = vfio_get_iommu_device(dc->group, dev);
> +	if (!iommu_device)
> +		return -EINVAL;
>  
> -	return iommu_sva_bind_gpasid(dc->domain, dev, (void __user *)arg);
> +	return iommu_sva_bind_gpasid(dc->domain, iommu_device,
> +				     (void __user *)arg);
>  }
>  
>  static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
>  {
>  	struct domain_capsule *dc = (struct domain_capsule *)data;
>  	unsigned long arg = *(unsigned long *)dc->data;
> +	struct device *iommu_device;
>  
> -	iommu_sva_unbind_gpasid(dc->domain, dev, (void __user *)arg);
> +	iommu_device = vfio_get_iommu_device(dc->group, dev);
> +	if (!iommu_device)
> +		return -EINVAL;
> +
> +	iommu_sva_unbind_gpasid(dc->domain, iommu_device,
> +				(void __user *)arg);
>  	return 0;
>  }
>  
> @@ -2395,8 +2416,13 @@ static int __vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
>  	struct domain_capsule *dc = (struct domain_capsule *)data;
>  	struct iommu_gpasid_bind_data *unbind_data =
>  				(struct iommu_gpasid_bind_data *)dc->data;
> +	struct device *iommu_device;
> +
> +	iommu_device = vfio_get_iommu_device(dc->group, dev);
> +	if (!iommu_device)
> +		return -EINVAL;
>  
> -	__iommu_sva_unbind_gpasid(dc->domain, dev, unbind_data);
> +	__iommu_sva_unbind_gpasid(dc->domain, iommu_device, unbind_data);
>  	return 0;
>  }
>  
> @@ -3077,8 +3103,13 @@ static int vfio_dev_cache_invalidate_fn(struct device *dev, void *data)
>  {
>  	struct domain_capsule *dc = (struct domain_capsule *)data;
>  	unsigned long arg = *(unsigned long *)dc->data;
> +	struct device *iommu_device;
> +
> +	iommu_device = vfio_get_iommu_device(dc->group, dev);
> +	if (!iommu_device)
> +		return -EINVAL;
>  
> -	iommu_cache_invalidate(dc->domain, dev, (void __user *)arg);
> +	iommu_cache_invalidate(dc->domain, iommu_device, (void __user *)arg);
>  	return 0;
>  }
>  
> 
Besides,

Looks grood to me

Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 13/15] vfio/pci: Expose PCIe PASID capability to guest
  2020-07-12 11:21 ` [PATCH v5 13/15] vfio/pci: Expose PCIe PASID capability to guest Liu Yi L
@ 2020-07-20 12:35   ` Auger Eric
  2020-07-20 13:00     ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Auger Eric @ 2020-07-20 12:35 UTC (permalink / raw)
  To: Liu Yi L, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, kevin.tian, ashok.raj, kvm, stefanha, jun.j.tian,
	iommu, linux-kernel, yi.y.sun, hao.wu

Yi,

On 7/12/20 1:21 PM, Liu Yi L wrote:
> This patch exposes PCIe PASID capability to guest for assigned devices.
> Existing vfio_pci driver hides it from guest by setting the capability
> length as 0 in pci_ext_cap_length[].
> 
> And this patch only exposes PASID capability for devices which has PCIe
> PASID extended struture in its configuration space. So VFs, will will
s/will//
> not see PASID capability on VFs as VF doesn't implement PASID extended
suggested rewording: VFs will not expose the PASID capability as they do
not implement the PASID extended structure in their config space?
> structure in its configuration space. For VF, it is a TODO in future.
> Related discussion can be found in below link:
> 
> https://lkml.org/lkml/2020/4/7/693

> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
> v1 -> v2:
> *) added in v2, but it was sent in a separate patchseries before
> ---
>  drivers/vfio/pci/vfio_pci_config.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index d98843f..07ff2e6 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -95,7 +95,7 @@ static const u16 pci_ext_cap_length[PCI_EXT_CAP_ID_MAX + 1] = {
>  	[PCI_EXT_CAP_ID_LTR]	=	PCI_EXT_CAP_LTR_SIZEOF,
>  	[PCI_EXT_CAP_ID_SECPCI]	=	0,	/* not yet */
>  	[PCI_EXT_CAP_ID_PMUX]	=	0,	/* not yet */
> -	[PCI_EXT_CAP_ID_PASID]	=	0,	/* not yet */
> +	[PCI_EXT_CAP_ID_PASID]	=	PCI_EXT_CAP_PASID_SIZEOF,
>  };
>  
>  /*
> 
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v5 09/15] iommu/vt-d: Check ownership for PASIDs from user-space
  2020-07-20 10:18     ` Liu, Yi L
@ 2020-07-20 12:37       ` Auger Eric
  2020-07-20 12:55         ` Liu, Yi L
  0 siblings, 1 reply; 58+ messages in thread
From: Auger Eric @ 2020-07-20 12:37 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	Jun J, iommu, linux-kernel, Sun, Yi Y, Wu, Hao

Yi,

On 7/20/20 12:18 PM, Liu, Yi L wrote:
> Hi Eric,
> 
>> From: Auger Eric <eric.auger@redhat.com>
>> Sent: Monday, July 20, 2020 12:06 AM
>>
>> Hi Yi,
>>
>> On 7/12/20 1:21 PM, Liu Yi L wrote:
>>> When an IOMMU domain with nesting attribute is used for guest SVA, a
>>> system-wide PASID is allocated for binding with the device and the domain.
>>> For security reason, we need to check the PASID passsed from user-space.
>> passed
> 
> got it.
> 
>>> e.g. page table bind/unbind and PASID related cache invalidation.
>>>
>>> Cc: Kevin Tian <kevin.tian@intel.com>
>>> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
>>> Cc: Alex Williamson <alex.williamson@redhat.com>
>>> Cc: Eric Auger <eric.auger@redhat.com>
>>> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
>>> Cc: Joerg Roedel <joro@8bytes.org>
>>> Cc: Lu Baolu <baolu.lu@linux.intel.com>
>>> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
>>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>>> ---
>>>  drivers/iommu/intel/iommu.c | 10 ++++++++++
>>>  drivers/iommu/intel/svm.c   |  7 +++++--
>>>  2 files changed, 15 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>>> index 4d54198..a9504cb 100644
>>> --- a/drivers/iommu/intel/iommu.c
>>> +++ b/drivers/iommu/intel/iommu.c
>>> @@ -5436,6 +5436,7 @@ intel_iommu_sva_invalidate(struct iommu_domain
>> *domain, struct device *dev,
>>>  		int granu = 0;
>>>  		u64 pasid = 0;
>>>  		u64 addr = 0;
>>> +		void *pdata;
>>>
>>>  		granu = to_vtd_granularity(cache_type, inv_info->granularity);
>>>  		if (granu == -EINVAL) {
>>> @@ -5456,6 +5457,15 @@ intel_iommu_sva_invalidate(struct iommu_domain
>> *domain, struct device *dev,
>>>  			 (inv_info->granu.addr_info.flags &
>> IOMMU_INV_ADDR_FLAGS_PASID))
>>>  			pasid = inv_info->granu.addr_info.pasid;
>>>
>>> +		pdata = ioasid_find(dmar_domain->ioasid_sid, pasid, NULL);
>>> +		if (!pdata) {
>>> +			ret = -EINVAL;
>>> +			goto out_unlock;
>>> +		} else if (IS_ERR(pdata)) {
>>> +			ret = PTR_ERR(pdata);
>>> +			goto out_unlock;
>>> +		}
>>> +
>>>  		switch (BIT(cache_type)) {
>>>  		case IOMMU_CACHE_INV_TYPE_IOTLB:
>>>  			/* HW will ignore LSB bits based on address mask */
>>> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
>>> index d2c0e1a..212dee0 100644
>>> --- a/drivers/iommu/intel/svm.c
>>> +++ b/drivers/iommu/intel/svm.c
>>> @@ -319,7 +319,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain,
>> struct device *dev,
>>>  	dmar_domain = to_dmar_domain(domain);
>>>
>>>  	mutex_lock(&pasid_mutex);
>>> -	svm = ioasid_find(INVALID_IOASID_SET, data->hpasid, NULL);
I meant while using INVALID_IOASID_SET instead of the actual
dmar_domain->ioasid_sid. But I think I've now recovered, the asset is
simply not used ;-)
>> I do not get what the call was supposed to do before that patch?
> 
> you mean patch 10/15 by "that patch", right? the ownership check should
> be done as to prevent illegal bind request from userspace. before patch
> 10/15, it should be added.
> 
>>> +	svm = ioasid_find(dmar_domain->ioasid_sid, data->hpasid, NULL);
>>>  	if (IS_ERR(svm)) {
>>>  		ret = PTR_ERR(svm);
>>>  		goto out;
>>> @@ -436,6 +436,7 @@ int intel_svm_unbind_gpasid(struct iommu_domain
>> *domain,
>>>  			    struct device *dev, ioasid_t pasid)
>>>  {
>>>  	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
>>> +	struct dmar_domain *dmar_domain;
>>>  	struct intel_svm_dev *sdev;
>>>  	struct intel_svm *svm;
>>>  	int ret = -EINVAL;
>>> @@ -443,8 +444,10 @@ int intel_svm_unbind_gpasid(struct iommu_domain
>> *domain,
>>>  	if (WARN_ON(!iommu))
>>>  		return -EINVAL;
>>>
>>> +	dmar_domain = to_dmar_domain(domain);
>>> +
>>>  	mutex_lock(&pasid_mutex);
>>> -	svm = ioasid_find(INVALID_IOASID_SET, pasid, NULL);
>>> +	svm = ioasid_find(dmar_domain->ioasid_sid, pasid, NULL);
>> just to make sure, about the locking, can't domain->ioasid_sid change
>> under the hood?
> 
> I guess not. intel_svm_unbind_gpasid() and iommu_domain_set_attr()
> is called by vfio today, and within vfio, there is vfio_iommu->lock.
OK

Thanks

Eric
> 
> Regards,
> Yi Liu
> 
>>
>> Thanks
>>
>> Eric
>>>  	if (!svm) {
>>>  		ret = -EINVAL;
>>>  		goto out;
>>>
> 

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 09/15] iommu/vt-d: Check ownership for PASIDs from user-space
  2020-07-20 12:37       ` Auger Eric
@ 2020-07-20 12:55         ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2020-07-20 12:55 UTC (permalink / raw)
  To: Auger Eric, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	 Jun J, iommu, linux-kernel, Sun, Yi Y, Wu, Hao

Eric,

> From: Auger Eric <eric.auger@redhat.com>
> Sent: Monday, July 20, 2020 8:38 PM
> 
> Yi,
> 
> On 7/20/20 12:18 PM, Liu, Yi L wrote:
> > Hi Eric,
> >
> >> From: Auger Eric <eric.auger@redhat.com>
> >> Sent: Monday, July 20, 2020 12:06 AM
> >>
> >> Hi Yi,
> >>
> >> On 7/12/20 1:21 PM, Liu Yi L wrote:
> >>> When an IOMMU domain with nesting attribute is used for guest SVA, a
> >>> system-wide PASID is allocated for binding with the device and the domain.
> >>> For security reason, we need to check the PASID passsed from user-space.
> >> passed
> >
> > got it.
> >
> >>> e.g. page table bind/unbind and PASID related cache invalidation.
> >>>
> >>> Cc: Kevin Tian <kevin.tian@intel.com>
> >>> CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> >>> Cc: Alex Williamson <alex.williamson@redhat.com>
> >>> Cc: Eric Auger <eric.auger@redhat.com>
> >>> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> >>> Cc: Joerg Roedel <joro@8bytes.org>
> >>> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> >>> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> >>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> >>> ---
> >>>  drivers/iommu/intel/iommu.c | 10 ++++++++++
> >>>  drivers/iommu/intel/svm.c   |  7 +++++--
> >>>  2 files changed, 15 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> >>> index 4d54198..a9504cb 100644
> >>> --- a/drivers/iommu/intel/iommu.c
> >>> +++ b/drivers/iommu/intel/iommu.c
> >>> @@ -5436,6 +5436,7 @@ intel_iommu_sva_invalidate(struct iommu_domain
> >> *domain, struct device *dev,
> >>>  		int granu = 0;
> >>>  		u64 pasid = 0;
> >>>  		u64 addr = 0;
> >>> +		void *pdata;
> >>>
> >>>  		granu = to_vtd_granularity(cache_type, inv_info->granularity);
> >>>  		if (granu == -EINVAL) {
> >>> @@ -5456,6 +5457,15 @@ intel_iommu_sva_invalidate(struct iommu_domain
> >> *domain, struct device *dev,
> >>>  			 (inv_info->granu.addr_info.flags &
> >> IOMMU_INV_ADDR_FLAGS_PASID))
> >>>  			pasid = inv_info->granu.addr_info.pasid;
> >>>
> >>> +		pdata = ioasid_find(dmar_domain->ioasid_sid, pasid, NULL);
> >>> +		if (!pdata) {
> >>> +			ret = -EINVAL;
> >>> +			goto out_unlock;
> >>> +		} else if (IS_ERR(pdata)) {
> >>> +			ret = PTR_ERR(pdata);
> >>> +			goto out_unlock;
> >>> +		}
> >>> +
> >>>  		switch (BIT(cache_type)) {
> >>>  		case IOMMU_CACHE_INV_TYPE_IOTLB:
> >>>  			/* HW will ignore LSB bits based on address mask */
> >>> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> >>> index d2c0e1a..212dee0 100644
> >>> --- a/drivers/iommu/intel/svm.c
> >>> +++ b/drivers/iommu/intel/svm.c
> >>> @@ -319,7 +319,7 @@ int intel_svm_bind_gpasid(struct iommu_domain
> *domain,
> >> struct device *dev,
> >>>  	dmar_domain = to_dmar_domain(domain);
> >>>
> >>>  	mutex_lock(&pasid_mutex);
> >>> -	svm = ioasid_find(INVALID_IOASID_SET, data->hpasid, NULL);
> I meant while using INVALID_IOASID_SET instead of the actual
> dmar_domain->ioasid_sid. But I think I've now recovered, the asset is
> simply not used ;-)

oh, I think should be using dmar_domain->ioasid_sid from the beginning.
does it look good so far? :-)

Regards,
Yi Liu

> >> I do not get what the call was supposed to do before that patch?
> >
> > you mean patch 10/15 by "that patch", right? the ownership check should
> > be done as to prevent illegal bind request from userspace. before patch
> > 10/15, it should be added.
> >
> >>> +	svm = ioasid_find(dmar_domain->ioasid_sid, data->hpasid, NULL);
> >>>  	if (IS_ERR(svm)) {
> >>>  		ret = PTR_ERR(svm);
> >>>  		goto out;
> >>> @@ -436,6 +436,7 @@ int intel_svm_unbind_gpasid(struct iommu_domain
> >> *domain,
> >>>  			    struct device *dev, ioasid_t pasid)
> >>>  {
> >>>  	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> >>> +	struct dmar_domain *dmar_domain;
> >>>  	struct intel_svm_dev *sdev;
> >>>  	struct intel_svm *svm;
> >>>  	int ret = -EINVAL;
> >>> @@ -443,8 +444,10 @@ int intel_svm_unbind_gpasid(struct iommu_domain
> >> *domain,
> >>>  	if (WARN_ON(!iommu))
> >>>  		return -EINVAL;
> >>>
> >>> +	dmar_domain = to_dmar_domain(domain);
> >>> +
> >>>  	mutex_lock(&pasid_mutex);
> >>> -	svm = ioasid_find(INVALID_IOASID_SET, pasid, NULL);
> >>> +	svm = ioasid_find(dmar_domain->ioasid_sid, pasid, NULL);
> >> just to make sure, about the locking, can't domain->ioasid_sid change
> >> under the hood?
> >
> > I guess not. intel_svm_unbind_gpasid() and iommu_domain_set_attr()
> > is called by vfio today, and within vfio, there is vfio_iommu->lock.
> OK
> 
> Thanks
> 
> Eric
> >
> > Regards,
> > Yi Liu
> >
> >>
> >> Thanks
> >>
> >> Eric
> >>>  	if (!svm) {
> >>>  		ret = -EINVAL;
> >>>  		goto out;
> >>>
> >

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 13/15] vfio/pci: Expose PCIe PASID capability to guest
  2020-07-20 12:35   ` Auger Eric
@ 2020-07-20 13:00     ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2020-07-20 13:00 UTC (permalink / raw)
  To: Auger Eric, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	 Jun J, iommu, linux-kernel, Sun, Yi Y, Wu, Hao

Hi Eric,

> From: Auger Eric <eric.auger@redhat.com>
> Sent: Monday, July 20, 2020 8:35 PM
> 
> Yi,
> 
> On 7/12/20 1:21 PM, Liu Yi L wrote:
> > This patch exposes PCIe PASID capability to guest for assigned devices.
> > Existing vfio_pci driver hides it from guest by setting the capability
> > length as 0 in pci_ext_cap_length[].
> >
> > And this patch only exposes PASID capability for devices which has PCIe
> > PASID extended struture in its configuration space. So VFs, will will
> s/will//

got it.

> > not see PASID capability on VFs as VF doesn't implement PASID extended
> suggested rewording: VFs will not expose the PASID capability as they do
> not implement the PASID extended structure in their config space?

make sense. will modify it.

> > structure in its configuration space. For VF, it is a TODO in future.
> > Related discussion can be found in below link:
> >
> > https://lkml.org/lkml/2020/4/7/693
> 
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> > v1 -> v2:
> > *) added in v2, but it was sent in a separate patchseries before
> > ---
> >  drivers/vfio/pci/vfio_pci_config.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> > index d98843f..07ff2e6 100644
> > --- a/drivers/vfio/pci/vfio_pci_config.c
> > +++ b/drivers/vfio/pci/vfio_pci_config.c
> > @@ -95,7 +95,7 @@ static const u16 pci_ext_cap_length[PCI_EXT_CAP_ID_MAX +
> 1] = {
> >  	[PCI_EXT_CAP_ID_LTR]	=	PCI_EXT_CAP_LTR_SIZEOF,
> >  	[PCI_EXT_CAP_ID_SECPCI]	=	0,	/* not yet */
> >  	[PCI_EXT_CAP_ID_PMUX]	=	0,	/* not yet */
> > -	[PCI_EXT_CAP_ID_PASID]	=	0,	/* not yet */
> > +	[PCI_EXT_CAP_ID_PASID]	=	PCI_EXT_CAP_PASID_SIZEOF,
> >  };
> >
> >  /*
> >
> Reviewed-by: Eric Auger <eric.auger@redhat.com>

thanks.

Regards,
Yi Liu

> Thanks
> 
> Eric

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 15/15] iommu/vt-d: Support reporting nesting capability info
  2020-07-17 17:14   ` Auger Eric
@ 2020-07-20 13:33     ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2020-07-20 13:33 UTC (permalink / raw)
  To: Auger Eric, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	 Jun J, iommu, linux-kernel, Sun, Yi Y, Wu, Hao

Hi Eric,

> From: Auger Eric <eric.auger@redhat.com>
> Sent: Saturday, July 18, 2020 1:14 AM
> 
> Hi Yi,
> 
> Missing a proper commit message. You can comment on the fact you only
> support the case where all the physical iomms have the same CAP/ECAP MASKS

got it. will add it. it looks like the subject is straightforward, so I removed commit
message.

> 
> On 7/12/20 1:21 PM, Liu Yi L wrote:
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> > v2 -> v3:
> > *) remove cap/ecap_mask in iommu_nesting_info.
> > ---
> >  drivers/iommu/intel/iommu.c | 81
> +++++++++++++++++++++++++++++++++++++++++++--
> >  include/linux/intel-iommu.h | 16 +++++++++
> >  2 files changed, 95 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index a9504cb..9f7ad1a 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -5659,12 +5659,16 @@ static inline bool iommu_pasid_support(void)
> >  static inline bool nested_mode_support(void)
> >  {
> >  	struct dmar_drhd_unit *drhd;
> > -	struct intel_iommu *iommu;
> > +	struct intel_iommu *iommu, *prev = NULL;
> >  	bool ret = true;
> >
> >  	rcu_read_lock();
> >  	for_each_active_iommu(iommu, drhd) {
> > -		if (!sm_supported(iommu) || !ecap_nest(iommu->ecap)) {
> > +		if (!prev)
> > +			prev = iommu;
> > +		if (!sm_supported(iommu) || !ecap_nest(iommu->ecap) ||
> > +		    (VTD_CAP_MASK & (iommu->cap ^ prev->cap)) ||
> > +		    (VTD_ECAP_MASK & (iommu->ecap ^ prev->ecap))) {
> >  			ret = false;
> >  			break;>  		}
> > @@ -6079,6 +6083,78 @@ intel_iommu_domain_set_attr(struct iommu_domain
> *domain,
> >  	return ret;
> >  }
> >
> > +static int intel_iommu_get_nesting_info(struct iommu_domain *domain,
> > +					struct iommu_nesting_info *info)
> > +{
> > +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > +	u64 cap = VTD_CAP_MASK, ecap = VTD_ECAP_MASK;
> > +	struct device_domain_info *domain_info;
> > +	struct iommu_nesting_info_vtd vtd;
> > +	unsigned long flags;
> > +	unsigned int size;
> > +
> > +	if (domain->type != IOMMU_DOMAIN_UNMANAGED ||
> > +	    !(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE))
> > +		return -ENODEV;
> > +
> > +	if (!info)
> > +		return -EINVAL;
> > +
> > +	size = sizeof(struct iommu_nesting_info) +
> > +		sizeof(struct iommu_nesting_info_vtd);
> > +	/*
> > +	 * if provided buffer size is smaller than expected, should
> > +	 * return 0 and also the expected buffer size to caller.
> > +	 */
> > +	if (info->size < size) {
> > +		info->size = size;
> > +		return 0;
> > +	}
> > +
> > +	spin_lock_irqsave(&device_domain_lock, flags);
> > +	/*
> > +	 * arbitrary select the first domain_info as all nesting
> > +	 * related capabilities should be consistent across iommu
> > +	 * units.
> > +	 */
> > +	domain_info = list_first_entry(&dmar_domain->devices,
> > +				       struct device_domain_info, link);
> > +	cap &= domain_info->iommu->cap;
> > +	ecap &= domain_info->iommu->ecap;
> > +	spin_unlock_irqrestore(&device_domain_lock, flags);
> > +
> > +	info->format = IOMMU_PASID_FORMAT_INTEL_VTD;
> > +	info->features = IOMMU_NESTING_FEAT_SYSWIDE_PASID |
> > +			 IOMMU_NESTING_FEAT_BIND_PGTBL |
> > +			 IOMMU_NESTING_FEAT_CACHE_INVLD;
> > +	info->addr_width = dmar_domain->gaw;
> > +	info->pasid_bits = ilog2(intel_pasid_max_id);
> > +	info->padding = 0;
> > +	vtd.flags = 0;
> > +	vtd.padding = 0;
> > +	vtd.cap_reg = cap;
> > +	vtd.ecap_reg = ecap;
> > +
> > +	memcpy(info->data, &vtd, sizeof(vtd));
> > +	return 0;
> > +}
> > +
> > +static int intel_iommu_domain_get_attr(struct iommu_domain *domain,
> > +				       enum iommu_attr attr, void *data)
> > +{
> > +	switch (attr) {
> > +	case DOMAIN_ATTR_NESTING:
> > +	{
> > +		struct iommu_nesting_info *info =
> > +				(struct iommu_nesting_info *)data;
> > +
> > +		return intel_iommu_get_nesting_info(domain, info);
> > +	}
> > +	default:
> > +		return -ENODEV;
> -ENOENT?

arm_smmu_domain_get_attr() is using -ENODEV, so I used the same. I can
modify it if -ENOENT is better. :-)

Regards,
Yi Liu

> > +	}
> > +}
> > +
> >  /*
> >   * Check that the device does not live on an external facing PCI port that is
> >   * marked as untrusted. Such devices should not be able to apply quirks and
> > @@ -6101,6 +6177,7 @@ const struct iommu_ops intel_iommu_ops = {
> >  	.domain_alloc		= intel_iommu_domain_alloc,
> >  	.domain_free		= intel_iommu_domain_free,
> >  	.domain_set_attr	= intel_iommu_domain_set_attr,
> > +	.domain_get_attr	= intel_iommu_domain_get_attr,
> >  	.attach_dev		= intel_iommu_attach_device,
> >  	.detach_dev		= intel_iommu_detach_device,
> >  	.aux_attach_dev		= intel_iommu_aux_attach_device,
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index 18f292e..c4ed0d4 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -197,6 +197,22 @@
> >  #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
> >  #define ecap_sc_support(e)	((e >> 7) & 0x1) /* Snooping Control */
> >
> > +/* Nesting Support Capability Alignment */
> > +#define VTD_CAP_FL1GP		BIT_ULL(56)
> > +#define VTD_CAP_FL5LP		BIT_ULL(60)
> > +#define VTD_ECAP_PRS		BIT_ULL(29)
> > +#define VTD_ECAP_ERS		BIT_ULL(30)
> > +#define VTD_ECAP_SRS		BIT_ULL(31)
> > +#define VTD_ECAP_EAFS		BIT_ULL(34)
> > +#define VTD_ECAP_PASID		BIT_ULL(40)
> 
> > +
> > +/* Only capabilities marked in below MASKs are reported */
> > +#define VTD_CAP_MASK		(VTD_CAP_FL1GP | VTD_CAP_FL5LP)
> > +
> > +#define VTD_ECAP_MASK		(VTD_ECAP_PRS | VTD_ECAP_ERS | \
> > +				 VTD_ECAP_SRS | VTD_ECAP_EAFS | \
> > +				 VTD_ECAP_PASID)
> > +
> >  /* Virtual command interface capability */
> >  #define vccap_pasid(v)		(((v) & DMA_VCS_PAS)) /* PASID allocation
> */
> >
> >
> Thanks
> 
> Eric
> 

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 03/15] iommu/smmu: Report empty domain nesting info
  2020-07-16 15:39       ` Jean-Philippe Brucker
  2020-07-16 20:38         ` Auger Eric
@ 2020-07-23  9:40         ` Liu, Yi L
  1 sibling, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2020-07-23  9:40 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Tian, Kevin, Raj, Ashok, kvm, stefanha, Sun, Yi Y, linux-kernel,
	alex.williamson, iommu, Robin Murphy, Wu, Hao, Will Deacon, Tian,
	Jun J

Hi Jean,

> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Sent: Thursday, July 16, 2020 11:40 PM
> 
> On Tue, Jul 14, 2020 at 10:12:49AM +0000, Liu, Yi L wrote:
> > > Have you verified that this doesn't break the existing usage of
> > > DOMAIN_ATTR_NESTING in drivers/vfio/vfio_iommu_type1.c?
> >
> > I didn't have ARM machine on my hand. But I contacted with Jean
> > Philippe, he confirmed no compiling issue. I didn't see any code
> > getting DOMAIN_ATTR_NESTING attr in current drivers/vfio/vfio_iommu_type1.c.
> > What I'm adding is to call iommu_domai_get_attr(, DOMAIN_ATTR_NESTIN)
> > and won't fail if the iommu_domai_get_attr() returns 0. This patch
> > returns an empty nesting info for DOMAIN_ATTR_NESTIN and return
> > value is 0 if no error. So I guess it won't fail nesting for ARM.
> 
> I confirm that this series doesn't break the current support for
> VFIO_IOMMU_TYPE1_NESTING with an SMMUv3. That said...

thanks.

> If the SMMU does not support stage-2 then there is a change in behavior
> (untested): after the domain is silently switched to stage-1 by the SMMU
> driver, VFIO will now query nesting info and obtain -ENODEV. Instead of
> succeding as before, the VFIO ioctl will now fail. I believe that's a fix
> rather than a regression, it should have been like this since the
> beginning. No known userspace has been using VFIO_IOMMU_TYPE1_NESTING so
> far, so I don't think it should be a concern.
> 
> And if userspace queries the nesting properties using the new ABI
> introduced in this patchset, it will obtain an empty struct.

yes.

> I think
> that's acceptable, but it may be better to avoid adding the nesting cap if
> @format is 0?

right. will add it in patch 4/15.

Regards,
Yi Liu

> 
> Thanks,
> Jean
> 
> >
> > @Eric, how about your opinion? your dual-stage vSMMU support may
> > also share the vfio_iommu_type1.c code.
> >
> > Regards,
> > Yi Liu
> >
> > > Will
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 03/15] iommu/smmu: Report empty domain nesting info
  2020-07-17  9:09           ` Jean-Philippe Brucker
  2020-07-17 10:28             ` Auger Eric
@ 2020-07-23  9:44             ` Liu, Yi L
  1 sibling, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2020-07-23  9:44 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Auger Eric
  Cc: Tian, Kevin, Raj, Ashok, kvm, stefanha, Sun, Yi Y, linux-kernel,
	alex.williamson, iommu, Robin Murphy, Wu, Hao, Will Deacon, Tian,
	Jun J

Hi Jean,

> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Sent: Friday, July 17, 2020 5:09 PM
> 
> On Thu, Jul 16, 2020 at 10:38:17PM +0200, Auger Eric wrote:
> > Hi Jean,
> >
> > On 7/16/20 5:39 PM, Jean-Philippe Brucker wrote:
> > > On Tue, Jul 14, 2020 at 10:12:49AM +0000, Liu, Yi L wrote:
> > >>> Have you verified that this doesn't break the existing usage of
> > >>> DOMAIN_ATTR_NESTING in drivers/vfio/vfio_iommu_type1.c?
> > >>
> > >> I didn't have ARM machine on my hand. But I contacted with Jean
> > >> Philippe, he confirmed no compiling issue. I didn't see any code
> > >> getting DOMAIN_ATTR_NESTING attr in current
> drivers/vfio/vfio_iommu_type1.c.
> > >> What I'm adding is to call iommu_domai_get_attr(, DOMAIN_ATTR_NESTIN)
> > >> and won't fail if the iommu_domai_get_attr() returns 0. This patch
> > >> returns an empty nesting info for DOMAIN_ATTR_NESTIN and return
> > >> value is 0 if no error. So I guess it won't fail nesting for ARM.
> > >
> > > I confirm that this series doesn't break the current support for
> > > VFIO_IOMMU_TYPE1_NESTING with an SMMUv3. That said...
> > >
> > > If the SMMU does not support stage-2 then there is a change in behavior
> > > (untested): after the domain is silently switched to stage-1 by the SMMU
> > > driver, VFIO will now query nesting info and obtain -ENODEV. Instead of
> > > succeding as before, the VFIO ioctl will now fail. I believe that's a fix
> > > rather than a regression, it should have been like this since the
> > > beginning. No known userspace has been using VFIO_IOMMU_TYPE1_NESTING
> so
> > > far, so I don't think it should be a concern.
> > But as Yi mentioned ealier, in the current vfio code there is no
> > DOMAIN_ATTR_NESTING query yet.
> 
> That's why something that would have succeeded before will now fail:
> Before this series, if user asked for a VFIO_IOMMU_TYPE1_NESTING, it would
> have succeeded even if the SMMU didn't support stage-2, as the driver
> would have silently fallen back on stage-1 mappings (which work exactly
> the same as stage-2-only since there was no nesting supported). After the
> series, we do check for DOMAIN_ATTR_NESTING so if user asks for
> VFIO_IOMMU_TYPE1_NESTING and the SMMU doesn't support stage-2, the ioctl
> fails.

I think this depends on iommu driver. I noticed current SMMU driver
doesn't check physical IOMMU about nesting capability. So I guess
the SET_IOMMU would still work for SMMU. but it will fail as you
mentioned in prior email, userspace will check the nesting info and
would fail as it gets an empty struct from host.

https://lore.kernel.org/kvm/20200716153959.GA447208@myrica/

> 
> I believe it's a good fix and completely harmless, but wanted to make sure
> no one objects because it's an ABI change.

yes.

Regards,
Yi Liu

> Thanks,
> Jean
> 
> > In my SMMUV3 nested stage series, I added
> > such a query in vfio-pci.c to detect if I need to expose a fault region
> > but I already test both the returned value and the output arg. So to me
> > there is no issue with that change.
> > >
> > > And if userspace queries the nesting properties using the new ABI
> > > introduced in this patchset, it will obtain an empty struct. I think
> > > that's acceptable, but it may be better to avoid adding the nesting cap if
> > > @format is 0?
> > agreed
> >
> > Thanks
> >
> > Eric
> > >
> > > Thanks,
> > > Jean
> > >
> > >>
> > >> @Eric, how about your opinion? your dual-stage vSMMU support may
> > >> also share the vfio_iommu_type1.c code.
> > >>
> > >> Regards,
> > >> Yi Liu
> > >>
> > >>> Will
> > >
> >
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 12/15] vfio/type1: Add vSVA support for IOMMU-backed mdevs
  2020-07-20 12:22   ` Auger Eric
@ 2020-07-23 12:05     ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2020-07-23 12:05 UTC (permalink / raw)
  To: Auger Eric, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	 Jun J, iommu, linux-kernel, Sun, Yi Y, Wu, Hao

Hi Eric,

> From: Auger Eric < eric.auger@redhat.com>
> Sent: Monday, July 20, 2020 8:22 PM
> 
> Yi,
> 
> On 7/12/20 1:21 PM, Liu Yi L wrote:
> > Recent years, mediated device pass-through framework (e.g. vfio-mdev)
> > is used to achieve flexible device sharing across domains (e.g. VMs).
> > Also there are hardware assisted mediated pass-through solutions from
> > platform vendors. e.g. Intel VT-d scalable mode which supports Intel
> > Scalable I/O Virtualization technology. Such mdevs are called IOMMU-
> > backed mdevs as there are IOMMU enforced DMA isolation for such mdevs.
> there is IOMMU enforced DMA isolation
> > In kernel, IOMMU-backed mdevs are exposed to IOMMU layer by aux-domain
> > concept, which means mdevs are protected by an iommu domain which is
> > auxiliary to the domain that the kernel driver primarily uses for DMA
> > API. Details can be found in the KVM presentation as below:
> >
> > https://events19.linuxfoundation.org/wp-content/uploads/2017/12/\
> > Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf
> >
> > This patch extends NESTING_IOMMU ops to IOMMU-backed mdev devices. The
> > main requirement is to use the auxiliary domain associated with mdev.
> 
> So as a result vSVM becomes functional for scalable mode mediated devices, right?

yes. as long as the mediated devices reports PASID capability.

> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > CC: Jun Tian <jun.j.tian@intel.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> > v1 -> v2:
> > *) check the iommu_device to ensure the handling mdev is IOMMU-backed
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 39
> > +++++++++++++++++++++++++++++++++++----
> >  1 file changed, 35 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > b/drivers/vfio/vfio_iommu_type1.c index 960cc59..f1f1ae2 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -2373,20 +2373,41 @@ static int vfio_iommu_resv_refresh(struct
> vfio_iommu *iommu,
> >  	return ret;
> >  }
> >
> > +static struct device *vfio_get_iommu_device(struct vfio_group *group,
> > +					    struct device *dev)
> > +{
> > +	if (group->mdev_group)
> > +		return vfio_mdev_get_iommu_device(dev);
> > +	else
> > +		return dev;
> > +}
> > +
> >  static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)  {
> >  	struct domain_capsule *dc = (struct domain_capsule *)data;
> >  	unsigned long arg = *(unsigned long *)dc->data;
> > +	struct device *iommu_device;
> > +
> > +	iommu_device = vfio_get_iommu_device(dc->group, dev);
> > +	if (!iommu_device)
> > +		return -EINVAL;
> >
> > -	return iommu_sva_bind_gpasid(dc->domain, dev, (void __user *)arg);
> > +	return iommu_sva_bind_gpasid(dc->domain, iommu_device,
> > +				     (void __user *)arg);
> >  }
> >
> >  static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
> > {
> >  	struct domain_capsule *dc = (struct domain_capsule *)data;
> >  	unsigned long arg = *(unsigned long *)dc->data;
> > +	struct device *iommu_device;
> >
> > -	iommu_sva_unbind_gpasid(dc->domain, dev, (void __user *)arg);
> > +	iommu_device = vfio_get_iommu_device(dc->group, dev);
> > +	if (!iommu_device)
> > +		return -EINVAL;
> > +
> > +	iommu_sva_unbind_gpasid(dc->domain, iommu_device,
> > +				(void __user *)arg);
> >  	return 0;
> >  }
> >
> > @@ -2395,8 +2416,13 @@ static int __vfio_dev_unbind_gpasid_fn(struct device
> *dev, void *data)
> >  	struct domain_capsule *dc = (struct domain_capsule *)data;
> >  	struct iommu_gpasid_bind_data *unbind_data =
> >  				(struct iommu_gpasid_bind_data *)dc->data;
> > +	struct device *iommu_device;
> > +
> > +	iommu_device = vfio_get_iommu_device(dc->group, dev);
> > +	if (!iommu_device)
> > +		return -EINVAL;
> >
> > -	__iommu_sva_unbind_gpasid(dc->domain, dev, unbind_data);
> > +	__iommu_sva_unbind_gpasid(dc->domain, iommu_device, unbind_data);
> >  	return 0;
> >  }
> >
> > @@ -3077,8 +3103,13 @@ static int vfio_dev_cache_invalidate_fn(struct
> > device *dev, void *data)  {
> >  	struct domain_capsule *dc = (struct domain_capsule *)data;
> >  	unsigned long arg = *(unsigned long *)dc->data;
> > +	struct device *iommu_device;
> > +
> > +	iommu_device = vfio_get_iommu_device(dc->group, dev);
> > +	if (!iommu_device)
> > +		return -EINVAL;
> >
> > -	iommu_cache_invalidate(dc->domain, dev, (void __user *)arg);
> > +	iommu_cache_invalidate(dc->domain, iommu_device, (void __user
> > +*)arg);
> >  	return 0;
> >  }
> >
> >
> Besides,
> 
> Looks grood to me
> 
> Reviewed-by: Eric Auger <eric.auger@redhat.com>

thanks :-)

Regards,
Yi Liu

> Eric

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v5 14/15] vfio: Document dual stage control
  2020-07-18 13:39   ` Auger Eric
@ 2020-07-25  8:54     ` Liu, Yi L
  0 siblings, 0 replies; 58+ messages in thread
From: Liu, Yi L @ 2020-07-25  8:54 UTC (permalink / raw)
  To: Auger Eric, alex.williamson, baolu.lu, joro
  Cc: jean-philippe, Tian, Kevin, Raj, Ashok, kvm, stefanha, Tian,
	 Jun J, iommu, linux-kernel, Sun, Yi Y, Wu, Hao

Hi Eric,

> From: Auger Eric <eric.auger@redhat.com>
> Sent: Saturday, July 18, 2020 9:40 PM
> 
> Hi Yi,
> 
> On 7/12/20 1:21 PM, Liu Yi L wrote:
> > From: Eric Auger <eric.auger@redhat.com>
> >
> > The VFIO API was enhanced to support nested stage control: a bunch of
> > new iotcls and usage guideline.
> ioctls

got it.

> >
> > Let's document the process to follow to set up nested mode.
> >
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Eric Auger <eric.auger@redhat.com>
> > Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Lu Baolu <baolu.lu@linux.intel.com>
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > Signed-off-by: Eric Auger <eric.auger@redhat.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> > v3 -> v4:
> > *) add review-by from Stefan Hajnoczi
> >
> > v2 -> v3:
> > *) address comments from Stefan Hajnoczi
> >
> > v1 -> v2:
> > *) new in v2, compared with Eric's original version, pasid table bind
> >    and fault reporting is removed as this series doesn't cover them.
> >    Original version from Eric.
> >    https://lkml.org/lkml/2020/3/20/700
> > ---
> >  Documentation/driver-api/vfio.rst | 67
> +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 67 insertions(+)
> >
> > diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
> > index f1a4d3c..0672c45 100644
> > --- a/Documentation/driver-api/vfio.rst
> > +++ b/Documentation/driver-api/vfio.rst
> > @@ -239,6 +239,73 @@ group and can access them as follows::
> >  	/* Gratuitous device reset and go... */
> >  	ioctl(device, VFIO_DEVICE_RESET);
> >
> > +IOMMU Dual Stage Control
> > +------------------------
> > +
> > +Some IOMMUs support 2 stages/levels of translation. Stage corresponds to
> > +the ARM terminology while level corresponds to Intel's VTD terminology.
> > +In the following text we use either without distinction.
> > +
> > +This is useful when the guest is exposed with a virtual IOMMU and some
> > +devices are assigned to the guest through VFIO. Then the guest OS can use
> > +stage 1 (GIOVA -> GPA or GVA->GPA), while the hypervisor uses stage 2 for
> > +VM isolation (GPA -> HPA).
> > +
> > +Under dual stage translation, the guest gets ownership of the stage 1 page
> > +tables and also owns stage 1 configuration structures. The hypervisor owns
> > +the root configuration structure (for security reason), including stage 2
> > +configuration. This works as long as configuration structures and page table
> > +formats are compatible between the virtual IOMMU and the physical IOMMU.
> > +
> > +Assuming the HW supports it, this nested mode is selected by choosing the
> > +VFIO_TYPE1_NESTING_IOMMU type through:
> > +
> > +    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);
> > +
> > +This forces the hypervisor to use the stage 2, leaving stage 1 available
> > +for guest usage. The guest stage 1 format depends on IOMMU vendor, and
> > +it is the same with the nesting configuration method. User space should
> > +check the format and configuration method after setting nesting type by
> > +using:
> > +
> > +    ioctl(container->fd, VFIO_IOMMU_GET_INFO, &nesting_info);
> > +
> > +Details can be found in Documentation/userspace-api/iommu.rst. For Intel
> > +VT-d, each stage 1 page table is bound to host by:
> > +
> > +    nesting_op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL;
> > +    memcpy(&nesting_op->data, &bind_data, sizeof(bind_data));
> > +    ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
> > +
> > +As mentioned above, guest OS may use stage 1 for GIOVA->GPA or GVA->GPA.
> the guest OS, here and below?

got it.

> > +GVA->GPA page tables are available when PASID (Process Address Space ID)
> > +is exposed to guest. e.g. guest with PASID-capable devices assigned. For
> > +such page table binding, the bind_data should include PASID info, which
> > +is allocated by guest itself or by host. This depends on hardware vendor.
> > +e.g. Intel VT-d requires to allocate PASID from host. This requirement is
> 
> > +defined by the Virtual Command Support in VT-d 3.0 spec, guest software
> > +running on VT-d should allocate PASID from host kernel.
> because VTD 3.0 requires the unicity of the PASID, system wide, instead
> of the above repetition.

I see. perhaps better to say Intel platform. :-) will refine it.

> 
>  To allocate PASID
> > +from host, user space should check the IOMMU_NESTING_FEAT_SYSWIDE_PASID
> > +bit of the nesting info reported from host kernel. VFIO reports the nesting
> > +info by VFIO_IOMMU_GET_INFO. User space could allocate PASID from host by:
> if SYSWIDE_PASID requirement is exposed, the userspace *must* allocate ...

got it.

> > +
> > +    req.flags = VFIO_IOMMU_ALLOC_PASID;
> > +    ioctl(container, VFIO_IOMMU_PASID_REQUEST, &req);
> > +
> > +With first stage/level page table bound to host, it allows to combine the
> > +guest stage 1 translation along with the hypervisor stage 2 translation to
> > +get final address.
> > +
> > +When the guest invalidates stage 1 related caches, invalidations must be
> > +forwarded to the host through
> > +
> > +    nesting_op->flags = VFIO_IOMMU_NESTING_OP_CACHE_INVLD;
> > +    memcpy(&nesting_op->data, &inv_data, sizeof(inv_data));
> > +    ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
> > +
> > +Those invalidations can happen at various granularity levels, page, context,
> > +...
> > +
> >  VFIO User API
> >  -------------------------------------------------------------------------------
> I see you dropped the unrecoverable error reporting part of the original
> contribution. By the way don't you need any error handling for either of
> the use cases on vtd?

yes, I dropped the error reporting part, the reason is the series doesn’t
include the error reporting support. guess adding it when the error
reporting is sent out.

Regards,
Yi Liu

> >
> >
> Thanks
> 
> Eric

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2020-07-25  8:54 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-12 11:20 [PATCH v5 00/15] vfio: expose virtual Shared Virtual Addressing to VMs Liu Yi L
2020-07-12 11:20 ` [PATCH v5 01/15] vfio/type1: Refactor vfio_iommu_type1_ioctl() Liu Yi L
2020-07-12 11:20 ` [PATCH v5 02/15] iommu: Report domain nesting info Liu Yi L
2020-07-17 16:29   ` Auger Eric
2020-07-20  7:20     ` Liu, Yi L
2020-07-12 11:20 ` [PATCH v5 03/15] iommu/smmu: Report empty " Liu Yi L
2020-07-13 13:14   ` Will Deacon
2020-07-14 10:12     ` Liu, Yi L
2020-07-16 15:39       ` Jean-Philippe Brucker
2020-07-16 20:38         ` Auger Eric
2020-07-17  9:09           ` Jean-Philippe Brucker
2020-07-17 10:28             ` Auger Eric
2020-07-23  9:44             ` Liu, Yi L
2020-07-23  9:40         ` Liu, Yi L
2020-07-17 17:18   ` Auger Eric
2020-07-20  7:26     ` Liu, Yi L
2020-07-12 11:20 ` [PATCH v5 04/15] vfio/type1: Report iommu nesting info to userspace Liu Yi L
2020-07-17 17:34   ` Auger Eric
2020-07-20  7:51     ` Liu, Yi L
2020-07-20  8:33       ` Auger Eric
2020-07-20  8:51         ` Liu, Yi L
2020-07-12 11:21 ` [PATCH v5 05/15] vfio: Add PASID allocation/free support Liu Yi L
2020-07-19 15:38   ` Auger Eric
2020-07-20  8:03     ` Liu, Yi L
2020-07-20  8:26       ` Auger Eric
2020-07-20  8:49         ` Liu, Yi L
2020-07-12 11:21 ` [PATCH v5 06/15] iommu/vt-d: Support setting ioasid set to domain Liu Yi L
2020-07-19 15:38   ` Auger Eric
2020-07-20  8:11     ` Liu, Yi L
2020-07-12 11:21 ` [PATCH v5 07/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free) Liu Yi L
2020-07-19 15:38   ` Auger Eric
2020-07-20  8:56     ` Liu, Yi L
2020-07-12 11:21 ` [PATCH v5 08/15] iommu: Pass domain to sva_unbind_gpasid() Liu Yi L
2020-07-19 15:37   ` Auger Eric
2020-07-20  9:06     ` Liu, Yi L
2020-07-12 11:21 ` [PATCH v5 09/15] iommu/vt-d: Check ownership for PASIDs from user-space Liu Yi L
2020-07-19 16:06   ` Auger Eric
2020-07-20 10:18     ` Liu, Yi L
2020-07-20 12:37       ` Auger Eric
2020-07-20 12:55         ` Liu, Yi L
2020-07-12 11:21 ` [PATCH v5 10/15] vfio/type1: Support binding guest page tables to PASID Liu Yi L
2020-07-20  9:37   ` Auger Eric
2020-07-20 10:37     ` Liu, Yi L
2020-07-12 11:21 ` [PATCH v5 11/15] vfio/type1: Allow invalidating first-level/stage IOMMU cache Liu Yi L
2020-07-20  9:41   ` Auger Eric
2020-07-20 10:42     ` Liu, Yi L
2020-07-12 11:21 ` [PATCH v5 12/15] vfio/type1: Add vSVA support for IOMMU-backed mdevs Liu Yi L
2020-07-20 12:22   ` Auger Eric
2020-07-23 12:05     ` Liu, Yi L
2020-07-12 11:21 ` [PATCH v5 13/15] vfio/pci: Expose PCIe PASID capability to guest Liu Yi L
2020-07-20 12:35   ` Auger Eric
2020-07-20 13:00     ` Liu, Yi L
2020-07-12 11:21 ` [PATCH v5 14/15] vfio: Document dual stage control Liu Yi L
2020-07-18 13:39   ` Auger Eric
2020-07-25  8:54     ` Liu, Yi L
2020-07-12 11:21 ` [PATCH v5 15/15] iommu/vt-d: Support reporting nesting capability info Liu Yi L
2020-07-17 17:14   ` Auger Eric
2020-07-20 13:33     ` Liu, Yi L

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).