iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/5] IOMMU user API enhancement
@ 2020-06-23 17:03 Jacob Pan
  2020-06-23 17:03 ` [PATCH v3 1/5] docs: IOMMU user API Jacob Pan
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Jacob Pan @ 2020-06-23 17:03 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Christoph Hellwig, David Woodhouse

IOMMU user API header was introduced to support nested DMA translation and
related fault handling. The current UAPI data structures consist of three
areas that cover the interactions between host kernel and guest:
 - fault handling
 - cache invalidation
 - bind guest page tables, i.e. guest PASID

Future extensions are likely to support more architectures and vIOMMU features.

In the previous discussion, using user-filled data size and feature flags is
made a preferred approach over a unified version number.
https://lkml.org/lkml/2020/1/29/45

In addition to introduce argsz field to data structures, this patchset is also
trying to document the UAPI design, usage, and extension rules. VT-d driver
changes to utilize the new argsz field is included, VFIO usage is to follow.

Thanks,

Jacob


Changeog:
v3:
	- Rewrote backward compatibility rule to support existing code
	  re-compiled with newer kernel UAPI header that runs on older
	  kernel. Based on review comment from Alex W.
	  https://lore.kernel.org/linux-iommu/20200611094741.6d118fa8@w520.home/
	- Take user pointer directly in UAPI functions. Perform argsz check
	  and copy_from_user() in IOMMU driver. Eliminate the need for
	  VFIO or other upper layer to parse IOMMU data.
	- Create wrapper function for in-kernel users of UAPI functions
v2:
	- Removed unified API version and helper
	- Introduced argsz for each UAPI data
	- Introduced UAPI doc

Jacob Pan (5):
  docs: IOMMU user API
  iommu/uapi: Add argsz for user filled data
  iommu/uapi: Use named union for user data
  iommu/uapi: Handle data and argsz filled by users
  iommu/uapi: Support both kernel and user unbind guest PASID

 Documentation/userspace-api/iommu.rst | 244 ++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/iommu.c           |  24 ++--
 drivers/iommu/intel/svm.c             |   5 +-
 drivers/iommu/iommu.c                 | 138 +++++++++++++++++--
 include/linux/iommu.h                 |  20 ++-
 include/uapi/linux/iommu.h            |  10 +-
 6 files changed, 413 insertions(+), 28 deletions(-)
 create mode 100644 Documentation/userspace-api/iommu.rst

-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 1/5] docs: IOMMU user API
  2020-06-23 17:03 [PATCH v3 0/5] IOMMU user API enhancement Jacob Pan
@ 2020-06-23 17:03 ` Jacob Pan
  2020-06-26 22:19   ` Alex Williamson
  2020-06-23 17:03 ` [PATCH v3 2/5] iommu/uapi: Add argsz for user filled data Jacob Pan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Jacob Pan @ 2020-06-23 17:03 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Christoph Hellwig, David Woodhouse

IOMMU UAPI is newly introduced to support communications between guest
virtual IOMMU and host IOMMU. There has been lots of discussions on how
it should work with VFIO UAPI and userspace in general.

This document is indended to clarify the UAPI design and usage. The
mechenics of how future extensions should be achieved are also covered
in this documentation.

Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 Documentation/userspace-api/iommu.rst | 244 ++++++++++++++++++++++++++++++++++
 1 file changed, 244 insertions(+)
 create mode 100644 Documentation/userspace-api/iommu.rst

diff --git a/Documentation/userspace-api/iommu.rst b/Documentation/userspace-api/iommu.rst
new file mode 100644
index 000000000000..f9e4ed90a413
--- /dev/null
+++ b/Documentation/userspace-api/iommu.rst
@@ -0,0 +1,244 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. iommu:
+
+=====================================
+IOMMU Userspace API
+=====================================
+
+IOMMU UAPI is used for virtualization cases where communications are
+needed between physical and virtual IOMMU drivers. For native
+usage, IOMMU is a system device which does not need to communicate
+with user space directly.
+
+The primary use cases are guest Shared Virtual Address (SVA) and
+guest IO virtual address (IOVA), wherein a virtual IOMMU (vIOMMU) is
+required to communicate with the physical IOMMU in the host.
+
+.. contents:: :local:
+
+Functionalities
+===============
+Communications of user and kernel involve both directions. The
+supported user-kernel APIs are as follows:
+
+1. Alloc/Free PASID
+2. Bind/unbind guest PASID (e.g. Intel VT-d)
+3. Bind/unbind guest PASID table (e.g. ARM sMMU)
+4. Invalidate IOMMU caches
+5. Service page requests
+
+Requirements
+============
+The IOMMU UAPIs are generic and extensible to meet the following
+requirements:
+
+1. Emulated and para-virtualised vIOMMUs
+2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
+3. Extensions to the UAPI shall not break existing user space
+
+Interfaces
+==========
+Although the data structures defined in IOMMU UAPI are self-contained,
+there is no user API functions introduced. Instead, IOMMU UAPI is
+designed to work with existing user driver frameworks such as VFIO.
+
+Extension Rules & Precautions
+-----------------------------
+When IOMMU UAPI gets extended, the data structures can *only* be
+modified in two ways:
+
+1. Adding new fields by re-purposing the padding[] field. No size change.
+2. Adding new union members at the end. May increase in size.
+
+No new fields can be added *after* the variable sized union in that it
+will break backward compatibility when offset moves. In both cases, a
+new flag must be accompanied with a new field such that the IOMMU
+driver can process the data based on the new flag. Version field is
+only reserved for the unlikely event of UAPI upgrade at its entirety.
+
+It's *always* the caller's responsibility to indicate the size of the
+structure passed by setting argsz appropriately.
+Though at the same time, argsz is user provided data which is not
+trusted. The argsz field allows the user to indicate how much data
+they're providing, it's still the kernel's responsibility to validate
+whether it's correct and sufficient for the requested operation.
+
+Compatibility Checking
+----------------------
+When IOMMU UAPI extension results in size increase, user such as VFIO
+has to handle the following cases:
+
+1. User and kernel has exact size match
+2. An older user with older kernel header (smaller UAPI size) running on a
+   newer kernel (larger UAPI size)
+3. A newer user with newer kernel header (larger UAPI size) running
+   on an older kernel.
+4. A malicious/misbehaving user pass illegal/invalid size but within
+   range. The data may contain garbage.
+
+Feature Checking
+----------------
+While launching a guest with vIOMMU, it is important to ensure that host
+can support the UAPI data structures to be used for vIOMMU-pIOMMU
+communications. Without upfront compatibility checking, future faults
+are difficult to report even in normal conditions. For example, TLB
+invalidations should always succeed. There is no architectural way to
+report back to the vIOMMU if the UAPI data is incompatible. If that
+happens, in order to protect IOMMU iosolation guarantee, we have to
+resort to not giving completion status in vIOMMU. This may result in
+VM hang.
+
+For this reason the following IOMMU UAPIs cannot fail:
+
+1. Free PASID
+2. Unbind guest PASID
+3. Unbind guest PASID table (SMMU)
+4. Cache invalidate
+
+User applications such as QEMU is expected to import kernel UAPI
+headers. Backward compatibility is supported per feature flags.
+For example, an older QEMU (with older kernel header) can run on newer
+kernel. Newer QEMU (with new kernel header) may refuse to initialize
+on an older kernel if new feature flags are not supported by older
+kernel. Simply recompile existing code with newer kernel header should
+not be an issue in that only existing flags are used.
+
+IOMMU vendor driver should report the below features to IOMMU UAPI
+consumers (e.g. via VFIO).
+
+1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
+2. IOMMU_NESTING_FEAT_BIND_PGTBL
+3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
+4. IOMMU_NESTING_FEAT_CACHE_INVLD
+5. IOMMU_NESTING_FEAT_PAGE_REQUEST
+
+Take VFIO as example, upon request from VFIO user space (e.g. QEMU),
+VFIO kernel code shall query IOMMU vendor driver for the support of
+the above features. Query result can then be reported back to the
+user-space caller. Details can be found in
+Documentation/driver-api/vfio.rst.
+
+
+Data Passing Example with VFIO
+------------------------------
+As the ubiquitous userspace driver framework, VFIO is already IOMMU
+aware and share many key concepts such as device model, group, and
+protection domain. Other user driver frameworks can also be extended
+to support IOMMU UAPI but it is outside the scope of this document.
+
+In this tight-knit VFIO-IOMMU interface, the ultimate consumer of the
+IOMMU UAPI data is the host IOMMU driver. VFIO facilitates user-kernel
+transport, capability checking, security, and life cycle management of
+process address space ID (PASID).
+
+Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU driver is the
+ultimate consumer of its UAPI data. At VFIO layer, the IOMMU UAPI data
+is wrapped in a VFIO UAPI data. It follows the
+pattern below::
+
+   struct {
+	__u32 argsz;
+	__u32 flags;
+	__u8  data[];
+   };
+
+Here data[] contains the IOMMU UAPI data structures. VFIO has the
+freedom to bundle the data as well as parse data size based on its own flags.
+
+In order to determine the size and feature set of the user data, argsz
+and flags are also embedded in the IOMMU UAPI data structures.
+A "__u32 argsz" field is *always* at the beginning of each structure.
+
+For example:
+::
+
+   struct iommu_cache_invalidate_info {
+	__u32	argsz;
+	#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
+	__u32	version;
+	/* IOMMU paging structure cache */
+	#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /* IOMMU IOTLB */
+	#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /* Device IOTLB */
+	#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /* PASID cache */
+	#define IOMMU_CACHE_INV_TYPE_NR		(3)
+	__u8	cache;
+	__u8	granularity;
+	__u8	padding[2];
+	union {
+		struct iommu_inv_pasid_info pasid_info;
+		struct iommu_inv_addr_info addr_info;
+	} granu;
+   };
+
+VFIO is responsible for checking its own argsz and flags then invokes
+appropriate IOMMU UAPI functions. User pointer is passed to IOMMU
+layer for further processing. The responsibilities are divided as
+follows:
+
+- Generic IOMMU layer checks argsz range and override out-of-range
+  value. If the exact argsz is based on generic flags, they are checked
+  here as well.
+
+- Vendor IOMMU driver checks argsz based on vendor flags, UAPI data
+  is consumed based on flags
+
+Once again, use guest TLB invalidation as an example, argsz is based
+on generic flags in the invalidation information. IOMMU generic code
+shall process the UAPI data as the following:
+
+::
+
+ int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
+			void __user *uinfo)
+ {
+	/* Current kernel data size is the max to be copied from user */
+	maxsz = sizeof(struct iommu_cache_invalidate_info);
+	memset((void *)&inv_info, 0, maxsz);
+
+	/*
+	 * No new spaces can be added before the variable sized union, the
+	 * minimum size is the offset to the union.
+	 */
+	minsz = offsetof(struct iommu_cache_invalidate_info, granu);
+
+	/* Copy minsz from user to get flags and argsz */
+	if (copy_from_user(&inv_info, uinfo, minsz))
+		return -EFAULT;
+
+	/* Fields before variable size union is mandatory */
+	if (inv_info.argsz < minsz)
+		return -EINVAL;
+	/*
+	 * User might be using a newer UAPI header which has a larger data
+	 * size, we shall support the existing flags within the current
+	 * size.
+	 */
+	if (inv_info.argsz > maxsz)
+		inv_info.argsz = maxsz;
+
+	/* Checking the exact argsz based on generic flags */
+	if (inv_info.granularity == IOMMU_INV_GRANU_ADDR &&
+		inv_info.argsz != offsetofend(struct iommu_cache_invalidate_info,
+					granu.addr_info))
+		return -EINVAL;
+
+	if (inv_info.granularity == IOMMU_INV_GRANU_PASID &&
+		inv_info.argsz != offsetofend(struct iommu_cache_invalidate_info,
+					granu.pasid_info))
+		return -EINVAL;
+
+	/* Copy the remaining user data _after_ minsz */
+	if (copy_from_user((void *)&inv_info + minsz, uinfo + minsz,
+				inv_info.argsz - minsz))
+		return -EFAULT;
+
+	return domain->ops->cache_invalidate(domain, dev, &inv_info);
+ }
+ Add a wrapper
+   __iommu_unbind_( kernel data, same user data, kernel copy)
+
+Notice that in this example, since union size is determined by generic
+flags, all checking to argsz is validated in the generic IOMMU layer,
+vendor driver does not need to check argsz. However, if union size is
+based on vendor data, such as iommu_sva_bind_gpasid(), it will be
+vendor driver's responsibility to validate the exact argsz.
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 2/5] iommu/uapi: Add argsz for user filled data
  2020-06-23 17:03 [PATCH v3 0/5] IOMMU user API enhancement Jacob Pan
  2020-06-23 17:03 ` [PATCH v3 1/5] docs: IOMMU user API Jacob Pan
@ 2020-06-23 17:03 ` Jacob Pan
  2020-06-23 17:03 ` [PATCH v3 3/5] iommu/uapi: Use named union for user data Jacob Pan
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Jacob Pan @ 2020-06-23 17:03 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Christoph Hellwig, David Woodhouse

As IOMMU UAPI gets extended, user data size may increase. To support
backward compatibiliy, this patch introduces a size field to each UAPI
data structures. It is *always* the responsibility for the user to fill in
the correct size.

Specific scenarios for user data handling are documented in:
Documentation/userspace-api/iommu.rst

Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 include/uapi/linux/iommu.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index e907b7091a46..303f148a5cd7 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -135,6 +135,7 @@ enum iommu_page_response_code {
 
 /**
  * struct iommu_page_response - Generic page response information
+ * @argsz: User filled size of this data
  * @version: API version of this structure
  * @flags: encodes whether the corresponding fields are valid
  *         (IOMMU_FAULT_PAGE_RESPONSE_* values)
@@ -143,6 +144,7 @@ enum iommu_page_response_code {
  * @code: response code from &enum iommu_page_response_code
  */
 struct iommu_page_response {
+	__u32	argsz;
 #define IOMMU_PAGE_RESP_VERSION_1	1
 	__u32	version;
 #define IOMMU_PAGE_RESP_PASID_VALID	(1 << 0)
@@ -218,6 +220,7 @@ struct iommu_inv_pasid_info {
 /**
  * struct iommu_cache_invalidate_info - First level/stage invalidation
  *     information
+ * @argsz: User filled size of this data
  * @version: API version of this structure
  * @cache: bitfield that allows to select which caches to invalidate
  * @granularity: defines the lowest granularity used for the invalidation:
@@ -246,6 +249,7 @@ struct iommu_inv_pasid_info {
  * must support the used granularity.
  */
 struct iommu_cache_invalidate_info {
+	__u32	argsz;
 #define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
 	__u32	version;
 /* IOMMU paging structure cache */
@@ -292,6 +296,7 @@ struct iommu_gpasid_bind_data_vtd {
 
 /**
  * struct iommu_gpasid_bind_data - Information about device and guest PASID binding
+ * @argsz:	User filled size of this data
  * @version:	Version of this data structure
  * @format:	PASID table entry format
  * @flags:	Additional information on guest bind request
@@ -309,6 +314,7 @@ struct iommu_gpasid_bind_data_vtd {
  * PASID to host PASID based on this bind data.
  */
 struct iommu_gpasid_bind_data {
+	__u32 argsz;
 #define IOMMU_GPASID_BIND_VERSION_1	1
 	__u32 version;
 #define IOMMU_PASID_FORMAT_INTEL_VTD	1
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 3/5] iommu/uapi: Use named union for user data
  2020-06-23 17:03 [PATCH v3 0/5] IOMMU user API enhancement Jacob Pan
  2020-06-23 17:03 ` [PATCH v3 1/5] docs: IOMMU user API Jacob Pan
  2020-06-23 17:03 ` [PATCH v3 2/5] iommu/uapi: Add argsz for user filled data Jacob Pan
@ 2020-06-23 17:03 ` Jacob Pan
  2020-06-24  6:29   ` Lu Baolu
  2020-06-23 17:03 ` [PATCH v3 4/5] iommu/uapi: Handle data and argsz filled by users Jacob Pan
  2020-06-23 17:03 ` [PATCH v3 5/5] iommu/uapi: Support both kernel and user unbind guest PASID Jacob Pan
  4 siblings, 1 reply; 19+ messages in thread
From: Jacob Pan @ 2020-06-23 17:03 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Christoph Hellwig, David Woodhouse

IOMMU UAPI data size is filled by the user space which must be validated
by ther kernel. To ensure backward compatibility, user data can only be
extended by either re-purpose padding bytes or extend the variable sized
union at the end. No size change is allowed before the union. Therefore,
the minimum size is the offset of the union.

To use offsetof() on the union, we must make it named.

Link: https://lkml.org/lkml/2020/6/11/834
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 24 ++++++++++++------------
 drivers/iommu/intel/svm.c   |  2 +-
 include/uapi/linux/iommu.h  |  4 ++--
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 50fc62413a35..59cba214ef13 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5409,8 +5409,8 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 
 	/* Size is only valid in address selective invalidation */
 	if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
-		size = to_vtd_size(inv_info->addr_info.granule_size,
-				   inv_info->addr_info.nb_granules);
+		size = to_vtd_size(inv_info->granu.addr_info.granule_size,
+				   inv_info->granu.addr_info.nb_granules);
 
 	for_each_set_bit(cache_type,
 			 (unsigned long *)&inv_info->cache,
@@ -5431,20 +5431,20 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 		 * granularity.
 		 */
 		if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
-		    (inv_info->pasid_info.flags & IOMMU_INV_PASID_FLAGS_PASID))
-			pasid = inv_info->pasid_info.pasid;
+		    (inv_info->granu.pasid_info.flags & IOMMU_INV_PASID_FLAGS_PASID))
+			pasid = inv_info->granu.pasid_info.pasid;
 		else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
-			 (inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_PASID))
-			pasid = inv_info->addr_info.pasid;
+			 (inv_info->granu.addr_info.flags & IOMMU_INV_ADDR_FLAGS_PASID))
+			pasid = inv_info->granu.addr_info.pasid;
 
 		switch (BIT(cache_type)) {
 		case IOMMU_CACHE_INV_TYPE_IOTLB:
 			/* HW will ignore LSB bits based on address mask */
 			if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
 			    size &&
-			    (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
+				(inv_info->granu.addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
 				WARN_ONCE(1, "Address out of range, 0x%llx, size order %llu\n",
-					  inv_info->addr_info.addr, size);
+					  inv_info->granu.addr_info.addr, size);
 			}
 
 			/*
@@ -5452,9 +5452,9 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 			 * We use npages = -1 to indicate that.
 			 */
 			qi_flush_piotlb(iommu, did, pasid,
-					mm_to_dma_pfn(inv_info->addr_info.addr),
+					mm_to_dma_pfn(inv_info->granu.addr_info.addr),
 					(granu == QI_GRAN_NONG_PASID) ? -1 : 1 << size,
-					inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
+					inv_info->granu.addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
 
 			if (!info->ats_enabled)
 				break;
@@ -5475,13 +5475,13 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 				size = 64 - VTD_PAGE_SHIFT;
 				addr = 0;
 			} else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
-				addr = inv_info->addr_info.addr;
+				addr = inv_info->granu.addr_info.addr;
 
 			if (info->ats_enabled)
 				qi_flush_dev_iotlb_pasid(iommu, sid,
 						info->pfsid, pasid,
 						info->ats_qdep,
-						inv_info->addr_info.addr,
+						inv_info->granu.addr_info.addr,
 						size);
 			else
 				pr_warn_ratelimited("Passdown device IOTLB flush w/o ATS!\n");
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index d386853121a2..713b3a218483 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -338,7 +338,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 	spin_lock(&iommu->lock);
 	ret = intel_pasid_setup_nested(iommu, dev,
 				       (pgd_t *)(uintptr_t)data->gpgd,
-				       data->hpasid, &data->vtd, dmar_domain,
+				       data->hpasid, &data->vendor.vtd, dmar_domain,
 				       data->addr_width);
 	spin_unlock(&iommu->lock);
 	if (ret) {
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 303f148a5cd7..1afc6610b0ad 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -263,7 +263,7 @@ struct iommu_cache_invalidate_info {
 	union {
 		struct iommu_inv_pasid_info pasid_info;
 		struct iommu_inv_addr_info addr_info;
-	};
+	} granu;
 };
 
 /**
@@ -329,7 +329,7 @@ struct iommu_gpasid_bind_data {
 	/* Vendor specific data */
 	union {
 		struct iommu_gpasid_bind_data_vtd vtd;
-	};
+	} vendor;
 };
 
 #endif /* _UAPI_IOMMU_H */
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 4/5] iommu/uapi: Handle data and argsz filled by users
  2020-06-23 17:03 [PATCH v3 0/5] IOMMU user API enhancement Jacob Pan
                   ` (2 preceding siblings ...)
  2020-06-23 17:03 ` [PATCH v3 3/5] iommu/uapi: Use named union for user data Jacob Pan
@ 2020-06-23 17:03 ` Jacob Pan
  2020-06-24  6:54   ` Lu Baolu
  2020-06-23 17:03 ` [PATCH v3 5/5] iommu/uapi: Support both kernel and user unbind guest PASID Jacob Pan
  4 siblings, 1 reply; 19+ messages in thread
From: Jacob Pan @ 2020-06-23 17:03 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Christoph Hellwig, David Woodhouse

IOMMU UAPI data has a user filled argsz field which indicates the data
length comes with the API call. User data is not trusted, argsz must be
validated based on the current kernel data size, mandatory data size,
and feature flags.

User data may also be extended, results in possible argsz increase.
Backward compatibility is ensured based on size and flags checking.
Details are documented in Documentation/userspace-api/iommu.rst

This patch adds sanity checks in both IOMMU layer and vendor code, where
VT-d is the only user for now.

Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel/svm.c |  3 ++
 drivers/iommu/iommu.c     | 96 ++++++++++++++++++++++++++++++++++++++++++++---
 include/linux/iommu.h     |  7 ++--
 3 files changed, 98 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 713b3a218483..237db56878c0 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -244,6 +244,9 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 	    data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
 		return -EINVAL;
 
+	if (data->argsz != offsetofend(struct iommu_gpasid_bind_data, vendor.vtd))
+		return -EINVAL;
+
 	if (!dev_is_pci(dev))
 		return -ENOTSUPP;
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d43120eb1dc5..4a025c429b41 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1951,22 +1951,108 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
 EXPORT_SYMBOL_GPL(iommu_attach_device);
 
 int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
-			   struct iommu_cache_invalidate_info *inv_info)
+			void __user *uinfo)
 {
+	struct iommu_cache_invalidate_info inv_info;
+	unsigned long minsz, maxsz;
+
 	if (unlikely(!domain->ops->cache_invalidate))
 		return -ENODEV;
 
-	return domain->ops->cache_invalidate(domain, dev, inv_info);
+	/* Current kernel data size is the max to be copied from user */
+	maxsz = sizeof(struct iommu_cache_invalidate_info);
+	memset((void *)&inv_info, 0, maxsz);
+
+	/*
+	 * No new spaces can be added before the variable sized union, the
+	 * minimum size is the offset to the union.
+	 */
+	minsz = offsetof(struct iommu_cache_invalidate_info, granu);
+
+	/* Copy minsz from user to get flags and argsz */
+	if (copy_from_user(&inv_info, uinfo, minsz))
+		return -EFAULT;
+
+	/* Fields before variable size union is mandatory */
+	if (inv_info.argsz < minsz)
+		return -EINVAL;
+	/*
+	 * User might be using a newer UAPI header, we shall let IOMMU vendor
+	 * driver decide on what size it needs. Since the UAPI data extension
+	 * can be vendor specific, larger argsz could be the result of extension
+	 * for one vendor but it should not affect another vendor.
+	 */
+	/*
+	 * User might be using a newer UAPI header which has a larger data
+	 * size, we shall support the existing flags within the current
+	 * size.
+	 */
+	if (inv_info.argsz > maxsz)
+		inv_info.argsz = maxsz;
+
+	/* Checking the exact argsz based on generic flags */
+	if (inv_info.granularity == IOMMU_INV_GRANU_ADDR &&
+		inv_info.argsz != offsetofend(struct iommu_cache_invalidate_info,
+					granu.addr_info))
+		return -EINVAL;
+
+	if (inv_info.granularity == IOMMU_INV_GRANU_PASID &&
+		inv_info.argsz != offsetofend(struct iommu_cache_invalidate_info,
+					granu.pasid_info))
+		return -EINVAL;
+
+	/* Copy the remaining user data _after_ minsz */
+	if (copy_from_user((void *)&inv_info + minsz, uinfo + minsz,
+				inv_info.argsz - minsz))
+		return -EFAULT;
+
+	return domain->ops->cache_invalidate(domain, dev, &inv_info);
 }
 EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
 
-int iommu_sva_bind_gpasid(struct iommu_domain *domain,
-			   struct device *dev, struct iommu_gpasid_bind_data *data)
+int iommu_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev,
+						void __user *udata)
 {
+
+	struct iommu_gpasid_bind_data data;
+	unsigned long minsz, maxsz;
+
 	if (unlikely(!domain->ops->sva_bind_gpasid))
 		return -ENODEV;
 
-	return domain->ops->sva_bind_gpasid(domain, dev, data);
+	/* Current kernel data size is the max to be copied from user */
+	maxsz = sizeof(struct iommu_gpasid_bind_data);
+	memset((void *)&data, 0, maxsz);
+
+	/*
+	 * No new spaces can be added before the variable sized union, the
+	 * minimum size is the offset to the union.
+	 */
+	minsz = offsetof(struct iommu_gpasid_bind_data, vendor);
+
+	/* Copy minsz from user to get flags and argsz */
+	if (copy_from_user(&data, udata, minsz))
+		return -EFAULT;
+
+	/* Fields before variable size union is mandatory */
+	if (data.argsz < minsz)
+		return -EINVAL;
+	/*
+	 * User might be using a newer UAPI header, we shall let IOMMU vendor
+	 * driver decide on what size it needs. Since the guest PASID bind data
+	 * can be vendor specific, larger argsz could be the result of extension
+	 * for one vendor but it should not affect another vendor.
+	 */
+	if (data.argsz > maxsz)
+		data.argsz = maxsz;
+
+	/* Copy the remaining user data _after_ minsz */
+	if (copy_from_user((void *)&data + minsz, udata + minsz,
+				data.argsz - minsz))
+		return -EFAULT;
+
+
+	return domain->ops->sva_bind_gpasid(domain, dev, &data);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5f0b7859d2eb..a688fea42ae5 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -432,9 +432,10 @@ extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
 extern int iommu_cache_invalidate(struct iommu_domain *domain,
 				  struct device *dev,
-				  struct iommu_cache_invalidate_info *inv_info);
+				  void __user *uinfo);
+
 extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
-		struct device *dev, struct iommu_gpasid_bind_data *data);
+				struct device *dev, void __user *udata);
 extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
 				struct device *dev, ioasid_t pasid);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
@@ -1062,7 +1063,7 @@ iommu_cache_invalidate(struct iommu_domain *domain,
 	return -ENODEV;
 }
 static inline int iommu_sva_bind_gpasid(struct iommu_domain *domain,
-				struct device *dev, struct iommu_gpasid_bind_data *data)
+				struct device *dev, void __user *udata)
 {
 	return -ENODEV;
 }
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 5/5] iommu/uapi: Support both kernel and user unbind guest PASID
  2020-06-23 17:03 [PATCH v3 0/5] IOMMU user API enhancement Jacob Pan
                   ` (3 preceding siblings ...)
  2020-06-23 17:03 ` [PATCH v3 4/5] iommu/uapi: Handle data and argsz filled by users Jacob Pan
@ 2020-06-23 17:03 ` Jacob Pan
  2020-06-24  7:55   ` Lu Baolu
  2020-06-25 12:59   ` Lu Baolu
  4 siblings, 2 replies; 19+ messages in thread
From: Jacob Pan @ 2020-06-23 17:03 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Christoph Hellwig, David Woodhouse

Guest SVA unbind data can come from either kernel and user space, if a
user pointer is passed in, IOMMU driver must copy from data from user.
If the unbind data is assembled in kernel, data can be trusted and
directly used. This patch creates a wrapper for unbind gpasid such that
user pointer can be parsed and sanitized before calling into the kernel
unbind function. Common user data copy code also consolidated.

Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/iommu.c | 70 ++++++++++++++++++++++++++++++++++++++-------------
 include/linux/iommu.h | 13 ++++++++--
 2 files changed, 64 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4a025c429b41..595527e4c6b7 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2010,19 +2010,15 @@ int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
 
-int iommu_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev,
-						void __user *udata)
-{
 
-	struct iommu_gpasid_bind_data data;
+static int iommu_sva_prepare_bind_data(void __user *udata, bool bind,
+				       struct iommu_gpasid_bind_data *data)
+{
 	unsigned long minsz, maxsz;
 
-	if (unlikely(!domain->ops->sva_bind_gpasid))
-		return -ENODEV;
-
 	/* Current kernel data size is the max to be copied from user */
 	maxsz = sizeof(struct iommu_gpasid_bind_data);
-	memset((void *)&data, 0, maxsz);
+	memset((void *)data, 0, maxsz);
 
 	/*
 	 * No new spaces can be added before the variable sized union, the
@@ -2031,11 +2027,11 @@ int iommu_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 	minsz = offsetof(struct iommu_gpasid_bind_data, vendor);
 
 	/* Copy minsz from user to get flags and argsz */
-	if (copy_from_user(&data, udata, minsz))
+	if (copy_from_user(data, udata, minsz))
 		return -EFAULT;
 
 	/* Fields before variable size union is mandatory */
-	if (data.argsz < minsz)
+	if (data->argsz < minsz)
 		return -EINVAL;
 	/*
 	 * User might be using a newer UAPI header, we shall let IOMMU vendor
@@ -2043,26 +2039,66 @@ int iommu_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 	 * can be vendor specific, larger argsz could be the result of extension
 	 * for one vendor but it should not affect another vendor.
 	 */
-	if (data.argsz > maxsz)
-		data.argsz = maxsz;
+	if (data->argsz > maxsz)
+		data->argsz = maxsz;
+
+	/*
+	 * For unbind, we don't need any extra data, host PASID is included in
+	 * the minsz and that is all we need.
+	 */
+	if (!bind)
+		return 0;
 
 	/* Copy the remaining user data _after_ minsz */
-	if (copy_from_user((void *)&data + minsz, udata + minsz,
-				data.argsz - minsz))
+	if (copy_from_user((void *)data + minsz, udata + minsz,
+				data->argsz - minsz))
 		return -EFAULT;
 
+	return 0;
+}
+
+int iommu_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev,
+						void __user *udata)
+{
+
+	struct iommu_gpasid_bind_data data;
+	int ret;
+
+	if (unlikely(!domain->ops->sva_bind_gpasid))
+		return -ENODEV;
+
+	ret = iommu_sva_prepare_bind_data(udata, true, &data);
+	if (ret)
+		return ret;
 
 	return domain->ops->sva_bind_gpasid(domain, dev, &data);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
 
-int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
-			     ioasid_t pasid)
+int __iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
+			struct iommu_gpasid_bind_data *data)
 {
 	if (unlikely(!domain->ops->sva_unbind_gpasid))
 		return -ENODEV;
 
-	return domain->ops->sva_unbind_gpasid(dev, pasid);
+	return domain->ops->sva_unbind_gpasid(dev, data->hpasid);
+}
+EXPORT_SYMBOL_GPL(__iommu_sva_unbind_gpasid);
+
+int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
+			void __user *udata)
+{
+	struct iommu_gpasid_bind_data data;
+	int ret;
+
+	if (unlikely(!domain->ops->sva_bind_gpasid))
+		return -ENODEV;
+
+	ret = iommu_sva_prepare_bind_data(udata, false, &data);
+	if (ret)
+		return ret;
+
+	return __iommu_sva_unbind_gpasid(domain, dev, &data);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a688fea42ae5..2567c33dc4e8 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -437,7 +437,9 @@ extern int iommu_cache_invalidate(struct iommu_domain *domain,
 extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
 				struct device *dev, void __user *udata);
 extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
-				struct device *dev, ioasid_t pasid);
+				struct device *dev, void __user *udata);
+extern int __iommu_sva_unbind_gpasid(struct iommu_domain *domain,
+				struct device *dev, struct iommu_gpasid_bind_data *data);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -1069,7 +1071,14 @@ static inline int iommu_sva_bind_gpasid(struct iommu_domain *domain,
 }
 
 static inline int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
-					   struct device *dev, int pasid)
+					   struct device *dev, void __user *udata)
+{
+	return -ENODEV;
+}
+
+static inline int __iommu_sva_unbind_gpasid(struct iommu_domain *domain,
+					struct device *dev,
+					struct iommu_gpasid_bind_data *data)
 {
 	return -ENODEV;
 }
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 3/5] iommu/uapi: Use named union for user data
  2020-06-23 17:03 ` [PATCH v3 3/5] iommu/uapi: Use named union for user data Jacob Pan
@ 2020-06-24  6:29   ` Lu Baolu
  2020-06-24 15:48     ` Jacob Pan
  0 siblings, 1 reply; 19+ messages in thread
From: Lu Baolu @ 2020-06-24  6:29 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Christoph Hellwig, David Woodhouse

Hi Jacob,

On 2020/6/24 1:03, Jacob Pan wrote:
> IOMMU UAPI data size is filled by the user space which must be validated
> by ther kernel. To ensure backward compatibility, user data can only be
> extended by either re-purpose padding bytes or extend the variable sized
> union at the end. No size change is allowed before the union. Therefore,
> the minimum size is the offset of the union.
> 
> To use offsetof() on the union, we must make it named.
> 
> Link: https://lkml.org/lkml/2020/6/11/834
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>   drivers/iommu/intel/iommu.c | 24 ++++++++++++------------
>   drivers/iommu/intel/svm.c   |  2 +-
>   include/uapi/linux/iommu.h  |  4 ++--
>   3 files changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 50fc62413a35..59cba214ef13 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -5409,8 +5409,8 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
>   
>   	/* Size is only valid in address selective invalidation */
>   	if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
> -		size = to_vtd_size(inv_info->addr_info.granule_size,
> -				   inv_info->addr_info.nb_granules);
> +		size = to_vtd_size(inv_info->granu.addr_info.granule_size,
> +				   inv_info->granu.addr_info.nb_granules);
>   
>   	for_each_set_bit(cache_type,
>   			 (unsigned long *)&inv_info->cache,
> @@ -5431,20 +5431,20 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
>   		 * granularity.
>   		 */
>   		if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
> -		    (inv_info->pasid_info.flags & IOMMU_INV_PASID_FLAGS_PASID))
> -			pasid = inv_info->pasid_info.pasid;
> +		    (inv_info->granu.pasid_info.flags & IOMMU_INV_PASID_FLAGS_PASID))
> +			pasid = inv_info->granu.pasid_info.pasid;
>   		else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
> -			 (inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_PASID))
> -			pasid = inv_info->addr_info.pasid;
> +			 (inv_info->granu.addr_info.flags & IOMMU_INV_ADDR_FLAGS_PASID))
> +			pasid = inv_info->granu.addr_info.pasid;
>   
>   		switch (BIT(cache_type)) {
>   		case IOMMU_CACHE_INV_TYPE_IOTLB:
>   			/* HW will ignore LSB bits based on address mask */
>   			if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
>   			    size &&
> -			    (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
> +				(inv_info->granu.addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {

Nit: Keep it aligned. With this tweaked,

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

Best regards,
baolu

>   				WARN_ONCE(1, "Address out of range, 0x%llx, size order %llu\n",
> -					  inv_info->addr_info.addr, size);
> +					  inv_info->granu.addr_info.addr, size);
>   			}
>   
>   			/*
> @@ -5452,9 +5452,9 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
>   			 * We use npages = -1 to indicate that.
>   			 */
>   			qi_flush_piotlb(iommu, did, pasid,
> -					mm_to_dma_pfn(inv_info->addr_info.addr),
> +					mm_to_dma_pfn(inv_info->granu.addr_info.addr),
>   					(granu == QI_GRAN_NONG_PASID) ? -1 : 1 << size,
> -					inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
> +					inv_info->granu.addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
>   
>   			if (!info->ats_enabled)
>   				break;
> @@ -5475,13 +5475,13 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
>   				size = 64 - VTD_PAGE_SHIFT;
>   				addr = 0;
>   			} else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
> -				addr = inv_info->addr_info.addr;
> +				addr = inv_info->granu.addr_info.addr;
>   
>   			if (info->ats_enabled)
>   				qi_flush_dev_iotlb_pasid(iommu, sid,
>   						info->pfsid, pasid,
>   						info->ats_qdep,
> -						inv_info->addr_info.addr,
> +						inv_info->granu.addr_info.addr,
>   						size);
>   			else
>   				pr_warn_ratelimited("Passdown device IOTLB flush w/o ATS!\n");
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index d386853121a2..713b3a218483 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -338,7 +338,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
>   	spin_lock(&iommu->lock);
>   	ret = intel_pasid_setup_nested(iommu, dev,
>   				       (pgd_t *)(uintptr_t)data->gpgd,
> -				       data->hpasid, &data->vtd, dmar_domain,
> +				       data->hpasid, &data->vendor.vtd, dmar_domain,
>   				       data->addr_width);
>   	spin_unlock(&iommu->lock);
>   	if (ret) {
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 303f148a5cd7..1afc6610b0ad 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -263,7 +263,7 @@ struct iommu_cache_invalidate_info {
>   	union {
>   		struct iommu_inv_pasid_info pasid_info;
>   		struct iommu_inv_addr_info addr_info;
> -	};
> +	} granu;
>   };
>   
>   /**
> @@ -329,7 +329,7 @@ struct iommu_gpasid_bind_data {
>   	/* Vendor specific data */
>   	union {
>   		struct iommu_gpasid_bind_data_vtd vtd;
> -	};
> +	} vendor;
>   };
>   
>   #endif /* _UAPI_IOMMU_H */
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/5] iommu/uapi: Handle data and argsz filled by users
  2020-06-23 17:03 ` [PATCH v3 4/5] iommu/uapi: Handle data and argsz filled by users Jacob Pan
@ 2020-06-24  6:54   ` Lu Baolu
  2020-06-24 17:07     ` Jacob Pan
  0 siblings, 1 reply; 19+ messages in thread
From: Lu Baolu @ 2020-06-24  6:54 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Christoph Hellwig, David Woodhouse

Hi Jacob,

On 2020/6/24 1:03, Jacob Pan wrote:
> IOMMU UAPI data has a user filled argsz field which indicates the data
> length comes with the API call. User data is not trusted, argsz must be
> validated based on the current kernel data size, mandatory data size,
> and feature flags.
> 
> User data may also be extended, results in possible argsz increase.
> Backward compatibility is ensured based on size and flags checking.
> Details are documented in Documentation/userspace-api/iommu.rst
> 
> This patch adds sanity checks in both IOMMU layer and vendor code, where
> VT-d is the only user for now.
> 
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>   drivers/iommu/intel/svm.c |  3 ++
>   drivers/iommu/iommu.c     | 96 ++++++++++++++++++++++++++++++++++++++++++++---
>   include/linux/iommu.h     |  7 ++--
>   3 files changed, 98 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 713b3a218483..237db56878c0 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -244,6 +244,9 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
>   	    data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
>   		return -EINVAL;
>   
> +	if (data->argsz != offsetofend(struct iommu_gpasid_bind_data, vendor.vtd))
> +		return -EINVAL;

Need to do size check in intel_iommu_sva_invalidate() as well?

> +
>   	if (!dev_is_pci(dev))
>   		return -ENOTSUPP;
>   
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index d43120eb1dc5..4a025c429b41 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1951,22 +1951,108 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
>   EXPORT_SYMBOL_GPL(iommu_attach_device);
>   
>   int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
> -			   struct iommu_cache_invalidate_info *inv_info)
> +			void __user *uinfo)

Nit: keep it aligned.

>   {
> +	struct iommu_cache_invalidate_info inv_info;
> +	unsigned long minsz, maxsz;
> +
>   	if (unlikely(!domain->ops->cache_invalidate))
>   		return -ENODEV;
>   
> -	return domain->ops->cache_invalidate(domain, dev, inv_info);
> +	/* Current kernel data size is the max to be copied from user */
> +	maxsz = sizeof(struct iommu_cache_invalidate_info);
> +	memset((void *)&inv_info, 0, maxsz);
> +
> +	/*
> +	 * No new spaces can be added before the variable sized union, the
> +	 * minimum size is the offset to the union.
> +	 */
> +	minsz = offsetof(struct iommu_cache_invalidate_info, granu);
> +
> +	/* Copy minsz from user to get flags and argsz */
> +	if (copy_from_user(&inv_info, uinfo, minsz))
> +		return -EFAULT;
> +
> +	/* Fields before variable size union is mandatory */
> +	if (inv_info.argsz < minsz)
> +		return -EINVAL;
> +	/*
> +	 * User might be using a newer UAPI header, we shall let IOMMU vendor
> +	 * driver decide on what size it needs. Since the UAPI data extension
> +	 * can be vendor specific, larger argsz could be the result of extension
> +	 * for one vendor but it should not affect another vendor.
> +	 */
> +	/*
> +	 * User might be using a newer UAPI header which has a larger data
> +	 * size, we shall support the existing flags within the current
> +	 * size.
> +	 */
> +	if (inv_info.argsz > maxsz)
> +		inv_info.argsz = maxsz;
> +
> +	/* Checking the exact argsz based on generic flags */
> +	if (inv_info.granularity == IOMMU_INV_GRANU_ADDR &&
> +		inv_info.argsz != offsetofend(struct iommu_cache_invalidate_info,
> +					granu.addr_info))
> +		return -EINVAL;
> +
> +	if (inv_info.granularity == IOMMU_INV_GRANU_PASID &&
> +		inv_info.argsz != offsetofend(struct iommu_cache_invalidate_info,
> +					granu.pasid_info))
> +		return -EINVAL;
> +
> +	/* Copy the remaining user data _after_ minsz */
> +	if (copy_from_user((void *)&inv_info + minsz, uinfo + minsz,
> +				inv_info.argsz - minsz))
> +		return -EFAULT;
> +
> +	return domain->ops->cache_invalidate(domain, dev, &inv_info);
>   }
>   EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
>   
> -int iommu_sva_bind_gpasid(struct iommu_domain *domain,
> -			   struct device *dev, struct iommu_gpasid_bind_data *data)
> +int iommu_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev,
> +						void __user *udata)
>   {
> +
> +	struct iommu_gpasid_bind_data data;
> +	unsigned long minsz, maxsz;
> +
>   	if (unlikely(!domain->ops->sva_bind_gpasid))
>   		return -ENODEV;
>   
> -	return domain->ops->sva_bind_gpasid(domain, dev, data);
> +	/* Current kernel data size is the max to be copied from user */
> +	maxsz = sizeof(struct iommu_gpasid_bind_data);
> +	memset((void *)&data, 0, maxsz);
> +
> +	/*
> +	 * No new spaces can be added before the variable sized union, the
> +	 * minimum size is the offset to the union.
> +	 */
> +	minsz = offsetof(struct iommu_gpasid_bind_data, vendor);
> +
> +	/* Copy minsz from user to get flags and argsz */
> +	if (copy_from_user(&data, udata, minsz))
> +		return -EFAULT;
> +
> +	/* Fields before variable size union is mandatory */
> +	if (data.argsz < minsz)
> +		return -EINVAL;
> +	/*
> +	 * User might be using a newer UAPI header, we shall let IOMMU vendor
> +	 * driver decide on what size it needs. Since the guest PASID bind data
> +	 * can be vendor specific, larger argsz could be the result of extension
> +	 * for one vendor but it should not affect another vendor.
> +	 */
> +	if (data.argsz > maxsz)
> +		data.argsz = maxsz;
> +
> +	/* Copy the remaining user data _after_ minsz */
> +	if (copy_from_user((void *)&data + minsz, udata + minsz,
> +				data.argsz - minsz))
> +		return -EFAULT;
> +
> +
> +	return domain->ops->sva_bind_gpasid(domain, dev, &data);
>   }
>   EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
>   
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 5f0b7859d2eb..a688fea42ae5 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -432,9 +432,10 @@ extern void iommu_detach_device(struct iommu_domain *domain,
>   				struct device *dev);
>   extern int iommu_cache_invalidate(struct iommu_domain *domain,
>   				  struct device *dev,
> -				  struct iommu_cache_invalidate_info *inv_info);
> +				  void __user *uinfo);
> +
>   extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
> -		struct device *dev, struct iommu_gpasid_bind_data *data);
> +				struct device *dev, void __user *udata);
>   extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
>   				struct device *dev, ioasid_t pasid);
>   extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
> @@ -1062,7 +1063,7 @@ iommu_cache_invalidate(struct iommu_domain *domain,
>   	return -ENODEV;
>   }
>   static inline int iommu_sva_bind_gpasid(struct iommu_domain *domain,
> -				struct device *dev, struct iommu_gpasid_bind_data *data)
> +				struct device *dev, void __user *udata)
>   {
>   	return -ENODEV;
>   }
> 

Best regards,
baolu
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 5/5] iommu/uapi: Support both kernel and user unbind guest PASID
  2020-06-23 17:03 ` [PATCH v3 5/5] iommu/uapi: Support both kernel and user unbind guest PASID Jacob Pan
@ 2020-06-24  7:55   ` Lu Baolu
  2020-06-25 12:59   ` Lu Baolu
  1 sibling, 0 replies; 19+ messages in thread
From: Lu Baolu @ 2020-06-24  7:55 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Christoph Hellwig, David Woodhouse

Hi Jacob,

On 2020/6/24 1:03, Jacob Pan wrote:
> Guest SVA unbind data can come from either kernel and user space, if a

either kernel or user space

> user pointer is passed in, IOMMU driver must copy from data from user.

copy data from user

> If the unbind data is assembled in kernel, data can be trusted and
> directly used. This patch creates a wrapper for unbind gpasid such that
> user pointer can be parsed and sanitized before calling into the kernel
> unbind function. Common user data copy code also consolidated.
> 
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>   drivers/iommu/iommu.c | 70 ++++++++++++++++++++++++++++++++++++++-------------
>   include/linux/iommu.h | 13 ++++++++--
>   2 files changed, 64 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 4a025c429b41..595527e4c6b7 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2010,19 +2010,15 @@ int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
>   }
>   EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
>   
> -int iommu_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev,
> -						void __user *udata)
> -{
>   
> -	struct iommu_gpasid_bind_data data;
> +static int iommu_sva_prepare_bind_data(void __user *udata, bool bind,
> +				       struct iommu_gpasid_bind_data *data)
> +{
>   	unsigned long minsz, maxsz;
>   
> -	if (unlikely(!domain->ops->sva_bind_gpasid))
> -		return -ENODEV;
> -
>   	/* Current kernel data size is the max to be copied from user */
>   	maxsz = sizeof(struct iommu_gpasid_bind_data);
> -	memset((void *)&data, 0, maxsz);
> +	memset((void *)data, 0, maxsz);
>   
>   	/*
>   	 * No new spaces can be added before the variable sized union, the
> @@ -2031,11 +2027,11 @@ int iommu_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev,
>   	minsz = offsetof(struct iommu_gpasid_bind_data, vendor);
>   
>   	/* Copy minsz from user to get flags and argsz */
> -	if (copy_from_user(&data, udata, minsz))
> +	if (copy_from_user(data, udata, minsz))
>   		return -EFAULT;
>   
>   	/* Fields before variable size union is mandatory */
> -	if (data.argsz < minsz)
> +	if (data->argsz < minsz)
>   		return -EINVAL;
>   	/*
>   	 * User might be using a newer UAPI header, we shall let IOMMU vendor
> @@ -2043,26 +2039,66 @@ int iommu_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev,
>   	 * can be vendor specific, larger argsz could be the result of extension
>   	 * for one vendor but it should not affect another vendor.
>   	 */
> -	if (data.argsz > maxsz)
> -		data.argsz = maxsz;
> +	if (data->argsz > maxsz)
> +		data->argsz = maxsz;
> +
> +	/*
> +	 * For unbind, we don't need any extra data, host PASID is included in
> +	 * the minsz and that is all we need.
> +	 */
> +	if (!bind)
> +		return 0;
>   
>   	/* Copy the remaining user data _after_ minsz */
> -	if (copy_from_user((void *)&data + minsz, udata + minsz,
> -				data.argsz - minsz))
> +	if (copy_from_user((void *)data + minsz, udata + minsz,
> +				data->argsz - minsz))
>   		return -EFAULT;
>   
> +	return 0;
> +}
> +
> +int iommu_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev,
> +						void __user *udata)
> +{
> +
> +	struct iommu_gpasid_bind_data data;
> +	int ret;
> +
> +	if (unlikely(!domain->ops->sva_bind_gpasid))
> +		return -ENODEV;
> +
> +	ret = iommu_sva_prepare_bind_data(udata, true, &data);
> +	if (ret)
> +		return ret;
>   
>   	return domain->ops->sva_bind_gpasid(domain, dev, &data);
>   }
>   EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
>   
> -int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
> -			     ioasid_t pasid)
> +int __iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
> +			struct iommu_gpasid_bind_data *data)
>   {
>   	if (unlikely(!domain->ops->sva_unbind_gpasid))
>   		return -ENODEV;
>   
> -	return domain->ops->sva_unbind_gpasid(dev, pasid);
> +	return domain->ops->sva_unbind_gpasid(dev, data->hpasid);
How about passing @data to the vendor iommu driver as well?

> +}
> +EXPORT_SYMBOL_GPL(__iommu_sva_unbind_gpasid);
> +
> +int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
> +			void __user *udata)

Keep it aligned.

> +{
> +	struct iommu_gpasid_bind_data data;
> +	int ret;
> +
> +	if (unlikely(!domain->ops->sva_bind_gpasid))
> +		return -ENODEV;
> +
> +	ret = iommu_sva_prepare_bind_data(udata, false, &data);
> +	if (ret)
> +		return ret;
> +
> +	return __iommu_sva_unbind_gpasid(domain, dev, &data);
>   }
>   EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
>   
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index a688fea42ae5..2567c33dc4e8 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -437,7 +437,9 @@ extern int iommu_cache_invalidate(struct iommu_domain *domain,
>   extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
>   				struct device *dev, void __user *udata);
>   extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
> -				struct device *dev, ioasid_t pasid);
> +				struct device *dev, void __user *udata);
> +extern int __iommu_sva_unbind_gpasid(struct iommu_domain *domain,
> +				struct device *dev, struct iommu_gpasid_bind_data *data);
>   extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>   extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>   extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> @@ -1069,7 +1071,14 @@ static inline int iommu_sva_bind_gpasid(struct iommu_domain *domain,
>   }
>   
>   static inline int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
> -					   struct device *dev, int pasid)
> +					   struct device *dev, void __user *udata)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int __iommu_sva_unbind_gpasid(struct iommu_domain *domain,
> +					struct device *dev,
> +					struct iommu_gpasid_bind_data *data)
>   {
>   	return -ENODEV;
>   }
> 

Best regards,
baolu
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 3/5] iommu/uapi: Use named union for user data
  2020-06-24  6:29   ` Lu Baolu
@ 2020-06-24 15:48     ` Jacob Pan
  0 siblings, 0 replies; 19+ messages in thread
From: Jacob Pan @ 2020-06-24 15:48 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Alex Williamson, LKML, Christoph Hellwig, iommu, David Woodhouse

On Wed, 24 Jun 2020 14:29:57 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> >   		case IOMMU_CACHE_INV_TYPE_IOTLB:
> >   			/* HW will ignore LSB bits based on
> > address mask */ if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
> >   			    size &&
> > -			    (inv_info->addr_info.addr &
> > ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
> > +				(inv_info->granu.addr_info.addr &
> > ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {  
> 
> Nit: Keep it aligned. With this tweaked,
> 
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
will do. Thanks.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/5] iommu/uapi: Handle data and argsz filled by users
  2020-06-24  6:54   ` Lu Baolu
@ 2020-06-24 17:07     ` Jacob Pan
  2020-06-25  7:07       ` Lu Baolu
  0 siblings, 1 reply; 19+ messages in thread
From: Jacob Pan @ 2020-06-24 17:07 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Alex Williamson, LKML, Christoph Hellwig, iommu, David Woodhouse

On Wed, 24 Jun 2020 14:54:49 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Hi Jacob,
> 
> On 2020/6/24 1:03, Jacob Pan wrote:
> > IOMMU UAPI data has a user filled argsz field which indicates the
> > data length comes with the API call. User data is not trusted,
> > argsz must be validated based on the current kernel data size,
> > mandatory data size, and feature flags.
> > 
> > User data may also be extended, results in possible argsz increase.
> > Backward compatibility is ensured based on size and flags checking.
> > Details are documented in Documentation/userspace-api/iommu.rst
> > 
> > This patch adds sanity checks in both IOMMU layer and vendor code,
> > where VT-d is the only user for now.
> > 
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >   drivers/iommu/intel/svm.c |  3 ++
> >   drivers/iommu/iommu.c     | 96
> > ++++++++++++++++++++++++++++++++++++++++++++---
> > include/linux/iommu.h     |  7 ++-- 3 files changed, 98
> > insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> > index 713b3a218483..237db56878c0 100644
> > --- a/drivers/iommu/intel/svm.c
> > +++ b/drivers/iommu/intel/svm.c
> > @@ -244,6 +244,9 @@ int intel_svm_bind_gpasid(struct iommu_domain
> > *domain, struct device *dev, data->format !=
> > IOMMU_PASID_FORMAT_INTEL_VTD) return -EINVAL;
> >   
> > +	if (data->argsz != offsetofend(struct
> > iommu_gpasid_bind_data, vendor.vtd))
> > +		return -EINVAL;  
> 
> Need to do size check in intel_iommu_sva_invalidate() as well?
> 
No need. The difference is that there is no
vendor specific union for intel_iommu_sva_invalidate().

Generic flags are used to process invalidation data inside
intel_iommu_sva_invalidate().
> > +
> >   	if (!dev_is_pci(dev))
> >   		return -ENOTSUPP;
> >   
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index d43120eb1dc5..4a025c429b41 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1951,22 +1951,108 @@ int iommu_attach_device(struct
> > iommu_domain *domain, struct device *dev)
> > EXPORT_SYMBOL_GPL(iommu_attach_device); 
> >   int iommu_cache_invalidate(struct iommu_domain *domain, struct
> > device *dev,
> > -			   struct iommu_cache_invalidate_info
> > *inv_info)
> > +			void __user *uinfo)  
> 
> Nit: keep it aligned.
> 
> >   {
> > +	struct iommu_cache_invalidate_info inv_info;
> > +	unsigned long minsz, maxsz;
> > +
> >   	if (unlikely(!domain->ops->cache_invalidate))
> >   		return -ENODEV;
> >   
> > -	return domain->ops->cache_invalidate(domain, dev,
> > inv_info);
> > +	/* Current kernel data size is the max to be copied from
> > user */
> > +	maxsz = sizeof(struct iommu_cache_invalidate_info);
> > +	memset((void *)&inv_info, 0, maxsz);
> > +
> > +	/*
> > +	 * No new spaces can be added before the variable sized
> > union, the
> > +	 * minimum size is the offset to the union.
> > +	 */
> > +	minsz = offsetof(struct iommu_cache_invalidate_info,
> > granu); +
> > +	/* Copy minsz from user to get flags and argsz */
> > +	if (copy_from_user(&inv_info, uinfo, minsz))
> > +		return -EFAULT;
> > +
> > +	/* Fields before variable size union is mandatory */
> > +	if (inv_info.argsz < minsz)
> > +		return -EINVAL;
> > +	/*
> > +	 * User might be using a newer UAPI header, we shall let
> > IOMMU vendor
> > +	 * driver decide on what size it needs. Since the UAPI
> > data extension
> > +	 * can be vendor specific, larger argsz could be the
> > result of extension
> > +	 * for one vendor but it should not affect another vendor.
> > +	 */
> > +	/*
> > +	 * User might be using a newer UAPI header which has a
> > larger data
> > +	 * size, we shall support the existing flags within the
> > current
> > +	 * size.
> > +	 */
> > +	if (inv_info.argsz > maxsz)
> > +		inv_info.argsz = maxsz;
> > +
> > +	/* Checking the exact argsz based on generic flags */
> > +	if (inv_info.granularity == IOMMU_INV_GRANU_ADDR &&
> > +		inv_info.argsz != offsetofend(struct
> > iommu_cache_invalidate_info,
> > +					granu.addr_info))
> > +		return -EINVAL;
> > +
> > +	if (inv_info.granularity == IOMMU_INV_GRANU_PASID &&
> > +		inv_info.argsz != offsetofend(struct
> > iommu_cache_invalidate_info,
> > +					granu.pasid_info))
> > +		return -EINVAL;
> > +
> > +	/* Copy the remaining user data _after_ minsz */
> > +	if (copy_from_user((void *)&inv_info + minsz, uinfo +
> > minsz,
> > +				inv_info.argsz - minsz))
> > +		return -EFAULT;
> > +
> > +	return domain->ops->cache_invalidate(domain, dev,
> > &inv_info); }
> >   EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
> >   
> > -int iommu_sva_bind_gpasid(struct iommu_domain *domain,
> > -			   struct device *dev, struct
> > iommu_gpasid_bind_data *data) +int iommu_sva_bind_gpasid(struct
> > iommu_domain *domain, struct device *dev,
> > +						void __user *udata)
> >   {
> > +
> > +	struct iommu_gpasid_bind_data data;
> > +	unsigned long minsz, maxsz;
> > +
> >   	if (unlikely(!domain->ops->sva_bind_gpasid))
> >   		return -ENODEV;
> >   
> > -	return domain->ops->sva_bind_gpasid(domain, dev, data);
> > +	/* Current kernel data size is the max to be copied from
> > user */
> > +	maxsz = sizeof(struct iommu_gpasid_bind_data);
> > +	memset((void *)&data, 0, maxsz);
> > +
> > +	/*
> > +	 * No new spaces can be added before the variable sized
> > union, the
> > +	 * minimum size is the offset to the union.
> > +	 */
> > +	minsz = offsetof(struct iommu_gpasid_bind_data, vendor);
> > +
> > +	/* Copy minsz from user to get flags and argsz */
> > +	if (copy_from_user(&data, udata, minsz))
> > +		return -EFAULT;
> > +
> > +	/* Fields before variable size union is mandatory */
> > +	if (data.argsz < minsz)
> > +		return -EINVAL;
> > +	/*
> > +	 * User might be using a newer UAPI header, we shall let
> > IOMMU vendor
> > +	 * driver decide on what size it needs. Since the guest
> > PASID bind data
> > +	 * can be vendor specific, larger argsz could be the
> > result of extension
> > +	 * for one vendor but it should not affect another vendor.
> > +	 */
> > +	if (data.argsz > maxsz)
> > +		data.argsz = maxsz;
> > +
> > +	/* Copy the remaining user data _after_ minsz */
> > +	if (copy_from_user((void *)&data + minsz, udata + minsz,
> > +				data.argsz - minsz))
> > +		return -EFAULT;
> > +
> > +
> > +	return domain->ops->sva_bind_gpasid(domain, dev, &data);
> >   }
> >   EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
> >   
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 5f0b7859d2eb..a688fea42ae5 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -432,9 +432,10 @@ extern void iommu_detach_device(struct
> > iommu_domain *domain, struct device *dev);
> >   extern int iommu_cache_invalidate(struct iommu_domain *domain,
> >   				  struct device *dev,
> > -				  struct
> > iommu_cache_invalidate_info *inv_info);
> > +				  void __user *uinfo);
> > +
> >   extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
> > -		struct device *dev, struct iommu_gpasid_bind_data
> > *data);
> > +				struct device *dev, void __user
> > *udata); extern int iommu_sva_unbind_gpasid(struct iommu_domain
> > *domain, struct device *dev, ioasid_t pasid);
> >   extern struct iommu_domain *iommu_get_domain_for_dev(struct
> > device *dev); @@ -1062,7 +1063,7 @@ iommu_cache_invalidate(struct
> > iommu_domain *domain, return -ENODEV;
> >   }
> >   static inline int iommu_sva_bind_gpasid(struct iommu_domain
> > *domain,
> > -				struct device *dev, struct
> > iommu_gpasid_bind_data *data)
> > +				struct device *dev, void __user
> > *udata) {
> >   	return -ENODEV;
> >   }
> >   
> 
> Best regards,
> baolu

[Jacob Pan]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/5] iommu/uapi: Handle data and argsz filled by users
  2020-06-24 17:07     ` Jacob Pan
@ 2020-06-25  7:07       ` Lu Baolu
  0 siblings, 0 replies; 19+ messages in thread
From: Lu Baolu @ 2020-06-25  7:07 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	iommu, LKML, Christoph Hellwig, Alex Williamson, David Woodhouse

On 2020/6/25 1:07, Jacob Pan wrote:
> On Wed, 24 Jun 2020 14:54:49 +0800
> Lu Baolu<baolu.lu@linux.intel.com>  wrote:
> 
>> Hi Jacob,
>>
>> On 2020/6/24 1:03, Jacob Pan wrote:
>>> IOMMU UAPI data has a user filled argsz field which indicates the
>>> data length comes with the API call. User data is not trusted,
>>> argsz must be validated based on the current kernel data size,
>>> mandatory data size, and feature flags.
>>>
>>> User data may also be extended, results in possible argsz increase.
>>> Backward compatibility is ensured based on size and flags checking.
>>> Details are documented in Documentation/userspace-api/iommu.rst
>>>
>>> This patch adds sanity checks in both IOMMU layer and vendor code,
>>> where VT-d is the only user for now.
>>>
>>> Signed-off-by: Liu Yi L<yi.l.liu@intel.com>
>>> Signed-off-by: Jacob Pan<jacob.jun.pan@linux.intel.com>
>>> ---
>>>    drivers/iommu/intel/svm.c |  3 ++
>>>    drivers/iommu/iommu.c     | 96
>>> ++++++++++++++++++++++++++++++++++++++++++++---
>>> include/linux/iommu.h     |  7 ++-- 3 files changed, 98
>>> insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
>>> index 713b3a218483..237db56878c0 100644
>>> --- a/drivers/iommu/intel/svm.c
>>> +++ b/drivers/iommu/intel/svm.c
>>> @@ -244,6 +244,9 @@ int intel_svm_bind_gpasid(struct iommu_domain
>>> *domain, struct device *dev, data->format !=
>>> IOMMU_PASID_FORMAT_INTEL_VTD) return -EINVAL;
>>>    
>>> +	if (data->argsz != offsetofend(struct
>>> iommu_gpasid_bind_data, vendor.vtd))
>>> +		return -EINVAL;
>> Need to do size check in intel_iommu_sva_invalidate() as well?
>>
> No need. The difference is that there is no
> vendor specific union for intel_iommu_sva_invalidate().
> 
> Generic flags are used to process invalidation data inside
> intel_iommu_sva_invalidate().

Thanks for the explanation. With the nit tweaked,

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

Best regards,
baolu
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 5/5] iommu/uapi: Support both kernel and user unbind guest PASID
  2020-06-23 17:03 ` [PATCH v3 5/5] iommu/uapi: Support both kernel and user unbind guest PASID Jacob Pan
  2020-06-24  7:55   ` Lu Baolu
@ 2020-06-25 12:59   ` Lu Baolu
  1 sibling, 0 replies; 19+ messages in thread
From: Lu Baolu @ 2020-06-25 12:59 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Christoph Hellwig, David Woodhouse

Hi Jacob,

On 2020/6/24 1:03, Jacob Pan wrote:
> +int __iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
> +			struct iommu_gpasid_bind_data *data)
>   {
>   	if (unlikely(!domain->ops->sva_unbind_gpasid))
>   		return -ENODEV;
>   
> -	return domain->ops->sva_unbind_gpasid(dev, pasid);
> +	return domain->ops->sva_unbind_gpasid(dev, data->hpasid);
> +}
> +EXPORT_SYMBOL_GPL(__iommu_sva_unbind_gpasid);

__iommu_sva_unbind_gpasid() looks more like an internal only helper. How
about something like iommu_sva_kunbind_gpasid()?

Best regards,
baolu
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/5] docs: IOMMU user API
  2020-06-23 17:03 ` [PATCH v3 1/5] docs: IOMMU user API Jacob Pan
@ 2020-06-26 22:19   ` Alex Williamson
  2020-06-29 23:05     ` Jacob Pan
  0 siblings, 1 reply; 19+ messages in thread
From: Alex Williamson @ 2020-06-26 22:19 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Tue, 23 Jun 2020 10:03:53 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> IOMMU UAPI is newly introduced to support communications between guest
> virtual IOMMU and host IOMMU. There has been lots of discussions on how
> it should work with VFIO UAPI and userspace in general.
> 
> This document is indended to clarify the UAPI design and usage. The
> mechenics of how future extensions should be achieved are also covered
> in this documentation.
> 
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>  Documentation/userspace-api/iommu.rst | 244 ++++++++++++++++++++++++++++++++++
>  1 file changed, 244 insertions(+)
>  create mode 100644 Documentation/userspace-api/iommu.rst
> 
> diff --git a/Documentation/userspace-api/iommu.rst b/Documentation/userspace-api/iommu.rst
> new file mode 100644
> index 000000000000..f9e4ed90a413
> --- /dev/null
> +++ b/Documentation/userspace-api/iommu.rst
> @@ -0,0 +1,244 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. iommu:
> +
> +=====================================
> +IOMMU Userspace API
> +=====================================
> +
> +IOMMU UAPI is used for virtualization cases where communications are
> +needed between physical and virtual IOMMU drivers. For native
> +usage, IOMMU is a system device which does not need to communicate
> +with user space directly.
> +
> +The primary use cases are guest Shared Virtual Address (SVA) and
> +guest IO virtual address (IOVA), wherein a virtual IOMMU (vIOMMU) is
> +required to communicate with the physical IOMMU in the host.
> +
> +.. contents:: :local:
> +
> +Functionalities
> +===============
> +Communications of user and kernel involve both directions. The
> +supported user-kernel APIs are as follows:
> +
> +1. Alloc/Free PASID
> +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> +4. Invalidate IOMMU caches
> +5. Service page requests
> +
> +Requirements
> +============
> +The IOMMU UAPIs are generic and extensible to meet the following
> +requirements:
> +
> +1. Emulated and para-virtualised vIOMMUs
> +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> +3. Extensions to the UAPI shall not break existing user space
> +
> +Interfaces
> +==========
> +Although the data structures defined in IOMMU UAPI are self-contained,
> +there is no user API functions introduced. Instead, IOMMU UAPI is
> +designed to work with existing user driver frameworks such as VFIO.
> +
> +Extension Rules & Precautions
> +-----------------------------
> +When IOMMU UAPI gets extended, the data structures can *only* be
> +modified in two ways:
> +
> +1. Adding new fields by re-purposing the padding[] field. No size change.
> +2. Adding new union members at the end. May increase in size.
> +
> +No new fields can be added *after* the variable sized union in that it
> +will break backward compatibility when offset moves. In both cases, a
> +new flag must be accompanied with a new field such that the IOMMU
> +driver can process the data based on the new flag. Version field is
> +only reserved for the unlikely event of UAPI upgrade at its entirety.
> +
> +It's *always* the caller's responsibility to indicate the size of the
> +structure passed by setting argsz appropriately.
> +Though at the same time, argsz is user provided data which is not
> +trusted. The argsz field allows the user to indicate how much data
> +they're providing, it's still the kernel's responsibility to validate
> +whether it's correct and sufficient for the requested operation.
> +
> +Compatibility Checking
> +----------------------
> +When IOMMU UAPI extension results in size increase, user such as VFIO
> +has to handle the following cases:
> +
> +1. User and kernel has exact size match
> +2. An older user with older kernel header (smaller UAPI size) running on a
> +   newer kernel (larger UAPI size)
> +3. A newer user with newer kernel header (larger UAPI size) running
> +   on an older kernel.
> +4. A malicious/misbehaving user pass illegal/invalid size but within
> +   range. The data may contain garbage.

What exactly does vfio need to do to handle these?

> +
> +Feature Checking
> +----------------
> +While launching a guest with vIOMMU, it is important to ensure that host
> +can support the UAPI data structures to be used for vIOMMU-pIOMMU
> +communications. Without upfront compatibility checking, future faults
> +are difficult to report even in normal conditions. For example, TLB
> +invalidations should always succeed. There is no architectural way to
> +report back to the vIOMMU if the UAPI data is incompatible. If that
> +happens, in order to protect IOMMU iosolation guarantee, we have to
> +resort to not giving completion status in vIOMMU. This may result in
> +VM hang.
> +
> +For this reason the following IOMMU UAPIs cannot fail:
> +
> +1. Free PASID
> +2. Unbind guest PASID
> +3. Unbind guest PASID table (SMMU)
> +4. Cache invalidate
> +
> +User applications such as QEMU is expected to import kernel UAPI
> +headers. Backward compatibility is supported per feature flags.
> +For example, an older QEMU (with older kernel header) can run on newer
> +kernel. Newer QEMU (with new kernel header) may refuse to initialize
> +on an older kernel if new feature flags are not supported by older
> +kernel. Simply recompile existing code with newer kernel header should
> +not be an issue in that only existing flags are used.
> +
> +IOMMU vendor driver should report the below features to IOMMU UAPI
> +consumers (e.g. via VFIO).
> +
> +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> +
> +Take VFIO as example, upon request from VFIO user space (e.g. QEMU),
> +VFIO kernel code shall query IOMMU vendor driver for the support of
> +the above features. Query result can then be reported back to the
> +user-space caller. Details can be found in
> +Documentation/driver-api/vfio.rst.
> +
> +
> +Data Passing Example with VFIO
> +------------------------------
> +As the ubiquitous userspace driver framework, VFIO is already IOMMU
> +aware and share many key concepts such as device model, group, and
> +protection domain. Other user driver frameworks can also be extended
> +to support IOMMU UAPI but it is outside the scope of this document.
> +
> +In this tight-knit VFIO-IOMMU interface, the ultimate consumer of the
> +IOMMU UAPI data is the host IOMMU driver. VFIO facilitates user-kernel
> +transport, capability checking, security, and life cycle management of
> +process address space ID (PASID).
> +
> +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU driver is the
> +ultimate consumer of its UAPI data. At VFIO layer, the IOMMU UAPI data
> +is wrapped in a VFIO UAPI data. It follows the
> +pattern below::
> +
> +   struct {
> +	__u32 argsz;
> +	__u32 flags;
> +	__u8  data[];
> +   };
> +
> +Here data[] contains the IOMMU UAPI data structures. VFIO has the
> +freedom to bundle the data as well as parse data size based on its own flags.
> +
> +In order to determine the size and feature set of the user data, argsz
> +and flags are also embedded in the IOMMU UAPI data structures.
> +A "__u32 argsz" field is *always* at the beginning of each structure.
> +
> +For example:
> +::
> +
> +   struct iommu_cache_invalidate_info {
> +	__u32	argsz;
> +	#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> +	__u32	version;
> +	/* IOMMU paging structure cache */
> +	#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /* IOMMU IOTLB */
> +	#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /* Device IOTLB */
> +	#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /* PASID cache */
> +	#define IOMMU_CACHE_INV_TYPE_NR		(3)
> +	__u8	cache;
> +	__u8	granularity;
> +	__u8	padding[2];
> +	union {
> +		struct iommu_inv_pasid_info pasid_info;
> +		struct iommu_inv_addr_info addr_info;
> +	} granu;
> +   };
> +
> +VFIO is responsible for checking its own argsz and flags then invokes
> +appropriate IOMMU UAPI functions. User pointer is passed to IOMMU
> +layer for further processing. The responsibilities are divided as
> +follows:
> +
> +- Generic IOMMU layer checks argsz range and override out-of-range
> +  value. If the exact argsz is based on generic flags, they are checked
> +  here as well.
> +
> +- Vendor IOMMU driver checks argsz based on vendor flags, UAPI data
> +  is consumed based on flags
> +
> +Once again, use guest TLB invalidation as an example, argsz is based
> +on generic flags in the invalidation information. IOMMU generic code
> +shall process the UAPI data as the following:
> +
> +::
> +
> + int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
> +			void __user *uinfo)
> + {
> +	/* Current kernel data size is the max to be copied from user */
> +	maxsz = sizeof(struct iommu_cache_invalidate_info);
> +	memset((void *)&inv_info, 0, maxsz);
> +
> +	/*
> +	 * No new spaces can be added before the variable sized union, the
> +	 * minimum size is the offset to the union.
> +	 */
> +	minsz = offsetof(struct iommu_cache_invalidate_info, granu);
> +
> +	/* Copy minsz from user to get flags and argsz */
> +	if (copy_from_user(&inv_info, uinfo, minsz))
> +		return -EFAULT;
> +
> +	/* Fields before variable size union is mandatory */
> +	if (inv_info.argsz < minsz)
> +		return -EINVAL;
> +	/*
> +	 * User might be using a newer UAPI header which has a larger data
> +	 * size, we shall support the existing flags within the current
> +	 * size.
> +	 */
> +	if (inv_info.argsz > maxsz)
> +		inv_info.argsz = maxsz;
> +
> +	/* Checking the exact argsz based on generic flags */
> +	if (inv_info.granularity == IOMMU_INV_GRANU_ADDR &&
> +		inv_info.argsz != offsetofend(struct iommu_cache_invalidate_info,
> +					granu.addr_info))

Is it really reasonable to expect the user to specify argsz to the
exact union element for the callback?  I'd certainly expect users to
simply use sizeof(struct iommu_cache_invalidate_info) and it should
therefore be sufficient to test >= here rather than jump through hoops
with an exact size.  We're already changing inv_info.argsz above to fit
our known structure, it's inconsistent to then expect it to be some
exact value.
 
> +		return -EINVAL;
> +
> +	if (inv_info.granularity == IOMMU_INV_GRANU_PASID &&
> +		inv_info.argsz != offsetofend(struct
> iommu_cache_invalidate_info,
> +					granu.pasid_info))
> +		return -EINVAL;
> +
> +	/* Copy the remaining user data _after_ minsz */
> +	if (copy_from_user((void *)&inv_info + minsz, uinfo + minsz,
> +				inv_info.argsz - minsz))
> +		return -EFAULT;
> +
> +	return domain->ops->cache_invalidate(domain, dev, &inv_info);
> + }
> + Add a wrapper
> +   __iommu_unbind_( kernel data, same user data, kernel copy)
> +
> +Notice that in this example, since union size is determined by
> generic +flags, all checking to argsz is validated in the generic
> IOMMU layer, +vendor driver does not need to check argsz. However, if
> union size is +based on vendor data, such as iommu_sva_bind_gpasid(),
> it will be +vendor driver's responsibility to validate the exact
> argsz.

struct iommu_cache_invalidate_info is a good example because it
explicitly states a table of type vs granularity validity.  When the
cache_invalidate() callback is used by an internal user we can consider
it a bug in the caller if its usage falls outside of these prescribed
valid combinations, ie. iommu_ops callbacks may assume a trusted caller
that isn't trying to exploit any loophole.  But here we've done nothing
more than validated the supplied size to pass it through to a non-user
hardened callback.  We didn't check the version, we didn't check that
any of the undefined bits in cache or granularity or padding were set,
we don't know what flags might be set in the union elements.  For
example, if a user is able to set a flag that gets ignored now, that
means we can never use that flag without potentially breaking that user
in the future.  If a user can pass in version 3141592654 now, then we
can never use version for validation.  I see that
intel_iommu_sva_invalidate() does test the version, but has no obvious
other hardening.  I'm afraid we're being far to lax about accepting a
data structure provided by a user, we should not assume good faith.
Thanks,

Alex

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/5] docs: IOMMU user API
  2020-06-26 22:19   ` Alex Williamson
@ 2020-06-29 23:05     ` Jacob Pan
  2020-06-30  2:52       ` Tian, Kevin
  2020-07-07 21:40       ` Alex Williamson
  0 siblings, 2 replies; 19+ messages in thread
From: Jacob Pan @ 2020-06-29 23:05 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Fri, 26 Jun 2020 16:19:23 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Tue, 23 Jun 2020 10:03:53 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
> > IOMMU UAPI is newly introduced to support communications between
> > guest virtual IOMMU and host IOMMU. There has been lots of
> > discussions on how it should work with VFIO UAPI and userspace in
> > general.
> > 
> > This document is indended to clarify the UAPI design and usage. The
> > mechenics of how future extensions should be achieved are also
> > covered in this documentation.
> > 
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >  Documentation/userspace-api/iommu.rst | 244
> > ++++++++++++++++++++++++++++++++++ 1 file changed, 244 insertions(+)
> >  create mode 100644 Documentation/userspace-api/iommu.rst
> > 
> > diff --git a/Documentation/userspace-api/iommu.rst
> > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > index 000000000000..f9e4ed90a413
> > --- /dev/null
> > +++ b/Documentation/userspace-api/iommu.rst
> > @@ -0,0 +1,244 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +.. iommu:
> > +
> > +=====================================
> > +IOMMU Userspace API
> > +=====================================
> > +
> > +IOMMU UAPI is used for virtualization cases where communications
> > are +needed between physical and virtual IOMMU drivers. For native
> > +usage, IOMMU is a system device which does not need to communicate
> > +with user space directly.
> > +
> > +The primary use cases are guest Shared Virtual Address (SVA) and
> > +guest IO virtual address (IOVA), wherein a virtual IOMMU (vIOMMU)
> > is +required to communicate with the physical IOMMU in the host.
> > +
> > +.. contents:: :local:
> > +
> > +Functionalities
> > +===============
> > +Communications of user and kernel involve both directions. The
> > +supported user-kernel APIs are as follows:
> > +
> > +1. Alloc/Free PASID
> > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > +4. Invalidate IOMMU caches
> > +5. Service page requests
> > +
> > +Requirements
> > +============
> > +The IOMMU UAPIs are generic and extensible to meet the following
> > +requirements:
> > +
> > +1. Emulated and para-virtualised vIOMMUs
> > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > +3. Extensions to the UAPI shall not break existing user space
> > +
> > +Interfaces
> > +==========
> > +Although the data structures defined in IOMMU UAPI are
> > self-contained, +there is no user API functions introduced.
> > Instead, IOMMU UAPI is +designed to work with existing user driver
> > frameworks such as VFIO. +
> > +Extension Rules & Precautions
> > +-----------------------------
> > +When IOMMU UAPI gets extended, the data structures can *only* be
> > +modified in two ways:
> > +
> > +1. Adding new fields by re-purposing the padding[] field. No size
> > change. +2. Adding new union members at the end. May increase in
> > size. +
> > +No new fields can be added *after* the variable sized union in
> > that it +will break backward compatibility when offset moves. In
> > both cases, a +new flag must be accompanied with a new field such
> > that the IOMMU +driver can process the data based on the new flag.
> > Version field is +only reserved for the unlikely event of UAPI
> > upgrade at its entirety. +
> > +It's *always* the caller's responsibility to indicate the size of
> > the +structure passed by setting argsz appropriately.
> > +Though at the same time, argsz is user provided data which is not
> > +trusted. The argsz field allows the user to indicate how much data
> > +they're providing, it's still the kernel's responsibility to
> > validate +whether it's correct and sufficient for the requested
> > operation. +
> > +Compatibility Checking
> > +----------------------
> > +When IOMMU UAPI extension results in size increase, user such as
> > VFIO +has to handle the following cases:
> > +
> > +1. User and kernel has exact size match
> > +2. An older user with older kernel header (smaller UAPI size)
> > running on a
> > +   newer kernel (larger UAPI size)
> > +3. A newer user with newer kernel header (larger UAPI size) running
> > +   on an older kernel.
> > +4. A malicious/misbehaving user pass illegal/invalid size but
> > within
> > +   range. The data may contain garbage.  
> 
> What exactly does vfio need to do to handle these?
> 
VFIO does nothing other than returning the status from IOMMU driver.
Based on the return status, users such as QEMU can cause fault
conditions within the vIOMMU.

> > +
> > +Feature Checking
> > +----------------
> > +While launching a guest with vIOMMU, it is important to ensure
> > that host +can support the UAPI data structures to be used for
> > vIOMMU-pIOMMU +communications. Without upfront compatibility
> > checking, future faults +are difficult to report even in normal
> > conditions. For example, TLB +invalidations should always succeed.
> > There is no architectural way to +report back to the vIOMMU if the
> > UAPI data is incompatible. If that +happens, in order to protect
> > IOMMU iosolation guarantee, we have to +resort to not giving
> > completion status in vIOMMU. This may result in +VM hang.
> > +
> > +For this reason the following IOMMU UAPIs cannot fail:
> > +
> > +1. Free PASID
> > +2. Unbind guest PASID
> > +3. Unbind guest PASID table (SMMU)
> > +4. Cache invalidate
> > +
> > +User applications such as QEMU is expected to import kernel UAPI
> > +headers. Backward compatibility is supported per feature flags.
> > +For example, an older QEMU (with older kernel header) can run on
> > newer +kernel. Newer QEMU (with new kernel header) may refuse to
> > initialize +on an older kernel if new feature flags are not
> > supported by older +kernel. Simply recompile existing code with
> > newer kernel header should +not be an issue in that only existing
> > flags are used. +
> > +IOMMU vendor driver should report the below features to IOMMU UAPI
> > +consumers (e.g. via VFIO).
> > +
> > +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> > +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> > +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> > +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> > +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> > +
> > +Take VFIO as example, upon request from VFIO user space (e.g.
> > QEMU), +VFIO kernel code shall query IOMMU vendor driver for the
> > support of +the above features. Query result can then be reported
> > back to the +user-space caller. Details can be found in
> > +Documentation/driver-api/vfio.rst.
> > +
> > +
> > +Data Passing Example with VFIO
> > +------------------------------
> > +As the ubiquitous userspace driver framework, VFIO is already IOMMU
> > +aware and share many key concepts such as device model, group, and
> > +protection domain. Other user driver frameworks can also be
> > extended +to support IOMMU UAPI but it is outside the scope of this
> > document. +
> > +In this tight-knit VFIO-IOMMU interface, the ultimate consumer of
> > the +IOMMU UAPI data is the host IOMMU driver. VFIO facilitates
> > user-kernel +transport, capability checking, security, and life
> > cycle management of +process address space ID (PASID).
> > +
> > +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU driver
> > is the +ultimate consumer of its UAPI data. At VFIO layer, the
> > IOMMU UAPI data +is wrapped in a VFIO UAPI data. It follows the
> > +pattern below::
> > +
> > +   struct {
> > +	__u32 argsz;
> > +	__u32 flags;
> > +	__u8  data[];
> > +   };
> > +
> > +Here data[] contains the IOMMU UAPI data structures. VFIO has the
> > +freedom to bundle the data as well as parse data size based on its
> > own flags. +
> > +In order to determine the size and feature set of the user data,
> > argsz +and flags are also embedded in the IOMMU UAPI data
> > structures. +A "__u32 argsz" field is *always* at the beginning of
> > each structure. +
> > +For example:
> > +::
> > +
> > +   struct iommu_cache_invalidate_info {
> > +	__u32	argsz;
> > +	#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> > +	__u32	version;
> > +	/* IOMMU paging structure cache */
> > +	#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /*
> > IOMMU IOTLB */
> > +	#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /*
> > Device IOTLB */
> > +	#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /*
> > PASID cache */
> > +	#define IOMMU_CACHE_INV_TYPE_NR		(3)
> > +	__u8	cache;
> > +	__u8	granularity;
> > +	__u8	padding[2];
> > +	union {
> > +		struct iommu_inv_pasid_info pasid_info;
> > +		struct iommu_inv_addr_info addr_info;
> > +	} granu;
> > +   };
> > +
> > +VFIO is responsible for checking its own argsz and flags then
> > invokes +appropriate IOMMU UAPI functions. User pointer is passed
> > to IOMMU +layer for further processing. The responsibilities are
> > divided as +follows:
> > +
> > +- Generic IOMMU layer checks argsz range and override out-of-range
> > +  value. If the exact argsz is based on generic flags, they are
> > checked
> > +  here as well.
> > +
> > +- Vendor IOMMU driver checks argsz based on vendor flags, UAPI data
> > +  is consumed based on flags
> > +
> > +Once again, use guest TLB invalidation as an example, argsz is
> > based +on generic flags in the invalidation information. IOMMU
> > generic code +shall process the UAPI data as the following:
> > +
> > +::
> > +
> > + int iommu_cache_invalidate(struct iommu_domain *domain, struct
> > device *dev,
> > +			void __user *uinfo)
> > + {
> > +	/* Current kernel data size is the max to be copied from
> > user */
> > +	maxsz = sizeof(struct iommu_cache_invalidate_info);
> > +	memset((void *)&inv_info, 0, maxsz);
> > +
> > +	/*
> > +	 * No new spaces can be added before the variable sized
> > union, the
> > +	 * minimum size is the offset to the union.
> > +	 */
> > +	minsz = offsetof(struct iommu_cache_invalidate_info,
> > granu); +
> > +	/* Copy minsz from user to get flags and argsz */
> > +	if (copy_from_user(&inv_info, uinfo, minsz))
> > +		return -EFAULT;
> > +
> > +	/* Fields before variable size union is mandatory */
> > +	if (inv_info.argsz < minsz)
> > +		return -EINVAL;
> > +	/*
> > +	 * User might be using a newer UAPI header which has a
> > larger data
> > +	 * size, we shall support the existing flags within the
> > current
> > +	 * size.
> > +	 */
> > +	if (inv_info.argsz > maxsz)
> > +		inv_info.argsz = maxsz;
> > +
> > +	/* Checking the exact argsz based on generic flags */
> > +	if (inv_info.granularity == IOMMU_INV_GRANU_ADDR &&
> > +		inv_info.argsz != offsetofend(struct
> > iommu_cache_invalidate_info,
> > +					granu.addr_info))  
> 
> Is it really reasonable to expect the user to specify argsz to the
> exact union element for the callback?  I'd certainly expect users to
> simply use sizeof(struct iommu_cache_invalidate_info) and it should
> therefore be sufficient to test >= here rather than jump through hoops
> with an exact size.  We're already changing inv_info.argsz above to
> fit our known structure, it's inconsistent to then expect it to be
> some exact value.
>  
I was thinking argsz doesn't have to be the exact struct size. It should
be whatever the sufficient & correct size used by the user for a given
call.

For example, current struct iommu_gpasid_bind_data {} only has VT-d
data. If it gets extended with SMMU data in the union, VT-d vIOMMU
emulation should only fill the union size of vt-d.

> > +		return -EINVAL;
> > +
> > +	if (inv_info.granularity == IOMMU_INV_GRANU_PASID &&
> > +		inv_info.argsz != offsetofend(struct
> > iommu_cache_invalidate_info,
> > +					granu.pasid_info))
> > +		return -EINVAL;
> > +
> > +	/* Copy the remaining user data _after_ minsz */
> > +	if (copy_from_user((void *)&inv_info + minsz, uinfo +
> > minsz,
> > +				inv_info.argsz - minsz))
> > +		return -EFAULT;
> > +
> > +	return domain->ops->cache_invalidate(domain, dev,
> > &inv_info);
> > + }
> > + Add a wrapper
> > +   __iommu_unbind_( kernel data, same user data, kernel copy)
> > +
This should be removed. Sorry about the confusion. The patch does not
have two data pointers, just separate APIs for kernel and user.

> > +Notice that in this example, since union size is determined by
> > generic +flags, all checking to argsz is validated in the generic
> > IOMMU layer, +vendor driver does not need to check argsz. However,
> > if union size is +based on vendor data, such as
> > iommu_sva_bind_gpasid(), it will be +vendor driver's responsibility
> > to validate the exact argsz.  
> 
> struct iommu_cache_invalidate_info is a good example because it
> explicitly states a table of type vs granularity validity.  When the
> cache_invalidate() callback is used by an internal user we can
> consider it a bug in the caller if its usage falls outside of these
> prescribed valid combinations, ie. iommu_ops callbacks may assume a
> trusted caller that isn't trying to exploit any loophole.
Separate APIs are proposed in the patchset to address UAPIs
with both kernel and user callers. Sorry about the last line in the
example above. Currently, only unbind_gpasid() and page_response() have
both kernel and userspace callers. e.g.

   /* userspace caller */
   int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
			void __user *udata)

   /* in-kernel caller */
   int __iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
                                 struct iommu_gpasid_bind_data *data)

We don;t expect in-kernel caller for cache invalidate in that it is
implied in unmap, unbind operations.

>  But here
> we've done nothing more than validated the supplied size to pass it
> through to a non-user hardened callback.  We didn't check the
> version,
Yes, I should move up the version check from vendor driver.

> we didn't check that any of the undefined bits in cache or
> granularity or padding were set, we don't know what flags might be
> set in the union elements.
You are right, we should sanitize reserved bits.

> For example, if a user is able to set a
> flag that gets ignored now, that means we can never use that flag
> without potentially breaking that user in the future.
Good point, all reserved/unused bits should be tested.

>  If a user can
> pass in version 3141592654 now, then we can never use version for
> validation.  I see that intel_iommu_sva_invalidate() does test the
> version, but has no obvious other hardening.  I'm afraid we're being
> far to lax about accepting a data structure provided by a user, we
> should not assume good faith. Thanks,
> 
Agreed. will add checks in the IOMMU generic layer for reserved
bits.
For VT-d vendor driver, we do check all bits in cache types, i.e. in
intel/iommu.c
	for_each_set_bit(cache_type,
			 (unsigned long *)&inv_info->cache,
			 IOMMU_CACHE_INV_TYPE_NR) {


one other hardening is to check vendor argsz. This is in the
bind_gpasid call.

	if (data->argsz != offsetofend(struct iommu_gpasid_bind_data, vendor.vtd))
		return -EINVAL;



> Alex
> 

[Jacob Pan]



_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v3 1/5] docs: IOMMU user API
  2020-06-29 23:05     ` Jacob Pan
@ 2020-06-30  2:52       ` Tian, Kevin
  2020-06-30 17:39         ` Jacob Pan
  2020-07-07 21:40       ` Alex Williamson
  1 sibling, 1 reply; 19+ messages in thread
From: Tian, Kevin @ 2020-06-30  2:52 UTC (permalink / raw)
  To: Jacob Pan, Alex Williamson
  Cc: Raj, Ashok, Jonathan Corbet, David Woodhouse, LKML,
	Christoph Hellwig, iommu, Jean-Philippe Brucker

> From: Jacob Pan
> Sent: Tuesday, June 30, 2020 7:05 AM
> 
> On Fri, 26 Jun 2020 16:19:23 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > On Tue, 23 Jun 2020 10:03:53 -0700
> > Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> >
> > > IOMMU UAPI is newly introduced to support communications between
> > > guest virtual IOMMU and host IOMMU. There has been lots of
> > > discussions on how it should work with VFIO UAPI and userspace in
> > > general.
> > >
> > > This document is indended to clarify the UAPI design and usage. The
> > > mechenics of how future extensions should be achieved are also
> > > covered in this documentation.
> > >
> > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > ---
> > >  Documentation/userspace-api/iommu.rst | 244
> > > ++++++++++++++++++++++++++++++++++ 1 file changed, 244 insertions(+)
> > >  create mode 100644 Documentation/userspace-api/iommu.rst
> > >
> > > diff --git a/Documentation/userspace-api/iommu.rst
> > > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > > index 000000000000..f9e4ed90a413
> > > --- /dev/null
> > > +++ b/Documentation/userspace-api/iommu.rst
> > > @@ -0,0 +1,244 @@
> > > +.. SPDX-License-Identifier: GPL-2.0
> > > +.. iommu:
> > > +
> > > +=====================================
> > > +IOMMU Userspace API
> > > +=====================================
> > > +
> > > +IOMMU UAPI is used for virtualization cases where communications
> > > are +needed between physical and virtual IOMMU drivers. For native
> > > +usage, IOMMU is a system device which does not need to communicate
> > > +with user space directly.
> > > +
> > > +The primary use cases are guest Shared Virtual Address (SVA) and
> > > +guest IO virtual address (IOVA), wherein a virtual IOMMU (vIOMMU)
> > > is +required to communicate with the physical IOMMU in the host.
> > > +
> > > +.. contents:: :local:
> > > +
> > > +Functionalities
> > > +===============
> > > +Communications of user and kernel involve both directions. The
> > > +supported user-kernel APIs are as follows:
> > > +
> > > +1. Alloc/Free PASID
> > > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > > +4. Invalidate IOMMU caches
> > > +5. Service page requests
> > > +
> > > +Requirements
> > > +============
> > > +The IOMMU UAPIs are generic and extensible to meet the following
> > > +requirements:
> > > +
> > > +1. Emulated and para-virtualised vIOMMUs
> > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > > +3. Extensions to the UAPI shall not break existing user space
> > > +
> > > +Interfaces
> > > +==========
> > > +Although the data structures defined in IOMMU UAPI are
> > > self-contained, +there is no user API functions introduced.
> > > Instead, IOMMU UAPI is +designed to work with existing user driver
> > > frameworks such as VFIO. +
> > > +Extension Rules & Precautions
> > > +-----------------------------
> > > +When IOMMU UAPI gets extended, the data structures can *only* be
> > > +modified in two ways:
> > > +
> > > +1. Adding new fields by re-purposing the padding[] field. No size
> > > change. +2. Adding new union members at the end. May increase in
> > > size. +
> > > +No new fields can be added *after* the variable sized union in
> > > that it +will break backward compatibility when offset moves. In
> > > both cases, a +new flag must be accompanied with a new field such
> > > that the IOMMU +driver can process the data based on the new flag.
> > > Version field is +only reserved for the unlikely event of UAPI
> > > upgrade at its entirety. +
> > > +It's *always* the caller's responsibility to indicate the size of
> > > the +structure passed by setting argsz appropriately.
> > > +Though at the same time, argsz is user provided data which is not
> > > +trusted. The argsz field allows the user to indicate how much data
> > > +they're providing, it's still the kernel's responsibility to
> > > validate +whether it's correct and sufficient for the requested
> > > operation. +
> > > +Compatibility Checking
> > > +----------------------
> > > +When IOMMU UAPI extension results in size increase, user such as
> > > VFIO +has to handle the following cases:
> > > +
> > > +1. User and kernel has exact size match
> > > +2. An older user with older kernel header (smaller UAPI size)
> > > running on a
> > > +   newer kernel (larger UAPI size)
> > > +3. A newer user with newer kernel header (larger UAPI size) running
> > > +   on an older kernel.
> > > +4. A malicious/misbehaving user pass illegal/invalid size but
> > > within
> > > +   range. The data may contain garbage.
> >
> > What exactly does vfio need to do to handle these?
> >
> VFIO does nothing other than returning the status from IOMMU driver.
> Based on the return status, users such as QEMU can cause fault
> conditions within the vIOMMU.

But from above description, "user such as VFIO has to handle the
following cases"...

Thanks
Kevin

> 
> > > +
> > > +Feature Checking
> > > +----------------
> > > +While launching a guest with vIOMMU, it is important to ensure
> > > that host +can support the UAPI data structures to be used for
> > > vIOMMU-pIOMMU +communications. Without upfront compatibility
> > > checking, future faults +are difficult to report even in normal
> > > conditions. For example, TLB +invalidations should always succeed.
> > > There is no architectural way to +report back to the vIOMMU if the
> > > UAPI data is incompatible. If that +happens, in order to protect
> > > IOMMU iosolation guarantee, we have to +resort to not giving
> > > completion status in vIOMMU. This may result in +VM hang.
> > > +
> > > +For this reason the following IOMMU UAPIs cannot fail:
> > > +
> > > +1. Free PASID
> > > +2. Unbind guest PASID
> > > +3. Unbind guest PASID table (SMMU)
> > > +4. Cache invalidate
> > > +
> > > +User applications such as QEMU is expected to import kernel UAPI
> > > +headers. Backward compatibility is supported per feature flags.
> > > +For example, an older QEMU (with older kernel header) can run on
> > > newer +kernel. Newer QEMU (with new kernel header) may refuse to
> > > initialize +on an older kernel if new feature flags are not
> > > supported by older +kernel. Simply recompile existing code with
> > > newer kernel header should +not be an issue in that only existing
> > > flags are used. +
> > > +IOMMU vendor driver should report the below features to IOMMU UAPI
> > > +consumers (e.g. via VFIO).
> > > +
> > > +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> > > +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> > > +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> > > +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> > > +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> > > +
> > > +Take VFIO as example, upon request from VFIO user space (e.g.
> > > QEMU), +VFIO kernel code shall query IOMMU vendor driver for the
> > > support of +the above features. Query result can then be reported
> > > back to the +user-space caller. Details can be found in
> > > +Documentation/driver-api/vfio.rst.
> > > +
> > > +
> > > +Data Passing Example with VFIO
> > > +------------------------------
> > > +As the ubiquitous userspace driver framework, VFIO is already IOMMU
> > > +aware and share many key concepts such as device model, group, and
> > > +protection domain. Other user driver frameworks can also be
> > > extended +to support IOMMU UAPI but it is outside the scope of this
> > > document. +
> > > +In this tight-knit VFIO-IOMMU interface, the ultimate consumer of
> > > the +IOMMU UAPI data is the host IOMMU driver. VFIO facilitates
> > > user-kernel +transport, capability checking, security, and life
> > > cycle management of +process address space ID (PASID).
> > > +
> > > +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU driver
> > > is the +ultimate consumer of its UAPI data. At VFIO layer, the
> > > IOMMU UAPI data +is wrapped in a VFIO UAPI data. It follows the
> > > +pattern below::
> > > +
> > > +   struct {
> > > +	__u32 argsz;
> > > +	__u32 flags;
> > > +	__u8  data[];
> > > +   };
> > > +
> > > +Here data[] contains the IOMMU UAPI data structures. VFIO has the
> > > +freedom to bundle the data as well as parse data size based on its
> > > own flags. +
> > > +In order to determine the size and feature set of the user data,
> > > argsz +and flags are also embedded in the IOMMU UAPI data
> > > structures. +A "__u32 argsz" field is *always* at the beginning of
> > > each structure. +
> > > +For example:
> > > +::
> > > +
> > > +   struct iommu_cache_invalidate_info {
> > > +	__u32	argsz;
> > > +	#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> > > +	__u32	version;
> > > +	/* IOMMU paging structure cache */
> > > +	#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /*
> > > IOMMU IOTLB */
> > > +	#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /*
> > > Device IOTLB */
> > > +	#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /*
> > > PASID cache */
> > > +	#define IOMMU_CACHE_INV_TYPE_NR		(3)
> > > +	__u8	cache;
> > > +	__u8	granularity;
> > > +	__u8	padding[2];
> > > +	union {
> > > +		struct iommu_inv_pasid_info pasid_info;
> > > +		struct iommu_inv_addr_info addr_info;
> > > +	} granu;
> > > +   };
> > > +
> > > +VFIO is responsible for checking its own argsz and flags then
> > > invokes +appropriate IOMMU UAPI functions. User pointer is passed
> > > to IOMMU +layer for further processing. The responsibilities are
> > > divided as +follows:
> > > +
> > > +- Generic IOMMU layer checks argsz range and override out-of-range
> > > +  value. If the exact argsz is based on generic flags, they are
> > > checked
> > > +  here as well.
> > > +
> > > +- Vendor IOMMU driver checks argsz based on vendor flags, UAPI data
> > > +  is consumed based on flags
> > > +
> > > +Once again, use guest TLB invalidation as an example, argsz is
> > > based +on generic flags in the invalidation information. IOMMU
> > > generic code +shall process the UAPI data as the following:
> > > +
> > > +::
> > > +
> > > + int iommu_cache_invalidate(struct iommu_domain *domain, struct
> > > device *dev,
> > > +			void __user *uinfo)
> > > + {
> > > +	/* Current kernel data size is the max to be copied from
> > > user */
> > > +	maxsz = sizeof(struct iommu_cache_invalidate_info);
> > > +	memset((void *)&inv_info, 0, maxsz);
> > > +
> > > +	/*
> > > +	 * No new spaces can be added before the variable sized
> > > union, the
> > > +	 * minimum size is the offset to the union.
> > > +	 */
> > > +	minsz = offsetof(struct iommu_cache_invalidate_info,
> > > granu); +
> > > +	/* Copy minsz from user to get flags and argsz */
> > > +	if (copy_from_user(&inv_info, uinfo, minsz))
> > > +		return -EFAULT;
> > > +
> > > +	/* Fields before variable size union is mandatory */
> > > +	if (inv_info.argsz < minsz)
> > > +		return -EINVAL;
> > > +	/*
> > > +	 * User might be using a newer UAPI header which has a
> > > larger data
> > > +	 * size, we shall support the existing flags within the
> > > current
> > > +	 * size.
> > > +	 */
> > > +	if (inv_info.argsz > maxsz)
> > > +		inv_info.argsz = maxsz;
> > > +
> > > +	/* Checking the exact argsz based on generic flags */
> > > +	if (inv_info.granularity == IOMMU_INV_GRANU_ADDR &&
> > > +		inv_info.argsz != offsetofend(struct
> > > iommu_cache_invalidate_info,
> > > +					granu.addr_info))
> >
> > Is it really reasonable to expect the user to specify argsz to the
> > exact union element for the callback?  I'd certainly expect users to
> > simply use sizeof(struct iommu_cache_invalidate_info) and it should
> > therefore be sufficient to test >= here rather than jump through hoops
> > with an exact size.  We're already changing inv_info.argsz above to
> > fit our known structure, it's inconsistent to then expect it to be
> > some exact value.
> >
> I was thinking argsz doesn't have to be the exact struct size. It should
> be whatever the sufficient & correct size used by the user for a given
> call.
> 
> For example, current struct iommu_gpasid_bind_data {} only has VT-d
> data. If it gets extended with SMMU data in the union, VT-d vIOMMU
> emulation should only fill the union size of vt-d.
> 
> > > +		return -EINVAL;
> > > +
> > > +	if (inv_info.granularity == IOMMU_INV_GRANU_PASID &&
> > > +		inv_info.argsz != offsetofend(struct
> > > iommu_cache_invalidate_info,
> > > +					granu.pasid_info))
> > > +		return -EINVAL;
> > > +
> > > +	/* Copy the remaining user data _after_ minsz */
> > > +	if (copy_from_user((void *)&inv_info + minsz, uinfo +
> > > minsz,
> > > +				inv_info.argsz - minsz))
> > > +		return -EFAULT;
> > > +
> > > +	return domain->ops->cache_invalidate(domain, dev,
> > > &inv_info);
> > > + }
> > > + Add a wrapper
> > > +   __iommu_unbind_( kernel data, same user data, kernel copy)
> > > +
> This should be removed. Sorry about the confusion. The patch does not
> have two data pointers, just separate APIs for kernel and user.
> 
> > > +Notice that in this example, since union size is determined by
> > > generic +flags, all checking to argsz is validated in the generic
> > > IOMMU layer, +vendor driver does not need to check argsz. However,
> > > if union size is +based on vendor data, such as
> > > iommu_sva_bind_gpasid(), it will be +vendor driver's responsibility
> > > to validate the exact argsz.
> >
> > struct iommu_cache_invalidate_info is a good example because it
> > explicitly states a table of type vs granularity validity.  When the
> > cache_invalidate() callback is used by an internal user we can
> > consider it a bug in the caller if its usage falls outside of these
> > prescribed valid combinations, ie. iommu_ops callbacks may assume a
> > trusted caller that isn't trying to exploit any loophole.
> Separate APIs are proposed in the patchset to address UAPIs
> with both kernel and user callers. Sorry about the last line in the
> example above. Currently, only unbind_gpasid() and page_response() have
> both kernel and userspace callers. e.g.
> 
>    /* userspace caller */
>    int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct
> device *dev,
> 			void __user *udata)
> 
>    /* in-kernel caller */
>    int __iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct
> device *dev,
>                                  struct iommu_gpasid_bind_data *data)
> 
> We don;t expect in-kernel caller for cache invalidate in that it is
> implied in unmap, unbind operations.
> 
> >  But here
> > we've done nothing more than validated the supplied size to pass it
> > through to a non-user hardened callback.  We didn't check the
> > version,
> Yes, I should move up the version check from vendor driver.
> 
> > we didn't check that any of the undefined bits in cache or
> > granularity or padding were set, we don't know what flags might be
> > set in the union elements.
> You are right, we should sanitize reserved bits.
> 
> > For example, if a user is able to set a
> > flag that gets ignored now, that means we can never use that flag
> > without potentially breaking that user in the future.
> Good point, all reserved/unused bits should be tested.
> 
> >  If a user can
> > pass in version 3141592654 now, then we can never use version for
> > validation.  I see that intel_iommu_sva_invalidate() does test the
> > version, but has no obvious other hardening.  I'm afraid we're being
> > far to lax about accepting a data structure provided by a user, we
> > should not assume good faith. Thanks,
> >
> Agreed. will add checks in the IOMMU generic layer for reserved
> bits.
> For VT-d vendor driver, we do check all bits in cache types, i.e. in
> intel/iommu.c
> 	for_each_set_bit(cache_type,
> 			 (unsigned long *)&inv_info->cache,
> 			 IOMMU_CACHE_INV_TYPE_NR) {
> 
> 
> one other hardening is to check vendor argsz. This is in the
> bind_gpasid call.
> 
> 	if (data->argsz != offsetofend(struct iommu_gpasid_bind_data,
> vendor.vtd))
> 		return -EINVAL;
> 
> 
> 
> > Alex
> >
> 
> [Jacob Pan]
> 
> 
> 
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/5] docs: IOMMU user API
  2020-06-30  2:52       ` Tian, Kevin
@ 2020-06-30 17:39         ` Jacob Pan
  0 siblings, 0 replies; 19+ messages in thread
From: Jacob Pan @ 2020-06-30 17:39 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Raj, Ashok, Jonathan Corbet, David Woodhouse, LKML, iommu,
	Christoph Hellwig, Alex Williamson, Jean-Philippe Brucker

On Tue, 30 Jun 2020 02:52:45 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Jacob Pan
> > Sent: Tuesday, June 30, 2020 7:05 AM
> > 
> > On Fri, 26 Jun 2020 16:19:23 -0600
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> >   
> > > On Tue, 23 Jun 2020 10:03:53 -0700
> > > Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> > >  
> > > > IOMMU UAPI is newly introduced to support communications between
> > > > guest virtual IOMMU and host IOMMU. There has been lots of
> > > > discussions on how it should work with VFIO UAPI and userspace
> > > > in general.
> > > >
> > > > This document is indended to clarify the UAPI design and usage.
> > > > The mechenics of how future extensions should be achieved are
> > > > also covered in this documentation.
> > > >
> > > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > ---
> > > >  Documentation/userspace-api/iommu.rst | 244
> > > > ++++++++++++++++++++++++++++++++++ 1 file changed, 244
> > > > insertions(+) create mode 100644
> > > > Documentation/userspace-api/iommu.rst
> > > >
> > > > diff --git a/Documentation/userspace-api/iommu.rst
> > > > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > > > index 000000000000..f9e4ed90a413
> > > > --- /dev/null
> > > > +++ b/Documentation/userspace-api/iommu.rst
> > > > @@ -0,0 +1,244 @@
> > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > +.. iommu:
> > > > +
> > > > +=====================================
> > > > +IOMMU Userspace API
> > > > +=====================================
> > > > +
> > > > +IOMMU UAPI is used for virtualization cases where
> > > > communications are +needed between physical and virtual IOMMU
> > > > drivers. For native +usage, IOMMU is a system device which does
> > > > not need to communicate +with user space directly.
> > > > +
> > > > +The primary use cases are guest Shared Virtual Address (SVA)
> > > > and +guest IO virtual address (IOVA), wherein a virtual IOMMU
> > > > (vIOMMU) is +required to communicate with the physical IOMMU in
> > > > the host. +
> > > > +.. contents:: :local:
> > > > +
> > > > +Functionalities
> > > > +===============
> > > > +Communications of user and kernel involve both directions. The
> > > > +supported user-kernel APIs are as follows:
> > > > +
> > > > +1. Alloc/Free PASID
> > > > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > > > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > > > +4. Invalidate IOMMU caches
> > > > +5. Service page requests
> > > > +
> > > > +Requirements
> > > > +============
> > > > +The IOMMU UAPIs are generic and extensible to meet the
> > > > following +requirements:
> > > > +
> > > > +1. Emulated and para-virtualised vIOMMUs
> > > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > > > +3. Extensions to the UAPI shall not break existing user space
> > > > +
> > > > +Interfaces
> > > > +==========
> > > > +Although the data structures defined in IOMMU UAPI are
> > > > self-contained, +there is no user API functions introduced.
> > > > Instead, IOMMU UAPI is +designed to work with existing user
> > > > driver frameworks such as VFIO. +
> > > > +Extension Rules & Precautions
> > > > +-----------------------------
> > > > +When IOMMU UAPI gets extended, the data structures can *only*
> > > > be +modified in two ways:
> > > > +
> > > > +1. Adding new fields by re-purposing the padding[] field. No
> > > > size change. +2. Adding new union members at the end. May
> > > > increase in size. +
> > > > +No new fields can be added *after* the variable sized union in
> > > > that it +will break backward compatibility when offset moves. In
> > > > both cases, a +new flag must be accompanied with a new field
> > > > such that the IOMMU +driver can process the data based on the
> > > > new flag. Version field is +only reserved for the unlikely
> > > > event of UAPI upgrade at its entirety. +
> > > > +It's *always* the caller's responsibility to indicate the size
> > > > of the +structure passed by setting argsz appropriately.
> > > > +Though at the same time, argsz is user provided data which is
> > > > not +trusted. The argsz field allows the user to indicate how
> > > > much data +they're providing, it's still the kernel's
> > > > responsibility to validate +whether it's correct and sufficient
> > > > for the requested operation. +
> > > > +Compatibility Checking
> > > > +----------------------
> > > > +When IOMMU UAPI extension results in size increase, user such
> > > > as VFIO +has to handle the following cases:
> > > > +
> > > > +1. User and kernel has exact size match
> > > > +2. An older user with older kernel header (smaller UAPI size)
> > > > running on a
> > > > +   newer kernel (larger UAPI size)
> > > > +3. A newer user with newer kernel header (larger UAPI size)
> > > > running
> > > > +   on an older kernel.
> > > > +4. A malicious/misbehaving user pass illegal/invalid size but
> > > > within
> > > > +   range. The data may contain garbage.  
> > >
> > > What exactly does vfio need to do to handle these?
> > >  
> > VFIO does nothing other than returning the status from IOMMU driver.
> > Based on the return status, users such as QEMU can cause fault
> > conditions within the vIOMMU.  
> 
> But from above description, "user such as VFIO has to handle the
> following cases"...
> 
I really meant the whole UAPI framework. How about:

"When IOMMU UAPI extension results in size increase, the following
cases must be handled:"

> Thanks
> Kevin
> 
> >   
> > > > +
> > > > +Feature Checking
> > > > +----------------
> > > > +While launching a guest with vIOMMU, it is important to ensure
> > > > that host +can support the UAPI data structures to be used for
> > > > vIOMMU-pIOMMU +communications. Without upfront compatibility
> > > > checking, future faults +are difficult to report even in normal
> > > > conditions. For example, TLB +invalidations should always
> > > > succeed. There is no architectural way to +report back to the
> > > > vIOMMU if the UAPI data is incompatible. If that +happens, in
> > > > order to protect IOMMU iosolation guarantee, we have to +resort
> > > > to not giving completion status in vIOMMU. This may result in
> > > > +VM hang. +
> > > > +For this reason the following IOMMU UAPIs cannot fail:
> > > > +
> > > > +1. Free PASID
> > > > +2. Unbind guest PASID
> > > > +3. Unbind guest PASID table (SMMU)
> > > > +4. Cache invalidate
> > > > +
> > > > +User applications such as QEMU is expected to import kernel
> > > > UAPI +headers. Backward compatibility is supported per feature
> > > > flags. +For example, an older QEMU (with older kernel header)
> > > > can run on newer +kernel. Newer QEMU (with new kernel header)
> > > > may refuse to initialize +on an older kernel if new feature
> > > > flags are not supported by older +kernel. Simply recompile
> > > > existing code with newer kernel header should +not be an issue
> > > > in that only existing flags are used. +
> > > > +IOMMU vendor driver should report the below features to IOMMU
> > > > UAPI +consumers (e.g. via VFIO).
> > > > +
> > > > +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> > > > +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> > > > +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> > > > +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> > > > +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> > > > +
> > > > +Take VFIO as example, upon request from VFIO user space (e.g.
> > > > QEMU), +VFIO kernel code shall query IOMMU vendor driver for the
> > > > support of +the above features. Query result can then be
> > > > reported back to the +user-space caller. Details can be found in
> > > > +Documentation/driver-api/vfio.rst.
> > > > +
> > > > +
> > > > +Data Passing Example with VFIO
> > > > +------------------------------
> > > > +As the ubiquitous userspace driver framework, VFIO is already
> > > > IOMMU +aware and share many key concepts such as device model,
> > > > group, and +protection domain. Other user driver frameworks can
> > > > also be extended +to support IOMMU UAPI but it is outside the
> > > > scope of this document. +
> > > > +In this tight-knit VFIO-IOMMU interface, the ultimate consumer
> > > > of the +IOMMU UAPI data is the host IOMMU driver. VFIO
> > > > facilitates user-kernel +transport, capability checking,
> > > > security, and life cycle management of +process address space
> > > > ID (PASID). +
> > > > +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU
> > > > driver is the +ultimate consumer of its UAPI data. At VFIO
> > > > layer, the IOMMU UAPI data +is wrapped in a VFIO UAPI data. It
> > > > follows the +pattern below::
> > > > +
> > > > +   struct {
> > > > +	__u32 argsz;
> > > > +	__u32 flags;
> > > > +	__u8  data[];
> > > > +   };
> > > > +
> > > > +Here data[] contains the IOMMU UAPI data structures. VFIO has
> > > > the +freedom to bundle the data as well as parse data size
> > > > based on its own flags. +
> > > > +In order to determine the size and feature set of the user
> > > > data, argsz +and flags are also embedded in the IOMMU UAPI data
> > > > structures. +A "__u32 argsz" field is *always* at the beginning
> > > > of each structure. +
> > > > +For example:
> > > > +::
> > > > +
> > > > +   struct iommu_cache_invalidate_info {
> > > > +	__u32	argsz;
> > > > +	#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> > > > +	__u32	version;
> > > > +	/* IOMMU paging structure cache */
> > > > +	#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /*
> > > > IOMMU IOTLB */
> > > > +	#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 <<
> > > > 1) /* Device IOTLB */
> > > > +	#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /*
> > > > PASID cache */
> > > > +	#define IOMMU_CACHE_INV_TYPE_NR		(3)
> > > > +	__u8	cache;
> > > > +	__u8	granularity;
> > > > +	__u8	padding[2];
> > > > +	union {
> > > > +		struct iommu_inv_pasid_info pasid_info;
> > > > +		struct iommu_inv_addr_info addr_info;
> > > > +	} granu;
> > > > +   };
> > > > +
> > > > +VFIO is responsible for checking its own argsz and flags then
> > > > invokes +appropriate IOMMU UAPI functions. User pointer is
> > > > passed to IOMMU +layer for further processing. The
> > > > responsibilities are divided as +follows:
> > > > +
> > > > +- Generic IOMMU layer checks argsz range and override
> > > > out-of-range
> > > > +  value. If the exact argsz is based on generic flags, they are
> > > > checked
> > > > +  here as well.
> > > > +
> > > > +- Vendor IOMMU driver checks argsz based on vendor flags, UAPI
> > > > data
> > > > +  is consumed based on flags
> > > > +
> > > > +Once again, use guest TLB invalidation as an example, argsz is
> > > > based +on generic flags in the invalidation information. IOMMU
> > > > generic code +shall process the UAPI data as the following:
> > > > +
> > > > +::
> > > > +
> > > > + int iommu_cache_invalidate(struct iommu_domain *domain, struct
> > > > device *dev,
> > > > +			void __user *uinfo)
> > > > + {
> > > > +	/* Current kernel data size is the max to be copied
> > > > from user */
> > > > +	maxsz = sizeof(struct iommu_cache_invalidate_info);
> > > > +	memset((void *)&inv_info, 0, maxsz);
> > > > +
> > > > +	/*
> > > > +	 * No new spaces can be added before the variable sized
> > > > union, the
> > > > +	 * minimum size is the offset to the union.
> > > > +	 */
> > > > +	minsz = offsetof(struct iommu_cache_invalidate_info,
> > > > granu); +
> > > > +	/* Copy minsz from user to get flags and argsz */
> > > > +	if (copy_from_user(&inv_info, uinfo, minsz))
> > > > +		return -EFAULT;
> > > > +
> > > > +	/* Fields before variable size union is mandatory */
> > > > +	if (inv_info.argsz < minsz)
> > > > +		return -EINVAL;
> > > > +	/*
> > > > +	 * User might be using a newer UAPI header which has a
> > > > larger data
> > > > +	 * size, we shall support the existing flags within the
> > > > current
> > > > +	 * size.
> > > > +	 */
> > > > +	if (inv_info.argsz > maxsz)
> > > > +		inv_info.argsz = maxsz;
> > > > +
> > > > +	/* Checking the exact argsz based on generic flags */
> > > > +	if (inv_info.granularity == IOMMU_INV_GRANU_ADDR &&
> > > > +		inv_info.argsz != offsetofend(struct
> > > > iommu_cache_invalidate_info,
> > > > +					granu.addr_info))  
> > >
> > > Is it really reasonable to expect the user to specify argsz to the
> > > exact union element for the callback?  I'd certainly expect users
> > > to simply use sizeof(struct iommu_cache_invalidate_info) and it
> > > should therefore be sufficient to test >= here rather than jump
> > > through hoops with an exact size.  We're already changing
> > > inv_info.argsz above to fit our known structure, it's
> > > inconsistent to then expect it to be some exact value.
> > >  
> > I was thinking argsz doesn't have to be the exact struct size. It
> > should be whatever the sufficient & correct size used by the user
> > for a given call.
> > 
> > For example, current struct iommu_gpasid_bind_data {} only has VT-d
> > data. If it gets extended with SMMU data in the union, VT-d vIOMMU
> > emulation should only fill the union size of vt-d.
> >   
> > > > +		return -EINVAL;
> > > > +
> > > > +	if (inv_info.granularity == IOMMU_INV_GRANU_PASID &&
> > > > +		inv_info.argsz != offsetofend(struct
> > > > iommu_cache_invalidate_info,
> > > > +					granu.pasid_info))
> > > > +		return -EINVAL;
> > > > +
> > > > +	/* Copy the remaining user data _after_ minsz */
> > > > +	if (copy_from_user((void *)&inv_info + minsz, uinfo +
> > > > minsz,
> > > > +				inv_info.argsz - minsz))
> > > > +		return -EFAULT;
> > > > +
> > > > +	return domain->ops->cache_invalidate(domain, dev,
> > > > &inv_info);
> > > > + }
> > > > + Add a wrapper
> > > > +   __iommu_unbind_( kernel data, same user data, kernel copy)
> > > > +  
> > This should be removed. Sorry about the confusion. The patch does
> > not have two data pointers, just separate APIs for kernel and user.
> >   
> > > > +Notice that in this example, since union size is determined by
> > > > generic +flags, all checking to argsz is validated in the
> > > > generic IOMMU layer, +vendor driver does not need to check
> > > > argsz. However, if union size is +based on vendor data, such as
> > > > iommu_sva_bind_gpasid(), it will be +vendor driver's
> > > > responsibility to validate the exact argsz.  
> > >
> > > struct iommu_cache_invalidate_info is a good example because it
> > > explicitly states a table of type vs granularity validity.  When
> > > the cache_invalidate() callback is used by an internal user we can
> > > consider it a bug in the caller if its usage falls outside of
> > > these prescribed valid combinations, ie. iommu_ops callbacks may
> > > assume a trusted caller that isn't trying to exploit any
> > > loophole.  
> > Separate APIs are proposed in the patchset to address UAPIs
> > with both kernel and user callers. Sorry about the last line in the
> > example above. Currently, only unbind_gpasid() and page_response()
> > have both kernel and userspace callers. e.g.
> > 
> >    /* userspace caller */
> >    int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct
> > device *dev,
> > 			void __user *udata)
> > 
> >    /* in-kernel caller */
> >    int __iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct
> > device *dev,
> >                                  struct iommu_gpasid_bind_data
> > *data)
> > 
> > We don;t expect in-kernel caller for cache invalidate in that it is
> > implied in unmap, unbind operations.
> >   
> > >  But here
> > > we've done nothing more than validated the supplied size to pass
> > > it through to a non-user hardened callback.  We didn't check the
> > > version,  
> > Yes, I should move up the version check from vendor driver.
> >   
> > > we didn't check that any of the undefined bits in cache or
> > > granularity or padding were set, we don't know what flags might be
> > > set in the union elements.  
> > You are right, we should sanitize reserved bits.
> >   
> > > For example, if a user is able to set a
> > > flag that gets ignored now, that means we can never use that flag
> > > without potentially breaking that user in the future.  
> > Good point, all reserved/unused bits should be tested.
> >   
> > >  If a user can
> > > pass in version 3141592654 now, then we can never use version for
> > > validation.  I see that intel_iommu_sva_invalidate() does test the
> > > version, but has no obvious other hardening.  I'm afraid we're
> > > being far to lax about accepting a data structure provided by a
> > > user, we should not assume good faith. Thanks,
> > >  
> > Agreed. will add checks in the IOMMU generic layer for reserved
> > bits.
> > For VT-d vendor driver, we do check all bits in cache types, i.e. in
> > intel/iommu.c
> > 	for_each_set_bit(cache_type,
> > 			 (unsigned long *)&inv_info->cache,
> > 			 IOMMU_CACHE_INV_TYPE_NR) {
> > 
> > 
> > one other hardening is to check vendor argsz. This is in the
> > bind_gpasid call.
> > 
> > 	if (data->argsz != offsetofend(struct
> > iommu_gpasid_bind_data, vendor.vtd))
> > 		return -EINVAL;
> > 
> > 
> >   
> > > Alex
> > >  
> > 
> > [Jacob Pan]
> > 
> > 
> > 
> > _______________________________________________
> > iommu mailing list
> > iommu@lists.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/iommu  

[Jacob Pan]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/5] docs: IOMMU user API
  2020-06-29 23:05     ` Jacob Pan
  2020-06-30  2:52       ` Tian, Kevin
@ 2020-07-07 21:40       ` Alex Williamson
  2020-07-08 15:21         ` Jacob Pan
  1 sibling, 1 reply; 19+ messages in thread
From: Alex Williamson @ 2020-07-07 21:40 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Mon, 29 Jun 2020 16:05:18 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> On Fri, 26 Jun 2020 16:19:23 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > On Tue, 23 Jun 2020 10:03:53 -0700
> > Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> >   
> > > IOMMU UAPI is newly introduced to support communications between
> > > guest virtual IOMMU and host IOMMU. There has been lots of
> > > discussions on how it should work with VFIO UAPI and userspace in
> > > general.
> > > 
> > > This document is indended to clarify the UAPI design and usage. The
> > > mechenics of how future extensions should be achieved are also
> > > covered in this documentation.
> > > 
> > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > ---
> > >  Documentation/userspace-api/iommu.rst | 244
> > > ++++++++++++++++++++++++++++++++++ 1 file changed, 244 insertions(+)
> > >  create mode 100644 Documentation/userspace-api/iommu.rst
> > > 
> > > diff --git a/Documentation/userspace-api/iommu.rst
> > > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > > index 000000000000..f9e4ed90a413
> > > --- /dev/null
> > > +++ b/Documentation/userspace-api/iommu.rst
> > > @@ -0,0 +1,244 @@
> > > +.. SPDX-License-Identifier: GPL-2.0
> > > +.. iommu:
> > > +
> > > +=====================================
> > > +IOMMU Userspace API
> > > +=====================================
> > > +
> > > +IOMMU UAPI is used for virtualization cases where communications
> > > are +needed between physical and virtual IOMMU drivers. For native
> > > +usage, IOMMU is a system device which does not need to communicate
> > > +with user space directly.
> > > +
> > > +The primary use cases are guest Shared Virtual Address (SVA) and
> > > +guest IO virtual address (IOVA), wherein a virtual IOMMU (vIOMMU)
> > > is +required to communicate with the physical IOMMU in the host.
> > > +
> > > +.. contents:: :local:
> > > +
> > > +Functionalities
> > > +===============
> > > +Communications of user and kernel involve both directions. The
> > > +supported user-kernel APIs are as follows:
> > > +
> > > +1. Alloc/Free PASID
> > > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > > +4. Invalidate IOMMU caches
> > > +5. Service page requests
> > > +
> > > +Requirements
> > > +============
> > > +The IOMMU UAPIs are generic and extensible to meet the following
> > > +requirements:
> > > +
> > > +1. Emulated and para-virtualised vIOMMUs
> > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > > +3. Extensions to the UAPI shall not break existing user space
> > > +
> > > +Interfaces
> > > +==========
> > > +Although the data structures defined in IOMMU UAPI are
> > > self-contained, +there is no user API functions introduced.
> > > Instead, IOMMU UAPI is +designed to work with existing user driver
> > > frameworks such as VFIO. +
> > > +Extension Rules & Precautions
> > > +-----------------------------
> > > +When IOMMU UAPI gets extended, the data structures can *only* be
> > > +modified in two ways:
> > > +
> > > +1. Adding new fields by re-purposing the padding[] field. No size
> > > change. +2. Adding new union members at the end. May increase in
> > > size. +
> > > +No new fields can be added *after* the variable sized union in
> > > that it +will break backward compatibility when offset moves. In
> > > both cases, a +new flag must be accompanied with a new field such
> > > that the IOMMU +driver can process the data based on the new flag.
> > > Version field is +only reserved for the unlikely event of UAPI
> > > upgrade at its entirety. +
> > > +It's *always* the caller's responsibility to indicate the size of
> > > the +structure passed by setting argsz appropriately.
> > > +Though at the same time, argsz is user provided data which is not
> > > +trusted. The argsz field allows the user to indicate how much data
> > > +they're providing, it's still the kernel's responsibility to
> > > validate +whether it's correct and sufficient for the requested
> > > operation. +
> > > +Compatibility Checking
> > > +----------------------
> > > +When IOMMU UAPI extension results in size increase, user such as
> > > VFIO +has to handle the following cases:
> > > +
> > > +1. User and kernel has exact size match
> > > +2. An older user with older kernel header (smaller UAPI size)
> > > running on a
> > > +   newer kernel (larger UAPI size)
> > > +3. A newer user with newer kernel header (larger UAPI size) running
> > > +   on an older kernel.
> > > +4. A malicious/misbehaving user pass illegal/invalid size but
> > > within
> > > +   range. The data may contain garbage.    
> > 
> > What exactly does vfio need to do to handle these?
> >   
> VFIO does nothing other than returning the status from IOMMU driver.
> Based on the return status, users such as QEMU can cause fault
> conditions within the vIOMMU.
> 
> > > +
> > > +Feature Checking
> > > +----------------
> > > +While launching a guest with vIOMMU, it is important to ensure
> > > that host +can support the UAPI data structures to be used for
> > > vIOMMU-pIOMMU +communications. Without upfront compatibility
> > > checking, future faults +are difficult to report even in normal
> > > conditions. For example, TLB +invalidations should always succeed.
> > > There is no architectural way to +report back to the vIOMMU if the
> > > UAPI data is incompatible. If that +happens, in order to protect
> > > IOMMU iosolation guarantee, we have to +resort to not giving
> > > completion status in vIOMMU. This may result in +VM hang.
> > > +
> > > +For this reason the following IOMMU UAPIs cannot fail:
> > > +
> > > +1. Free PASID
> > > +2. Unbind guest PASID
> > > +3. Unbind guest PASID table (SMMU)
> > > +4. Cache invalidate
> > > +
> > > +User applications such as QEMU is expected to import kernel UAPI
> > > +headers. Backward compatibility is supported per feature flags.
> > > +For example, an older QEMU (with older kernel header) can run on
> > > newer +kernel. Newer QEMU (with new kernel header) may refuse to
> > > initialize +on an older kernel if new feature flags are not
> > > supported by older +kernel. Simply recompile existing code with
> > > newer kernel header should +not be an issue in that only existing
> > > flags are used. +
> > > +IOMMU vendor driver should report the below features to IOMMU UAPI
> > > +consumers (e.g. via VFIO).
> > > +
> > > +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> > > +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> > > +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> > > +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> > > +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> > > +
> > > +Take VFIO as example, upon request from VFIO user space (e.g.
> > > QEMU), +VFIO kernel code shall query IOMMU vendor driver for the
> > > support of +the above features. Query result can then be reported
> > > back to the +user-space caller. Details can be found in
> > > +Documentation/driver-api/vfio.rst.
> > > +
> > > +
> > > +Data Passing Example with VFIO
> > > +------------------------------
> > > +As the ubiquitous userspace driver framework, VFIO is already IOMMU
> > > +aware and share many key concepts such as device model, group, and
> > > +protection domain. Other user driver frameworks can also be
> > > extended +to support IOMMU UAPI but it is outside the scope of this
> > > document. +
> > > +In this tight-knit VFIO-IOMMU interface, the ultimate consumer of
> > > the +IOMMU UAPI data is the host IOMMU driver. VFIO facilitates
> > > user-kernel +transport, capability checking, security, and life
> > > cycle management of +process address space ID (PASID).
> > > +
> > > +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU driver
> > > is the +ultimate consumer of its UAPI data. At VFIO layer, the
> > > IOMMU UAPI data +is wrapped in a VFIO UAPI data. It follows the
> > > +pattern below::
> > > +
> > > +   struct {
> > > +	__u32 argsz;
> > > +	__u32 flags;
> > > +	__u8  data[];
> > > +   };
> > > +
> > > +Here data[] contains the IOMMU UAPI data structures. VFIO has the
> > > +freedom to bundle the data as well as parse data size based on its
> > > own flags. +
> > > +In order to determine the size and feature set of the user data,
> > > argsz +and flags are also embedded in the IOMMU UAPI data
> > > structures. +A "__u32 argsz" field is *always* at the beginning of
> > > each structure. +
> > > +For example:
> > > +::
> > > +
> > > +   struct iommu_cache_invalidate_info {
> > > +	__u32	argsz;
> > > +	#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> > > +	__u32	version;
> > > +	/* IOMMU paging structure cache */
> > > +	#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /*
> > > IOMMU IOTLB */
> > > +	#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /*
> > > Device IOTLB */
> > > +	#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /*
> > > PASID cache */
> > > +	#define IOMMU_CACHE_INV_TYPE_NR		(3)
> > > +	__u8	cache;
> > > +	__u8	granularity;
> > > +	__u8	padding[2];
> > > +	union {
> > > +		struct iommu_inv_pasid_info pasid_info;
> > > +		struct iommu_inv_addr_info addr_info;
> > > +	} granu;
> > > +   };
> > > +
> > > +VFIO is responsible for checking its own argsz and flags then
> > > invokes +appropriate IOMMU UAPI functions. User pointer is passed
> > > to IOMMU +layer for further processing. The responsibilities are
> > > divided as +follows:
> > > +
> > > +- Generic IOMMU layer checks argsz range and override out-of-range
> > > +  value. If the exact argsz is based on generic flags, they are
> > > checked
> > > +  here as well.
> > > +
> > > +- Vendor IOMMU driver checks argsz based on vendor flags, UAPI data
> > > +  is consumed based on flags
> > > +
> > > +Once again, use guest TLB invalidation as an example, argsz is
> > > based +on generic flags in the invalidation information. IOMMU
> > > generic code +shall process the UAPI data as the following:
> > > +
> > > +::
> > > +
> > > + int iommu_cache_invalidate(struct iommu_domain *domain, struct
> > > device *dev,
> > > +			void __user *uinfo)
> > > + {
> > > +	/* Current kernel data size is the max to be copied from
> > > user */
> > > +	maxsz = sizeof(struct iommu_cache_invalidate_info);
> > > +	memset((void *)&inv_info, 0, maxsz);
> > > +
> > > +	/*
> > > +	 * No new spaces can be added before the variable sized
> > > union, the
> > > +	 * minimum size is the offset to the union.
> > > +	 */
> > > +	minsz = offsetof(struct iommu_cache_invalidate_info,
> > > granu); +
> > > +	/* Copy minsz from user to get flags and argsz */
> > > +	if (copy_from_user(&inv_info, uinfo, minsz))
> > > +		return -EFAULT;
> > > +
> > > +	/* Fields before variable size union is mandatory */
> > > +	if (inv_info.argsz < minsz)
> > > +		return -EINVAL;
> > > +	/*
> > > +	 * User might be using a newer UAPI header which has a
> > > larger data
> > > +	 * size, we shall support the existing flags within the
> > > current
> > > +	 * size.
> > > +	 */
> > > +	if (inv_info.argsz > maxsz)
> > > +		inv_info.argsz = maxsz;
> > > +
> > > +	/* Checking the exact argsz based on generic flags */
> > > +	if (inv_info.granularity == IOMMU_INV_GRANU_ADDR &&
> > > +		inv_info.argsz != offsetofend(struct
> > > iommu_cache_invalidate_info,
> > > +					granu.addr_info))    
> > 
> > Is it really reasonable to expect the user to specify argsz to the
> > exact union element for the callback?  I'd certainly expect users to
> > simply use sizeof(struct iommu_cache_invalidate_info) and it should
> > therefore be sufficient to test >= here rather than jump through hoops
> > with an exact size.  We're already changing inv_info.argsz above to
> > fit our known structure, it's inconsistent to then expect it to be
> > some exact value.
> >    
> I was thinking argsz doesn't have to be the exact struct size. It should
> be whatever the sufficient & correct size used by the user for a given
> call.
> 
> For example, current struct iommu_gpasid_bind_data {} only has VT-d
> data. If it gets extended with SMMU data in the union, VT-d vIOMMU
> emulation should only fill the union size of vt-d.

But the user is simply going to have a struct iommu_gpasid_bind_data
and set argsz to sizeof that struct.  The user is providing that entire
struct as the arg, so it doesn't really make sense to pinpoint the last
field of the union they chose to use.  The fields of the structure
define which union is applicable.  If you're following the vfio
interface as an example, sufficiently sized is all we require, we don't
ask for union member level granularity, which I think is confusing and
error prone for users.

> > > +		return -EINVAL;
> > > +
> > > +	if (inv_info.granularity == IOMMU_INV_GRANU_PASID &&
> > > +		inv_info.argsz != offsetofend(struct
> > > iommu_cache_invalidate_info,
> > > +					granu.pasid_info))
> > > +		return -EINVAL;
> > > +
> > > +	/* Copy the remaining user data _after_ minsz */
> > > +	if (copy_from_user((void *)&inv_info + minsz, uinfo +
> > > minsz,
> > > +				inv_info.argsz - minsz))
> > > +		return -EFAULT;
> > > +
> > > +	return domain->ops->cache_invalidate(domain, dev,
> > > &inv_info);
> > > + }
> > > + Add a wrapper
> > > +   __iommu_unbind_( kernel data, same user data, kernel copy)
> > > +  
> This should be removed. Sorry about the confusion. The patch does not
> have two data pointers, just separate APIs for kernel and user.
> 
> > > +Notice that in this example, since union size is determined by
> > > generic +flags, all checking to argsz is validated in the generic
> > > IOMMU layer, +vendor driver does not need to check argsz. However,
> > > if union size is +based on vendor data, such as
> > > iommu_sva_bind_gpasid(), it will be +vendor driver's responsibility
> > > to validate the exact argsz.    
> > 
> > struct iommu_cache_invalidate_info is a good example because it
> > explicitly states a table of type vs granularity validity.  When the
> > cache_invalidate() callback is used by an internal user we can
> > consider it a bug in the caller if its usage falls outside of these
> > prescribed valid combinations, ie. iommu_ops callbacks may assume a
> > trusted caller that isn't trying to exploit any loophole.  
> Separate APIs are proposed in the patchset to address UAPIs
> with both kernel and user callers. Sorry about the last line in the
> example above. Currently, only unbind_gpasid() and page_response() have
> both kernel and userspace callers. e.g.
> 
>    /* userspace caller */
>    int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
> 			void __user *udata)
> 
>    /* in-kernel caller */
>    int __iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
>                                  struct iommu_gpasid_bind_data *data)
> 
> We don;t expect in-kernel caller for cache invalidate in that it is
> implied in unmap, unbind operations.

That specific function was only an example of an interface which has
invalid combinations where in-kernel it's the caller's responsibility
to abide by the rules, but a user interface needs further validation.

> >  But here
> > we've done nothing more than validated the supplied size to pass it
> > through to a non-user hardened callback.  We didn't check the
> > version,  
> Yes, I should move up the version check from vendor driver.
> 
> > we didn't check that any of the undefined bits in cache or
> > granularity or padding were set, we don't know what flags might be
> > set in the union elements.  
> You are right, we should sanitize reserved bits.
> 
> > For example, if a user is able to set a
> > flag that gets ignored now, that means we can never use that flag
> > without potentially breaking that user in the future.  
> Good point, all reserved/unused bits should be tested.
> 
> >  If a user can
> > pass in version 3141592654 now, then we can never use version for
> > validation.  I see that intel_iommu_sva_invalidate() does test the
> > version, but has no obvious other hardening.  I'm afraid we're being
> > far to lax about accepting a data structure provided by a user, we
> > should not assume good faith. Thanks,
> >   
> Agreed. will add checks in the IOMMU generic layer for reserved
> bits.
> For VT-d vendor driver, we do check all bits in cache types, i.e. in
> intel/iommu.c
> 	for_each_set_bit(cache_type,
> 			 (unsigned long *)&inv_info->cache,
> 			 IOMMU_CACHE_INV_TYPE_NR) {
> 
> 
> one other hardening is to check vendor argsz. This is in the
> bind_gpasid call.
> 
> 	if (data->argsz != offsetofend(struct iommu_gpasid_bind_data, vendor.vtd))
> 		return -EINVAL;

This is the same issue I raise above whether this is actually too
strict.  The user is providing a full structure as the arg, the
structure defines the valid fields via its content, it shouldn't be
necessary to have field level granularity for argsz.  Thanks,

Alex

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/5] docs: IOMMU user API
  2020-07-07 21:40       ` Alex Williamson
@ 2020-07-08 15:21         ` Jacob Pan
  0 siblings, 0 replies; 19+ messages in thread
From: Jacob Pan @ 2020-07-08 15:21 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

Hi Alex,

All points below are taken. I have sent out v4 that addresses these
feedbacks. Thanks for the review.

Jacob

On Tue, 7 Jul 2020 15:40:54 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Mon, 29 Jun 2020 16:05:18 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
> > On Fri, 26 Jun 2020 16:19:23 -0600
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> >   
> > > On Tue, 23 Jun 2020 10:03:53 -0700
> > > Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> > >     
> > > > IOMMU UAPI is newly introduced to support communications between
> > > > guest virtual IOMMU and host IOMMU. There has been lots of
> > > > discussions on how it should work with VFIO UAPI and userspace
> > > > in general.
> > > > 
> > > > This document is indended to clarify the UAPI design and usage.
> > > > The mechenics of how future extensions should be achieved are
> > > > also covered in this documentation.
> > > > 
> > > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > ---
> > > >  Documentation/userspace-api/iommu.rst | 244
> > > > ++++++++++++++++++++++++++++++++++ 1 file changed, 244
> > > > insertions(+) create mode 100644
> > > > Documentation/userspace-api/iommu.rst
> > > > 
> > > > diff --git a/Documentation/userspace-api/iommu.rst
> > > > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > > > index 000000000000..f9e4ed90a413
> > > > --- /dev/null
> > > > +++ b/Documentation/userspace-api/iommu.rst
> > > > @@ -0,0 +1,244 @@
> > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > +.. iommu:
> > > > +
> > > > +=====================================
> > > > +IOMMU Userspace API
> > > > +=====================================
> > > > +
> > > > +IOMMU UAPI is used for virtualization cases where
> > > > communications are +needed between physical and virtual IOMMU
> > > > drivers. For native +usage, IOMMU is a system device which does
> > > > not need to communicate +with user space directly.
> > > > +
> > > > +The primary use cases are guest Shared Virtual Address (SVA)
> > > > and +guest IO virtual address (IOVA), wherein a virtual IOMMU
> > > > (vIOMMU) is +required to communicate with the physical IOMMU in
> > > > the host. +
> > > > +.. contents:: :local:
> > > > +
> > > > +Functionalities
> > > > +===============
> > > > +Communications of user and kernel involve both directions. The
> > > > +supported user-kernel APIs are as follows:
> > > > +
> > > > +1. Alloc/Free PASID
> > > > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > > > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > > > +4. Invalidate IOMMU caches
> > > > +5. Service page requests
> > > > +
> > > > +Requirements
> > > > +============
> > > > +The IOMMU UAPIs are generic and extensible to meet the
> > > > following +requirements:
> > > > +
> > > > +1. Emulated and para-virtualised vIOMMUs
> > > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > > > +3. Extensions to the UAPI shall not break existing user space
> > > > +
> > > > +Interfaces
> > > > +==========
> > > > +Although the data structures defined in IOMMU UAPI are
> > > > self-contained, +there is no user API functions introduced.
> > > > Instead, IOMMU UAPI is +designed to work with existing user
> > > > driver frameworks such as VFIO. +
> > > > +Extension Rules & Precautions
> > > > +-----------------------------
> > > > +When IOMMU UAPI gets extended, the data structures can *only*
> > > > be +modified in two ways:
> > > > +
> > > > +1. Adding new fields by re-purposing the padding[] field. No
> > > > size change. +2. Adding new union members at the end. May
> > > > increase in size. +
> > > > +No new fields can be added *after* the variable sized union in
> > > > that it +will break backward compatibility when offset moves. In
> > > > both cases, a +new flag must be accompanied with a new field
> > > > such that the IOMMU +driver can process the data based on the
> > > > new flag. Version field is +only reserved for the unlikely
> > > > event of UAPI upgrade at its entirety. +
> > > > +It's *always* the caller's responsibility to indicate the size
> > > > of the +structure passed by setting argsz appropriately.
> > > > +Though at the same time, argsz is user provided data which is
> > > > not +trusted. The argsz field allows the user to indicate how
> > > > much data +they're providing, it's still the kernel's
> > > > responsibility to validate +whether it's correct and sufficient
> > > > for the requested operation. +
> > > > +Compatibility Checking
> > > > +----------------------
> > > > +When IOMMU UAPI extension results in size increase, user such
> > > > as VFIO +has to handle the following cases:
> > > > +
> > > > +1. User and kernel has exact size match
> > > > +2. An older user with older kernel header (smaller UAPI size)
> > > > running on a
> > > > +   newer kernel (larger UAPI size)
> > > > +3. A newer user with newer kernel header (larger UAPI size)
> > > > running
> > > > +   on an older kernel.
> > > > +4. A malicious/misbehaving user pass illegal/invalid size but
> > > > within
> > > > +   range. The data may contain garbage.      
> > > 
> > > What exactly does vfio need to do to handle these?
> > >     
> > VFIO does nothing other than returning the status from IOMMU driver.
> > Based on the return status, users such as QEMU can cause fault
> > conditions within the vIOMMU.
> >   
> > > > +
> > > > +Feature Checking
> > > > +----------------
> > > > +While launching a guest with vIOMMU, it is important to ensure
> > > > that host +can support the UAPI data structures to be used for
> > > > vIOMMU-pIOMMU +communications. Without upfront compatibility
> > > > checking, future faults +are difficult to report even in normal
> > > > conditions. For example, TLB +invalidations should always
> > > > succeed. There is no architectural way to +report back to the
> > > > vIOMMU if the UAPI data is incompatible. If that +happens, in
> > > > order to protect IOMMU iosolation guarantee, we have to +resort
> > > > to not giving completion status in vIOMMU. This may result in
> > > > +VM hang. +
> > > > +For this reason the following IOMMU UAPIs cannot fail:
> > > > +
> > > > +1. Free PASID
> > > > +2. Unbind guest PASID
> > > > +3. Unbind guest PASID table (SMMU)
> > > > +4. Cache invalidate
> > > > +
> > > > +User applications such as QEMU is expected to import kernel
> > > > UAPI +headers. Backward compatibility is supported per feature
> > > > flags. +For example, an older QEMU (with older kernel header)
> > > > can run on newer +kernel. Newer QEMU (with new kernel header)
> > > > may refuse to initialize +on an older kernel if new feature
> > > > flags are not supported by older +kernel. Simply recompile
> > > > existing code with newer kernel header should +not be an issue
> > > > in that only existing flags are used. +
> > > > +IOMMU vendor driver should report the below features to IOMMU
> > > > UAPI +consumers (e.g. via VFIO).
> > > > +
> > > > +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> > > > +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> > > > +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> > > > +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> > > > +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> > > > +
> > > > +Take VFIO as example, upon request from VFIO user space (e.g.
> > > > QEMU), +VFIO kernel code shall query IOMMU vendor driver for the
> > > > support of +the above features. Query result can then be
> > > > reported back to the +user-space caller. Details can be found in
> > > > +Documentation/driver-api/vfio.rst.
> > > > +
> > > > +
> > > > +Data Passing Example with VFIO
> > > > +------------------------------
> > > > +As the ubiquitous userspace driver framework, VFIO is already
> > > > IOMMU +aware and share many key concepts such as device model,
> > > > group, and +protection domain. Other user driver frameworks can
> > > > also be extended +to support IOMMU UAPI but it is outside the
> > > > scope of this document. +
> > > > +In this tight-knit VFIO-IOMMU interface, the ultimate consumer
> > > > of the +IOMMU UAPI data is the host IOMMU driver. VFIO
> > > > facilitates user-kernel +transport, capability checking,
> > > > security, and life cycle management of +process address space
> > > > ID (PASID). +
> > > > +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU
> > > > driver is the +ultimate consumer of its UAPI data. At VFIO
> > > > layer, the IOMMU UAPI data +is wrapped in a VFIO UAPI data. It
> > > > follows the +pattern below::
> > > > +
> > > > +   struct {
> > > > +	__u32 argsz;
> > > > +	__u32 flags;
> > > > +	__u8  data[];
> > > > +   };
> > > > +
> > > > +Here data[] contains the IOMMU UAPI data structures. VFIO has
> > > > the +freedom to bundle the data as well as parse data size
> > > > based on its own flags. +
> > > > +In order to determine the size and feature set of the user
> > > > data, argsz +and flags are also embedded in the IOMMU UAPI data
> > > > structures. +A "__u32 argsz" field is *always* at the beginning
> > > > of each structure. +
> > > > +For example:
> > > > +::
> > > > +
> > > > +   struct iommu_cache_invalidate_info {
> > > > +	__u32	argsz;
> > > > +	#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> > > > +	__u32	version;
> > > > +	/* IOMMU paging structure cache */
> > > > +	#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /*
> > > > IOMMU IOTLB */
> > > > +	#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 <<
> > > > 1) /* Device IOTLB */
> > > > +	#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /*
> > > > PASID cache */
> > > > +	#define IOMMU_CACHE_INV_TYPE_NR		(3)
> > > > +	__u8	cache;
> > > > +	__u8	granularity;
> > > > +	__u8	padding[2];
> > > > +	union {
> > > > +		struct iommu_inv_pasid_info pasid_info;
> > > > +		struct iommu_inv_addr_info addr_info;
> > > > +	} granu;
> > > > +   };
> > > > +
> > > > +VFIO is responsible for checking its own argsz and flags then
> > > > invokes +appropriate IOMMU UAPI functions. User pointer is
> > > > passed to IOMMU +layer for further processing. The
> > > > responsibilities are divided as +follows:
> > > > +
> > > > +- Generic IOMMU layer checks argsz range and override
> > > > out-of-range
> > > > +  value. If the exact argsz is based on generic flags, they are
> > > > checked
> > > > +  here as well.
> > > > +
> > > > +- Vendor IOMMU driver checks argsz based on vendor flags, UAPI
> > > > data
> > > > +  is consumed based on flags
> > > > +
> > > > +Once again, use guest TLB invalidation as an example, argsz is
> > > > based +on generic flags in the invalidation information. IOMMU
> > > > generic code +shall process the UAPI data as the following:
> > > > +
> > > > +::
> > > > +
> > > > + int iommu_cache_invalidate(struct iommu_domain *domain, struct
> > > > device *dev,
> > > > +			void __user *uinfo)
> > > > + {
> > > > +	/* Current kernel data size is the max to be copied
> > > > from user */
> > > > +	maxsz = sizeof(struct iommu_cache_invalidate_info);
> > > > +	memset((void *)&inv_info, 0, maxsz);
> > > > +
> > > > +	/*
> > > > +	 * No new spaces can be added before the variable sized
> > > > union, the
> > > > +	 * minimum size is the offset to the union.
> > > > +	 */
> > > > +	minsz = offsetof(struct iommu_cache_invalidate_info,
> > > > granu); +
> > > > +	/* Copy minsz from user to get flags and argsz */
> > > > +	if (copy_from_user(&inv_info, uinfo, minsz))
> > > > +		return -EFAULT;
> > > > +
> > > > +	/* Fields before variable size union is mandatory */
> > > > +	if (inv_info.argsz < minsz)
> > > > +		return -EINVAL;
> > > > +	/*
> > > > +	 * User might be using a newer UAPI header which has a
> > > > larger data
> > > > +	 * size, we shall support the existing flags within the
> > > > current
> > > > +	 * size.
> > > > +	 */
> > > > +	if (inv_info.argsz > maxsz)
> > > > +		inv_info.argsz = maxsz;
> > > > +
> > > > +	/* Checking the exact argsz based on generic flags */
> > > > +	if (inv_info.granularity == IOMMU_INV_GRANU_ADDR &&
> > > > +		inv_info.argsz != offsetofend(struct
> > > > iommu_cache_invalidate_info,
> > > > +					granu.addr_info))      
> > > 
> > > Is it really reasonable to expect the user to specify argsz to the
> > > exact union element for the callback?  I'd certainly expect users
> > > to simply use sizeof(struct iommu_cache_invalidate_info) and it
> > > should therefore be sufficient to test >= here rather than jump
> > > through hoops with an exact size.  We're already changing
> > > inv_info.argsz above to fit our known structure, it's
> > > inconsistent to then expect it to be some exact value.
> > >      
> > I was thinking argsz doesn't have to be the exact struct size. It
> > should be whatever the sufficient & correct size used by the user
> > for a given call.
> > 
> > For example, current struct iommu_gpasid_bind_data {} only has VT-d
> > data. If it gets extended with SMMU data in the union, VT-d vIOMMU
> > emulation should only fill the union size of vt-d.  
> 
> But the user is simply going to have a struct iommu_gpasid_bind_data
> and set argsz to sizeof that struct.  The user is providing that
> entire struct as the arg, so it doesn't really make sense to pinpoint
> the last field of the union they chose to use.  The fields of the
> structure define which union is applicable.  If you're following the
> vfio interface as an example, sufficiently sized is all we require,
> we don't ask for union member level granularity, which I think is
> confusing and error prone for users.
> 
> > > > +		return -EINVAL;
> > > > +
> > > > +	if (inv_info.granularity == IOMMU_INV_GRANU_PASID &&
> > > > +		inv_info.argsz != offsetofend(struct
> > > > iommu_cache_invalidate_info,
> > > > +					granu.pasid_info))
> > > > +		return -EINVAL;
> > > > +
> > > > +	/* Copy the remaining user data _after_ minsz */
> > > > +	if (copy_from_user((void *)&inv_info + minsz, uinfo +
> > > > minsz,
> > > > +				inv_info.argsz - minsz))
> > > > +		return -EFAULT;
> > > > +
> > > > +	return domain->ops->cache_invalidate(domain, dev,
> > > > &inv_info);
> > > > + }
> > > > + Add a wrapper
> > > > +   __iommu_unbind_( kernel data, same user data, kernel copy)
> > > > +    
> > This should be removed. Sorry about the confusion. The patch does
> > not have two data pointers, just separate APIs for kernel and user.
> >   
> > > > +Notice that in this example, since union size is determined by
> > > > generic +flags, all checking to argsz is validated in the
> > > > generic IOMMU layer, +vendor driver does not need to check
> > > > argsz. However, if union size is +based on vendor data, such as
> > > > iommu_sva_bind_gpasid(), it will be +vendor driver's
> > > > responsibility to validate the exact argsz.      
> > > 
> > > struct iommu_cache_invalidate_info is a good example because it
> > > explicitly states a table of type vs granularity validity.  When
> > > the cache_invalidate() callback is used by an internal user we can
> > > consider it a bug in the caller if its usage falls outside of
> > > these prescribed valid combinations, ie. iommu_ops callbacks may
> > > assume a trusted caller that isn't trying to exploit any
> > > loophole.    
> > Separate APIs are proposed in the patchset to address UAPIs
> > with both kernel and user callers. Sorry about the last line in the
> > example above. Currently, only unbind_gpasid() and page_response()
> > have both kernel and userspace callers. e.g.
> > 
> >    /* userspace caller */
> >    int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct
> > device *dev, void __user *udata)
> > 
> >    /* in-kernel caller */
> >    int __iommu_sva_unbind_gpasid(struct iommu_domain *domain,
> > struct device *dev, struct iommu_gpasid_bind_data *data)
> > 
> > We don;t expect in-kernel caller for cache invalidate in that it is
> > implied in unmap, unbind operations.  
> 
> That specific function was only an example of an interface which has
> invalid combinations where in-kernel it's the caller's responsibility
> to abide by the rules, but a user interface needs further validation.
> 
> > >  But here
> > > we've done nothing more than validated the supplied size to pass
> > > it through to a non-user hardened callback.  We didn't check the
> > > version,    
> > Yes, I should move up the version check from vendor driver.
> >   
> > > we didn't check that any of the undefined bits in cache or
> > > granularity or padding were set, we don't know what flags might be
> > > set in the union elements.    
> > You are right, we should sanitize reserved bits.
> >   
> > > For example, if a user is able to set a
> > > flag that gets ignored now, that means we can never use that flag
> > > without potentially breaking that user in the future.    
> > Good point, all reserved/unused bits should be tested.
> >   
> > >  If a user can
> > > pass in version 3141592654 now, then we can never use version for
> > > validation.  I see that intel_iommu_sva_invalidate() does test the
> > > version, but has no obvious other hardening.  I'm afraid we're
> > > being far to lax about accepting a data structure provided by a
> > > user, we should not assume good faith. Thanks,
> > >     
> > Agreed. will add checks in the IOMMU generic layer for reserved
> > bits.
> > For VT-d vendor driver, we do check all bits in cache types, i.e. in
> > intel/iommu.c
> > 	for_each_set_bit(cache_type,
> > 			 (unsigned long *)&inv_info->cache,
> > 			 IOMMU_CACHE_INV_TYPE_NR) {
> > 
> > 
> > one other hardening is to check vendor argsz. This is in the
> > bind_gpasid call.
> > 
> > 	if (data->argsz != offsetofend(struct
> > iommu_gpasid_bind_data, vendor.vtd)) return -EINVAL;  
> 
> This is the same issue I raise above whether this is actually too
> strict.  The user is providing a full structure as the arg, the
> structure defines the valid fields via its content, it shouldn't be
> necessary to have field level granularity for argsz.  Thanks,
> 
> Alex
> 

[Jacob Pan]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-07-08 15:15 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-23 17:03 [PATCH v3 0/5] IOMMU user API enhancement Jacob Pan
2020-06-23 17:03 ` [PATCH v3 1/5] docs: IOMMU user API Jacob Pan
2020-06-26 22:19   ` Alex Williamson
2020-06-29 23:05     ` Jacob Pan
2020-06-30  2:52       ` Tian, Kevin
2020-06-30 17:39         ` Jacob Pan
2020-07-07 21:40       ` Alex Williamson
2020-07-08 15:21         ` Jacob Pan
2020-06-23 17:03 ` [PATCH v3 2/5] iommu/uapi: Add argsz for user filled data Jacob Pan
2020-06-23 17:03 ` [PATCH v3 3/5] iommu/uapi: Use named union for user data Jacob Pan
2020-06-24  6:29   ` Lu Baolu
2020-06-24 15:48     ` Jacob Pan
2020-06-23 17:03 ` [PATCH v3 4/5] iommu/uapi: Handle data and argsz filled by users Jacob Pan
2020-06-24  6:54   ` Lu Baolu
2020-06-24 17:07     ` Jacob Pan
2020-06-25  7:07       ` Lu Baolu
2020-06-23 17:03 ` [PATCH v3 5/5] iommu/uapi: Support both kernel and user unbind guest PASID Jacob Pan
2020-06-24  7:55   ` Lu Baolu
2020-06-25 12:59   ` Lu Baolu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).