iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/5] IOMMU user API enhancement
@ 2020-07-07 23:43 Jacob Pan
  2020-07-07 23:43 ` [PATCH v4 1/5] docs: IOMMU user API Jacob Pan
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Jacob Pan @ 2020-07-07 23:43 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Christoph Hellwig, David Woodhouse

IOMMU user API header was introduced to support nested DMA translation and
related fault handling. The current UAPI data structures consist of three
areas that cover the interactions between host kernel and guest:
 - fault handling
 - cache invalidation
 - bind guest page tables, i.e. guest PASID

Future extensions are likely to support more architectures and vIOMMU features.

In the previous discussion, using user-filled data size and feature flags is
made a preferred approach over a unified version number.
https://lkml.org/lkml/2020/1/29/45

In addition to introduce argsz field to data structures, this patchset is also
trying to document the UAPI design, usage, and extension rules. VT-d driver
changes to utilize the new argsz field is included, VFIO usage is to follow.

Thanks,

Jacob


Changeog:
v4
	- Added checks of UAPI data for reserved fields, version, and flags.
	- Removed version check from vendor driver (vt-d)
	- Relaxed argsz check to match the UAPI struct size instead of variable
	  union size
	- Updated documentation

v3:
	- Rewrote backward compatibility rule to support existing code
	  re-compiled with newer kernel UAPI header that runs on older
	  kernel. Based on review comment from Alex W.
	  https://lore.kernel.org/linux-iommu/20200611094741.6d118fa8@w520.home/
	- Take user pointer directly in UAPI functions. Perform argsz check
	  and copy_from_user() in IOMMU driver. Eliminate the need for
	  VFIO or other upper layer to parse IOMMU data.
	- Create wrapper function for in-kernel users of UAPI functions
v2:
	- Removed unified API version and helper
	- Introduced argsz for each UAPI data
	- Introduced UAPI doc

Jacob Pan (5):
  docs: IOMMU user API
  iommu/uapi: Add argsz for user filled data
  iommu/uapi: Use named union for user data
  iommu/uapi: Handle data and argsz filled by users
  iommu/vt-d: Remove UAPI version check

 Documentation/userspace-api/iommu.rst | 312 ++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/iommu.c           |  27 ++-
 drivers/iommu/intel/svm.c             |   5 +-
 drivers/iommu/iommu.c                 | 208 ++++++++++++++++++++++-
 include/linux/iommu.h                 |   9 +-
 include/uapi/linux/iommu.h            |  10 +-
 6 files changed, 541 insertions(+), 30 deletions(-)
 create mode 100644 Documentation/userspace-api/iommu.rst

-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v4 1/5] docs: IOMMU user API
  2020-07-07 23:43 [PATCH v4 0/5] IOMMU user API enhancement Jacob Pan
@ 2020-07-07 23:43 ` Jacob Pan
  2020-07-08  2:07   ` Lu Baolu
  2020-07-13 22:48   ` Alex Williamson
  2020-07-07 23:43 ` [PATCH v4 2/5] iommu/uapi: Add argsz for user filled data Jacob Pan
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 16+ messages in thread
From: Jacob Pan @ 2020-07-07 23:43 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Christoph Hellwig, David Woodhouse

IOMMU UAPI is newly introduced to support communications between guest
virtual IOMMU and host IOMMU. There has been lots of discussions on how
it should work with VFIO UAPI and userspace in general.

This document is indended to clarify the UAPI design and usage. The
mechenics of how future extensions should be achieved are also covered
in this documentation.

Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 Documentation/userspace-api/iommu.rst | 312 ++++++++++++++++++++++++++++++++++
 1 file changed, 312 insertions(+)
 create mode 100644 Documentation/userspace-api/iommu.rst

diff --git a/Documentation/userspace-api/iommu.rst b/Documentation/userspace-api/iommu.rst
new file mode 100644
index 000000000000..581b462c2cec
--- /dev/null
+++ b/Documentation/userspace-api/iommu.rst
@@ -0,0 +1,312 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. iommu:
+
+=====================================
+IOMMU Userspace API
+=====================================
+
+IOMMU UAPI is used for virtualization cases where communications are
+needed between physical and virtual IOMMU drivers. For native
+usage, IOMMU is a system device which does not need to communicate
+with user space directly.
+
+The primary use cases are guest Shared Virtual Address (SVA) and
+guest IO virtual address (IOVA), wherein a virtual IOMMU (vIOMMU) is
+required to communicate with the physical IOMMU in the host.
+
+.. contents:: :local:
+
+Functionalities
+===============
+Communications of user and kernel involve both directions. The
+supported user-kernel APIs are as follows:
+
+1. Alloc/Free PASID
+2. Bind/unbind guest PASID (e.g. Intel VT-d)
+3. Bind/unbind guest PASID table (e.g. ARM sMMU)
+4. Invalidate IOMMU caches
+5. Service page requests
+
+Requirements
+============
+The IOMMU UAPIs are generic and extensible to meet the following
+requirements:
+
+1. Emulated and para-virtualised vIOMMUs
+2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
+3. Extensions to the UAPI shall not break existing user space
+
+Interfaces
+==========
+Although the data structures defined in IOMMU UAPI are self-contained,
+there is no user API functions introduced. Instead, IOMMU UAPI is
+designed to work with existing user driver frameworks such as VFIO.
+
+Extension Rules & Precautions
+-----------------------------
+When IOMMU UAPI gets extended, the data structures can *only* be
+modified in two ways:
+
+1. Adding new fields by re-purposing the padding[] field. No size change.
+2. Adding new union members at the end. May increase in size.
+
+No new fields can be added *after* the variable sized union in that it
+will break backward compatibility when offset moves. In both cases, a
+new flag must be accompanied with a new field such that the IOMMU
+driver can process the data based on the new flag. Version field is
+only reserved for the unlikely event of UAPI upgrade at its entirety.
+
+It's *always* the caller's responsibility to indicate the size of the
+structure passed by setting argsz appropriately.
+Though at the same time, argsz is user provided data which is not
+trusted. The argsz field allows the user to indicate how much data
+they're providing, it's still the kernel's responsibility to validate
+whether it's correct and sufficient for the requested operation.
+
+Compatibility Checking
+----------------------
+When IOMMU UAPI extension results in size increase, user such as VFIO
+has to handle the following cases:
+
+1. User and kernel has exact size match
+2. An older user with older kernel header (smaller UAPI size) running on a
+   newer kernel (larger UAPI size)
+3. A newer user with newer kernel header (larger UAPI size) running
+   on an older kernel.
+4. A malicious/misbehaving user pass illegal/invalid size but within
+   range. The data may contain garbage.
+
+Feature Checking
+----------------
+While launching a guest with vIOMMU, it is important to ensure that host
+can support the UAPI data structures to be used for vIOMMU-pIOMMU
+communications. Without upfront compatibility checking, future faults
+are difficult to report even in normal conditions. For example, TLB
+invalidations should always succeed. There is no architectural way to
+report back to the vIOMMU if the UAPI data is incompatible. If that
+happens, in order to protect IOMMU iosolation guarantee, we have to
+resort to not giving completion status in vIOMMU. This may result in
+VM hang.
+
+For this reason the following IOMMU UAPIs cannot fail:
+
+1. Free PASID
+2. Unbind guest PASID
+3. Unbind guest PASID table (SMMU)
+4. Cache invalidate
+
+User applications such as QEMU is expected to import kernel UAPI
+headers. Backward compatibility is supported per feature flags.
+For example, an older QEMU (with older kernel header) can run on newer
+kernel. Newer QEMU (with new kernel header) may refuse to initialize
+on an older kernel if new feature flags are not supported by older
+kernel. Simply recompile existing code with newer kernel header should
+not be an issue in that only existing flags are used.
+
+IOMMU vendor driver should report the below features to IOMMU UAPI
+consumers (e.g. via VFIO).
+
+1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
+2. IOMMU_NESTING_FEAT_BIND_PGTBL
+3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
+4. IOMMU_NESTING_FEAT_CACHE_INVLD
+5. IOMMU_NESTING_FEAT_PAGE_REQUEST
+
+Take VFIO as example, upon request from VFIO user space (e.g. QEMU),
+VFIO kernel code shall query IOMMU vendor driver for the support of
+the above features. Query result can then be reported back to the
+user-space caller. Details can be found in
+Documentation/driver-api/vfio.rst.
+
+
+Data Passing Example with VFIO
+------------------------------
+As the ubiquitous userspace driver framework, VFIO is already IOMMU
+aware and share many key concepts such as device model, group, and
+protection domain. Other user driver frameworks can also be extended
+to support IOMMU UAPI but it is outside the scope of this document.
+
+In this tight-knit VFIO-IOMMU interface, the ultimate consumer of the
+IOMMU UAPI data is the host IOMMU driver. VFIO facilitates user-kernel
+transport, capability checking, security, and life cycle management of
+process address space ID (PASID).
+
+Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU driver is the
+ultimate consumer of its UAPI data. At VFIO layer, the IOMMU UAPI data
+is wrapped in a VFIO UAPI data. It follows the
+pattern below::
+
+   struct {
+	__u32 argsz;
+	__u32 flags;
+	__u8  data[];
+   };
+
+Here data[] contains the IOMMU UAPI data structures. VFIO has the
+freedom to bundle the data as well as parse data size based on its own flags.
+
+In order to determine the size and feature set of the user data, argsz
+and flags are also embedded in the IOMMU UAPI data structures.
+A "__u32 argsz" field is *always* at the beginning of each structure.
+
+For example:
+::
+
+   struct iommu_cache_invalidate_info {
+	__u32	argsz;
+	#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
+	__u32	version;
+	/* IOMMU paging structure cache */
+	#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /* IOMMU IOTLB */
+	#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /* Device IOTLB */
+	#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /* PASID cache */
+	#define IOMMU_CACHE_INV_TYPE_NR		(3)
+	__u8	cache;
+	__u8	granularity;
+	__u8	padding[2];
+	union {
+		struct iommu_inv_pasid_info pasid_info;
+		struct iommu_inv_addr_info addr_info;
+	} granu;
+   };
+
+VFIO is responsible for checking its own argsz and flags then invokes
+appropriate IOMMU UAPI functions. User pointer is passed to IOMMU
+layer for further processing. The responsibilities are divided as
+follows:
+
+- Generic IOMMU layer checks argsz range and override out-of-range
+  value.
+
+- Generic IOMMU layer checks content of the UAPI data for non-zero
+  reserved bits in flags, padding fields, and unsupported version.
+  This is to ensure not breaking userspace in the future when these
+  fields or flags are used.
+
+- Vendor IOMMU driver checks argsz based on vendor flags, UAPI data
+  is consumed based on flags
+
+Once again, use guest TLB invalidation as an example, argsz is based
+on generic flags in the invalidation information. IOMMU generic code
+shall process the UAPI data as the following:
+
+::
+
+ static int iommu_check_cache_invl_data(struct iommu_cache_invalidate_info *info)
+ {
+	int ret = 0;
+	u32 mask;
+
+	if (info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+		return -EINVAL;
+
+	mask =  IOMMU_CACHE_INV_TYPE_IOTLB |
+		IOMMU_CACHE_INV_TYPE_DEV_IOTLB |
+		IOMMU_CACHE_INV_TYPE_PASID;
+	if (info->cache & ~mask) {
+		pr_warn_ratelimited("Invalid cache types %x\n", info->cache);
+		return -EINVAL;
+	}
+
+	if (info->granularity >= IOMMU_INV_GRANU_NR) {
+		pr_warn_ratelimited("Invalid cache invalidation granu %x\n",
+				info->granularity);
+		return -EINVAL;
+	}
+
+	switch (info->granularity) {
+	case IOMMU_INV_GRANU_ADDR:
+		mask = IOMMU_INV_ADDR_FLAGS_PASID |
+			IOMMU_INV_ADDR_FLAGS_ARCHID |
+			IOMMU_INV_ADDR_FLAGS_LEAF;
+
+		if (info->granu.addr_info.flags & ~mask) {
+			pr_warn_ratelimited("Unsupported invalidation addr flags %x\n",
+					info->granu.addr_info.flags);
+			ret = -EINVAL;
+		}
+		break;
+	case IOMMU_INV_GRANU_PASID:
+		mask = IOMMU_INV_PASID_FLAGS_PASID |
+			IOMMU_INV_PASID_FLAGS_ARCHID;
+		if (info->granu.pasid_info.flags & ~mask) {
+			pr_warn_ratelimited("Unsupported invalidation PASID flags%x\n",
+					info->granu.pasid_info.flags);
+			ret = -EINVAL;
+		}
+		break;
+	}
+
+	if (info->padding[0] || info->padding[1]) {
+		pr_warn_ratelimited("Non-zero reserved fields\n");
+		ret = -EINVAL;
+	}
+
+	return ret;
+ }
+
+ int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
+			   void __user *uinfo)
+ {
+	struct iommu_cache_invalidate_info inv_info;
+	unsigned long minsz, maxsz;
+	int ret = 0;
+
+	if (unlikely(!domain->ops->cache_invalidate))
+		return -ENODEV;
+
+	/* Current kernel data size is the max to be copied from user */
+	maxsz = sizeof(struct iommu_cache_invalidate_info);
+	memset((void *)&inv_info, 0, maxsz);
+
+	/*
+	 * No new spaces can be added before the variable sized union, the
+	 * minimum size is the offset to the union.
+	 */
+	minsz = offsetof(struct iommu_cache_invalidate_info, granu);
+
+	/* Copy minsz from user to get flags and argsz */
+	if (copy_from_user(&inv_info, uinfo, minsz))
+		return -EFAULT;
+
+	/* Fields before variable size union is mandatory */
+	if (inv_info.argsz < minsz)
+		return -EINVAL;
+
+	/*
+	 * User might be using a newer UAPI header which has a larger data
+	 * size, we shall support the existing flags within the current
+	 * size.
+	 */
+	if (inv_info.argsz > maxsz)
+		inv_info.argsz = maxsz;
+
+	/* Copy the remaining user data _after_ minsz */
+	if (copy_from_user((void *)&inv_info + minsz, uinfo + minsz,
+				inv_info.argsz - minsz))
+		return -EFAULT;
+
+	/* Now the argsz is validated, check the content for reserved bits */
+	ret = iommu_check_cache_invl_data(&inv_info);
+	if (ret)
+		return ret;
+
+	return domain->ops->cache_invalidate(domain, dev, &inv_info);
+ }
+
+Notice that in this example, since union size is determined by generic
+flags, all checking to argsz is validated in the generic IOMMU layer,
+vendor driver does not need to check argsz.
+
+For UAPIs that are shared with in-kernel users, a wrapper function
+is provided to distinguish the callers. For example,
+
+Userspace caller ::
+
+  int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
+  void __user *udata)
+
+In-kernel caller ::
+
+  int __iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
+  struct iommu_gpasid_bind_data *data)
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 2/5] iommu/uapi: Add argsz for user filled data
  2020-07-07 23:43 [PATCH v4 0/5] IOMMU user API enhancement Jacob Pan
  2020-07-07 23:43 ` [PATCH v4 1/5] docs: IOMMU user API Jacob Pan
@ 2020-07-07 23:43 ` Jacob Pan
  2020-07-07 23:43 ` [PATCH v4 3/5] iommu/uapi: Use named union for user data Jacob Pan
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Jacob Pan @ 2020-07-07 23:43 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Christoph Hellwig, David Woodhouse

As IOMMU UAPI gets extended, user data size may increase. To support
backward compatibiliy, this patch introduces a size field to each UAPI
data structures. It is *always* the responsibility for the user to fill in
the correct size.

Specific scenarios for user data handling are documented in:
Documentation/userspace-api/iommu.rst

Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 include/uapi/linux/iommu.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index e907b7091a46..303f148a5cd7 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -135,6 +135,7 @@ enum iommu_page_response_code {
 
 /**
  * struct iommu_page_response - Generic page response information
+ * @argsz: User filled size of this data
  * @version: API version of this structure
  * @flags: encodes whether the corresponding fields are valid
  *         (IOMMU_FAULT_PAGE_RESPONSE_* values)
@@ -143,6 +144,7 @@ enum iommu_page_response_code {
  * @code: response code from &enum iommu_page_response_code
  */
 struct iommu_page_response {
+	__u32	argsz;
 #define IOMMU_PAGE_RESP_VERSION_1	1
 	__u32	version;
 #define IOMMU_PAGE_RESP_PASID_VALID	(1 << 0)
@@ -218,6 +220,7 @@ struct iommu_inv_pasid_info {
 /**
  * struct iommu_cache_invalidate_info - First level/stage invalidation
  *     information
+ * @argsz: User filled size of this data
  * @version: API version of this structure
  * @cache: bitfield that allows to select which caches to invalidate
  * @granularity: defines the lowest granularity used for the invalidation:
@@ -246,6 +249,7 @@ struct iommu_inv_pasid_info {
  * must support the used granularity.
  */
 struct iommu_cache_invalidate_info {
+	__u32	argsz;
 #define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
 	__u32	version;
 /* IOMMU paging structure cache */
@@ -292,6 +296,7 @@ struct iommu_gpasid_bind_data_vtd {
 
 /**
  * struct iommu_gpasid_bind_data - Information about device and guest PASID binding
+ * @argsz:	User filled size of this data
  * @version:	Version of this data structure
  * @format:	PASID table entry format
  * @flags:	Additional information on guest bind request
@@ -309,6 +314,7 @@ struct iommu_gpasid_bind_data_vtd {
  * PASID to host PASID based on this bind data.
  */
 struct iommu_gpasid_bind_data {
+	__u32 argsz;
 #define IOMMU_GPASID_BIND_VERSION_1	1
 	__u32 version;
 #define IOMMU_PASID_FORMAT_INTEL_VTD	1
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 3/5] iommu/uapi: Use named union for user data
  2020-07-07 23:43 [PATCH v4 0/5] IOMMU user API enhancement Jacob Pan
  2020-07-07 23:43 ` [PATCH v4 1/5] docs: IOMMU user API Jacob Pan
  2020-07-07 23:43 ` [PATCH v4 2/5] iommu/uapi: Add argsz for user filled data Jacob Pan
@ 2020-07-07 23:43 ` Jacob Pan
  2020-07-08  2:17   ` Lu Baolu
  2020-07-07 23:43 ` [PATCH v4 4/5] iommu/uapi: Handle data and argsz filled by users Jacob Pan
  2020-07-07 23:43 ` [PATCH v4 5/5] iommu/vt-d: Remove UAPI version check Jacob Pan
  4 siblings, 1 reply; 16+ messages in thread
From: Jacob Pan @ 2020-07-07 23:43 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Christoph Hellwig, David Woodhouse

IOMMU UAPI data size is filled by the user space which must be validated
by ther kernel. To ensure backward compatibility, user data can only be
extended by either re-purpose padding bytes or extend the variable sized
union at the end. No size change is allowed before the union. Therefore,
the minimum size is the offset of the union.

To use offsetof() on the union, we must make it named.

Link: https://lkml.org/lkml/2020/6/11/834
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 24 ++++++++++++------------
 drivers/iommu/intel/svm.c   |  2 +-
 include/uapi/linux/iommu.h  |  4 ++--
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 50fc62413a35..abd70b618a3f 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5409,8 +5409,8 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 
 	/* Size is only valid in address selective invalidation */
 	if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
-		size = to_vtd_size(inv_info->addr_info.granule_size,
-				   inv_info->addr_info.nb_granules);
+		size = to_vtd_size(inv_info->granu.addr_info.granule_size,
+				   inv_info->granu.addr_info.nb_granules);
 
 	for_each_set_bit(cache_type,
 			 (unsigned long *)&inv_info->cache,
@@ -5431,20 +5431,20 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 		 * granularity.
 		 */
 		if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
-		    (inv_info->pasid_info.flags & IOMMU_INV_PASID_FLAGS_PASID))
-			pasid = inv_info->pasid_info.pasid;
+		    (inv_info->granu.pasid_info.flags & IOMMU_INV_PASID_FLAGS_PASID))
+			pasid = inv_info->granu.pasid_info.pasid;
 		else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
-			 (inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_PASID))
-			pasid = inv_info->addr_info.pasid;
+			 (inv_info->granu.addr_info.flags & IOMMU_INV_ADDR_FLAGS_PASID))
+			pasid = inv_info->granu.addr_info.pasid;
 
 		switch (BIT(cache_type)) {
 		case IOMMU_CACHE_INV_TYPE_IOTLB:
 			/* HW will ignore LSB bits based on address mask */
 			if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
 			    size &&
-			    (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
+			    (inv_info->granu.addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
 				WARN_ONCE(1, "Address out of range, 0x%llx, size order %llu\n",
-					  inv_info->addr_info.addr, size);
+					  inv_info->granu.addr_info.addr, size);
 			}
 
 			/*
@@ -5452,9 +5452,9 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 			 * We use npages = -1 to indicate that.
 			 */
 			qi_flush_piotlb(iommu, did, pasid,
-					mm_to_dma_pfn(inv_info->addr_info.addr),
+					mm_to_dma_pfn(inv_info->granu.addr_info.addr),
 					(granu == QI_GRAN_NONG_PASID) ? -1 : 1 << size,
-					inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
+					inv_info->granu.addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
 
 			if (!info->ats_enabled)
 				break;
@@ -5475,13 +5475,13 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 				size = 64 - VTD_PAGE_SHIFT;
 				addr = 0;
 			} else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
-				addr = inv_info->addr_info.addr;
+				addr = inv_info->granu.addr_info.addr;
 
 			if (info->ats_enabled)
 				qi_flush_dev_iotlb_pasid(iommu, sid,
 						info->pfsid, pasid,
 						info->ats_qdep,
-						inv_info->addr_info.addr,
+						inv_info->granu.addr_info.addr,
 						size);
 			else
 				pr_warn_ratelimited("Passdown device IOTLB flush w/o ATS!\n");
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index d386853121a2..713b3a218483 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -338,7 +338,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 	spin_lock(&iommu->lock);
 	ret = intel_pasid_setup_nested(iommu, dev,
 				       (pgd_t *)(uintptr_t)data->gpgd,
-				       data->hpasid, &data->vtd, dmar_domain,
+				       data->hpasid, &data->vendor.vtd, dmar_domain,
 				       data->addr_width);
 	spin_unlock(&iommu->lock);
 	if (ret) {
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 303f148a5cd7..1afc6610b0ad 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -263,7 +263,7 @@ struct iommu_cache_invalidate_info {
 	union {
 		struct iommu_inv_pasid_info pasid_info;
 		struct iommu_inv_addr_info addr_info;
-	};
+	} granu;
 };
 
 /**
@@ -329,7 +329,7 @@ struct iommu_gpasid_bind_data {
 	/* Vendor specific data */
 	union {
 		struct iommu_gpasid_bind_data_vtd vtd;
-	};
+	} vendor;
 };
 
 #endif /* _UAPI_IOMMU_H */
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 4/5] iommu/uapi: Handle data and argsz filled by users
  2020-07-07 23:43 [PATCH v4 0/5] IOMMU user API enhancement Jacob Pan
                   ` (2 preceding siblings ...)
  2020-07-07 23:43 ` [PATCH v4 3/5] iommu/uapi: Use named union for user data Jacob Pan
@ 2020-07-07 23:43 ` Jacob Pan
  2020-07-07 23:43 ` [PATCH v4 5/5] iommu/vt-d: Remove UAPI version check Jacob Pan
  4 siblings, 0 replies; 16+ messages in thread
From: Jacob Pan @ 2020-07-07 23:43 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Christoph Hellwig, David Woodhouse

IOMMU UAPI data has a user filled argsz field which indicates the data
length comes with the API call. User data is not trusted, argsz must be
validated based on the current kernel data size, mandatory data size,
and feature flags.

User data may also be extended, results in possible argsz increase.
Backward compatibility is ensured based on size and flags checking.

This patch adds sanity checks in the IOMMU layer. In addition to argsz,
reserved/unused fields in padding, flags, and version are also checked.
Details are documented in Documentation/userspace-api/iommu.rst

Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/iommu.c | 208 ++++++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/iommu.h |   9 ++-
 2 files changed, 206 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d43120eb1dc5..7910249f5dd7 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1950,33 +1950,225 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device);
 
+/*
+ * Check flags and other user privided data for valid combinations. We also
+ * make sure no reserved fields or unused flags are not set. This is to ensure
+ * not breaking userspace in the future when these fields or flags are used.
+ */
+static int iommu_check_cache_invl_data(struct iommu_cache_invalidate_info *info)
+{
+	int ret = 0;
+	u32 mask;
+
+	if (info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+		return -EINVAL;
+
+	mask =  IOMMU_CACHE_INV_TYPE_IOTLB |
+		IOMMU_CACHE_INV_TYPE_DEV_IOTLB |
+		IOMMU_CACHE_INV_TYPE_PASID;
+	if (info->cache & ~mask) {
+		pr_warn_ratelimited("Invalid cache types %x\n", info->cache);
+		return -EINVAL;
+	}
+
+	if (info->granularity >= IOMMU_INV_GRANU_NR) {
+		pr_warn_ratelimited("Invalid cache invalidation granu %x\n",
+				info->granularity);
+		return -EINVAL;
+	}
+
+	switch (info->granularity) {
+	case IOMMU_INV_GRANU_ADDR:
+		mask = IOMMU_INV_ADDR_FLAGS_PASID |
+			IOMMU_INV_ADDR_FLAGS_ARCHID |
+			IOMMU_INV_ADDR_FLAGS_LEAF;
+
+		if (info->granu.addr_info.flags & ~mask) {
+			pr_warn_ratelimited("Unsupported invalidation addr flags %x\n",
+					info->granu.addr_info.flags);
+			ret = -EINVAL;
+		}
+		break;
+	case IOMMU_INV_GRANU_PASID:
+		mask = IOMMU_INV_PASID_FLAGS_PASID |
+			IOMMU_INV_PASID_FLAGS_ARCHID;
+		if (info->granu.pasid_info.flags & ~mask) {
+			pr_warn_ratelimited("Unsupported invalidation PASID flags%x\n",
+					info->granu.pasid_info.flags);
+			ret = -EINVAL;
+		}
+		break;
+	}
+
+	if (info->padding[0] || info->padding[1]) {
+		pr_warn_ratelimited("Non-zero reserved fields\n");
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
 int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
-			   struct iommu_cache_invalidate_info *inv_info)
+			   void __user *uinfo)
 {
+	struct iommu_cache_invalidate_info inv_info;
+	unsigned long minsz, maxsz;
+	int ret = 0;
+
 	if (unlikely(!domain->ops->cache_invalidate))
 		return -ENODEV;
 
-	return domain->ops->cache_invalidate(domain, dev, inv_info);
+	/* Current kernel data size is the max to be copied from user */
+	maxsz = sizeof(struct iommu_cache_invalidate_info);
+	memset((void *)&inv_info, 0, maxsz);
+
+	/*
+	 * No new spaces can be added before the variable sized union, the
+	 * minimum size is the offset to the union.
+	 */
+	minsz = offsetof(struct iommu_cache_invalidate_info, granu);
+
+	/* Copy minsz from user to get flags and argsz */
+	if (copy_from_user(&inv_info, uinfo, minsz))
+		return -EFAULT;
+
+	/* Fields before variable size union is mandatory */
+	if (inv_info.argsz < minsz)
+		return -EINVAL;
+
+	/*
+	 * User might be using a newer UAPI header which has a larger data
+	 * size, we shall support the existing flags within the current
+	 * size.
+	 */
+	if (inv_info.argsz > maxsz)
+		inv_info.argsz = maxsz;
+
+	/* Copy the remaining user data _after_ minsz */
+	if (copy_from_user((void *)&inv_info + minsz, uinfo + minsz,
+				inv_info.argsz - minsz))
+		return -EFAULT;
+
+	/* Now the argsz is validated, check the content */
+	ret = iommu_check_cache_invl_data(&inv_info);
+	if (ret)
+		return ret;
+
+	return domain->ops->cache_invalidate(domain, dev, &inv_info);
 }
 EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
 
-int iommu_sva_bind_gpasid(struct iommu_domain *domain,
-			   struct device *dev, struct iommu_gpasid_bind_data *data)
+
+static int iommu_check_bind_data(struct iommu_gpasid_bind_data *data)
 {
+	u32 mask;
+	int i;
+
+	if (data->version != IOMMU_GPASID_BIND_VERSION_1)
+		return -EINVAL;
+
+	/* Check all supported format, for now just VT-d */
+	mask = IOMMU_PASID_FORMAT_INTEL_VTD;
+	if (data->format & ~mask)
+		return -EINVAL;
+
+	/* Check all flags */
+	mask = IOMMU_SVA_GPASID_VAL;
+	if (data->flags & ~mask)
+		return -EINVAL;
+
+	/* Check reserved padding fields */
+	for (i = 0; i < 12; i++) {
+		if (data->padding[i]) {
+			pr_warn_ratelimited("Non-zero reserved field\n");
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+static int iommu_sva_prepare_bind_data(void __user *udata,
+				       struct iommu_gpasid_bind_data *data)
+{
+	unsigned long minsz, maxsz;
+
+	/* Current kernel data size is the max to be copied from user */
+	maxsz = sizeof(struct iommu_gpasid_bind_data);
+	memset((void *)data, 0, maxsz);
+
+	/*
+	 * No new spaces can be added before the variable sized union, the
+	 * minimum size is the offset to the union.
+	 */
+	minsz = offsetof(struct iommu_gpasid_bind_data, vendor);
+
+	/* Copy minsz from user to get flags and argsz */
+	if (copy_from_user(data, udata, minsz))
+		return -EFAULT;
+
+	/* Fields before variable size union is mandatory */
+	if (data->argsz < minsz)
+		return -EINVAL;
+	/*
+	 * User might be using a newer UAPI header, we shall let IOMMU vendor
+	 * driver decide on what size it needs. Since the guest PASID bind data
+	 * can be vendor specific, larger argsz could be the result of extension
+	 * for one vendor but it should not affect another vendor.
+	 */
+	if (data->argsz > maxsz)
+		data->argsz = maxsz;
+
+	/* Copy the remaining user data _after_ minsz */
+	if (copy_from_user((void *)data + minsz, udata + minsz,
+				data->argsz - minsz))
+		return -EFAULT;
+
+	return iommu_check_bind_data(data);
+}
+
+int iommu_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev,
+						void __user *udata)
+{
+
+	struct iommu_gpasid_bind_data data;
+	int ret;
+
 	if (unlikely(!domain->ops->sva_bind_gpasid))
 		return -ENODEV;
 
-	return domain->ops->sva_bind_gpasid(domain, dev, data);
+	ret = iommu_sva_prepare_bind_data(udata, &data);
+	if (ret)
+		return ret;
+
+	return domain->ops->sva_bind_gpasid(domain, dev, &data);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
 
-int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
-			     ioasid_t pasid)
+int __iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
+			struct iommu_gpasid_bind_data *data)
 {
 	if (unlikely(!domain->ops->sva_unbind_gpasid))
 		return -ENODEV;
 
-	return domain->ops->sva_unbind_gpasid(dev, pasid);
+	return domain->ops->sva_unbind_gpasid(dev, data->hpasid);
+}
+EXPORT_SYMBOL_GPL(__iommu_sva_unbind_gpasid);
+
+int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
+			void __user *udata)
+{
+	struct iommu_gpasid_bind_data data;
+	int ret;
+
+	if (unlikely(!domain->ops->sva_bind_gpasid))
+		return -ENODEV;
+
+	ret = iommu_sva_prepare_bind_data(udata, &data);
+	if (ret)
+		return ret;
+
+	return __iommu_sva_unbind_gpasid(domain, dev, &data);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5f0b7859d2eb..7ca9d48c276c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -432,11 +432,14 @@ extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
 extern int iommu_cache_invalidate(struct iommu_domain *domain,
 				  struct device *dev,
-				  struct iommu_cache_invalidate_info *inv_info);
+				  void __user *uinfo);
+
 extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
-		struct device *dev, struct iommu_gpasid_bind_data *data);
+				struct device *dev, void __user *udata);
 extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
-				struct device *dev, ioasid_t pasid);
+				struct device *dev, void __user *udata);
+extern int __iommu_sva_unbind_gpasid(struct iommu_domain *domain,
+				struct device *dev, struct iommu_gpasid_bind_data *data);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 5/5] iommu/vt-d: Remove UAPI version check
  2020-07-07 23:43 [PATCH v4 0/5] IOMMU user API enhancement Jacob Pan
                   ` (3 preceding siblings ...)
  2020-07-07 23:43 ` [PATCH v4 4/5] iommu/uapi: Handle data and argsz filled by users Jacob Pan
@ 2020-07-07 23:43 ` Jacob Pan
  4 siblings, 0 replies; 16+ messages in thread
From: Jacob Pan @ 2020-07-07 23:43 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Christoph Hellwig, David Woodhouse

IOMMU generic layer already does sanity checks on UAPI data which
include version check. Remove the version check from vendor driver.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 3 +--
 drivers/iommu/intel/svm.c   | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index abd70b618a3f..63fcca4645c9 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5383,8 +5383,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 	int ret = 0;
 	u64 size = 0;
 
-	if (!inv_info || !dmar_domain ||
-	    inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+	if (!inv_info || !dmar_domain)
 		return -EINVAL;
 
 	if (!dev || !dev_is_pci(dev))
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 713b3a218483..15b36fa0147a 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -240,8 +240,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 	if (WARN_ON(!iommu) || !data)
 		return -EINVAL;
 
-	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
-	    data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
+	if (data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
 		return -EINVAL;
 
 	if (!dev_is_pci(dev))
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/5] docs: IOMMU user API
  2020-07-07 23:43 ` [PATCH v4 1/5] docs: IOMMU user API Jacob Pan
@ 2020-07-08  2:07   ` Lu Baolu
  2020-07-08 15:29     ` Jacob Pan
  2020-07-13 22:48   ` Alex Williamson
  1 sibling, 1 reply; 16+ messages in thread
From: Lu Baolu @ 2020-07-08  2:07 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Christoph Hellwig, David Woodhouse

Hi,

On 7/8/20 7:43 AM, Jacob Pan wrote:
> +For UAPIs that are shared with in-kernel users, a wrapper function
> +is provided to distinguish the callers. For example,
> +
> +Userspace caller ::
> +
> +  int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
> +  void __user *udata)
> +
> +In-kernel caller ::
> +
> +  int __iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
> +  struct iommu_gpasid_bind_data *data)

iommu_page_response() may have the same needs. Can we reach an agreement
on the naming rules?

Best regards,
baolu
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 3/5] iommu/uapi: Use named union for user data
  2020-07-07 23:43 ` [PATCH v4 3/5] iommu/uapi: Use named union for user data Jacob Pan
@ 2020-07-08  2:17   ` Lu Baolu
  2020-07-08 15:18     ` Jacob Pan
  0 siblings, 1 reply; 16+ messages in thread
From: Lu Baolu @ 2020-07-08  2:17 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Christoph Hellwig, David Woodhouse

Hi Jacob,

On 7/8/20 7:43 AM, Jacob Pan wrote:
> IOMMU UAPI data size is filled by the user space which must be validated
> by ther kernel. To ensure backward compatibility, user data can only be
> extended by either re-purpose padding bytes or extend the variable sized
> union at the end. No size change is allowed before the union. Therefore,
> the minimum size is the offset of the union.
> 
> To use offsetof() on the union, we must make it named.
> 
> Link:https://lkml.org/lkml/2020/6/11/834

Please use lore.kernel.org links.

Best regards,
baolu
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 3/5] iommu/uapi: Use named union for user data
  2020-07-08  2:17   ` Lu Baolu
@ 2020-07-08 15:18     ` Jacob Pan
  0 siblings, 0 replies; 16+ messages in thread
From: Jacob Pan @ 2020-07-08 15:18 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Alex Williamson, LKML, Christoph Hellwig, iommu, David Woodhouse

On Wed, 8 Jul 2020 10:17:57 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Hi Jacob,
> 
> On 7/8/20 7:43 AM, Jacob Pan wrote:
> > IOMMU UAPI data size is filled by the user space which must be
> > validated by ther kernel. To ensure backward compatibility, user
> > data can only be extended by either re-purpose padding bytes or
> > extend the variable sized union at the end. No size change is
> > allowed before the union. Therefore, the minimum size is the offset
> > of the union.
> > 
> > To use offsetof() on the union, we must make it named.
> > 
> > Link:https://lkml.org/lkml/2020/6/11/834  
> 
> Please use lore.kernel.org links.
> 
OK. will do.

> Best regards,
> baolu

[Jacob Pan]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/5] docs: IOMMU user API
  2020-07-08  2:07   ` Lu Baolu
@ 2020-07-08 15:29     ` Jacob Pan
  2020-07-09  0:44       ` Lu Baolu
  0 siblings, 1 reply; 16+ messages in thread
From: Jacob Pan @ 2020-07-08 15:29 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	Alex Williamson, LKML, Christoph Hellwig, iommu, David Woodhouse

On Wed, 8 Jul 2020 10:07:13 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Hi,
> 
> On 7/8/20 7:43 AM, Jacob Pan wrote:
> > +For UAPIs that are shared with in-kernel users, a wrapper function
> > +is provided to distinguish the callers. For example,
> > +
> > +Userspace caller ::
> > +
> > +  int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct
> > device *dev,
> > +  void __user *udata)
> > +
> > +In-kernel caller ::
> > +
> > +  int __iommu_sva_unbind_gpasid(struct iommu_domain *domain,
> > struct device *dev,
> > +  struct iommu_gpasid_bind_data *data)  
> 
> iommu_page_response() may have the same needs. Can we reach an
> agreement on the naming rules?
> 
Yes iommu_page_response() also has to deal with in-kernel and UAPI
callers. I left it out because I thought the current VFIO and SVA common
code is not ready for PRQ yet, I might be reading old news :) argsz
need to be handled as well.

Perhaps we can wait until this set is settled? Do you have a suggestion
on the naming rules?

> Best regards,
> baolu

[Jacob Pan]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/5] docs: IOMMU user API
  2020-07-08 15:29     ` Jacob Pan
@ 2020-07-09  0:44       ` Lu Baolu
  0 siblings, 0 replies; 16+ messages in thread
From: Lu Baolu @ 2020-07-09  0:44 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	iommu, LKML, Christoph Hellwig, Alex Williamson, David Woodhouse

Hi Jacob,

On 7/8/20 11:29 PM, Jacob Pan wrote:
> On Wed, 8 Jul 2020 10:07:13 +0800
> Lu Baolu <baolu.lu@linux.intel.com> wrote:
> 
>> Hi,
>>
>> On 7/8/20 7:43 AM, Jacob Pan wrote:
>>> +For UAPIs that are shared with in-kernel users, a wrapper function
>>> +is provided to distinguish the callers. For example,
>>> +
>>> +Userspace caller ::
>>> +
>>> +  int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct
>>> device *dev,
>>> +  void __user *udata)
>>> +
>>> +In-kernel caller ::
>>> +
>>> +  int __iommu_sva_unbind_gpasid(struct iommu_domain *domain,
>>> struct device *dev,
>>> +  struct iommu_gpasid_bind_data *data)
>>
>> iommu_page_response() may have the same needs. Can we reach an
>> agreement on the naming rules?
>>
> Yes iommu_page_response() also has to deal with in-kernel and UAPI
> callers. I left it out because I thought the current VFIO and SVA common
> code is not ready for PRQ yet, I might be reading old news :) argsz
> need to be handled as well.
> 
> Perhaps we can wait until this set is settled? Do you have a suggestion
> on the naming rules?

I have no suggestion on the naming rules, just wanted to check whether
others have any preference. I agree that we can wait until this series
is settled.

Best regards,
baolu
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/5] docs: IOMMU user API
  2020-07-07 23:43 ` [PATCH v4 1/5] docs: IOMMU user API Jacob Pan
  2020-07-08  2:07   ` Lu Baolu
@ 2020-07-13 22:48   ` Alex Williamson
  2020-07-14  5:00     ` Jacob Pan
  1 sibling, 1 reply; 16+ messages in thread
From: Alex Williamson @ 2020-07-13 22:48 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Tue,  7 Jul 2020 16:43:45 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> IOMMU UAPI is newly introduced to support communications between guest
> virtual IOMMU and host IOMMU. There has been lots of discussions on how
> it should work with VFIO UAPI and userspace in general.
> 
> This document is indended to clarify the UAPI design and usage. The
> mechenics of how future extensions should be achieved are also covered

mechanics

> in this documentation.
> 
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>  Documentation/userspace-api/iommu.rst | 312 ++++++++++++++++++++++++++++++++++
>  1 file changed, 312 insertions(+)
>  create mode 100644 Documentation/userspace-api/iommu.rst
> 
> diff --git a/Documentation/userspace-api/iommu.rst b/Documentation/userspace-api/iommu.rst
> new file mode 100644
> index 000000000000..581b462c2cec
> --- /dev/null
> +++ b/Documentation/userspace-api/iommu.rst
> @@ -0,0 +1,312 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. iommu:
> +
> +=====================================
> +IOMMU Userspace API
> +=====================================
> +
> +IOMMU UAPI is used for virtualization cases where communications are
> +needed between physical and virtual IOMMU drivers. For native
> +usage, IOMMU is a system device which does not need to communicate
> +with user space directly.
> +
> +The primary use cases are guest Shared Virtual Address (SVA) and
> +guest IO virtual address (IOVA), wherein a virtual IOMMU (vIOMMU) is
> +required to communicate with the physical IOMMU in the host.
> +
> +.. contents:: :local:
> +
> +Functionalities
> +===============
> +Communications of user and kernel involve both directions. The
> +supported user-kernel APIs are as follows:
> +
> +1. Alloc/Free PASID
> +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> +4. Invalidate IOMMU caches
> +5. Service page requests
> +
> +Requirements
> +============
> +The IOMMU UAPIs are generic and extensible to meet the following
> +requirements:
> +
> +1. Emulated and para-virtualised vIOMMUs
> +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> +3. Extensions to the UAPI shall not break existing user space
> +
> +Interfaces
> +==========
> +Although the data structures defined in IOMMU UAPI are self-contained,
> +there is no user API functions introduced. Instead, IOMMU UAPI is
> +designed to work with existing user driver frameworks such as VFIO.
> +
> +Extension Rules & Precautions
> +-----------------------------
> +When IOMMU UAPI gets extended, the data structures can *only* be
> +modified in two ways:
> +
> +1. Adding new fields by re-purposing the padding[] field. No size change.
> +2. Adding new union members at the end. May increase in size.
> +
> +No new fields can be added *after* the variable sized union in that it
> +will break backward compatibility when offset moves. In both cases, a
> +new flag must be accompanied with a new field such that the IOMMU
> +driver can process the data based on the new flag. Version field is
> +only reserved for the unlikely event of UAPI upgrade at its entirety.
> +
> +It's *always* the caller's responsibility to indicate the size of the
> +structure passed by setting argsz appropriately.
> +Though at the same time, argsz is user provided data which is not
> +trusted. The argsz field allows the user to indicate how much data
> +they're providing, it's still the kernel's responsibility to validate
> +whether it's correct and sufficient for the requested operation.
> +
> +Compatibility Checking
> +----------------------
> +When IOMMU UAPI extension results in size increase, user such as VFIO
> +has to handle the following cases:
> +
> +1. User and kernel has exact size match
> +2. An older user with older kernel header (smaller UAPI size) running on a
> +   newer kernel (larger UAPI size)
> +3. A newer user with newer kernel header (larger UAPI size) running
> +   on an older kernel.
> +4. A malicious/misbehaving user pass illegal/invalid size but within
> +   range. The data may contain garbage.

I'm still not sure where VFIO has responsibility in managing any of
these cases.  I think we've determined that VFIO is just the wrapper
and call-through mechanism, it's the UAPI core implementation and
IOMMU drivers that are responsible for this.

> +
> +Feature Checking
> +----------------
> +While launching a guest with vIOMMU, it is important to ensure that host
> +can support the UAPI data structures to be used for vIOMMU-pIOMMU
> +communications. Without upfront compatibility checking, future faults
> +are difficult to report even in normal conditions. For example, TLB
> +invalidations should always succeed. There is no architectural way to
> +report back to the vIOMMU if the UAPI data is incompatible. If that
> +happens, in order to protect IOMMU iosolation guarantee, we have to
> +resort to not giving completion status in vIOMMU. This may result in
> +VM hang.
> +
> +For this reason the following IOMMU UAPIs cannot fail:
> +
> +1. Free PASID
> +2. Unbind guest PASID
> +3. Unbind guest PASID table (SMMU)
> +4. Cache invalidate
> +
> +User applications such as QEMU is expected to import kernel UAPI

s/is/are/

> +headers. Backward compatibility is supported per feature flags.
> +For example, an older QEMU (with older kernel header) can run on newer
> +kernel. Newer QEMU (with new kernel header) may refuse to initialize
> +on an older kernel if new feature flags are not supported by older
> +kernel. Simply recompile existing code with newer kernel header should

s/recompile/recompiling/

> +not be an issue in that only existing flags are used.
> +
> +IOMMU vendor driver should report the below features to IOMMU UAPI
> +consumers (e.g. via VFIO).
> +
> +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> +
> +Take VFIO as example, upon request from VFIO user space (e.g. QEMU),
> +VFIO kernel code shall query IOMMU vendor driver for the support of
> +the above features. Query result can then be reported back to the
> +user-space caller. Details can be found in
> +Documentation/driver-api/vfio.rst.
> +
> +
> +Data Passing Example with VFIO
> +------------------------------
> +As the ubiquitous userspace driver framework, VFIO is already IOMMU
> +aware and share many key concepts such as device model, group, and

s/share/shares/

> +protection domain. Other user driver frameworks can also be extended
> +to support IOMMU UAPI but it is outside the scope of this document.
> +
> +In this tight-knit VFIO-IOMMU interface, the ultimate consumer of the
> +IOMMU UAPI data is the host IOMMU driver. VFIO facilitates user-kernel
> +transport, capability checking, security, and life cycle management of
> +process address space ID (PASID).
> +
> +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU driver is the
> +ultimate consumer of its UAPI data. At VFIO layer, the IOMMU UAPI data
> +is wrapped in a VFIO UAPI data. It follows the
> +pattern below::
> +
> +   struct {
> +	__u32 argsz;
> +	__u32 flags;
> +	__u8  data[];
> +   };
> +
> +Here data[] contains the IOMMU UAPI data structures. VFIO has the
> +freedom to bundle the data as well as parse data size based on its own flags.
> +
> +In order to determine the size and feature set of the user data, argsz
> +and flags are also embedded in the IOMMU UAPI data structures.
> +A "__u32 argsz" field is *always* at the beginning of each structure.
> +
> +For example:
> +::
> +
> +   struct iommu_cache_invalidate_info {
> +	__u32	argsz;
> +	#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> +	__u32	version;
> +	/* IOMMU paging structure cache */
> +	#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /* IOMMU IOTLB */
> +	#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /* Device IOTLB */
> +	#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /* PASID cache */
> +	#define IOMMU_CACHE_INV_TYPE_NR		(3)
> +	__u8	cache;
> +	__u8	granularity;
> +	__u8	padding[2];

Now would be the right time to add more than just minimum alignment
padding for future use.  Also note that we have 4-byte alignment
leading into the union, it could be desirable to pad that out to 8-byte
alignment anyway.

> +	union {
> +		struct iommu_inv_pasid_info pasid_info;
> +		struct iommu_inv_addr_info addr_info;
> +	} granu;
> +   };
> +
> +VFIO is responsible for checking its own argsz and flags then invokes
> +appropriate IOMMU UAPI functions. User pointer is passed to IOMMU
> +layer for further processing. The responsibilities are divided as
> +follows:
> +
> +- Generic IOMMU layer checks argsz range and override out-of-range
> +  value.
> +
> +- Generic IOMMU layer checks content of the UAPI data for non-zero
> +  reserved bits in flags, padding fields, and unsupported version.
> +  This is to ensure not breaking userspace in the future when these
> +  fields or flags are used.
> +
> +- Vendor IOMMU driver checks argsz based on vendor flags, UAPI data
> +  is consumed based on flags
> +
> +Once again, use guest TLB invalidation as an example, argsz is based
> +on generic flags in the invalidation information. IOMMU generic code
> +shall process the UAPI data as the following:
> +
> +::
> +
> + static int iommu_check_cache_invl_data(struct iommu_cache_invalidate_info *info)
> + {
> +	int ret = 0;
> +	u32 mask;
> +
> +	if (info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> +		return -EINVAL;
> +
> +	mask =  IOMMU_CACHE_INV_TYPE_IOTLB |
> +		IOMMU_CACHE_INV_TYPE_DEV_IOTLB |
> +		IOMMU_CACHE_INV_TYPE_PASID;

Can TYPE_NR be used here?  ie.  ((1 << IOMMU_CACHE_INV_TYPE_NR) - 1)

> +	if (info->cache & ~mask) {
> +		pr_warn_ratelimited("Invalid cache types %x\n", info->cache);

Even ratelimited, this is too much for a user triggered error, at most
these should be some sort of debug level.  Should probably just drop
them for production.

> +		return -EINVAL;
> +	}
> +
> +	if (info->granularity >= IOMMU_INV_GRANU_NR) {
> +		pr_warn_ratelimited("Invalid cache invalidation granu %x\n",
> +				info->granularity);
> +		return -EINVAL;
> +	}
> +
> +	switch (info->granularity) {
> +	case IOMMU_INV_GRANU_ADDR:
> +		mask = IOMMU_INV_ADDR_FLAGS_PASID |
> +			IOMMU_INV_ADDR_FLAGS_ARCHID |
> +			IOMMU_INV_ADDR_FLAGS_LEAF;
> +
> +		if (info->granu.addr_info.flags & ~mask) {
> +			pr_warn_ratelimited("Unsupported invalidation addr flags %x\n",
> +					info->granu.addr_info.flags);
> +			ret = -EINVAL;

Why not return?  Inconsistent with above and unclear benefit.

> +		}
> +		break;
> +	case IOMMU_INV_GRANU_PASID:
> +		mask = IOMMU_INV_PASID_FLAGS_PASID |
> +			IOMMU_INV_PASID_FLAGS_ARCHID;
> +		if (info->granu.pasid_info.flags & ~mask) {
> +			pr_warn_ratelimited("Unsupported invalidation PASID flags%x\n",
> +					info->granu.pasid_info.flags);
> +			ret = -EINVAL;
> +		}
> +		break;
> +	}


What happened to IOMMU_INV_GRANU_DOMAIN?  Nothing to check?  Should
probably still be included with a 

> +
> +	if (info->padding[0] || info->padding[1]) {
> +		pr_warn_ratelimited("Non-zero reserved fields\n");
> +		ret = -EINVAL;
> +	}
> +
> +	return ret;
> + }
> +
> + int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
> +			   void __user *uinfo)
> + {
> +	struct iommu_cache_invalidate_info inv_info;
> +	unsigned long minsz, maxsz;
> +	int ret = 0;
> +
> +	if (unlikely(!domain->ops->cache_invalidate))
> +		return -ENODEV;
> +
> +	/* Current kernel data size is the max to be copied from user */
> +	maxsz = sizeof(struct iommu_cache_invalidate_info);
> +	memset((void *)&inv_info, 0, maxsz);

initialize as = { 0 };

> +
> +	/*
> +	 * No new spaces can be added before the variable sized union, the
> +	 * minimum size is the offset to the union.
> +	 */
> +	minsz = offsetof(struct iommu_cache_invalidate_info, granu);
> +
> +	/* Copy minsz from user to get flags and argsz */
> +	if (copy_from_user(&inv_info, uinfo, minsz))
> +		return -EFAULT;
> +
> +	/* Fields before variable size union is mandatory */
> +	if (inv_info.argsz < minsz)
> +		return -EINVAL;
> +
> +	/*
> +	 * User might be using a newer UAPI header which has a larger data
> +	 * size, we shall support the existing flags within the current
> +	 * size.
> +	 */
> +	if (inv_info.argsz > maxsz)
> +		inv_info.argsz = maxsz;

maxsz handling seems a little clunky, maybe only because this is the
documentation example?

> +
> +	/* Copy the remaining user data _after_ minsz */
> +	if (copy_from_user((void *)&inv_info + minsz, uinfo + minsz,
> +				inv_info.argsz - minsz))
> +		return -EFAULT;
> +
> +	/* Now the argsz is validated, check the content for reserved bits */
> +	ret = iommu_check_cache_invl_data(&inv_info);
> +	if (ret)
> +		return ret;
> +
> +	return domain->ops->cache_invalidate(domain, dev, &inv_info);
> + }
> +
> +Notice that in this example, since union size is determined by generic
> +flags, all checking to argsz is validated in the generic IOMMU layer,
> +vendor driver does not need to check argsz.

Not true.  What if the user provided argsz = minsz and the operation
requires an entry in the granu union?  The vendor driver needs to check
that argsz was _at_least_ sufficient to provide that entry.  The
mangling of the user provided argsz above makes me cringe a little too
for that reason, once we start modifying the user values in the core it
could get messy for the vendor drivers.

> +
> +For UAPIs that are shared with in-kernel users, a wrapper function
> +is provided to distinguish the callers. For example,
> +
> +Userspace caller ::
> +
> +  int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
> +  void __user *udata)
> +
> +In-kernel caller ::
> +
> +  int __iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
> +  struct iommu_gpasid_bind_data *data)

Maybe just prefix with iommu_uapi rather than underscores?  Underscore
prefixes usually imply a locking requirement or other reasons to tread
carefully whereas this is just the internal API.  Thanks,

Alex

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/5] docs: IOMMU user API
  2020-07-13 22:48   ` Alex Williamson
@ 2020-07-14  5:00     ` Jacob Pan
  2020-07-14 19:04       ` Alex Williamson
  0 siblings, 1 reply; 16+ messages in thread
From: Jacob Pan @ 2020-07-14  5:00 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

Hi Alex,

On Mon, 13 Jul 2020 16:48:42 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Tue,  7 Jul 2020 16:43:45 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
> > IOMMU UAPI is newly introduced to support communications between
> > guest virtual IOMMU and host IOMMU. There has been lots of
> > discussions on how it should work with VFIO UAPI and userspace in
> > general.
> > 
> > This document is indended to clarify the UAPI design and usage. The
> > mechenics of how future extensions should be achieved are also
> > covered  
> 
> mechanics
> 
will fix.

> > in this documentation.
> > 
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >  Documentation/userspace-api/iommu.rst | 312
> > ++++++++++++++++++++++++++++++++++ 1 file changed, 312 insertions(+)
> >  create mode 100644 Documentation/userspace-api/iommu.rst
> > 
> > diff --git a/Documentation/userspace-api/iommu.rst
> > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > index 000000000000..581b462c2cec
> > --- /dev/null
> > +++ b/Documentation/userspace-api/iommu.rst
> > @@ -0,0 +1,312 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +.. iommu:
> > +
> > +=====================================
> > +IOMMU Userspace API
> > +=====================================
> > +
> > +IOMMU UAPI is used for virtualization cases where communications
> > are +needed between physical and virtual IOMMU drivers. For native
> > +usage, IOMMU is a system device which does not need to communicate
> > +with user space directly.
> > +
> > +The primary use cases are guest Shared Virtual Address (SVA) and
> > +guest IO virtual address (IOVA), wherein a virtual IOMMU (vIOMMU)
> > is +required to communicate with the physical IOMMU in the host.
> > +
> > +.. contents:: :local:
> > +
> > +Functionalities
> > +===============
> > +Communications of user and kernel involve both directions. The
> > +supported user-kernel APIs are as follows:
> > +
> > +1. Alloc/Free PASID
> > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > +4. Invalidate IOMMU caches
> > +5. Service page requests
> > +
> > +Requirements
> > +============
> > +The IOMMU UAPIs are generic and extensible to meet the following
> > +requirements:
> > +
> > +1. Emulated and para-virtualised vIOMMUs
> > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > +3. Extensions to the UAPI shall not break existing user space
> > +
> > +Interfaces
> > +==========
> > +Although the data structures defined in IOMMU UAPI are
> > self-contained, +there is no user API functions introduced.
> > Instead, IOMMU UAPI is +designed to work with existing user driver
> > frameworks such as VFIO. +
> > +Extension Rules & Precautions
> > +-----------------------------
> > +When IOMMU UAPI gets extended, the data structures can *only* be
> > +modified in two ways:
> > +
> > +1. Adding new fields by re-purposing the padding[] field. No size
> > change. +2. Adding new union members at the end. May increase in
> > size. +
> > +No new fields can be added *after* the variable sized union in
> > that it +will break backward compatibility when offset moves. In
> > both cases, a +new flag must be accompanied with a new field such
> > that the IOMMU +driver can process the data based on the new flag.
> > Version field is +only reserved for the unlikely event of UAPI
> > upgrade at its entirety. +
> > +It's *always* the caller's responsibility to indicate the size of
> > the +structure passed by setting argsz appropriately.
> > +Though at the same time, argsz is user provided data which is not
> > +trusted. The argsz field allows the user to indicate how much data
> > +they're providing, it's still the kernel's responsibility to
> > validate +whether it's correct and sufficient for the requested
> > operation. +
> > +Compatibility Checking
> > +----------------------
> > +When IOMMU UAPI extension results in size increase, user such as
> > VFIO +has to handle the following cases:
> > +
> > +1. User and kernel has exact size match
> > +2. An older user with older kernel header (smaller UAPI size)
> > running on a
> > +   newer kernel (larger UAPI size)
> > +3. A newer user with newer kernel header (larger UAPI size) running
> > +   on an older kernel.
> > +4. A malicious/misbehaving user pass illegal/invalid size but
> > within
> > +   range. The data may contain garbage.  
> 
> I'm still not sure where VFIO has responsibility in managing any of
> these cases.  I think we've determined that VFIO is just the wrapper
> and call-through mechanism, it's the UAPI core implementation and
> IOMMU drivers that are responsible for this.
> 
That is right, I shall rewrite the responsibility to be held by IOMMU
core.
"When IOMMU UAPI extension results in size increase, IOMMU UAPI core
shall handle the following cases:"

> > +
> > +Feature Checking
> > +----------------
> > +While launching a guest with vIOMMU, it is important to ensure
> > that host +can support the UAPI data structures to be used for
> > vIOMMU-pIOMMU +communications. Without upfront compatibility
> > checking, future faults +are difficult to report even in normal
> > conditions. For example, TLB +invalidations should always succeed.
> > There is no architectural way to +report back to the vIOMMU if the
> > UAPI data is incompatible. If that +happens, in order to protect
> > IOMMU iosolation guarantee, we have to +resort to not giving
> > completion status in vIOMMU. This may result in +VM hang.
> > +
> > +For this reason the following IOMMU UAPIs cannot fail:
> > +
> > +1. Free PASID
> > +2. Unbind guest PASID
> > +3. Unbind guest PASID table (SMMU)
> > +4. Cache invalidate
> > +
> > +User applications such as QEMU is expected to import kernel UAPI  
> 
> s/is/are/
> 
will fix.

> > +headers. Backward compatibility is supported per feature flags.
> > +For example, an older QEMU (with older kernel header) can run on
> > newer +kernel. Newer QEMU (with new kernel header) may refuse to
> > initialize +on an older kernel if new feature flags are not
> > supported by older +kernel. Simply recompile existing code with
> > newer kernel header should  
> 
> s/recompile/recompiling/
> 
got it.

> > +not be an issue in that only existing flags are used.
> > +
> > +IOMMU vendor driver should report the below features to IOMMU UAPI
> > +consumers (e.g. via VFIO).
> > +
> > +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> > +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> > +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> > +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> > +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> > +
> > +Take VFIO as example, upon request from VFIO user space (e.g.
> > QEMU), +VFIO kernel code shall query IOMMU vendor driver for the
> > support of +the above features. Query result can then be reported
> > back to the +user-space caller. Details can be found in
> > +Documentation/driver-api/vfio.rst.
> > +
> > +
> > +Data Passing Example with VFIO
> > +------------------------------
> > +As the ubiquitous userspace driver framework, VFIO is already IOMMU
> > +aware and share many key concepts such as device model, group,
> > and  
> 
> s/share/shares/
> 
got it.

> > +protection domain. Other user driver frameworks can also be
> > extended +to support IOMMU UAPI but it is outside the scope of this
> > document. +
> > +In this tight-knit VFIO-IOMMU interface, the ultimate consumer of
> > the +IOMMU UAPI data is the host IOMMU driver. VFIO facilitates
> > user-kernel +transport, capability checking, security, and life
> > cycle management of +process address space ID (PASID).
> > +
> > +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU driver
> > is the +ultimate consumer of its UAPI data. At VFIO layer, the
> > IOMMU UAPI data +is wrapped in a VFIO UAPI data. It follows the
> > +pattern below::
> > +
> > +   struct {
> > +	__u32 argsz;
> > +	__u32 flags;
> > +	__u8  data[];
> > +   };
> > +
> > +Here data[] contains the IOMMU UAPI data structures. VFIO has the
> > +freedom to bundle the data as well as parse data size based on its
> > own flags. +
> > +In order to determine the size and feature set of the user data,
> > argsz +and flags are also embedded in the IOMMU UAPI data
> > structures. +A "__u32 argsz" field is *always* at the beginning of
> > each structure. +
> > +For example:
> > +::
> > +
> > +   struct iommu_cache_invalidate_info {
> > +	__u32	argsz;
> > +	#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> > +	__u32	version;
> > +	/* IOMMU paging structure cache */
> > +	#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /*
> > IOMMU IOTLB */
> > +	#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /*
> > Device IOTLB */
> > +	#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /*
> > PASID cache */
> > +	#define IOMMU_CACHE_INV_TYPE_NR		(3)
> > +	__u8	cache;
> > +	__u8	granularity;
> > +	__u8	padding[2];  
> 
> Now would be the right time to add more than just minimum alignment
> padding for future use.  Also note that we have 4-byte alignment
> leading into the union, it could be desirable to pad that out to
> 8-byte alignment anyway.
> 
make sense, will do padding[6]

> > +	union {
> > +		struct iommu_inv_pasid_info pasid_info;
> > +		struct iommu_inv_addr_info addr_info;
> > +	} granu;
> > +   };
> > +
> > +VFIO is responsible for checking its own argsz and flags then
> > invokes +appropriate IOMMU UAPI functions. User pointer is passed
> > to IOMMU +layer for further processing. The responsibilities are
> > divided as +follows:
> > +
> > +- Generic IOMMU layer checks argsz range and override out-of-range
> > +  value.
> > +
> > +- Generic IOMMU layer checks content of the UAPI data for non-zero
> > +  reserved bits in flags, padding fields, and unsupported version.
> > +  This is to ensure not breaking userspace in the future when these
> > +  fields or flags are used.
> > +
> > +- Vendor IOMMU driver checks argsz based on vendor flags, UAPI data
> > +  is consumed based on flags
> > +
> > +Once again, use guest TLB invalidation as an example, argsz is
> > based +on generic flags in the invalidation information. IOMMU
> > generic code +shall process the UAPI data as the following:
> > +
> > +::
> > +
> > + static int iommu_check_cache_invl_data(struct
> > iommu_cache_invalidate_info *info)
> > + {
> > +	int ret = 0;
> > +	u32 mask;
> > +
> > +	if (info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> > +		return -EINVAL;
> > +
> > +	mask =  IOMMU_CACHE_INV_TYPE_IOTLB |
> > +		IOMMU_CACHE_INV_TYPE_DEV_IOTLB |
> > +		IOMMU_CACHE_INV_TYPE_PASID;  
> 
> Can TYPE_NR be used here?  ie.  ((1 << IOMMU_CACHE_INV_TYPE_NR) - 1)
> 
much better, thanks!

> > +	if (info->cache & ~mask) {
> > +		pr_warn_ratelimited("Invalid cache types %x\n",
> > info->cache);  
> 
> Even ratelimited, this is too much for a user triggered error, at most
> these should be some sort of debug level.  Should probably just drop
> them for production.
> 
I felt a little too chatty as well. will drop, we have a lot of these.

> > +		return -EINVAL;
> > +	}
> > +
> > +	if (info->granularity >= IOMMU_INV_GRANU_NR) {
> > +		pr_warn_ratelimited("Invalid cache invalidation
> > granu %x\n",
> > +				info->granularity);
> > +		return -EINVAL;
> > +	}
> > +
> > +	switch (info->granularity) {
> > +	case IOMMU_INV_GRANU_ADDR:
> > +		mask = IOMMU_INV_ADDR_FLAGS_PASID |
> > +			IOMMU_INV_ADDR_FLAGS_ARCHID |
> > +			IOMMU_INV_ADDR_FLAGS_LEAF;
> > +
> > +		if (info->granu.addr_info.flags & ~mask) {
> > +			pr_warn_ratelimited("Unsupported
> > invalidation addr flags %x\n",
> > +
> > info->granu.addr_info.flags);
> > +			ret = -EINVAL;  
> 
> Why not return?  Inconsistent with above and unclear benefit.
> 
will do. thanks

> > +		}
> > +		break;
> > +	case IOMMU_INV_GRANU_PASID:
> > +		mask = IOMMU_INV_PASID_FLAGS_PASID |
> > +			IOMMU_INV_PASID_FLAGS_ARCHID;
> > +		if (info->granu.pasid_info.flags & ~mask) {
> > +			pr_warn_ratelimited("Unsupported
> > invalidation PASID flags%x\n",
> > +
> > info->granu.pasid_info.flags);
> > +			ret = -EINVAL;
> > +		}
> > +		break;
> > +	}  
> 
> 
> What happened to IOMMU_INV_GRANU_DOMAIN?  Nothing to check?  Should
> probably still be included with a 
> 
I am not sure I got the complete comments here.

IOMMU_INV_GRANU_DOMAIN does not have additional info, actually not used
for now.

> > +
> > +	if (info->padding[0] || info->padding[1]) {
> > +		pr_warn_ratelimited("Non-zero reserved fields\n");
> > +		ret = -EINVAL;
> > +	}
> > +
> > +	return ret;
> > + }
> > +
> > + int iommu_cache_invalidate(struct iommu_domain *domain, struct
> > device *dev,
> > +			   void __user *uinfo)
> > + {
> > +	struct iommu_cache_invalidate_info inv_info;
> > +	unsigned long minsz, maxsz;
> > +	int ret = 0;
> > +
> > +	if (unlikely(!domain->ops->cache_invalidate))
> > +		return -ENODEV;
> > +
> > +	/* Current kernel data size is the max to be copied from
> > user */
> > +	maxsz = sizeof(struct iommu_cache_invalidate_info);
> > +	memset((void *)&inv_info, 0, maxsz);  
> 
> initialize as = { 0 };
> 
got it.

> > +
> > +	/*
> > +	 * No new spaces can be added before the variable sized
> > union, the
> > +	 * minimum size is the offset to the union.
> > +	 */
> > +	minsz = offsetof(struct iommu_cache_invalidate_info,
> > granu); +
> > +	/* Copy minsz from user to get flags and argsz */
> > +	if (copy_from_user(&inv_info, uinfo, minsz))
> > +		return -EFAULT;
> > +
> > +	/* Fields before variable size union is mandatory */
> > +	if (inv_info.argsz < minsz)
> > +		return -EINVAL;
> > +
> > +	/*
> > +	 * User might be using a newer UAPI header which has a
> > larger data
> > +	 * size, we shall support the existing flags within the
> > current
> > +	 * size.
> > +	 */
> > +	if (inv_info.argsz > maxsz)
> > +		inv_info.argsz = maxsz;  
> 
> maxsz handling seems a little clunky, maybe only because this is the
> documentation example?
> 
Not sure I am following.
My thinking is that we wanted to support old flags even the
user is compiled with a newer header with larger struct size. But
old flags must be within the current(older) kernel UAPI struct size.
That is why we override the size here, there is no need to copy the
whole new struct.

> > +
> > +	/* Copy the remaining user data _after_ minsz */
> > +	if (copy_from_user((void *)&inv_info + minsz, uinfo +
> > minsz,
> > +				inv_info.argsz - minsz))
> > +		return -EFAULT;
> > +
> > +	/* Now the argsz is validated, check the content for
> > reserved bits */
> > +	ret = iommu_check_cache_invl_data(&inv_info);
> > +	if (ret)
> > +		return ret;
> > +
> > +	return domain->ops->cache_invalidate(domain, dev,
> > &inv_info);
> > + }
> > +
> > +Notice that in this example, since union size is determined by
> > generic +flags, all checking to argsz is validated in the generic
> > IOMMU layer, +vendor driver does not need to check argsz.  
> 
> Not true.  What if the user provided argsz = minsz and the operation
> requires an entry in the granu union?  The vendor driver needs to
> check that argsz was _at_least_ sufficient to provide that entry.
My reason is that we have no vendor specific flags in cache invalidate
UAPI. Why would vendor driver gets involved in sanity checking?
Previously in v3, we check union size against argsz but was deemed to
strict.

>  The
> mangling of the user provided argsz above makes me cringe a little too
> for that reason, once we start modifying the user values in the core
> it could get messy for the vendor drivers.
> 
We do have vendor specific union in bind_gpasid UAPI. Could you
elaborate your concern?

> > +
> > +For UAPIs that are shared with in-kernel users, a wrapper function
> > +is provided to distinguish the callers. For example,
> > +
> > +Userspace caller ::
> > +
> > +  int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct
> > device *dev,
> > +  void __user *udata)
> > +
> > +In-kernel caller ::
> > +
> > +  int __iommu_sva_unbind_gpasid(struct iommu_domain *domain,
> > struct device *dev,
> > +  struct iommu_gpasid_bind_data *data)  
> 
> Maybe just prefix with iommu_uapi rather than underscores?  Underscore
> prefixes usually imply a locking requirement or other reasons to tread
> carefully whereas this is just the internal API.  Thanks,
> 
sounds good. Thanks for explaining.

> Alex
> 

[Jacob Pan]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/5] docs: IOMMU user API
  2020-07-14  5:00     ` Jacob Pan
@ 2020-07-14 19:04       ` Alex Williamson
  2020-07-14 23:53         ` Jacob Pan
  2020-07-15 22:12         ` Jacob Pan
  0 siblings, 2 replies; 16+ messages in thread
From: Alex Williamson @ 2020-07-14 19:04 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Mon, 13 Jul 2020 22:00:23 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> Hi Alex,
> 
> On Mon, 13 Jul 2020 16:48:42 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > On Tue,  7 Jul 2020 16:43:45 -0700
> > Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> >   
> > > IOMMU UAPI is newly introduced to support communications between
> > > guest virtual IOMMU and host IOMMU. There has been lots of
> > > discussions on how it should work with VFIO UAPI and userspace in
> > > general.
> > > 
> > > This document is indended to clarify the UAPI design and usage. The
> > > mechenics of how future extensions should be achieved are also
> > > covered    
> > 
> > mechanics
> >   
> will fix.
> 
> > > in this documentation.
> > > 
> > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > ---
> > >  Documentation/userspace-api/iommu.rst | 312
> > > ++++++++++++++++++++++++++++++++++ 1 file changed, 312 insertions(+)
> > >  create mode 100644 Documentation/userspace-api/iommu.rst
> > > 
> > > diff --git a/Documentation/userspace-api/iommu.rst
> > > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > > index 000000000000..581b462c2cec
> > > --- /dev/null
> > > +++ b/Documentation/userspace-api/iommu.rst
> > > @@ -0,0 +1,312 @@
> > > +.. SPDX-License-Identifier: GPL-2.0
> > > +.. iommu:
> > > +
> > > +=====================================
> > > +IOMMU Userspace API
> > > +=====================================
> > > +
> > > +IOMMU UAPI is used for virtualization cases where communications
> > > are +needed between physical and virtual IOMMU drivers. For native
> > > +usage, IOMMU is a system device which does not need to communicate
> > > +with user space directly.
> > > +
> > > +The primary use cases are guest Shared Virtual Address (SVA) and
> > > +guest IO virtual address (IOVA), wherein a virtual IOMMU (vIOMMU)
> > > is +required to communicate with the physical IOMMU in the host.
> > > +
> > > +.. contents:: :local:
> > > +
> > > +Functionalities
> > > +===============
> > > +Communications of user and kernel involve both directions. The
> > > +supported user-kernel APIs are as follows:
> > > +
> > > +1. Alloc/Free PASID
> > > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > > +4. Invalidate IOMMU caches
> > > +5. Service page requests
> > > +
> > > +Requirements
> > > +============
> > > +The IOMMU UAPIs are generic and extensible to meet the following
> > > +requirements:
> > > +
> > > +1. Emulated and para-virtualised vIOMMUs
> > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > > +3. Extensions to the UAPI shall not break existing user space
> > > +
> > > +Interfaces
> > > +==========
> > > +Although the data structures defined in IOMMU UAPI are
> > > self-contained, +there is no user API functions introduced.
> > > Instead, IOMMU UAPI is +designed to work with existing user driver
> > > frameworks such as VFIO. +
> > > +Extension Rules & Precautions
> > > +-----------------------------
> > > +When IOMMU UAPI gets extended, the data structures can *only* be
> > > +modified in two ways:
> > > +
> > > +1. Adding new fields by re-purposing the padding[] field. No size
> > > change. +2. Adding new union members at the end. May increase in
> > > size. +
> > > +No new fields can be added *after* the variable sized union in
> > > that it +will break backward compatibility when offset moves. In
> > > both cases, a +new flag must be accompanied with a new field such
> > > that the IOMMU +driver can process the data based on the new flag.
> > > Version field is +only reserved for the unlikely event of UAPI
> > > upgrade at its entirety. +
> > > +It's *always* the caller's responsibility to indicate the size of
> > > the +structure passed by setting argsz appropriately.
> > > +Though at the same time, argsz is user provided data which is not
> > > +trusted. The argsz field allows the user to indicate how much data
> > > +they're providing, it's still the kernel's responsibility to
> > > validate +whether it's correct and sufficient for the requested
> > > operation. +
> > > +Compatibility Checking
> > > +----------------------
> > > +When IOMMU UAPI extension results in size increase, user such as
> > > VFIO +has to handle the following cases:
> > > +
> > > +1. User and kernel has exact size match
> > > +2. An older user with older kernel header (smaller UAPI size)
> > > running on a
> > > +   newer kernel (larger UAPI size)
> > > +3. A newer user with newer kernel header (larger UAPI size) running
> > > +   on an older kernel.
> > > +4. A malicious/misbehaving user pass illegal/invalid size but
> > > within
> > > +   range. The data may contain garbage.    
> > 
> > I'm still not sure where VFIO has responsibility in managing any of
> > these cases.  I think we've determined that VFIO is just the wrapper
> > and call-through mechanism, it's the UAPI core implementation and
> > IOMMU drivers that are responsible for this.
> >   
> That is right, I shall rewrite the responsibility to be held by IOMMU
> core.
> "When IOMMU UAPI extension results in size increase, IOMMU UAPI core
> shall handle the following cases:"
> 
> > > +
> > > +Feature Checking
> > > +----------------
> > > +While launching a guest with vIOMMU, it is important to ensure
> > > that host +can support the UAPI data structures to be used for
> > > vIOMMU-pIOMMU +communications. Without upfront compatibility
> > > checking, future faults +are difficult to report even in normal
> > > conditions. For example, TLB +invalidations should always succeed.
> > > There is no architectural way to +report back to the vIOMMU if the
> > > UAPI data is incompatible. If that +happens, in order to protect
> > > IOMMU iosolation guarantee, we have to +resort to not giving
> > > completion status in vIOMMU. This may result in +VM hang.
> > > +
> > > +For this reason the following IOMMU UAPIs cannot fail:
> > > +
> > > +1. Free PASID
> > > +2. Unbind guest PASID
> > > +3. Unbind guest PASID table (SMMU)
> > > +4. Cache invalidate
> > > +
> > > +User applications such as QEMU is expected to import kernel UAPI    
> > 
> > s/is/are/
> >   
> will fix.
> 
> > > +headers. Backward compatibility is supported per feature flags.
> > > +For example, an older QEMU (with older kernel header) can run on
> > > newer +kernel. Newer QEMU (with new kernel header) may refuse to
> > > initialize +on an older kernel if new feature flags are not
> > > supported by older +kernel. Simply recompile existing code with
> > > newer kernel header should    
> > 
> > s/recompile/recompiling/
> >   
> got it.
> 
> > > +not be an issue in that only existing flags are used.
> > > +
> > > +IOMMU vendor driver should report the below features to IOMMU UAPI
> > > +consumers (e.g. via VFIO).
> > > +
> > > +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> > > +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> > > +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> > > +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> > > +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> > > +
> > > +Take VFIO as example, upon request from VFIO user space (e.g.
> > > QEMU), +VFIO kernel code shall query IOMMU vendor driver for the
> > > support of +the above features. Query result can then be reported
> > > back to the +user-space caller. Details can be found in
> > > +Documentation/driver-api/vfio.rst.
> > > +
> > > +
> > > +Data Passing Example with VFIO
> > > +------------------------------
> > > +As the ubiquitous userspace driver framework, VFIO is already IOMMU
> > > +aware and share many key concepts such as device model, group,
> > > and    
> > 
> > s/share/shares/
> >   
> got it.
> 
> > > +protection domain. Other user driver frameworks can also be
> > > extended +to support IOMMU UAPI but it is outside the scope of this
> > > document. +
> > > +In this tight-knit VFIO-IOMMU interface, the ultimate consumer of
> > > the +IOMMU UAPI data is the host IOMMU driver. VFIO facilitates
> > > user-kernel +transport, capability checking, security, and life
> > > cycle management of +process address space ID (PASID).
> > > +
> > > +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU driver
> > > is the +ultimate consumer of its UAPI data. At VFIO layer, the
> > > IOMMU UAPI data +is wrapped in a VFIO UAPI data. It follows the
> > > +pattern below::
> > > +
> > > +   struct {
> > > +	__u32 argsz;
> > > +	__u32 flags;
> > > +	__u8  data[];
> > > +   };
> > > +
> > > +Here data[] contains the IOMMU UAPI data structures. VFIO has the
> > > +freedom to bundle the data as well as parse data size based on its
> > > own flags. +
> > > +In order to determine the size and feature set of the user data,
> > > argsz +and flags are also embedded in the IOMMU UAPI data
> > > structures. +A "__u32 argsz" field is *always* at the beginning of
> > > each structure. +
> > > +For example:
> > > +::
> > > +
> > > +   struct iommu_cache_invalidate_info {
> > > +	__u32	argsz;
> > > +	#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> > > +	__u32	version;
> > > +	/* IOMMU paging structure cache */
> > > +	#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /*
> > > IOMMU IOTLB */
> > > +	#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /*
> > > Device IOTLB */
> > > +	#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /*
> > > PASID cache */
> > > +	#define IOMMU_CACHE_INV_TYPE_NR		(3)
> > > +	__u8	cache;
> > > +	__u8	granularity;
> > > +	__u8	padding[2];    
> > 
> > Now would be the right time to add more than just minimum alignment
> > padding for future use.  Also note that we have 4-byte alignment
> > leading into the union, it could be desirable to pad that out to
> > 8-byte alignment anyway.
> >   
> make sense, will do padding[6]
> 
> > > +	union {
> > > +		struct iommu_inv_pasid_info pasid_info;
> > > +		struct iommu_inv_addr_info addr_info;
> > > +	} granu;
> > > +   };
> > > +
> > > +VFIO is responsible for checking its own argsz and flags then
> > > invokes +appropriate IOMMU UAPI functions. User pointer is passed
> > > to IOMMU +layer for further processing. The responsibilities are
> > > divided as +follows:
> > > +
> > > +- Generic IOMMU layer checks argsz range and override out-of-range
> > > +  value.
> > > +
> > > +- Generic IOMMU layer checks content of the UAPI data for non-zero
> > > +  reserved bits in flags, padding fields, and unsupported version.
> > > +  This is to ensure not breaking userspace in the future when these
> > > +  fields or flags are used.
> > > +
> > > +- Vendor IOMMU driver checks argsz based on vendor flags, UAPI data
> > > +  is consumed based on flags
> > > +
> > > +Once again, use guest TLB invalidation as an example, argsz is
> > > based +on generic flags in the invalidation information. IOMMU
> > > generic code +shall process the UAPI data as the following:
> > > +
> > > +::
> > > +
> > > + static int iommu_check_cache_invl_data(struct
> > > iommu_cache_invalidate_info *info)
> > > + {
> > > +	int ret = 0;
> > > +	u32 mask;
> > > +
> > > +	if (info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> > > +		return -EINVAL;
> > > +
> > > +	mask =  IOMMU_CACHE_INV_TYPE_IOTLB |
> > > +		IOMMU_CACHE_INV_TYPE_DEV_IOTLB |
> > > +		IOMMU_CACHE_INV_TYPE_PASID;    
> > 
> > Can TYPE_NR be used here?  ie.  ((1 << IOMMU_CACHE_INV_TYPE_NR) - 1)
> >   
> much better, thanks!
> 
> > > +	if (info->cache & ~mask) {
> > > +		pr_warn_ratelimited("Invalid cache types %x\n",
> > > info->cache);    
> > 
> > Even ratelimited, this is too much for a user triggered error, at most
> > these should be some sort of debug level.  Should probably just drop
> > them for production.
> >   
> I felt a little too chatty as well. will drop, we have a lot of these.
> 
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	if (info->granularity >= IOMMU_INV_GRANU_NR) {
> > > +		pr_warn_ratelimited("Invalid cache invalidation
> > > granu %x\n",
> > > +				info->granularity);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	switch (info->granularity) {
> > > +	case IOMMU_INV_GRANU_ADDR:
> > > +		mask = IOMMU_INV_ADDR_FLAGS_PASID |
> > > +			IOMMU_INV_ADDR_FLAGS_ARCHID |
> > > +			IOMMU_INV_ADDR_FLAGS_LEAF;
> > > +
> > > +		if (info->granu.addr_info.flags & ~mask) {
> > > +			pr_warn_ratelimited("Unsupported
> > > invalidation addr flags %x\n",
> > > +
> > > info->granu.addr_info.flags);
> > > +			ret = -EINVAL;    
> > 
> > Why not return?  Inconsistent with above and unclear benefit.
> >   
> will do. thanks
> 
> > > +		}
> > > +		break;
> > > +	case IOMMU_INV_GRANU_PASID:
> > > +		mask = IOMMU_INV_PASID_FLAGS_PASID |
> > > +			IOMMU_INV_PASID_FLAGS_ARCHID;
> > > +		if (info->granu.pasid_info.flags & ~mask) {
> > > +			pr_warn_ratelimited("Unsupported
> > > invalidation PASID flags%x\n",
> > > +
> > > info->granu.pasid_info.flags);
> > > +			ret = -EINVAL;
> > > +		}
> > > +		break;
> > > +	}    
> > 
> > 
> > What happened to IOMMU_INV_GRANU_DOMAIN?  Nothing to check?  Should
> > probably still be included with a 
> >   
> I am not sure I got the complete comments here.
> 
> IOMMU_INV_GRANU_DOMAIN does not have additional info, actually not used
> for now.

I just found it strange that it was missing, even if only to have a
comment that it requires no additional checking.  This is
documentation, so we want very clear examples.

> > > +
> > > +	if (info->padding[0] || info->padding[1]) {
> > > +		pr_warn_ratelimited("Non-zero reserved fields\n");
> > > +		ret = -EINVAL;
> > > +	}
> > > +
> > > +	return ret;
> > > + }
> > > +
> > > + int iommu_cache_invalidate(struct iommu_domain *domain, struct
> > > device *dev,
> > > +			   void __user *uinfo)
> > > + {
> > > +	struct iommu_cache_invalidate_info inv_info;
> > > +	unsigned long minsz, maxsz;
> > > +	int ret = 0;
> > > +
> > > +	if (unlikely(!domain->ops->cache_invalidate))
> > > +		return -ENODEV;
> > > +
> > > +	/* Current kernel data size is the max to be copied from
> > > user */
> > > +	maxsz = sizeof(struct iommu_cache_invalidate_info);
> > > +	memset((void *)&inv_info, 0, maxsz);    
> > 
> > initialize as = { 0 };
> >   
> got it.
> 
> > > +
> > > +	/*
> > > +	 * No new spaces can be added before the variable sized
> > > union, the
> > > +	 * minimum size is the offset to the union.
> > > +	 */
> > > +	minsz = offsetof(struct iommu_cache_invalidate_info,
> > > granu); +
> > > +	/* Copy minsz from user to get flags and argsz */
> > > +	if (copy_from_user(&inv_info, uinfo, minsz))
> > > +		return -EFAULT;
> > > +
> > > +	/* Fields before variable size union is mandatory */
> > > +	if (inv_info.argsz < minsz)
> > > +		return -EINVAL;
> > > +
> > > +	/*
> > > +	 * User might be using a newer UAPI header which has a
> > > larger data
> > > +	 * size, we shall support the existing flags within the
> > > current
> > > +	 * size.
> > > +	 */
> > > +	if (inv_info.argsz > maxsz)
> > > +		inv_info.argsz = maxsz;    
> > 
> > maxsz handling seems a little clunky, maybe only because this is the
> > documentation example?
> >   
> Not sure I am following.
> My thinking is that we wanted to support old flags even the
> user is compiled with a newer header with larger struct size. But
> old flags must be within the current(older) kernel UAPI struct size.
> That is why we override the size here, there is no need to copy the
> whole new struct.

I'm only commenting that it feels a little rough versus using something
like min(inv_info.argsz, sizeof(inv_info)), so I thought maybe maxsz is
being overly explicit because this is an example.

> > > +
> > > +	/* Copy the remaining user data _after_ minsz */
> > > +	if (copy_from_user((void *)&inv_info + minsz, uinfo +
> > > minsz,
> > > +				inv_info.argsz - minsz))
> > > +		return -EFAULT;
> > > +
> > > +	/* Now the argsz is validated, check the content for
> > > reserved bits */
> > > +	ret = iommu_check_cache_invl_data(&inv_info);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	return domain->ops->cache_invalidate(domain, dev,
> > > &inv_info);
> > > + }
> > > +
> > > +Notice that in this example, since union size is determined by
> > > generic +flags, all checking to argsz is validated in the generic
> > > IOMMU layer, +vendor driver does not need to check argsz.    
> > 
> > Not true.  What if the user provided argsz = minsz and the operation
> > requires an entry in the granu union?  The vendor driver needs to
> > check that argsz was _at_least_ sufficient to provide that entry.  
> My reason is that we have no vendor specific flags in cache invalidate
> UAPI. Why would vendor driver gets involved in sanity checking?
> Previously in v3, we check union size against argsz but was deemed to
> strict.

With this example, if a user provides argsz = minsz and provides a
granularity of either ADDR or PASID,  then iommu_check_cache_invl_data
is validating against the section of the structure that was zero
initialized.  Is that valid for the user or should we have rejected
that as potentially undefined/unintended behavior?  I'm not sure if the
above comment is intended to set a precedent that the vendor driver
doesn't need to check the size or if it's identifying this as a unique
scenario, where normally the vendor driver should validate argsz.  If
we were dealing with a structure that included a vendor specific
structure within the union, it would be the vendor driver's
responsibility to make a similar verification that the user data
is sufficient to specify the requested operation.

 
> >  The
> > mangling of the user provided argsz above makes me cringe a little too
> > for that reason, once we start modifying the user values in the core
> > it could get messy for the vendor drivers.
> >   
> We do have vendor specific union in bind_gpasid UAPI. Could you
> elaborate your concern?

The vendor driver is no longer seeing the value the user provided, what
if some future vendor structure ends with something like:

	__u32 nr_entries;
	__u32 entries[];
}

The core code clobbered the user value, so the vendor driver wouldn't
be able to perform any additional user copies.  Clearly that's also a
bug that could be fixed at the time such functionality becomes
necessary, it seems unnecessary to perform that clobbering in the first
place if everyone is on guard for user supplied data.  Thanks,

Alex

> > > +
> > > +For UAPIs that are shared with in-kernel users, a wrapper function
> > > +is provided to distinguish the callers. For example,
> > > +
> > > +Userspace caller ::
> > > +
> > > +  int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct
> > > device *dev,
> > > +  void __user *udata)
> > > +
> > > +In-kernel caller ::
> > > +
> > > +  int __iommu_sva_unbind_gpasid(struct iommu_domain *domain,
> > > struct device *dev,
> > > +  struct iommu_gpasid_bind_data *data)    
> > 
> > Maybe just prefix with iommu_uapi rather than underscores?  Underscore
> > prefixes usually imply a locking requirement or other reasons to tread
> > carefully whereas this is just the internal API.  Thanks,
> >   
> sounds good. Thanks for explaining.
> 
> > Alex
> >   
> 
> [Jacob Pan]
> 

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/5] docs: IOMMU user API
  2020-07-14 19:04       ` Alex Williamson
@ 2020-07-14 23:53         ` Jacob Pan
  2020-07-15 22:12         ` Jacob Pan
  1 sibling, 0 replies; 16+ messages in thread
From: Jacob Pan @ 2020-07-14 23:53 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Tue, 14 Jul 2020 13:04:12 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Mon, 13 Jul 2020 22:00:23 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
> > Hi Alex,
> > 
> > On Mon, 13 Jul 2020 16:48:42 -0600
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> >   
> > > On Tue,  7 Jul 2020 16:43:45 -0700
> > > Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> > >     
> > > > IOMMU UAPI is newly introduced to support communications between
> > > > guest virtual IOMMU and host IOMMU. There has been lots of
> > > > discussions on how it should work with VFIO UAPI and userspace
> > > > in general.
> > > > 
> > > > This document is indended to clarify the UAPI design and usage.
> > > > The mechenics of how future extensions should be achieved are
> > > > also covered      
> > > 
> > > mechanics
> > >     
> > will fix.
> >   
> > > > in this documentation.
> > > > 
> > > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > ---
> > > >  Documentation/userspace-api/iommu.rst | 312
> > > > ++++++++++++++++++++++++++++++++++ 1 file changed, 312
> > > > insertions(+) create mode 100644
> > > > Documentation/userspace-api/iommu.rst
> > > > 
> > > > diff --git a/Documentation/userspace-api/iommu.rst
> > > > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > > > index 000000000000..581b462c2cec
> > > > --- /dev/null
> > > > +++ b/Documentation/userspace-api/iommu.rst
> > > > @@ -0,0 +1,312 @@
> > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > +.. iommu:
> > > > +
> > > > +=====================================
> > > > +IOMMU Userspace API
> > > > +=====================================
> > > > +
> > > > +IOMMU UAPI is used for virtualization cases where
> > > > communications are +needed between physical and virtual IOMMU
> > > > drivers. For native +usage, IOMMU is a system device which does
> > > > not need to communicate +with user space directly.
> > > > +
> > > > +The primary use cases are guest Shared Virtual Address (SVA)
> > > > and +guest IO virtual address (IOVA), wherein a virtual IOMMU
> > > > (vIOMMU) is +required to communicate with the physical IOMMU in
> > > > the host. +
> > > > +.. contents:: :local:
> > > > +
> > > > +Functionalities
> > > > +===============
> > > > +Communications of user and kernel involve both directions. The
> > > > +supported user-kernel APIs are as follows:
> > > > +
> > > > +1. Alloc/Free PASID
> > > > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > > > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > > > +4. Invalidate IOMMU caches
> > > > +5. Service page requests
> > > > +
> > > > +Requirements
> > > > +============
> > > > +The IOMMU UAPIs are generic and extensible to meet the
> > > > following +requirements:
> > > > +
> > > > +1. Emulated and para-virtualised vIOMMUs
> > > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > > > +3. Extensions to the UAPI shall not break existing user space
> > > > +
> > > > +Interfaces
> > > > +==========
> > > > +Although the data structures defined in IOMMU UAPI are
> > > > self-contained, +there is no user API functions introduced.
> > > > Instead, IOMMU UAPI is +designed to work with existing user
> > > > driver frameworks such as VFIO. +
> > > > +Extension Rules & Precautions
> > > > +-----------------------------
> > > > +When IOMMU UAPI gets extended, the data structures can *only*
> > > > be +modified in two ways:
> > > > +
> > > > +1. Adding new fields by re-purposing the padding[] field. No
> > > > size change. +2. Adding new union members at the end. May
> > > > increase in size. +
> > > > +No new fields can be added *after* the variable sized union in
> > > > that it +will break backward compatibility when offset moves. In
> > > > both cases, a +new flag must be accompanied with a new field
> > > > such that the IOMMU +driver can process the data based on the
> > > > new flag. Version field is +only reserved for the unlikely
> > > > event of UAPI upgrade at its entirety. +
> > > > +It's *always* the caller's responsibility to indicate the size
> > > > of the +structure passed by setting argsz appropriately.
> > > > +Though at the same time, argsz is user provided data which is
> > > > not +trusted. The argsz field allows the user to indicate how
> > > > much data +they're providing, it's still the kernel's
> > > > responsibility to validate +whether it's correct and sufficient
> > > > for the requested operation. +
> > > > +Compatibility Checking
> > > > +----------------------
> > > > +When IOMMU UAPI extension results in size increase, user such
> > > > as VFIO +has to handle the following cases:
> > > > +
> > > > +1. User and kernel has exact size match
> > > > +2. An older user with older kernel header (smaller UAPI size)
> > > > running on a
> > > > +   newer kernel (larger UAPI size)
> > > > +3. A newer user with newer kernel header (larger UAPI size)
> > > > running
> > > > +   on an older kernel.
> > > > +4. A malicious/misbehaving user pass illegal/invalid size but
> > > > within
> > > > +   range. The data may contain garbage.      
> > > 
> > > I'm still not sure where VFIO has responsibility in managing any
> > > of these cases.  I think we've determined that VFIO is just the
> > > wrapper and call-through mechanism, it's the UAPI core
> > > implementation and IOMMU drivers that are responsible for this.
> > >     
> > That is right, I shall rewrite the responsibility to be held by
> > IOMMU core.
> > "When IOMMU UAPI extension results in size increase, IOMMU UAPI core
> > shall handle the following cases:"
> >   
> > > > +
> > > > +Feature Checking
> > > > +----------------
> > > > +While launching a guest with vIOMMU, it is important to ensure
> > > > that host +can support the UAPI data structures to be used for
> > > > vIOMMU-pIOMMU +communications. Without upfront compatibility
> > > > checking, future faults +are difficult to report even in normal
> > > > conditions. For example, TLB +invalidations should always
> > > > succeed. There is no architectural way to +report back to the
> > > > vIOMMU if the UAPI data is incompatible. If that +happens, in
> > > > order to protect IOMMU iosolation guarantee, we have to +resort
> > > > to not giving completion status in vIOMMU. This may result in
> > > > +VM hang. +
> > > > +For this reason the following IOMMU UAPIs cannot fail:
> > > > +
> > > > +1. Free PASID
> > > > +2. Unbind guest PASID
> > > > +3. Unbind guest PASID table (SMMU)
> > > > +4. Cache invalidate
> > > > +
> > > > +User applications such as QEMU is expected to import kernel
> > > > UAPI      
> > > 
> > > s/is/are/
> > >     
> > will fix.
> >   
> > > > +headers. Backward compatibility is supported per feature flags.
> > > > +For example, an older QEMU (with older kernel header) can run
> > > > on newer +kernel. Newer QEMU (with new kernel header) may
> > > > refuse to initialize +on an older kernel if new feature flags
> > > > are not supported by older +kernel. Simply recompile existing
> > > > code with newer kernel header should      
> > > 
> > > s/recompile/recompiling/
> > >     
> > got it.
> >   
> > > > +not be an issue in that only existing flags are used.
> > > > +
> > > > +IOMMU vendor driver should report the below features to IOMMU
> > > > UAPI +consumers (e.g. via VFIO).
> > > > +
> > > > +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> > > > +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> > > > +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> > > > +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> > > > +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> > > > +
> > > > +Take VFIO as example, upon request from VFIO user space (e.g.
> > > > QEMU), +VFIO kernel code shall query IOMMU vendor driver for the
> > > > support of +the above features. Query result can then be
> > > > reported back to the +user-space caller. Details can be found in
> > > > +Documentation/driver-api/vfio.rst.
> > > > +
> > > > +
> > > > +Data Passing Example with VFIO
> > > > +------------------------------
> > > > +As the ubiquitous userspace driver framework, VFIO is already
> > > > IOMMU +aware and share many key concepts such as device model,
> > > > group, and      
> > > 
> > > s/share/shares/
> > >     
> > got it.
> >   
> > > > +protection domain. Other user driver frameworks can also be
> > > > extended +to support IOMMU UAPI but it is outside the scope of
> > > > this document. +
> > > > +In this tight-knit VFIO-IOMMU interface, the ultimate consumer
> > > > of the +IOMMU UAPI data is the host IOMMU driver. VFIO
> > > > facilitates user-kernel +transport, capability checking,
> > > > security, and life cycle management of +process address space
> > > > ID (PASID). +
> > > > +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU
> > > > driver is the +ultimate consumer of its UAPI data. At VFIO
> > > > layer, the IOMMU UAPI data +is wrapped in a VFIO UAPI data. It
> > > > follows the +pattern below::
> > > > +
> > > > +   struct {
> > > > +	__u32 argsz;
> > > > +	__u32 flags;
> > > > +	__u8  data[];
> > > > +   };
> > > > +
> > > > +Here data[] contains the IOMMU UAPI data structures. VFIO has
> > > > the +freedom to bundle the data as well as parse data size
> > > > based on its own flags. +
> > > > +In order to determine the size and feature set of the user
> > > > data, argsz +and flags are also embedded in the IOMMU UAPI data
> > > > structures. +A "__u32 argsz" field is *always* at the beginning
> > > > of each structure. +
> > > > +For example:
> > > > +::
> > > > +
> > > > +   struct iommu_cache_invalidate_info {
> > > > +	__u32	argsz;
> > > > +	#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> > > > +	__u32	version;
> > > > +	/* IOMMU paging structure cache */
> > > > +	#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /*
> > > > IOMMU IOTLB */
> > > > +	#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 <<
> > > > 1) /* Device IOTLB */
> > > > +	#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /*
> > > > PASID cache */
> > > > +	#define IOMMU_CACHE_INV_TYPE_NR		(3)
> > > > +	__u8	cache;
> > > > +	__u8	granularity;
> > > > +	__u8	padding[2];      
> > > 
> > > Now would be the right time to add more than just minimum
> > > alignment padding for future use.  Also note that we have 4-byte
> > > alignment leading into the union, it could be desirable to pad
> > > that out to 8-byte alignment anyway.
> > >     
> > make sense, will do padding[6]
> >   
> > > > +	union {
> > > > +		struct iommu_inv_pasid_info pasid_info;
> > > > +		struct iommu_inv_addr_info addr_info;
> > > > +	} granu;
> > > > +   };
> > > > +
> > > > +VFIO is responsible for checking its own argsz and flags then
> > > > invokes +appropriate IOMMU UAPI functions. User pointer is
> > > > passed to IOMMU +layer for further processing. The
> > > > responsibilities are divided as +follows:
> > > > +
> > > > +- Generic IOMMU layer checks argsz range and override
> > > > out-of-range
> > > > +  value.
> > > > +
> > > > +- Generic IOMMU layer checks content of the UAPI data for
> > > > non-zero
> > > > +  reserved bits in flags, padding fields, and unsupported
> > > > version.
> > > > +  This is to ensure not breaking userspace in the future when
> > > > these
> > > > +  fields or flags are used.
> > > > +
> > > > +- Vendor IOMMU driver checks argsz based on vendor flags, UAPI
> > > > data
> > > > +  is consumed based on flags
> > > > +
> > > > +Once again, use guest TLB invalidation as an example, argsz is
> > > > based +on generic flags in the invalidation information. IOMMU
> > > > generic code +shall process the UAPI data as the following:
> > > > +
> > > > +::
> > > > +
> > > > + static int iommu_check_cache_invl_data(struct
> > > > iommu_cache_invalidate_info *info)
> > > > + {
> > > > +	int ret = 0;
> > > > +	u32 mask;
> > > > +
> > > > +	if (info->version !=
> > > > IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> > > > +		return -EINVAL;
> > > > +
> > > > +	mask =  IOMMU_CACHE_INV_TYPE_IOTLB |
> > > > +		IOMMU_CACHE_INV_TYPE_DEV_IOTLB |
> > > > +		IOMMU_CACHE_INV_TYPE_PASID;      
> > > 
> > > Can TYPE_NR be used here?  ie.  ((1 << IOMMU_CACHE_INV_TYPE_NR) -
> > > 1) 
> > much better, thanks!
> >   
> > > > +	if (info->cache & ~mask) {
> > > > +		pr_warn_ratelimited("Invalid cache types %x\n",
> > > > info->cache);      
> > > 
> > > Even ratelimited, this is too much for a user triggered error, at
> > > most these should be some sort of debug level.  Should probably
> > > just drop them for production.
> > >     
> > I felt a little too chatty as well. will drop, we have a lot of
> > these. 
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	if (info->granularity >= IOMMU_INV_GRANU_NR) {
> > > > +		pr_warn_ratelimited("Invalid cache invalidation
> > > > granu %x\n",
> > > > +				info->granularity);
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	switch (info->granularity) {
> > > > +	case IOMMU_INV_GRANU_ADDR:
> > > > +		mask = IOMMU_INV_ADDR_FLAGS_PASID |
> > > > +			IOMMU_INV_ADDR_FLAGS_ARCHID |
> > > > +			IOMMU_INV_ADDR_FLAGS_LEAF;
> > > > +
> > > > +		if (info->granu.addr_info.flags & ~mask) {
> > > > +			pr_warn_ratelimited("Unsupported
> > > > invalidation addr flags %x\n",
> > > > +
> > > > info->granu.addr_info.flags);
> > > > +			ret = -EINVAL;      
> > > 
> > > Why not return?  Inconsistent with above and unclear benefit.
> > >     
> > will do. thanks
> >   
> > > > +		}
> > > > +		break;
> > > > +	case IOMMU_INV_GRANU_PASID:
> > > > +		mask = IOMMU_INV_PASID_FLAGS_PASID |
> > > > +			IOMMU_INV_PASID_FLAGS_ARCHID;
> > > > +		if (info->granu.pasid_info.flags & ~mask) {
> > > > +			pr_warn_ratelimited("Unsupported
> > > > invalidation PASID flags%x\n",
> > > > +
> > > > info->granu.pasid_info.flags);
> > > > +			ret = -EINVAL;
> > > > +		}
> > > > +		break;
> > > > +	}      
> > > 
> > > 
> > > What happened to IOMMU_INV_GRANU_DOMAIN?  Nothing to check?
> > > Should probably still be included with a 
> > >     
> > I am not sure I got the complete comments here.
> > 
> > IOMMU_INV_GRANU_DOMAIN does not have additional info, actually not
> > used for now.  
> 
> I just found it strange that it was missing, even if only to have a
> comment that it requires no additional checking.  This is
> documentation, so we want very clear examples.
> 
good point, will add comments.

> > > > +
> > > > +	if (info->padding[0] || info->padding[1]) {
> > > > +		pr_warn_ratelimited("Non-zero reserved
> > > > fields\n");
> > > > +		ret = -EINVAL;
> > > > +	}
> > > > +
> > > > +	return ret;
> > > > + }
> > > > +
> > > > + int iommu_cache_invalidate(struct iommu_domain *domain, struct
> > > > device *dev,
> > > > +			   void __user *uinfo)
> > > > + {
> > > > +	struct iommu_cache_invalidate_info inv_info;
> > > > +	unsigned long minsz, maxsz;
> > > > +	int ret = 0;
> > > > +
> > > > +	if (unlikely(!domain->ops->cache_invalidate))
> > > > +		return -ENODEV;
> > > > +
> > > > +	/* Current kernel data size is the max to be copied
> > > > from user */
> > > > +	maxsz = sizeof(struct iommu_cache_invalidate_info);
> > > > +	memset((void *)&inv_info, 0, maxsz);      
> > > 
> > > initialize as = { 0 };
> > >     
> > got it.
> >   
> > > > +
> > > > +	/*
> > > > +	 * No new spaces can be added before the variable sized
> > > > union, the
> > > > +	 * minimum size is the offset to the union.
> > > > +	 */
> > > > +	minsz = offsetof(struct iommu_cache_invalidate_info,
> > > > granu); +
> > > > +	/* Copy minsz from user to get flags and argsz */
> > > > +	if (copy_from_user(&inv_info, uinfo, minsz))
> > > > +		return -EFAULT;
> > > > +
> > > > +	/* Fields before variable size union is mandatory */
> > > > +	if (inv_info.argsz < minsz)
> > > > +		return -EINVAL;
> > > > +
> > > > +	/*
> > > > +	 * User might be using a newer UAPI header which has a
> > > > larger data
> > > > +	 * size, we shall support the existing flags within the
> > > > current
> > > > +	 * size.
> > > > +	 */
> > > > +	if (inv_info.argsz > maxsz)
> > > > +		inv_info.argsz = maxsz;      
> > > 
> > > maxsz handling seems a little clunky, maybe only because this is
> > > the documentation example?
> > >     
> > Not sure I am following.
> > My thinking is that we wanted to support old flags even the
> > user is compiled with a newer header with larger struct size. But
> > old flags must be within the current(older) kernel UAPI struct size.
> > That is why we override the size here, there is no need to copy the
> > whole new struct.  
> 
> I'm only commenting that it feels a little rough versus using
> something like min(inv_info.argsz, sizeof(inv_info)), so I thought
> maybe maxsz is being overly explicit because this is an example.
> 
I see, use min is better.

> > > > +
> > > > +	/* Copy the remaining user data _after_ minsz */
> > > > +	if (copy_from_user((void *)&inv_info + minsz, uinfo +
> > > > minsz,
> > > > +				inv_info.argsz - minsz))
> > > > +		return -EFAULT;
> > > > +
> > > > +	/* Now the argsz is validated, check the content for
> > > > reserved bits */
> > > > +	ret = iommu_check_cache_invl_data(&inv_info);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	return domain->ops->cache_invalidate(domain, dev,
> > > > &inv_info);
> > > > + }
> > > > +
> > > > +Notice that in this example, since union size is determined by
> > > > generic +flags, all checking to argsz is validated in the
> > > > generic IOMMU layer, +vendor driver does not need to check
> > > > argsz.      
> > > 
> > > Not true.  What if the user provided argsz = minsz and the
> > > operation requires an entry in the granu union?  The vendor
> > > driver needs to check that argsz was _at_least_ sufficient to
> > > provide that entry.    
> > My reason is that we have no vendor specific flags in cache
> > invalidate UAPI. Why would vendor driver gets involved in sanity
> > checking? Previously in v3, we check union size against argsz but
> > was deemed to strict.  
> 
> With this example, if a user provides argsz = minsz and provides a
> granularity of either ADDR or PASID,  then iommu_check_cache_invl_data
> is validating against the section of the structure that was zero
> initialized.  Is that valid for the user or should we have rejected
> that as potentially undefined/unintended behavior?
That is could be a valid data. We should reject it, I was counting on
vendor driver to reject it due to lack of flags in the union. But I
think it is better to reject here.

For argsz > minsz && argsz < maxsz, content will be checked by vendor
driver.


>  I'm not sure if
> the above comment is intended to set a precedent that the vendor
> driver doesn't need to check the size or if it's identifying this as
> a unique scenario, where normally the vendor driver should validate
> argsz. If we were dealing with a structure that included a vendor
> specific structure within the union, it would be the vendor driver's
> responsibility to make a similar verification that the user data
> is sufficient to specify the requested operation.
> 
I meant to use the example to layout two cases: with and without vendor
data. Wasn't meant to be a unique scenario. Let me rewrite this part to
cover both cases.

The downside of letting vendor driver check argsz is that the
"trusted" in-kernel caller also gets checked. But I guess it is a small
price to pay.

Vendor driver can also reject data based on flags.

>  
> > >  The
> > > mangling of the user provided argsz above makes me cringe a
> > > little too for that reason, once we start modifying the user
> > > values in the core it could get messy for the vendor drivers.
> > >     
> > We do have vendor specific union in bind_gpasid UAPI. Could you
> > elaborate your concern?  
> 
> The vendor driver is no longer seeing the value the user provided,
> what if some future vendor structure ends with something like:
> 
> 	__u32 nr_entries;
> 	__u32 entries[];
> }
> 
> The core code clobbered the user value, so the vendor driver wouldn't
> be able to perform any additional user copies.  Clearly that's also a
> bug that could be fixed at the time such functionality becomes
> necessary, it seems unnecessary to perform that clobbering in the
> first place if everyone is on guard for user supplied data.  Thanks,
> 
> Alex
> 
> > > > +
> > > > +For UAPIs that are shared with in-kernel users, a wrapper
> > > > function +is provided to distinguish the callers. For example,
> > > > +
> > > > +Userspace caller ::
> > > > +
> > > > +  int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
> > > > struct device *dev,
> > > > +  void __user *udata)
> > > > +
> > > > +In-kernel caller ::
> > > > +
> > > > +  int __iommu_sva_unbind_gpasid(struct iommu_domain *domain,
> > > > struct device *dev,
> > > > +  struct iommu_gpasid_bind_data *data)      
> > > 
> > > Maybe just prefix with iommu_uapi rather than underscores?
> > > Underscore prefixes usually imply a locking requirement or other
> > > reasons to tread carefully whereas this is just the internal
> > > API.  Thanks, 
> > sounds good. Thanks for explaining.
> >   
> > > Alex
> > >     
> > 
> > [Jacob Pan]
> >   
> 

[Jacob Pan]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/5] docs: IOMMU user API
  2020-07-14 19:04       ` Alex Williamson
  2020-07-14 23:53         ` Jacob Pan
@ 2020-07-15 22:12         ` Jacob Pan
  1 sibling, 0 replies; 16+ messages in thread
From: Jacob Pan @ 2020-07-15 22:12 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Tue, 14 Jul 2020 13:04:12 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> > >  The
> > > mangling of the user provided argsz above makes me cringe a
> > > little too for that reason, once we start modifying the user
> > > values in the core it could get messy for the vendor drivers.
> > >     
> > We do have vendor specific union in bind_gpasid UAPI. Could you
> > elaborate your concern?  
> 
> The vendor driver is no longer seeing the value the user provided,
> what if some future vendor structure ends with something like:
> 
> 	__u32 nr_entries;
> 	__u32 entries[];
> }
> 
> The core code clobbered the user value, so the vendor driver wouldn't
> be able to perform any additional user copies.  Clearly that's also a
> bug that could be fixed at the time such functionality becomes
> necessary, it seems unnecessary to perform that clobbering in the
> first place if everyone is on guard for user supplied data.  Thanks,
Sorry I missed this in my last reply.

Make sense not to clobber user data in the core. So we should pass the
argsz as is to the vendor driver, but copy from user only up to the
size of the current kernel supports.
i.e.

	/*
	 * User might be using a newer UAPI header which has a larger data
	 * size, we shall support the existing flags within the current
	 * size. Copy the remaining user data _after_ minsz but not more
	 * than the current kernel supported size.
	 */
	if (copy_from_user((void *)&inv_info + minsz, uinfo + minsz,
				min(inv_info.argsz, maxsz) - minsz))
		return -EFAULT;

Currently, vendor driver does not handle user pointer. Once the
extension as you described above becomes necessary, we can change the
vendor driver interface. Is that a reasonable plan?

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-07-15 22:06 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-07 23:43 [PATCH v4 0/5] IOMMU user API enhancement Jacob Pan
2020-07-07 23:43 ` [PATCH v4 1/5] docs: IOMMU user API Jacob Pan
2020-07-08  2:07   ` Lu Baolu
2020-07-08 15:29     ` Jacob Pan
2020-07-09  0:44       ` Lu Baolu
2020-07-13 22:48   ` Alex Williamson
2020-07-14  5:00     ` Jacob Pan
2020-07-14 19:04       ` Alex Williamson
2020-07-14 23:53         ` Jacob Pan
2020-07-15 22:12         ` Jacob Pan
2020-07-07 23:43 ` [PATCH v4 2/5] iommu/uapi: Add argsz for user filled data Jacob Pan
2020-07-07 23:43 ` [PATCH v4 3/5] iommu/uapi: Use named union for user data Jacob Pan
2020-07-08  2:17   ` Lu Baolu
2020-07-08 15:18     ` Jacob Pan
2020-07-07 23:43 ` [PATCH v4 4/5] iommu/uapi: Handle data and argsz filled by users Jacob Pan
2020-07-07 23:43 ` [PATCH v4 5/5] iommu/vt-d: Remove UAPI version check Jacob Pan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).