iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] IOMMU user API enhancement
@ 2020-06-11  4:12 Jacob Pan
  2020-06-11  4:12 ` [PATCH v2 1/3] docs: IOMMU user API Jacob Pan
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Jacob Pan @ 2020-06-11  4:12 UTC (permalink / raw)
  To: iommu, LKML, Lu Baolu, Joerg Roedel, David Woodhouse
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Christoph Hellwig,
	Alex Williamson, Jean-Philippe Brucker

IOMMU user API header was introduced to support nested DMA translation and
related fault handling. The current UAPI data structures consist of three
areas that cover the interactions between host kernel and guest:
 - fault handling
 - cache invalidation
 - bind guest page tables, i.e. guest PASID

Future extensions are likely to support more architectures and vIOMMU features.

In the previous discussion, using user-filled data size and feature flags is
made a preferred approach over a unified version number.
https://lkml.org/lkml/2020/1/29/45

In addition to introduce argsz field to data structures, this patchset is also
trying to document the UAPI design, usage, and extension rules. VT-d driver
changes to utilize the new argsz field is included, VFIO usage is to follow.

Thanks,

Jacob


Changeog:

v2:
	- Removed unified API version and helper
	- Introduced argsz for each UAPI data
	- Introduced UAPI doc


Jacob Pan (3):
  docs: IOMMU user API
  iommu/uapi: Add argsz for user filled data
  iommu/vt-d: Sanity check uapi argsz filled by users

 Documentation/userspace-api/iommu.rst | 210 ++++++++++++++++++++++++++++++++++
 drivers/iommu/intel-iommu.c           |  16 +++
 drivers/iommu/intel-svm.c             |  12 ++
 include/uapi/linux/iommu.h            |   6 +
 4 files changed, 244 insertions(+)
 create mode 100644 Documentation/userspace-api/iommu.rst

-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 1/3] docs: IOMMU user API
  2020-06-11  4:12 [PATCH v2 0/3] IOMMU user API enhancement Jacob Pan
@ 2020-06-11  4:12 ` Jacob Pan
  2020-06-11  6:33   ` Lu Baolu
                     ` (3 more replies)
  2020-06-11  4:12 ` [PATCH v2 2/3] iommu/uapi: Add argsz for user filled data Jacob Pan
  2020-06-11  4:12 ` [PATCH v2 3/3] iommu/vt-d: Sanity check uapi argsz filled by users Jacob Pan
  2 siblings, 4 replies; 31+ messages in thread
From: Jacob Pan @ 2020-06-11  4:12 UTC (permalink / raw)
  To: iommu, LKML, Lu Baolu, Joerg Roedel, David Woodhouse
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Christoph Hellwig,
	Alex Williamson, Jean-Philippe Brucker

IOMMU UAPI is newly introduced to support communications between guest
virtual IOMMU and host IOMMU. There has been lots of discussions on how
it should work with VFIO UAPI and userspace in general.

This document is indended to clarify the UAPI design and usage. The
mechenics of how future extensions should be achieved are also covered
in this documentation.

Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 Documentation/userspace-api/iommu.rst | 210 ++++++++++++++++++++++++++++++++++
 1 file changed, 210 insertions(+)
 create mode 100644 Documentation/userspace-api/iommu.rst

diff --git a/Documentation/userspace-api/iommu.rst b/Documentation/userspace-api/iommu.rst
new file mode 100644
index 000000000000..e95dc5a04a41
--- /dev/null
+++ b/Documentation/userspace-api/iommu.rst
@@ -0,0 +1,210 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. iommu:
+
+=====================================
+IOMMU Userspace API
+=====================================
+
+IOMMU UAPI is used for virtualization cases where communications are
+needed between physical and virtual IOMMU drivers. For native
+usage, IOMMU is a system device which does not need to communicate
+with user space directly.
+
+The primary use cases are guest Shared Virtual Address (SVA) and
+guest IO virtual address (IOVA), wherein virtual IOMMU (vIOMMU) is
+required to communicate with the physical IOMMU in the host.
+
+.. contents:: :local:
+
+Functionalities
+====================================================
+Communications of user and kernel involve both directions. The
+supported user-kernel APIs are as follows:
+
+1. Alloc/Free PASID
+2. Bind/unbind guest PASID (e.g. Intel VT-d)
+3. Bind/unbind guest PASID table (e.g. ARM sMMU)
+4. Invalidate IOMMU caches
+5. Service page request
+
+Requirements
+====================================================
+The IOMMU UAPIs are generic and extensible to meet the following
+requirements:
+
+1. Emulated and para-virtualised vIOMMUs
+2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
+3. Extensions to the UAPI shall not break existing user space
+
+Interfaces
+====================================================
+Although the data structures defined in IOMMU UAPI are self-contained,
+there is no user API functions introduced. Instead, IOMMU UAPI is
+designed to work with existing user driver frameworks such as VFIO.
+
+Extension Rules & Precautions
+-----------------------------
+When IOMMU UAPI gets extended, the data structures can *only* be
+modified in two ways:
+
+1. Adding new fields by re-purposing the padding[] field. No size change.
+2. Adding new union members at the end. May increase in size.
+
+No new fields can be added *after* the variable size union in that it
+will break backward compatibility when offset moves. In both cases, a
+new flag must be accompanied with a new field such that the IOMMU
+driver can process the data based on the new flag. Version field is
+only reserved for the unlikely event of UAPI upgrade at its entirety.
+
+It's *always* the caller's responsibility to indicate the size of the
+structure passed by setting argsz appropriately.
+
+When IOMMU UAPI extension results in size increase, user such as VFIO
+has to handle the following scenarios:
+
+1. User and kernel has exact size match
+2. An older user with older kernel header (smaller UAPI size) running on a
+   newer kernel (larger UAPI size)
+3. A newer user with newer kernel header (larger UAPI size) running
+   on a older kernel.
+4. A malicious/misbehaving user pass illegal/invalid size but within
+   range. The data may contain garbage.
+
+
+Feature Checking
+----------------
+While launching a guest with vIOMMU, it is important to ensure that host
+can support the UAPI data structures to be used for vIOMMU-pIOMMU
+communications. Without the upfront compatibility checking, future
+faults are difficult to report even in normal conditions. For example,
+TLB invalidations should always succeed from vIOMMU's
+perspective. There is no architectural way to report back to the vIOMMU
+if the UAPI data is incompatible. For this reason the following IOMMU
+UAPIs cannot fail:
+
+1. Free PASID
+2. Unbind guest PASID
+3. Unbind guest PASID table (SMMU)
+4. Cache invalidate
+5. Page response
+
+User applications such as QEMU is expected to import kernel UAPI
+headers. Only backward compatibility is supported. For example, an
+older QEMU (with older kernel header) can run on newer kernel. Newer
+QEMU (with new kernel header) may fail on older kernel.
+
+IOMMU vendor driver should report the below features to IOMMU UAPI
+consumers (e.g. via VFIO).
+
+1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
+2. IOMMU_NESTING_FEAT_BIND_PGTBL
+3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
+4. IOMMU_NESTING_FEAT_CACHE_INVLD
+5. IOMMU_NESTING_FEAT_PAGE_REQUEST
+
+Take VFIO as example, upon request from VFIO user space (e.g. QEMU),
+VFIO kernel code shall query IOMMU vendor driver for the support of
+the above features. Query result can then be reported back to the
+user-space caller. Details can be found in
+Documentation/driver-api/vfio.rst.
+
+
+Data Passing Example with VFIO
+------------------------------
+As the ubiquitous userspace driver framework, VFIO is already IOMMU
+aware and share many key concepts such as device model, group, and
+protection domain. Other user driver frameworks can also be extended
+to support IOMMU UAPI but it is outside the scope of this document.
+
+In this tight-knit VFIO-IOMMU interface, the ultimate consumer of the
+IOMMU UAPI data is the host IOMMU driver. VFIO facilitates user-kernel
+transport, capability checking, security, and life cycle management of
+process address space ID (PASID).
+
+Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU driver is the
+ultimate consumer of its UAPI data. At VFIO layer, the IOMMU UAPI data
+is wrapped in a VFIO UAPI data for sanity checking. It follows the
+pattern below:
+
+::
+
+   struct {
+	__u32 argsz;
+	__u32 flags;
+	__u8  data[];
+  }
+
+Here data[] contains the IOMMU UAPI data structures.
+
+In order to determine the size and feature set of the user data, argsz
+and flags are also embedded in the IOMMU UAPI data structures.
+A "__u32 argsz" field is *always* at the beginning of each structure.
+
+For example:
+::
+
+   struct iommu_gpasid_bind_data {
+	__u32 argsz;
+	__u32 version;
+	#define IOMMU_PASID_FORMAT_INTEL_VTD	1
+	__u32 format;
+	#define IOMMU_SVA_GPASID_VAL	(1 << 0)
+	__u64 flags;
+	__u64 gpgd;
+	__u64 hpasid;
+	__u64 gpasid;
+	__u32 addr_width;
+	__u8  padding[12];
+	/* Vendor specific data */
+	union {
+		struct iommu_gpasid_bind_data_vtd vtd;
+	};
+  };
+
+Use bind guest PASID as an example, VFIO code shall process IOMMU UAPI
+request as follows:
+
+::
+
+ 1        /* Minsz must include IOMMU UAPI "argsz" of __u32 */
+ 2        minsz = offsetofend(struct vfio_iommu_type1_bind, flags) +
+                              sizeof(u32);
+ 3        copy_from_user(&vfio_bind, (void __user *)arg, minsz);
+ 4
+ 5        /* Check VFIO argsz */
+ 6        if (vfio_bind.argsz < minsz)
+ 7                return -EINVAL;
+ 8
+ 9        /* VFIO flags must be included in minsz */
+ 10        switch (vfio_bind.flags) {
+ 11        case VFIO_IOMMU_BIND_GUEST_PGTBL:
+ 12                /*
+ 13                 * Get the current IOMMU bind GPASID data size,
+ 14                 * which accounted for the largest union member.
+ 15                 */
+ 16                data_size = sizeof(struct iommu_gpasid_bind_data);
+ 17                iommu_argsz = vfio_bind.argsz - minsz;
+ 18                if (iommu_argsz > data_size) {
+ 19                        /* User data > current kernel */
+ 20                        return -E2BIG;
+ 21                }
+ 22                copy_from_user(&iommu_bind, (void __user *)
+ 23                               vfio_bind.data, iommu_argsz);
+ 24               /*
+ 25                * Deal with trailing bytes that is bigger than user
+ 26                * provided UAPI size but smaller than the current
+ 27                * kernel data size. Zero fill the trailing bytes.
+ 28                */
+ 29                memset(iommu_bind + iommu_argsz, 0, data_size -
+ 30                       iommu_argsz;
+ 31
+ 32                iommu_sva_bind_gpasid(domain, dev, iommu_bind_data);
+ 33                break;
+
+
+Case #1 & 2 are supported per backward compatibility rule.
+
+Case #3 will fail with -E2BIG at line #20. Case
+
+Case #4 may result in other error processed by IOMMU vendor driver. However,
+the damage shall not exceed the scope of the offending user.
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 2/3] iommu/uapi: Add argsz for user filled data
  2020-06-11  4:12 [PATCH v2 0/3] IOMMU user API enhancement Jacob Pan
  2020-06-11  4:12 ` [PATCH v2 1/3] docs: IOMMU user API Jacob Pan
@ 2020-06-11  4:12 ` Jacob Pan
  2020-06-11 16:49   ` Alex Williamson
  2020-06-11  4:12 ` [PATCH v2 3/3] iommu/vt-d: Sanity check uapi argsz filled by users Jacob Pan
  2 siblings, 1 reply; 31+ messages in thread
From: Jacob Pan @ 2020-06-11  4:12 UTC (permalink / raw)
  To: iommu, LKML, Lu Baolu, Joerg Roedel, David Woodhouse
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Christoph Hellwig,
	Alex Williamson, Jean-Philippe Brucker

As IOMMU UAPI gets extended, user data size may increase. To support
backward compatibiliy, this patch introduces a size field to each UAPI
data structures. It is *always* the responsibility for the user to fill in
the correct size.

Specific scenarios for user data handling are documented in:
Documentation/userspace-api/iommu.rst

Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 include/uapi/linux/iommu.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index e907b7091a46..303f148a5cd7 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -135,6 +135,7 @@ enum iommu_page_response_code {
 
 /**
  * struct iommu_page_response - Generic page response information
+ * @argsz: User filled size of this data
  * @version: API version of this structure
  * @flags: encodes whether the corresponding fields are valid
  *         (IOMMU_FAULT_PAGE_RESPONSE_* values)
@@ -143,6 +144,7 @@ enum iommu_page_response_code {
  * @code: response code from &enum iommu_page_response_code
  */
 struct iommu_page_response {
+	__u32	argsz;
 #define IOMMU_PAGE_RESP_VERSION_1	1
 	__u32	version;
 #define IOMMU_PAGE_RESP_PASID_VALID	(1 << 0)
@@ -218,6 +220,7 @@ struct iommu_inv_pasid_info {
 /**
  * struct iommu_cache_invalidate_info - First level/stage invalidation
  *     information
+ * @argsz: User filled size of this data
  * @version: API version of this structure
  * @cache: bitfield that allows to select which caches to invalidate
  * @granularity: defines the lowest granularity used for the invalidation:
@@ -246,6 +249,7 @@ struct iommu_inv_pasid_info {
  * must support the used granularity.
  */
 struct iommu_cache_invalidate_info {
+	__u32	argsz;
 #define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
 	__u32	version;
 /* IOMMU paging structure cache */
@@ -292,6 +296,7 @@ struct iommu_gpasid_bind_data_vtd {
 
 /**
  * struct iommu_gpasid_bind_data - Information about device and guest PASID binding
+ * @argsz:	User filled size of this data
  * @version:	Version of this data structure
  * @format:	PASID table entry format
  * @flags:	Additional information on guest bind request
@@ -309,6 +314,7 @@ struct iommu_gpasid_bind_data_vtd {
  * PASID to host PASID based on this bind data.
  */
 struct iommu_gpasid_bind_data {
+	__u32 argsz;
 #define IOMMU_GPASID_BIND_VERSION_1	1
 	__u32 version;
 #define IOMMU_PASID_FORMAT_INTEL_VTD	1
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 3/3] iommu/vt-d: Sanity check uapi argsz filled by users
  2020-06-11  4:12 [PATCH v2 0/3] IOMMU user API enhancement Jacob Pan
  2020-06-11  4:12 ` [PATCH v2 1/3] docs: IOMMU user API Jacob Pan
  2020-06-11  4:12 ` [PATCH v2 2/3] iommu/uapi: Add argsz for user filled data Jacob Pan
@ 2020-06-11  4:12 ` Jacob Pan
  2020-06-11 17:08   ` Alex Williamson
  2 siblings, 1 reply; 31+ messages in thread
From: Jacob Pan @ 2020-06-11  4:12 UTC (permalink / raw)
  To: iommu, LKML, Lu Baolu, Joerg Roedel, David Woodhouse
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Christoph Hellwig,
	Alex Williamson, Jean-Philippe Brucker

IOMMU UAPI data has an argsz field which is filled by user. As the data
structures expands, argsz may change. As the UAPI data are shared among
different architectures, extensions of UAPI data could be a result of
one architecture which has no impact on another. Therefore, these argsz
santity checks are performed in the model specific IOMMU drivers. This
patch adds sanity checks in the VT-d to ensure argsz passed by userspace
matches feature flags and other contents.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 16 ++++++++++++++++
 drivers/iommu/intel-svm.c   | 12 ++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 27ebf4b9faef..c98b5109684b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5365,6 +5365,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 	struct device_domain_info *info;
 	struct intel_iommu *iommu;
 	unsigned long flags;
+	unsigned long minsz;
 	int cache_type;
 	u8 bus, devfn;
 	u16 did, sid;
@@ -5385,6 +5386,21 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 	if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE))
 		return -EINVAL;
 
+	minsz = offsetofend(struct iommu_cache_invalidate_info, padding);
+	if (inv_info->argsz < minsz)
+		return -EINVAL;
+
+	/* Sanity check user filled invalidation dat sizes */
+	if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
+		inv_info->argsz != offsetofend(struct iommu_cache_invalidate_info,
+					addr_info))
+		return -EINVAL;
+
+	if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
+		inv_info->argsz != offsetofend(struct iommu_cache_invalidate_info,
+					pasid_info))
+		return -EINVAL;
+
 	spin_lock_irqsave(&device_domain_lock, flags);
 	spin_lock(&iommu->lock);
 	info = get_domain_info(dev);
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 35b43fe819ed..64dc2c66dfff 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -235,15 +235,27 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 	struct dmar_domain *dmar_domain;
 	struct intel_svm_dev *sdev;
 	struct intel_svm *svm;
+	unsigned long minsz;
 	int ret = 0;
 
 	if (WARN_ON(!iommu) || !data)
 		return -EINVAL;
 
+	/*
+	 * We mandate that no size change in IOMMU UAPI data before the
+	 * variable size union at the end.
+	 */
+	minsz = offsetofend(struct iommu_gpasid_bind_data, padding);
+	if (data->argsz < minsz)
+		return -EINVAL;
+
 	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
 	    data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
 		return -EINVAL;
 
+	if (data->argsz != offsetofend(struct iommu_gpasid_bind_data, vtd))
+		return -EINVAL;
+
 	if (!dev_is_pci(dev))
 		return -ENOTSUPP;
 
-- 
2.7.4

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-11  4:12 ` [PATCH v2 1/3] docs: IOMMU user API Jacob Pan
@ 2020-06-11  6:33   ` Lu Baolu
  2020-06-12 22:05     ` Jacob Pan
  2020-06-11  9:30   ` Jonathan Cameron
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 31+ messages in thread
From: Lu Baolu @ 2020-06-11  6:33 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Christoph Hellwig,
	Alex Williamson, Jean-Philippe Brucker

Hi Jacob,

On 2020/6/11 12:12, Jacob Pan wrote:
> IOMMU UAPI is newly introduced to support communications between guest
> virtual IOMMU and host IOMMU. There has been lots of discussions on how
> it should work with VFIO UAPI and userspace in general.
> 
> This document is indended to clarify the UAPI design and usage. The
> mechenics of how future extensions should be achieved are also covered
> in this documentation.
> 
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>   Documentation/userspace-api/iommu.rst | 210 ++++++++++++++++++++++++++++++++++
>   1 file changed, 210 insertions(+)
>   create mode 100644 Documentation/userspace-api/iommu.rst
> 
> diff --git a/Documentation/userspace-api/iommu.rst b/Documentation/userspace-api/iommu.rst
> new file mode 100644
> index 000000000000..e95dc5a04a41
> --- /dev/null
> +++ b/Documentation/userspace-api/iommu.rst
> @@ -0,0 +1,210 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. iommu:
> +
> +=====================================
> +IOMMU Userspace API
> +=====================================
> +
> +IOMMU UAPI is used for virtualization cases where communications are
> +needed between physical and virtual IOMMU drivers. For native
> +usage, IOMMU is a system device which does not need to communicate
> +with user space directly.
> +
> +The primary use cases are guest Shared Virtual Address (SVA) and
> +guest IO virtual address (IOVA), wherein virtual IOMMU (vIOMMU) is
> +required to communicate with the physical IOMMU in the host.
> +
> +.. contents:: :local:
> +
> +Functionalities
> +====================================================
> +Communications of user and kernel involve both directions. The
> +supported user-kernel APIs are as follows:
> +
> +1. Alloc/Free PASID
> +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> +4. Invalidate IOMMU caches
> +5. Service page request
> +
> +Requirements
> +====================================================
> +The IOMMU UAPIs are generic and extensible to meet the following
> +requirements:
> +
> +1. Emulated and para-virtualised vIOMMUs
> +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> +3. Extensions to the UAPI shall not break existing user space
> +
> +Interfaces
> +====================================================
> +Although the data structures defined in IOMMU UAPI are self-contained,
> +there is no user API functions introduced. Instead, IOMMU UAPI is
> +designed to work with existing user driver frameworks such as VFIO.
> +
> +Extension Rules & Precautions
> +-----------------------------
> +When IOMMU UAPI gets extended, the data structures can *only* be
> +modified in two ways:
> +
> +1. Adding new fields by re-purposing the padding[] field. No size change.
> +2. Adding new union members at the end. May increase in size.
> +
> +No new fields can be added *after* the variable size union in that it
> +will break backward compatibility when offset moves. In both cases, a
> +new flag must be accompanied with a new field such that the IOMMU
> +driver can process the data based on the new flag. Version field is
> +only reserved for the unlikely event of UAPI upgrade at its entirety.
> +
> +It's *always* the caller's responsibility to indicate the size of the
> +structure passed by setting argsz appropriately.
> +
> +When IOMMU UAPI extension results in size increase, user such as VFIO
> +has to handle the following scenarios:
> +
> +1. User and kernel has exact size match
> +2. An older user with older kernel header (smaller UAPI size) running on a
> +   newer kernel (larger UAPI size)
> +3. A newer user with newer kernel header (larger UAPI size) running
> +   on a older kernel.
> +4. A malicious/misbehaving user pass illegal/invalid size but within
> +   range. The data may contain garbage.
> +
> +
> +Feature Checking
> +----------------
> +While launching a guest with vIOMMU, it is important to ensure that host
> +can support the UAPI data structures to be used for vIOMMU-pIOMMU
> +communications. Without the upfront compatibility checking, future
> +faults are difficult to report even in normal conditions. For example,
> +TLB invalidations should always succeed from vIOMMU's
> +perspective. There is no architectural way to report back to the vIOMMU
> +if the UAPI data is incompatible. For this reason the following IOMMU
> +UAPIs cannot fail:
> +
> +1. Free PASID
> +2. Unbind guest PASID
> +3. Unbind guest PASID table (SMMU)
> +4. Cache invalidate
> +5. Page response
> +
> +User applications such as QEMU is expected to import kernel UAPI
> +headers. Only backward compatibility is supported. For example, an
> +older QEMU (with older kernel header) can run on newer kernel. Newer
> +QEMU (with new kernel header) may fail on older kernel.
> +
> +IOMMU vendor driver should report the below features to IOMMU UAPI
> +consumers (e.g. via VFIO).
> +
> +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> +
> +Take VFIO as example, upon request from VFIO user space (e.g. QEMU),
> +VFIO kernel code shall query IOMMU vendor driver for the support of
> +the above features. Query result can then be reported back to the
> +user-space caller. Details can be found in
> +Documentation/driver-api/vfio.rst.
> +
> +
> +Data Passing Example with VFIO
> +------------------------------
> +As the ubiquitous userspace driver framework, VFIO is already IOMMU
> +aware and share many key concepts such as device model, group, and
> +protection domain. Other user driver frameworks can also be extended
> +to support IOMMU UAPI but it is outside the scope of this document.
> +
> +In this tight-knit VFIO-IOMMU interface, the ultimate consumer of the
> +IOMMU UAPI data is the host IOMMU driver. VFIO facilitates user-kernel
> +transport, capability checking, security, and life cycle management of
> +process address space ID (PASID).
> +
> +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU driver is the
> +ultimate consumer of its UAPI data. At VFIO layer, the IOMMU UAPI data
> +is wrapped in a VFIO UAPI data for sanity checking. It follows the
> +pattern below:
> +
> +::
> +
> +   struct {
> +	__u32 argsz;
> +	__u32 flags;
> +	__u8  data[];
> +  }
> +
> +Here data[] contains the IOMMU UAPI data structures.
> +
> +In order to determine the size and feature set of the user data, argsz
> +and flags are also embedded in the IOMMU UAPI data structures.
> +A "__u32 argsz" field is *always* at the beginning of each structure.
> +
> +For example:
> +::
> +
> +   struct iommu_gpasid_bind_data {
> +	__u32 argsz;
> +	__u32 version;
> +	#define IOMMU_PASID_FORMAT_INTEL_VTD	1
> +	__u32 format;
> +	#define IOMMU_SVA_GPASID_VAL	(1 << 0)
> +	__u64 flags;
> +	__u64 gpgd;
> +	__u64 hpasid;
> +	__u64 gpasid;
> +	__u32 addr_width;
> +	__u8  padding[12];
> +	/* Vendor specific data */
> +	union {
> +		struct iommu_gpasid_bind_data_vtd vtd;
> +	};
> +  };
> +
> +Use bind guest PASID as an example, VFIO code shall process IOMMU UAPI
> +request as follows:
> +
> +::
> +
> + 1        /* Minsz must include IOMMU UAPI "argsz" of __u32 */
> + 2        minsz = offsetofend(struct vfio_iommu_type1_bind, flags) +

Where is the vfio_iommu_type1_bind definition? I haven't found it in
both this series and current code.

> +                              sizeof(u32);
> + 3        copy_from_user(&vfio_bind, (void __user *)arg, minsz);
> + 4
> + 5        /* Check VFIO argsz */
> + 6        if (vfio_bind.argsz < minsz)
> + 7                return -EINVAL;
> + 8
> + 9        /* VFIO flags must be included in minsz */
> + 10        switch (vfio_bind.flags) {
> + 11        case VFIO_IOMMU_BIND_GUEST_PGTBL:
> + 12                /*
> + 13                 * Get the current IOMMU bind GPASID data size,
> + 14                 * which accounted for the largest union member.
> + 15                 */
> + 16                data_size = sizeof(struct iommu_gpasid_bind_data);
> + 17                iommu_argsz = vfio_bind.argsz - minsz;
> + 18                if (iommu_argsz > data_size) {
> + 19                        /* User data > current kernel */
> + 20                        return -E2BIG;
> + 21                }
> + 22                copy_from_user(&iommu_bind, (void __user *)
> + 23                               vfio_bind.data, iommu_argsz);
> + 24               /*
> + 25                * Deal with trailing bytes that is bigger than user
> + 26                * provided UAPI size but smaller than the current
> + 27                * kernel data size. Zero fill the trailing bytes.
> + 28                */
> + 29                memset(iommu_bind + iommu_argsz, 0, data_size -
> + 30                       iommu_argsz;

Where is iommu_bind definition? I am assuming that it's

	struct iommu_gpasid_bind_data iommu_bind;

You need to cast it to (void *) to avoid pointer overflow.

	memset((void *)&iommu_bind + iommu_argsz, 0, ...);

By the way, right parenthesis is missed.

> + 31
> + 32                iommu_sva_bind_gpasid(domain, dev, iommu_bind_data);

Where is the iommu_bind_data definition?

> + 33                break;
> +
> +
> +Case #1 & 2 are supported per backward compatibility rule.
> +
> +Case #3 will fail with -E2BIG at line #20. Case
> +
> +Case #4 may result in other error processed by IOMMU vendor driver. However,
> +the damage shall not exceed the scope of the offending user.
> 

The above description is not clear. People are hard to find the right
description of each case.

Best regards,
baolu
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-11  4:12 ` [PATCH v2 1/3] docs: IOMMU user API Jacob Pan
  2020-06-11  6:33   ` Lu Baolu
@ 2020-06-11  9:30   ` Jonathan Cameron
  2020-06-12 22:53     ` Jacob Pan
  2020-06-11 13:55   ` Jonathan Corbet
  2020-06-11 15:47   ` Alex Williamson
  3 siblings, 1 reply; 31+ messages in thread
From: Jonathan Cameron @ 2020-06-11  9:30 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Tian, Kevin, Alex Williamson, Raj Ashok, Jonathan Corbet,
	Jean-Philippe Brucker, LKML, Christoph Hellwig, iommu,
	David Woodhouse

On Wed, 10 Jun 2020 21:12:13 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> IOMMU UAPI is newly introduced to support communications between guest
> virtual IOMMU and host IOMMU. There has been lots of discussions on how
> it should work with VFIO UAPI and userspace in general.
> 
> This document is indended to clarify the UAPI design and usage. The
> mechenics of how future extensions should be achieved are also covered

mechanics 

> in this documentation.
> 
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Mostly seems sensible.  A few comments / queries inline.

Jonathan

> ---
>  Documentation/userspace-api/iommu.rst | 210 ++++++++++++++++++++++++++++++++++
>  1 file changed, 210 insertions(+)
>  create mode 100644 Documentation/userspace-api/iommu.rst
> 
> diff --git a/Documentation/userspace-api/iommu.rst b/Documentation/userspace-api/iommu.rst
> new file mode 100644
> index 000000000000..e95dc5a04a41
> --- /dev/null
> +++ b/Documentation/userspace-api/iommu.rst
> @@ -0,0 +1,210 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. iommu:
> +
> +=====================================
> +IOMMU Userspace API
> +=====================================
> +
> +IOMMU UAPI is used for virtualization cases where communications are
> +needed between physical and virtual IOMMU drivers. For native
> +usage, IOMMU is a system device which does not need to communicate
> +with user space directly.
> +
> +The primary use cases are guest Shared Virtual Address (SVA) and
> +guest IO virtual address (IOVA), wherein virtual IOMMU (vIOMMU) is

wherein _a_ virtual IOMMU 

> +required to communicate with the physical IOMMU in the host.
> +
> +.. contents:: :local:
> +
> +Functionalities
> +====================================================
> +Communications of user and kernel involve both directions. The
> +supported user-kernel APIs are as follows:
> +
> +1. Alloc/Free PASID
> +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> +4. Invalidate IOMMU caches
> +5. Service page request
> +
> +Requirements
> +====================================================
> +The IOMMU UAPIs are generic and extensible to meet the following
> +requirements:
> +
> +1. Emulated and para-virtualised vIOMMUs
> +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> +3. Extensions to the UAPI shall not break existing user space
> +
> +Interfaces
> +====================================================
> +Although the data structures defined in IOMMU UAPI are self-contained,
> +there is no user API functions introduced. Instead, IOMMU UAPI is
> +designed to work with existing user driver frameworks such as VFIO.
> +
> +Extension Rules & Precautions
> +-----------------------------
> +When IOMMU UAPI gets extended, the data structures can *only* be
> +modified in two ways:
> +
> +1. Adding new fields by re-purposing the padding[] field. No size change.
> +2. Adding new union members at the end. May increase in size.
> +
> +No new fields can be added *after* the variable size union in that it
> +will break backward compatibility when offset moves. In both cases, a
> +new flag must be accompanied with a new field such that the IOMMU
> +driver can process the data based on the new flag. Version field is
> +only reserved for the unlikely event of UAPI upgrade at its entirety.
> +
> +It's *always* the caller's responsibility to indicate the size of the
> +structure passed by setting argsz appropriately.
> +
> +When IOMMU UAPI extension results in size increase, user such as VFIO
> +has to handle the following scenarios:
> +
> +1. User and kernel has exact size match
> +2. An older user with older kernel header (smaller UAPI size) running on a
> +   newer kernel (larger UAPI size)
> +3. A newer user with newer kernel header (larger UAPI size) running
> +   on a older kernel.
> +4. A malicious/misbehaving user pass illegal/invalid size but within
> +   range. The data may contain garbage.
> +
> +
> +Feature Checking
> +----------------
> +While launching a guest with vIOMMU, it is important to ensure that host
> +can support the UAPI data structures to be used for vIOMMU-pIOMMU
> +communications. Without the upfront compatibility checking, future
> +faults are difficult to report even in normal conditions. For example,
> +TLB invalidations should always succeed from vIOMMU's
> +perspective. 

This statement has me concerned.  If a TLB invalidation fails, but
is reported to the guest as successful do we have possible breaking of iommu
isolation guarantees?

If you get a TLB invalidation not happening, for some reason, that's a critical
fault, isolate the device using the IOMMU or kill the VM.

I'd reword it as "TLB invalidations should always succeed."

As you mention, we should never get to this state anyway!

> There is no architectural way to report back to the vIOMMU
> +if the UAPI data is incompatible. For this reason the following IOMMU
> +UAPIs cannot fail:
> +
> +1. Free PASID
> +2. Unbind guest PASID
> +3. Unbind guest PASID table (SMMU)
> +4. Cache invalidate
> +5. Page response

Of these, page response is a bit different.  Shouldn't be a problem to
occasionally say a page response was handled when it wasn't (or it was but
has gone away again before the response reached the hardware).  In high
load environments that happens anyway sometimes. 

The others are all cases where any failure at all is fatal to the guest
continuing to use the hardware.

> +
> +User applications such as QEMU is expected to import kernel UAPI
> +headers. Only backward compatibility is supported. For example, an
> +older QEMU (with older kernel header) can run on newer kernel. Newer
> +QEMU (with new kernel header) may fail on older kernel.

I'd define fail a bit tighter here.  I presume refuse to initialize?

> +
> +IOMMU vendor driver should report the below features to IOMMU UAPI
> +consumers (e.g. via VFIO).
> +
> +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> +
> +Take VFIO as example, upon request from VFIO user space (e.g. QEMU),
> +VFIO kernel code shall query IOMMU vendor driver for the support of
> +the above features. Query result can then be reported back to the
> +user-space caller. Details can be found in
> +Documentation/driver-api/vfio.rst.
> +
> +
> +Data Passing Example with VFIO
> +------------------------------
> +As the ubiquitous userspace driver framework, VFIO is already IOMMU
> +aware and share many key concepts such as device model, group, and
> +protection domain. Other user driver frameworks can also be extended
> +to support IOMMU UAPI but it is outside the scope of this document.
> +
> +In this tight-knit VFIO-IOMMU interface, the ultimate consumer of the
> +IOMMU UAPI data is the host IOMMU driver. VFIO facilitates user-kernel
> +transport, capability checking, security, and life cycle management of
> +process address space ID (PASID).
> +
> +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU driver is the
> +ultimate consumer of its UAPI data. At VFIO layer, the IOMMU UAPI data
> +is wrapped in a VFIO UAPI data for sanity checking. It follows the
> +pattern below:
> +
> +::
> +
> +   struct {
> +	__u32 argsz;
> +	__u32 flags;
> +	__u8  data[];
> +  }

That final bracket needs to be indented one more space.
Make sure you check the output of this file as there are a few of these.

> +
> +Here data[] contains the IOMMU UAPI data structures.
> +
> +In order to determine the size and feature set of the user data, argsz
> +and flags are also embedded in the IOMMU UAPI data structures.
> +A "__u32 argsz" field is *always* at the beginning of each structure.
> +
> +For example:
> +::
> +
> +   struct iommu_gpasid_bind_data {
> +	__u32 argsz;
> +	__u32 version;
> +	#define IOMMU_PASID_FORMAT_INTEL_VTD	1
> +	__u32 format;
> +	#define IOMMU_SVA_GPASID_VAL	(1 << 0)
> +	__u64 flags;
> +	__u64 gpgd;
> +	__u64 hpasid;
> +	__u64 gpasid;
> +	__u32 addr_width;
> +	__u8  padding[12];
> +	/* Vendor specific data */
> +	union {
> +		struct iommu_gpasid_bind_data_vtd vtd;
> +	};
> +  };
> +
> +Use bind guest PASID as an example, VFIO code shall process IOMMU UAPI
> +request as follows:
> +
> +::
> +
> + 1        /* Minsz must include IOMMU UAPI "argsz" of __u32 */
> + 2        minsz = offsetofend(struct vfio_iommu_type1_bind, flags) +
> +                              sizeof(u32);
> + 3        copy_from_user(&vfio_bind, (void __user *)arg, minsz);
> + 4
> + 5        /* Check VFIO argsz */
> + 6        if (vfio_bind.argsz < minsz)
> + 7                return -EINVAL;
> + 8
> + 9        /* VFIO flags must be included in minsz */

Nice to keep indentation across change in line number length.

> + 10        switch (vfio_bind.flags) {
> + 11        case VFIO_IOMMU_BIND_GUEST_PGTBL:
> + 12                /*
> + 13                 * Get the current IOMMU bind GPASID data size,
> + 14                 * which accounted for the largest union member.
> + 15                 */
> + 16                data_size = sizeof(struct iommu_gpasid_bind_data);
> + 17                iommu_argsz = vfio_bind.argsz - minsz;
> + 18                if (iommu_argsz > data_size) {
> + 19                        /* User data > current kernel */
> + 20                        return -E2BIG;
> + 21                }
> + 22                copy_from_user(&iommu_bind, (void __user *)
> + 23                               vfio_bind.data, iommu_argsz);
> + 24               /*
> + 25                * Deal with trailing bytes that is bigger than user
> + 26                * provided UAPI size but smaller than the current
> + 27                * kernel data size. Zero fill the trailing bytes.
> + 28                */
> + 29                memset(iommu_bind + iommu_argsz, 0, data_size -
> + 30                       iommu_argsz;
> + 31
> + 32                iommu_sva_bind_gpasid(domain, dev, iommu_bind_data);
> + 33                break;
> +
> +
> +Case #1 & 2 are supported per backward compatibility rule.
> +
> +Case #3 will fail with -E2BIG at line #20. Case

Is that always the case?  As we have multiple structures in a union it's possible
that a given architecture might expand without changing the union size. So this might
not detect a UAPI change.

> +
> +Case #4 may result in other error processed by IOMMU vendor driver. However,
> +the damage shall not exceed the scope of the offending user.


_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-11  4:12 ` [PATCH v2 1/3] docs: IOMMU user API Jacob Pan
  2020-06-11  6:33   ` Lu Baolu
  2020-06-11  9:30   ` Jonathan Cameron
@ 2020-06-11 13:55   ` Jonathan Corbet
  2020-06-11 16:38     ` Jacob Pan
  2020-06-11 15:47   ` Alex Williamson
  3 siblings, 1 reply; 31+ messages in thread
From: Jonathan Corbet @ 2020-06-11 13:55 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Tian, Kevin, Alex Williamson, Raj Ashok, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Wed, 10 Jun 2020 21:12:13 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

A little nit but...this pattern:

> +pattern below:
> +
> +::
> +
> +   struct {
> +	__u32 argsz;
> +	__u32 flags;
> +	__u8  data[];
> +  }

can be more concisely and attractively written as:

   pattern below::

	struct { 
...

Thanks,

jon
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-11  4:12 ` [PATCH v2 1/3] docs: IOMMU user API Jacob Pan
                     ` (2 preceding siblings ...)
  2020-06-11 13:55   ` Jonathan Corbet
@ 2020-06-11 15:47   ` Alex Williamson
  2020-06-11 19:52     ` Jacob Pan
  3 siblings, 1 reply; 31+ messages in thread
From: Alex Williamson @ 2020-06-11 15:47 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Wed, 10 Jun 2020 21:12:13 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> IOMMU UAPI is newly introduced to support communications between guest
> virtual IOMMU and host IOMMU. There has been lots of discussions on how
> it should work with VFIO UAPI and userspace in general.
> 
> This document is indended to clarify the UAPI design and usage. The
> mechenics of how future extensions should be achieved are also covered
> in this documentation.
> 
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>  Documentation/userspace-api/iommu.rst | 210 ++++++++++++++++++++++++++++++++++
>  1 file changed, 210 insertions(+)
>  create mode 100644 Documentation/userspace-api/iommu.rst
> 
> diff --git a/Documentation/userspace-api/iommu.rst b/Documentation/userspace-api/iommu.rst
> new file mode 100644
> index 000000000000..e95dc5a04a41
> --- /dev/null
> +++ b/Documentation/userspace-api/iommu.rst
> @@ -0,0 +1,210 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. iommu:
> +
> +=====================================
> +IOMMU Userspace API
> +=====================================
> +
> +IOMMU UAPI is used for virtualization cases where communications are
> +needed between physical and virtual IOMMU drivers. For native
> +usage, IOMMU is a system device which does not need to communicate
> +with user space directly.
> +
> +The primary use cases are guest Shared Virtual Address (SVA) and
> +guest IO virtual address (IOVA), wherein virtual IOMMU (vIOMMU) is
> +required to communicate with the physical IOMMU in the host.
> +
> +.. contents:: :local:
> +
> +Functionalities
> +====================================================
> +Communications of user and kernel involve both directions. The
> +supported user-kernel APIs are as follows:
> +
> +1. Alloc/Free PASID
> +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> +4. Invalidate IOMMU caches
> +5. Service page request
> +
> +Requirements
> +====================================================
> +The IOMMU UAPIs are generic and extensible to meet the following
> +requirements:
> +
> +1. Emulated and para-virtualised vIOMMUs
> +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> +3. Extensions to the UAPI shall not break existing user space
> +
> +Interfaces
> +====================================================
> +Although the data structures defined in IOMMU UAPI are self-contained,
> +there is no user API functions introduced. Instead, IOMMU UAPI is
> +designed to work with existing user driver frameworks such as VFIO.
> +
> +Extension Rules & Precautions
> +-----------------------------
> +When IOMMU UAPI gets extended, the data structures can *only* be
> +modified in two ways:
> +
> +1. Adding new fields by re-purposing the padding[] field. No size change.
> +2. Adding new union members at the end. May increase in size.
> +
> +No new fields can be added *after* the variable size union in that it
> +will break backward compatibility when offset moves. In both cases, a
> +new flag must be accompanied with a new field such that the IOMMU
> +driver can process the data based on the new flag. Version field is
> +only reserved for the unlikely event of UAPI upgrade at its entirety.
> +
> +It's *always* the caller's responsibility to indicate the size of the
> +structure passed by setting argsz appropriately.
> +
> +When IOMMU UAPI extension results in size increase, user such as VFIO
> +has to handle the following scenarios:
> +
> +1. User and kernel has exact size match
> +2. An older user with older kernel header (smaller UAPI size) running on a
> +   newer kernel (larger UAPI size)
> +3. A newer user with newer kernel header (larger UAPI size) running
> +   on a older kernel.
> +4. A malicious/misbehaving user pass illegal/invalid size but within
> +   range. The data may contain garbage.
> +
> +
> +Feature Checking
> +----------------
> +While launching a guest with vIOMMU, it is important to ensure that host
> +can support the UAPI data structures to be used for vIOMMU-pIOMMU
> +communications. Without the upfront compatibility checking, future
> +faults are difficult to report even in normal conditions. For example,
> +TLB invalidations should always succeed from vIOMMU's
> +perspective. There is no architectural way to report back to the vIOMMU
> +if the UAPI data is incompatible. For this reason the following IOMMU
> +UAPIs cannot fail:
> +
> +1. Free PASID
> +2. Unbind guest PASID
> +3. Unbind guest PASID table (SMMU)
> +4. Cache invalidate
> +5. Page response
> +
> +User applications such as QEMU is expected to import kernel UAPI
> +headers. Only backward compatibility is supported. For example, an
> +older QEMU (with older kernel header) can run on newer kernel. Newer
> +QEMU (with new kernel header) may fail on older kernel.

"Build your user application against newer kernels and it may break on
older kernels" is not a great selling point of this UAPI.  Clearly new
features may not be available on older kernels and an application that
depends on a newer feature may be restricted to newer kernels.

> +
> +IOMMU vendor driver should report the below features to IOMMU UAPI
> +consumers (e.g. via VFIO).
> +
> +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> +
> +Take VFIO as example, upon request from VFIO user space (e.g. QEMU),
> +VFIO kernel code shall query IOMMU vendor driver for the support of
> +the above features. Query result can then be reported back to the
> +user-space caller. Details can be found in
> +Documentation/driver-api/vfio.rst.
> +
> +
> +Data Passing Example with VFIO
> +------------------------------
> +As the ubiquitous userspace driver framework, VFIO is already IOMMU
> +aware and share many key concepts such as device model, group, and
> +protection domain. Other user driver frameworks can also be extended
> +to support IOMMU UAPI but it is outside the scope of this document.
> +
> +In this tight-knit VFIO-IOMMU interface, the ultimate consumer of the
> +IOMMU UAPI data is the host IOMMU driver. VFIO facilitates user-kernel
> +transport, capability checking, security, and life cycle management of
> +process address space ID (PASID).
> +
> +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU driver is the
> +ultimate consumer of its UAPI data. At VFIO layer, the IOMMU UAPI data
> +is wrapped in a VFIO UAPI data for sanity checking. It follows the
> +pattern below:
> +
> +::
> +
> +   struct {
> +	__u32 argsz;
> +	__u32 flags;
> +	__u8  data[];
> +  }
> +
> +Here data[] contains the IOMMU UAPI data structures.
> +
> +In order to determine the size and feature set of the user data, argsz
> +and flags are also embedded in the IOMMU UAPI data structures.
> +A "__u32 argsz" field is *always* at the beginning of each structure.
> +
> +For example:
> +::
> +
> +   struct iommu_gpasid_bind_data {
> +	__u32 argsz;
> +	__u32 version;
> +	#define IOMMU_PASID_FORMAT_INTEL_VTD	1
> +	__u32 format;
> +	#define IOMMU_SVA_GPASID_VAL	(1 << 0)
> +	__u64 flags;
> +	__u64 gpgd;
> +	__u64 hpasid;
> +	__u64 gpasid;
> +	__u32 addr_width;
> +	__u8  padding[12];
> +	/* Vendor specific data */
> +	union {
> +		struct iommu_gpasid_bind_data_vtd vtd;
> +	};
> +  };
> +
> +Use bind guest PASID as an example, VFIO code shall process IOMMU UAPI
> +request as follows:
> +
> +::
> +
> + 1        /* Minsz must include IOMMU UAPI "argsz" of __u32 */
> + 2        minsz = offsetofend(struct vfio_iommu_type1_bind, flags) +
> +                              sizeof(u32);

In the example structure above:

> +   struct {
> +	__u32 argsz;
> +	__u32 flags;
> +	__u8  data[];
> +  }

This presumes that vfio does not use flags to identify a different
layout, for example a field before data or defining a flag that
provides no data.  IOW, the IOMMU guarantees argsz at the beginning of
all structures, but let's not limit how vfio chooses to bundle that
structure.  minsz should be based on flags, which we'll evaluate to
determine how much more to copy.

> + 3        copy_from_user(&vfio_bind, (void __user *)arg, minsz);
> + 4
> + 5        /* Check VFIO argsz */
> + 6        if (vfio_bind.argsz < minsz)
> + 7                return -EINVAL;
> + 8
> + 9        /* VFIO flags must be included in minsz */
> + 10        switch (vfio_bind.flags) {
> + 11        case VFIO_IOMMU_BIND_GUEST_PGTBL:
> + 12                /*
> + 13                 * Get the current IOMMU bind GPASID data size,
> + 14                 * which accounted for the largest union member.
> + 15                 */
> + 16                data_size = sizeof(struct iommu_gpasid_bind_data);
> + 17                iommu_argsz = vfio_bind.argsz - minsz;

Note that by including the IOMMU UAPI argsz within minsz, this is
incorrect.

> + 18                if (iommu_argsz > data_size) {
> + 19                        /* User data > current kernel */
> + 20                        return -E2BIG;
> + 21                }

Now I see why you're making the claim that QEMU compiled against an new
kernel may not work on an older kernel.  We can do better.  The current
sizeof the data structure should be the maximum we'll copy from the
user, and we can update the user provided IOMMU UAPI argsz as we pass it
down from the user to avoid exposing ourselves to an arbitrarily large
user buffer.  The IOMMU UAPI interfaces should then also use argsz and
flags to determine whether the data is present for a specified flag.
That should allow a user application compiled against a newer kernel
header, but only using features found on older kernels to continue to
work on older kernels, which seems like a basic requirement to me.

> + 22                copy_from_user(&iommu_bind, (void __user *)
> + 23                               vfio_bind.data, iommu_argsz);
> + 24               /*
> + 25                * Deal with trailing bytes that is bigger than user
> + 26                * provided UAPI size but smaller than the current
> + 27                * kernel data size. Zero fill the trailing bytes.
> + 28                */
> + 29                memset(iommu_bind + iommu_argsz, 0, data_size -
> + 30                       iommu_argsz;

The IOMMU UAPI interface having access to argsz should make this
unnecessary.  Performing this memset() seems like it suggests to the
next layer that it can rely on all fields being present and valid,
which defeats the purpose of argsz.

> + 31
> + 32                iommu_sva_bind_gpasid(domain, dev, iommu_bind_data);
> + 33                break;
> +
> +
> +Case #1 & 2 are supported per backward compatibility rule.
> +
> +Case #3 will fail with -E2BIG at line #20. Case

This is not acceptable IMO.

> +Case #4 may result in other error processed by IOMMU vendor driver. However,
> +the damage shall not exceed the scope of the offending user.

This is a concern in this double wrapped interface, the IOMMU UAPI
layer may expect the vfio layer to validate the data.  Zeroing the
remainder of the data structure is evidence towards that.  The IOMMU
UAPI layer needs to consider all of this untrusted, so why would we not
reflect that by passing a __user pointer through to the IOMMU UAPI such
that it can copy the data from the user itself rather than being
mislead that the contents have been somehow verified?  Thanks,

Alex

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-11 13:55   ` Jonathan Corbet
@ 2020-06-11 16:38     ` Jacob Pan
  0 siblings, 0 replies; 31+ messages in thread
From: Jacob Pan @ 2020-06-11 16:38 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Tian, Kevin, Alex Williamson, Raj Ashok, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

Hi Jon,

On Thu, 11 Jun 2020 07:55:00 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> On Wed, 10 Jun 2020 21:12:13 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
> A little nit but...this pattern:
> 
> > +pattern below:
> > +
> > +::
> > +
> > +   struct {
> > +	__u32 argsz;
> > +	__u32 flags;
> > +	__u8  data[];
> > +  }  
> 
> can be more concisely and attractively written as:
> 
>    pattern below::
> 
> 	struct { 
> ...
> 
Will do. thanks!

> Thanks,
> 
> jon

[Jacob Pan]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 2/3] iommu/uapi: Add argsz for user filled data
  2020-06-11  4:12 ` [PATCH v2 2/3] iommu/uapi: Add argsz for user filled data Jacob Pan
@ 2020-06-11 16:49   ` Alex Williamson
  2020-06-12  0:02     ` Jacob Pan
  0 siblings, 1 reply; 31+ messages in thread
From: Alex Williamson @ 2020-06-11 16:49 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Wed, 10 Jun 2020 21:12:14 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> As IOMMU UAPI gets extended, user data size may increase. To support
> backward compatibiliy, this patch introduces a size field to each UAPI
> data structures. It is *always* the responsibility for the user to fill in
> the correct size.

Though at the same time, argsz is user provided data which we don't
trust.  The argsz field allows the user to indicate how much data
they're providing, it's still the kernel's responsibility to validate
whether it's correct and sufficient for the requested operation.
Thanks,

Alex

> Specific scenarios for user data handling are documented in:
> Documentation/userspace-api/iommu.rst
> 
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>  include/uapi/linux/iommu.h | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index e907b7091a46..303f148a5cd7 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -135,6 +135,7 @@ enum iommu_page_response_code {
>  
>  /**
>   * struct iommu_page_response - Generic page response information
> + * @argsz: User filled size of this data
>   * @version: API version of this structure
>   * @flags: encodes whether the corresponding fields are valid
>   *         (IOMMU_FAULT_PAGE_RESPONSE_* values)
> @@ -143,6 +144,7 @@ enum iommu_page_response_code {
>   * @code: response code from &enum iommu_page_response_code
>   */
>  struct iommu_page_response {
> +	__u32	argsz;
>  #define IOMMU_PAGE_RESP_VERSION_1	1
>  	__u32	version;
>  #define IOMMU_PAGE_RESP_PASID_VALID	(1 << 0)
> @@ -218,6 +220,7 @@ struct iommu_inv_pasid_info {
>  /**
>   * struct iommu_cache_invalidate_info - First level/stage invalidation
>   *     information
> + * @argsz: User filled size of this data
>   * @version: API version of this structure
>   * @cache: bitfield that allows to select which caches to invalidate
>   * @granularity: defines the lowest granularity used for the invalidation:
> @@ -246,6 +249,7 @@ struct iommu_inv_pasid_info {
>   * must support the used granularity.
>   */
>  struct iommu_cache_invalidate_info {
> +	__u32	argsz;
>  #define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
>  	__u32	version;
>  /* IOMMU paging structure cache */
> @@ -292,6 +296,7 @@ struct iommu_gpasid_bind_data_vtd {
>  
>  /**
>   * struct iommu_gpasid_bind_data - Information about device and guest PASID binding
> + * @argsz:	User filled size of this data
>   * @version:	Version of this data structure
>   * @format:	PASID table entry format
>   * @flags:	Additional information on guest bind request
> @@ -309,6 +314,7 @@ struct iommu_gpasid_bind_data_vtd {
>   * PASID to host PASID based on this bind data.
>   */
>  struct iommu_gpasid_bind_data {
> +	__u32 argsz;
>  #define IOMMU_GPASID_BIND_VERSION_1	1
>  	__u32 version;
>  #define IOMMU_PASID_FORMAT_INTEL_VTD	1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 3/3] iommu/vt-d: Sanity check uapi argsz filled by users
  2020-06-11  4:12 ` [PATCH v2 3/3] iommu/vt-d: Sanity check uapi argsz filled by users Jacob Pan
@ 2020-06-11 17:08   ` Alex Williamson
  2020-06-11 20:02     ` Jacob Pan
  0 siblings, 1 reply; 31+ messages in thread
From: Alex Williamson @ 2020-06-11 17:08 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Wed, 10 Jun 2020 21:12:15 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> IOMMU UAPI data has an argsz field which is filled by user. As the data
> structures expands, argsz may change. As the UAPI data are shared among
> different architectures, extensions of UAPI data could be a result of
> one architecture which has no impact on another. Therefore, these argsz
> santity checks are performed in the model specific IOMMU drivers. This
> patch adds sanity checks in the VT-d to ensure argsz passed by userspace
> matches feature flags and other contents.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>  drivers/iommu/intel-iommu.c | 16 ++++++++++++++++
>  drivers/iommu/intel-svm.c   | 12 ++++++++++++
>  2 files changed, 28 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 27ebf4b9faef..c98b5109684b 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5365,6 +5365,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
>  	struct device_domain_info *info;
>  	struct intel_iommu *iommu;
>  	unsigned long flags;
> +	unsigned long minsz;
>  	int cache_type;
>  	u8 bus, devfn;
>  	u16 did, sid;
> @@ -5385,6 +5386,21 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
>  	if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE))
>  		return -EINVAL;
>  
> +	minsz = offsetofend(struct iommu_cache_invalidate_info, padding);

Would it still be better to look for the end of the last field that's
actually used to avoid the code churn and oversights if/when the padding
field does get used and renamed?

Per my comment on patch 1/, this also seems like where the device
specific IOMMU driver should also have the responsibility of receiving
a __user pointer to do the copy_from_user() here.  vfio can't know
which flags require which fields to make a UAPI with acceptable
compatibility guarantees otherwise.

> +	if (inv_info->argsz < minsz)
> +		return -EINVAL;
> +
> +	/* Sanity check user filled invalidation dat sizes */
> +	if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
> +		inv_info->argsz != offsetofend(struct iommu_cache_invalidate_info,
> +					addr_info))
> +		return -EINVAL;
> +
> +	if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
> +		inv_info->argsz != offsetofend(struct iommu_cache_invalidate_info,
> +					pasid_info))
> +		return -EINVAL;
> +
>  	spin_lock_irqsave(&device_domain_lock, flags);
>  	spin_lock(&iommu->lock);
>  	info = get_domain_info(dev);
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 35b43fe819ed..64dc2c66dfff 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -235,15 +235,27 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
>  	struct dmar_domain *dmar_domain;
>  	struct intel_svm_dev *sdev;
>  	struct intel_svm *svm;
> +	unsigned long minsz;
>  	int ret = 0;
>  
>  	if (WARN_ON(!iommu) || !data)
>  		return -EINVAL;
>  
> +	/*
> +	 * We mandate that no size change in IOMMU UAPI data before the
> +	 * variable size union at the end.
> +	 */
> +	minsz = offsetofend(struct iommu_gpasid_bind_data, padding);

Same.  Thanks,

Alex

> +	if (data->argsz < minsz)
> +		return -EINVAL;
> +
>  	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
>  	    data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
>  		return -EINVAL;
>  
> +	if (data->argsz != offsetofend(struct iommu_gpasid_bind_data, vtd))
> +		return -EINVAL;
> +
>  	if (!dev_is_pci(dev))
>  		return -ENOTSUPP;
>  

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-11 15:47   ` Alex Williamson
@ 2020-06-11 19:52     ` Jacob Pan
  2020-06-11 20:40       ` Alex Williamson
  0 siblings, 1 reply; 31+ messages in thread
From: Jacob Pan @ 2020-06-11 19:52 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

Hi Alex,

On Thu, 11 Jun 2020 09:47:41 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Wed, 10 Jun 2020 21:12:13 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
> > IOMMU UAPI is newly introduced to support communications between
> > guest virtual IOMMU and host IOMMU. There has been lots of
> > discussions on how it should work with VFIO UAPI and userspace in
> > general.
> > 
> > This document is indended to clarify the UAPI design and usage. The
> > mechenics of how future extensions should be achieved are also
> > covered in this documentation.
> > 
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >  Documentation/userspace-api/iommu.rst | 210
> > ++++++++++++++++++++++++++++++++++ 1 file changed, 210 insertions(+)
> >  create mode 100644 Documentation/userspace-api/iommu.rst
> > 
> > diff --git a/Documentation/userspace-api/iommu.rst
> > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > index 000000000000..e95dc5a04a41
> > --- /dev/null
> > +++ b/Documentation/userspace-api/iommu.rst
> > @@ -0,0 +1,210 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +.. iommu:
> > +
> > +=====================================
> > +IOMMU Userspace API
> > +=====================================
> > +
> > +IOMMU UAPI is used for virtualization cases where communications
> > are +needed between physical and virtual IOMMU drivers. For native
> > +usage, IOMMU is a system device which does not need to communicate
> > +with user space directly.
> > +
> > +The primary use cases are guest Shared Virtual Address (SVA) and
> > +guest IO virtual address (IOVA), wherein virtual IOMMU (vIOMMU) is
> > +required to communicate with the physical IOMMU in the host.
> > +
> > +.. contents:: :local:
> > +
> > +Functionalities
> > +====================================================
> > +Communications of user and kernel involve both directions. The
> > +supported user-kernel APIs are as follows:
> > +
> > +1. Alloc/Free PASID
> > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > +4. Invalidate IOMMU caches
> > +5. Service page request
> > +
> > +Requirements
> > +====================================================
> > +The IOMMU UAPIs are generic and extensible to meet the following
> > +requirements:
> > +
> > +1. Emulated and para-virtualised vIOMMUs
> > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > +3. Extensions to the UAPI shall not break existing user space
> > +
> > +Interfaces
> > +====================================================
> > +Although the data structures defined in IOMMU UAPI are
> > self-contained, +there is no user API functions introduced.
> > Instead, IOMMU UAPI is +designed to work with existing user driver
> > frameworks such as VFIO. +
> > +Extension Rules & Precautions
> > +-----------------------------
> > +When IOMMU UAPI gets extended, the data structures can *only* be
> > +modified in two ways:
> > +
> > +1. Adding new fields by re-purposing the padding[] field. No size
> > change. +2. Adding new union members at the end. May increase in
> > size. +
> > +No new fields can be added *after* the variable size union in that
> > it +will break backward compatibility when offset moves. In both
> > cases, a +new flag must be accompanied with a new field such that
> > the IOMMU +driver can process the data based on the new flag.
> > Version field is +only reserved for the unlikely event of UAPI
> > upgrade at its entirety. +
> > +It's *always* the caller's responsibility to indicate the size of
> > the +structure passed by setting argsz appropriately.
> > +
> > +When IOMMU UAPI extension results in size increase, user such as
> > VFIO +has to handle the following scenarios:
> > +
> > +1. User and kernel has exact size match
> > +2. An older user with older kernel header (smaller UAPI size)
> > running on a
> > +   newer kernel (larger UAPI size)
> > +3. A newer user with newer kernel header (larger UAPI size) running
> > +   on a older kernel.
> > +4. A malicious/misbehaving user pass illegal/invalid size but
> > within
> > +   range. The data may contain garbage.
> > +
> > +
> > +Feature Checking
> > +----------------
> > +While launching a guest with vIOMMU, it is important to ensure
> > that host +can support the UAPI data structures to be used for
> > vIOMMU-pIOMMU +communications. Without the upfront compatibility
> > checking, future +faults are difficult to report even in normal
> > conditions. For example, +TLB invalidations should always succeed
> > from vIOMMU's +perspective. There is no architectural way to report
> > back to the vIOMMU +if the UAPI data is incompatible. For this
> > reason the following IOMMU +UAPIs cannot fail:
> > +
> > +1. Free PASID
> > +2. Unbind guest PASID
> > +3. Unbind guest PASID table (SMMU)
> > +4. Cache invalidate
> > +5. Page response
> > +
> > +User applications such as QEMU is expected to import kernel UAPI
> > +headers. Only backward compatibility is supported. For example, an
> > +older QEMU (with older kernel header) can run on newer kernel.
> > Newer +QEMU (with new kernel header) may fail on older kernel.  
> 
> "Build your user application against newer kernels and it may break on
> older kernels" is not a great selling point of this UAPI.  Clearly new
> features may not be available on older kernels and an application that
> depends on a newer feature may be restricted to newer kernels.
> 
Perhaps "fail on older kernel" is not the right statement. I meant to
say "Newer QEMU (with new kernel header) may fail the compatibility
check on older kernel". Here compatibility check involves argsz check
and feature check.

Does it sound right?

> > +
> > +IOMMU vendor driver should report the below features to IOMMU UAPI
> > +consumers (e.g. via VFIO).
> > +
> > +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> > +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> > +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> > +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> > +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> > +
> > +Take VFIO as example, upon request from VFIO user space (e.g.
> > QEMU), +VFIO kernel code shall query IOMMU vendor driver for the
> > support of +the above features. Query result can then be reported
> > back to the +user-space caller. Details can be found in
> > +Documentation/driver-api/vfio.rst.
> > +
> > +
> > +Data Passing Example with VFIO
> > +------------------------------
> > +As the ubiquitous userspace driver framework, VFIO is already IOMMU
> > +aware and share many key concepts such as device model, group, and
> > +protection domain. Other user driver frameworks can also be
> > extended +to support IOMMU UAPI but it is outside the scope of this
> > document. +
> > +In this tight-knit VFIO-IOMMU interface, the ultimate consumer of
> > the +IOMMU UAPI data is the host IOMMU driver. VFIO facilitates
> > user-kernel +transport, capability checking, security, and life
> > cycle management of +process address space ID (PASID).
> > +
> > +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU driver
> > is the +ultimate consumer of its UAPI data. At VFIO layer, the
> > IOMMU UAPI data +is wrapped in a VFIO UAPI data for sanity
> > checking. It follows the +pattern below:
> > +
> > +::
> > +
> > +   struct {
> > +	__u32 argsz;
> > +	__u32 flags;
> > +	__u8  data[];
> > +  }
> > +
> > +Here data[] contains the IOMMU UAPI data structures.
> > +
> > +In order to determine the size and feature set of the user data,
> > argsz +and flags are also embedded in the IOMMU UAPI data
> > structures. +A "__u32 argsz" field is *always* at the beginning of
> > each structure. +
> > +For example:
> > +::
> > +
> > +   struct iommu_gpasid_bind_data {
> > +	__u32 argsz;
> > +	__u32 version;
> > +	#define IOMMU_PASID_FORMAT_INTEL_VTD	1
> > +	__u32 format;
> > +	#define IOMMU_SVA_GPASID_VAL	(1 << 0)
> > +	__u64 flags;
> > +	__u64 gpgd;
> > +	__u64 hpasid;
> > +	__u64 gpasid;
> > +	__u32 addr_width;
> > +	__u8  padding[12];
> > +	/* Vendor specific data */
> > +	union {
> > +		struct iommu_gpasid_bind_data_vtd vtd;
> > +	};
> > +  };
> > +
> > +Use bind guest PASID as an example, VFIO code shall process IOMMU
> > UAPI +request as follows:
> > +
> > +::
> > +
> > + 1        /* Minsz must include IOMMU UAPI "argsz" of __u32 */
> > + 2        minsz = offsetofend(struct vfio_iommu_type1_bind, flags)
> > +
> > +                              sizeof(u32);  
> 
> In the example structure above:
> 
> > +   struct {
> > +	__u32 argsz;
> > +	__u32 flags;
> > +	__u8  data[];
> > +  }  
> 
> This presumes that vfio does not use flags to identify a different
> layout, for example a field before data or defining a flag that
> provides no data.  IOW, the IOMMU guarantees argsz at the beginning of
> all structures, but let's not limit how vfio chooses to bundle that
> structure.  minsz should be based on flags, which we'll evaluate to
> determine how much more to copy.
> 
Got it, VFIO owns its flags therefore minsz. How about reword the
example data struct as:
   struct {
	__u32 argsz;
	__u32 flags;
	__u8  data[];
  }

Here data[] contains the IOMMU UAPI data structures. VFIO has the
freedom to bundle the data as well as parse data size based on its own flags.

In the example code:

"Use bind guest PASID as an example, VFIO code shall first
process the flags field to determine the size to copy for IOMMU UAPI.
The flags could indicate different layout of the VFIO data and the types
of IOMMU UAPI data."


> > UAPI +request as follows:

> > + 3        copy_from_user(&vfio_bind, (void __user *)arg, minsz);
> > + 4
> > + 5        /* Check VFIO argsz */
> > + 6        if (vfio_bind.argsz < minsz)
> > + 7                return -EINVAL;
> > + 8
> > + 9        /* VFIO flags must be included in minsz */
> > + 10        switch (vfio_bind.flags) {
> > + 11        case VFIO_IOMMU_BIND_GUEST_PGTBL:
> > + 12                /*
> > + 13                 * Get the current IOMMU bind GPASID data size,
> > + 14                 * which accounted for the largest union member.
> > + 15                 */
> > + 16                data_size = sizeof(struct
> > iommu_gpasid_bind_data);
> > + 17                iommu_argsz = vfio_bind.argsz - minsz;  
> 
> Note that by including the IOMMU UAPI argsz within minsz, this is
> incorrect.
> 
Good catch, should be:
iommu_argsz = vfio_bind.argsz - minsz - sizeof(u32)

> > + 18                if (iommu_argsz > data_size) {
> > + 19                        /* User data > current kernel */
> > + 20                        return -E2BIG;
> > + 21                }  
> 
> Now I see why you're making the claim that QEMU compiled against an
> new kernel may not work on an older kernel.  We can do better.  The
> current sizeof the data structure should be the maximum we'll copy
> from the user, and we can update the user provided IOMMU UAPI argsz
> as we pass it down from the user to avoid exposing ourselves to an
> arbitrarily large user buffer.  The IOMMU UAPI interfaces should then
> also use argsz and flags to determine whether the data is present for
> a specified flag. That should allow a user application compiled
> against a newer kernel header, but only using features found on older
> kernels to continue to work on older kernels, which seems like a
> basic requirement to me.
> 
I got your point. But I don't understand why VFIO layer will update
IOMMU argsz, IOMMU layer is not exposed to arbitrary large user size in
that it can not exceed the current UAPI data size.

I agree we should make effort to allow features found in the older
kernel continue to work.


> > + 22                copy_from_user(&iommu_bind, (void __user *)
> > + 23                               vfio_bind.data, iommu_argsz);
> > + 24               /*
> > + 25                * Deal with trailing bytes that is bigger than
> > user
> > + 26                * provided UAPI size but smaller than the
> > current
> > + 27                * kernel data size. Zero fill the trailing
> > bytes.
> > + 28                */
> > + 29                memset(iommu_bind + iommu_argsz, 0, data_size -
> > + 30                       iommu_argsz;  
> 
> The IOMMU UAPI interface having access to argsz should make this
> unnecessary.  Performing this memset() seems like it suggests to the
> next layer that it can rely on all fields being present and valid,
> which defeats the purpose of argsz.
> 
This memset does not suggest all fields are present and valid. Only
filter out the obvious invalid data based on current size. My intention
is to reduce the burden of checking not eliminate.

> > + 31
> > + 32                iommu_sva_bind_gpasid(domain, dev,
> > iommu_bind_data);
> > + 33                break;
> > +
> > +
> > +Case #1 & 2 are supported per backward compatibility rule.
> > +
> > +Case #3 will fail with -E2BIG at line #20. Case  
> 
> This is not acceptable IMO.
> 
Got it. Will copy up to the current data size and let IOMMU driver
handle supported flags/features.

> > +Case #4 may result in other error processed by IOMMU vendor
> > driver. However, +the damage shall not exceed the scope of the
> > offending user.  
> 
> This is a concern in this double wrapped interface, the IOMMU UAPI
> layer may expect the vfio layer to validate the data.  Zeroing the
> remainder of the data structure is evidence towards that.  The IOMMU
> UAPI layer needs to consider all of this untrusted, so why would we
> not reflect that by passing a __user pointer through to the IOMMU
> UAPI such that it can copy the data from the user itself rather than
> being mislead that the contents have been somehow verified?  Thanks,
> 
I am OK with IOMMU layer does the copy_from_user. One of my
original thinking was that since some APIs (e.g page response) also
used by in-kernel code, I would avoid user pointer or another
wrapper.

Perhaps need to clarify the roles of each layer, IMHO the roles are:
- VFIO
	1. bundle IOMMU UAPI data with flags & argsz
	2. sanity check argsz > minsz
	3. determine the copy_from_user size based on VFIO flags & argsz

- IOMMU UAPI
	1. check argsz against current kernel IOMMU UAPI data size, its
	own minsz
	2. parse data based on feature/flags

So if VFIO already can decide the copy_from_user size as in VFIO.3, why
can't it just do the copy as well. VFIO only checks and ensures size,
nothing specific to the content of IOMMU UAPI. Does the role partition
sound right?


> Alex
> 

[Jacob Pan]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 3/3] iommu/vt-d: Sanity check uapi argsz filled by users
  2020-06-11 17:08   ` Alex Williamson
@ 2020-06-11 20:02     ` Jacob Pan
  2020-06-11 20:55       ` Alex Williamson
  0 siblings, 1 reply; 31+ messages in thread
From: Jacob Pan @ 2020-06-11 20:02 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Thu, 11 Jun 2020 11:08:16 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Wed, 10 Jun 2020 21:12:15 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
> > IOMMU UAPI data has an argsz field which is filled by user. As the
> > data structures expands, argsz may change. As the UAPI data are
> > shared among different architectures, extensions of UAPI data could
> > be a result of one architecture which has no impact on another.
> > Therefore, these argsz santity checks are performed in the model
> > specific IOMMU drivers. This patch adds sanity checks in the VT-d
> > to ensure argsz passed by userspace matches feature flags and other
> > contents.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >  drivers/iommu/intel-iommu.c | 16 ++++++++++++++++
> >  drivers/iommu/intel-svm.c   | 12 ++++++++++++
> >  2 files changed, 28 insertions(+)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c
> > b/drivers/iommu/intel-iommu.c index 27ebf4b9faef..c98b5109684b
> > 100644 --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5365,6 +5365,7 @@ intel_iommu_sva_invalidate(struct
> > iommu_domain *domain, struct device *dev, struct device_domain_info
> > *info; struct intel_iommu *iommu;
> >  	unsigned long flags;
> > +	unsigned long minsz;
> >  	int cache_type;
> >  	u8 bus, devfn;
> >  	u16 did, sid;
> > @@ -5385,6 +5386,21 @@ intel_iommu_sva_invalidate(struct
> > iommu_domain *domain, struct device *dev, if (!(dmar_domain->flags
> > & DOMAIN_FLAG_NESTING_MODE)) return -EINVAL;
> >  
> > +	minsz = offsetofend(struct iommu_cache_invalidate_info,
> > padding);  
> 
> Would it still be better to look for the end of the last field that's
> actually used to avoid the code churn and oversights if/when the
> padding field does get used and renamed?
> 
My thought was that if the padding gets partially re-purposed, the
remaining padding would still be valid for minsz check. The extension
rule ensures that there is no size change other the variable size union
at the end. So use padding would avoid the churn, or i am totally wrong?

> Per my comment on patch 1/, this also seems like where the device
> specific IOMMU driver should also have the responsibility of receiving
> a __user pointer to do the copy_from_user() here.  vfio can't know
> which flags require which fields to make a UAPI with acceptable
> compatibility guarantees otherwise.
> 
Right, VFIO cannot do compatibility guarantees, it is just seem to be
that VFIO has enough information to copy_from_user sanely & safely and
handle over to IOMMU. Please help define the roles/responsibilities in
my other email. Then I will follow the guideline.

> > +	if (inv_info->argsz < minsz)
> > +		return -EINVAL;
> > +
> > +	/* Sanity check user filled invalidation dat sizes */
> > +	if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
> > +		inv_info->argsz != offsetofend(struct
> > iommu_cache_invalidate_info,
> > +					addr_info))
> > +		return -EINVAL;
> > +
> > +	if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
> > +		inv_info->argsz != offsetofend(struct
> > iommu_cache_invalidate_info,
> > +					pasid_info))
> > +		return -EINVAL;
> > +
> >  	spin_lock_irqsave(&device_domain_lock, flags);
> >  	spin_lock(&iommu->lock);
> >  	info = get_domain_info(dev);
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index 35b43fe819ed..64dc2c66dfff 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -235,15 +235,27 @@ int intel_svm_bind_gpasid(struct iommu_domain
> > *domain, struct device *dev, struct dmar_domain *dmar_domain;
> >  	struct intel_svm_dev *sdev;
> >  	struct intel_svm *svm;
> > +	unsigned long minsz;
> >  	int ret = 0;
> >  
> >  	if (WARN_ON(!iommu) || !data)
> >  		return -EINVAL;
> >  
> > +	/*
> > +	 * We mandate that no size change in IOMMU UAPI data
> > before the
> > +	 * variable size union at the end.
> > +	 */
> > +	minsz = offsetofend(struct iommu_gpasid_bind_data,
> > padding);  
> 
> Same.  Thanks,
> 
> Alex
> 
> > +	if (data->argsz < minsz)
> > +		return -EINVAL;
> > +
> >  	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> >  	    data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> >  		return -EINVAL;
> >  
> > +	if (data->argsz != offsetofend(struct
> > iommu_gpasid_bind_data, vtd))
> > +		return -EINVAL;
> > +
> >  	if (!dev_is_pci(dev))
> >  		return -ENOTSUPP;
> >    
> 

[Jacob Pan]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-11 19:52     ` Jacob Pan
@ 2020-06-11 20:40       ` Alex Williamson
  2020-06-12  0:27         ` Jacob Pan
  0 siblings, 1 reply; 31+ messages in thread
From: Alex Williamson @ 2020-06-11 20:40 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Thu, 11 Jun 2020 12:52:05 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> Hi Alex,
> 
> On Thu, 11 Jun 2020 09:47:41 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > On Wed, 10 Jun 2020 21:12:13 -0700
> > Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> >   
> > > IOMMU UAPI is newly introduced to support communications between
> > > guest virtual IOMMU and host IOMMU. There has been lots of
> > > discussions on how it should work with VFIO UAPI and userspace in
> > > general.
> > > 
> > > This document is indended to clarify the UAPI design and usage. The
> > > mechenics of how future extensions should be achieved are also
> > > covered in this documentation.
> > > 
> > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > ---
> > >  Documentation/userspace-api/iommu.rst | 210
> > > ++++++++++++++++++++++++++++++++++ 1 file changed, 210 insertions(+)
> > >  create mode 100644 Documentation/userspace-api/iommu.rst
> > > 
> > > diff --git a/Documentation/userspace-api/iommu.rst
> > > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > > index 000000000000..e95dc5a04a41
> > > --- /dev/null
> > > +++ b/Documentation/userspace-api/iommu.rst
> > > @@ -0,0 +1,210 @@
> > > +.. SPDX-License-Identifier: GPL-2.0
> > > +.. iommu:
> > > +
> > > +=====================================
> > > +IOMMU Userspace API
> > > +=====================================
> > > +
> > > +IOMMU UAPI is used for virtualization cases where communications
> > > are +needed between physical and virtual IOMMU drivers. For native
> > > +usage, IOMMU is a system device which does not need to communicate
> > > +with user space directly.
> > > +
> > > +The primary use cases are guest Shared Virtual Address (SVA) and
> > > +guest IO virtual address (IOVA), wherein virtual IOMMU (vIOMMU) is
> > > +required to communicate with the physical IOMMU in the host.
> > > +
> > > +.. contents:: :local:
> > > +
> > > +Functionalities
> > > +====================================================
> > > +Communications of user and kernel involve both directions. The
> > > +supported user-kernel APIs are as follows:
> > > +
> > > +1. Alloc/Free PASID
> > > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > > +4. Invalidate IOMMU caches
> > > +5. Service page request
> > > +
> > > +Requirements
> > > +====================================================
> > > +The IOMMU UAPIs are generic and extensible to meet the following
> > > +requirements:
> > > +
> > > +1. Emulated and para-virtualised vIOMMUs
> > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > > +3. Extensions to the UAPI shall not break existing user space
> > > +
> > > +Interfaces
> > > +====================================================
> > > +Although the data structures defined in IOMMU UAPI are
> > > self-contained, +there is no user API functions introduced.
> > > Instead, IOMMU UAPI is +designed to work with existing user driver
> > > frameworks such as VFIO. +
> > > +Extension Rules & Precautions
> > > +-----------------------------
> > > +When IOMMU UAPI gets extended, the data structures can *only* be
> > > +modified in two ways:
> > > +
> > > +1. Adding new fields by re-purposing the padding[] field. No size
> > > change. +2. Adding new union members at the end. May increase in
> > > size. +
> > > +No new fields can be added *after* the variable size union in that
> > > it +will break backward compatibility when offset moves. In both
> > > cases, a +new flag must be accompanied with a new field such that
> > > the IOMMU +driver can process the data based on the new flag.
> > > Version field is +only reserved for the unlikely event of UAPI
> > > upgrade at its entirety. +
> > > +It's *always* the caller's responsibility to indicate the size of
> > > the +structure passed by setting argsz appropriately.
> > > +
> > > +When IOMMU UAPI extension results in size increase, user such as
> > > VFIO +has to handle the following scenarios:
> > > +
> > > +1. User and kernel has exact size match
> > > +2. An older user with older kernel header (smaller UAPI size)
> > > running on a
> > > +   newer kernel (larger UAPI size)
> > > +3. A newer user with newer kernel header (larger UAPI size) running
> > > +   on a older kernel.
> > > +4. A malicious/misbehaving user pass illegal/invalid size but
> > > within
> > > +   range. The data may contain garbage.
> > > +
> > > +
> > > +Feature Checking
> > > +----------------
> > > +While launching a guest with vIOMMU, it is important to ensure
> > > that host +can support the UAPI data structures to be used for
> > > vIOMMU-pIOMMU +communications. Without the upfront compatibility
> > > checking, future +faults are difficult to report even in normal
> > > conditions. For example, +TLB invalidations should always succeed
> > > from vIOMMU's +perspective. There is no architectural way to report
> > > back to the vIOMMU +if the UAPI data is incompatible. For this
> > > reason the following IOMMU +UAPIs cannot fail:
> > > +
> > > +1. Free PASID
> > > +2. Unbind guest PASID
> > > +3. Unbind guest PASID table (SMMU)
> > > +4. Cache invalidate
> > > +5. Page response
> > > +
> > > +User applications such as QEMU is expected to import kernel UAPI
> > > +headers. Only backward compatibility is supported. For example, an
> > > +older QEMU (with older kernel header) can run on newer kernel.
> > > Newer +QEMU (with new kernel header) may fail on older kernel.    
> > 
> > "Build your user application against newer kernels and it may break on
> > older kernels" is not a great selling point of this UAPI.  Clearly new
> > features may not be available on older kernels and an application that
> > depends on a newer feature may be restricted to newer kernels.
> >   
> Perhaps "fail on older kernel" is not the right statement. I meant to
> say "Newer QEMU (with new kernel header) may fail the compatibility
> check on older kernel". Here compatibility check involves argsz check
> and feature check.
> 
> Does it sound right?

If simply recompiling QEMU against a new kernel header causes it to
fail on an old kernel, we've done something very wrong in this UAPI.

> > > +
> > > +IOMMU vendor driver should report the below features to IOMMU UAPI
> > > +consumers (e.g. via VFIO).
> > > +
> > > +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> > > +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> > > +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> > > +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> > > +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> > > +
> > > +Take VFIO as example, upon request from VFIO user space (e.g.
> > > QEMU), +VFIO kernel code shall query IOMMU vendor driver for the
> > > support of +the above features. Query result can then be reported
> > > back to the +user-space caller. Details can be found in
> > > +Documentation/driver-api/vfio.rst.
> > > +
> > > +
> > > +Data Passing Example with VFIO
> > > +------------------------------
> > > +As the ubiquitous userspace driver framework, VFIO is already IOMMU
> > > +aware and share many key concepts such as device model, group, and
> > > +protection domain. Other user driver frameworks can also be
> > > extended +to support IOMMU UAPI but it is outside the scope of this
> > > document. +
> > > +In this tight-knit VFIO-IOMMU interface, the ultimate consumer of
> > > the +IOMMU UAPI data is the host IOMMU driver. VFIO facilitates
> > > user-kernel +transport, capability checking, security, and life
> > > cycle management of +process address space ID (PASID).
> > > +
> > > +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU driver
> > > is the +ultimate consumer of its UAPI data. At VFIO layer, the
> > > IOMMU UAPI data +is wrapped in a VFIO UAPI data for sanity
> > > checking. It follows the +pattern below:
> > > +
> > > +::
> > > +
> > > +   struct {
> > > +	__u32 argsz;
> > > +	__u32 flags;
> > > +	__u8  data[];
> > > +  }
> > > +
> > > +Here data[] contains the IOMMU UAPI data structures.
> > > +
> > > +In order to determine the size and feature set of the user data,
> > > argsz +and flags are also embedded in the IOMMU UAPI data
> > > structures. +A "__u32 argsz" field is *always* at the beginning of
> > > each structure. +
> > > +For example:
> > > +::
> > > +
> > > +   struct iommu_gpasid_bind_data {
> > > +	__u32 argsz;
> > > +	__u32 version;
> > > +	#define IOMMU_PASID_FORMAT_INTEL_VTD	1
> > > +	__u32 format;
> > > +	#define IOMMU_SVA_GPASID_VAL	(1 << 0)
> > > +	__u64 flags;
> > > +	__u64 gpgd;
> > > +	__u64 hpasid;
> > > +	__u64 gpasid;
> > > +	__u32 addr_width;
> > > +	__u8  padding[12];
> > > +	/* Vendor specific data */
> > > +	union {
> > > +		struct iommu_gpasid_bind_data_vtd vtd;
> > > +	};
> > > +  };
> > > +
> > > +Use bind guest PASID as an example, VFIO code shall process IOMMU
> > > UAPI +request as follows:
> > > +
> > > +::
> > > +
> > > + 1        /* Minsz must include IOMMU UAPI "argsz" of __u32 */
> > > + 2        minsz = offsetofend(struct vfio_iommu_type1_bind, flags)
> > > +
> > > +                              sizeof(u32);    
> > 
> > In the example structure above:
> >   
> > > +   struct {
> > > +	__u32 argsz;
> > > +	__u32 flags;
> > > +	__u8  data[];
> > > +  }    
> > 
> > This presumes that vfio does not use flags to identify a different
> > layout, for example a field before data or defining a flag that
> > provides no data.  IOW, the IOMMU guarantees argsz at the beginning of
> > all structures, but let's not limit how vfio chooses to bundle that
> > structure.  minsz should be based on flags, which we'll evaluate to
> > determine how much more to copy.
> >   
> Got it, VFIO owns its flags therefore minsz. How about reword the
> example data struct as:
>    struct {
> 	__u32 argsz;
> 	__u32 flags;
> 	__u8  data[];
>   }
> 
> Here data[] contains the IOMMU UAPI data structures. VFIO has the
> freedom to bundle the data as well as parse data size based on its own flags.
> 
> In the example code:
> 
> "Use bind guest PASID as an example, VFIO code shall first
> process the flags field to determine the size to copy for IOMMU UAPI.
> The flags could indicate different layout of the VFIO data and the types
> of IOMMU UAPI data."

I think this should probably go no further than to say that the IOMMU
UAPI data structure is expected to be embedded opaquely into the VFIO
API, for example, VFIO may use a structure such as:

    struct {
 	__u32 argsz;
 	__u32 flags;
 	__u8  data[];
   }

where data[] contains an IOMMU UAPI structure, including the user
provided argsz field relative to that embedded structure.  This format
allows VFIO to multiplex multiple IOMMU UAPI interfaces through a
reduced set of ioctls.

> > > UAPI +request as follows:  
> 
> > > + 3        copy_from_user(&vfio_bind, (void __user *)arg, minsz);
> > > + 4
> > > + 5        /* Check VFIO argsz */
> > > + 6        if (vfio_bind.argsz < minsz)
> > > + 7                return -EINVAL;
> > > + 8
> > > + 9        /* VFIO flags must be included in minsz */
> > > + 10        switch (vfio_bind.flags) {
> > > + 11        case VFIO_IOMMU_BIND_GUEST_PGTBL:
> > > + 12                /*
> > > + 13                 * Get the current IOMMU bind GPASID data size,
> > > + 14                 * which accounted for the largest union member.
> > > + 15                 */
> > > + 16                data_size = sizeof(struct
> > > iommu_gpasid_bind_data);
> > > + 17                iommu_argsz = vfio_bind.argsz - minsz;    
> > 
> > Note that by including the IOMMU UAPI argsz within minsz, this is
> > incorrect.
> >   
> Good catch, should be:
> iommu_argsz = vfio_bind.argsz - minsz - sizeof(u32)
> 
> > > + 18                if (iommu_argsz > data_size) {
> > > + 19                        /* User data > current kernel */
> > > + 20                        return -E2BIG;
> > > + 21                }    
> > 
> > Now I see why you're making the claim that QEMU compiled against an
> > new kernel may not work on an older kernel.  We can do better.  The
> > current sizeof the data structure should be the maximum we'll copy
> > from the user, and we can update the user provided IOMMU UAPI argsz
> > as we pass it down from the user to avoid exposing ourselves to an
> > arbitrarily large user buffer.  The IOMMU UAPI interfaces should then
> > also use argsz and flags to determine whether the data is present for
> > a specified flag. That should allow a user application compiled
> > against a newer kernel header, but only using features found on older
> > kernels to continue to work on older kernels, which seems like a
> > basic requirement to me.
> >   
> I got your point. But I don't understand why VFIO layer will update
> IOMMU argsz, IOMMU layer is not exposed to arbitrary large user size in
> that it can not exceed the current UAPI data size.

I was thinking about the case where userspace is compiled against a
newer kernel and might provide argsz = sizeof(struct iommu_uapi_foo')
but vfio only knows sizeof(struct iommu_uapi_foo).  We don't want to
copy an arbitrary amount of data from userspace, so vfio would use
MIN(argsz, sizeof(struct iommu_uapi_foo)) for the copy_from_user().  The
IOMMU UAPI would then need to depend on the reduced argsz.

But then I thought it even better if VFIO leaves the entire
copy_from_user() to the layer consuming it.

> I agree we should make effort to allow features found in the older
> kernel continue to work.

That's a requirement.  Breaking existing userspace without following a
deprecation model is a bug.  But I think so too is the fact that this
interface actually specifies and provides an example where simply
recompiling existing userspace against a new kernel header where the
size of structure may be changed to support a feature will cause that
application to fail to run on older kernels.  That's not feasible for a
distribution to support.

> > > + 22                copy_from_user(&iommu_bind, (void __user *)
> > > + 23                               vfio_bind.data, iommu_argsz);
> > > + 24               /*
> > > + 25                * Deal with trailing bytes that is bigger than
> > > user
> > > + 26                * provided UAPI size but smaller than the
> > > current
> > > + 27                * kernel data size. Zero fill the trailing
> > > bytes.
> > > + 28                */
> > > + 29                memset(iommu_bind + iommu_argsz, 0, data_size -
> > > + 30                       iommu_argsz;    
> > 
> > The IOMMU UAPI interface having access to argsz should make this
> > unnecessary.  Performing this memset() seems like it suggests to the
> > next layer that it can rely on all fields being present and valid,
> > which defeats the purpose of argsz.
> >   
> This memset does not suggest all fields are present and valid. Only
> filter out the obvious invalid data based on current size. My intention
> is to reduce the burden of checking not eliminate.

I'm afraid that reducing the burden on the IOMMU UAPI layers leads to
reduced visibility which leads to less stringent validation.  We're
defining a UAPI layer for the kernel where VFIO just happens to be an
interface through to that UAPI.  The UAPI should therefore be providing
the validation, not VFIO.  Create a wrapper in the IOMMU UAPI layer if
you want to share common validation.

> > > + 31
> > > + 32                iommu_sva_bind_gpasid(domain, dev,
> > > iommu_bind_data);
> > > + 33                break;
> > > +
> > > +
> > > +Case #1 & 2 are supported per backward compatibility rule.
> > > +
> > > +Case #3 will fail with -E2BIG at line #20. Case    
> > 
> > This is not acceptable IMO.
> >   
> Got it. Will copy up to the current data size and let IOMMU driver
> handle supported flags/features.
> 
> > > +Case #4 may result in other error processed by IOMMU vendor
> > > driver. However, +the damage shall not exceed the scope of the
> > > offending user.    
> > 
> > This is a concern in this double wrapped interface, the IOMMU UAPI
> > layer may expect the vfio layer to validate the data.  Zeroing the
> > remainder of the data structure is evidence towards that.  The IOMMU
> > UAPI layer needs to consider all of this untrusted, so why would we
> > not reflect that by passing a __user pointer through to the IOMMU
> > UAPI such that it can copy the data from the user itself rather than
> > being mislead that the contents have been somehow verified?  Thanks,
> >   
> I am OK with IOMMU layer does the copy_from_user. One of my
> original thinking was that since some APIs (e.g page response) also
> used by in-kernel code, I would avoid user pointer or another
> wrapper.
> 
> Perhaps need to clarify the roles of each layer, IMHO the roles are:
> - VFIO
> 	1. bundle IOMMU UAPI data with flags & argsz
> 	2. sanity check argsz > minsz
> 	3. determine the copy_from_user size based on VFIO flags & argsz
> 
> - IOMMU UAPI
> 	1. check argsz against current kernel IOMMU UAPI data size, its
> 	own minsz
> 	2. parse data based on feature/flags
> 
> So if VFIO already can decide the copy_from_user size as in VFIO.3, why
> can't it just do the copy as well. VFIO only checks and ensures size,
> nothing specific to the content of IOMMU UAPI. Does the role partition
> sound right?

As above, the only way that this can be a generic UAPI is if VFIO is
just a passthrough, otherwise this just becomes VFIO API and we might
as well not pretend we're creating a UAPI between the VFIO and IOMMU.
If a second user of the UAPI would duplicate the code from VFIO, then
it shouldn't be in VFIO.  Thanks,

Alex

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 3/3] iommu/vt-d: Sanity check uapi argsz filled by users
  2020-06-11 20:02     ` Jacob Pan
@ 2020-06-11 20:55       ` Alex Williamson
  2020-06-11 23:58         ` Jacob Pan
  0 siblings, 1 reply; 31+ messages in thread
From: Alex Williamson @ 2020-06-11 20:55 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Thu, 11 Jun 2020 13:02:24 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> On Thu, 11 Jun 2020 11:08:16 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > On Wed, 10 Jun 2020 21:12:15 -0700
> > Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> >   
> > > IOMMU UAPI data has an argsz field which is filled by user. As the
> > > data structures expands, argsz may change. As the UAPI data are
> > > shared among different architectures, extensions of UAPI data could
> > > be a result of one architecture which has no impact on another.
> > > Therefore, these argsz santity checks are performed in the model
> > > specific IOMMU drivers. This patch adds sanity checks in the VT-d
> > > to ensure argsz passed by userspace matches feature flags and other
> > > contents.
> > > 
> > > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > ---
> > >  drivers/iommu/intel-iommu.c | 16 ++++++++++++++++
> > >  drivers/iommu/intel-svm.c   | 12 ++++++++++++
> > >  2 files changed, 28 insertions(+)
> > > 
> > > diff --git a/drivers/iommu/intel-iommu.c
> > > b/drivers/iommu/intel-iommu.c index 27ebf4b9faef..c98b5109684b
> > > 100644 --- a/drivers/iommu/intel-iommu.c
> > > +++ b/drivers/iommu/intel-iommu.c
> > > @@ -5365,6 +5365,7 @@ intel_iommu_sva_invalidate(struct
> > > iommu_domain *domain, struct device *dev, struct device_domain_info
> > > *info; struct intel_iommu *iommu;
> > >  	unsigned long flags;
> > > +	unsigned long minsz;
> > >  	int cache_type;
> > >  	u8 bus, devfn;
> > >  	u16 did, sid;
> > > @@ -5385,6 +5386,21 @@ intel_iommu_sva_invalidate(struct
> > > iommu_domain *domain, struct device *dev, if (!(dmar_domain->flags
> > > & DOMAIN_FLAG_NESTING_MODE)) return -EINVAL;
> > >  
> > > +	minsz = offsetofend(struct iommu_cache_invalidate_info,
> > > padding);    
> > 
> > Would it still be better to look for the end of the last field that's
> > actually used to avoid the code churn and oversights if/when the
> > padding field does get used and renamed?
> >   
> My thought was that if the padding gets partially re-purposed, the
> remaining padding would still be valid for minsz check. The extension
> rule ensures that there is no size change other the variable size union
> at the end. So use padding would avoid the churn, or i am totally wrong?

No, it's trying to predict the future either way.  I figured checking
minsz against the fields we actually consume allows complete use of the
padding fields and provides a little leniency to the user.  We'd need
to be careful though that if those fields are later used by this
driver, the code would still need to accept the smaller size.  If the
union was named rather than anonymous we could just use offsetof() to
avoid directly referencing the padding fields.
 
> > Per my comment on patch 1/, this also seems like where the device
> > specific IOMMU driver should also have the responsibility of receiving
> > a __user pointer to do the copy_from_user() here.  vfio can't know
> > which flags require which fields to make a UAPI with acceptable
> > compatibility guarantees otherwise.
> >   
> Right, VFIO cannot do compatibility guarantees, it is just seem to be
> that VFIO has enough information to copy_from_user sanely & safely and
> handle over to IOMMU. Please help define the roles/responsibilities in
> my other email. Then I will follow the guideline.

We can keep that part of the discussion in the other thread.  Thanks,

Alex

> > > +	if (inv_info->argsz < minsz)
> > > +		return -EINVAL;
> > > +
> > > +	/* Sanity check user filled invalidation dat sizes */
> > > +	if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
> > > +		inv_info->argsz != offsetofend(struct
> > > iommu_cache_invalidate_info,
> > > +					addr_info))
> > > +		return -EINVAL;
> > > +
> > > +	if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
> > > +		inv_info->argsz != offsetofend(struct
> > > iommu_cache_invalidate_info,
> > > +					pasid_info))
> > > +		return -EINVAL;
> > > +
> > >  	spin_lock_irqsave(&device_domain_lock, flags);
> > >  	spin_lock(&iommu->lock);
> > >  	info = get_domain_info(dev);
> > > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > > index 35b43fe819ed..64dc2c66dfff 100644
> > > --- a/drivers/iommu/intel-svm.c
> > > +++ b/drivers/iommu/intel-svm.c
> > > @@ -235,15 +235,27 @@ int intel_svm_bind_gpasid(struct iommu_domain
> > > *domain, struct device *dev, struct dmar_domain *dmar_domain;
> > >  	struct intel_svm_dev *sdev;
> > >  	struct intel_svm *svm;
> > > +	unsigned long minsz;
> > >  	int ret = 0;
> > >  
> > >  	if (WARN_ON(!iommu) || !data)
> > >  		return -EINVAL;
> > >  
> > > +	/*
> > > +	 * We mandate that no size change in IOMMU UAPI data
> > > before the
> > > +	 * variable size union at the end.
> > > +	 */
> > > +	minsz = offsetofend(struct iommu_gpasid_bind_data,
> > > padding);    
> > 
> > Same.  Thanks,
> > 
> > Alex
> >   
> > > +	if (data->argsz < minsz)
> > > +		return -EINVAL;
> > > +
> > >  	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> > >  	    data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> > >  		return -EINVAL;
> > >  
> > > +	if (data->argsz != offsetofend(struct
> > > iommu_gpasid_bind_data, vtd))
> > > +		return -EINVAL;
> > > +
> > >  	if (!dev_is_pci(dev))
> > >  		return -ENOTSUPP;
> > >      
> >   
> 
> [Jacob Pan]
> 

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 3/3] iommu/vt-d: Sanity check uapi argsz filled by users
  2020-06-11 20:55       ` Alex Williamson
@ 2020-06-11 23:58         ` Jacob Pan
  0 siblings, 0 replies; 31+ messages in thread
From: Jacob Pan @ 2020-06-11 23:58 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Thu, 11 Jun 2020 14:55:18 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Thu, 11 Jun 2020 13:02:24 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
> > On Thu, 11 Jun 2020 11:08:16 -0600
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> >   
> > > On Wed, 10 Jun 2020 21:12:15 -0700
> > > Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> > >     
> > > > IOMMU UAPI data has an argsz field which is filled by user. As
> > > > the data structures expands, argsz may change. As the UAPI data
> > > > are shared among different architectures, extensions of UAPI
> > > > data could be a result of one architecture which has no impact
> > > > on another. Therefore, these argsz santity checks are performed
> > > > in the model specific IOMMU drivers. This patch adds sanity
> > > > checks in the VT-d to ensure argsz passed by userspace matches
> > > > feature flags and other contents.
> > > > 
> > > > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > ---
> > > >  drivers/iommu/intel-iommu.c | 16 ++++++++++++++++
> > > >  drivers/iommu/intel-svm.c   | 12 ++++++++++++
> > > >  2 files changed, 28 insertions(+)
> > > > 
> > > > diff --git a/drivers/iommu/intel-iommu.c
> > > > b/drivers/iommu/intel-iommu.c index 27ebf4b9faef..c98b5109684b
> > > > 100644 --- a/drivers/iommu/intel-iommu.c
> > > > +++ b/drivers/iommu/intel-iommu.c
> > > > @@ -5365,6 +5365,7 @@ intel_iommu_sva_invalidate(struct
> > > > iommu_domain *domain, struct device *dev, struct
> > > > device_domain_info *info; struct intel_iommu *iommu;
> > > >  	unsigned long flags;
> > > > +	unsigned long minsz;
> > > >  	int cache_type;
> > > >  	u8 bus, devfn;
> > > >  	u16 did, sid;
> > > > @@ -5385,6 +5386,21 @@ intel_iommu_sva_invalidate(struct
> > > > iommu_domain *domain, struct device *dev, if
> > > > (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE)) return
> > > > -EINVAL; 
> > > > +	minsz = offsetofend(struct iommu_cache_invalidate_info,
> > > > padding);      
> > > 
> > > Would it still be better to look for the end of the last field
> > > that's actually used to avoid the code churn and oversights
> > > if/when the padding field does get used and renamed?
> > >     
> > My thought was that if the padding gets partially re-purposed, the
> > remaining padding would still be valid for minsz check. The
> > extension rule ensures that there is no size change other the
> > variable size union at the end. So use padding would avoid the
> > churn, or i am totally wrong?  
> 
> No, it's trying to predict the future either way.  I figured checking
> minsz against the fields we actually consume allows complete use of
> the padding fields and provides a little leniency to the user.  We'd
> need to be careful though that if those fields are later used by this
> driver, the code would still need to accept the smaller size.  If the
> union was named rather than anonymous we could just use offsetof() to
> avoid directly referencing the padding fields.
>  
I will change it to named union.

Thanks,

> > > Per my comment on patch 1/, this also seems like where the device
> > > specific IOMMU driver should also have the responsibility of
> > > receiving a __user pointer to do the copy_from_user() here.  vfio
> > > can't know which flags require which fields to make a UAPI with
> > > acceptable compatibility guarantees otherwise.
> > >     
> > Right, VFIO cannot do compatibility guarantees, it is just seem to
> > be that VFIO has enough information to copy_from_user sanely &
> > safely and handle over to IOMMU. Please help define the
> > roles/responsibilities in my other email. Then I will follow the
> > guideline.  
> 
> We can keep that part of the discussion in the other thread.  Thanks,
> 
> Alex
> 
> > > > +	if (inv_info->argsz < minsz)
> > > > +		return -EINVAL;
> > > > +
> > > > +	/* Sanity check user filled invalidation dat sizes */
> > > > +	if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
> > > > +		inv_info->argsz != offsetofend(struct
> > > > iommu_cache_invalidate_info,
> > > > +					addr_info))
> > > > +		return -EINVAL;
> > > > +
> > > > +	if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
> > > > +		inv_info->argsz != offsetofend(struct
> > > > iommu_cache_invalidate_info,
> > > > +					pasid_info))
> > > > +		return -EINVAL;
> > > > +
> > > >  	spin_lock_irqsave(&device_domain_lock, flags);
> > > >  	spin_lock(&iommu->lock);
> > > >  	info = get_domain_info(dev);
> > > > diff --git a/drivers/iommu/intel-svm.c
> > > > b/drivers/iommu/intel-svm.c index 35b43fe819ed..64dc2c66dfff
> > > > 100644 --- a/drivers/iommu/intel-svm.c
> > > > +++ b/drivers/iommu/intel-svm.c
> > > > @@ -235,15 +235,27 @@ int intel_svm_bind_gpasid(struct
> > > > iommu_domain *domain, struct device *dev, struct dmar_domain
> > > > *dmar_domain; struct intel_svm_dev *sdev;
> > > >  	struct intel_svm *svm;
> > > > +	unsigned long minsz;
> > > >  	int ret = 0;
> > > >  
> > > >  	if (WARN_ON(!iommu) || !data)
> > > >  		return -EINVAL;
> > > >  
> > > > +	/*
> > > > +	 * We mandate that no size change in IOMMU UAPI data
> > > > before the
> > > > +	 * variable size union at the end.
> > > > +	 */
> > > > +	minsz = offsetofend(struct iommu_gpasid_bind_data,
> > > > padding);      
> > > 
> > > Same.  Thanks,
> > > 
> > > Alex
> > >     
> > > > +	if (data->argsz < minsz)
> > > > +		return -EINVAL;
> > > > +
> > > >  	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> > > >  	    data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> > > >  		return -EINVAL;
> > > >  
> > > > +	if (data->argsz != offsetofend(struct
> > > > iommu_gpasid_bind_data, vtd))
> > > > +		return -EINVAL;
> > > > +
> > > >  	if (!dev_is_pci(dev))
> > > >  		return -ENOTSUPP;
> > > >        
> > >     
> > 
> > [Jacob Pan]
> >   
> 

[Jacob Pan]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 2/3] iommu/uapi: Add argsz for user filled data
  2020-06-11 16:49   ` Alex Williamson
@ 2020-06-12  0:02     ` Jacob Pan
  0 siblings, 0 replies; 31+ messages in thread
From: Jacob Pan @ 2020-06-12  0:02 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Thu, 11 Jun 2020 10:49:36 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Wed, 10 Jun 2020 21:12:14 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
> > As IOMMU UAPI gets extended, user data size may increase. To support
> > backward compatibiliy, this patch introduces a size field to each
> > UAPI data structures. It is *always* the responsibility for the
> > user to fill in the correct size.  
> 
> Though at the same time, argsz is user provided data which we don't
> trust.  The argsz field allows the user to indicate how much data
> they're providing, it's still the kernel's responsibility to validate
> whether it's correct and sufficient for the requested operation.
> Thanks,
> 
Yes, will add this clarification.

Thanks,

Jacob
> Alex
> 
> > Specific scenarios for user data handling are documented in:
> > Documentation/userspace-api/iommu.rst
> > 
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >  include/uapi/linux/iommu.h | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> > index e907b7091a46..303f148a5cd7 100644
> > --- a/include/uapi/linux/iommu.h
> > +++ b/include/uapi/linux/iommu.h
> > @@ -135,6 +135,7 @@ enum iommu_page_response_code {
> >  
> >  /**
> >   * struct iommu_page_response - Generic page response information
> > + * @argsz: User filled size of this data
> >   * @version: API version of this structure
> >   * @flags: encodes whether the corresponding fields are valid
> >   *         (IOMMU_FAULT_PAGE_RESPONSE_* values)
> > @@ -143,6 +144,7 @@ enum iommu_page_response_code {
> >   * @code: response code from &enum iommu_page_response_code
> >   */
> >  struct iommu_page_response {
> > +	__u32	argsz;
> >  #define IOMMU_PAGE_RESP_VERSION_1	1
> >  	__u32	version;
> >  #define IOMMU_PAGE_RESP_PASID_VALID	(1 << 0)
> > @@ -218,6 +220,7 @@ struct iommu_inv_pasid_info {
> >  /**
> >   * struct iommu_cache_invalidate_info - First level/stage
> > invalidation
> >   *     information
> > + * @argsz: User filled size of this data
> >   * @version: API version of this structure
> >   * @cache: bitfield that allows to select which caches to
> > invalidate
> >   * @granularity: defines the lowest granularity used for the
> > invalidation: @@ -246,6 +249,7 @@ struct iommu_inv_pasid_info {
> >   * must support the used granularity.
> >   */
> >  struct iommu_cache_invalidate_info {
> > +	__u32	argsz;
> >  #define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> >  	__u32	version;
> >  /* IOMMU paging structure cache */
> > @@ -292,6 +296,7 @@ struct iommu_gpasid_bind_data_vtd {
> >  
> >  /**
> >   * struct iommu_gpasid_bind_data - Information about device and
> > guest PASID binding
> > + * @argsz:	User filled size of this data
> >   * @version:	Version of this data structure
> >   * @format:	PASID table entry format
> >   * @flags:	Additional information on guest bind request
> > @@ -309,6 +314,7 @@ struct iommu_gpasid_bind_data_vtd {
> >   * PASID to host PASID based on this bind data.
> >   */
> >  struct iommu_gpasid_bind_data {
> > +	__u32 argsz;
> >  #define IOMMU_GPASID_BIND_VERSION_1	1
> >  	__u32 version;
> >  #define IOMMU_PASID_FORMAT_INTEL_VTD	1  
> 

[Jacob Pan]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-11 20:40       ` Alex Williamson
@ 2020-06-12  0:27         ` Jacob Pan
  2020-06-12  7:38           ` Tian, Kevin
  2020-06-16 15:22           ` Jacob Pan
  0 siblings, 2 replies; 31+ messages in thread
From: Jacob Pan @ 2020-06-12  0:27 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Thu, 11 Jun 2020 14:40:47 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Thu, 11 Jun 2020 12:52:05 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
> > Hi Alex,
> > 
> > On Thu, 11 Jun 2020 09:47:41 -0600
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> >   
> > > On Wed, 10 Jun 2020 21:12:13 -0700
> > > Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> > >     
> > > > IOMMU UAPI is newly introduced to support communications between
> > > > guest virtual IOMMU and host IOMMU. There has been lots of
> > > > discussions on how it should work with VFIO UAPI and userspace
> > > > in general.
> > > > 
> > > > This document is indended to clarify the UAPI design and usage.
> > > > The mechenics of how future extensions should be achieved are
> > > > also covered in this documentation.
> > > > 
> > > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > ---
> > > >  Documentation/userspace-api/iommu.rst | 210
> > > > ++++++++++++++++++++++++++++++++++ 1 file changed, 210
> > > > insertions(+) create mode 100644
> > > > Documentation/userspace-api/iommu.rst
> > > > 
> > > > diff --git a/Documentation/userspace-api/iommu.rst
> > > > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > > > index 000000000000..e95dc5a04a41
> > > > --- /dev/null
> > > > +++ b/Documentation/userspace-api/iommu.rst
> > > > @@ -0,0 +1,210 @@
> > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > +.. iommu:
> > > > +
> > > > +=====================================
> > > > +IOMMU Userspace API
> > > > +=====================================
> > > > +
> > > > +IOMMU UAPI is used for virtualization cases where
> > > > communications are +needed between physical and virtual IOMMU
> > > > drivers. For native +usage, IOMMU is a system device which does
> > > > not need to communicate +with user space directly.
> > > > +
> > > > +The primary use cases are guest Shared Virtual Address (SVA)
> > > > and +guest IO virtual address (IOVA), wherein virtual IOMMU
> > > > (vIOMMU) is +required to communicate with the physical IOMMU in
> > > > the host. +
> > > > +.. contents:: :local:
> > > > +
> > > > +Functionalities
> > > > +====================================================
> > > > +Communications of user and kernel involve both directions. The
> > > > +supported user-kernel APIs are as follows:
> > > > +
> > > > +1. Alloc/Free PASID
> > > > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > > > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > > > +4. Invalidate IOMMU caches
> > > > +5. Service page request
> > > > +
> > > > +Requirements
> > > > +====================================================
> > > > +The IOMMU UAPIs are generic and extensible to meet the
> > > > following +requirements:
> > > > +
> > > > +1. Emulated and para-virtualised vIOMMUs
> > > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > > > +3. Extensions to the UAPI shall not break existing user space
> > > > +
> > > > +Interfaces
> > > > +====================================================
> > > > +Although the data structures defined in IOMMU UAPI are
> > > > self-contained, +there is no user API functions introduced.
> > > > Instead, IOMMU UAPI is +designed to work with existing user
> > > > driver frameworks such as VFIO. +
> > > > +Extension Rules & Precautions
> > > > +-----------------------------
> > > > +When IOMMU UAPI gets extended, the data structures can *only*
> > > > be +modified in two ways:
> > > > +
> > > > +1. Adding new fields by re-purposing the padding[] field. No
> > > > size change. +2. Adding new union members at the end. May
> > > > increase in size. +
> > > > +No new fields can be added *after* the variable size union in
> > > > that it +will break backward compatibility when offset moves.
> > > > In both cases, a +new flag must be accompanied with a new field
> > > > such that the IOMMU +driver can process the data based on the
> > > > new flag. Version field is +only reserved for the unlikely
> > > > event of UAPI upgrade at its entirety. +
> > > > +It's *always* the caller's responsibility to indicate the size
> > > > of the +structure passed by setting argsz appropriately.
> > > > +
> > > > +When IOMMU UAPI extension results in size increase, user such
> > > > as VFIO +has to handle the following scenarios:
> > > > +
> > > > +1. User and kernel has exact size match
> > > > +2. An older user with older kernel header (smaller UAPI size)
> > > > running on a
> > > > +   newer kernel (larger UAPI size)
> > > > +3. A newer user with newer kernel header (larger UAPI size)
> > > > running
> > > > +   on a older kernel.
> > > > +4. A malicious/misbehaving user pass illegal/invalid size but
> > > > within
> > > > +   range. The data may contain garbage.
> > > > +
> > > > +
> > > > +Feature Checking
> > > > +----------------
> > > > +While launching a guest with vIOMMU, it is important to ensure
> > > > that host +can support the UAPI data structures to be used for
> > > > vIOMMU-pIOMMU +communications. Without the upfront compatibility
> > > > checking, future +faults are difficult to report even in normal
> > > > conditions. For example, +TLB invalidations should always
> > > > succeed from vIOMMU's +perspective. There is no architectural
> > > > way to report back to the vIOMMU +if the UAPI data is
> > > > incompatible. For this reason the following IOMMU +UAPIs cannot
> > > > fail: +
> > > > +1. Free PASID
> > > > +2. Unbind guest PASID
> > > > +3. Unbind guest PASID table (SMMU)
> > > > +4. Cache invalidate
> > > > +5. Page response
> > > > +
> > > > +User applications such as QEMU is expected to import kernel
> > > > UAPI +headers. Only backward compatibility is supported. For
> > > > example, an +older QEMU (with older kernel header) can run on
> > > > newer kernel. Newer +QEMU (with new kernel header) may fail on
> > > > older kernel.      
> > > 
> > > "Build your user application against newer kernels and it may
> > > break on older kernels" is not a great selling point of this
> > > UAPI.  Clearly new features may not be available on older kernels
> > > and an application that depends on a newer feature may be
> > > restricted to newer kernels. 
> > Perhaps "fail on older kernel" is not the right statement. I meant
> > to say "Newer QEMU (with new kernel header) may fail the
> > compatibility check on older kernel". Here compatibility check
> > involves argsz check and feature check.
> > 
> > Does it sound right?  
> 
> If simply recompiling QEMU against a new kernel header causes it to
> fail on an old kernel, we've done something very wrong in this UAPI.
> 

I agree we should make best effort to support the fields in the new
header that was supported in the older kernel.

But there will be cases that new app fails on old kernel if the new
fields from the new header are used. Do we have consensus on this?

> > > > +
> > > > +IOMMU vendor driver should report the below features to IOMMU
> > > > UAPI +consumers (e.g. via VFIO).
> > > > +
> > > > +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> > > > +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> > > > +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> > > > +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> > > > +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> > > > +
> > > > +Take VFIO as example, upon request from VFIO user space (e.g.
> > > > QEMU), +VFIO kernel code shall query IOMMU vendor driver for the
> > > > support of +the above features. Query result can then be
> > > > reported back to the +user-space caller. Details can be found in
> > > > +Documentation/driver-api/vfio.rst.
> > > > +
> > > > +
> > > > +Data Passing Example with VFIO
> > > > +------------------------------
> > > > +As the ubiquitous userspace driver framework, VFIO is already
> > > > IOMMU +aware and share many key concepts such as device model,
> > > > group, and +protection domain. Other user driver frameworks can
> > > > also be extended +to support IOMMU UAPI but it is outside the
> > > > scope of this document. +
> > > > +In this tight-knit VFIO-IOMMU interface, the ultimate consumer
> > > > of the +IOMMU UAPI data is the host IOMMU driver. VFIO
> > > > facilitates user-kernel +transport, capability checking,
> > > > security, and life cycle management of +process address space
> > > > ID (PASID). +
> > > > +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU
> > > > driver is the +ultimate consumer of its UAPI data. At VFIO
> > > > layer, the IOMMU UAPI data +is wrapped in a VFIO UAPI data for
> > > > sanity checking. It follows the +pattern below:
> > > > +
> > > > +::
> > > > +
> > > > +   struct {
> > > > +	__u32 argsz;
> > > > +	__u32 flags;
> > > > +	__u8  data[];
> > > > +  }
> > > > +
> > > > +Here data[] contains the IOMMU UAPI data structures.
> > > > +
> > > > +In order to determine the size and feature set of the user
> > > > data, argsz +and flags are also embedded in the IOMMU UAPI data
> > > > structures. +A "__u32 argsz" field is *always* at the beginning
> > > > of each structure. +
> > > > +For example:
> > > > +::
> > > > +
> > > > +   struct iommu_gpasid_bind_data {
> > > > +	__u32 argsz;
> > > > +	__u32 version;
> > > > +	#define IOMMU_PASID_FORMAT_INTEL_VTD	1
> > > > +	__u32 format;
> > > > +	#define IOMMU_SVA_GPASID_VAL	(1 << 0)
> > > > +	__u64 flags;
> > > > +	__u64 gpgd;
> > > > +	__u64 hpasid;
> > > > +	__u64 gpasid;
> > > > +	__u32 addr_width;
> > > > +	__u8  padding[12];
> > > > +	/* Vendor specific data */
> > > > +	union {
> > > > +		struct iommu_gpasid_bind_data_vtd vtd;
> > > > +	};
> > > > +  };
> > > > +
> > > > +Use bind guest PASID as an example, VFIO code shall process
> > > > IOMMU UAPI +request as follows:
> > > > +
> > > > +::
> > > > +
> > > > + 1        /* Minsz must include IOMMU UAPI "argsz" of __u32 */
> > > > + 2        minsz = offsetofend(struct vfio_iommu_type1_bind,
> > > > flags) +
> > > > +                              sizeof(u32);      
> > > 
> > > In the example structure above:
> > >     
> > > > +   struct {
> > > > +	__u32 argsz;
> > > > +	__u32 flags;
> > > > +	__u8  data[];
> > > > +  }      
> > > 
> > > This presumes that vfio does not use flags to identify a different
> > > layout, for example a field before data or defining a flag that
> > > provides no data.  IOW, the IOMMU guarantees argsz at the
> > > beginning of all structures, but let's not limit how vfio chooses
> > > to bundle that structure.  minsz should be based on flags, which
> > > we'll evaluate to determine how much more to copy.
> > >     
> > Got it, VFIO owns its flags therefore minsz. How about reword the
> > example data struct as:
> >    struct {
> > 	__u32 argsz;
> > 	__u32 flags;
> > 	__u8  data[];
> >   }
> > 
> > Here data[] contains the IOMMU UAPI data structures. VFIO has the
> > freedom to bundle the data as well as parse data size based on its
> > own flags.
> > 
> > In the example code:
> > 
> > "Use bind guest PASID as an example, VFIO code shall first
> > process the flags field to determine the size to copy for IOMMU
> > UAPI. The flags could indicate different layout of the VFIO data
> > and the types of IOMMU UAPI data."  
> 
> I think this should probably go no further than to say that the IOMMU
> UAPI data structure is expected to be embedded opaquely into the VFIO
> API, for example, VFIO may use a structure such as:
> 
>     struct {
>  	__u32 argsz;
>  	__u32 flags;
>  	__u8  data[];
>    }
> 
> where data[] contains an IOMMU UAPI structure, including the user
> provided argsz field relative to that embedded structure.  This format
> allows VFIO to multiplex multiple IOMMU UAPI interfaces through a
> reduced set of ioctls.
> 
Sounds good, thanks for the summary.

> > > > UAPI +request as follows:    
> >   
> > > > + 3        copy_from_user(&vfio_bind, (void __user *)arg,
> > > > minsz);
> > > > + 4
> > > > + 5        /* Check VFIO argsz */
> > > > + 6        if (vfio_bind.argsz < minsz)
> > > > + 7                return -EINVAL;
> > > > + 8
> > > > + 9        /* VFIO flags must be included in minsz */
> > > > + 10        switch (vfio_bind.flags) {
> > > > + 11        case VFIO_IOMMU_BIND_GUEST_PGTBL:
> > > > + 12                /*
> > > > + 13                 * Get the current IOMMU bind GPASID data
> > > > size,
> > > > + 14                 * which accounted for the largest union
> > > > member.
> > > > + 15                 */
> > > > + 16                data_size = sizeof(struct
> > > > iommu_gpasid_bind_data);
> > > > + 17                iommu_argsz = vfio_bind.argsz - minsz;      
> > > 
> > > Note that by including the IOMMU UAPI argsz within minsz, this is
> > > incorrect.
> > >     
> > Good catch, should be:
> > iommu_argsz = vfio_bind.argsz - minsz - sizeof(u32)
> >   
> > > > + 18                if (iommu_argsz > data_size) {
> > > > + 19                        /* User data > current kernel */
> > > > + 20                        return -E2BIG;
> > > > + 21                }      
> > > 
> > > Now I see why you're making the claim that QEMU compiled against
> > > an new kernel may not work on an older kernel.  We can do
> > > better.  The current sizeof the data structure should be the
> > > maximum we'll copy from the user, and we can update the user
> > > provided IOMMU UAPI argsz as we pass it down from the user to
> > > avoid exposing ourselves to an arbitrarily large user buffer.
> > > The IOMMU UAPI interfaces should then also use argsz and flags to
> > > determine whether the data is present for a specified flag. That
> > > should allow a user application compiled against a newer kernel
> > > header, but only using features found on older kernels to
> > > continue to work on older kernels, which seems like a basic
> > > requirement to me. 
> > I got your point. But I don't understand why VFIO layer will update
> > IOMMU argsz, IOMMU layer is not exposed to arbitrary large user
> > size in that it can not exceed the current UAPI data size.  
> 
> I was thinking about the case where userspace is compiled against a
> newer kernel and might provide argsz = sizeof(struct iommu_uapi_foo')
> but vfio only knows sizeof(struct iommu_uapi_foo).  We don't want to
> copy an arbitrary amount of data from userspace, so vfio would use
> MIN(argsz, sizeof(struct iommu_uapi_foo)) for the copy_from_user().
> The IOMMU UAPI would then need to depend on the reduced argsz.
> 
> But then I thought it even better if VFIO leaves the entire
> copy_from_user() to the layer consuming it.
> 
OK. Sounds good, that was what Kevin suggested also. I just wasn't sure
how much VFIO wants to inspect, I thought VFIO layer wanted to do a
sanity check.

Anyway, I will move copy_from_user to iommu uapi layer.

> > I agree we should make effort to allow features found in the older
> > kernel continue to work.  
> 
> That's a requirement.  Breaking existing userspace without following a
> deprecation model is a bug.
Sorry I don't understand why it is breaking existing userspace. If a
userspace is recompiled with new kernel header, it is not *existing*.

> But I think so too is the fact that this
> interface actually specifies and provides an example where simply
> recompiling existing userspace against a new kernel header where the
> size of structure may be changed to support a feature will cause that
> application to fail to run on older kernels.  That's not feasible for
> a distribution to support.
> 
I see your point, my assumption was that app (e.g. QEMU) imports new
header selectively. The new header must be imported with an intention
of using the new flags/fields. I guess this may not *always* be true.
I will add the support for older kernel to run on new header (if no new
fields/flags are used).


Thanks a lot!

> > > > + 22                copy_from_user(&iommu_bind, (void __user *)
> > > > + 23                               vfio_bind.data, iommu_argsz);
> > > > + 24               /*
> > > > + 25                * Deal with trailing bytes that is bigger
> > > > than user
> > > > + 26                * provided UAPI size but smaller than the
> > > > current
> > > > + 27                * kernel data size. Zero fill the trailing
> > > > bytes.
> > > > + 28                */
> > > > + 29                memset(iommu_bind + iommu_argsz, 0,
> > > > data_size -
> > > > + 30                       iommu_argsz;      
> > > 
> > > The IOMMU UAPI interface having access to argsz should make this
> > > unnecessary.  Performing this memset() seems like it suggests to
> > > the next layer that it can rely on all fields being present and
> > > valid, which defeats the purpose of argsz.
> > >     
> > This memset does not suggest all fields are present and valid. Only
> > filter out the obvious invalid data based on current size. My
> > intention is to reduce the burden of checking not eliminate.  
> 
> I'm afraid that reducing the burden on the IOMMU UAPI layers leads to
> reduced visibility which leads to less stringent validation.  We're
> defining a UAPI layer for the kernel where VFIO just happens to be an
> interface through to that UAPI.  The UAPI should therefore be
> providing the validation, not VFIO.  Create a wrapper in the IOMMU
> UAPI layer if you want to share common validation.
> 
> > > > + 31
> > > > + 32                iommu_sva_bind_gpasid(domain, dev,
> > > > iommu_bind_data);
> > > > + 33                break;
> > > > +
> > > > +
> > > > +Case #1 & 2 are supported per backward compatibility rule.
> > > > +
> > > > +Case #3 will fail with -E2BIG at line #20. Case      
> > > 
> > > This is not acceptable IMO.
> > >     
> > Got it. Will copy up to the current data size and let IOMMU driver
> > handle supported flags/features.
> >   
> > > > +Case #4 may result in other error processed by IOMMU vendor
> > > > driver. However, +the damage shall not exceed the scope of the
> > > > offending user.      
> > > 
> > > This is a concern in this double wrapped interface, the IOMMU UAPI
> > > layer may expect the vfio layer to validate the data.  Zeroing the
> > > remainder of the data structure is evidence towards that.  The
> > > IOMMU UAPI layer needs to consider all of this untrusted, so why
> > > would we not reflect that by passing a __user pointer through to
> > > the IOMMU UAPI such that it can copy the data from the user
> > > itself rather than being mislead that the contents have been
> > > somehow verified?  Thanks, 
> > I am OK with IOMMU layer does the copy_from_user. One of my
> > original thinking was that since some APIs (e.g page response) also
> > used by in-kernel code, I would avoid user pointer or another
> > wrapper.
> > 
> > Perhaps need to clarify the roles of each layer, IMHO the roles are:
> > - VFIO
> > 	1. bundle IOMMU UAPI data with flags & argsz
> > 	2. sanity check argsz > minsz
> > 	3. determine the copy_from_user size based on VFIO flags &
> > argsz
> > 
> > - IOMMU UAPI
> > 	1. check argsz against current kernel IOMMU UAPI data size,
> > its own minsz
> > 	2. parse data based on feature/flags
> > 
> > So if VFIO already can decide the copy_from_user size as in VFIO.3,
> > why can't it just do the copy as well. VFIO only checks and ensures
> > size, nothing specific to the content of IOMMU UAPI. Does the role
> > partition sound right?  
> 
> As above, the only way that this can be a generic UAPI is if VFIO is
> just a passthrough, otherwise this just becomes VFIO API and we might
> as well not pretend we're creating a UAPI between the VFIO and IOMMU.
> If a second user of the UAPI would duplicate the code from VFIO, then
> it shouldn't be in VFIO.  Thanks,
> 
> Alex
> 

[Jacob Pan]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-12  0:27         ` Jacob Pan
@ 2020-06-12  7:38           ` Tian, Kevin
  2020-06-12 13:09             ` Jacob Pan
  2020-06-16 15:22           ` Jacob Pan
  1 sibling, 1 reply; 31+ messages in thread
From: Tian, Kevin @ 2020-06-12  7:38 UTC (permalink / raw)
  To: Jacob Pan, Alex Williamson
  Cc: Raj, Ashok, Jonathan Corbet, Jean-Philippe Brucker, LKML,
	Christoph Hellwig, iommu, David Woodhouse

> From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Sent: Friday, June 12, 2020 8:27 AM
> 
> On Thu, 11 Jun 2020 14:40:47 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > On Thu, 11 Jun 2020 12:52:05 -0700
> > Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> >
> > > Hi Alex,
> > >
> > > On Thu, 11 Jun 2020 09:47:41 -0600
> > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > >
> > > > On Wed, 10 Jun 2020 21:12:13 -0700
> > > > Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> > > >
> > > > > IOMMU UAPI is newly introduced to support communications
> between
> > > > > guest virtual IOMMU and host IOMMU. There has been lots of
> > > > > discussions on how it should work with VFIO UAPI and userspace
> > > > > in general.
> > > > >
> > > > > This document is indended to clarify the UAPI design and usage.
> > > > > The mechenics of how future extensions should be achieved are
> > > > > also covered in this documentation.
> > > > >
> > > > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > > > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > > ---
> > > > >  Documentation/userspace-api/iommu.rst | 210
> > > > > ++++++++++++++++++++++++++++++++++ 1 file changed, 210
> > > > > insertions(+) create mode 100644
> > > > > Documentation/userspace-api/iommu.rst
> > > > >
> > > > > diff --git a/Documentation/userspace-api/iommu.rst
> > > > > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > > > > index 000000000000..e95dc5a04a41
> > > > > --- /dev/null
> > > > > +++ b/Documentation/userspace-api/iommu.rst
> > > > > @@ -0,0 +1,210 @@
> > > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > > +.. iommu:
> > > > > +
> > > > > +=====================================
> > > > > +IOMMU Userspace API
> > > > > +=====================================
> > > > > +
> > > > > +IOMMU UAPI is used for virtualization cases where
> > > > > communications are +needed between physical and virtual IOMMU
> > > > > drivers. For native +usage, IOMMU is a system device which does
> > > > > not need to communicate +with user space directly.
> > > > > +
> > > > > +The primary use cases are guest Shared Virtual Address (SVA)
> > > > > and +guest IO virtual address (IOVA), wherein virtual IOMMU
> > > > > (vIOMMU) is +required to communicate with the physical IOMMU in
> > > > > the host. +
> > > > > +.. contents:: :local:
> > > > > +
> > > > > +Functionalities
> > > > > +====================================================
> > > > > +Communications of user and kernel involve both directions. The
> > > > > +supported user-kernel APIs are as follows:
> > > > > +
> > > > > +1. Alloc/Free PASID
> > > > > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > > > > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > > > > +4. Invalidate IOMMU caches
> > > > > +5. Service page request
> > > > > +
> > > > > +Requirements
> > > > > +====================================================
> > > > > +The IOMMU UAPIs are generic and extensible to meet the
> > > > > following +requirements:
> > > > > +
> > > > > +1. Emulated and para-virtualised vIOMMUs
> > > > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > > > > +3. Extensions to the UAPI shall not break existing user space
> > > > > +
> > > > > +Interfaces
> > > > > +====================================================
> > > > > +Although the data structures defined in IOMMU UAPI are
> > > > > self-contained, +there is no user API functions introduced.
> > > > > Instead, IOMMU UAPI is +designed to work with existing user
> > > > > driver frameworks such as VFIO. +
> > > > > +Extension Rules & Precautions
> > > > > +-----------------------------
> > > > > +When IOMMU UAPI gets extended, the data structures can *only*
> > > > > be +modified in two ways:
> > > > > +
> > > > > +1. Adding new fields by re-purposing the padding[] field. No
> > > > > size change. +2. Adding new union members at the end. May
> > > > > increase in size. +
> > > > > +No new fields can be added *after* the variable size union in
> > > > > that it +will break backward compatibility when offset moves.
> > > > > In both cases, a +new flag must be accompanied with a new field
> > > > > such that the IOMMU +driver can process the data based on the
> > > > > new flag. Version field is +only reserved for the unlikely
> > > > > event of UAPI upgrade at its entirety. +
> > > > > +It's *always* the caller's responsibility to indicate the size
> > > > > of the +structure passed by setting argsz appropriately.
> > > > > +
> > > > > +When IOMMU UAPI extension results in size increase, user such
> > > > > as VFIO +has to handle the following scenarios:
> > > > > +
> > > > > +1. User and kernel has exact size match
> > > > > +2. An older user with older kernel header (smaller UAPI size)
> > > > > running on a
> > > > > +   newer kernel (larger UAPI size)
> > > > > +3. A newer user with newer kernel header (larger UAPI size)
> > > > > running
> > > > > +   on a older kernel.
> > > > > +4. A malicious/misbehaving user pass illegal/invalid size but
> > > > > within
> > > > > +   range. The data may contain garbage.
> > > > > +
> > > > > +
> > > > > +Feature Checking
> > > > > +----------------
> > > > > +While launching a guest with vIOMMU, it is important to ensure
> > > > > that host +can support the UAPI data structures to be used for
> > > > > vIOMMU-pIOMMU +communications. Without the upfront
> compatibility
> > > > > checking, future +faults are difficult to report even in normal
> > > > > conditions. For example, +TLB invalidations should always
> > > > > succeed from vIOMMU's +perspective. There is no architectural
> > > > > way to report back to the vIOMMU +if the UAPI data is
> > > > > incompatible. For this reason the following IOMMU +UAPIs cannot
> > > > > fail: +
> > > > > +1. Free PASID
> > > > > +2. Unbind guest PASID
> > > > > +3. Unbind guest PASID table (SMMU)
> > > > > +4. Cache invalidate
> > > > > +5. Page response
> > > > > +
> > > > > +User applications such as QEMU is expected to import kernel
> > > > > UAPI +headers. Only backward compatibility is supported. For
> > > > > example, an +older QEMU (with older kernel header) can run on
> > > > > newer kernel. Newer +QEMU (with new kernel header) may fail on
> > > > > older kernel.
> > > >
> > > > "Build your user application against newer kernels and it may
> > > > break on older kernels" is not a great selling point of this
> > > > UAPI.  Clearly new features may not be available on older kernels
> > > > and an application that depends on a newer feature may be
> > > > restricted to newer kernels.
> > > Perhaps "fail on older kernel" is not the right statement. I meant
> > > to say "Newer QEMU (with new kernel header) may fail the
> > > compatibility check on older kernel". Here compatibility check
> > > involves argsz check and feature check.
> > >
> > > Does it sound right?
> >
> > If simply recompiling QEMU against a new kernel header causes it to
> > fail on an old kernel, we've done something very wrong in this UAPI.
> >
> 
> I agree we should make best effort to support the fields in the new
> header that was supported in the older kernel.
> 
> But there will be cases that new app fails on old kernel if the new
> fields from the new header are used. Do we have consensus on this?

Yes, I think that is also what Alex meant. If new feature/field is touched
the app will fail for sure on old kernel. But if only old features/fields are 
touched then the app should work correctly. This is the case by simply 
recompiling Qemu against a new kernel header, where no new feature
is supposed to be used. Qemu may pass an argsz bigger than what old
kernel supports, but old kernel only copies the size that it knows and
serve the features that it supports according to flags.

> 
> > > > > +
> > > > > +IOMMU vendor driver should report the below features to IOMMU
> > > > > UAPI +consumers (e.g. via VFIO).
> > > > > +
> > > > > +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> > > > > +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> > > > > +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> > > > > +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> > > > > +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> > > > > +
> > > > > +Take VFIO as example, upon request from VFIO user space (e.g.
> > > > > QEMU), +VFIO kernel code shall query IOMMU vendor driver for the
> > > > > support of +the above features. Query result can then be
> > > > > reported back to the +user-space caller. Details can be found in
> > > > > +Documentation/driver-api/vfio.rst.
> > > > > +
> > > > > +
> > > > > +Data Passing Example with VFIO
> > > > > +------------------------------
> > > > > +As the ubiquitous userspace driver framework, VFIO is already
> > > > > IOMMU +aware and share many key concepts such as device model,
> > > > > group, and +protection domain. Other user driver frameworks can
> > > > > also be extended +to support IOMMU UAPI but it is outside the
> > > > > scope of this document. +
> > > > > +In this tight-knit VFIO-IOMMU interface, the ultimate consumer
> > > > > of the +IOMMU UAPI data is the host IOMMU driver. VFIO
> > > > > facilitates user-kernel +transport, capability checking,
> > > > > security, and life cycle management of +process address space
> > > > > ID (PASID). +
> > > > > +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU
> > > > > driver is the +ultimate consumer of its UAPI data. At VFIO
> > > > > layer, the IOMMU UAPI data +is wrapped in a VFIO UAPI data for
> > > > > sanity checking. It follows the +pattern below:
> > > > > +
> > > > > +::
> > > > > +
> > > > > +   struct {
> > > > > +	__u32 argsz;
> > > > > +	__u32 flags;
> > > > > +	__u8  data[];
> > > > > +  }
> > > > > +
> > > > > +Here data[] contains the IOMMU UAPI data structures.
> > > > > +
> > > > > +In order to determine the size and feature set of the user
> > > > > data, argsz +and flags are also embedded in the IOMMU UAPI data
> > > > > structures. +A "__u32 argsz" field is *always* at the beginning
> > > > > of each structure. +
> > > > > +For example:
> > > > > +::
> > > > > +
> > > > > +   struct iommu_gpasid_bind_data {
> > > > > +	__u32 argsz;
> > > > > +	__u32 version;
> > > > > +	#define IOMMU_PASID_FORMAT_INTEL_VTD	1
> > > > > +	__u32 format;
> > > > > +	#define IOMMU_SVA_GPASID_VAL	(1 << 0)
> > > > > +	__u64 flags;
> > > > > +	__u64 gpgd;
> > > > > +	__u64 hpasid;
> > > > > +	__u64 gpasid;
> > > > > +	__u32 addr_width;
> > > > > +	__u8  padding[12];
> > > > > +	/* Vendor specific data */
> > > > > +	union {
> > > > > +		struct iommu_gpasid_bind_data_vtd vtd;
> > > > > +	};
> > > > > +  };
> > > > > +
> > > > > +Use bind guest PASID as an example, VFIO code shall process
> > > > > IOMMU UAPI +request as follows:
> > > > > +
> > > > > +::
> > > > > +
> > > > > + 1        /* Minsz must include IOMMU UAPI "argsz" of __u32 */
> > > > > + 2        minsz = offsetofend(struct vfio_iommu_type1_bind,
> > > > > flags) +
> > > > > +                              sizeof(u32);
> > > >
> > > > In the example structure above:
> > > >
> > > > > +   struct {
> > > > > +	__u32 argsz;
> > > > > +	__u32 flags;
> > > > > +	__u8  data[];
> > > > > +  }
> > > >
> > > > This presumes that vfio does not use flags to identify a different
> > > > layout, for example a field before data or defining a flag that
> > > > provides no data.  IOW, the IOMMU guarantees argsz at the
> > > > beginning of all structures, but let's not limit how vfio chooses
> > > > to bundle that structure.  minsz should be based on flags, which
> > > > we'll evaluate to determine how much more to copy.
> > > >
> > > Got it, VFIO owns its flags therefore minsz. How about reword the
> > > example data struct as:
> > >    struct {
> > > 	__u32 argsz;
> > > 	__u32 flags;
> > > 	__u8  data[];
> > >   }
> > >
> > > Here data[] contains the IOMMU UAPI data structures. VFIO has the
> > > freedom to bundle the data as well as parse data size based on its
> > > own flags.
> > >
> > > In the example code:
> > >
> > > "Use bind guest PASID as an example, VFIO code shall first
> > > process the flags field to determine the size to copy for IOMMU
> > > UAPI. The flags could indicate different layout of the VFIO data
> > > and the types of IOMMU UAPI data."
> >
> > I think this should probably go no further than to say that the IOMMU
> > UAPI data structure is expected to be embedded opaquely into the VFIO
> > API, for example, VFIO may use a structure such as:
> >
> >     struct {
> >  	__u32 argsz;
> >  	__u32 flags;
> >  	__u8  data[];
> >    }
> >
> > where data[] contains an IOMMU UAPI structure, including the user
> > provided argsz field relative to that embedded structure.  This format
> > allows VFIO to multiplex multiple IOMMU UAPI interfaces through a
> > reduced set of ioctls.
> >
> Sounds good, thanks for the summary.
> 
> > > > > UAPI +request as follows:
> > >
> > > > > + 3        copy_from_user(&vfio_bind, (void __user *)arg,
> > > > > minsz);
> > > > > + 4
> > > > > + 5        /* Check VFIO argsz */
> > > > > + 6        if (vfio_bind.argsz < minsz)
> > > > > + 7                return -EINVAL;
> > > > > + 8
> > > > > + 9        /* VFIO flags must be included in minsz */
> > > > > + 10        switch (vfio_bind.flags) {
> > > > > + 11        case VFIO_IOMMU_BIND_GUEST_PGTBL:
> > > > > + 12                /*
> > > > > + 13                 * Get the current IOMMU bind GPASID data
> > > > > size,
> > > > > + 14                 * which accounted for the largest union
> > > > > member.
> > > > > + 15                 */
> > > > > + 16                data_size = sizeof(struct
> > > > > iommu_gpasid_bind_data);
> > > > > + 17                iommu_argsz = vfio_bind.argsz - minsz;
> > > >
> > > > Note that by including the IOMMU UAPI argsz within minsz, this is
> > > > incorrect.
> > > >
> > > Good catch, should be:
> > > iommu_argsz = vfio_bind.argsz - minsz - sizeof(u32)

iommu_argsz = vfio_bind.argsz - minsz + sizeof(u32)

> > >
> > > > > + 18                if (iommu_argsz > data_size) {
> > > > > + 19                        /* User data > current kernel */
> > > > > + 20                        return -E2BIG;
> > > > > + 21                }
> > > >
> > > > Now I see why you're making the claim that QEMU compiled against
> > > > an new kernel may not work on an older kernel.  We can do
> > > > better.  The current sizeof the data structure should be the
> > > > maximum we'll copy from the user, and we can update the user
> > > > provided IOMMU UAPI argsz as we pass it down from the user to
> > > > avoid exposing ourselves to an arbitrarily large user buffer.
> > > > The IOMMU UAPI interfaces should then also use argsz and flags to
> > > > determine whether the data is present for a specified flag. That
> > > > should allow a user application compiled against a newer kernel
> > > > header, but only using features found on older kernels to
> > > > continue to work on older kernels, which seems like a basic
> > > > requirement to me.
> > > I got your point. But I don't understand why VFIO layer will update
> > > IOMMU argsz, IOMMU layer is not exposed to arbitrary large user
> > > size in that it can not exceed the current UAPI data size.
> >
> > I was thinking about the case where userspace is compiled against a
> > newer kernel and might provide argsz = sizeof(struct iommu_uapi_foo')
> > but vfio only knows sizeof(struct iommu_uapi_foo).  We don't want to
> > copy an arbitrary amount of data from userspace, so vfio would use
> > MIN(argsz, sizeof(struct iommu_uapi_foo)) for the copy_from_user().
> > The IOMMU UAPI would then need to depend on the reduced argsz.
> >
> > But then I thought it even better if VFIO leaves the entire
> > copy_from_user() to the layer consuming it.
> >
> OK. Sounds good, that was what Kevin suggested also. I just wasn't sure
> how much VFIO wants to inspect, I thought VFIO layer wanted to do a
> sanity check.
> 
> Anyway, I will move copy_from_user to iommu uapi layer.
> 
> > > I agree we should make effort to allow features found in the older
> > > kernel continue to work.
> >
> > That's a requirement.  Breaking existing userspace without following a
> > deprecation model is a bug.
> Sorry I don't understand why it is breaking existing userspace. If a
> userspace is recompiled with new kernel header, it is not *existing*.

It is not the "existing binary", but is about the "existing code". 😊

Thanks,
Kevin

> 
> > But I think so too is the fact that this
> > interface actually specifies and provides an example where simply
> > recompiling existing userspace against a new kernel header where the
> > size of structure may be changed to support a feature will cause that
> > application to fail to run on older kernels.  That's not feasible for
> > a distribution to support.
> >
> I see your point, my assumption was that app (e.g. QEMU) imports new
> header selectively. The new header must be imported with an intention
> of using the new flags/fields. I guess this may not *always* be true.
> I will add the support for older kernel to run on new header (if no new
> fields/flags are used).
> 
> 
> Thanks a lot!
> 
> > > > > + 22                copy_from_user(&iommu_bind, (void __user *)
> > > > > + 23                               vfio_bind.data, iommu_argsz);
> > > > > + 24               /*
> > > > > + 25                * Deal with trailing bytes that is bigger
> > > > > than user
> > > > > + 26                * provided UAPI size but smaller than the
> > > > > current
> > > > > + 27                * kernel data size. Zero fill the trailing
> > > > > bytes.
> > > > > + 28                */
> > > > > + 29                memset(iommu_bind + iommu_argsz, 0,
> > > > > data_size -
> > > > > + 30                       iommu_argsz;
> > > >
> > > > The IOMMU UAPI interface having access to argsz should make this
> > > > unnecessary.  Performing this memset() seems like it suggests to
> > > > the next layer that it can rely on all fields being present and
> > > > valid, which defeats the purpose of argsz.
> > > >
> > > This memset does not suggest all fields are present and valid. Only
> > > filter out the obvious invalid data based on current size. My
> > > intention is to reduce the burden of checking not eliminate.
> >
> > I'm afraid that reducing the burden on the IOMMU UAPI layers leads to
> > reduced visibility which leads to less stringent validation.  We're
> > defining a UAPI layer for the kernel where VFIO just happens to be an
> > interface through to that UAPI.  The UAPI should therefore be
> > providing the validation, not VFIO.  Create a wrapper in the IOMMU
> > UAPI layer if you want to share common validation.
> >
> > > > > + 31
> > > > > + 32                iommu_sva_bind_gpasid(domain, dev,
> > > > > iommu_bind_data);
> > > > > + 33                break;
> > > > > +
> > > > > +
> > > > > +Case #1 & 2 are supported per backward compatibility rule.
> > > > > +
> > > > > +Case #3 will fail with -E2BIG at line #20. Case
> > > >
> > > > This is not acceptable IMO.
> > > >
> > > Got it. Will copy up to the current data size and let IOMMU driver
> > > handle supported flags/features.
> > >
> > > > > +Case #4 may result in other error processed by IOMMU vendor
> > > > > driver. However, +the damage shall not exceed the scope of the
> > > > > offending user.
> > > >
> > > > This is a concern in this double wrapped interface, the IOMMU UAPI
> > > > layer may expect the vfio layer to validate the data.  Zeroing the
> > > > remainder of the data structure is evidence towards that.  The
> > > > IOMMU UAPI layer needs to consider all of this untrusted, so why
> > > > would we not reflect that by passing a __user pointer through to
> > > > the IOMMU UAPI such that it can copy the data from the user
> > > > itself rather than being mislead that the contents have been
> > > > somehow verified?  Thanks,
> > > I am OK with IOMMU layer does the copy_from_user. One of my
> > > original thinking was that since some APIs (e.g page response) also
> > > used by in-kernel code, I would avoid user pointer or another
> > > wrapper.
> > >
> > > Perhaps need to clarify the roles of each layer, IMHO the roles are:
> > > - VFIO
> > > 	1. bundle IOMMU UAPI data with flags & argsz
> > > 	2. sanity check argsz > minsz
> > > 	3. determine the copy_from_user size based on VFIO flags &
> > > argsz
> > >
> > > - IOMMU UAPI
> > > 	1. check argsz against current kernel IOMMU UAPI data size,
> > > its own minsz
> > > 	2. parse data based on feature/flags
> > >
> > > So if VFIO already can decide the copy_from_user size as in VFIO.3,
> > > why can't it just do the copy as well. VFIO only checks and ensures
> > > size, nothing specific to the content of IOMMU UAPI. Does the role
> > > partition sound right?
> >
> > As above, the only way that this can be a generic UAPI is if VFIO is
> > just a passthrough, otherwise this just becomes VFIO API and we might
> > as well not pretend we're creating a UAPI between the VFIO and IOMMU.
> > If a second user of the UAPI would duplicate the code from VFIO, then
> > it shouldn't be in VFIO.  Thanks,
> >
> > Alex
> >
> 
> [Jacob Pan]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-12  7:38           ` Tian, Kevin
@ 2020-06-12 13:09             ` Jacob Pan
  0 siblings, 0 replies; 31+ messages in thread
From: Jacob Pan @ 2020-06-12 13:09 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Raj, Ashok, Jonathan Corbet, Jean-Philippe Brucker, iommu, LKML,
	Christoph Hellwig, Alex Williamson, David Woodhouse

On Fri, 12 Jun 2020 07:38:44 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Sent: Friday, June 12, 2020 8:27 AM
> > 
> > On Thu, 11 Jun 2020 14:40:47 -0600
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> >   
> > > On Thu, 11 Jun 2020 12:52:05 -0700
> > > Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> > >  
> > > > Hi Alex,
> > > >
> > > > On Thu, 11 Jun 2020 09:47:41 -0600
> > > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > >  
> > > > > On Wed, 10 Jun 2020 21:12:13 -0700
> > > > > Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> > > > >  
> > > > > > IOMMU UAPI is newly introduced to support communications  
> > between  
> > > > > > guest virtual IOMMU and host IOMMU. There has been lots of
> > > > > > discussions on how it should work with VFIO UAPI and
> > > > > > userspace in general.
> > > > > >
> > > > > > This document is indended to clarify the UAPI design and
> > > > > > usage. The mechenics of how future extensions should be
> > > > > > achieved are also covered in this documentation.
> > > > > >
> > > > > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > > > > > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > > > ---
> > > > > >  Documentation/userspace-api/iommu.rst | 210
> > > > > > ++++++++++++++++++++++++++++++++++ 1 file changed, 210
> > > > > > insertions(+) create mode 100644
> > > > > > Documentation/userspace-api/iommu.rst
> > > > > >
> > > > > > diff --git a/Documentation/userspace-api/iommu.rst
> > > > > > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > > > > > index 000000000000..e95dc5a04a41
> > > > > > --- /dev/null
> > > > > > +++ b/Documentation/userspace-api/iommu.rst
> > > > > > @@ -0,0 +1,210 @@
> > > > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > > > +.. iommu:
> > > > > > +
> > > > > > +=====================================
> > > > > > +IOMMU Userspace API
> > > > > > +=====================================
> > > > > > +
> > > > > > +IOMMU UAPI is used for virtualization cases where
> > > > > > communications are +needed between physical and virtual
> > > > > > IOMMU drivers. For native +usage, IOMMU is a system device
> > > > > > which does not need to communicate +with user space
> > > > > > directly. +
> > > > > > +The primary use cases are guest Shared Virtual Address
> > > > > > (SVA) and +guest IO virtual address (IOVA), wherein virtual
> > > > > > IOMMU (vIOMMU) is +required to communicate with the
> > > > > > physical IOMMU in the host. +
> > > > > > +.. contents:: :local:
> > > > > > +
> > > > > > +Functionalities
> > > > > > +====================================================
> > > > > > +Communications of user and kernel involve both directions.
> > > > > > The +supported user-kernel APIs are as follows:
> > > > > > +
> > > > > > +1. Alloc/Free PASID
> > > > > > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > > > > > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > > > > > +4. Invalidate IOMMU caches
> > > > > > +5. Service page request
> > > > > > +
> > > > > > +Requirements
> > > > > > +====================================================
> > > > > > +The IOMMU UAPIs are generic and extensible to meet the
> > > > > > following +requirements:
> > > > > > +
> > > > > > +1. Emulated and para-virtualised vIOMMUs
> > > > > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > > > > > +3. Extensions to the UAPI shall not break existing user
> > > > > > space +
> > > > > > +Interfaces
> > > > > > +====================================================
> > > > > > +Although the data structures defined in IOMMU UAPI are
> > > > > > self-contained, +there is no user API functions introduced.
> > > > > > Instead, IOMMU UAPI is +designed to work with existing user
> > > > > > driver frameworks such as VFIO. +
> > > > > > +Extension Rules & Precautions
> > > > > > +-----------------------------
> > > > > > +When IOMMU UAPI gets extended, the data structures can
> > > > > > *only* be +modified in two ways:
> > > > > > +
> > > > > > +1. Adding new fields by re-purposing the padding[] field.
> > > > > > No size change. +2. Adding new union members at the end. May
> > > > > > increase in size. +
> > > > > > +No new fields can be added *after* the variable size union
> > > > > > in that it +will break backward compatibility when offset
> > > > > > moves. In both cases, a +new flag must be accompanied with
> > > > > > a new field such that the IOMMU +driver can process the
> > > > > > data based on the new flag. Version field is +only reserved
> > > > > > for the unlikely event of UAPI upgrade at its entirety. +
> > > > > > +It's *always* the caller's responsibility to indicate the
> > > > > > size of the +structure passed by setting argsz
> > > > > > appropriately. +
> > > > > > +When IOMMU UAPI extension results in size increase, user
> > > > > > such as VFIO +has to handle the following scenarios:
> > > > > > +
> > > > > > +1. User and kernel has exact size match
> > > > > > +2. An older user with older kernel header (smaller UAPI
> > > > > > size) running on a
> > > > > > +   newer kernel (larger UAPI size)
> > > > > > +3. A newer user with newer kernel header (larger UAPI size)
> > > > > > running
> > > > > > +   on a older kernel.
> > > > > > +4. A malicious/misbehaving user pass illegal/invalid size
> > > > > > but within
> > > > > > +   range. The data may contain garbage.
> > > > > > +
> > > > > > +
> > > > > > +Feature Checking
> > > > > > +----------------
> > > > > > +While launching a guest with vIOMMU, it is important to
> > > > > > ensure that host +can support the UAPI data structures to
> > > > > > be used for vIOMMU-pIOMMU +communications. Without the
> > > > > > upfront  
> > compatibility  
> > > > > > checking, future +faults are difficult to report even in
> > > > > > normal conditions. For example, +TLB invalidations should
> > > > > > always succeed from vIOMMU's +perspective. There is no
> > > > > > architectural way to report back to the vIOMMU +if the UAPI
> > > > > > data is incompatible. For this reason the following IOMMU
> > > > > > +UAPIs cannot fail: +
> > > > > > +1. Free PASID
> > > > > > +2. Unbind guest PASID
> > > > > > +3. Unbind guest PASID table (SMMU)
> > > > > > +4. Cache invalidate
> > > > > > +5. Page response
> > > > > > +
> > > > > > +User applications such as QEMU is expected to import kernel
> > > > > > UAPI +headers. Only backward compatibility is supported. For
> > > > > > example, an +older QEMU (with older kernel header) can run
> > > > > > on newer kernel. Newer +QEMU (with new kernel header) may
> > > > > > fail on older kernel.  
> > > > >
> > > > > "Build your user application against newer kernels and it may
> > > > > break on older kernels" is not a great selling point of this
> > > > > UAPI.  Clearly new features may not be available on older
> > > > > kernels and an application that depends on a newer feature
> > > > > may be restricted to newer kernels.  
> > > > Perhaps "fail on older kernel" is not the right statement. I
> > > > meant to say "Newer QEMU (with new kernel header) may fail the
> > > > compatibility check on older kernel". Here compatibility check
> > > > involves argsz check and feature check.
> > > >
> > > > Does it sound right?  
> > >
> > > If simply recompiling QEMU against a new kernel header causes it
> > > to fail on an old kernel, we've done something very wrong in this
> > > UAPI. 
> > 
> > I agree we should make best effort to support the fields in the new
> > header that was supported in the older kernel.
> > 
> > But there will be cases that new app fails on old kernel if the new
> > fields from the new header are used. Do we have consensus on this?  
> 
> Yes, I think that is also what Alex meant. If new feature/field is
> touched the app will fail for sure on old kernel. But if only old
> features/fields are touched then the app should work correctly. This
> is the case by simply recompiling Qemu against a new kernel header,
> where no new feature is supposed to be used. Qemu may pass an argsz
> bigger than what old kernel supports, but old kernel only copies the
> size that it knows and serve the features that it supports according
> to flags.
> 
great, thanks for the confirmation.

> >   
> > > > > > +
> > > > > > +IOMMU vendor driver should report the below features to
> > > > > > IOMMU UAPI +consumers (e.g. via VFIO).
> > > > > > +
> > > > > > +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> > > > > > +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> > > > > > +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> > > > > > +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> > > > > > +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> > > > > > +
> > > > > > +Take VFIO as example, upon request from VFIO user space
> > > > > > (e.g. QEMU), +VFIO kernel code shall query IOMMU vendor
> > > > > > driver for the support of +the above features. Query result
> > > > > > can then be reported back to the +user-space caller.
> > > > > > Details can be found in +Documentation/driver-api/vfio.rst.
> > > > > > +
> > > > > > +
> > > > > > +Data Passing Example with VFIO
> > > > > > +------------------------------
> > > > > > +As the ubiquitous userspace driver framework, VFIO is
> > > > > > already IOMMU +aware and share many key concepts such as
> > > > > > device model, group, and +protection domain. Other user
> > > > > > driver frameworks can also be extended +to support IOMMU
> > > > > > UAPI but it is outside the scope of this document. +
> > > > > > +In this tight-knit VFIO-IOMMU interface, the ultimate
> > > > > > consumer of the +IOMMU UAPI data is the host IOMMU driver.
> > > > > > VFIO facilitates user-kernel +transport, capability
> > > > > > checking, security, and life cycle management of +process
> > > > > > address space ID (PASID). +
> > > > > > +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU
> > > > > > driver is the +ultimate consumer of its UAPI data. At VFIO
> > > > > > layer, the IOMMU UAPI data +is wrapped in a VFIO UAPI data
> > > > > > for sanity checking. It follows the +pattern below:
> > > > > > +
> > > > > > +::
> > > > > > +
> > > > > > +   struct {
> > > > > > +	__u32 argsz;
> > > > > > +	__u32 flags;
> > > > > > +	__u8  data[];
> > > > > > +  }
> > > > > > +
> > > > > > +Here data[] contains the IOMMU UAPI data structures.
> > > > > > +
> > > > > > +In order to determine the size and feature set of the user
> > > > > > data, argsz +and flags are also embedded in the IOMMU UAPI
> > > > > > data structures. +A "__u32 argsz" field is *always* at the
> > > > > > beginning of each structure. +
> > > > > > +For example:
> > > > > > +::
> > > > > > +
> > > > > > +   struct iommu_gpasid_bind_data {
> > > > > > +	__u32 argsz;
> > > > > > +	__u32 version;
> > > > > > +	#define IOMMU_PASID_FORMAT_INTEL_VTD	1
> > > > > > +	__u32 format;
> > > > > > +	#define IOMMU_SVA_GPASID_VAL	(1 << 0)
> > > > > > +	__u64 flags;
> > > > > > +	__u64 gpgd;
> > > > > > +	__u64 hpasid;
> > > > > > +	__u64 gpasid;
> > > > > > +	__u32 addr_width;
> > > > > > +	__u8  padding[12];
> > > > > > +	/* Vendor specific data */
> > > > > > +	union {
> > > > > > +		struct iommu_gpasid_bind_data_vtd vtd;
> > > > > > +	};
> > > > > > +  };
> > > > > > +
> > > > > > +Use bind guest PASID as an example, VFIO code shall process
> > > > > > IOMMU UAPI +request as follows:
> > > > > > +
> > > > > > +::
> > > > > > +
> > > > > > + 1        /* Minsz must include IOMMU UAPI "argsz" of
> > > > > > __u32 */
> > > > > > + 2        minsz = offsetofend(struct vfio_iommu_type1_bind,
> > > > > > flags) +
> > > > > > +                              sizeof(u32);  
> > > > >
> > > > > In the example structure above:
> > > > >  
> > > > > > +   struct {
> > > > > > +	__u32 argsz;
> > > > > > +	__u32 flags;
> > > > > > +	__u8  data[];
> > > > > > +  }  
> > > > >
> > > > > This presumes that vfio does not use flags to identify a
> > > > > different layout, for example a field before data or defining
> > > > > a flag that provides no data.  IOW, the IOMMU guarantees
> > > > > argsz at the beginning of all structures, but let's not limit
> > > > > how vfio chooses to bundle that structure.  minsz should be
> > > > > based on flags, which we'll evaluate to determine how much
> > > > > more to copy. 
> > > > Got it, VFIO owns its flags therefore minsz. How about reword
> > > > the example data struct as:
> > > >    struct {
> > > > 	__u32 argsz;
> > > > 	__u32 flags;
> > > > 	__u8  data[];
> > > >   }
> > > >
> > > > Here data[] contains the IOMMU UAPI data structures. VFIO has
> > > > the freedom to bundle the data as well as parse data size based
> > > > on its own flags.
> > > >
> > > > In the example code:
> > > >
> > > > "Use bind guest PASID as an example, VFIO code shall first
> > > > process the flags field to determine the size to copy for IOMMU
> > > > UAPI. The flags could indicate different layout of the VFIO data
> > > > and the types of IOMMU UAPI data."  
> > >
> > > I think this should probably go no further than to say that the
> > > IOMMU UAPI data structure is expected to be embedded opaquely
> > > into the VFIO API, for example, VFIO may use a structure such as:
> > >
> > >     struct {
> > >  	__u32 argsz;
> > >  	__u32 flags;
> > >  	__u8  data[];
> > >    }
> > >
> > > where data[] contains an IOMMU UAPI structure, including the user
> > > provided argsz field relative to that embedded structure.  This
> > > format allows VFIO to multiplex multiple IOMMU UAPI interfaces
> > > through a reduced set of ioctls.
> > >  
> > Sounds good, thanks for the summary.
> >   
> > > > > > UAPI +request as follows:  
> > > >  
> > > > > > + 3        copy_from_user(&vfio_bind, (void __user *)arg,
> > > > > > minsz);
> > > > > > + 4
> > > > > > + 5        /* Check VFIO argsz */
> > > > > > + 6        if (vfio_bind.argsz < minsz)
> > > > > > + 7                return -EINVAL;
> > > > > > + 8
> > > > > > + 9        /* VFIO flags must be included in minsz */
> > > > > > + 10        switch (vfio_bind.flags) {
> > > > > > + 11        case VFIO_IOMMU_BIND_GUEST_PGTBL:
> > > > > > + 12                /*
> > > > > > + 13                 * Get the current IOMMU bind GPASID
> > > > > > data size,
> > > > > > + 14                 * which accounted for the largest union
> > > > > > member.
> > > > > > + 15                 */
> > > > > > + 16                data_size = sizeof(struct
> > > > > > iommu_gpasid_bind_data);
> > > > > > + 17                iommu_argsz = vfio_bind.argsz - minsz;  
> > > > >
> > > > > Note that by including the IOMMU UAPI argsz within minsz,
> > > > > this is incorrect.
> > > > >  
> > > > Good catch, should be:
> > > > iommu_argsz = vfio_bind.argsz - minsz - sizeof(u32)  
> 
> iommu_argsz = vfio_bind.argsz - minsz + sizeof(u32)
> 
you are right :)

> > > >  
> > > > > > + 18                if (iommu_argsz > data_size) {
> > > > > > + 19                        /* User data > current kernel */
> > > > > > + 20                        return -E2BIG;
> > > > > > + 21                }  
> > > > >
> > > > > Now I see why you're making the claim that QEMU compiled
> > > > > against an new kernel may not work on an older kernel.  We
> > > > > can do better.  The current sizeof the data structure should
> > > > > be the maximum we'll copy from the user, and we can update
> > > > > the user provided IOMMU UAPI argsz as we pass it down from
> > > > > the user to avoid exposing ourselves to an arbitrarily large
> > > > > user buffer. The IOMMU UAPI interfaces should then also use
> > > > > argsz and flags to determine whether the data is present for
> > > > > a specified flag. That should allow a user application
> > > > > compiled against a newer kernel header, but only using
> > > > > features found on older kernels to continue to work on older
> > > > > kernels, which seems like a basic requirement to me.  
> > > > I got your point. But I don't understand why VFIO layer will
> > > > update IOMMU argsz, IOMMU layer is not exposed to arbitrary
> > > > large user size in that it can not exceed the current UAPI data
> > > > size.  
> > >
> > > I was thinking about the case where userspace is compiled against
> > > a newer kernel and might provide argsz = sizeof(struct
> > > iommu_uapi_foo') but vfio only knows sizeof(struct
> > > iommu_uapi_foo).  We don't want to copy an arbitrary amount of
> > > data from userspace, so vfio would use MIN(argsz, sizeof(struct
> > > iommu_uapi_foo)) for the copy_from_user(). The IOMMU UAPI would
> > > then need to depend on the reduced argsz.
> > >
> > > But then I thought it even better if VFIO leaves the entire
> > > copy_from_user() to the layer consuming it.
> > >  
> > OK. Sounds good, that was what Kevin suggested also. I just wasn't
> > sure how much VFIO wants to inspect, I thought VFIO layer wanted to
> > do a sanity check.
> > 
> > Anyway, I will move copy_from_user to iommu uapi layer.
> >   
> > > > I agree we should make effort to allow features found in the
> > > > older kernel continue to work.  
> > >
> > > That's a requirement.  Breaking existing userspace without
> > > following a deprecation model is a bug.  
> > Sorry I don't understand why it is breaking existing userspace. If a
> > userspace is recompiled with new kernel header, it is not
> > *existing*.  
> 
> It is not the "existing binary", but is about the "existing code". 😊
> 
right. I guess there are two possibilities for apps to use kernel
header. we need to support both.
1. selectively import part of the header
2. include header

will do in the next round.

> Thanks,
> Kevin
> 
> >   
> > > But I think so too is the fact that this
> > > interface actually specifies and provides an example where simply
> > > recompiling existing userspace against a new kernel header where
> > > the size of structure may be changed to support a feature will
> > > cause that application to fail to run on older kernels.  That's
> > > not feasible for a distribution to support.
> > >  
> > I see your point, my assumption was that app (e.g. QEMU) imports new
> > header selectively. The new header must be imported with an
> > intention of using the new flags/fields. I guess this may not
> > *always* be true. I will add the support for older kernel to run on
> > new header (if no new fields/flags are used).
> > 
> > 
> > Thanks a lot!
> >   
> > > > > > + 22                copy_from_user(&iommu_bind, (void
> > > > > > __user *)
> > > > > > + 23                               vfio_bind.data,
> > > > > > iommu_argsz);
> > > > > > + 24               /*
> > > > > > + 25                * Deal with trailing bytes that is
> > > > > > bigger than user
> > > > > > + 26                * provided UAPI size but smaller than
> > > > > > the current
> > > > > > + 27                * kernel data size. Zero fill the
> > > > > > trailing bytes.
> > > > > > + 28                */
> > > > > > + 29                memset(iommu_bind + iommu_argsz, 0,
> > > > > > data_size -
> > > > > > + 30                       iommu_argsz;  
> > > > >
> > > > > The IOMMU UAPI interface having access to argsz should make
> > > > > this unnecessary.  Performing this memset() seems like it
> > > > > suggests to the next layer that it can rely on all fields
> > > > > being present and valid, which defeats the purpose of argsz.
> > > > >  
> > > > This memset does not suggest all fields are present and valid.
> > > > Only filter out the obvious invalid data based on current size.
> > > > My intention is to reduce the burden of checking not
> > > > eliminate.  
> > >
> > > I'm afraid that reducing the burden on the IOMMU UAPI layers
> > > leads to reduced visibility which leads to less stringent
> > > validation.  We're defining a UAPI layer for the kernel where
> > > VFIO just happens to be an interface through to that UAPI.  The
> > > UAPI should therefore be providing the validation, not VFIO.
> > > Create a wrapper in the IOMMU UAPI layer if you want to share
> > > common validation. 
> > > > > > + 31
> > > > > > + 32                iommu_sva_bind_gpasid(domain, dev,
> > > > > > iommu_bind_data);
> > > > > > + 33                break;
> > > > > > +
> > > > > > +
> > > > > > +Case #1 & 2 are supported per backward compatibility rule.
> > > > > > +
> > > > > > +Case #3 will fail with -E2BIG at line #20. Case  
> > > > >
> > > > > This is not acceptable IMO.
> > > > >  
> > > > Got it. Will copy up to the current data size and let IOMMU
> > > > driver handle supported flags/features.
> > > >  
> > > > > > +Case #4 may result in other error processed by IOMMU vendor
> > > > > > driver. However, +the damage shall not exceed the scope of
> > > > > > the offending user.  
> > > > >
> > > > > This is a concern in this double wrapped interface, the IOMMU
> > > > > UAPI layer may expect the vfio layer to validate the data.
> > > > > Zeroing the remainder of the data structure is evidence
> > > > > towards that.  The IOMMU UAPI layer needs to consider all of
> > > > > this untrusted, so why would we not reflect that by passing a
> > > > > __user pointer through to the IOMMU UAPI such that it can
> > > > > copy the data from the user itself rather than being mislead
> > > > > that the contents have been somehow verified?  Thanks,  
> > > > I am OK with IOMMU layer does the copy_from_user. One of my
> > > > original thinking was that since some APIs (e.g page response)
> > > > also used by in-kernel code, I would avoid user pointer or
> > > > another wrapper.
> > > >
> > > > Perhaps need to clarify the roles of each layer, IMHO the roles
> > > > are:
> > > > - VFIO
> > > > 	1. bundle IOMMU UAPI data with flags & argsz
> > > > 	2. sanity check argsz > minsz
> > > > 	3. determine the copy_from_user size based on VFIO
> > > > flags & argsz
> > > >
> > > > - IOMMU UAPI
> > > > 	1. check argsz against current kernel IOMMU UAPI data
> > > > size, its own minsz
> > > > 	2. parse data based on feature/flags
> > > >
> > > > So if VFIO already can decide the copy_from_user size as in
> > > > VFIO.3, why can't it just do the copy as well. VFIO only checks
> > > > and ensures size, nothing specific to the content of IOMMU
> > > > UAPI. Does the role partition sound right?  
> > >
> > > As above, the only way that this can be a generic UAPI is if VFIO
> > > is just a passthrough, otherwise this just becomes VFIO API and
> > > we might as well not pretend we're creating a UAPI between the
> > > VFIO and IOMMU. If a second user of the UAPI would duplicate the
> > > code from VFIO, then it shouldn't be in VFIO.  Thanks,
> > >
> > > Alex
> > >  
> > 
> > [Jacob Pan]  

[Jacob Pan]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-11  6:33   ` Lu Baolu
@ 2020-06-12 22:05     ` Jacob Pan
  0 siblings, 0 replies; 31+ messages in thread
From: Jacob Pan @ 2020-06-12 22:05 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Tian, Kevin, Alex Williamson, Raj Ashok, Jonathan Corbet,
	Jean-Philippe Brucker, LKML, Christoph Hellwig, iommu,
	David Woodhouse

On Thu, 11 Jun 2020 14:33:08 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Hi Jacob,
> 
> On 2020/6/11 12:12, Jacob Pan wrote:
> > IOMMU UAPI is newly introduced to support communications between
> > guest virtual IOMMU and host IOMMU. There has been lots of
> > discussions on how it should work with VFIO UAPI and userspace in
> > general.
> > 
> > This document is indended to clarify the UAPI design and usage. The
> > mechenics of how future extensions should be achieved are also
> > covered in this documentation.
> > 
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >   Documentation/userspace-api/iommu.rst | 210
> > ++++++++++++++++++++++++++++++++++ 1 file changed, 210 insertions(+)
> >   create mode 100644 Documentation/userspace-api/iommu.rst
> > 
> > diff --git a/Documentation/userspace-api/iommu.rst
> > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > index 000000000000..e95dc5a04a41
> > --- /dev/null
> > +++ b/Documentation/userspace-api/iommu.rst
> > @@ -0,0 +1,210 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +.. iommu:
> > +
> > +=====================================
> > +IOMMU Userspace API
> > +=====================================
> > +
> > +IOMMU UAPI is used for virtualization cases where communications
> > are +needed between physical and virtual IOMMU drivers. For native
> > +usage, IOMMU is a system device which does not need to communicate
> > +with user space directly.
> > +
> > +The primary use cases are guest Shared Virtual Address (SVA) and
> > +guest IO virtual address (IOVA), wherein virtual IOMMU (vIOMMU) is
> > +required to communicate with the physical IOMMU in the host.
> > +
> > +.. contents:: :local:
> > +
> > +Functionalities
> > +====================================================
> > +Communications of user and kernel involve both directions. The
> > +supported user-kernel APIs are as follows:
> > +
> > +1. Alloc/Free PASID
> > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > +4. Invalidate IOMMU caches
> > +5. Service page request
> > +
> > +Requirements
> > +====================================================
> > +The IOMMU UAPIs are generic and extensible to meet the following
> > +requirements:
> > +
> > +1. Emulated and para-virtualised vIOMMUs
> > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > +3. Extensions to the UAPI shall not break existing user space
> > +
> > +Interfaces
> > +====================================================
> > +Although the data structures defined in IOMMU UAPI are
> > self-contained, +there is no user API functions introduced.
> > Instead, IOMMU UAPI is +designed to work with existing user driver
> > frameworks such as VFIO. +
> > +Extension Rules & Precautions
> > +-----------------------------
> > +When IOMMU UAPI gets extended, the data structures can *only* be
> > +modified in two ways:
> > +
> > +1. Adding new fields by re-purposing the padding[] field. No size
> > change. +2. Adding new union members at the end. May increase in
> > size. +
> > +No new fields can be added *after* the variable size union in that
> > it +will break backward compatibility when offset moves. In both
> > cases, a +new flag must be accompanied with a new field such that
> > the IOMMU +driver can process the data based on the new flag.
> > Version field is +only reserved for the unlikely event of UAPI
> > upgrade at its entirety. +
> > +It's *always* the caller's responsibility to indicate the size of
> > the +structure passed by setting argsz appropriately.
> > +
> > +When IOMMU UAPI extension results in size increase, user such as
> > VFIO +has to handle the following scenarios:
> > +
> > +1. User and kernel has exact size match
> > +2. An older user with older kernel header (smaller UAPI size)
> > running on a
> > +   newer kernel (larger UAPI size)
> > +3. A newer user with newer kernel header (larger UAPI size) running
> > +   on a older kernel.
> > +4. A malicious/misbehaving user pass illegal/invalid size but
> > within
> > +   range. The data may contain garbage.
> > +
> > +
> > +Feature Checking
> > +----------------
> > +While launching a guest with vIOMMU, it is important to ensure
> > that host +can support the UAPI data structures to be used for
> > vIOMMU-pIOMMU +communications. Without the upfront compatibility
> > checking, future +faults are difficult to report even in normal
> > conditions. For example, +TLB invalidations should always succeed
> > from vIOMMU's +perspective. There is no architectural way to report
> > back to the vIOMMU +if the UAPI data is incompatible. For this
> > reason the following IOMMU +UAPIs cannot fail:
> > +
> > +1. Free PASID
> > +2. Unbind guest PASID
> > +3. Unbind guest PASID table (SMMU)
> > +4. Cache invalidate
> > +5. Page response
> > +
> > +User applications such as QEMU is expected to import kernel UAPI
> > +headers. Only backward compatibility is supported. For example, an
> > +older QEMU (with older kernel header) can run on newer kernel.
> > Newer +QEMU (with new kernel header) may fail on older kernel.
> > +
> > +IOMMU vendor driver should report the below features to IOMMU UAPI
> > +consumers (e.g. via VFIO).
> > +
> > +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> > +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> > +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> > +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> > +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> > +
> > +Take VFIO as example, upon request from VFIO user space (e.g.
> > QEMU), +VFIO kernel code shall query IOMMU vendor driver for the
> > support of +the above features. Query result can then be reported
> > back to the +user-space caller. Details can be found in
> > +Documentation/driver-api/vfio.rst.
> > +
> > +
> > +Data Passing Example with VFIO
> > +------------------------------
> > +As the ubiquitous userspace driver framework, VFIO is already IOMMU
> > +aware and share many key concepts such as device model, group, and
> > +protection domain. Other user driver frameworks can also be
> > extended +to support IOMMU UAPI but it is outside the scope of this
> > document. +
> > +In this tight-knit VFIO-IOMMU interface, the ultimate consumer of
> > the +IOMMU UAPI data is the host IOMMU driver. VFIO facilitates
> > user-kernel +transport, capability checking, security, and life
> > cycle management of +process address space ID (PASID).
> > +
> > +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU driver
> > is the +ultimate consumer of its UAPI data. At VFIO layer, the
> > IOMMU UAPI data +is wrapped in a VFIO UAPI data for sanity
> > checking. It follows the +pattern below:
> > +
> > +::
> > +
> > +   struct {
> > +	__u32 argsz;
> > +	__u32 flags;
> > +	__u8  data[];
> > +  }
> > +
> > +Here data[] contains the IOMMU UAPI data structures.
> > +
> > +In order to determine the size and feature set of the user data,
> > argsz +and flags are also embedded in the IOMMU UAPI data
> > structures. +A "__u32 argsz" field is *always* at the beginning of
> > each structure. +
> > +For example:
> > +::
> > +
> > +   struct iommu_gpasid_bind_data {
> > +	__u32 argsz;
> > +	__u32 version;
> > +	#define IOMMU_PASID_FORMAT_INTEL_VTD	1
> > +	__u32 format;
> > +	#define IOMMU_SVA_GPASID_VAL	(1 << 0)
> > +	__u64 flags;
> > +	__u64 gpgd;
> > +	__u64 hpasid;
> > +	__u64 gpasid;
> > +	__u32 addr_width;
> > +	__u8  padding[12];
> > +	/* Vendor specific data */
> > +	union {
> > +		struct iommu_gpasid_bind_data_vtd vtd;
> > +	};
> > +  };
> > +
> > +Use bind guest PASID as an example, VFIO code shall process IOMMU
> > UAPI +request as follows:
> > +
> > +::
> > +
> > + 1        /* Minsz must include IOMMU UAPI "argsz" of __u32 */
> > + 2        minsz = offsetofend(struct vfio_iommu_type1_bind, flags)
> > +  
> 
> Where is the vfio_iommu_type1_bind definition? I haven't found it in
> both this series and current code.
> 
It is not in this set. it meant to be an example that follows the
pattern of wrapping iommu uapi data inside vfio uapi.
struct {
	__u32 argsz;
	__u32 flags;
	__u8  data[]; 
}
I will include the bits in the example to make it more clear.

> > +                              sizeof(u32);
> > + 3        copy_from_user(&vfio_bind, (void __user *)arg, minsz);
> > + 4
> > + 5        /* Check VFIO argsz */
> > + 6        if (vfio_bind.argsz < minsz)
> > + 7                return -EINVAL;
> > + 8
> > + 9        /* VFIO flags must be included in minsz */
> > + 10        switch (vfio_bind.flags) {
> > + 11        case VFIO_IOMMU_BIND_GUEST_PGTBL:
> > + 12                /*
> > + 13                 * Get the current IOMMU bind GPASID data size,
> > + 14                 * which accounted for the largest union member.
> > + 15                 */
> > + 16                data_size = sizeof(struct
> > iommu_gpasid_bind_data);
> > + 17                iommu_argsz = vfio_bind.argsz - minsz;
> > + 18                if (iommu_argsz > data_size) {
> > + 19                        /* User data > current kernel */
> > + 20                        return -E2BIG;
> > + 21                }
> > + 22                copy_from_user(&iommu_bind, (void __user *)
> > + 23                               vfio_bind.data, iommu_argsz);
> > + 24               /*
> > + 25                * Deal with trailing bytes that is bigger than
> > user
> > + 26                * provided UAPI size but smaller than the
> > current
> > + 27                * kernel data size. Zero fill the trailing
> > bytes.
> > + 28                */
> > + 29                memset(iommu_bind + iommu_argsz, 0, data_size -
> > + 30                       iommu_argsz;  
> 
> Where is iommu_bind definition? I am assuming that it's
> 
> 	struct iommu_gpasid_bind_data iommu_bind;
> 
> You need to cast it to (void *) to avoid pointer overflow.
> 
> 	memset((void *)&iommu_bind + iommu_argsz, 0, ...);
> 
> By the way, right parenthesis is missed.
> 
good point, i will include more realistic code example.

> > + 31
> > + 32                iommu_sva_bind_gpasid(domain, dev,
> > iommu_bind_data);  
> 
> Where is the iommu_bind_data definition?
> 
that is the iommu_bind, meant to be pseudo code but should be
consistent.


> > + 33                break;
> > +
> > +
> > +Case #1 & 2 are supported per backward compatibility rule.
> > +
> > +Case #3 will fail with -E2BIG at line #20. Case
> > +
> > +Case #4 may result in other error processed by IOMMU vendor
> > driver. However, +the damage shall not exceed the scope of the
> > offending user. 
> 
> The above description is not clear. People are hard to find the right
> description of each case.
> 
You are right, it is not clear. I meant for the four scenarios listed
above. I will use the same word "cases" to reference and create a
subsection title for reference.

"
Compatibility Checking
----------------------
VFIO has to handle the following cases:

1. User and kernel has exact size match
2. An older user with older kernel header (smaller UAPI size)
   running on a newer kernel (larger UAPI size)
3. A newer user with newer kernel header (larger UAPI size) running
   on a older kernel.
4. A malicious/misbehaving user pass illegal/invalid size but within
   range. The data may contain garbage.
"
> Best regards,
> baolu

[Jacob Pan]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-11  9:30   ` Jonathan Cameron
@ 2020-06-12 22:53     ` Jacob Pan
  0 siblings, 0 replies; 31+ messages in thread
From: Jacob Pan @ 2020-06-12 22:53 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Tian, Kevin, Alex Williamson, Raj Ashok, Jonathan Corbet,
	Jean-Philippe Brucker, LKML, Christoph Hellwig, iommu,
	David Woodhouse

Hi Jon,

On Thu, 11 Jun 2020 10:30:32 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Wed, 10 Jun 2020 21:12:13 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
> > IOMMU UAPI is newly introduced to support communications between
> > guest virtual IOMMU and host IOMMU. There has been lots of
> > discussions on how it should work with VFIO UAPI and userspace in
> > general.
> > 
> > This document is indended to clarify the UAPI design and usage. The
> > mechenics of how future extensions should be achieved are also
> > covered  
> 
> mechanics 
> 
will fix,

> > in this documentation.
> > 
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>  
> Mostly seems sensible.  A few comments / queries inline.
> 
> Jonathan
> 
> > ---
> >  Documentation/userspace-api/iommu.rst | 210
> > ++++++++++++++++++++++++++++++++++ 1 file changed, 210 insertions(+)
> >  create mode 100644 Documentation/userspace-api/iommu.rst
> > 
> > diff --git a/Documentation/userspace-api/iommu.rst
> > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > index 000000000000..e95dc5a04a41
> > --- /dev/null
> > +++ b/Documentation/userspace-api/iommu.rst
> > @@ -0,0 +1,210 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +.. iommu:
> > +
> > +=====================================
> > +IOMMU Userspace API
> > +=====================================
> > +
> > +IOMMU UAPI is used for virtualization cases where communications
> > are +needed between physical and virtual IOMMU drivers. For native
> > +usage, IOMMU is a system device which does not need to communicate
> > +with user space directly.
> > +
> > +The primary use cases are guest Shared Virtual Address (SVA) and
> > +guest IO virtual address (IOVA), wherein virtual IOMMU (vIOMMU)
> > is  
> 
> wherein _a_ virtual IOMMU 
right,

> 
> > +required to communicate with the physical IOMMU in the host.
> > +
> > +.. contents:: :local:
> > +
> > +Functionalities
> > +====================================================
> > +Communications of user and kernel involve both directions. The
> > +supported user-kernel APIs are as follows:
> > +
> > +1. Alloc/Free PASID
> > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > +4. Invalidate IOMMU caches
> > +5. Service page request
> > +
> > +Requirements
> > +====================================================
> > +The IOMMU UAPIs are generic and extensible to meet the following
> > +requirements:
> > +
> > +1. Emulated and para-virtualised vIOMMUs
> > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > +3. Extensions to the UAPI shall not break existing user space
> > +
> > +Interfaces
> > +====================================================
> > +Although the data structures defined in IOMMU UAPI are
> > self-contained, +there is no user API functions introduced.
> > Instead, IOMMU UAPI is +designed to work with existing user driver
> > frameworks such as VFIO. +
> > +Extension Rules & Precautions
> > +-----------------------------
> > +When IOMMU UAPI gets extended, the data structures can *only* be
> > +modified in two ways:
> > +
> > +1. Adding new fields by re-purposing the padding[] field. No size
> > change. +2. Adding new union members at the end. May increase in
> > size. +
> > +No new fields can be added *after* the variable size union in that
> > it +will break backward compatibility when offset moves. In both
> > cases, a +new flag must be accompanied with a new field such that
> > the IOMMU +driver can process the data based on the new flag.
> > Version field is +only reserved for the unlikely event of UAPI
> > upgrade at its entirety. +
> > +It's *always* the caller's responsibility to indicate the size of
> > the +structure passed by setting argsz appropriately.
> > +
> > +When IOMMU UAPI extension results in size increase, user such as
> > VFIO +has to handle the following scenarios:
> > +
> > +1. User and kernel has exact size match
> > +2. An older user with older kernel header (smaller UAPI size)
> > running on a
> > +   newer kernel (larger UAPI size)
> > +3. A newer user with newer kernel header (larger UAPI size) running
> > +   on a older kernel.
> > +4. A malicious/misbehaving user pass illegal/invalid size but
> > within
> > +   range. The data may contain garbage.
> > +
> > +
> > +Feature Checking
> > +----------------
> > +While launching a guest with vIOMMU, it is important to ensure
> > that host +can support the UAPI data structures to be used for
> > vIOMMU-pIOMMU +communications. Without the upfront compatibility
> > checking, future +faults are difficult to report even in normal
> > conditions. For example, +TLB invalidations should always succeed
> > from vIOMMU's +perspective.   
> 
> This statement has me concerned.  If a TLB invalidation fails, but
> is reported to the guest as successful do we have possible breaking
> of iommu isolation guarantees?
> 
Good point. we should never report success if TLB invalidation fails.
Perhaps reword as:
"For example, TLB invalidations should always succeed. There is no
architectural way to report back to the vIOMMU if the UAPI data is
incompatible. If that happens, in order to protect IOMMU iosolation
guarantee, we have to resort to not giving completion status. This
may result in VM hang."


> If you get a TLB invalidation not happening, for some reason, that's
> a critical fault, isolate the device using the IOMMU or kill the VM.
> 
> I'd reword it as "TLB invalidations should always succeed."
> 
yes.

> As you mention, we should never get to this state anyway!
> 
> > There is no architectural way to report back to the vIOMMU
> > +if the UAPI data is incompatible. For this reason the following
> > IOMMU +UAPIs cannot fail:
> > +
> > +1. Free PASID
> > +2. Unbind guest PASID
> > +3. Unbind guest PASID table (SMMU)
> > +4. Cache invalidate
> > +5. Page response  
> 
> Of these, page response is a bit different.  Shouldn't be a problem to
> occasionally say a page response was handled when it wasn't (or it
> was but has gone away again before the response reached the
> hardware).  In high load environments that happens anyway sometimes. 
> 
> The others are all cases where any failure at all is fatal to the
> guest continuing to use the hardware.
> 
You are right, PRS fail is expected in that guest could be scheduled
out for a long time. host IOMMU driver has tracking and timeout which
will respond back to the device to avoid device hang.

I will remove Page Response from the list.

Here from UAPI compatibility checking perspective, I meant if PRS data
is not compatible, all page responses could fail, difficult to handle.


> > +
> > +User applications such as QEMU is expected to import kernel UAPI
> > +headers. Only backward compatibility is supported. For example, an
> > +older QEMU (with older kernel header) can run on newer kernel.
> > Newer +QEMU (with new kernel header) may fail on older kernel.  
> 
> I'd define fail a bit tighter here.  I presume refuse to initialize?
> 
agreed. also with Alex's input, we should still support new header on
old kernel as long as only old features are used. So this could be 
"refuse to initialize if new fields/flags are used on the older kernel"

> > +
> > +IOMMU vendor driver should report the below features to IOMMU UAPI
> > +consumers (e.g. via VFIO).
> > +
> > +1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
> > +2. IOMMU_NESTING_FEAT_BIND_PGTBL
> > +3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
> > +4. IOMMU_NESTING_FEAT_CACHE_INVLD
> > +5. IOMMU_NESTING_FEAT_PAGE_REQUEST
> > +
> > +Take VFIO as example, upon request from VFIO user space (e.g.
> > QEMU), +VFIO kernel code shall query IOMMU vendor driver for the
> > support of +the above features. Query result can then be reported
> > back to the +user-space caller. Details can be found in
> > +Documentation/driver-api/vfio.rst.
> > +
> > +
> > +Data Passing Example with VFIO
> > +------------------------------
> > +As the ubiquitous userspace driver framework, VFIO is already IOMMU
> > +aware and share many key concepts such as device model, group, and
> > +protection domain. Other user driver frameworks can also be
> > extended +to support IOMMU UAPI but it is outside the scope of this
> > document. +
> > +In this tight-knit VFIO-IOMMU interface, the ultimate consumer of
> > the +IOMMU UAPI data is the host IOMMU driver. VFIO facilitates
> > user-kernel +transport, capability checking, security, and life
> > cycle management of +process address space ID (PASID).
> > +
> > +Unlike normal user data passed via VFIO UAPI IOTCL, IOMMU driver
> > is the +ultimate consumer of its UAPI data. At VFIO layer, the
> > IOMMU UAPI data +is wrapped in a VFIO UAPI data for sanity
> > checking. It follows the +pattern below:
> > +
> > +::
> > +
> > +   struct {
> > +	__u32 argsz;
> > +	__u32 flags;
> > +	__u8  data[];
> > +  }  
> 
> That final bracket needs to be indented one more space.
> Make sure you check the output of this file as there are a few of
> these.
> 
will do. thanks

> > +
> > +Here data[] contains the IOMMU UAPI data structures.
> > +
> > +In order to determine the size and feature set of the user data,
> > argsz +and flags are also embedded in the IOMMU UAPI data
> > structures. +A "__u32 argsz" field is *always* at the beginning of
> > each structure. +
> > +For example:
> > +::
> > +
> > +   struct iommu_gpasid_bind_data {
> > +	__u32 argsz;
> > +	__u32 version;
> > +	#define IOMMU_PASID_FORMAT_INTEL_VTD	1
> > +	__u32 format;
> > +	#define IOMMU_SVA_GPASID_VAL	(1 << 0)
> > +	__u64 flags;
> > +	__u64 gpgd;
> > +	__u64 hpasid;
> > +	__u64 gpasid;
> > +	__u32 addr_width;
> > +	__u8  padding[12];
> > +	/* Vendor specific data */
> > +	union {
> > +		struct iommu_gpasid_bind_data_vtd vtd;
> > +	};
> > +  };
> > +
> > +Use bind guest PASID as an example, VFIO code shall process IOMMU
> > UAPI +request as follows:
> > +
> > +::
> > +
> > + 1        /* Minsz must include IOMMU UAPI "argsz" of __u32 */
> > + 2        minsz = offsetofend(struct vfio_iommu_type1_bind, flags)
> > +
> > +                              sizeof(u32);
> > + 3        copy_from_user(&vfio_bind, (void __user *)arg, minsz);
> > + 4
> > + 5        /* Check VFIO argsz */
> > + 6        if (vfio_bind.argsz < minsz)
> > + 7                return -EINVAL;
> > + 8
> > + 9        /* VFIO flags must be included in minsz */  
> 
> Nice to keep indentation across change in line number length.
> 
will do

> > + 10        switch (vfio_bind.flags) {
> > + 11        case VFIO_IOMMU_BIND_GUEST_PGTBL:
> > + 12                /*
> > + 13                 * Get the current IOMMU bind GPASID data size,
> > + 14                 * which accounted for the largest union member.
> > + 15                 */
> > + 16                data_size = sizeof(struct
> > iommu_gpasid_bind_data);
> > + 17                iommu_argsz = vfio_bind.argsz - minsz;
> > + 18                if (iommu_argsz > data_size) {
> > + 19                        /* User data > current kernel */
> > + 20                        return -E2BIG;
> > + 21                }
> > + 22                copy_from_user(&iommu_bind, (void __user *)
> > + 23                               vfio_bind.data, iommu_argsz);
> > + 24               /*
> > + 25                * Deal with trailing bytes that is bigger than
> > user
> > + 26                * provided UAPI size but smaller than the
> > current
> > + 27                * kernel data size. Zero fill the trailing
> > bytes.
> > + 28                */
> > + 29                memset(iommu_bind + iommu_argsz, 0, data_size -
> > + 30                       iommu_argsz;
> > + 31
> > + 32                iommu_sva_bind_gpasid(domain, dev,
> > iommu_bind_data);
> > + 33                break;
> > +
> > +
> > +Case #1 & 2 are supported per backward compatibility rule.
> > +
> > +Case #3 will fail with -E2BIG at line #20. Case  
> 
> Is that always the case?  As we have multiple structures in a union
> it's possible that a given architecture might expand without changing
> the union size. So this might not detect a UAPI change.
> 
not always, if expand without size change, there must be a new flag.
the flag will guide the processing of the new union member.

> > +
> > +Case #4 may result in other error processed by IOMMU vendor
> > driver. However, +the damage shall not exceed the scope of the
> > offending user.  
> 
> 

[Jacob Pan]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-12  0:27         ` Jacob Pan
  2020-06-12  7:38           ` Tian, Kevin
@ 2020-06-16 15:22           ` Jacob Pan
  2020-06-17  6:20             ` Liu, Yi L
  1 sibling, 1 reply; 31+ messages in thread
From: Jacob Pan @ 2020-06-16 15:22 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, Raj Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Thu, 11 Jun 2020 17:27:27 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> > 
> > But then I thought it even better if VFIO leaves the entire
> > copy_from_user() to the layer consuming it.
> >   
> OK. Sounds good, that was what Kevin suggested also. I just wasn't
> sure how much VFIO wants to inspect, I thought VFIO layer wanted to
> do a sanity check.
> 
> Anyway, I will move copy_from_user to iommu uapi layer.

Just one more point brought up by Yi when we discuss this offline.

If we move copy_from_user to iommu uapi layer, then there will be
multiple copy_from_user calls for the same data when a VFIO container
has multiple domains, devices. For bind, it might be OK. But might be
additional overhead for TLB flush request from the guest.

Thoughts?

Jacob
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-16 15:22           ` Jacob Pan
@ 2020-06-17  6:20             ` Liu, Yi L
  2020-06-17  8:28               ` Tian, Kevin
  0 siblings, 1 reply; 31+ messages in thread
From: Liu, Yi L @ 2020-06-17  6:20 UTC (permalink / raw)
  To: Jacob Pan, Alex Williamson
  Cc: Tian, Kevin, Raj, Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

> From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Sent: Tuesday, June 16, 2020 11:22 PM
> 
> On Thu, 11 Jun 2020 17:27:27 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
> > >
> > > But then I thought it even better if VFIO leaves the entire
> > > copy_from_user() to the layer consuming it.
> > >
> > OK. Sounds good, that was what Kevin suggested also. I just wasn't
> > sure how much VFIO wants to inspect, I thought VFIO layer wanted to do
> > a sanity check.
> >
> > Anyway, I will move copy_from_user to iommu uapi layer.
> 
> Just one more point brought up by Yi when we discuss this offline.
> 
> If we move copy_from_user to iommu uapi layer, then there will be multiple
> copy_from_user calls for the same data when a VFIO container has multiple domains,
> devices. For bind, it might be OK. But might be additional overhead for TLB flush
> request from the guest.

I think it is the same with bind and TLB flush path. will be multiple
copy_from_user.

BTW. for moving data copy to iommy layer, there is another point which
need to consider. VFIO needs to do unbind in bind path if bind failed,
so it will assemble unbind_data and pass to iommu layer. If iommu layer
do the copy_from_user, I think it will be failed. any idea?

Regards,
Yi Liu

> Thoughts?
> 
> Jacob
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-17  6:20             ` Liu, Yi L
@ 2020-06-17  8:28               ` Tian, Kevin
  2020-06-18 21:48                 ` Alex Williamson
  0 siblings, 1 reply; 31+ messages in thread
From: Tian, Kevin @ 2020-06-17  8:28 UTC (permalink / raw)
  To: Liu, Yi L, Jacob Pan, Alex Williamson
  Cc: Raj, Ashok, Jonathan Corbet, Jean-Philippe Brucker, LKML,
	Christoph Hellwig, iommu, David Woodhouse

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Wednesday, June 17, 2020 2:20 PM
> 
> > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Sent: Tuesday, June 16, 2020 11:22 PM
> >
> > On Thu, 11 Jun 2020 17:27:27 -0700
> > Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> >
> > > >
> > > > But then I thought it even better if VFIO leaves the entire
> > > > copy_from_user() to the layer consuming it.
> > > >
> > > OK. Sounds good, that was what Kevin suggested also. I just wasn't
> > > sure how much VFIO wants to inspect, I thought VFIO layer wanted to do
> > > a sanity check.
> > >
> > > Anyway, I will move copy_from_user to iommu uapi layer.
> >
> > Just one more point brought up by Yi when we discuss this offline.
> >
> > If we move copy_from_user to iommu uapi layer, then there will be
> multiple
> > copy_from_user calls for the same data when a VFIO container has
> multiple domains,
> > devices. For bind, it might be OK. But might be additional overhead for TLB
> flush
> > request from the guest.
> 
> I think it is the same with bind and TLB flush path. will be multiple
> copy_from_user.

multiple copies is possibly fine. In reality we allow only one group per
nesting container (as described in patch [03/15]), and usually there
is just one SVA-capable device per group.

> 
> BTW. for moving data copy to iommy layer, there is another point which
> need to consider. VFIO needs to do unbind in bind path if bind failed,
> so it will assemble unbind_data and pass to iommu layer. If iommu layer
> do the copy_from_user, I think it will be failed. any idea?
> 

This might be mitigated if we go back to use the same bind_data for both
bind/unbind. Then you can reuse the user object for unwinding.

However there is another case where VFIO may need to assemble the
bind_data itself. When a VM is killed, VFIO needs to walk allocated PASIDs
and unbind them one-by-one. In such case copy_from_user doesn't work
since the data is created by kernel. Alex, do you have a suggestion how this
usage can be supported? e.g. asking IOMMU driver to provide two sets of
APIs to handle user/kernel generated requests?

Thanks
Kevin
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-17  8:28               ` Tian, Kevin
@ 2020-06-18 21:48                 ` Alex Williamson
  2020-06-19  2:15                   ` Liu, Yi L
  0 siblings, 1 reply; 31+ messages in thread
From: Alex Williamson @ 2020-06-18 21:48 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Raj, Ashok, Jonathan Corbet, Jean-Philippe Brucker, LKML,
	Christoph Hellwig, iommu, David Woodhouse

On Wed, 17 Jun 2020 08:28:24 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Wednesday, June 17, 2020 2:20 PM
> >   
> > > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > Sent: Tuesday, June 16, 2020 11:22 PM
> > >
> > > On Thu, 11 Jun 2020 17:27:27 -0700
> > > Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> > >  
> > > > >
> > > > > But then I thought it even better if VFIO leaves the entire
> > > > > copy_from_user() to the layer consuming it.
> > > > >  
> > > > OK. Sounds good, that was what Kevin suggested also. I just wasn't
> > > > sure how much VFIO wants to inspect, I thought VFIO layer wanted to do
> > > > a sanity check.
> > > >
> > > > Anyway, I will move copy_from_user to iommu uapi layer.  
> > >
> > > Just one more point brought up by Yi when we discuss this offline.
> > >
> > > If we move copy_from_user to iommu uapi layer, then there will be  
> > multiple  
> > > copy_from_user calls for the same data when a VFIO container has  
> > multiple domains,  
> > > devices. For bind, it might be OK. But might be additional overhead for TLB  
> > flush  
> > > request from the guest.  
> > 
> > I think it is the same with bind and TLB flush path. will be multiple
> > copy_from_user.  
> 
> multiple copies is possibly fine. In reality we allow only one group per
> nesting container (as described in patch [03/15]), and usually there
> is just one SVA-capable device per group.
> 
> > 
> > BTW. for moving data copy to iommy layer, there is another point which
> > need to consider. VFIO needs to do unbind in bind path if bind failed,
> > so it will assemble unbind_data and pass to iommu layer. If iommu layer
> > do the copy_from_user, I think it will be failed. any idea?

If a call into a UAPI fails, there should be nothing to undo.  Creating
a partial setup for a failed call that needs to be undone by the caller
is not good practice.

> This might be mitigated if we go back to use the same bind_data for both
> bind/unbind. Then you can reuse the user object for unwinding.
> 
> However there is another case where VFIO may need to assemble the
> bind_data itself. When a VM is killed, VFIO needs to walk allocated PASIDs
> and unbind them one-by-one. In such case copy_from_user doesn't work
> since the data is created by kernel. Alex, do you have a suggestion how this
> usage can be supported? e.g. asking IOMMU driver to provide two sets of
> APIs to handle user/kernel generated requests?

Yes, it seems like vfio would need to make use of a driver API to do
this, we shouldn't be faking a user buffer in the kernel in order to
call through to a UAPI.  Thanks,

Alex

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-18 21:48                 ` Alex Williamson
@ 2020-06-19  2:15                   ` Liu, Yi L
  2020-06-19  2:54                     ` Alex Williamson
  0 siblings, 1 reply; 31+ messages in thread
From: Liu, Yi L @ 2020-06-19  2:15 UTC (permalink / raw)
  To: Alex Williamson, Tian, Kevin
  Cc: Raj, Ashok, Jonathan Corbet, Jean-Philippe Brucker, LKML,
	Christoph Hellwig, iommu, David Woodhouse

Hi Alex,

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, June 19, 2020 5:48 AM
> 
> On Wed, 17 Jun 2020 08:28:24 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Wednesday, June 17, 2020 2:20 PM
> > >
> > > > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > Sent: Tuesday, June 16, 2020 11:22 PM
> > > >
> > > > On Thu, 11 Jun 2020 17:27:27 -0700
> > > > Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> > > >
> > > > > >
> > > > > > But then I thought it even better if VFIO leaves the entire
> > > > > > copy_from_user() to the layer consuming it.
> > > > > >
> > > > > OK. Sounds good, that was what Kevin suggested also. I just wasn't
> > > > > sure how much VFIO wants to inspect, I thought VFIO layer wanted to do
> > > > > a sanity check.
> > > > >
> > > > > Anyway, I will move copy_from_user to iommu uapi layer.
> > > >
> > > > Just one more point brought up by Yi when we discuss this offline.
> > > >
> > > > If we move copy_from_user to iommu uapi layer, then there will be
> > > multiple
> > > > copy_from_user calls for the same data when a VFIO container has
> > > multiple domains,
> > > > devices. For bind, it might be OK. But might be additional overhead for TLB
> > > flush
> > > > request from the guest.
> > >
> > > I think it is the same with bind and TLB flush path. will be multiple
> > > copy_from_user.
> >
> > multiple copies is possibly fine. In reality we allow only one group per
> > nesting container (as described in patch [03/15]), and usually there
> > is just one SVA-capable device per group.
> >
> > >
> > > BTW. for moving data copy to iommy layer, there is another point which
> > > need to consider. VFIO needs to do unbind in bind path if bind failed,
> > > so it will assemble unbind_data and pass to iommu layer. If iommu layer
> > > do the copy_from_user, I think it will be failed. any idea?
> 
> If a call into a UAPI fails, there should be nothing to undo.  Creating
> a partial setup for a failed call that needs to be undone by the caller
> is not good practice.

is it still a problem if it's the VFIO to undo the partial setup before
returning to user space?

> > This might be mitigated if we go back to use the same bind_data for both
> > bind/unbind. Then you can reuse the user object for unwinding.
> >
> > However there is another case where VFIO may need to assemble the
> > bind_data itself. When a VM is killed, VFIO needs to walk allocated PASIDs
> > and unbind them one-by-one. In such case copy_from_user doesn't work
> > since the data is created by kernel. Alex, do you have a suggestion how this
> > usage can be supported? e.g. asking IOMMU driver to provide two sets of
> > APIs to handle user/kernel generated requests?
> 
> Yes, it seems like vfio would need to make use of a driver API to do
> this, we shouldn't be faking a user buffer in the kernel in order to
> call through to a UAPI.  Thanks,

ok, so if VFIO wants to issue unbind by itself, it should use an API which
passes kernel buffer to IOMMU layer. If the unbind request is from user
space, then VFIO should use another API which passes user buffer pointer
to IOMMU layer. makes sense. will align with jacob.

Regards,
Yi Liu

> Alex

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-19  2:15                   ` Liu, Yi L
@ 2020-06-19  2:54                     ` Alex Williamson
  2020-06-19  3:30                       ` Liu, Yi L
  0 siblings, 1 reply; 31+ messages in thread
From: Alex Williamson @ 2020-06-19  2:54 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, Raj, Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Fri, 19 Jun 2020 02:15:36 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> Hi Alex,
> 
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Friday, June 19, 2020 5:48 AM
> > 
> > On Wed, 17 Jun 2020 08:28:24 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >   
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Wednesday, June 17, 2020 2:20 PM
> > > >  
> > > > > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > > Sent: Tuesday, June 16, 2020 11:22 PM
> > > > >
> > > > > On Thu, 11 Jun 2020 17:27:27 -0700
> > > > > Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> > > > >  
> > > > > > >
> > > > > > > But then I thought it even better if VFIO leaves the entire
> > > > > > > copy_from_user() to the layer consuming it.
> > > > > > >  
> > > > > > OK. Sounds good, that was what Kevin suggested also. I just wasn't
> > > > > > sure how much VFIO wants to inspect, I thought VFIO layer wanted to do
> > > > > > a sanity check.
> > > > > >
> > > > > > Anyway, I will move copy_from_user to iommu uapi layer.  
> > > > >
> > > > > Just one more point brought up by Yi when we discuss this offline.
> > > > >
> > > > > If we move copy_from_user to iommu uapi layer, then there will be  
> > > > multiple  
> > > > > copy_from_user calls for the same data when a VFIO container has  
> > > > multiple domains,  
> > > > > devices. For bind, it might be OK. But might be additional overhead for TLB  
> > > > flush  
> > > > > request from the guest.  
> > > >
> > > > I think it is the same with bind and TLB flush path. will be multiple
> > > > copy_from_user.  
> > >
> > > multiple copies is possibly fine. In reality we allow only one group per
> > > nesting container (as described in patch [03/15]), and usually there
> > > is just one SVA-capable device per group.
> > >  
> > > >
> > > > BTW. for moving data copy to iommy layer, there is another point which
> > > > need to consider. VFIO needs to do unbind in bind path if bind failed,
> > > > so it will assemble unbind_data and pass to iommu layer. If iommu layer
> > > > do the copy_from_user, I think it will be failed. any idea?  
> > 
> > If a call into a UAPI fails, there should be nothing to undo.  Creating
> > a partial setup for a failed call that needs to be undone by the caller
> > is not good practice.  
> 
> is it still a problem if it's the VFIO to undo the partial setup before
> returning to user space?

Yes.  If a UAPI function fails there should be no residual effect.
 
> > > This might be mitigated if we go back to use the same bind_data for both
> > > bind/unbind. Then you can reuse the user object for unwinding.
> > >
> > > However there is another case where VFIO may need to assemble the
> > > bind_data itself. When a VM is killed, VFIO needs to walk allocated PASIDs
> > > and unbind them one-by-one. In such case copy_from_user doesn't work
> > > since the data is created by kernel. Alex, do you have a suggestion how this
> > > usage can be supported? e.g. asking IOMMU driver to provide two sets of
> > > APIs to handle user/kernel generated requests?  
> > 
> > Yes, it seems like vfio would need to make use of a driver API to do
> > this, we shouldn't be faking a user buffer in the kernel in order to
> > call through to a UAPI.  Thanks,  
> 
> ok, so if VFIO wants to issue unbind by itself, it should use an API which
> passes kernel buffer to IOMMU layer. If the unbind request is from user
> space, then VFIO should use another API which passes user buffer pointer
> to IOMMU layer. makes sense. will align with jacob.

Sounds right to me.  Different approaches might be used for the driver
API versus the UAPI, perhaps there is no buffer.  Thanks,

Alex

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-19  2:54                     ` Alex Williamson
@ 2020-06-19  3:30                       ` Liu, Yi L
  2020-06-19 16:37                         ` Alex Williamson
  0 siblings, 1 reply; 31+ messages in thread
From: Liu, Yi L @ 2020-06-19  3:30 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, Raj, Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

Hi Alex,

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, June 19, 2020 10:55 AM
> 
> On Fri, 19 Jun 2020 02:15:36 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > Hi Alex,
> >
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Friday, June 19, 2020 5:48 AM
> > >
> > > On Wed, 17 Jun 2020 08:28:24 +0000
> > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > >
> > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > Sent: Wednesday, June 17, 2020 2:20 PM
> > > > >
> > > > > > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > > > Sent: Tuesday, June 16, 2020 11:22 PM
> > > > > >
> > > > > > On Thu, 11 Jun 2020 17:27:27 -0700 Jacob Pan
> > > > > > <jacob.jun.pan@linux.intel.com> wrote:
> > > > > >
> > > > > > > >
> > > > > > > > But then I thought it even better if VFIO leaves the
> > > > > > > > entire
> > > > > > > > copy_from_user() to the layer consuming it.
> > > > > > > >
> > > > > > > OK. Sounds good, that was what Kevin suggested also. I just
> > > > > > > wasn't sure how much VFIO wants to inspect, I thought VFIO
> > > > > > > layer wanted to do a sanity check.
> > > > > > >
> > > > > > > Anyway, I will move copy_from_user to iommu uapi layer.
> > > > > >
> > > > > > Just one more point brought up by Yi when we discuss this offline.
> > > > > >
> > > > > > If we move copy_from_user to iommu uapi layer, then there will
> > > > > > be
> > > > > multiple
> > > > > > copy_from_user calls for the same data when a VFIO container
> > > > > > has
> > > > > multiple domains,
> > > > > > devices. For bind, it might be OK. But might be additional
> > > > > > overhead for TLB
> > > > > flush
> > > > > > request from the guest.
> > > > >
> > > > > I think it is the same with bind and TLB flush path. will be
> > > > > multiple copy_from_user.
> > > >
> > > > multiple copies is possibly fine. In reality we allow only one
> > > > group per nesting container (as described in patch [03/15]), and
> > > > usually there is just one SVA-capable device per group.
> > > >
> > > > >
> > > > > BTW. for moving data copy to iommy layer, there is another point
> > > > > which need to consider. VFIO needs to do unbind in bind path if
> > > > > bind failed, so it will assemble unbind_data and pass to iommu
> > > > > layer. If iommu layer do the copy_from_user, I think it will be failed. any
> idea?
> > >
> > > If a call into a UAPI fails, there should be nothing to undo.
> > > Creating a partial setup for a failed call that needs to be undone
> > > by the caller is not good practice.
> >
> > is it still a problem if it's the VFIO to undo the partial setup
> > before returning to user space?
> 
> Yes.  If a UAPI function fails there should be no residual effect.

ok. the iommu_sva_bind_gpasid() is per device call. There is no residual
effect if it failed. so no partial setup will happen per device.

but VFIO needs to use iommu_group_for_each_dev() to do bind, so
if iommu_group_for_each_dev() failed, I guess VFIO needs to undo
the partial setup for the group. right?

> > > > This might be mitigated if we go back to use the same bind_data
> > > > for both bind/unbind. Then you can reuse the user object for unwinding.
> > > >
> > > > However there is another case where VFIO may need to assemble the
> > > > bind_data itself. When a VM is killed, VFIO needs to walk
> > > > allocated PASIDs and unbind them one-by-one. In such case
> > > > copy_from_user doesn't work since the data is created by kernel.
> > > > Alex, do you have a suggestion how this usage can be supported?
> > > > e.g. asking IOMMU driver to provide two sets of APIs to handle user/kernel
> generated requests?
> > >
> > > Yes, it seems like vfio would need to make use of a driver API to do
> > > this, we shouldn't be faking a user buffer in the kernel in order to
> > > call through to a UAPI.  Thanks,
> >
> > ok, so if VFIO wants to issue unbind by itself, it should use an API
> > which passes kernel buffer to IOMMU layer. If the unbind request is
> > from user space, then VFIO should use another API which passes user
> > buffer pointer to IOMMU layer. makes sense. will align with jacob.
> 
> Sounds right to me.  Different approaches might be used for the driver API versus
> the UAPI, perhaps there is no buffer.  Thanks,

thanks for your coaching. It may require Jacob to add APIs in iommu layer
for the two purposes.

Regards,
Yi Liu

> Alex

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-19  3:30                       ` Liu, Yi L
@ 2020-06-19 16:37                         ` Alex Williamson
  2020-06-21  5:46                           ` Liu, Yi L
  0 siblings, 1 reply; 31+ messages in thread
From: Alex Williamson @ 2020-06-19 16:37 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, Raj, Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

On Fri, 19 Jun 2020 03:30:24 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> Hi Alex,
> 
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Friday, June 19, 2020 10:55 AM
> > 
> > On Fri, 19 Jun 2020 02:15:36 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >   
> > > Hi Alex,
> > >  
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Friday, June 19, 2020 5:48 AM
> > > >
> > > > On Wed, 17 Jun 2020 08:28:24 +0000
> > > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > > >  
> > > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > > Sent: Wednesday, June 17, 2020 2:20 PM
> > > > > >  
> > > > > > > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > > > > Sent: Tuesday, June 16, 2020 11:22 PM
> > > > > > >
> > > > > > > On Thu, 11 Jun 2020 17:27:27 -0700 Jacob Pan
> > > > > > > <jacob.jun.pan@linux.intel.com> wrote:
> > > > > > >  
> > > > > > > > >
> > > > > > > > > But then I thought it even better if VFIO leaves the
> > > > > > > > > entire
> > > > > > > > > copy_from_user() to the layer consuming it.
> > > > > > > > >  
> > > > > > > > OK. Sounds good, that was what Kevin suggested also. I just
> > > > > > > > wasn't sure how much VFIO wants to inspect, I thought VFIO
> > > > > > > > layer wanted to do a sanity check.
> > > > > > > >
> > > > > > > > Anyway, I will move copy_from_user to iommu uapi layer.  
> > > > > > >
> > > > > > > Just one more point brought up by Yi when we discuss this offline.
> > > > > > >
> > > > > > > If we move copy_from_user to iommu uapi layer, then there will
> > > > > > > be  
> > > > > > multiple  
> > > > > > > copy_from_user calls for the same data when a VFIO container
> > > > > > > has  
> > > > > > multiple domains,  
> > > > > > > devices. For bind, it might be OK. But might be additional
> > > > > > > overhead for TLB  
> > > > > > flush  
> > > > > > > request from the guest.  
> > > > > >
> > > > > > I think it is the same with bind and TLB flush path. will be
> > > > > > multiple copy_from_user.  
> > > > >
> > > > > multiple copies is possibly fine. In reality we allow only one
> > > > > group per nesting container (as described in patch [03/15]), and
> > > > > usually there is just one SVA-capable device per group.
> > > > >  
> > > > > >
> > > > > > BTW. for moving data copy to iommy layer, there is another point
> > > > > > which need to consider. VFIO needs to do unbind in bind path if
> > > > > > bind failed, so it will assemble unbind_data and pass to iommu
> > > > > > layer. If iommu layer do the copy_from_user, I think it will be failed. any  
> > idea?  
> > > >
> > > > If a call into a UAPI fails, there should be nothing to undo.
> > > > Creating a partial setup for a failed call that needs to be undone
> > > > by the caller is not good practice.  
> > >
> > > is it still a problem if it's the VFIO to undo the partial setup
> > > before returning to user space?  
> > 
> > Yes.  If a UAPI function fails there should be no residual effect.  
> 
> ok. the iommu_sva_bind_gpasid() is per device call. There is no residual
> effect if it failed. so no partial setup will happen per device.
> 
> but VFIO needs to use iommu_group_for_each_dev() to do bind, so
> if iommu_group_for_each_dev() failed, I guess VFIO needs to undo
> the partial setup for the group. right?

Yes, each individual call should make no changes if it fails, but the
caller would need to unwind separate calls.  If this introduces too
much knowledge to the caller for the group case, maybe there should be
a group-level function in the iommu code to handle that.  Thanks,

Alex

> > > > > This might be mitigated if we go back to use the same bind_data
> > > > > for both bind/unbind. Then you can reuse the user object for unwinding.
> > > > >
> > > > > However there is another case where VFIO may need to assemble the
> > > > > bind_data itself. When a VM is killed, VFIO needs to walk
> > > > > allocated PASIDs and unbind them one-by-one. In such case
> > > > > copy_from_user doesn't work since the data is created by kernel.
> > > > > Alex, do you have a suggestion how this usage can be supported?
> > > > > e.g. asking IOMMU driver to provide two sets of APIs to handle user/kernel  
> > generated requests?  
> > > >
> > > > Yes, it seems like vfio would need to make use of a driver API to do
> > > > this, we shouldn't be faking a user buffer in the kernel in order to
> > > > call through to a UAPI.  Thanks,  
> > >
> > > ok, so if VFIO wants to issue unbind by itself, it should use an API
> > > which passes kernel buffer to IOMMU layer. If the unbind request is
> > > from user space, then VFIO should use another API which passes user
> > > buffer pointer to IOMMU layer. makes sense. will align with jacob.  
> > 
> > Sounds right to me.  Different approaches might be used for the driver API versus
> > the UAPI, perhaps there is no buffer.  Thanks,  
> 
> thanks for your coaching. It may require Jacob to add APIs in iommu layer
> for the two purposes.
> 
> Regards,
> Yi Liu
> 
> > Alex  
> 

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH v2 1/3] docs: IOMMU user API
  2020-06-19 16:37                         ` Alex Williamson
@ 2020-06-21  5:46                           ` Liu, Yi L
  0 siblings, 0 replies; 31+ messages in thread
From: Liu, Yi L @ 2020-06-21  5:46 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, Raj, Ashok, Jonathan Corbet, Jean-Philippe Brucker,
	LKML, Christoph Hellwig, iommu, David Woodhouse

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Saturday, June 20, 2020 12:38 AM
> 
> On Fri, 19 Jun 2020 03:30:24 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > Hi Alex,
> >
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Friday, June 19, 2020 10:55 AM
> > >
> > > On Fri, 19 Jun 2020 02:15:36 +0000
> > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > >
> > > > Hi Alex,
> > > >
> > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > Sent: Friday, June 19, 2020 5:48 AM
> > > > >
> > > > > On Wed, 17 Jun 2020 08:28:24 +0000
> > > > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > > > >
> > > > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > > > Sent: Wednesday, June 17, 2020 2:20 PM
> > > > > > >
> > > > > > > > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > > > > > Sent: Tuesday, June 16, 2020 11:22 PM
> > > > > > > >
> > > > > > > > On Thu, 11 Jun 2020 17:27:27 -0700 Jacob Pan
> > > > > > > > <jacob.jun.pan@linux.intel.com> wrote:
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > But then I thought it even better if VFIO leaves the
> > > > > > > > > > entire
> > > > > > > > > > copy_from_user() to the layer consuming it.
> > > > > > > > > >
> > > > > > > > > OK. Sounds good, that was what Kevin suggested also. I just
> > > > > > > > > wasn't sure how much VFIO wants to inspect, I thought VFIO
> > > > > > > > > layer wanted to do a sanity check.
> > > > > > > > >
> > > > > > > > > Anyway, I will move copy_from_user to iommu uapi layer.
> > > > > > > >
> > > > > > > > Just one more point brought up by Yi when we discuss this offline.
> > > > > > > >
> > > > > > > > If we move copy_from_user to iommu uapi layer, then there will
> > > > > > > > be
> > > > > > > multiple
> > > > > > > > copy_from_user calls for the same data when a VFIO container
> > > > > > > > has
> > > > > > > multiple domains,
> > > > > > > > devices. For bind, it might be OK. But might be additional
> > > > > > > > overhead for TLB
> > > > > > > flush
> > > > > > > > request from the guest.
> > > > > > >
> > > > > > > I think it is the same with bind and TLB flush path. will be
> > > > > > > multiple copy_from_user.
> > > > > >
> > > > > > multiple copies is possibly fine. In reality we allow only one
> > > > > > group per nesting container (as described in patch [03/15]), and
> > > > > > usually there is just one SVA-capable device per group.
> > > > > >
> > > > > > >
> > > > > > > BTW. for moving data copy to iommy layer, there is another point
> > > > > > > which need to consider. VFIO needs to do unbind in bind path if
> > > > > > > bind failed, so it will assemble unbind_data and pass to iommu
> > > > > > > layer. If iommu layer do the copy_from_user, I think it will be failed.
> any
> > > idea?
> > > > >
> > > > > If a call into a UAPI fails, there should be nothing to undo.
> > > > > Creating a partial setup for a failed call that needs to be undone
> > > > > by the caller is not good practice.
> > > >
> > > > is it still a problem if it's the VFIO to undo the partial setup
> > > > before returning to user space?
> > >
> > > Yes.  If a UAPI function fails there should be no residual effect.
> >
> > ok. the iommu_sva_bind_gpasid() is per device call. There is no residual
> > effect if it failed. so no partial setup will happen per device.
> >
> > but VFIO needs to use iommu_group_for_each_dev() to do bind, so
> > if iommu_group_for_each_dev() failed, I guess VFIO needs to undo
> > the partial setup for the group. right?
> 
> Yes, each individual call should make no changes if it fails, but the
> caller would need to unwind separate calls.  If this introduces too
> much knowledge to the caller for the group case, maybe there should be
> a group-level function in the iommu code to handle that.  Thanks,


got you. I don't think VFIO needs too much knowledge except the
group info and the bind data. may send updated version based on
your comments.

Thanks,
Yi Liu

> Alex
> 
> > > > > > This might be mitigated if we go back to use the same bind_data
> > > > > > for both bind/unbind. Then you can reuse the user object for unwinding.
> > > > > >
> > > > > > However there is another case where VFIO may need to assemble the
> > > > > > bind_data itself. When a VM is killed, VFIO needs to walk
> > > > > > allocated PASIDs and unbind them one-by-one. In such case
> > > > > > copy_from_user doesn't work since the data is created by kernel.
> > > > > > Alex, do you have a suggestion how this usage can be supported?
> > > > > > e.g. asking IOMMU driver to provide two sets of APIs to handle
> user/kernel
> > > generated requests?
> > > > >
> > > > > Yes, it seems like vfio would need to make use of a driver API to do
> > > > > this, we shouldn't be faking a user buffer in the kernel in order to
> > > > > call through to a UAPI.  Thanks,
> > > >
> > > > ok, so if VFIO wants to issue unbind by itself, it should use an API
> > > > which passes kernel buffer to IOMMU layer. If the unbind request is
> > > > from user space, then VFIO should use another API which passes user
> > > > buffer pointer to IOMMU layer. makes sense. will align with jacob.
> > >
> > > Sounds right to me.  Different approaches might be used for the driver API
> versus
> > > the UAPI, perhaps there is no buffer.  Thanks,
> >
> > thanks for your coaching. It may require Jacob to add APIs in iommu layer
> > for the two purposes.
> >
> > Regards,
> > Yi Liu
> >
> > > Alex
> >

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2020-06-21  5:46 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-11  4:12 [PATCH v2 0/3] IOMMU user API enhancement Jacob Pan
2020-06-11  4:12 ` [PATCH v2 1/3] docs: IOMMU user API Jacob Pan
2020-06-11  6:33   ` Lu Baolu
2020-06-12 22:05     ` Jacob Pan
2020-06-11  9:30   ` Jonathan Cameron
2020-06-12 22:53     ` Jacob Pan
2020-06-11 13:55   ` Jonathan Corbet
2020-06-11 16:38     ` Jacob Pan
2020-06-11 15:47   ` Alex Williamson
2020-06-11 19:52     ` Jacob Pan
2020-06-11 20:40       ` Alex Williamson
2020-06-12  0:27         ` Jacob Pan
2020-06-12  7:38           ` Tian, Kevin
2020-06-12 13:09             ` Jacob Pan
2020-06-16 15:22           ` Jacob Pan
2020-06-17  6:20             ` Liu, Yi L
2020-06-17  8:28               ` Tian, Kevin
2020-06-18 21:48                 ` Alex Williamson
2020-06-19  2:15                   ` Liu, Yi L
2020-06-19  2:54                     ` Alex Williamson
2020-06-19  3:30                       ` Liu, Yi L
2020-06-19 16:37                         ` Alex Williamson
2020-06-21  5:46                           ` Liu, Yi L
2020-06-11  4:12 ` [PATCH v2 2/3] iommu/uapi: Add argsz for user filled data Jacob Pan
2020-06-11 16:49   ` Alex Williamson
2020-06-12  0:02     ` Jacob Pan
2020-06-11  4:12 ` [PATCH v2 3/3] iommu/vt-d: Sanity check uapi argsz filled by users Jacob Pan
2020-06-11 17:08   ` Alex Williamson
2020-06-11 20:02     ` Jacob Pan
2020-06-11 20:55       ` Alex Williamson
2020-06-11 23:58         ` Jacob Pan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).