kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
@ 2021-02-23 20:56 Eric Auger
  2021-02-23 20:56 ` [PATCH v14 01/13] iommu: Introduce attach/detach_pasid_table API Eric Auger
                   ` (14 more replies)
  0 siblings, 15 replies; 46+ messages in thread
From: Eric Auger @ 2021-02-23 20:56 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi

This series brings the IOMMU part of HW nested paging support
in the SMMUv3. The VFIO part is submitted separately.

This is based on Jean-Philippe's
[PATCH v12 00/10] iommu: I/O page faults for SMMUv3
https://lore.kernel.org/linux-arm-kernel/YBfij71tyYvh8LhB@myrica/T/

The IOMMU API is extended to support 2 new API functionalities:
1) pass the guest stage 1 configuration
2) pass stage 1 MSI bindings

Then those capabilities gets implemented in the SMMUv3 driver.

The virtualizer passes information through the VFIO user API
which cascades them to the iommu subsystem. This allows the guest
to own stage 1 tables and context descriptors (so-called PASID
table) while the host owns stage 2 tables and main configuration
structures (STE).

Best Regards

Eric

This series can be found at:
https://github.com/eauger/linux/tree/v5.11-stallv12-2stage-v14
(including the VFIO part in its last version: v12)

The VFIO series is sent separately.

History:

Previous version (v13):
https://github.com/eauger/linux/tree/5.10-rc4-2stage-v13

v13 -> v14:
- Took into account all received comments I think. Great
  thanks to all the testers for their effort and sometimes
  fixes. I am really grateful to you!
- numerous fixes including guest running in
  noiommu, iommu.strict=0, iommu.passthrough=on,
  enable_unsafe_noiommu_mode

v12 -> v13:
- fixed compilation issue with CONFIG_ARM_SMMU_V3_SVA
  reported by Shameer. This urged me to revisit patch 4 into
  iommu/smmuv3: Allow s1 and s2 configs to coexist where
  s1_cfg and s2_cfg are not dynamically allocated anymore.
  Instead I use a new set field in existing structs
- fixed 2 others config checks
- Updated "iommu/arm-smmu-v3: Maintain a SID->device structure"
  according to the last version

v11 -> v12:
- rebase on top of v5.10-rc4

Eric Auger (13):
  iommu: Introduce attach/detach_pasid_table API
  iommu: Introduce bind/unbind_guest_msi
  iommu/smmuv3: Allow s1 and s2 configs to coexist
  iommu/smmuv3: Get prepared for nested stage support
  iommu/smmuv3: Implement attach/detach_pasid_table
  iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  iommu/smmuv3: Implement cache_invalidate
  dma-iommu: Implement NESTED_MSI cookie
  iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
  iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI
    regions
  iommu/smmuv3: Implement bind/unbind_guest_msi
  iommu/smmuv3: report additional recoverable faults
  iommu/smmuv3: Accept configs with more than one context descriptor

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 444 ++++++++++++++++++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  14 +-
 drivers/iommu/dma-iommu.c                   | 142 ++++++-
 drivers/iommu/iommu.c                       | 106 +++++
 include/linux/dma-iommu.h                   |  16 +
 include/linux/iommu.h                       |  47 +++
 include/uapi/linux/iommu.h                  |  54 +++
 7 files changed, 781 insertions(+), 42 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v14 01/13] iommu: Introduce attach/detach_pasid_table API
  2021-02-23 20:56 [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
@ 2021-02-23 20:56 ` Eric Auger
  2021-02-23 20:56 ` [PATCH v14 02/13] iommu: Introduce bind/unbind_guest_msi Eric Auger
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Eric Auger @ 2021-02-23 20:56 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi

In virtualization use case, when a guest is assigned
a PCI host device, protected by a virtual IOMMU on the guest,
the physical IOMMU must be programmed to be consistent with
the guest mappings. If the physical IOMMU supports two
translation stages it makes sense to program guest mappings
onto the first stage/level (ARM/Intel terminology) while the host
owns the stage/level 2.

In that case, it is mandated to trap on guest configuration
settings and pass those to the physical iommu driver.

This patch adds a new API to the iommu subsystem that allows
to set/unset the pasid table information.

A generic iommu_pasid_table_config struct is introduced in
a new iommu.h uapi header. This is going to be used by the VFIO
user API.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v13 -> v14:
- export iommu_attach_pasid_table
- add dummy iommu_uapi_attach_pasid_table
- swap base_ptr and format in iommu_pasid_table_config

v12 -> v13:
- Fix config check

v11 -> v12:
- add argsz, name the union
---
 drivers/iommu/iommu.c      | 69 ++++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h      | 27 +++++++++++++++
 include/uapi/linux/iommu.h | 54 +++++++++++++++++++++++++++++
 3 files changed, 150 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d0b0a15dba84..90bacf000789 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2200,6 +2200,75 @@ int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev
 }
 EXPORT_SYMBOL_GPL(iommu_uapi_sva_unbind_gpasid);
 
+int iommu_attach_pasid_table(struct iommu_domain *domain,
+			     struct iommu_pasid_table_config *cfg)
+{
+	if (unlikely(!domain->ops->attach_pasid_table))
+		return -ENODEV;
+
+	return domain->ops->attach_pasid_table(domain, cfg);
+}
+EXPORT_SYMBOL_GPL(iommu_attach_pasid_table);
+
+int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
+				  void __user *uinfo)
+{
+	struct iommu_pasid_table_config pasid_table_data = { 0 };
+	u32 minsz;
+
+	if (unlikely(!domain->ops->attach_pasid_table))
+		return -ENODEV;
+
+	/*
+	 * No new spaces can be added before the variable sized union, the
+	 * minimum size is the offset to the union.
+	 */
+	minsz = offsetof(struct iommu_pasid_table_config, vendor_data);
+
+	/* Copy minsz from user to get flags and argsz */
+	if (copy_from_user(&pasid_table_data, uinfo, minsz))
+		return -EFAULT;
+
+	/* Fields before the variable size union are mandatory */
+	if (pasid_table_data.argsz < minsz)
+		return -EINVAL;
+
+	/* PASID and address granu require additional info beyond minsz */
+	if (pasid_table_data.version != PASID_TABLE_CFG_VERSION_1)
+		return -EINVAL;
+	if (pasid_table_data.format == IOMMU_PASID_FORMAT_SMMUV3 &&
+	    pasid_table_data.argsz <
+		offsetofend(struct iommu_pasid_table_config, vendor_data.smmuv3))
+		return -EINVAL;
+
+	/*
+	 * User might be using a newer UAPI header which has a larger data
+	 * size, we shall support the existing flags within the current
+	 * size. Copy the remaining user data _after_ minsz but not more
+	 * than the current kernel supported size.
+	 */
+	if (copy_from_user((void *)&pasid_table_data + minsz, uinfo + minsz,
+			   min_t(u32, pasid_table_data.argsz, sizeof(pasid_table_data)) - minsz))
+		return -EFAULT;
+
+	/* Now the argsz is validated, check the content */
+	if (pasid_table_data.config < IOMMU_PASID_CONFIG_TRANSLATE ||
+	    pasid_table_data.config > IOMMU_PASID_CONFIG_ABORT)
+		return -EINVAL;
+
+	return domain->ops->attach_pasid_table(domain, &pasid_table_data);
+}
+EXPORT_SYMBOL_GPL(iommu_uapi_attach_pasid_table);
+
+void iommu_detach_pasid_table(struct iommu_domain *domain)
+{
+	if (unlikely(!domain->ops->detach_pasid_table))
+		return;
+
+	domain->ops->detach_pasid_table(domain);
+}
+EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
 				  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 86d688c4418f..c4422975359e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -239,6 +239,8 @@ struct iommu_iotlb_gather {
  * @cache_invalidate: invalidate translation caches
  * @sva_bind_gpasid: bind guest pasid and mm
  * @sva_unbind_gpasid: unbind guest pasid and mm
+ * @attach_pasid_table: attach a pasid table
+ * @detach_pasid_table: detach the pasid table
  * @def_domain_type: device default domain type, return value:
  *		- IOMMU_DOMAIN_IDENTITY: must use an identity domain
  *		- IOMMU_DOMAIN_DMA: must use a dma domain
@@ -304,6 +306,9 @@ struct iommu_ops {
 				      void *drvdata);
 	void (*sva_unbind)(struct iommu_sva *handle);
 	u32 (*sva_get_pasid)(struct iommu_sva *handle);
+	int (*attach_pasid_table)(struct iommu_domain *domain,
+				  struct iommu_pasid_table_config *cfg);
+	void (*detach_pasid_table)(struct iommu_domain *domain);
 
 	int (*page_response)(struct device *dev,
 			     struct iommu_fault_event *evt,
@@ -454,6 +459,11 @@ extern int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain,
 					struct device *dev, void __user *udata);
 extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
 				   struct device *dev, ioasid_t pasid);
+extern int iommu_attach_pasid_table(struct iommu_domain *domain,
+				    struct iommu_pasid_table_config *cfg);
+extern int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
+					 void __user *udata);
+extern void iommu_detach_pasid_table(struct iommu_domain *domain);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -1028,6 +1038,23 @@ iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
 	return -ENODEV;
 }
 
+static inline
+int iommu_attach_pasid_table(struct iommu_domain *domain,
+			     struct iommu_pasid_table_config *cfg)
+{
+	return -ENODEV;
+}
+
+static inline
+int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
+				  void __user *uinfo)
+{
+	return -ENODEV;
+}
+
+static inline
+void iommu_detach_pasid_table(struct iommu_domain *domain) {}
+
 static inline struct iommu_sva *
 iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
 {
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index e1d9e75f2c94..40c28bb0e1bf 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -338,4 +338,58 @@ struct iommu_gpasid_bind_data {
 	} vendor;
 };
 
+/**
+ * struct iommu_pasid_smmuv3 - ARM SMMUv3 Stream Table Entry stage 1 related
+ *     information
+ * @version: API version of this structure
+ * @s1fmt: STE s1fmt (format of the CD table: single CD, linear table
+ *         or 2-level table)
+ * @s1dss: STE s1dss (specifies the behavior when @pasid_bits != 0
+ *         and no PASID is passed along with the incoming transaction)
+ * @padding: reserved for future use (should be zero)
+ *
+ * The PASID table is referred to as the Context Descriptor (CD) table on ARM
+ * SMMUv3. Please refer to the ARM SMMU 3.x spec (ARM IHI 0070A) for full
+ * details.
+ */
+struct iommu_pasid_smmuv3 {
+#define PASID_TABLE_SMMUV3_CFG_VERSION_1 1
+	__u32	version;
+	__u8	s1fmt;
+	__u8	s1dss;
+	__u8	padding[2];
+};
+
+/**
+ * struct iommu_pasid_table_config - PASID table data used to bind guest PASID
+ *     table to the host IOMMU
+ * @argsz: User filled size of this data
+ * @version: API version to prepare for future extensions
+ * @base_ptr: guest physical address of the PASID table
+ * @format: format of the PASID table
+ * @pasid_bits: number of PASID bits used in the PASID table
+ * @config: indicates whether the guest translation stage must
+ *          be translated, bypassed or aborted.
+ * @padding: reserved for future use (should be zero)
+ * @vendor_data.smmuv3: table information when @format is
+ * %IOMMU_PASID_FORMAT_SMMUV3
+ */
+struct iommu_pasid_table_config {
+	__u32	argsz;
+#define PASID_TABLE_CFG_VERSION_1 1
+	__u32	version;
+	__u64	base_ptr;
+#define IOMMU_PASID_FORMAT_SMMUV3	1
+	__u32	format;
+	__u8	pasid_bits;
+#define IOMMU_PASID_CONFIG_TRANSLATE	1
+#define IOMMU_PASID_CONFIG_BYPASS	2
+#define IOMMU_PASID_CONFIG_ABORT	3
+	__u8	config;
+	__u8    padding[2];
+	union {
+		struct iommu_pasid_smmuv3 smmuv3;
+	} vendor_data;
+};
+
 #endif /* _UAPI_IOMMU_H */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v14 02/13] iommu: Introduce bind/unbind_guest_msi
  2021-02-23 20:56 [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
  2021-02-23 20:56 ` [PATCH v14 01/13] iommu: Introduce attach/detach_pasid_table API Eric Auger
@ 2021-02-23 20:56 ` Eric Auger
  2021-02-23 20:56 ` [PATCH v14 03/13] iommu/smmuv3: Allow s1 and s2 configs to coexist Eric Auger
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Eric Auger @ 2021-02-23 20:56 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi

On ARM, MSI are translated by the SMMU. An IOVA is allocated
for each MSI doorbell. If both the host and the guest are exposed
with SMMUs, we end up with 2 different IOVAs allocated by each.
guest allocates an IOVA (gIOVA) to map onto the guest MSI
doorbell (gDB). The Host allocates another IOVA (hIOVA) to map
onto the physical doorbell (hDB).

So we end up with 2 untied mappings:
         S1            S2
gIOVA    ->    gDB
              hIOVA    ->    hDB

Currently the PCI device is programmed by the host with hIOVA
as MSI doorbell. So this does not work.

This patch introduces an API to pass gIOVA/gDB to the host so
that gIOVA can be reused by the host instead of re-allocating
a new IOVA. So the goal is to create the following nested mapping:

         S1            S2
gIOVA    ->    gDB     ->    hDB

and program the PCI device with gIOVA MSI doorbell.

In case we have several devices attached to this nested domain
(devices belonging to the same group), they cannot be isolated
on guest side either. So they should also end up in the same domain
on guest side. We will enforce that all the devices attached to
the host iommu domain use the same physical doorbell and similarly
a single virtual doorbell mapping gets registered (1 single
virtual doorbell is used on guest as well).

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v13 -> v14:
- s/iova/giova in iommu_unbind_guest_msi proto (Kequian)

v7 -> v8:
- dummy iommu_unbind_guest_msi turned into a void function

v6 -> v7:
- remove the device handle parameter.
- Add comments saying there can only be a single MSI binding
  registered per iommu_domain
v5 -> v6:
-fix compile issue when IOMMU_API is not set

v3 -> v4:
- add unbind

v2 -> v3:
- add a struct device handle
---
 drivers/iommu/iommu.c | 37 +++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h | 20 ++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 90bacf000789..1853279216eb 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2282,6 +2282,43 @@ static void __iommu_detach_device(struct iommu_domain *domain,
 	trace_detach_device_from_domain(dev);
 }
 
+/**
+ * iommu_bind_guest_msi - Passes the stage1 GIOVA/GPA mapping of a
+ * virtual doorbell
+ *
+ * @domain: iommu domain the stage 1 mapping will be attached to
+ * @iova: iova allocated by the guest
+ * @gpa: guest physical address of the virtual doorbell
+ * @size: granule size used for the mapping
+ *
+ * The associated IOVA can be reused by the host to create a nested
+ * stage2 binding mapping translating into the physical doorbell used
+ * by the devices attached to the domain.
+ *
+ * All devices within the domain must share the same physical doorbell.
+ * A single MSI GIOVA/GPA mapping can be attached to an iommu_domain.
+ */
+
+int iommu_bind_guest_msi(struct iommu_domain *domain,
+			 dma_addr_t giova, phys_addr_t gpa, size_t size)
+{
+	if (unlikely(!domain->ops->bind_guest_msi))
+		return -ENODEV;
+
+	return domain->ops->bind_guest_msi(domain, giova, gpa, size);
+}
+EXPORT_SYMBOL_GPL(iommu_bind_guest_msi);
+
+void iommu_unbind_guest_msi(struct iommu_domain *domain,
+			    dma_addr_t giova)
+{
+	if (unlikely(!domain->ops->unbind_guest_msi))
+		return;
+
+	domain->ops->unbind_guest_msi(domain, giova);
+}
+EXPORT_SYMBOL_GPL(iommu_unbind_guest_msi);
+
 void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
 {
 	struct iommu_group *group;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index c4422975359e..72bda5d93951 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -241,6 +241,8 @@ struct iommu_iotlb_gather {
  * @sva_unbind_gpasid: unbind guest pasid and mm
  * @attach_pasid_table: attach a pasid table
  * @detach_pasid_table: detach the pasid table
+ * @bind_guest_msi: provides a stage1 giova/gpa MSI doorbell mapping
+ * @unbind_guest_msi: withdraw a stage1 giova/gpa MSI doorbell mapping
  * @def_domain_type: device default domain type, return value:
  *		- IOMMU_DOMAIN_IDENTITY: must use an identity domain
  *		- IOMMU_DOMAIN_DMA: must use a dma domain
@@ -322,6 +324,10 @@ struct iommu_ops {
 
 	int (*def_domain_type)(struct device *dev);
 
+	int (*bind_guest_msi)(struct iommu_domain *domain,
+			      dma_addr_t giova, phys_addr_t gpa, size_t size);
+	void (*unbind_guest_msi)(struct iommu_domain *domain, dma_addr_t giova);
+
 	unsigned long pgsize_bitmap;
 	struct module *owner;
 };
@@ -464,6 +470,10 @@ extern int iommu_attach_pasid_table(struct iommu_domain *domain,
 extern int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
 					 void __user *udata);
 extern void iommu_detach_pasid_table(struct iommu_domain *domain);
+extern int iommu_bind_guest_msi(struct iommu_domain *domain,
+				dma_addr_t giova, phys_addr_t gpa, size_t size);
+extern void iommu_unbind_guest_msi(struct iommu_domain *domain,
+				   dma_addr_t giova);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -1101,6 +1111,16 @@ static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
 {
 	return NULL;
 }
+
+static inline
+int iommu_bind_guest_msi(struct iommu_domain *domain,
+			 dma_addr_t giova, phys_addr_t gpa, size_t size)
+{
+	return -ENODEV;
+}
+static inline
+void iommu_unbind_guest_msi(struct iommu_domain *domain, dma_addr_t giova) {}
+
 #endif /* CONFIG_IOMMU_API */
 
 /**
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v14 03/13] iommu/smmuv3: Allow s1 and s2 configs to coexist
  2021-02-23 20:56 [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
  2021-02-23 20:56 ` [PATCH v14 01/13] iommu: Introduce attach/detach_pasid_table API Eric Auger
  2021-02-23 20:56 ` [PATCH v14 02/13] iommu: Introduce bind/unbind_guest_msi Eric Auger
@ 2021-02-23 20:56 ` Eric Auger
  2021-02-23 20:56 ` [PATCH v14 04/13] iommu/smmuv3: Get prepared for nested stage support Eric Auger
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Eric Auger @ 2021-02-23 20:56 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi

In true nested mode, both s1_cfg and s2_cfg will coexist.
Let's remove the union and add a "set" field in each
config structure telling whether the config is set and needs
to be applied when writing the STE. In legacy nested mode,
only the second stage is used. In true nested mode, both stages
are used and the S1 config is "set" when the guest passes
its pasid table.

No functional change intended.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v13 -> v14:
- slight reword of the commit message

v12 -> v13:
- does not dynamically allocate s1-cfg and s2_cfg anymore. Add
  the set field
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 43 +++++++++++++--------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  8 ++--
 2 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 0f1264a86eec..14e5666c25dc 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1239,8 +1239,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 	u64 val = le64_to_cpu(dst[0]);
 	bool ste_live = false;
 	struct arm_smmu_device *smmu = NULL;
-	struct arm_smmu_s1_cfg *s1_cfg = NULL;
-	struct arm_smmu_s2_cfg *s2_cfg = NULL;
+	struct arm_smmu_s1_cfg *s1_cfg;
+	struct arm_smmu_s2_cfg *s2_cfg;
 	struct arm_smmu_domain *smmu_domain = NULL;
 	struct arm_smmu_cmdq_ent prefetch_cmd = {
 		.opcode		= CMDQ_OP_PREFETCH_CFG,
@@ -1255,13 +1255,24 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 	}
 
 	if (smmu_domain) {
+		s1_cfg = &smmu_domain->s1_cfg;
+		s2_cfg = &smmu_domain->s2_cfg;
+
 		switch (smmu_domain->stage) {
 		case ARM_SMMU_DOMAIN_S1:
-			s1_cfg = &smmu_domain->s1_cfg;
+			s1_cfg->set = true;
+			s2_cfg->set = false;
 			break;
 		case ARM_SMMU_DOMAIN_S2:
+			s1_cfg->set = false;
+			s2_cfg->set = true;
+			break;
 		case ARM_SMMU_DOMAIN_NESTED:
-			s2_cfg = &smmu_domain->s2_cfg;
+			/*
+			 * Actual usage of stage 1 depends on nested mode:
+			 * legacy (2d stage only) or true nested mode
+			 */
+			s2_cfg->set = true;
 			break;
 		default:
 			break;
@@ -1288,7 +1299,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 	val = STRTAB_STE_0_V;
 
 	/* Bypass/fault */
-	if (!smmu_domain || !(s1_cfg || s2_cfg)) {
+	if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
 		if (!smmu_domain && disable_bypass)
 			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
 		else
@@ -1307,7 +1318,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 		return;
 	}
 
-	if (s1_cfg) {
+	if (s1_cfg->set) {
 		u64 strw = smmu->features & ARM_SMMU_FEAT_E2H ?
 			STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1;
 
@@ -1329,7 +1340,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 			FIELD_PREP(STRTAB_STE_0_S1FMT, s1_cfg->s1fmt);
 	}
 
-	if (s2_cfg) {
+	if (s2_cfg->set) {
 		BUG_ON(ste_live);
 		dst[2] = cpu_to_le64(
 			 FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
@@ -2016,24 +2027,24 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 {
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct arm_smmu_s1_cfg *s1_cfg = &smmu_domain->s1_cfg;
+	struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
 
 	iommu_put_dma_cookie(domain);
 	free_io_pgtable_ops(smmu_domain->pgtbl_ops);
 
 	/* Free the CD and ASID, if we allocated them */
-	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
-		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
-
+	if (s1_cfg->set) {
 		/* Prevent SVA from touching the CD while we're freeing it */
 		mutex_lock(&arm_smmu_asid_lock);
-		if (cfg->cdcfg.cdtab)
+		if (s1_cfg->cdcfg.cdtab)
 			arm_smmu_free_cd_tables(smmu_domain);
-		arm_smmu_free_asid(&cfg->cd);
+		arm_smmu_free_asid(&s1_cfg->cd);
 		mutex_unlock(&arm_smmu_asid_lock);
-	} else {
-		struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
-		if (cfg->vmid)
-			arm_smmu_bitmap_free(smmu->vmid_map, cfg->vmid);
+	}
+	if (s2_cfg->set) {
+		if (s2_cfg->vmid)
+			arm_smmu_bitmap_free(smmu->vmid_map, s2_cfg->vmid);
 	}
 
 	kfree(smmu_domain);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 59af0bbd2f7b..ec2b77596b6a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -598,12 +598,14 @@ struct arm_smmu_s1_cfg {
 	struct arm_smmu_ctx_desc	cd;
 	u8				s1fmt;
 	u8				s1cdmax;
+	bool				set;
 };
 
 struct arm_smmu_s2_cfg {
 	u16				vmid;
 	u64				vttbr;
 	u64				vtcr;
+	bool				set;
 };
 
 struct arm_smmu_strtab_cfg {
@@ -718,10 +720,8 @@ struct arm_smmu_domain {
 	atomic_t			nr_ats_masters;
 
 	enum arm_smmu_domain_stage	stage;
-	union {
-		struct arm_smmu_s1_cfg	s1_cfg;
-		struct arm_smmu_s2_cfg	s2_cfg;
-	};
+	struct arm_smmu_s1_cfg	s1_cfg;
+	struct arm_smmu_s2_cfg	s2_cfg;
 
 	struct iommu_domain		domain;
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v14 04/13] iommu/smmuv3: Get prepared for nested stage support
  2021-02-23 20:56 [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (2 preceding siblings ...)
  2021-02-23 20:56 ` [PATCH v14 03/13] iommu/smmuv3: Allow s1 and s2 configs to coexist Eric Auger
@ 2021-02-23 20:56 ` Eric Auger
  2021-02-23 20:56 ` [PATCH v14 05/13] iommu/smmuv3: Implement attach/detach_pasid_table Eric Auger
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Eric Auger @ 2021-02-23 20:56 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi

When nested stage translation is setup, both s1_cfg and
s2_cfg are set.

We introduce a new smmu_domain abort field that will be set
upon guest stage1 configuration passing. If no guest stage1
config has been attached, it is ignored when writing the STE.

arm_smmu_write_strtab_ent() is modified to write both stage
fields in the STE and deal with the abort field.

In nested mode, only stage 2 is "finalized" as the host does
not own/configure the stage 1 context descriptor; guest does.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v13 -> v14:
- removed BUG_ON(ste_live && !nested) as this should never happen
- restored the old comment as there is always an abort in between
  S2 -> S1 + S2 and S1 + S2 -> S2
- remove sparse warning

v10 -> v11:
- Fix an issue reported by Shameer when switching from with vSMMU
  to without vSMMU. Despite the spec does not seem to mention it
  seems to be needed to reset the 2 high 64b when switching from
  S1+S2 cfg to S1 only. Especially dst[3] needs to be reset (S2TTB).
  On some implementations, if the S2TTB is not reset, this causes
  a C_BAD_STE error
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 55 ++++++++++++++++++---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 +
 2 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 14e5666c25dc..085a784dfaee 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1237,7 +1237,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 	 * 3. Update Config, sync
 	 */
 	u64 val = le64_to_cpu(dst[0]);
-	bool ste_live = false;
+	bool s1_live = false, s2_live = false, ste_live;
+	bool abort, translate = false;
 	struct arm_smmu_device *smmu = NULL;
 	struct arm_smmu_s1_cfg *s1_cfg;
 	struct arm_smmu_s2_cfg *s2_cfg;
@@ -1277,6 +1278,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 		default:
 			break;
 		}
+		translate = s1_cfg->set || s2_cfg->set;
 	}
 
 	if (val & STRTAB_STE_0_V) {
@@ -1284,23 +1286,36 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 		case STRTAB_STE_0_CFG_BYPASS:
 			break;
 		case STRTAB_STE_0_CFG_S1_TRANS:
+			s1_live = true;
+			break;
 		case STRTAB_STE_0_CFG_S2_TRANS:
-			ste_live = true;
+			s2_live = true;
+			break;
+		case STRTAB_STE_0_CFG_NESTED:
+			s1_live = true;
+			s2_live = true;
 			break;
 		case STRTAB_STE_0_CFG_ABORT:
-			BUG_ON(!disable_bypass);
 			break;
 		default:
 			BUG(); /* STE corruption */
 		}
 	}
 
+	ste_live = s1_live || s2_live;
+
 	/* Nuke the existing STE_0 value, as we're going to rewrite it */
 	val = STRTAB_STE_0_V;
 
 	/* Bypass/fault */
-	if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
-		if (!smmu_domain && disable_bypass)
+
+	if (!smmu_domain)
+		abort = disable_bypass;
+	else
+		abort = smmu_domain->abort;
+
+	if (abort || !translate) {
+		if (abort)
 			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
 		else
 			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
@@ -1318,11 +1333,17 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 		return;
 	}
 
+	if (ste_live) {
+		/* First invalidate the live STE */
+		dst[0] = cpu_to_le64(STRTAB_STE_0_CFG_ABORT);
+		arm_smmu_sync_ste_for_sid(smmu, sid);
+	}
+
 	if (s1_cfg->set) {
 		u64 strw = smmu->features & ARM_SMMU_FEAT_E2H ?
 			STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1;
 
-		BUG_ON(ste_live);
+		BUG_ON(s1_live);
 		dst[1] = cpu_to_le64(
 			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
 			 FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
@@ -1341,7 +1362,14 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 	}
 
 	if (s2_cfg->set) {
-		BUG_ON(ste_live);
+		u64 vttbr = s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK;
+
+		if (s2_live) {
+			u64 s2ttb = le64_to_cpu(dst[3]) & STRTAB_STE_3_S2TTB_MASK;
+
+			BUG_ON(s2ttb != vttbr);
+		}
+
 		dst[2] = cpu_to_le64(
 			 FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
 			 FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
@@ -1351,9 +1379,12 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 			 STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 |
 			 STRTAB_STE_2_S2R);
 
-		dst[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
+		dst[3] = cpu_to_le64(vttbr);
 
 		val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
+	} else {
+		dst[2] = 0;
+		dst[3] = 0;
 	}
 
 	if (master->ats_enabled)
@@ -2154,6 +2185,14 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
 		return 0;
 	}
 
+	if (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED &&
+	    (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1) ||
+	     !(smmu->features & ARM_SMMU_FEAT_TRANS_S2))) {
+		dev_info(smmu_domain->smmu->dev,
+			 "does not implement two stages\n");
+		return -EINVAL;
+	}
+
 	/* Restrict the stage to what we can actually support */
 	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
 		smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index ec2b77596b6a..eb0cc08e8240 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -206,6 +206,7 @@
 #define STRTAB_STE_0_CFG_BYPASS		4
 #define STRTAB_STE_0_CFG_S1_TRANS	5
 #define STRTAB_STE_0_CFG_S2_TRANS	6
+#define STRTAB_STE_0_CFG_NESTED		7
 
 #define STRTAB_STE_0_S1FMT		GENMASK_ULL(5, 4)
 #define STRTAB_STE_0_S1FMT_LINEAR	0
@@ -722,6 +723,7 @@ struct arm_smmu_domain {
 	enum arm_smmu_domain_stage	stage;
 	struct arm_smmu_s1_cfg	s1_cfg;
 	struct arm_smmu_s2_cfg	s2_cfg;
+	bool				abort;
 
 	struct iommu_domain		domain;
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v14 05/13] iommu/smmuv3: Implement attach/detach_pasid_table
  2021-02-23 20:56 [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (3 preceding siblings ...)
  2021-02-23 20:56 ` [PATCH v14 04/13] iommu/smmuv3: Get prepared for nested stage support Eric Auger
@ 2021-02-23 20:56 ` Eric Auger
  2021-03-02  8:35   ` Keqian Zhu
  2021-02-23 20:56 ` [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs Eric Auger
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 46+ messages in thread
From: Eric Auger @ 2021-02-23 20:56 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi

On attach_pasid_table() we program STE S1 related info set
by the guest into the actual physical STEs. At minimum
we need to program the context descriptor GPA and compute
whether the stage1 is translated/bypassed or aborted.

On detach, the stage 1 config is unset and the abort flag is
unset.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v13 -> v14:
- on PASID table detach, reset the abort flag (Keqian)

v7 -> v8:
- remove smmu->features check, now done on domain finalize

v6 -> v7:
- check versions and comment the fact we don't need to take
  into account s1dss and s1fmt
v3 -> v4:
- adapt to changes in iommu_pasid_table_config
- different programming convention at s1_cfg/s2_cfg/ste.abort

v2 -> v3:
- callback now is named set_pasid_table and struct fields
  are laid out differently.

v1 -> v2:
- invalidate the STE before changing them
- hold init_mutex
- handle new fields
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 89 +++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 085a784dfaee..5579ec4fccc8 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2842,6 +2842,93 @@ static void arm_smmu_get_resv_regions(struct device *dev,
 	iommu_dma_get_resv_regions(dev, head);
 }
 
+static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
+				       struct iommu_pasid_table_config *cfg)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_master *master;
+	struct arm_smmu_device *smmu;
+	unsigned long flags;
+	int ret = -EINVAL;
+
+	if (cfg->format != IOMMU_PASID_FORMAT_SMMUV3)
+		return -EINVAL;
+
+	if (cfg->version != PASID_TABLE_CFG_VERSION_1 ||
+	    cfg->vendor_data.smmuv3.version != PASID_TABLE_SMMUV3_CFG_VERSION_1)
+		return -EINVAL;
+
+	mutex_lock(&smmu_domain->init_mutex);
+
+	smmu = smmu_domain->smmu;
+
+	if (!smmu)
+		goto out;
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+		goto out;
+
+	switch (cfg->config) {
+	case IOMMU_PASID_CONFIG_ABORT:
+		smmu_domain->s1_cfg.set = false;
+		smmu_domain->abort = true;
+		break;
+	case IOMMU_PASID_CONFIG_BYPASS:
+		smmu_domain->s1_cfg.set = false;
+		smmu_domain->abort = false;
+		break;
+	case IOMMU_PASID_CONFIG_TRANSLATE:
+		/* we do not support S1 <-> S1 transitions */
+		if (smmu_domain->s1_cfg.set)
+			goto out;
+
+		/*
+		 * we currently support a single CD so s1fmt and s1dss
+		 * fields are also ignored
+		 */
+		if (cfg->pasid_bits)
+			goto out;
+
+		smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
+		smmu_domain->s1_cfg.set = true;
+		smmu_domain->abort = false;
+		break;
+	default:
+		goto out;
+	}
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_for_each_entry(master, &smmu_domain->devices, domain_head)
+		arm_smmu_install_ste_for_dev(master);
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+	ret = 0;
+out:
+	mutex_unlock(&smmu_domain->init_mutex);
+	return ret;
+}
+
+static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_master *master;
+	unsigned long flags;
+
+	mutex_lock(&smmu_domain->init_mutex);
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+		goto unlock;
+
+	smmu_domain->s1_cfg.set = false;
+	smmu_domain->abort = false;
+
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_for_each_entry(master, &smmu_domain->devices, domain_head)
+		arm_smmu_install_ste_for_dev(master);
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+
+unlock:
+	mutex_unlock(&smmu_domain->init_mutex);
+}
+
 static bool arm_smmu_dev_has_feature(struct device *dev,
 				     enum iommu_dev_features feat)
 {
@@ -2939,6 +3026,8 @@ static struct iommu_ops arm_smmu_ops = {
 	.of_xlate		= arm_smmu_of_xlate,
 	.get_resv_regions	= arm_smmu_get_resv_regions,
 	.put_resv_regions	= generic_iommu_put_resv_regions,
+	.attach_pasid_table	= arm_smmu_attach_pasid_table,
+	.detach_pasid_table	= arm_smmu_detach_pasid_table,
 	.dev_has_feat		= arm_smmu_dev_has_feature,
 	.dev_feat_enabled	= arm_smmu_dev_feature_enabled,
 	.dev_enable_feat	= arm_smmu_dev_enable_feature,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2021-02-23 20:56 [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (4 preceding siblings ...)
  2021-02-23 20:56 ` [PATCH v14 05/13] iommu/smmuv3: Implement attach/detach_pasid_table Eric Auger
@ 2021-02-23 20:56 ` Eric Auger
  2021-03-30  9:17   ` Zenghui Yu
  2021-04-01 12:37   ` Kunkun Jiang
  2021-02-23 20:56 ` [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate Eric Auger
                   ` (8 subsequent siblings)
  14 siblings, 2 replies; 46+ messages in thread
From: Eric Auger @ 2021-02-23 20:56 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi

With nested stage support, soon we will need to invalidate
S1 contexts and ranges tagged with an unmanaged asid, this
latter being managed by the guest. So let's introduce 2 helpers
that allow to invalidate with externally managed ASIDs

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v13 -> v14
- Actually send the NH_ASID command (reported by Xingang Wang)
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38 ++++++++++++++++-----
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 5579ec4fccc8..4c19a1114de4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1843,9 +1843,9 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
 }
 
 /* IO_PGTABLE API */
-static void arm_smmu_tlb_inv_context(void *cookie)
+static void __arm_smmu_tlb_inv_context(struct arm_smmu_domain *smmu_domain,
+				       int ext_asid)
 {
-	struct arm_smmu_domain *smmu_domain = cookie;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_cmdq_ent cmd;
 
@@ -1856,7 +1856,13 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 	 * insertion to guarantee those are observed before the TLBI. Do be
 	 * careful, 007.
 	 */
-	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
+	if (ext_asid >= 0) { /* guest stage 1 invalidation */
+		cmd.opcode	= CMDQ_OP_TLBI_NH_ASID;
+		cmd.tlbi.asid	= ext_asid;
+		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
+		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+		arm_smmu_cmdq_issue_sync(smmu);
+	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
 	} else {
 		cmd.opcode	= CMDQ_OP_TLBI_S12_VMALL;
@@ -1867,6 +1873,13 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 	arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
 }
 
+static void arm_smmu_tlb_inv_context(void *cookie)
+{
+	struct arm_smmu_domain *smmu_domain = cookie;
+
+	__arm_smmu_tlb_inv_context(smmu_domain, -1);
+}
+
 static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
 				     unsigned long iova, size_t size,
 				     size_t granule,
@@ -1926,9 +1939,10 @@ static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
 	arm_smmu_cmdq_batch_submit(smmu, &cmds);
 }
 
-static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
-					  size_t granule, bool leaf,
-					  struct arm_smmu_domain *smmu_domain)
+static void
+arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
+			      size_t granule, bool leaf, int ext_asid,
+			      struct arm_smmu_domain *smmu_domain)
 {
 	struct arm_smmu_cmdq_ent cmd = {
 		.tlbi = {
@@ -1936,7 +1950,12 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
 		},
 	};
 
-	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
+	if (ext_asid >= 0) {  /* guest stage 1 invalidation */
+		cmd.opcode	= smmu_domain->smmu->features & ARM_SMMU_FEAT_E2H ?
+				  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
+		cmd.tlbi.asid	= ext_asid;
+		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
+	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		cmd.opcode	= smmu_domain->smmu->features & ARM_SMMU_FEAT_E2H ?
 				  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
 		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
@@ -1944,6 +1963,7 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
 		cmd.opcode	= CMDQ_OP_TLBI_S2_IPA;
 		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
 	}
+
 	__arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain);
 
 	/*
@@ -1982,7 +2002,7 @@ static void arm_smmu_tlb_inv_page_nosync(struct iommu_iotlb_gather *gather,
 static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size,
 				  size_t granule, void *cookie)
 {
-	arm_smmu_tlb_inv_range_domain(iova, size, granule, false, cookie);
+	arm_smmu_tlb_inv_range_domain(iova, size, granule, false, -1, cookie);
 }
 
 static const struct iommu_flush_ops arm_smmu_flush_ops = {
@@ -2523,7 +2543,7 @@ static void arm_smmu_iotlb_sync(struct iommu_domain *domain,
 
 	arm_smmu_tlb_inv_range_domain(gather->start,
 				      gather->end - gather->start + 1,
-				      gather->pgsize, true, smmu_domain);
+				      gather->pgsize, true, -1, smmu_domain);
 }
 
 static phys_addr_t
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate
  2021-02-23 20:56 [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (5 preceding siblings ...)
  2021-02-23 20:56 ` [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs Eric Auger
@ 2021-02-23 20:56 ` Eric Auger
       [not found]   ` <c10c6405-5efe-5a41-2b3a-f3af85a528ba@hisilicon.com>
  2021-04-01  6:11   ` Zenghui Yu
  2021-02-23 20:56 ` [PATCH v14 08/13] dma-iommu: Implement NESTED_MSI cookie Eric Auger
                   ` (7 subsequent siblings)
  14 siblings, 2 replies; 46+ messages in thread
From: Eric Auger @ 2021-02-23 20:56 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi

Implement domain-selective, pasid selective and page-selective
IOTLB invalidations.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v13 -> v14:
- Add domain invalidation
- do global inval when asid is not provided with addr
  granularity

v7 -> v8:
- ASID based invalidation using iommu_inv_pasid_info
- check ARCHID/PASID flags in addr based invalidation
- use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync

v6 -> v7
- check the uapi version

v3 -> v4:
- adapt to changes in the uapi
- add support for leaf parameter
- do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
  anymore

v2 -> v3:
- replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync

v1 -> v2:
- properly pass the asid
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 74 +++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 4c19a1114de4..df3adc49111c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2949,6 +2949,79 @@ static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
 	mutex_unlock(&smmu_domain->init_mutex);
 }
 
+static int
+arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
+			  struct iommu_cache_invalidate_info *inv_info)
+{
+	struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+		return -EINVAL;
+
+	if (!smmu)
+		return -EINVAL;
+
+	if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+		return -EINVAL;
+
+	if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
+	    inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
+		return -ENOENT;
+	}
+
+	if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
+		return -EINVAL;
+
+	/* IOTLB invalidation */
+
+	switch (inv_info->granularity) {
+	case IOMMU_INV_GRANU_PASID:
+	{
+		struct iommu_inv_pasid_info *info =
+			&inv_info->granu.pasid_info;
+
+		if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
+			return -ENOENT;
+		if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
+			return -EINVAL;
+
+		__arm_smmu_tlb_inv_context(smmu_domain, info->archid);
+		return 0;
+	}
+	case IOMMU_INV_GRANU_ADDR:
+	{
+		struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
+		size_t size = info->nb_granules * info->granule_size;
+		bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
+
+		if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
+			return -ENOENT;
+
+		if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
+			break;
+
+		arm_smmu_tlb_inv_range_domain(info->addr, size,
+					      info->granule_size, leaf,
+					      info->archid, smmu_domain);
+
+		arm_smmu_cmdq_issue_sync(smmu);
+		return 0;
+	}
+	case IOMMU_INV_GRANU_DOMAIN:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	/* Global S1 invalidation */
+	cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
+	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+	arm_smmu_cmdq_issue_sync(smmu);
+	return 0;
+}
+
 static bool arm_smmu_dev_has_feature(struct device *dev,
 				     enum iommu_dev_features feat)
 {
@@ -3048,6 +3121,7 @@ static struct iommu_ops arm_smmu_ops = {
 	.put_resv_regions	= generic_iommu_put_resv_regions,
 	.attach_pasid_table	= arm_smmu_attach_pasid_table,
 	.detach_pasid_table	= arm_smmu_detach_pasid_table,
+	.cache_invalidate	= arm_smmu_cache_invalidate,
 	.dev_has_feat		= arm_smmu_dev_has_feature,
 	.dev_feat_enabled	= arm_smmu_dev_feature_enabled,
 	.dev_enable_feat	= arm_smmu_dev_enable_feature,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v14 08/13] dma-iommu: Implement NESTED_MSI cookie
  2021-02-23 20:56 [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (6 preceding siblings ...)
  2021-02-23 20:56 ` [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate Eric Auger
@ 2021-02-23 20:56 ` Eric Auger
  2021-04-07  7:39   ` Zenghui Yu
  2021-02-23 20:56 ` [PATCH v14 09/13] iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement Eric Auger
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 46+ messages in thread
From: Eric Auger @ 2021-02-23 20:56 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi

Up to now, when the type was UNMANAGED, we used to
allocate IOVA pages within a reserved IOVA MSI range.

If both the host and the guest are exposed with SMMUs, each
would allocate an IOVA. The guest allocates an IOVA (gIOVA)
to map onto the guest MSI doorbell (gDB). The Host allocates
another IOVA (hIOVA) to map onto the physical doorbell (hDB).

So we end up with 2 unrelated mappings, at S1 and S2:
         S1             S2
gIOVA    ->     gDB
               hIOVA    ->    hDB

The PCI device would be programmed with hIOVA.
No stage 1 mapping would existing, causing the MSIs to fault.

iommu_dma_bind_guest_msi() allows to pass gIOVA/gDB
to the host so that gIOVA can be used by the host instead of
re-allocating a new hIOVA.

         S1           S2
gIOVA    ->    gDB    ->    hDB

this time, the PCI device can be programmed with the gIOVA MSI
doorbell which is correctly mapped through both stages.

Nested mode is not compatible with HW MSI regions as in that
case gDB and hDB should have a 1-1 mapping. This check will
be done when attaching each device to the IOMMU domain.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v10 -> v11:
- fix compilation if !CONFIG_IOMMU_DMA

v7 -> v8:
- correct iommu_dma_(un)bind_guest_msi when
  !CONFIG_IOMMU_DMA
- Mentioned nested mode is not compatible with HW MSI regions
  in commit message
- protect with msi_lock on unbind

v6 -> v7:
- removed device handle

v3 -> v4:
- change function names; add unregister
- protect with msi_lock

v2 -> v3:
- also store the device handle on S1 mapping registration.
  This garantees we associate the associated S2 mapping binds
  to the correct physical MSI controller.

v1 -> v2:
- unmap stage2 on put()
---
 drivers/iommu/dma-iommu.c | 142 +++++++++++++++++++++++++++++++++++++-
 include/linux/dma-iommu.h |  16 +++++
 2 files changed, 155 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f659395e7959..d25eb7cecaa7 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -19,6 +19,7 @@
 #include <linux/irq.h>
 #include <linux/mm.h>
 #include <linux/mutex.h>
+#include <linux/mutex.h>
 #include <linux/pci.h>
 #include <linux/swiotlb.h>
 #include <linux/scatterlist.h>
@@ -29,12 +30,15 @@
 struct iommu_dma_msi_page {
 	struct list_head	list;
 	dma_addr_t		iova;
+	dma_addr_t		gpa;
 	phys_addr_t		phys;
+	size_t			s1_granule;
 };
 
 enum iommu_dma_cookie_type {
 	IOMMU_DMA_IOVA_COOKIE,
 	IOMMU_DMA_MSI_COOKIE,
+	IOMMU_DMA_NESTED_MSI_COOKIE,
 };
 
 struct iommu_dma_cookie {
@@ -46,6 +50,7 @@ struct iommu_dma_cookie {
 		dma_addr_t		msi_iova;
 	};
 	struct list_head		msi_page_list;
+	spinlock_t			msi_lock;
 
 	/* Domain for flush queue callback; NULL if flush queue not in use */
 	struct iommu_domain		*fq_domain;
@@ -87,6 +92,7 @@ static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
 
 	cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
 	if (cookie) {
+		spin_lock_init(&cookie->msi_lock);
 		INIT_LIST_HEAD(&cookie->msi_page_list);
 		cookie->type = type;
 	}
@@ -120,14 +126,17 @@ EXPORT_SYMBOL(iommu_get_dma_cookie);
  *
  * Users who manage their own IOVA allocation and do not want DMA API support,
  * but would still like to take advantage of automatic MSI remapping, can use
- * this to initialise their own domain appropriately. Users should reserve a
+ * this to initialise their own domain appropriately. Users may reserve a
  * contiguous IOVA region, starting at @base, large enough to accommodate the
  * number of PAGE_SIZE mappings necessary to cover every MSI doorbell address
- * used by the devices attached to @domain.
+ * used by the devices attached to @domain. The other way round is to provide
+ * usable iova pages through the iommu_dma_bind_doorbell API (nested stages
+ * use case)
  */
 int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
 {
 	struct iommu_dma_cookie *cookie;
+	int nesting, ret;
 
 	if (domain->type != IOMMU_DOMAIN_UNMANAGED)
 		return -EINVAL;
@@ -135,7 +144,12 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
 	if (domain->iova_cookie)
 		return -EEXIST;
 
-	cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
+	ret =  iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &nesting);
+	if (!ret && nesting)
+		cookie = cookie_alloc(IOMMU_DMA_NESTED_MSI_COOKIE);
+	else
+		cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
+
 	if (!cookie)
 		return -ENOMEM;
 
@@ -156,6 +170,7 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
 {
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iommu_dma_msi_page *msi, *tmp;
+	bool s2_unmap = false;
 
 	if (!cookie)
 		return;
@@ -163,7 +178,15 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
 	if (cookie->type == IOMMU_DMA_IOVA_COOKIE && cookie->iovad.granule)
 		put_iova_domain(&cookie->iovad);
 
+	if (cookie->type == IOMMU_DMA_NESTED_MSI_COOKIE)
+		s2_unmap = true;
+
 	list_for_each_entry_safe(msi, tmp, &cookie->msi_page_list, list) {
+		if (s2_unmap && msi->phys) {
+			size_t size = cookie_msi_granule(cookie);
+
+			WARN_ON(iommu_unmap(domain, msi->gpa, size) != size);
+		}
 		list_del(&msi->list);
 		kfree(msi);
 	}
@@ -172,6 +195,92 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL(iommu_put_dma_cookie);
 
+/**
+ * iommu_dma_bind_guest_msi - Allows to pass the stage 1
+ * binding of a virtual MSI doorbell used by @dev.
+ *
+ * @domain: domain handle
+ * @iova: guest iova
+ * @gpa: gpa of the virtual doorbell
+ * @size: size of the granule used for the stage1 mapping
+ *
+ * In nested stage use case, the user can provide IOVA/IPA bindings
+ * corresponding to a guest MSI stage 1 mapping. When the host needs
+ * to map its own MSI doorbells, it can use @gpa as stage 2 input
+ * and map it onto the physical MSI doorbell.
+ */
+int iommu_dma_bind_guest_msi(struct iommu_domain *domain,
+			     dma_addr_t iova, phys_addr_t gpa, size_t size)
+{
+	struct iommu_dma_cookie *cookie = domain->iova_cookie;
+	struct iommu_dma_msi_page *msi;
+	int ret = 0;
+
+	if (!cookie)
+		return -EINVAL;
+
+	if (cookie->type != IOMMU_DMA_NESTED_MSI_COOKIE)
+		return -EINVAL;
+
+	iova = iova & ~(dma_addr_t)(size - 1);
+	gpa = gpa & ~(phys_addr_t)(size - 1);
+
+	spin_lock(&cookie->msi_lock);
+
+	list_for_each_entry(msi, &cookie->msi_page_list, list) {
+		if (msi->iova == iova)
+			goto unlock; /* this page is already registered */
+	}
+
+	msi = kzalloc(sizeof(*msi), GFP_ATOMIC);
+	if (!msi) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	msi->iova = iova;
+	msi->gpa = gpa;
+	msi->s1_granule = size;
+	list_add(&msi->list, &cookie->msi_page_list);
+unlock:
+	spin_unlock(&cookie->msi_lock);
+	return ret;
+}
+EXPORT_SYMBOL(iommu_dma_bind_guest_msi);
+
+void iommu_dma_unbind_guest_msi(struct iommu_domain *domain, dma_addr_t giova)
+{
+	struct iommu_dma_cookie *cookie = domain->iova_cookie;
+	struct iommu_dma_msi_page *msi;
+
+	if (!cookie)
+		return;
+
+	if (cookie->type != IOMMU_DMA_NESTED_MSI_COOKIE)
+		return;
+
+	spin_lock(&cookie->msi_lock);
+
+	list_for_each_entry(msi, &cookie->msi_page_list, list) {
+		dma_addr_t aligned_giova =
+			giova & ~(dma_addr_t)(msi->s1_granule - 1);
+
+		if (msi->iova == aligned_giova) {
+			if (msi->phys) {
+				/* unmap the stage 2 */
+				size_t size = cookie_msi_granule(cookie);
+
+				WARN_ON(iommu_unmap(domain, msi->gpa, size) != size);
+			}
+			list_del(&msi->list);
+			kfree(msi);
+			break;
+		}
+	}
+	spin_unlock(&cookie->msi_lock);
+}
+EXPORT_SYMBOL(iommu_dma_unbind_guest_msi);
+
 /**
  * iommu_dma_get_resv_regions - Reserved region driver helper
  * @dev: Device from iommu_get_resv_regions()
@@ -1343,6 +1452,33 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
 		if (msi_page->phys == msi_addr)
 			return msi_page;
 
+	/*
+	 * In nested stage mode, we do not allocate an MSI page in
+	 * a range provided by the user. Instead, IOVA/IPA bindings are
+	 * individually provided. We reuse thise IOVAs to build the
+	 * GIOVA -> GPA -> MSI HPA nested stage mapping.
+	 */
+	if (cookie->type == IOMMU_DMA_NESTED_MSI_COOKIE) {
+		list_for_each_entry(msi_page, &cookie->msi_page_list, list)
+			if (!msi_page->phys) {
+				int ret;
+
+				/* do the stage 2 mapping */
+				ret = iommu_map(domain,
+						msi_page->gpa, msi_addr, size,
+						IOMMU_MMIO | IOMMU_WRITE);
+				if (ret) {
+					pr_warn("MSI S2 mapping failed (%d)\n",
+						ret);
+					return NULL;
+				}
+				msi_page->phys = msi_addr;
+				return msi_page;
+			}
+		pr_warn("%s no MSI binding found\n", __func__);
+		return NULL;
+	}
+
 	msi_page = kzalloc(sizeof(*msi_page), GFP_KERNEL);
 	if (!msi_page)
 		return NULL;
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 706b68d1359b..7bd785e68477 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -12,6 +12,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/iommu.h>
 #include <linux/msi.h>
+#include <uapi/linux/iommu.h>
 
 /* Domain management interface for IOMMU drivers */
 int iommu_get_dma_cookie(struct iommu_domain *domain);
@@ -36,6 +37,9 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc,
 			       struct msi_msg *msg);
 
 void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list);
+int iommu_dma_bind_guest_msi(struct iommu_domain *domain,
+			     dma_addr_t iova, phys_addr_t gpa, size_t size);
+void iommu_dma_unbind_guest_msi(struct iommu_domain *domain, dma_addr_t giova);
 
 void iommu_dma_free_cpu_cached_iovas(unsigned int cpu,
 		struct iommu_domain *domain);
@@ -77,6 +81,18 @@ static inline void iommu_dma_compose_msi_msg(struct msi_desc *desc,
 {
 }
 
+static inline int
+iommu_dma_bind_guest_msi(struct iommu_domain *domain,
+			 dma_addr_t iova, phys_addr_t gpa, size_t size)
+{
+	return -ENODEV;
+}
+
+static inline void
+iommu_dma_unbind_guest_msi(struct iommu_domain *domain, dma_addr_t giova)
+{
+}
+
 static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
 {
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v14 09/13] iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
  2021-02-23 20:56 [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (7 preceding siblings ...)
  2021-02-23 20:56 ` [PATCH v14 08/13] dma-iommu: Implement NESTED_MSI cookie Eric Auger
@ 2021-02-23 20:56 ` Eric Auger
  2021-02-23 20:56 ` [PATCH v14 10/13] iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI regions Eric Auger
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Eric Auger @ 2021-02-23 20:56 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi

In nested mode we enforce the rule that all devices belonging
to the same iommu_domain share the same msi_domain.

Indeed if there were several physical MSI doorbells being used
within a single iommu_domain, it becomes really difficult to
resolve the nested stage mapping translating into the correct
physical doorbell. So let's forbid this situation.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 41 +++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index df3adc49111c..0cf92cd0ab3e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2429,6 +2429,37 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 	arm_smmu_install_ste_for_dev(master);
 }
 
+static bool arm_smmu_share_msi_domain(struct iommu_domain *domain,
+				      struct device *dev)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct irq_domain *irqd = dev_get_msi_domain(dev);
+	struct arm_smmu_master *master;
+	unsigned long flags;
+	bool share = false;
+
+	if (!irqd)
+		return true;
+
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
+		struct irq_domain *d = dev_get_msi_domain(master->dev);
+
+		if (!d)
+			continue;
+		if (irqd != d) {
+			dev_info(dev, "Nested mode forbids to attach devices "
+				 "using different physical MSI doorbells "
+				 "to the same iommu_domain");
+			goto unlock;
+		}
+	}
+	share = true;
+unlock:
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+	return share;
+}
+
 static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
 	int ret = 0;
@@ -2486,6 +2517,16 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 		ret = -EINVAL;
 		goto out_unlock;
 	}
+	/*
+	 * In nested mode we must check all devices belonging to the
+	 * domain share the same physical MSI doorbell. Otherwise nested
+	 * stage MSI binding is not supported.
+	 */
+	if (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED &&
+		!arm_smmu_share_msi_domain(domain, dev)) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
 
 	master->domain = smmu_domain;
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v14 10/13] iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI regions
  2021-02-23 20:56 [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (8 preceding siblings ...)
  2021-02-23 20:56 ` [PATCH v14 09/13] iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement Eric Auger
@ 2021-02-23 20:56 ` Eric Auger
  2021-02-23 20:56 ` [PATCH v14 11/13] iommu/smmuv3: Implement bind/unbind_guest_msi Eric Auger
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Eric Auger @ 2021-02-23 20:56 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi

Nested mode currently is not compatible with HW MSI reserved regions.
Indeed MSI transactions targeting this MSI doorbells bypass the SMMU.

Let's check nested mode is not attempted in such configuration.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 23 +++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 0cf92cd0ab3e..ca6f796aeb95 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2460,6 +2460,23 @@ static bool arm_smmu_share_msi_domain(struct iommu_domain *domain,
 	return share;
 }
 
+static bool arm_smmu_has_hw_msi_resv_region(struct device *dev)
+{
+	struct iommu_resv_region *region;
+	bool has_msi_resv_region = false;
+	LIST_HEAD(resv_regions);
+
+	iommu_get_resv_regions(dev, &resv_regions);
+	list_for_each_entry(region, &resv_regions, list) {
+		if (region->type == IOMMU_RESV_MSI) {
+			has_msi_resv_region = true;
+			break;
+		}
+	}
+	iommu_put_resv_regions(dev, &resv_regions);
+	return has_msi_resv_region;
+}
+
 static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
 	int ret = 0;
@@ -2520,10 +2537,12 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	/*
 	 * In nested mode we must check all devices belonging to the
 	 * domain share the same physical MSI doorbell. Otherwise nested
-	 * stage MSI binding is not supported.
+	 * stage MSI binding is not supported. Also nested mode is not
+	 * compatible with MSI HW reserved regions.
 	 */
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED &&
-		!arm_smmu_share_msi_domain(domain, dev)) {
+		(!arm_smmu_share_msi_domain(domain, dev) ||
+		 arm_smmu_has_hw_msi_resv_region(dev))) {
 		ret = -EINVAL;
 		goto out_unlock;
 	}
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v14 11/13] iommu/smmuv3: Implement bind/unbind_guest_msi
  2021-02-23 20:56 [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (9 preceding siblings ...)
  2021-02-23 20:56 ` [PATCH v14 10/13] iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI regions Eric Auger
@ 2021-02-23 20:56 ` Eric Auger
  2021-02-23 20:56 ` [PATCH v14 12/13] iommu/smmuv3: report additional recoverable faults Eric Auger
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Eric Auger @ 2021-02-23 20:56 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi

The bind/unbind_guest_msi() callbacks check the domain
is NESTED and redirect to the dma-iommu implementation.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v6 -> v7:
- remove device handle argument
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 43 +++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index ca6f796aeb95..f783c2804872 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2922,6 +2922,47 @@ static void arm_smmu_get_resv_regions(struct device *dev,
 	iommu_dma_get_resv_regions(dev, head);
 }
 
+static int
+arm_smmu_bind_guest_msi(struct iommu_domain *domain,
+			dma_addr_t giova, phys_addr_t gpa, size_t size)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu;
+	int ret = -EINVAL;
+
+	mutex_lock(&smmu_domain->init_mutex);
+	smmu = smmu_domain->smmu;
+	if (!smmu)
+		goto out;
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+		goto out;
+
+	ret = iommu_dma_bind_guest_msi(domain, giova, gpa, size);
+out:
+	mutex_unlock(&smmu_domain->init_mutex);
+	return ret;
+}
+
+static void
+arm_smmu_unbind_guest_msi(struct iommu_domain *domain, dma_addr_t giova)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu;
+
+	mutex_lock(&smmu_domain->init_mutex);
+	smmu = smmu_domain->smmu;
+	if (!smmu)
+		goto unlock;
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+		goto unlock;
+
+	iommu_dma_unbind_guest_msi(domain, giova);
+unlock:
+	mutex_unlock(&smmu_domain->init_mutex);
+}
+
 static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
 				       struct iommu_pasid_table_config *cfg)
 {
@@ -3182,6 +3223,8 @@ static struct iommu_ops arm_smmu_ops = {
 	.attach_pasid_table	= arm_smmu_attach_pasid_table,
 	.detach_pasid_table	= arm_smmu_detach_pasid_table,
 	.cache_invalidate	= arm_smmu_cache_invalidate,
+	.bind_guest_msi		= arm_smmu_bind_guest_msi,
+	.unbind_guest_msi	= arm_smmu_unbind_guest_msi,
 	.dev_has_feat		= arm_smmu_dev_has_feature,
 	.dev_feat_enabled	= arm_smmu_dev_feature_enabled,
 	.dev_enable_feat	= arm_smmu_dev_enable_feature,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v14 12/13] iommu/smmuv3: report additional recoverable faults
  2021-02-23 20:56 [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (10 preceding siblings ...)
  2021-02-23 20:56 ` [PATCH v14 11/13] iommu/smmuv3: Implement bind/unbind_guest_msi Eric Auger
@ 2021-02-23 20:56 ` Eric Auger
  2021-02-23 20:56 ` [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than one context descriptor Eric Auger
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Eric Auger @ 2021-02-23 20:56 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi

Up to now we have only reported translation faults. Now that
the guest can induce some configuration faults, let's report them
too. Add propagation for BAD_SUBSTREAMID, CD_FETCH, BAD_CD, WALK_EABT.
We also fix the transcoding for some existing translation faults.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 41 +++++++++++++++++++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  4 ++
 2 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index f783c2804872..332d31c0680f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1472,6 +1472,7 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
 	u32 perm = 0;
 	struct arm_smmu_master *master;
 	bool ssid_valid = evt[0] & EVTQ_0_SSV;
+	u8 type = FIELD_GET(EVTQ_0_ID, evt[0]);
 	u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]);
 	struct iommu_fault_event fault_evt = { };
 	struct iommu_fault *flt = &fault_evt.fault;
@@ -1524,9 +1525,6 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
 	} else {
 		flt->type = IOMMU_FAULT_DMA_UNRECOV;
 		flt->event = (struct iommu_fault_unrecoverable) {
-			.reason = reason,
-			.flags = IOMMU_FAULT_UNRECOV_ADDR_VALID |
-				 IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID,
 			.perm = perm,
 			.addr = FIELD_GET(EVTQ_2_ADDR, evt[2]),
 			.fetch_addr = FIELD_GET(EVTQ_3_IPA, evt[3]),
@@ -1536,6 +1534,43 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
 			flt->event.flags |= IOMMU_FAULT_UNRECOV_PASID_VALID;
 			flt->event.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]);
 		}
+
+		switch (type) {
+		case EVT_ID_TRANSLATION_FAULT:
+			flt->event.reason = IOMMU_FAULT_REASON_PTE_FETCH;
+			flt->event.flags |= IOMMU_FAULT_UNRECOV_ADDR_VALID;
+			break;
+		case EVT_ID_ADDR_SIZE_FAULT:
+			flt->event.reason = IOMMU_FAULT_REASON_OOR_ADDRESS;
+			flt->event.flags |= IOMMU_FAULT_UNRECOV_ADDR_VALID;
+			break;
+		case EVT_ID_ACCESS_FAULT:
+			flt->event.reason = IOMMU_FAULT_REASON_ACCESS;
+			flt->event.flags |= IOMMU_FAULT_UNRECOV_ADDR_VALID;
+			break;
+		case EVT_ID_PERMISSION_FAULT:
+			flt->event.reason = IOMMU_FAULT_REASON_PERMISSION;
+			flt->event.flags |= IOMMU_FAULT_UNRECOV_ADDR_VALID;
+			break;
+		case EVT_ID_BAD_SUBSTREAMID:
+			flt->event.reason = IOMMU_FAULT_REASON_PASID_INVALID;
+			break;
+		case EVT_ID_CD_FETCH:
+			flt->event.reason = IOMMU_FAULT_REASON_PASID_FETCH;
+			flt->event.flags |= IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID;
+			break;
+		case EVT_ID_BAD_CD:
+			flt->event.reason = IOMMU_FAULT_REASON_BAD_PASID_ENTRY;
+			break;
+		case EVT_ID_WALK_EABT:
+			flt->event.reason = IOMMU_FAULT_REASON_WALK_EABT;
+			flt->event.flags |= IOMMU_FAULT_UNRECOV_ADDR_VALID |
+					    IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID;
+			break;
+		default:
+			/* TODO: report other unrecoverable faults. */
+			return -EFAULT;
+		}
 	}
 
 	ret = iommu_report_device_fault(master->dev, &fault_evt);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index eb0cc08e8240..9c37dbec75b2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -378,6 +378,10 @@
 
 #define EVTQ_0_ID			GENMASK_ULL(7, 0)
 
+#define EVT_ID_BAD_SUBSTREAMID		0x08
+#define EVT_ID_CD_FETCH			0x09
+#define EVT_ID_BAD_CD			0x0a
+#define EVT_ID_WALK_EABT		0x0b
 #define EVT_ID_TRANSLATION_FAULT	0x10
 #define EVT_ID_ADDR_SIZE_FAULT		0x11
 #define EVT_ID_ACCESS_FAULT		0x12
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than one context descriptor
  2021-02-23 20:56 [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (11 preceding siblings ...)
  2021-02-23 20:56 ` [PATCH v14 12/13] iommu/smmuv3: report additional recoverable faults Eric Auger
@ 2021-02-23 20:56 ` Eric Auger
  2021-03-30  9:23   ` Zenghui Yu
  2021-02-25 16:06 ` [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Auger Eric
  2021-03-18  0:16 ` Krishna Reddy
  14 siblings, 1 reply; 46+ messages in thread
From: Eric Auger @ 2021-02-23 20:56 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi

In preparation for vSVA, let's accept userspace provided configs
with more than one CD. We check the max CD against the host iommu
capability and also the format (linear versus 2 level).

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 332d31c0680f..ab74a0289893 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3038,14 +3038,17 @@ static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
 		if (smmu_domain->s1_cfg.set)
 			goto out;
 
-		/*
-		 * we currently support a single CD so s1fmt and s1dss
-		 * fields are also ignored
-		 */
-		if (cfg->pasid_bits)
+		list_for_each_entry(master, &smmu_domain->devices, domain_head) {
+			if (cfg->pasid_bits > master->ssid_bits)
+				goto out;
+		}
+		if (cfg->vendor_data.smmuv3.s1fmt == STRTAB_STE_0_S1FMT_64K_L2 &&
+				!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
 			goto out;
 
 		smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
+		smmu_domain->s1_cfg.s1cdmax = cfg->pasid_bits;
+		smmu_domain->s1_cfg.s1fmt = cfg->vendor_data.smmuv3.s1fmt;
 		smmu_domain->s1_cfg.set = true;
 		smmu_domain->abort = false;
 		break;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
  2021-02-23 20:56 [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (12 preceding siblings ...)
  2021-02-23 20:56 ` [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than one context descriptor Eric Auger
@ 2021-02-25 16:06 ` Auger Eric
  2021-04-22 15:04   ` Sumit Gupta
  2021-03-18  0:16 ` Krishna Reddy
  14 siblings, 1 reply; 46+ messages in thread
From: Auger Eric @ 2021-02-25 16:06 UTC (permalink / raw)
  To: eric.auger.pro, iommu, linux-kernel, kvm, kvmarm, will, maz,
	robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi

Hi Shameer, all

On 2/23/21 9:56 PM, Eric Auger wrote:
> This series brings the IOMMU part of HW nested paging support
> in the SMMUv3. The VFIO part is submitted separately.
> 
> This is based on Jean-Philippe's
> [PATCH v12 00/10] iommu: I/O page faults for SMMUv3
> https://lore.kernel.org/linux-arm-kernel/YBfij71tyYvh8LhB@myrica/T/
> 
> The IOMMU API is extended to support 2 new API functionalities:
> 1) pass the guest stage 1 configuration
> 2) pass stage 1 MSI bindings
> 
> Then those capabilities gets implemented in the SMMUv3 driver.
> 
> The virtualizer passes information through the VFIO user API
> which cascades them to the iommu subsystem. This allows the guest
> to own stage 1 tables and context descriptors (so-called PASID
> table) while the host owns stage 2 tables and main configuration
> structures (STE).
> 
> Best Regards
> 
> Eric
> 
> This series can be found at:
> https://github.com/eauger/linux/tree/v5.11-stallv12-2stage-v14
> (including the VFIO part in its last version: v12)

As committed, I have rebased the iommu + vfio part on top of Jean's
sva/current (5.11-rc4).

https://github.com/eauger/linux/tree/jean_sva_current_2stage_v14

I have not tested the SVA bits but I have tested there is no regression
from my pov.

From the QEMU perspective is works off the shelf with that branch but if
you want to use other SVA related IOCTLs please remind of updating the
linux headers.

Again thank you to all of you who reviewed and tested the previous version.

Thanks

Eric
> 
> The VFIO series is sent separately.
> 
> History:
> 
> Previous version (v13):
> https://github.com/eauger/linux/tree/5.10-rc4-2stage-v13
> 
> v13 -> v14:
> - Took into account all received comments I think. Great
>   thanks to all the testers for their effort and sometimes
>   fixes. I am really grateful to you!
> - numerous fixes including guest running in
>   noiommu, iommu.strict=0, iommu.passthrough=on,
>   enable_unsafe_noiommu_mode
> 
> v12 -> v13:
> - fixed compilation issue with CONFIG_ARM_SMMU_V3_SVA
>   reported by Shameer. This urged me to revisit patch 4 into
>   iommu/smmuv3: Allow s1 and s2 configs to coexist where
>   s1_cfg and s2_cfg are not dynamically allocated anymore.
>   Instead I use a new set field in existing structs
> - fixed 2 others config checks
> - Updated "iommu/arm-smmu-v3: Maintain a SID->device structure"
>   according to the last version
> 
> v11 -> v12:
> - rebase on top of v5.10-rc4
> 
> Eric Auger (13):
>   iommu: Introduce attach/detach_pasid_table API
>   iommu: Introduce bind/unbind_guest_msi
>   iommu/smmuv3: Allow s1 and s2 configs to coexist
>   iommu/smmuv3: Get prepared for nested stage support
>   iommu/smmuv3: Implement attach/detach_pasid_table
>   iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
>   iommu/smmuv3: Implement cache_invalidate
>   dma-iommu: Implement NESTED_MSI cookie
>   iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
>   iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI
>     regions
>   iommu/smmuv3: Implement bind/unbind_guest_msi
>   iommu/smmuv3: report additional recoverable faults
>   iommu/smmuv3: Accept configs with more than one context descriptor
> 
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 444 ++++++++++++++++++--
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  14 +-
>  drivers/iommu/dma-iommu.c                   | 142 ++++++-
>  drivers/iommu/iommu.c                       | 106 +++++
>  include/linux/dma-iommu.h                   |  16 +
>  include/linux/iommu.h                       |  47 +++
>  include/uapi/linux/iommu.h                  |  54 +++
>  7 files changed, 781 insertions(+), 42 deletions(-)
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 05/13] iommu/smmuv3: Implement attach/detach_pasid_table
  2021-02-23 20:56 ` [PATCH v14 05/13] iommu/smmuv3: Implement attach/detach_pasid_table Eric Auger
@ 2021-03-02  8:35   ` Keqian Zhu
  2021-03-19 13:15     ` Auger Eric
  0 siblings, 1 reply; 46+ messages in thread
From: Keqian Zhu @ 2021-03-02  8:35 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi

Hi Eric,

On 2021/2/24 4:56, Eric Auger wrote:
> On attach_pasid_table() we program STE S1 related info set
> by the guest into the actual physical STEs. At minimum
> we need to program the context descriptor GPA and compute
> whether the stage1 is translated/bypassed or aborted.
> 
> On detach, the stage 1 config is unset and the abort flag is
> unset.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
[...]

> +
> +		/*
> +		 * we currently support a single CD so s1fmt and s1dss
> +		 * fields are also ignored
> +		 */
> +		if (cfg->pasid_bits)
> +			goto out;
> +
> +		smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
only the "cdtab_dma" field of "cdcfg" is set, we are not able to locate a specific cd using arm_smmu_get_cd_ptr().

Maybe we'd better use a specialized function to fill other fields of "cdcfg" or add a sanity check in arm_smmu_get_cd_ptr()
to prevent calling it under nested mode?

As now we just call arm_smmu_get_cd_ptr() during finalise_s1(), no problem found. Just a suggestion ;-)

Thanks,
Keqian


> +		smmu_domain->s1_cfg.set = true;
> +		smmu_domain->abort = false;
> +		break;
> +	default:
> +		goto out;
> +	}
> +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> +	list_for_each_entry(master, &smmu_domain->devices, domain_head)
> +		arm_smmu_install_ste_for_dev(master);
> +	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> +	ret = 0;
> +out:
> +	mutex_unlock(&smmu_domain->init_mutex);
> +	return ret;
> +}
> +
> +static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
> +{
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct arm_smmu_master *master;
> +	unsigned long flags;
> +
> +	mutex_lock(&smmu_domain->init_mutex);
> +
> +	if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> +		goto unlock;
> +
> +	smmu_domain->s1_cfg.set = false;
> +	smmu_domain->abort = false;
> +
> +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> +	list_for_each_entry(master, &smmu_domain->devices, domain_head)
> +		arm_smmu_install_ste_for_dev(master);
> +	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> +
> +unlock:
> +	mutex_unlock(&smmu_domain->init_mutex);
> +}
> +
>  static bool arm_smmu_dev_has_feature(struct device *dev,
>  				     enum iommu_dev_features feat)
>  {
> @@ -2939,6 +3026,8 @@ static struct iommu_ops arm_smmu_ops = {
>  	.of_xlate		= arm_smmu_of_xlate,
>  	.get_resv_regions	= arm_smmu_get_resv_regions,
>  	.put_resv_regions	= generic_iommu_put_resv_regions,
> +	.attach_pasid_table	= arm_smmu_attach_pasid_table,
> +	.detach_pasid_table	= arm_smmu_detach_pasid_table,
>  	.dev_has_feat		= arm_smmu_dev_has_feature,
>  	.dev_feat_enabled	= arm_smmu_dev_feature_enabled,
>  	.dev_enable_feat	= arm_smmu_dev_enable_feature,
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
  2021-02-23 20:56 [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (13 preceding siblings ...)
  2021-02-25 16:06 ` [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Auger Eric
@ 2021-03-18  0:16 ` Krishna Reddy
  2021-03-19 13:17   ` Auger Eric
  14 siblings, 1 reply; 46+ messages in thread
From: Krishna Reddy @ 2021-03-18  0:16 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jean-philippe, wangxingang5, lushenming, jiangkunkun,
	vivek.gautam, Vikram Sethi, zhangfei.gao, Bryan Huntsman,
	Terje Bergstrom, Yu-Huan Hsu, Sachin Nikam, Nicolin Chen,
	Pritesh Raithatha

Tested-by: Krishna Reddy <vdumpa@nvidia.com>

Validated nested translations with NVMe PCI device assigned to Guest VM. 
Tested with both v12 and v13 of Jean-Philippe's patches as base.

> This is based on Jean-Philippe's
> [PATCH v12 00/10] iommu: I/O page faults for SMMUv3
> https://lore.kernel.org/linux-arm-kernel/YBfij71tyYvh8LhB@myrica/T/

With Jean-Philippe's V13, Patch 12 of this series has a conflict that had to be resolved manually.

-KR



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 05/13] iommu/smmuv3: Implement attach/detach_pasid_table
  2021-03-02  8:35   ` Keqian Zhu
@ 2021-03-19 13:15     ` Auger Eric
  2021-03-22  6:23       ` Keqian Zhu
  0 siblings, 1 reply; 46+ messages in thread
From: Auger Eric @ 2021-03-19 13:15 UTC (permalink / raw)
  To: Keqian Zhu, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi

Hi Keqian,

On 3/2/21 9:35 AM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> On attach_pasid_table() we program STE S1 related info set
>> by the guest into the actual physical STEs. At minimum
>> we need to program the context descriptor GPA and compute
>> whether the stage1 is translated/bypassed or aborted.
>>
>> On detach, the stage 1 config is unset and the abort flag is
>> unset.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
> [...]
> 
>> +
>> +		/*
>> +		 * we currently support a single CD so s1fmt and s1dss
>> +		 * fields are also ignored
>> +		 */
>> +		if (cfg->pasid_bits)
>> +			goto out;
>> +
>> +		smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
> only the "cdtab_dma" field of "cdcfg" is set, we are not able to locate a specific cd using arm_smmu_get_cd_ptr().
> 
> Maybe we'd better use a specialized function to fill other fields of "cdcfg" or add a sanity check in arm_smmu_get_cd_ptr()
> to prevent calling it under nested mode?
> 
> As now we just call arm_smmu_get_cd_ptr() during finalise_s1(), no problem found. Just a suggestion ;-)

forgive me for the delay. yes I can indeed make sure that code is not
called in nested mode. Please could you detail why you would need to
call arm_smmu_get_cd_ptr()?

Thanks

Eric
> 
> Thanks,
> Keqian
> 
> 
>> +		smmu_domain->s1_cfg.set = true;
>> +		smmu_domain->abort = false;
>> +		break;
>> +	default:
>> +		goto out;
>> +	}
>> +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>> +	list_for_each_entry(master, &smmu_domain->devices, domain_head)
>> +		arm_smmu_install_ste_for_dev(master);
>> +	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>> +	ret = 0;
>> +out:
>> +	mutex_unlock(&smmu_domain->init_mutex);
>> +	return ret;
>> +}
>> +
>> +static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
>> +{
>> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +	struct arm_smmu_master *master;
>> +	unsigned long flags;
>> +
>> +	mutex_lock(&smmu_domain->init_mutex);
>> +
>> +	if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>> +		goto unlock;
>> +
>> +	smmu_domain->s1_cfg.set = false;
>> +	smmu_domain->abort = false;
>> +
>> +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>> +	list_for_each_entry(master, &smmu_domain->devices, domain_head)
>> +		arm_smmu_install_ste_for_dev(master);
>> +	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>> +
>> +unlock:
>> +	mutex_unlock(&smmu_domain->init_mutex);
>> +}
>> +
>>  static bool arm_smmu_dev_has_feature(struct device *dev,
>>  				     enum iommu_dev_features feat)
>>  {
>> @@ -2939,6 +3026,8 @@ static struct iommu_ops arm_smmu_ops = {
>>  	.of_xlate		= arm_smmu_of_xlate,
>>  	.get_resv_regions	= arm_smmu_get_resv_regions,
>>  	.put_resv_regions	= generic_iommu_put_resv_regions,
>> +	.attach_pasid_table	= arm_smmu_attach_pasid_table,
>> +	.detach_pasid_table	= arm_smmu_detach_pasid_table,
>>  	.dev_has_feat		= arm_smmu_dev_has_feature,
>>  	.dev_feat_enabled	= arm_smmu_dev_feature_enabled,
>>  	.dev_enable_feat	= arm_smmu_dev_enable_feature,
>>
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
  2021-03-18  0:16 ` Krishna Reddy
@ 2021-03-19 13:17   ` Auger Eric
  0 siblings, 0 replies; 46+ messages in thread
From: Auger Eric @ 2021-03-19 13:17 UTC (permalink / raw)
  To: Krishna Reddy, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jean-philippe, wangxingang5, lushenming, jiangkunkun,
	vivek.gautam, Vikram Sethi, zhangfei.gao, Bryan Huntsman,
	Terje Bergstrom, Yu-Huan Hsu, Sachin Nikam, Nicolin Chen,
	Pritesh Raithatha

Hi Krishna,

On 3/18/21 1:16 AM, Krishna Reddy wrote:
> Tested-by: Krishna Reddy <vdumpa@nvidia.com>
> 
> Validated nested translations with NVMe PCI device assigned to Guest VM. 
> Tested with both v12 and v13 of Jean-Philippe's patches as base.

Many thanks for that.
> 
>> This is based on Jean-Philippe's
>> [PATCH v12 00/10] iommu: I/O page faults for SMMUv3
>> https://lore.kernel.org/linux-arm-kernel/YBfij71tyYvh8LhB@myrica/T/
> 
> With Jean-Philippe's V13, Patch 12 of this series has a conflict that had to be resolved manually.

Yep I will respin accordingly.

Best Regards

Eric
> 
> -KR
> 
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate
       [not found]   ` <c10c6405-5efe-5a41-2b3a-f3af85a528ba@hisilicon.com>
@ 2021-03-19 17:36     ` Auger Eric
  2021-03-22  6:40       ` [Linuxarm] " chenxiang (M)
  0 siblings, 1 reply; 46+ messages in thread
From: Auger Eric @ 2021-03-19 17:36 UTC (permalink / raw)
  To: chenxiang (M),
	eric.auger.pro, iommu, linux-kernel, kvm, kvmarm, will, maz,
	robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jean-philippe, wangxingang5, lushenming, jiangkunkun,
	vivek.gautam, vsethi, zhangfei.gao, linuxarm

Hi Chenxiang,

On 3/4/21 8:55 AM, chenxiang (M) wrote:
> Hi Eric,
> 
> 
> 在 2021/2/24 4:56, Eric Auger 写道:
>> Implement domain-selective, pasid selective and page-selective
>> IOTLB invalidations.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>>
>> v13 -> v14:
>> - Add domain invalidation
>> - do global inval when asid is not provided with addr
>>   granularity
>>
>> v7 -> v8:
>> - ASID based invalidation using iommu_inv_pasid_info
>> - check ARCHID/PASID flags in addr based invalidation
>> - use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync
>>
>> v6 -> v7
>> - check the uapi version
>>
>> v3 -> v4:
>> - adapt to changes in the uapi
>> - add support for leaf parameter
>> - do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
>>   anymore
>>
>> v2 -> v3:
>> - replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync
>>
>> v1 -> v2:
>> - properly pass the asid
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 74 +++++++++++++++++++++
>>  1 file changed, 74 insertions(+)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 4c19a1114de4..df3adc49111c 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -2949,6 +2949,79 @@ static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
>>  	mutex_unlock(&smmu_domain->init_mutex);
>>  }
>>  
>> +static int
>> +arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
>> +			  struct iommu_cache_invalidate_info *inv_info)
>> +{
>> +	struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
>> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
>> +
>> +	if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>> +		return -EINVAL;
>> +
>> +	if (!smmu)
>> +		return -EINVAL;
>> +
>> +	if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
>> +		return -EINVAL;
>> +
>> +	if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
>> +	    inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
>> +		return -ENOENT;
>> +	}
>> +
>> +	if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
>> +		return -EINVAL;
>> +
>> +	/* IOTLB invalidation */
>> +
>> +	switch (inv_info->granularity) {
>> +	case IOMMU_INV_GRANU_PASID:
>> +	{
>> +		struct iommu_inv_pasid_info *info =
>> +			&inv_info->granu.pasid_info;
>> +
>> +		if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>> +			return -ENOENT;
>> +		if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
>> +			return -EINVAL;
>> +
>> +		__arm_smmu_tlb_inv_context(smmu_domain, info->archid);
>> +		return 0;
>> +	}
>> +	case IOMMU_INV_GRANU_ADDR:
>> +	{
>> +		struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
>> +		size_t size = info->nb_granules * info->granule_size;
>> +		bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
>> +
>> +		if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>> +			return -ENOENT;
>> +
>> +		if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
>> +			break;
>> +
>> +		arm_smmu_tlb_inv_range_domain(info->addr, size,
>> +					      info->granule_size, leaf,
>> +					      info->archid, smmu_domain);
> 
> Is it possible to add a check before the function to make sure that
> info->granule_size can be recognized by SMMU?
> There is a scenario which will cause TLBI issue: RIL feature is enabled
> on guest but is disabled on host, and then on
> host it just invalidate 4K/2M/1G once a time, but from QEMU,
> info->nb_granules is always 1 and info->granule_size = size,
> if size is not equal to 4K or 2M or 1G (for example size = granule_size
> is 5M), it will only part of the size it wants to invalidate.
> 
> I think maybe we can add a check here: if RIL is not enabled and also
> size is not the granule_size (4K/2M/1G) supported by
> SMMU hardware, can we just simply use the smallest granule_size
> supported by hardware all the time?
> 
>> +
>> +		arm_smmu_cmdq_issue_sync(smmu);
>> +		return 0;
>> +	}
>> +	case IOMMU_INV_GRANU_DOMAIN:
>> +		break;
> 
> I check the qemu code
> (https://github.com/eauger/qemu/tree/v5.2.0-2stage-rfcv8), for opcode
> CMD_TLBI_NH_ALL or CMD_TLBI_NSNH_ALL from guest OS
> it calls smmu_inv_notifiers_all() to unamp all notifiers of all mr's,
> but it seems not set event.entry.granularity which i think it should set
> IOMMU_INV_GRAN_ADDR.
this is because IOMMU_INV_GRAN_ADDR = 0. But for clarity I should rather
set it explicitly ;-)
> BTW, for opcode CMD_TLBI_NH_ALL or CMD_TLBI_NSNH_ALL, it needs to unmap
> size = 0x1000000000000 on 48bit system (it may spend much time),  maybe
> it is better
> to set it as IOMMU_INV_GRANU_DOMAIN, then in host OS, send TLBI_NH_ALL
> directly when IOMMU_INV_GRANU_DOMAIN.
Yes you're right. If the host does not support RIL then it is an issue.
This is going to be fixed in the next version.

Thank you for the report!

Best Regards

Eric
> 
> 
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* Global S1 invalidation */
>> +	cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
>> +	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>> +	arm_smmu_cmdq_issue_sync(smmu);
>> +	return 0;
>> +}
>> +
>>  static bool arm_smmu_dev_has_feature(struct device *dev,
>>  				     enum iommu_dev_features feat)
>>  {
>> @@ -3048,6 +3121,7 @@ static struct iommu_ops arm_smmu_ops = {
>>  	.put_resv_regions	= generic_iommu_put_resv_regions,
>>  	.attach_pasid_table	= arm_smmu_attach_pasid_table,
>>  	.detach_pasid_table	= arm_smmu_detach_pasid_table,
>> +	.cache_invalidate	= arm_smmu_cache_invalidate,
>>  	.dev_has_feat		= arm_smmu_dev_has_feature,
>>  	.dev_feat_enabled	= arm_smmu_dev_feature_enabled,
>>  	.dev_enable_feat	= arm_smmu_dev_enable_feature,
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 05/13] iommu/smmuv3: Implement attach/detach_pasid_table
  2021-03-19 13:15     ` Auger Eric
@ 2021-03-22  6:23       ` Keqian Zhu
  0 siblings, 0 replies; 46+ messages in thread
From: Keqian Zhu @ 2021-03-22  6:23 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi

Hi Eric,

On 2021/3/19 21:15, Auger Eric wrote:
> Hi Keqian,
> 
> On 3/2/21 9:35 AM, Keqian Zhu wrote:
>> Hi Eric,
>>
>> On 2021/2/24 4:56, Eric Auger wrote:
>>> On attach_pasid_table() we program STE S1 related info set
>>> by the guest into the actual physical STEs. At minimum
>>> we need to program the context descriptor GPA and compute
>>> whether the stage1 is translated/bypassed or aborted.
>>>
>>> On detach, the stage 1 config is unset and the abort flag is
>>> unset.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>
>> [...]
>>
>>> +
>>> +		/*
>>> +		 * we currently support a single CD so s1fmt and s1dss
>>> +		 * fields are also ignored
>>> +		 */
>>> +		if (cfg->pasid_bits)
>>> +			goto out;
>>> +
>>> +		smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
>> only the "cdtab_dma" field of "cdcfg" is set, we are not able to locate a specific cd using arm_smmu_get_cd_ptr().
>>
>> Maybe we'd better use a specialized function to fill other fields of "cdcfg" or add a sanity check in arm_smmu_get_cd_ptr()
>> to prevent calling it under nested mode?
>>
>> As now we just call arm_smmu_get_cd_ptr() during finalise_s1(), no problem found. Just a suggestion ;-)
> 
> forgive me for the delay. yes I can indeed make sure that code is not
> called in nested mode. Please could you detail why you would need to
> call arm_smmu_get_cd_ptr()?
I accidentally called this function in nested mode when verify the smmu mpam feature. :)

Yes, in nested mode, context descriptor is owned by guest, hypervisor does not need to care about its content.
Maybe we'd better give an explicit comment for arm_smmu_get_cd_ptr() to let coder pay attention to this? :)

Thanks,
Keqian

> 
> Thanks
> 
> Eric
>>
>> Thanks,
>> Keqian
>>
>>
>>> +		smmu_domain->s1_cfg.set = true;
>>> +		smmu_domain->abort = false;
>>> +		break;
>>> +	default:
>>> +		goto out;
>>> +	}
>>> +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>>> +	list_for_each_entry(master, &smmu_domain->devices, domain_head)
>>> +		arm_smmu_install_ste_for_dev(master);
>>> +	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>>> +	ret = 0;
>>> +out:
>>> +	mutex_unlock(&smmu_domain->init_mutex);
>>> +	return ret;
>>> +}
>>> +
>>> +static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
>>> +{
>>> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>> +	struct arm_smmu_master *master;
>>> +	unsigned long flags;
>>> +
>>> +	mutex_lock(&smmu_domain->init_mutex);
>>> +
>>> +	if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>>> +		goto unlock;
>>> +
>>> +	smmu_domain->s1_cfg.set = false;
>>> +	smmu_domain->abort = false;
>>> +
>>> +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>>> +	list_for_each_entry(master, &smmu_domain->devices, domain_head)
>>> +		arm_smmu_install_ste_for_dev(master);
>>> +	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>>> +
>>> +unlock:
>>> +	mutex_unlock(&smmu_domain->init_mutex);
>>> +}
>>> +
>>>  static bool arm_smmu_dev_has_feature(struct device *dev,
>>>  				     enum iommu_dev_features feat)
>>>  {
>>> @@ -2939,6 +3026,8 @@ static struct iommu_ops arm_smmu_ops = {
>>>  	.of_xlate		= arm_smmu_of_xlate,
>>>  	.get_resv_regions	= arm_smmu_get_resv_regions,
>>>  	.put_resv_regions	= generic_iommu_put_resv_regions,
>>> +	.attach_pasid_table	= arm_smmu_attach_pasid_table,
>>> +	.detach_pasid_table	= arm_smmu_detach_pasid_table,
>>>  	.dev_has_feat		= arm_smmu_dev_has_feature,
>>>  	.dev_feat_enabled	= arm_smmu_dev_feature_enabled,
>>>  	.dev_enable_feat	= arm_smmu_dev_enable_feature,
>>>
>>
> 
> .
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Linuxarm] Re: [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate
  2021-03-19 17:36     ` Auger Eric
@ 2021-03-22  6:40       ` chenxiang (M)
  2021-03-22  9:05         ` Auger Eric
  0 siblings, 1 reply; 46+ messages in thread
From: chenxiang (M) @ 2021-03-22  6:40 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jean-philippe, wangxingang5, lushenming, jiangkunkun,
	vivek.gautam, vsethi, zhangfei.gao, linuxarm

Hi Eric,


在 2021/3/20 1:36, Auger Eric 写道:
> Hi Chenxiang,
>
> On 3/4/21 8:55 AM, chenxiang (M) wrote:
>> Hi Eric,
>>
>>
>> 在 2021/2/24 4:56, Eric Auger 写道:
>>> Implement domain-selective, pasid selective and page-selective
>>> IOTLB invalidations.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>
>>> ---
>>>
>>> v13 -> v14:
>>> - Add domain invalidation
>>> - do global inval when asid is not provided with addr
>>>    granularity
>>>
>>> v7 -> v8:
>>> - ASID based invalidation using iommu_inv_pasid_info
>>> - check ARCHID/PASID flags in addr based invalidation
>>> - use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync
>>>
>>> v6 -> v7
>>> - check the uapi version
>>>
>>> v3 -> v4:
>>> - adapt to changes in the uapi
>>> - add support for leaf parameter
>>> - do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
>>>    anymore
>>>
>>> v2 -> v3:
>>> - replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync
>>>
>>> v1 -> v2:
>>> - properly pass the asid
>>> ---
>>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 74 +++++++++++++++++++++
>>>   1 file changed, 74 insertions(+)
>>>
>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> index 4c19a1114de4..df3adc49111c 100644
>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> @@ -2949,6 +2949,79 @@ static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
>>>   	mutex_unlock(&smmu_domain->init_mutex);
>>>   }
>>>   
>>> +static int
>>> +arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
>>> +			  struct iommu_cache_invalidate_info *inv_info)
>>> +{
>>> +	struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
>>> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
>>> +
>>> +	if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>>> +		return -EINVAL;
>>> +
>>> +	if (!smmu)
>>> +		return -EINVAL;
>>> +
>>> +	if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
>>> +		return -EINVAL;
>>> +
>>> +	if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
>>> +	    inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
>>> +		return -ENOENT;
>>> +	}
>>> +
>>> +	if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
>>> +		return -EINVAL;
>>> +
>>> +	/* IOTLB invalidation */
>>> +
>>> +	switch (inv_info->granularity) {
>>> +	case IOMMU_INV_GRANU_PASID:
>>> +	{
>>> +		struct iommu_inv_pasid_info *info =
>>> +			&inv_info->granu.pasid_info;
>>> +
>>> +		if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>>> +			return -ENOENT;
>>> +		if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
>>> +			return -EINVAL;
>>> +
>>> +		__arm_smmu_tlb_inv_context(smmu_domain, info->archid);
>>> +		return 0;
>>> +	}
>>> +	case IOMMU_INV_GRANU_ADDR:
>>> +	{
>>> +		struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
>>> +		size_t size = info->nb_granules * info->granule_size;
>>> +		bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
>>> +
>>> +		if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>>> +			return -ENOENT;
>>> +
>>> +		if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
>>> +			break;
>>> +
>>> +		arm_smmu_tlb_inv_range_domain(info->addr, size,
>>> +					      info->granule_size, leaf,
>>> +					      info->archid, smmu_domain);
>> Is it possible to add a check before the function to make sure that
>> info->granule_size can be recognized by SMMU?
>> There is a scenario which will cause TLBI issue: RIL feature is enabled
>> on guest but is disabled on host, and then on
>> host it just invalidate 4K/2M/1G once a time, but from QEMU,
>> info->nb_granules is always 1 and info->granule_size = size,
>> if size is not equal to 4K or 2M or 1G (for example size = granule_size
>> is 5M), it will only part of the size it wants to invalidate.

Do you have any idea about this issue?

>>
>> I think maybe we can add a check here: if RIL is not enabled and also
>> size is not the granule_size (4K/2M/1G) supported by
>> SMMU hardware, can we just simply use the smallest granule_size
>> supported by hardware all the time?
>>
>>> +
>>> +		arm_smmu_cmdq_issue_sync(smmu);
>>> +		return 0;
>>> +	}
>>> +	case IOMMU_INV_GRANU_DOMAIN:
>>> +		break;
>> I check the qemu code
>> (https://github.com/eauger/qemu/tree/v5.2.0-2stage-rfcv8), for opcode
>> CMD_TLBI_NH_ALL or CMD_TLBI_NSNH_ALL from guest OS
>> it calls smmu_inv_notifiers_all() to unamp all notifiers of all mr's,
>> but it seems not set event.entry.granularity which i think it should set
>> IOMMU_INV_GRAN_ADDR.
> this is because IOMMU_INV_GRAN_ADDR = 0. But for clarity I should rather
> set it explicitly ;-)

ok

>> BTW, for opcode CMD_TLBI_NH_ALL or CMD_TLBI_NSNH_ALL, it needs to unmap
>> size = 0x1000000000000 on 48bit system (it may spend much time),  maybe
>> it is better
>> to set it as IOMMU_INV_GRANU_DOMAIN, then in host OS, send TLBI_NH_ALL
>> directly when IOMMU_INV_GRANU_DOMAIN.
> Yes you're right. If the host does not support RIL then it is an issue.
> This is going to be fixed in the next version.

Great

>
> Thank you for the report!
>
> Best Regards
>
> Eric
>>
>>> +	default:
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	/* Global S1 invalidation */
>>> +	cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
>>> +	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>>> +	arm_smmu_cmdq_issue_sync(smmu);
>>> +	return 0;
>>> +}
>>> +
>>>   static bool arm_smmu_dev_has_feature(struct device *dev,
>>>   				     enum iommu_dev_features feat)
>>>   {
>>> @@ -3048,6 +3121,7 @@ static struct iommu_ops arm_smmu_ops = {
>>>   	.put_resv_regions	= generic_iommu_put_resv_regions,
>>>   	.attach_pasid_table	= arm_smmu_attach_pasid_table,
>>>   	.detach_pasid_table	= arm_smmu_detach_pasid_table,
>>> +	.cache_invalidate	= arm_smmu_cache_invalidate,
>>>   	.dev_has_feat		= arm_smmu_dev_has_feature,
>>>   	.dev_feat_enabled	= arm_smmu_dev_feature_enabled,
>>>   	.dev_enable_feat	= arm_smmu_dev_enable_feature,
> _______________________________________________
> Linuxarm mailing list -- linuxarm@openeuler.org
> To unsubscribe send an email to linuxarm-leave@openeuler.org



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Linuxarm] Re: [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate
  2021-03-22  6:40       ` [Linuxarm] " chenxiang (M)
@ 2021-03-22  9:05         ` Auger Eric
  2021-03-23  1:28           ` chenxiang (M)
  0 siblings, 1 reply; 46+ messages in thread
From: Auger Eric @ 2021-03-22  9:05 UTC (permalink / raw)
  To: chenxiang (M),
	eric.auger.pro, iommu, linux-kernel, kvm, kvmarm, will, maz,
	robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jean-philippe, wangxingang5, lushenming, jiangkunkun,
	vivek.gautam, vsethi, zhangfei.gao, linuxarm

Hi Chenxiang,

On 3/22/21 7:40 AM, chenxiang (M) wrote:
> Hi Eric,
> 
> 
> 在 2021/3/20 1:36, Auger Eric 写道:
>> Hi Chenxiang,
>>
>> On 3/4/21 8:55 AM, chenxiang (M) wrote:
>>> Hi Eric,
>>>
>>>
>>> 在 2021/2/24 4:56, Eric Auger 写道:
>>>> Implement domain-selective, pasid selective and page-selective
>>>> IOTLB invalidations.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>
>>>> ---
>>>>
>>>> v13 -> v14:
>>>> - Add domain invalidation
>>>> - do global inval when asid is not provided with addr
>>>>    granularity
>>>>
>>>> v7 -> v8:
>>>> - ASID based invalidation using iommu_inv_pasid_info
>>>> - check ARCHID/PASID flags in addr based invalidation
>>>> - use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync
>>>>
>>>> v6 -> v7
>>>> - check the uapi version
>>>>
>>>> v3 -> v4:
>>>> - adapt to changes in the uapi
>>>> - add support for leaf parameter
>>>> - do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
>>>>    anymore
>>>>
>>>> v2 -> v3:
>>>> - replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync
>>>>
>>>> v1 -> v2:
>>>> - properly pass the asid
>>>> ---
>>>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 74
>>>> +++++++++++++++++++++
>>>>   1 file changed, 74 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> index 4c19a1114de4..df3adc49111c 100644
>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> @@ -2949,6 +2949,79 @@ static void
>>>> arm_smmu_detach_pasid_table(struct iommu_domain *domain)
>>>>       mutex_unlock(&smmu_domain->init_mutex);
>>>>   }
>>>>   +static int
>>>> +arm_smmu_cache_invalidate(struct iommu_domain *domain, struct
>>>> device *dev,
>>>> +              struct iommu_cache_invalidate_info *inv_info)
>>>> +{
>>>> +    struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>> +    struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>> +
>>>> +    if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>>>> +        return -EINVAL;
>>>> +
>>>> +    if (!smmu)
>>>> +        return -EINVAL;
>>>> +
>>>> +    if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
>>>> +        return -EINVAL;
>>>> +
>>>> +    if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
>>>> +        inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
>>>> +        return -ENOENT;
>>>> +    }
>>>> +
>>>> +    if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
>>>> +        return -EINVAL;
>>>> +
>>>> +    /* IOTLB invalidation */
>>>> +
>>>> +    switch (inv_info->granularity) {
>>>> +    case IOMMU_INV_GRANU_PASID:
>>>> +    {
>>>> +        struct iommu_inv_pasid_info *info =
>>>> +            &inv_info->granu.pasid_info;
>>>> +
>>>> +        if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>>>> +            return -ENOENT;
>>>> +        if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
>>>> +            return -EINVAL;
>>>> +
>>>> +        __arm_smmu_tlb_inv_context(smmu_domain, info->archid);
>>>> +        return 0;
>>>> +    }
>>>> +    case IOMMU_INV_GRANU_ADDR:
>>>> +    {
>>>> +        struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
>>>> +        size_t size = info->nb_granules * info->granule_size;
>>>> +        bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
>>>> +
>>>> +        if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>>>> +            return -ENOENT;
>>>> +
>>>> +        if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
>>>> +            break;
>>>> +
>>>> +        arm_smmu_tlb_inv_range_domain(info->addr, size,
>>>> +                          info->granule_size, leaf,
>>>> +                          info->archid, smmu_domain);
>>> Is it possible to add a check before the function to make sure that
>>> info->granule_size can be recognized by SMMU?
>>> There is a scenario which will cause TLBI issue: RIL feature is enabled
>>> on guest but is disabled on host, and then on
>>> host it just invalidate 4K/2M/1G once a time, but from QEMU,
>>> info->nb_granules is always 1 and info->granule_size = size,
>>> if size is not equal to 4K or 2M or 1G (for example size = granule_size
>>> is 5M), it will only part of the size it wants to invalidate.
> 
> Do you have any idea about this issue?

At the QEMU VFIO notifier level, when I fill the struct
iommu_cache_invalidate_info, I currently miss the granule info, hence
this weird choice of setting setting info->nb_granules is always 1 and
info->granule_size = size. I think in arm_smmu_cache_invalidate() I need
to convert this info back to a the leaf page size, in case the host does
not support RIL. Just as it is done in  __arm_smmu_tlb_inv_range if RIL
is supported.

Does it makes sense to you?

Thank you for spotting the issue.

Eric
> 
>>>
>>> I think maybe we can add a check here: if RIL is not enabled and also
>>> size is not the granule_size (4K/2M/1G) supported by
>>> SMMU hardware, can we just simply use the smallest granule_size
>>> supported by hardware all the time?
>>>
>>>> +
>>>> +        arm_smmu_cmdq_issue_sync(smmu);
>>>> +        return 0;
>>>> +    }
>>>> +    case IOMMU_INV_GRANU_DOMAIN:
>>>> +        break;
>>> I check the qemu code
>>> (https://github.com/eauger/qemu/tree/v5.2.0-2stage-rfcv8), for opcode
>>> CMD_TLBI_NH_ALL or CMD_TLBI_NSNH_ALL from guest OS
>>> it calls smmu_inv_notifiers_all() to unamp all notifiers of all mr's,
>>> but it seems not set event.entry.granularity which i think it should set
>>> IOMMU_INV_GRAN_ADDR.
>> this is because IOMMU_INV_GRAN_ADDR = 0. But for clarity I should rather
>> set it explicitly ;-)
> 
> ok
> 
>>> BTW, for opcode CMD_TLBI_NH_ALL or CMD_TLBI_NSNH_ALL, it needs to unmap
>>> size = 0x1000000000000 on 48bit system (it may spend much time),  maybe
>>> it is better
>>> to set it as IOMMU_INV_GRANU_DOMAIN, then in host OS, send TLBI_NH_ALL
>>> directly when IOMMU_INV_GRANU_DOMAIN.
>> Yes you're right. If the host does not support RIL then it is an issue.
>> This is going to be fixed in the next version.
> 
> Great
> 
>>
>> Thank you for the report!
>>
>> Best Regards
>>
>> Eric
>>>
>>>> +    default:
>>>> +        return -EINVAL;
>>>> +    }
>>>> +
>>>> +    /* Global S1 invalidation */
>>>> +    cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
>>>> +    arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>>>> +    arm_smmu_cmdq_issue_sync(smmu);
>>>> +    return 0;
>>>> +}
>>>> +
>>>>   static bool arm_smmu_dev_has_feature(struct device *dev,
>>>>                        enum iommu_dev_features feat)
>>>>   {
>>>> @@ -3048,6 +3121,7 @@ static struct iommu_ops arm_smmu_ops = {
>>>>       .put_resv_regions    = generic_iommu_put_resv_regions,
>>>>       .attach_pasid_table    = arm_smmu_attach_pasid_table,
>>>>       .detach_pasid_table    = arm_smmu_detach_pasid_table,
>>>> +    .cache_invalidate    = arm_smmu_cache_invalidate,
>>>>       .dev_has_feat        = arm_smmu_dev_has_feature,
>>>>       .dev_feat_enabled    = arm_smmu_dev_feature_enabled,
>>>>       .dev_enable_feat    = arm_smmu_dev_enable_feature,
>> _______________________________________________
>> Linuxarm mailing list -- linuxarm@openeuler.org
>> To unsubscribe send an email to linuxarm-leave@openeuler.org
> 
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Linuxarm] Re: [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate
  2021-03-22  9:05         ` Auger Eric
@ 2021-03-23  1:28           ` chenxiang (M)
  0 siblings, 0 replies; 46+ messages in thread
From: chenxiang (M) @ 2021-03-23  1:28 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jean-philippe, wangxingang5, lushenming, jiangkunkun,
	vivek.gautam, vsethi, zhangfei.gao, linuxarm

Hi Eric,


在 2021/3/22 17:05, Auger Eric 写道:
> Hi Chenxiang,
>
> On 3/22/21 7:40 AM, chenxiang (M) wrote:
>> Hi Eric,
>>
>>
>> 在 2021/3/20 1:36, Auger Eric 写道:
>>> Hi Chenxiang,
>>>
>>> On 3/4/21 8:55 AM, chenxiang (M) wrote:
>>>> Hi Eric,
>>>>
>>>>
>>>> 在 2021/2/24 4:56, Eric Auger 写道:
>>>>> Implement domain-selective, pasid selective and page-selective
>>>>> IOTLB invalidations.
>>>>>
>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>
>>>>> ---
>>>>>
>>>>> v13 -> v14:
>>>>> - Add domain invalidation
>>>>> - do global inval when asid is not provided with addr
>>>>>     granularity
>>>>>
>>>>> v7 -> v8:
>>>>> - ASID based invalidation using iommu_inv_pasid_info
>>>>> - check ARCHID/PASID flags in addr based invalidation
>>>>> - use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync
>>>>>
>>>>> v6 -> v7
>>>>> - check the uapi version
>>>>>
>>>>> v3 -> v4:
>>>>> - adapt to changes in the uapi
>>>>> - add support for leaf parameter
>>>>> - do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
>>>>>     anymore
>>>>>
>>>>> v2 -> v3:
>>>>> - replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync
>>>>>
>>>>> v1 -> v2:
>>>>> - properly pass the asid
>>>>> ---
>>>>>    drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 74
>>>>> +++++++++++++++++++++
>>>>>    1 file changed, 74 insertions(+)
>>>>>
>>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>>> index 4c19a1114de4..df3adc49111c 100644
>>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>>> @@ -2949,6 +2949,79 @@ static void
>>>>> arm_smmu_detach_pasid_table(struct iommu_domain *domain)
>>>>>        mutex_unlock(&smmu_domain->init_mutex);
>>>>>    }
>>>>>    +static int
>>>>> +arm_smmu_cache_invalidate(struct iommu_domain *domain, struct
>>>>> device *dev,
>>>>> +              struct iommu_cache_invalidate_info *inv_info)
>>>>> +{
>>>>> +    struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
>>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>> +    struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>>> +
>>>>> +    if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    if (!smmu)
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
>>>>> +        inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
>>>>> +        return -ENOENT;
>>>>> +    }
>>>>> +
>>>>> +    if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    /* IOTLB invalidation */
>>>>> +
>>>>> +    switch (inv_info->granularity) {
>>>>> +    case IOMMU_INV_GRANU_PASID:
>>>>> +    {
>>>>> +        struct iommu_inv_pasid_info *info =
>>>>> +            &inv_info->granu.pasid_info;
>>>>> +
>>>>> +        if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>>>>> +            return -ENOENT;
>>>>> +        if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
>>>>> +            return -EINVAL;
>>>>> +
>>>>> +        __arm_smmu_tlb_inv_context(smmu_domain, info->archid);
>>>>> +        return 0;
>>>>> +    }
>>>>> +    case IOMMU_INV_GRANU_ADDR:
>>>>> +    {
>>>>> +        struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
>>>>> +        size_t size = info->nb_granules * info->granule_size;
>>>>> +        bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
>>>>> +
>>>>> +        if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>>>>> +            return -ENOENT;
>>>>> +
>>>>> +        if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
>>>>> +            break;
>>>>> +
>>>>> +        arm_smmu_tlb_inv_range_domain(info->addr, size,
>>>>> +                          info->granule_size, leaf,
>>>>> +                          info->archid, smmu_domain);
>>>> Is it possible to add a check before the function to make sure that
>>>> info->granule_size can be recognized by SMMU?
>>>> There is a scenario which will cause TLBI issue: RIL feature is enabled
>>>> on guest but is disabled on host, and then on
>>>> host it just invalidate 4K/2M/1G once a time, but from QEMU,
>>>> info->nb_granules is always 1 and info->granule_size = size,
>>>> if size is not equal to 4K or 2M or 1G (for example size = granule_size
>>>> is 5M), it will only part of the size it wants to invalidate.
>> Do you have any idea about this issue?
> At the QEMU VFIO notifier level, when I fill the struct
> iommu_cache_invalidate_info, I currently miss the granule info, hence
> this weird choice of setting setting info->nb_granules is always 1 and
> info->granule_size = size. I think in arm_smmu_cache_invalidate() I need
> to convert this info back to a the leaf page size, in case the host does
> not support RIL. Just as it is done in  __arm_smmu_tlb_inv_range if RIL
> is supported.
>
> Does it makes sense to you?

Yes, it makes sense to me.
I am glad to test them if the patchset are ready.


>
> Thank you for spotting the issue.
>
> Eric
>>>> I think maybe we can add a check here: if RIL is not enabled and also
>>>> size is not the granule_size (4K/2M/1G) supported by
>>>> SMMU hardware, can we just simply use the smallest granule_size
>>>> supported by hardware all the time?
>>>>
>>>>> +
>>>>> +        arm_smmu_cmdq_issue_sync(smmu);
>>>>> +        return 0;
>>>>> +    }
>>>>> +    case IOMMU_INV_GRANU_DOMAIN:
>>>>> +        break;
>>>> I check the qemu code
>>>> (https://github.com/eauger/qemu/tree/v5.2.0-2stage-rfcv8), for opcode
>>>> CMD_TLBI_NH_ALL or CMD_TLBI_NSNH_ALL from guest OS
>>>> it calls smmu_inv_notifiers_all() to unamp all notifiers of all mr's,
>>>> but it seems not set event.entry.granularity which i think it should set
>>>> IOMMU_INV_GRAN_ADDR.
>>> this is because IOMMU_INV_GRAN_ADDR = 0. But for clarity I should rather
>>> set it explicitly ;-)
>> ok
>>
>>>> BTW, for opcode CMD_TLBI_NH_ALL or CMD_TLBI_NSNH_ALL, it needs to unmap
>>>> size = 0x1000000000000 on 48bit system (it may spend much time),  maybe
>>>> it is better
>>>> to set it as IOMMU_INV_GRANU_DOMAIN, then in host OS, send TLBI_NH_ALL
>>>> directly when IOMMU_INV_GRANU_DOMAIN.
>>> Yes you're right. If the host does not support RIL then it is an issue.
>>> This is going to be fixed in the next version.
>> Great
>>
>>> Thank you for the report!
>>>
>>> Best Regards
>>>
>>> Eric
>>>>> +    default:
>>>>> +        return -EINVAL;
>>>>> +    }
>>>>> +
>>>>> +    /* Global S1 invalidation */
>>>>> +    cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
>>>>> +    arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>>>>> +    arm_smmu_cmdq_issue_sync(smmu);
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>>    static bool arm_smmu_dev_has_feature(struct device *dev,
>>>>>                         enum iommu_dev_features feat)
>>>>>    {
>>>>> @@ -3048,6 +3121,7 @@ static struct iommu_ops arm_smmu_ops = {
>>>>>        .put_resv_regions    = generic_iommu_put_resv_regions,
>>>>>        .attach_pasid_table    = arm_smmu_attach_pasid_table,
>>>>>        .detach_pasid_table    = arm_smmu_detach_pasid_table,
>>>>> +    .cache_invalidate    = arm_smmu_cache_invalidate,
>>>>>        .dev_has_feat        = arm_smmu_dev_has_feature,
>>>>>        .dev_feat_enabled    = arm_smmu_dev_feature_enabled,
>>>>>        .dev_enable_feat    = arm_smmu_dev_enable_feature,
>>> _______________________________________________
>>> Linuxarm mailing list -- linuxarm@openeuler.org
>>> To unsubscribe send an email to linuxarm-leave@openeuler.org
>>
>
> .
>



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2021-02-23 20:56 ` [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs Eric Auger
@ 2021-03-30  9:17   ` Zenghui Yu
  2021-04-01  9:38     ` Auger Eric
  2021-04-01 12:37   ` Kunkun Jiang
  1 sibling, 1 reply; 46+ messages in thread
From: Zenghui Yu @ 2021-03-30  9:17 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, iommu, linux-kernel, kvm, kvmarm, will, maz,
	robin.murphy, joro, alex.williamson, tn, zhukeqian1,
	jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, nicoleotsuka, lushenming, vsethi,
	wanghaibin.wang

On 2021/2/24 4:56, Eric Auger wrote:
> @@ -1936,7 +1950,12 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>   		},
>   	};
>   
> -	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> +	if (ext_asid >= 0) {  /* guest stage 1 invalidation */
> +		cmd.opcode	= smmu_domain->smmu->features & ARM_SMMU_FEAT_E2H ?
> +				  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;

If I understand it correctly, the true nested mode effectively gives us
a *NS-EL1* StreamWorld. We should therefore use CMDQ_OP_TLBI_NH_VA to
invalidate the stage-1 NS-EL1 entries, no matter E2H is selected or not.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than one context descriptor
  2021-02-23 20:56 ` [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than one context descriptor Eric Auger
@ 2021-03-30  9:23   ` Zenghui Yu
  2021-04-01 11:48     ` Auger Eric
  0 siblings, 1 reply; 46+ messages in thread
From: Zenghui Yu @ 2021-03-30  9:23 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, iommu, linux-kernel, kvm, kvmarm, will, maz,
	robin.murphy, joro, alex.williamson, tn, zhukeqian1,
	jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, nicoleotsuka, lushenming, vsethi,
	wanghaibin.wang

Hi Eric,

On 2021/2/24 4:56, Eric Auger wrote:
> In preparation for vSVA, let's accept userspace provided configs
> with more than one CD. We check the max CD against the host iommu
> capability and also the format (linear versus 2 level).
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 ++++++++-----
>   1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 332d31c0680f..ab74a0289893 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3038,14 +3038,17 @@ static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
>   		if (smmu_domain->s1_cfg.set)
>   			goto out;
>   
> -		/*
> -		 * we currently support a single CD so s1fmt and s1dss
> -		 * fields are also ignored
> -		 */
> -		if (cfg->pasid_bits)
> +		list_for_each_entry(master, &smmu_domain->devices, domain_head) {
> +			if (cfg->pasid_bits > master->ssid_bits)
> +				goto out;
> +		}
> +		if (cfg->vendor_data.smmuv3.s1fmt == STRTAB_STE_0_S1FMT_64K_L2 &&
> +				!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
>   			goto out;
>   
>   		smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
> +		smmu_domain->s1_cfg.s1cdmax = cfg->pasid_bits;
> +		smmu_domain->s1_cfg.s1fmt = cfg->vendor_data.smmuv3.s1fmt;

And what about the SIDSS field?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate
  2021-02-23 20:56 ` [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate Eric Auger
       [not found]   ` <c10c6405-5efe-5a41-2b3a-f3af85a528ba@hisilicon.com>
@ 2021-04-01  6:11   ` Zenghui Yu
  2021-04-01 12:06     ` Auger Eric
  1 sibling, 1 reply; 46+ messages in thread
From: Zenghui Yu @ 2021-04-01  6:11 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, iommu, linux-kernel, kvm, kvmarm, will, maz,
	robin.murphy, joro, alex.williamson, tn, zhukeqian1,
	jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, nicoleotsuka, lushenming, vsethi,
	wanghaibin.wang

Hi Eric,

On 2021/2/24 4:56, Eric Auger wrote:
> +static int
> +arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
> +			  struct iommu_cache_invalidate_info *inv_info)
> +{
> +	struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +
> +	if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> +		return -EINVAL;
> +
> +	if (!smmu)
> +		return -EINVAL;
> +
> +	if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> +		return -EINVAL;
> +
> +	if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||

I didn't find any code where we would emulate the CFGI_CD{_ALL} commands
for guest and invalidate the stale CD entries on the physical side. Is
PASID-cache type designed for that effect?

> +	    inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
> +		return -ENOENT;
> +	}
> +
> +	if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
> +		return -EINVAL;
> +
> +	/* IOTLB invalidation */
> +
> +	switch (inv_info->granularity) {
> +	case IOMMU_INV_GRANU_PASID:
> +	{
> +		struct iommu_inv_pasid_info *info =
> +			&inv_info->granu.pasid_info;
> +
> +		if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
> +			return -ENOENT;
> +		if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
> +			return -EINVAL;
> +
> +		__arm_smmu_tlb_inv_context(smmu_domain, info->archid);
> +		return 0;
> +	}
> +	case IOMMU_INV_GRANU_ADDR:
> +	{
> +		struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
> +		size_t size = info->nb_granules * info->granule_size;
> +		bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
> +
> +		if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
> +			return -ENOENT;
> +
> +		if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
> +			break;
> +
> +		arm_smmu_tlb_inv_range_domain(info->addr, size,
> +					      info->granule_size, leaf,
> +					      info->archid, smmu_domain);
> +
> +		arm_smmu_cmdq_issue_sync(smmu);

There is no need to issue one more SYNC.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2021-03-30  9:17   ` Zenghui Yu
@ 2021-04-01  9:38     ` Auger Eric
  0 siblings, 0 replies; 46+ messages in thread
From: Auger Eric @ 2021-04-01  9:38 UTC (permalink / raw)
  To: Zenghui Yu
  Cc: eric.auger.pro, iommu, linux-kernel, kvm, kvmarm, will, maz,
	robin.murphy, joro, alex.williamson, tn, zhukeqian1,
	jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, nicoleotsuka, lushenming, vsethi,
	wanghaibin.wang

Hi Zenghui,

On 3/30/21 11:17 AM, Zenghui Yu wrote:
> On 2021/2/24 4:56, Eric Auger wrote:
>> @@ -1936,7 +1950,12 @@ static void
>> arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>>           },
>>       };
>>   -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>> +    if (ext_asid >= 0) {  /* guest stage 1 invalidation */
>> +        cmd.opcode    = smmu_domain->smmu->features &
>> ARM_SMMU_FEAT_E2H ?
>> +                  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
> 
> If I understand it correctly, the true nested mode effectively gives us
> a *NS-EL1* StreamWorld. We should therefore use CMDQ_OP_TLBI_NH_VA to
> invalidate the stage-1 NS-EL1 entries, no matter E2H is selected or not.
> 

Yes at the moment you're right. Support for nested virt may induce some
changes here but we are not there. I will fix it and add a comment.
Thank you!

Best Regards

Eric


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than one context descriptor
  2021-03-30  9:23   ` Zenghui Yu
@ 2021-04-01 11:48     ` Auger Eric
  2021-04-01 12:38       ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 46+ messages in thread
From: Auger Eric @ 2021-04-01 11:48 UTC (permalink / raw)
  To: Zenghui Yu
  Cc: eric.auger.pro, iommu, linux-kernel, kvm, kvmarm, will, maz,
	robin.murphy, joro, alex.williamson, tn, zhukeqian1,
	jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, nicoleotsuka, lushenming, vsethi,
	wanghaibin.wang

Hi Zenghui,

On 3/30/21 11:23 AM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> In preparation for vSVA, let's accept userspace provided configs
>> with more than one CD. We check the max CD against the host iommu
>> capability and also the format (linear versus 2 level).
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 ++++++++-----
>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 332d31c0680f..ab74a0289893 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -3038,14 +3038,17 @@ static int arm_smmu_attach_pasid_table(struct
>> iommu_domain *domain,
>>           if (smmu_domain->s1_cfg.set)
>>               goto out;
>>   -        /*
>> -         * we currently support a single CD so s1fmt and s1dss
>> -         * fields are also ignored
>> -         */
>> -        if (cfg->pasid_bits)
>> +        list_for_each_entry(master, &smmu_domain->devices,
>> domain_head) {
>> +            if (cfg->pasid_bits > master->ssid_bits)
>> +                goto out;
>> +        }
>> +        if (cfg->vendor_data.smmuv3.s1fmt ==
>> STRTAB_STE_0_S1FMT_64K_L2 &&
>> +                !(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
>>               goto out;
>>             smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
>> +        smmu_domain->s1_cfg.s1cdmax = cfg->pasid_bits;
>> +        smmu_domain->s1_cfg.s1fmt = cfg->vendor_data.smmuv3.s1fmt;
> 
> And what about the SIDSS field?
> 
I added this patch upon Shameer's request, to be more vSVA friendly.
Hower this series does not really target multiple CD support. At the
moment the driver only supports STRTAB_STE_1_S1DSS_SSID0 (0x2) I think.
At this moment maybe I can only check the s1dss field is 0x2. Or simply
removes this patch?

Thoughts?

Eric


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate
  2021-04-01  6:11   ` Zenghui Yu
@ 2021-04-01 12:06     ` Auger Eric
  0 siblings, 0 replies; 46+ messages in thread
From: Auger Eric @ 2021-04-01 12:06 UTC (permalink / raw)
  To: Zenghui Yu
  Cc: eric.auger.pro, iommu, linux-kernel, kvm, kvmarm, will, maz,
	robin.murphy, joro, alex.williamson, tn, zhukeqian1,
	jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, nicoleotsuka, lushenming, vsethi,
	wanghaibin.wang

Hi Zenghui,

On 4/1/21 8:11 AM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> +static int
>> +arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device
>> *dev,
>> +              struct iommu_cache_invalidate_info *inv_info)
>> +{
>> +    struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +    struct arm_smmu_device *smmu = smmu_domain->smmu;
>> +
>> +    if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>> +        return -EINVAL;
>> +
>> +    if (!smmu)
>> +        return -EINVAL;
>> +
>> +    if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
>> +        return -EINVAL;
>> +
>> +    if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
> 
> I didn't find any code where we would emulate the CFGI_CD{_ALL} commands
> for guest and invalidate the stale CD entries on the physical side. Is
> PASID-cache type designed for that effect?
Yes it is. PASID-cache matches the CD table.
> 
>> +        inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
>> +        return -ENOENT;
>> +    }
>> +
>> +    if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
>> +        return -EINVAL;
>> +
>> +    /* IOTLB invalidation */
>> +
>> +    switch (inv_info->granularity) {
>> +    case IOMMU_INV_GRANU_PASID:
>> +    {
>> +        struct iommu_inv_pasid_info *info =
>> +            &inv_info->granu.pasid_info;
>> +
>> +        if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>> +            return -ENOENT;
>> +        if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
>> +            return -EINVAL;
>> +
>> +        __arm_smmu_tlb_inv_context(smmu_domain, info->archid);
>> +        return 0;
>> +    }
>> +    case IOMMU_INV_GRANU_ADDR:
>> +    {
>> +        struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
>> +        size_t size = info->nb_granules * info->granule_size;
>> +        bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
>> +
>> +        if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>> +            return -ENOENT;
>> +
>> +        if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
>> +            break;
>> +
>> +        arm_smmu_tlb_inv_range_domain(info->addr, size,
>> +                          info->granule_size, leaf,
>> +                          info->archid, smmu_domain);
>> +
>> +        arm_smmu_cmdq_issue_sync(smmu);
> 
> There is no need to issue one more SYNC.
Hum yes I did not notice it was made by the arm_smmu_cmdq_issue_cmdlist()

Thanks!

Eric
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2021-02-23 20:56 ` [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs Eric Auger
  2021-03-30  9:17   ` Zenghui Yu
@ 2021-04-01 12:37   ` Kunkun Jiang
  2021-04-08 12:30     ` Auger Eric
  1 sibling, 1 reply; 46+ messages in thread
From: Kunkun Jiang @ 2021-04-01 12:37 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jean-philippe,
	zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi, wanghaibin.wang, Keqian Zhu

Hi Eric,

On 2021/2/24 4:56, Eric Auger wrote:
> With nested stage support, soon we will need to invalidate
> S1 contexts and ranges tagged with an unmanaged asid, this
> latter being managed by the guest. So let's introduce 2 helpers
> that allow to invalidate with externally managed ASIDs
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>
> ---
>
> v13 -> v14
> - Actually send the NH_ASID command (reported by Xingang Wang)
> ---
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38 ++++++++++++++++-----
>   1 file changed, 29 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 5579ec4fccc8..4c19a1114de4 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1843,9 +1843,9 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
>   }
>   
>   /* IO_PGTABLE API */
> -static void arm_smmu_tlb_inv_context(void *cookie)
> +static void __arm_smmu_tlb_inv_context(struct arm_smmu_domain *smmu_domain,
> +				       int ext_asid)
>   {
> -	struct arm_smmu_domain *smmu_domain = cookie;
>   	struct arm_smmu_device *smmu = smmu_domain->smmu;
>   	struct arm_smmu_cmdq_ent cmd;
>   
> @@ -1856,7 +1856,13 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>   	 * insertion to guarantee those are observed before the TLBI. Do be
>   	 * careful, 007.
>   	 */
> -	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> +	if (ext_asid >= 0) { /* guest stage 1 invalidation */
> +		cmd.opcode	= CMDQ_OP_TLBI_NH_ASID;
> +		cmd.tlbi.asid	= ext_asid;
> +		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
> +		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
> +		arm_smmu_cmdq_issue_sync(smmu);
> +	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>   		arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
>   	} else {
>   		cmd.opcode	= CMDQ_OP_TLBI_S12_VMALL;
> @@ -1867,6 +1873,13 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>   	arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
>   }
>   
> +static void arm_smmu_tlb_inv_context(void *cookie)
> +{
> +	struct arm_smmu_domain *smmu_domain = cookie;
> +
> +	__arm_smmu_tlb_inv_context(smmu_domain, -1);
> +}
> +
>   static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
>   				     unsigned long iova, size_t size,
>   				     size_t granule,
> @@ -1926,9 +1939,10 @@ static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
>   	arm_smmu_cmdq_batch_submit(smmu, &cmds);
>   }
>   
Here is the part of code in __arm_smmu_tlb_inv_range():
>         if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
>                 /* Get the leaf page size */
>                 tg = __ffs(smmu_domain->domain.pgsize_bitmap);
>
>                 /* Convert page size of 12,14,16 (log2) to 1,2,3 */
>                 cmd->tlbi.tg = (tg - 10) / 2;
>
>                 /* Determine what level the granule is at */
>                 cmd->tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));
>
>                 num_pages = size >> tg;
>         }
When pSMMU supports RIL, we get the leaf page size by __ffs(smmu_domain->
domain.pgsize_bitmap). In nested mode, it is determined by host 
PAGE_SIZE. If
the host kernel and guest kernel has different translation granule (e.g. 
host 16K,
guest 4K), __arm_smmu_tlb_inv_range() will issue an incorrect tlbi command.

Do you have any idea about this issue?

Best Regards,
Kunkun Jiang
> -static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
> -					  size_t granule, bool leaf,
> -					  struct arm_smmu_domain *smmu_domain)
> +static void
> +arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
> +			      size_t granule, bool leaf, int ext_asid,
> +			      struct arm_smmu_domain *smmu_domain)
>   {
>   	struct arm_smmu_cmdq_ent cmd = {
>   		.tlbi = {
> @@ -1936,7 +1950,12 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>   		},
>   	};
>   
> -	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> +	if (ext_asid >= 0) {  /* guest stage 1 invalidation */
> +		cmd.opcode	= smmu_domain->smmu->features & ARM_SMMU_FEAT_E2H ?
> +				  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
> +		cmd.tlbi.asid	= ext_asid;
> +		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
> +	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>   		cmd.opcode	= smmu_domain->smmu->features & ARM_SMMU_FEAT_E2H ?
>   				  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
>   		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
> @@ -1944,6 +1963,7 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>   		cmd.opcode	= CMDQ_OP_TLBI_S2_IPA;
>   		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
>   	}
> +
>   	__arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain);
>   
>   	/*
> @@ -1982,7 +2002,7 @@ static void arm_smmu_tlb_inv_page_nosync(struct iommu_iotlb_gather *gather,
>   static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size,
>   				  size_t granule, void *cookie)
>   {
> -	arm_smmu_tlb_inv_range_domain(iova, size, granule, false, cookie);
> +	arm_smmu_tlb_inv_range_domain(iova, size, granule, false, -1, cookie);
>   }
>   
>   static const struct iommu_flush_ops arm_smmu_flush_ops = {
> @@ -2523,7 +2543,7 @@ static void arm_smmu_iotlb_sync(struct iommu_domain *domain,
>   
>   	arm_smmu_tlb_inv_range_domain(gather->start,
>   				      gather->end - gather->start + 1,
> -				      gather->pgsize, true, smmu_domain);
> +				      gather->pgsize, true, -1, smmu_domain);
>   }
>   
>   static phys_addr_t



^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than one context descriptor
  2021-04-01 11:48     ` Auger Eric
@ 2021-04-01 12:38       ` Shameerali Kolothum Thodi
  2021-04-01 13:20         ` Auger Eric
  0 siblings, 1 reply; 46+ messages in thread
From: Shameerali Kolothum Thodi @ 2021-04-01 12:38 UTC (permalink / raw)
  To: Auger Eric, yuzenghui
  Cc: eric.auger.pro, iommu, linux-kernel, kvm, kvmarm, will, maz,
	robin.murphy, joro, alex.williamson, tn, zhukeqian,
	jacob.jun.pan, yi.l.liu, wangxingang, jiangkunkun, jean-philippe,
	zhangfei.gao, zhangfei.gao, vivek.gautam, nicoleotsuka,
	lushenming, vsethi, Wanghaibin (D)



> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: 01 April 2021 12:49
> To: yuzenghui <yuzenghui@huawei.com>
> Cc: eric.auger.pro@gmail.com; iommu@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org; kvm@vger.kernel.org;
> kvmarm@lists.cs.columbia.edu; will@kernel.org; maz@kernel.org;
> robin.murphy@arm.com; joro@8bytes.org; alex.williamson@redhat.com;
> tn@semihalf.com; zhukeqian <zhukeqian1@huawei.com>;
> jacob.jun.pan@linux.intel.com; yi.l.liu@intel.com; wangxingang
> <wangxingang5@huawei.com>; jiangkunkun <jiangkunkun@huawei.com>;
> jean-philippe@linaro.org; zhangfei.gao@linaro.org; zhangfei.gao@gmail.com;
> vivek.gautam@arm.com; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; nicoleotsuka@gmail.com;
> lushenming <lushenming@huawei.com>; vsethi@nvidia.com; Wanghaibin (D)
> <wanghaibin.wang@huawei.com>
> Subject: Re: [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than
> one context descriptor
> 
> Hi Zenghui,
> 
> On 3/30/21 11:23 AM, Zenghui Yu wrote:
> > Hi Eric,
> >
> > On 2021/2/24 4:56, Eric Auger wrote:
> >> In preparation for vSVA, let's accept userspace provided configs
> >> with more than one CD. We check the max CD against the host iommu
> >> capability and also the format (linear versus 2 level).
> >>
> >> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >> Signed-off-by: Shameer Kolothum
> <shameerali.kolothum.thodi@huawei.com>
> >> ---
> >>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 ++++++++-----
> >>   1 file changed, 8 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >> index 332d31c0680f..ab74a0289893 100644
> >> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >> @@ -3038,14 +3038,17 @@ static int
> arm_smmu_attach_pasid_table(struct
> >> iommu_domain *domain,
> >>           if (smmu_domain->s1_cfg.set)
> >>               goto out;
> >>   -        /*
> >> -         * we currently support a single CD so s1fmt and s1dss
> >> -         * fields are also ignored
> >> -         */
> >> -        if (cfg->pasid_bits)
> >> +        list_for_each_entry(master, &smmu_domain->devices,
> >> domain_head) {
> >> +            if (cfg->pasid_bits > master->ssid_bits)
> >> +                goto out;
> >> +        }
> >> +        if (cfg->vendor_data.smmuv3.s1fmt ==
> >> STRTAB_STE_0_S1FMT_64K_L2 &&
> >> +                !(smmu->features &
> ARM_SMMU_FEAT_2_LVL_CDTAB))
> >>               goto out;
> >>             smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
> >> +        smmu_domain->s1_cfg.s1cdmax = cfg->pasid_bits;
> >> +        smmu_domain->s1_cfg.s1fmt =
> cfg->vendor_data.smmuv3.s1fmt;
> >
> > And what about the SIDSS field?
> >
> I added this patch upon Shameer's request, to be more vSVA friendly.
> Hower this series does not really target multiple CD support. At the
> moment the driver only supports STRTAB_STE_1_S1DSS_SSID0 (0x2) I think.
> At this moment maybe I can only check the s1dss field is 0x2. Or simply
> removes this patch?
> 
> Thoughts?

Right. This was useful for vSVA tests. But yes, to properly support multiple CDs
we need to pass the S1DSS from Qemu. And that requires further changes.
So I think it's better to remove this patch and reject S1CDMAX != 0 cases.

Thanks,
Shameer
   
> 
> Eric


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than one context descriptor
  2021-04-01 12:38       ` Shameerali Kolothum Thodi
@ 2021-04-01 13:20         ` Auger Eric
  0 siblings, 0 replies; 46+ messages in thread
From: Auger Eric @ 2021-04-01 13:20 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, yuzenghui
  Cc: eric.auger.pro, iommu, linux-kernel, kvm, kvmarm, will, maz,
	robin.murphy, joro, alex.williamson, tn, zhukeqian,
	jacob.jun.pan, yi.l.liu, wangxingang, jiangkunkun, jean-philippe,
	zhangfei.gao, zhangfei.gao, vivek.gautam, nicoleotsuka,
	lushenming, vsethi, Wanghaibin (D)

Hi Shameer,
On 4/1/21 2:38 PM, Shameerali Kolothum Thodi wrote:
> 
> 
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.auger@redhat.com]
>> Sent: 01 April 2021 12:49
>> To: yuzenghui <yuzenghui@huawei.com>
>> Cc: eric.auger.pro@gmail.com; iommu@lists.linux-foundation.org;
>> linux-kernel@vger.kernel.org; kvm@vger.kernel.org;
>> kvmarm@lists.cs.columbia.edu; will@kernel.org; maz@kernel.org;
>> robin.murphy@arm.com; joro@8bytes.org; alex.williamson@redhat.com;
>> tn@semihalf.com; zhukeqian <zhukeqian1@huawei.com>;
>> jacob.jun.pan@linux.intel.com; yi.l.liu@intel.com; wangxingang
>> <wangxingang5@huawei.com>; jiangkunkun <jiangkunkun@huawei.com>;
>> jean-philippe@linaro.org; zhangfei.gao@linaro.org; zhangfei.gao@gmail.com;
>> vivek.gautam@arm.com; Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; nicoleotsuka@gmail.com;
>> lushenming <lushenming@huawei.com>; vsethi@nvidia.com; Wanghaibin (D)
>> <wanghaibin.wang@huawei.com>
>> Subject: Re: [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than
>> one context descriptor
>>
>> Hi Zenghui,
>>
>> On 3/30/21 11:23 AM, Zenghui Yu wrote:
>>> Hi Eric,
>>>
>>> On 2021/2/24 4:56, Eric Auger wrote:
>>>> In preparation for vSVA, let's accept userspace provided configs
>>>> with more than one CD. We check the max CD against the host iommu
>>>> capability and also the format (linear versus 2 level).
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>> Signed-off-by: Shameer Kolothum
>> <shameerali.kolothum.thodi@huawei.com>
>>>> ---
>>>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 ++++++++-----
>>>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> index 332d31c0680f..ab74a0289893 100644
>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> @@ -3038,14 +3038,17 @@ static int
>> arm_smmu_attach_pasid_table(struct
>>>> iommu_domain *domain,
>>>>           if (smmu_domain->s1_cfg.set)
>>>>               goto out;
>>>>   -        /*
>>>> -         * we currently support a single CD so s1fmt and s1dss
>>>> -         * fields are also ignored
>>>> -         */
>>>> -        if (cfg->pasid_bits)
>>>> +        list_for_each_entry(master, &smmu_domain->devices,
>>>> domain_head) {
>>>> +            if (cfg->pasid_bits > master->ssid_bits)
>>>> +                goto out;
>>>> +        }
>>>> +        if (cfg->vendor_data.smmuv3.s1fmt ==
>>>> STRTAB_STE_0_S1FMT_64K_L2 &&
>>>> +                !(smmu->features &
>> ARM_SMMU_FEAT_2_LVL_CDTAB))
>>>>               goto out;
>>>>             smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
>>>> +        smmu_domain->s1_cfg.s1cdmax = cfg->pasid_bits;
>>>> +        smmu_domain->s1_cfg.s1fmt =
>> cfg->vendor_data.smmuv3.s1fmt;
>>>
>>> And what about the SIDSS field?
>>>
>> I added this patch upon Shameer's request, to be more vSVA friendly.
>> Hower this series does not really target multiple CD support. At the
>> moment the driver only supports STRTAB_STE_1_S1DSS_SSID0 (0x2) I think.
>> At this moment maybe I can only check the s1dss field is 0x2. Or simply
>> removes this patch?
>>
>> Thoughts?
> 
> Right. This was useful for vSVA tests. But yes, to properly support multiple CDs
> we need to pass the S1DSS from Qemu. And that requires further changes.
> So I think it's better to remove this patch and reject S1CDMAX != 0 cases.
OK I will remove it

Thanks

Eric
> 
> Thanks,
> Shameer
>    
>>
>> Eric
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 08/13] dma-iommu: Implement NESTED_MSI cookie
  2021-02-23 20:56 ` [PATCH v14 08/13] dma-iommu: Implement NESTED_MSI cookie Eric Auger
@ 2021-04-07  7:39   ` Zenghui Yu
  2021-04-10 13:34     ` Auger Eric
  0 siblings, 1 reply; 46+ messages in thread
From: Zenghui Yu @ 2021-04-07  7:39 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, iommu, linux-kernel, kvm, kvmarm, will, maz,
	robin.murphy, joro, alex.williamson, tn, zhukeqian1,
	jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, nicoleotsuka, lushenming, vsethi,
	wanghaibin.wang

Hi Eric,

On 2021/2/24 4:56, Eric Auger wrote:
> Up to now, when the type was UNMANAGED, we used to
> allocate IOVA pages within a reserved IOVA MSI range.
> 
> If both the host and the guest are exposed with SMMUs, each
> would allocate an IOVA. The guest allocates an IOVA (gIOVA)
> to map onto the guest MSI doorbell (gDB). The Host allocates
> another IOVA (hIOVA) to map onto the physical doorbell (hDB).
> 
> So we end up with 2 unrelated mappings, at S1 and S2:
>           S1             S2
> gIOVA    ->     gDB
>                 hIOVA    ->    hDB
> 
> The PCI device would be programmed with hIOVA.
> No stage 1 mapping would existing, causing the MSIs to fault.
> 
> iommu_dma_bind_guest_msi() allows to pass gIOVA/gDB
> to the host so that gIOVA can be used by the host instead of
> re-allocating a new hIOVA.
> 
>           S1           S2
> gIOVA    ->    gDB    ->    hDB
> 
> this time, the PCI device can be programmed with the gIOVA MSI
> doorbell which is correctly mapped through both stages.
> 
> Nested mode is not compatible with HW MSI regions as in that
> case gDB and hDB should have a 1-1 mapping. This check will
> be done when attaching each device to the IOMMU domain.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

[...]

> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index f659395e7959..d25eb7cecaa7 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -19,6 +19,7 @@
>   #include <linux/irq.h>
>   #include <linux/mm.h>
>   #include <linux/mutex.h>
> +#include <linux/mutex.h>

Duplicated include.

>   #include <linux/pci.h>
>   #include <linux/swiotlb.h>
>   #include <linux/scatterlist.h>
> @@ -29,12 +30,15 @@
>   struct iommu_dma_msi_page {
>   	struct list_head	list;
>   	dma_addr_t		iova;
> +	dma_addr_t		gpa;
>   	phys_addr_t		phys;
> +	size_t			s1_granule;
>   };
>   
>   enum iommu_dma_cookie_type {
>   	IOMMU_DMA_IOVA_COOKIE,
>   	IOMMU_DMA_MSI_COOKIE,
> +	IOMMU_DMA_NESTED_MSI_COOKIE,
>   };
>   
>   struct iommu_dma_cookie {
> @@ -46,6 +50,7 @@ struct iommu_dma_cookie {
>   		dma_addr_t		msi_iova;

msi_iova is unused in the nested mode, but we still set it to the start
address of the RESV_SW_MSI region (in iommu_get_msi_cookie()), which
looks a bit strange to me.

>   	};
>   	struct list_head		msi_page_list;
> +	spinlock_t			msi_lock;

Should msi_lock be grabbed everywhere msi_page_list is populated?
Especially in iommu_dma_get_msi_page(), which can be invoked from the
irqchip driver.

>   
>   	/* Domain for flush queue callback; NULL if flush queue not in use */
>   	struct iommu_domain		*fq_domain;
> @@ -87,6 +92,7 @@ static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
>   
>   	cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
>   	if (cookie) {
> +		spin_lock_init(&cookie->msi_lock);
>   		INIT_LIST_HEAD(&cookie->msi_page_list);
>   		cookie->type = type;
>   	}
> @@ -120,14 +126,17 @@ EXPORT_SYMBOL(iommu_get_dma_cookie);
>    *
>    * Users who manage their own IOVA allocation and do not want DMA API support,
>    * but would still like to take advantage of automatic MSI remapping, can use
> - * this to initialise their own domain appropriately. Users should reserve a
> + * this to initialise their own domain appropriately. Users may reserve a
>    * contiguous IOVA region, starting at @base, large enough to accommodate the
>    * number of PAGE_SIZE mappings necessary to cover every MSI doorbell address
> - * used by the devices attached to @domain.
> + * used by the devices attached to @domain. The other way round is to provide
> + * usable iova pages through the iommu_dma_bind_doorbell API (nested stages

s/iommu_dma_bind_doorbell/iommu_dma_bind_guest_msi/ ?

> + * use case)
>    */
>   int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
>   {
>   	struct iommu_dma_cookie *cookie;
> +	int nesting, ret;
>   
>   	if (domain->type != IOMMU_DOMAIN_UNMANAGED)
>   		return -EINVAL;
> @@ -135,7 +144,12 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
>   	if (domain->iova_cookie)
>   		return -EEXIST;
>   
> -	cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
> +	ret =  iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &nesting);

Redundant space.

> +	if (!ret && nesting)
> +		cookie = cookie_alloc(IOMMU_DMA_NESTED_MSI_COOKIE);
> +	else
> +		cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
> +
>   	if (!cookie)
>   		return -ENOMEM;
>   
> @@ -156,6 +170,7 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
>   {
>   	struct iommu_dma_cookie *cookie = domain->iova_cookie;
>   	struct iommu_dma_msi_page *msi, *tmp;
> +	bool s2_unmap = false;
>   
>   	if (!cookie)
>   		return;
> @@ -163,7 +178,15 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
>   	if (cookie->type == IOMMU_DMA_IOVA_COOKIE && cookie->iovad.granule)
>   		put_iova_domain(&cookie->iovad);
>   
> +	if (cookie->type == IOMMU_DMA_NESTED_MSI_COOKIE)
> +		s2_unmap = true;
> +
>   	list_for_each_entry_safe(msi, tmp, &cookie->msi_page_list, list) {
> +		if (s2_unmap && msi->phys) {

I don't think @s2_unmap is necessary. Checking 'cookie->type==NESTED'
directly shouldn't be that expensive.

> +			size_t size = cookie_msi_granule(cookie);
> +
> +			WARN_ON(iommu_unmap(domain, msi->gpa, size) != size);
> +		}
>   		list_del(&msi->list);
>   		kfree(msi);
>   	}
> @@ -172,6 +195,92 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
>   }
>   EXPORT_SYMBOL(iommu_put_dma_cookie);
>   
> +/**
> + * iommu_dma_bind_guest_msi - Allows to pass the stage 1
> + * binding of a virtual MSI doorbell used by @dev.
> + *
> + * @domain: domain handle
> + * @iova: guest iova

Can we change it to 'giova' (to match the unbind side)?

> + * @gpa: gpa of the virtual doorbell
> + * @size: size of the granule used for the stage1 mapping
> + *
> + * In nested stage use case, the user can provide IOVA/IPA bindings
> + * corresponding to a guest MSI stage 1 mapping. When the host needs
> + * to map its own MSI doorbells, it can use @gpa as stage 2 input
> + * and map it onto the physical MSI doorbell.
> + */
> +int iommu_dma_bind_guest_msi(struct iommu_domain *domain,
> +			     dma_addr_t iova, phys_addr_t gpa, size_t size)
> +{
> +	struct iommu_dma_cookie *cookie = domain->iova_cookie;
> +	struct iommu_dma_msi_page *msi;
> +	int ret = 0;
> +
> +	if (!cookie)
> +		return -EINVAL;
> +
> +	if (cookie->type != IOMMU_DMA_NESTED_MSI_COOKIE)
> +		return -EINVAL;
> +
> +	iova = iova & ~(dma_addr_t)(size - 1);
> +	gpa = gpa & ~(phys_addr_t)(size - 1);
> +
> +	spin_lock(&cookie->msi_lock);
> +
> +	list_for_each_entry(msi, &cookie->msi_page_list, list) {
> +		if (msi->iova == iova)
> +			goto unlock; /* this page is already registered */
> +	}
> +
> +	msi = kzalloc(sizeof(*msi), GFP_ATOMIC);
> +	if (!msi) {
> +		ret = -ENOMEM;
> +		goto unlock;
> +	}
> +
> +	msi->iova = iova;
> +	msi->gpa = gpa;
> +	msi->s1_granule = size;
> +	list_add(&msi->list, &cookie->msi_page_list);
> +unlock:
> +	spin_unlock(&cookie->msi_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL(iommu_dma_bind_guest_msi);
> +
> +void iommu_dma_unbind_guest_msi(struct iommu_domain *domain, dma_addr_t giova)
> +{
> +	struct iommu_dma_cookie *cookie = domain->iova_cookie;
> +	struct iommu_dma_msi_page *msi;
> +
> +	if (!cookie)
> +		return;
> +
> +	if (cookie->type != IOMMU_DMA_NESTED_MSI_COOKIE)
> +		return;
> +
> +	spin_lock(&cookie->msi_lock);
> +
> +	list_for_each_entry(msi, &cookie->msi_page_list, list) {
> +		dma_addr_t aligned_giova =
> +			giova & ~(dma_addr_t)(msi->s1_granule - 1);
> +
> +		if (msi->iova == aligned_giova) {
> +			if (msi->phys) {
> +				/* unmap the stage 2 */
> +				size_t size = cookie_msi_granule(cookie);
> +
> +				WARN_ON(iommu_unmap(domain, msi->gpa, size) != size);
> +			}
> +			list_del(&msi->list);
> +			kfree(msi);
> +			break;
> +		}
> +	}
> +	spin_unlock(&cookie->msi_lock);
> +}
> +EXPORT_SYMBOL(iommu_dma_unbind_guest_msi);
> +
>   /**
>    * iommu_dma_get_resv_regions - Reserved region driver helper
>    * @dev: Device from iommu_get_resv_regions()
> @@ -1343,6 +1452,33 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
>   		if (msi_page->phys == msi_addr)
>   			return msi_page;
>   
> +	/*
> +	 * In nested stage mode, we do not allocate an MSI page in
> +	 * a range provided by the user. Instead, IOVA/IPA bindings are
> +	 * individually provided. We reuse thise IOVAs to build the

s/thise/these/

> +	 * GIOVA -> GPA -> MSI HPA nested stage mapping.
> +	 */
> +	if (cookie->type == IOMMU_DMA_NESTED_MSI_COOKIE) {
> +		list_for_each_entry(msi_page, &cookie->msi_page_list, list)
> +			if (!msi_page->phys) {
> +				int ret;
> +
> +				/* do the stage 2 mapping */
> +				ret = iommu_map(domain,
> +						msi_page->gpa, msi_addr, size,

Shouldn't we make sure that the size of S2 mapping is not less than
s1_granule? Although what we need is actually a 32-bit TRANSLATER
register, we don't know where it is mapped in S1.

> +						IOMMU_MMIO | IOMMU_WRITE);

Is it intentional to drop the IOMMU_NOEXEC flag (from @prot)?

> +				if (ret) {
> +					pr_warn("MSI S2 mapping failed (%d)\n",
> +						ret);
> +					return NULL;
> +				}
> +				msi_page->phys = msi_addr;
> +				return msi_page;
> +			}
> +		pr_warn("%s no MSI binding found\n", __func__);
> +		return NULL;
> +	}
> +
>   	msi_page = kzalloc(sizeof(*msi_page), GFP_KERNEL);
>   	if (!msi_page)
>   		return NULL;


Thanks,
Zenghui

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2021-04-01 12:37   ` Kunkun Jiang
@ 2021-04-08 12:30     ` Auger Eric
  2021-04-09  4:48       ` Kunkun Jiang
  0 siblings, 1 reply; 46+ messages in thread
From: Auger Eric @ 2021-04-08 12:30 UTC (permalink / raw)
  To: Kunkun Jiang, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jean-philippe,
	zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi, wanghaibin.wang

Hi Kunkun,

On 4/1/21 2:37 PM, Kunkun Jiang wrote:
> Hi Eric,
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> With nested stage support, soon we will need to invalidate
>> S1 contexts and ranges tagged with an unmanaged asid, this
>> latter being managed by the guest. So let's introduce 2 helpers
>> that allow to invalidate with externally managed ASIDs
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>>
>> v13 -> v14
>> - Actually send the NH_ASID command (reported by Xingang Wang)
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38 ++++++++++++++++-----
>>   1 file changed, 29 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 5579ec4fccc8..4c19a1114de4 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1843,9 +1843,9 @@ int arm_smmu_atc_inv_domain(struct
>> arm_smmu_domain *smmu_domain, int ssid,
>>   }
>>     /* IO_PGTABLE API */
>> -static void arm_smmu_tlb_inv_context(void *cookie)
>> +static void __arm_smmu_tlb_inv_context(struct arm_smmu_domain
>> *smmu_domain,
>> +                       int ext_asid)
>>   {
>> -    struct arm_smmu_domain *smmu_domain = cookie;
>>       struct arm_smmu_device *smmu = smmu_domain->smmu;
>>       struct arm_smmu_cmdq_ent cmd;
>>   @@ -1856,7 +1856,13 @@ static void arm_smmu_tlb_inv_context(void
>> *cookie)
>>        * insertion to guarantee those are observed before the TLBI. Do be
>>        * careful, 007.
>>        */
>> -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>> +    if (ext_asid >= 0) { /* guest stage 1 invalidation */
>> +        cmd.opcode    = CMDQ_OP_TLBI_NH_ASID;
>> +        cmd.tlbi.asid    = ext_asid;
>> +        cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>> +        arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>> +        arm_smmu_cmdq_issue_sync(smmu);
>> +    } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>           arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
>>       } else {
>>           cmd.opcode    = CMDQ_OP_TLBI_S12_VMALL;
>> @@ -1867,6 +1873,13 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>>       arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
>>   }
>>   +static void arm_smmu_tlb_inv_context(void *cookie)
>> +{
>> +    struct arm_smmu_domain *smmu_domain = cookie;
>> +
>> +    __arm_smmu_tlb_inv_context(smmu_domain, -1);
>> +}
>> +
>>   static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
>>                        unsigned long iova, size_t size,
>>                        size_t granule,
>> @@ -1926,9 +1939,10 @@ static void __arm_smmu_tlb_inv_range(struct
>> arm_smmu_cmdq_ent *cmd,
>>       arm_smmu_cmdq_batch_submit(smmu, &cmds);
>>   }
>>   
> Here is the part of code in __arm_smmu_tlb_inv_range():
>>         if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
>>                 /* Get the leaf page size */
>>                 tg = __ffs(smmu_domain->domain.pgsize_bitmap);
>>
>>                 /* Convert page size of 12,14,16 (log2) to 1,2,3 */
>>                 cmd->tlbi.tg = (tg - 10) / 2;
>>
>>                 /* Determine what level the granule is at */
>>                 cmd->tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));
>>
>>                 num_pages = size >> tg;
>>         }
> When pSMMU supports RIL, we get the leaf page size by __ffs(smmu_domain->
> domain.pgsize_bitmap). In nested mode, it is determined by host
> PAGE_SIZE. If
> the host kernel and guest kernel has different translation granule (e.g.
> host 16K,
> guest 4K), __arm_smmu_tlb_inv_range() will issue an incorrect tlbi command.
> 
> Do you have any idea about this issue?

I think this is the same issue as the one reported by Chenxiang

https://lore.kernel.org/lkml/15938ed5-2095-e903-a290-333c299015a2@hisilicon.com/

In case RIL is not supported by the host, next version will use the
smallest pSMMU supported page size, as done in __arm_smmu_tlb_inv_range

Thanks

Eric

> 
> Best Regards,
> Kunkun Jiang
>> -static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t
>> size,
>> -                      size_t granule, bool leaf,
>> -                      struct arm_smmu_domain *smmu_domain)
>> +static void
>> +arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>> +                  size_t granule, bool leaf, int ext_asid,
>> +                  struct arm_smmu_domain *smmu_domain)
>>   {
>>       struct arm_smmu_cmdq_ent cmd = {
>>           .tlbi = {
>> @@ -1936,7 +1950,12 @@ static void
>> arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>>           },
>>       };
>>   -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>> +    if (ext_asid >= 0) {  /* guest stage 1 invalidation */
>> +        cmd.opcode    = smmu_domain->smmu->features &
>> ARM_SMMU_FEAT_E2H ?
>> +                  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
>> +        cmd.tlbi.asid    = ext_asid;
>> +        cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>> +    } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>           cmd.opcode    = smmu_domain->smmu->features &
>> ARM_SMMU_FEAT_E2H ?
>>                     CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
>>           cmd.tlbi.asid    = smmu_domain->s1_cfg.cd.asid;
>> @@ -1944,6 +1963,7 @@ static void
>> arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>>           cmd.opcode    = CMDQ_OP_TLBI_S2_IPA;
>>           cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>>       }
>> +
>>       __arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain);
>>         /*
>> @@ -1982,7 +2002,7 @@ static void arm_smmu_tlb_inv_page_nosync(struct
>> iommu_iotlb_gather *gather,
>>   static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size,
>>                     size_t granule, void *cookie)
>>   {
>> -    arm_smmu_tlb_inv_range_domain(iova, size, granule, false, cookie);
>> +    arm_smmu_tlb_inv_range_domain(iova, size, granule, false, -1,
>> cookie);
>>   }
>>     static const struct iommu_flush_ops arm_smmu_flush_ops = {
>> @@ -2523,7 +2543,7 @@ static void arm_smmu_iotlb_sync(struct
>> iommu_domain *domain,
>>         arm_smmu_tlb_inv_range_domain(gather->start,
>>                         gather->end - gather->start + 1,
>> -                      gather->pgsize, true, smmu_domain);
>> +                      gather->pgsize, true, -1, smmu_domain);
>>   }
>>     static phys_addr_t
> 
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2021-04-08 12:30     ` Auger Eric
@ 2021-04-09  4:48       ` Kunkun Jiang
  2021-04-09  8:31         ` Auger Eric
  0 siblings, 1 reply; 46+ messages in thread
From: Kunkun Jiang @ 2021-04-09  4:48 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jean-philippe,
	zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi, wanghaibin.wang

Hi Eric,

On 2021/4/8 20:30, Auger Eric wrote:
> Hi Kunkun,
>
> On 4/1/21 2:37 PM, Kunkun Jiang wrote:
>> Hi Eric,
>>
>> On 2021/2/24 4:56, Eric Auger wrote:
>>> With nested stage support, soon we will need to invalidate
>>> S1 contexts and ranges tagged with an unmanaged asid, this
>>> latter being managed by the guest. So let's introduce 2 helpers
>>> that allow to invalidate with externally managed ASIDs
>>>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>
>>> ---
>>>
>>> v13 -> v14
>>> - Actually send the NH_ASID command (reported by Xingang Wang)
>>> ---
>>>    drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38 ++++++++++++++++-----
>>>    1 file changed, 29 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> index 5579ec4fccc8..4c19a1114de4 100644
>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> @@ -1843,9 +1843,9 @@ int arm_smmu_atc_inv_domain(struct
>>> arm_smmu_domain *smmu_domain, int ssid,
>>>    }
>>>      /* IO_PGTABLE API */
>>> -static void arm_smmu_tlb_inv_context(void *cookie)
>>> +static void __arm_smmu_tlb_inv_context(struct arm_smmu_domain
>>> *smmu_domain,
>>> +                       int ext_asid)
>>>    {
>>> -    struct arm_smmu_domain *smmu_domain = cookie;
>>>        struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>        struct arm_smmu_cmdq_ent cmd;
>>>    @@ -1856,7 +1856,13 @@ static void arm_smmu_tlb_inv_context(void
>>> *cookie)
>>>         * insertion to guarantee those are observed before the TLBI. Do be
>>>         * careful, 007.
>>>         */
>>> -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>> +    if (ext_asid >= 0) { /* guest stage 1 invalidation */
>>> +        cmd.opcode    = CMDQ_OP_TLBI_NH_ASID;
>>> +        cmd.tlbi.asid    = ext_asid;
>>> +        cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>>> +        arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>>> +        arm_smmu_cmdq_issue_sync(smmu);
>>> +    } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>            arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
>>>        } else {
>>>            cmd.opcode    = CMDQ_OP_TLBI_S12_VMALL;
>>> @@ -1867,6 +1873,13 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>>>        arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
>>>    }
>>>    +static void arm_smmu_tlb_inv_context(void *cookie)
>>> +{
>>> +    struct arm_smmu_domain *smmu_domain = cookie;
>>> +
>>> +    __arm_smmu_tlb_inv_context(smmu_domain, -1);
>>> +}
>>> +
>>>    static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
>>>                         unsigned long iova, size_t size,
>>>                         size_t granule,
>>> @@ -1926,9 +1939,10 @@ static void __arm_smmu_tlb_inv_range(struct
>>> arm_smmu_cmdq_ent *cmd,
>>>        arm_smmu_cmdq_batch_submit(smmu, &cmds);
>>>    }
>>>    
>> Here is the part of code in __arm_smmu_tlb_inv_range():
>>>          if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
>>>                  /* Get the leaf page size */
>>>                  tg = __ffs(smmu_domain->domain.pgsize_bitmap);
>>>
>>>                  /* Convert page size of 12,14,16 (log2) to 1,2,3 */
>>>                  cmd->tlbi.tg = (tg - 10) / 2;
>>>
>>>                  /* Determine what level the granule is at */
>>>                  cmd->tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));
>>>
>>>                  num_pages = size >> tg;
>>>          }
>> When pSMMU supports RIL, we get the leaf page size by __ffs(smmu_domain->
>> domain.pgsize_bitmap). In nested mode, it is determined by host
>> PAGE_SIZE. If
>> the host kernel and guest kernel has different translation granule (e.g.
>> host 16K,
>> guest 4K), __arm_smmu_tlb_inv_range() will issue an incorrect tlbi command.
>>
>> Do you have any idea about this issue?
> I think this is the same issue as the one reported by Chenxiang
>
> https://lore.kernel.org/lkml/15938ed5-2095-e903-a290-333c299015a2@hisilicon.com/
>
> In case RIL is not supported by the host, next version will use the
> smallest pSMMU supported page size, as done in __arm_smmu_tlb_inv_range
>
> Thanks
>
> Eric
I think they are different. In normal cases, when we want to invalidate the
cache of stage 1, we should use the granule size supported by vSMMU to
implement and issue an tlbi command if pSMMU supports RIL.

But in the current __arm_smmu_tlb_inv_range(), it always uses the granule
size supported by host.
(tg = __ffs(smmu_domain->domain.pgsize_bitmap);)

Let me explain more clearly.
Preconditions of this issue:
1. pSMMU supports RIL
2. host and guest use different translation granule (e.g. host 16K, 
guest 4K)

Guest wants to invalidate 4K, so info->granule_size = 4K.
In __arm_smmu_tlb_inv_range(),   if pSMMU supports RIL and host 16K,
tg = 14, tlbi.tg = 2, tlbi.ttl = 4, tlbi.scale = 0, tlbi.num = -1. It is 
an incorrect
tlbi command.

So it would be better to pass the leaf page size supported by vSMMU to
host.  Perhaps this issue and the one reported by Chenxiang can be solved
together.

Thanks,
Kunkun Jiang
>> Best Regards,
>> Kunkun Jiang
>>> -static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t
>>> size,
>>> -                      size_t granule, bool leaf,
>>> -                      struct arm_smmu_domain *smmu_domain)
>>> +static void
>>> +arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>>> +                  size_t granule, bool leaf, int ext_asid,
>>> +                  struct arm_smmu_domain *smmu_domain)
>>>    {
>>>        struct arm_smmu_cmdq_ent cmd = {
>>>            .tlbi = {
>>> @@ -1936,7 +1950,12 @@ static void
>>> arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>>>            },
>>>        };
>>>    -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>> +    if (ext_asid >= 0) {  /* guest stage 1 invalidation */
>>> +        cmd.opcode    = smmu_domain->smmu->features &
>>> ARM_SMMU_FEAT_E2H ?
>>> +                  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
>>> +        cmd.tlbi.asid    = ext_asid;
>>> +        cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>>> +    } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>            cmd.opcode    = smmu_domain->smmu->features &
>>> ARM_SMMU_FEAT_E2H ?
>>>                      CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
>>>            cmd.tlbi.asid    = smmu_domain->s1_cfg.cd.asid;
>>> @@ -1944,6 +1963,7 @@ static void
>>> arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>>>            cmd.opcode    = CMDQ_OP_TLBI_S2_IPA;
>>>            cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>>>        }
>>> +
>>>        __arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain);
>>>          /*
>>> @@ -1982,7 +2002,7 @@ static void arm_smmu_tlb_inv_page_nosync(struct
>>> iommu_iotlb_gather *gather,
>>>    static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size,
>>>                      size_t granule, void *cookie)
>>>    {
>>> -    arm_smmu_tlb_inv_range_domain(iova, size, granule, false, cookie);
>>> +    arm_smmu_tlb_inv_range_domain(iova, size, granule, false, -1,
>>> cookie);
>>>    }
>>>      static const struct iommu_flush_ops arm_smmu_flush_ops = {
>>> @@ -2523,7 +2543,7 @@ static void arm_smmu_iotlb_sync(struct
>>> iommu_domain *domain,
>>>          arm_smmu_tlb_inv_range_domain(gather->start,
>>>                          gather->end - gather->start + 1,
>>> -                      gather->pgsize, true, smmu_domain);
>>> +                      gather->pgsize, true, -1, smmu_domain);
>>>    }
>>>      static phys_addr_t
>>
> .



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2021-04-09  4:48       ` Kunkun Jiang
@ 2021-04-09  8:31         ` Auger Eric
  2021-04-09  9:43           ` Kunkun Jiang
  0 siblings, 1 reply; 46+ messages in thread
From: Auger Eric @ 2021-04-09  8:31 UTC (permalink / raw)
  To: Kunkun Jiang, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jean-philippe,
	zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi, wanghaibin.wang

Hi Kunkun,

On 4/9/21 6:48 AM, Kunkun Jiang wrote:
> Hi Eric,
> 
> On 2021/4/8 20:30, Auger Eric wrote:
>> Hi Kunkun,
>>
>> On 4/1/21 2:37 PM, Kunkun Jiang wrote:
>>> Hi Eric,
>>>
>>> On 2021/2/24 4:56, Eric Auger wrote:
>>>> With nested stage support, soon we will need to invalidate
>>>> S1 contexts and ranges tagged with an unmanaged asid, this
>>>> latter being managed by the guest. So let's introduce 2 helpers
>>>> that allow to invalidate with externally managed ASIDs
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>
>>>> ---
>>>>
>>>> v13 -> v14
>>>> - Actually send the NH_ASID command (reported by Xingang Wang)
>>>> ---
>>>>    drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38
>>>> ++++++++++++++++-----
>>>>    1 file changed, 29 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> index 5579ec4fccc8..4c19a1114de4 100644
>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> @@ -1843,9 +1843,9 @@ int arm_smmu_atc_inv_domain(struct
>>>> arm_smmu_domain *smmu_domain, int ssid,
>>>>    }
>>>>      /* IO_PGTABLE API */
>>>> -static void arm_smmu_tlb_inv_context(void *cookie)
>>>> +static void __arm_smmu_tlb_inv_context(struct arm_smmu_domain
>>>> *smmu_domain,
>>>> +                       int ext_asid)
>>>>    {
>>>> -    struct arm_smmu_domain *smmu_domain = cookie;
>>>>        struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>>        struct arm_smmu_cmdq_ent cmd;
>>>>    @@ -1856,7 +1856,13 @@ static void arm_smmu_tlb_inv_context(void
>>>> *cookie)
>>>>         * insertion to guarantee those are observed before the TLBI.
>>>> Do be
>>>>         * careful, 007.
>>>>         */
>>>> -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>> +    if (ext_asid >= 0) { /* guest stage 1 invalidation */
>>>> +        cmd.opcode    = CMDQ_OP_TLBI_NH_ASID;
>>>> +        cmd.tlbi.asid    = ext_asid;
>>>> +        cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>>>> +        arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>>>> +        arm_smmu_cmdq_issue_sync(smmu);
>>>> +    } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>>            arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
>>>>        } else {
>>>>            cmd.opcode    = CMDQ_OP_TLBI_S12_VMALL;
>>>> @@ -1867,6 +1873,13 @@ static void arm_smmu_tlb_inv_context(void
>>>> *cookie)
>>>>        arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
>>>>    }
>>>>    +static void arm_smmu_tlb_inv_context(void *cookie)
>>>> +{
>>>> +    struct arm_smmu_domain *smmu_domain = cookie;
>>>> +
>>>> +    __arm_smmu_tlb_inv_context(smmu_domain, -1);
>>>> +}
>>>> +
>>>>    static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
>>>>                         unsigned long iova, size_t size,
>>>>                         size_t granule,
>>>> @@ -1926,9 +1939,10 @@ static void __arm_smmu_tlb_inv_range(struct
>>>> arm_smmu_cmdq_ent *cmd,
>>>>        arm_smmu_cmdq_batch_submit(smmu, &cmds);
>>>>    }
>>>>    
>>> Here is the part of code in __arm_smmu_tlb_inv_range():
>>>>          if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
>>>>                  /* Get the leaf page size */
>>>>                  tg = __ffs(smmu_domain->domain.pgsize_bitmap);
>>>>
>>>>                  /* Convert page size of 12,14,16 (log2) to 1,2,3 */
>>>>                  cmd->tlbi.tg = (tg - 10) / 2;
>>>>
>>>>                  /* Determine what level the granule is at */
>>>>                  cmd->tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));
>>>>
>>>>                  num_pages = size >> tg;
>>>>          }
>>> When pSMMU supports RIL, we get the leaf page size by
>>> __ffs(smmu_domain->
>>> domain.pgsize_bitmap). In nested mode, it is determined by host
>>> PAGE_SIZE. If
>>> the host kernel and guest kernel has different translation granule (e.g.
>>> host 16K,
>>> guest 4K), __arm_smmu_tlb_inv_range() will issue an incorrect tlbi
>>> command.
>>>
>>> Do you have any idea about this issue?
>> I think this is the same issue as the one reported by Chenxiang
>>
>> https://lore.kernel.org/lkml/15938ed5-2095-e903-a290-333c299015a2@hisilicon.com/
>>
>>
>> In case RIL is not supported by the host, next version will use the
>> smallest pSMMU supported page size, as done in __arm_smmu_tlb_inv_range
>>
>> Thanks
>>
>> Eric
> I think they are different. In normal cases, when we want to invalidate the
> cache of stage 1, we should use the granule size supported by vSMMU to
> implement and issue an tlbi command if pSMMU supports RIL.
> 
> But in the current __arm_smmu_tlb_inv_range(), it always uses the granule
> size supported by host.
> (tg = __ffs(smmu_domain->domain.pgsize_bitmap);)
> 
> Let me explain more clearly.
> Preconditions of this issue:
> 1. pSMMU supports RIL
> 2. host and guest use different translation granule (e.g. host 16K,
> guest 4K)
this is not clear to me. See below.
> 
> Guest wants to invalidate 4K, so info->granule_size = 4K.
> In __arm_smmu_tlb_inv_range(),   if pSMMU supports RIL and host 16K,
> tg = 14, tlbi.tg = 2, tlbi.ttl = 4, tlbi.scale = 0, tlbi.num = -1. It is
> an incorrect
> tlbi command.

If the guest uses 4K granule, this means the pSMMU also supports 4K
granule. Otherwise the corresponding CD is invalid (TG0/TG1 field desc).
So in that case isn't it valid to send a RIL invalidation with tg = 12,
right?

Making sure the guest uses a valid pSMMU supported granule is the QEMU
job I think, this should be done at the init phase before hitting CD
invalid errors for sure.

Thanks

Eric

> 
> So it would be better to pass the leaf page size supported by vSMMU to
> host.  Perhaps this issue and the one reported by Chenxiang can be solved
> together.
> 
> Thanks,
> Kunkun Jiang
>>> Best Regards,
>>> Kunkun Jiang
>>>> -static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t
>>>> size,
>>>> -                      size_t granule, bool leaf,
>>>> -                      struct arm_smmu_domain *smmu_domain)
>>>> +static void
>>>> +arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>>>> +                  size_t granule, bool leaf, int ext_asid,
>>>> +                  struct arm_smmu_domain *smmu_domain)
>>>>    {
>>>>        struct arm_smmu_cmdq_ent cmd = {
>>>>            .tlbi = {
>>>> @@ -1936,7 +1950,12 @@ static void
>>>> arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>>>>            },
>>>>        };
>>>>    -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>> +    if (ext_asid >= 0) {  /* guest stage 1 invalidation */
>>>> +        cmd.opcode    = smmu_domain->smmu->features &
>>>> ARM_SMMU_FEAT_E2H ?
>>>> +                  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
>>>> +        cmd.tlbi.asid    = ext_asid;
>>>> +        cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>>>> +    } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>>            cmd.opcode    = smmu_domain->smmu->features &
>>>> ARM_SMMU_FEAT_E2H ?
>>>>                      CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
>>>>            cmd.tlbi.asid    = smmu_domain->s1_cfg.cd.asid;
>>>> @@ -1944,6 +1963,7 @@ static void
>>>> arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>>>>            cmd.opcode    = CMDQ_OP_TLBI_S2_IPA;
>>>>            cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>>>>        }
>>>> +
>>>>        __arm_smmu_tlb_inv_range(&cmd, iova, size, granule,
>>>> smmu_domain);
>>>>          /*
>>>> @@ -1982,7 +2002,7 @@ static void arm_smmu_tlb_inv_page_nosync(struct
>>>> iommu_iotlb_gather *gather,
>>>>    static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size,
>>>>                      size_t granule, void *cookie)
>>>>    {
>>>> -    arm_smmu_tlb_inv_range_domain(iova, size, granule, false, cookie);
>>>> +    arm_smmu_tlb_inv_range_domain(iova, size, granule, false, -1,
>>>> cookie);
>>>>    }
>>>>      static const struct iommu_flush_ops arm_smmu_flush_ops = {
>>>> @@ -2523,7 +2543,7 @@ static void arm_smmu_iotlb_sync(struct
>>>> iommu_domain *domain,
>>>>          arm_smmu_tlb_inv_range_domain(gather->start,
>>>>                          gather->end - gather->start + 1,
>>>> -                      gather->pgsize, true, smmu_domain);
>>>> +                      gather->pgsize, true, -1, smmu_domain);
>>>>    }
>>>>      static phys_addr_t
>>>
>> .
> 
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2021-04-09  8:31         ` Auger Eric
@ 2021-04-09  9:43           ` Kunkun Jiang
  0 siblings, 0 replies; 46+ messages in thread
From: Kunkun Jiang @ 2021-04-09  9:43 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, maz, robin.murphy, joro, alex.williamson, tn, zhukeqian1
  Cc: jacob.jun.pan, yi.l.liu, wangxingang5, jean-philippe,
	zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, yuzenghui, nicoleotsuka, lushenming,
	vsethi, wanghaibin.wang

On 2021/4/9 16:31, Auger Eric wrote:
> Hi Kunkun,
>
> On 4/9/21 6:48 AM, Kunkun Jiang wrote:
>> Hi Eric,
>>
>> On 2021/4/8 20:30, Auger Eric wrote:
>>> Hi Kunkun,
>>>
>>> On 4/1/21 2:37 PM, Kunkun Jiang wrote:
>>>> Hi Eric,
>>>>
>>>> On 2021/2/24 4:56, Eric Auger wrote:
>>>>> With nested stage support, soon we will need to invalidate
>>>>> S1 contexts and ranges tagged with an unmanaged asid, this
>>>>> latter being managed by the guest. So let's introduce 2 helpers
>>>>> that allow to invalidate with externally managed ASIDs
>>>>>
>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>
>>>>> ---
>>>>>
>>>>> v13 -> v14
>>>>> - Actually send the NH_ASID command (reported by Xingang Wang)
>>>>> ---
>>>>>     drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38
>>>>> ++++++++++++++++-----
>>>>>     1 file changed, 29 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>>> index 5579ec4fccc8..4c19a1114de4 100644
>>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>>> @@ -1843,9 +1843,9 @@ int arm_smmu_atc_inv_domain(struct
>>>>> arm_smmu_domain *smmu_domain, int ssid,
>>>>>     }
>>>>>       /* IO_PGTABLE API */
>>>>> -static void arm_smmu_tlb_inv_context(void *cookie)
>>>>> +static void __arm_smmu_tlb_inv_context(struct arm_smmu_domain
>>>>> *smmu_domain,
>>>>> +                       int ext_asid)
>>>>>     {
>>>>> -    struct arm_smmu_domain *smmu_domain = cookie;
>>>>>         struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>>>         struct arm_smmu_cmdq_ent cmd;
>>>>>     @@ -1856,7 +1856,13 @@ static void arm_smmu_tlb_inv_context(void
>>>>> *cookie)
>>>>>          * insertion to guarantee those are observed before the TLBI.
>>>>> Do be
>>>>>          * careful, 007.
>>>>>          */
>>>>> -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>>> +    if (ext_asid >= 0) { /* guest stage 1 invalidation */
>>>>> +        cmd.opcode    = CMDQ_OP_TLBI_NH_ASID;
>>>>> +        cmd.tlbi.asid    = ext_asid;
>>>>> +        cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>>>>> +        arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>>>>> +        arm_smmu_cmdq_issue_sync(smmu);
>>>>> +    } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>>>             arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
>>>>>         } else {
>>>>>             cmd.opcode    = CMDQ_OP_TLBI_S12_VMALL;
>>>>> @@ -1867,6 +1873,13 @@ static void arm_smmu_tlb_inv_context(void
>>>>> *cookie)
>>>>>         arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
>>>>>     }
>>>>>     +static void arm_smmu_tlb_inv_context(void *cookie)
>>>>> +{
>>>>> +    struct arm_smmu_domain *smmu_domain = cookie;
>>>>> +
>>>>> +    __arm_smmu_tlb_inv_context(smmu_domain, -1);
>>>>> +}
>>>>> +
>>>>>     static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
>>>>>                          unsigned long iova, size_t size,
>>>>>                          size_t granule,
>>>>> @@ -1926,9 +1939,10 @@ static void __arm_smmu_tlb_inv_range(struct
>>>>> arm_smmu_cmdq_ent *cmd,
>>>>>         arm_smmu_cmdq_batch_submit(smmu, &cmds);
>>>>>     }
>>>>>     
>>>> Here is the part of code in __arm_smmu_tlb_inv_range():
>>>>>           if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
>>>>>                   /* Get the leaf page size */
>>>>>                   tg = __ffs(smmu_domain->domain.pgsize_bitmap);
>>>>>
>>>>>                   /* Convert page size of 12,14,16 (log2) to 1,2,3 */
>>>>>                   cmd->tlbi.tg = (tg - 10) / 2;
>>>>>
>>>>>                   /* Determine what level the granule is at */
>>>>>                   cmd->tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));
>>>>>
>>>>>                   num_pages = size >> tg;
>>>>>           }
>>>> When pSMMU supports RIL, we get the leaf page size by
>>>> __ffs(smmu_domain->
>>>> domain.pgsize_bitmap). In nested mode, it is determined by host
>>>> PAGE_SIZE. If
>>>> the host kernel and guest kernel has different translation granule (e.g.
>>>> host 16K,
>>>> guest 4K), __arm_smmu_tlb_inv_range() will issue an incorrect tlbi
>>>> command.
>>>>
>>>> Do you have any idea about this issue?
>>> I think this is the same issue as the one reported by Chenxiang
>>>
>>> https://lore.kernel.org/lkml/15938ed5-2095-e903-a290-333c299015a2@hisilicon.com/
>>>
>>>
>>> In case RIL is not supported by the host, next version will use the
>>> smallest pSMMU supported page size, as done in __arm_smmu_tlb_inv_range
>>>
>>> Thanks
>>>
>>> Eric
>> I think they are different. In normal cases, when we want to invalidate the
>> cache of stage 1, we should use the granule size supported by vSMMU to
>> implement and issue an tlbi command if pSMMU supports RIL.
>>
>> But in the current __arm_smmu_tlb_inv_range(), it always uses the granule
>> size supported by host.
>> (tg = __ffs(smmu_domain->domain.pgsize_bitmap);)
>>
>> Let me explain more clearly.
>> Preconditions of this issue:
>> 1. pSMMU supports RIL
>> 2. host and guest use different translation granule (e.g. host 16K,
>> guest 4K)
> this is not clear to me. See below.
>> Guest wants to invalidate 4K, so info->granule_size = 4K.
>> In __arm_smmu_tlb_inv_range(),   if pSMMU supports RIL and host 16K,
>> tg = 14, tlbi.tg = 2, tlbi.ttl = 4, tlbi.scale = 0, tlbi.num = -1. It is
>> an incorrect
>> tlbi command.
> If the guest uses 4K granule, this means the pSMMU also supports 4K
> granule. Otherwise the corresponding CD is invalid (TG0/TG1 field desc).
> So in that case isn't it valid to send a RIL invalidation with tg = 12,
> right?
Dose "tg = 12" come from the smallest pSMMU supported page size?
Sorry, I overlooked the point you mentioned earlier.

My previous idea was to use granule_size to record the stage 1 page size
and nb_granules to record the number of pages.(Without deep consideration)
Now, it seems also okay to use the smallest pSMMU supported page size.

Thanks,
Kunkun Jiang
> Making sure the guest uses a valid pSMMU supported granule is the QEMU
> job I think, this should be done at the init phase before hitting CD
> invalid errors for sure.
>
> Thanks
>
> Eric
>
>> So it would be better to pass the leaf page size supported by vSMMU to
>> host.  Perhaps this issue and the one reported by Chenxiang can be solved
>> together.
>>
>> Thanks,
>> Kunkun Jiang
>>>> Best Regards,
>>>> Kunkun Jiang
>>>>> -static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t
>>>>> size,
>>>>> -                      size_t granule, bool leaf,
>>>>> -                      struct arm_smmu_domain *smmu_domain)
>>>>> +static void
>>>>> +arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>>>>> +                  size_t granule, bool leaf, int ext_asid,
>>>>> +                  struct arm_smmu_domain *smmu_domain)
>>>>>     {
>>>>>         struct arm_smmu_cmdq_ent cmd = {
>>>>>             .tlbi = {
>>>>> @@ -1936,7 +1950,12 @@ static void
>>>>> arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>>>>>             },
>>>>>         };
>>>>>     -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>>> +    if (ext_asid >= 0) {  /* guest stage 1 invalidation */
>>>>> +        cmd.opcode    = smmu_domain->smmu->features &
>>>>> ARM_SMMU_FEAT_E2H ?
>>>>> +                  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
>>>>> +        cmd.tlbi.asid    = ext_asid;
>>>>> +        cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>>>>> +    } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>>>             cmd.opcode    = smmu_domain->smmu->features &
>>>>> ARM_SMMU_FEAT_E2H ?
>>>>>                       CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
>>>>>             cmd.tlbi.asid    = smmu_domain->s1_cfg.cd.asid;
>>>>> @@ -1944,6 +1963,7 @@ static void
>>>>> arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>>>>>             cmd.opcode    = CMDQ_OP_TLBI_S2_IPA;
>>>>>             cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>>>>>         }
>>>>> +
>>>>>         __arm_smmu_tlb_inv_range(&cmd, iova, size, granule,
>>>>> smmu_domain);
>>>>>           /*
>>>>> @@ -1982,7 +2002,7 @@ static void arm_smmu_tlb_inv_page_nosync(struct
>>>>> iommu_iotlb_gather *gather,
>>>>>     static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size,
>>>>>                       size_t granule, void *cookie)
>>>>>     {
>>>>> -    arm_smmu_tlb_inv_range_domain(iova, size, granule, false, cookie);
>>>>> +    arm_smmu_tlb_inv_range_domain(iova, size, granule, false, -1,
>>>>> cookie);
>>>>>     }
>>>>>       static const struct iommu_flush_ops arm_smmu_flush_ops = {
>>>>> @@ -2523,7 +2543,7 @@ static void arm_smmu_iotlb_sync(struct
>>>>> iommu_domain *domain,
>>>>>           arm_smmu_tlb_inv_range_domain(gather->start,
>>>>>                           gather->end - gather->start + 1,
>>>>> -                      gather->pgsize, true, smmu_domain);
>>>>> +                      gather->pgsize, true, -1, smmu_domain);
>>>>>     }
>>>>>       static phys_addr_t
>>> .
>>
> .



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 08/13] dma-iommu: Implement NESTED_MSI cookie
  2021-04-07  7:39   ` Zenghui Yu
@ 2021-04-10 13:34     ` Auger Eric
  0 siblings, 0 replies; 46+ messages in thread
From: Auger Eric @ 2021-04-10 13:34 UTC (permalink / raw)
  To: Zenghui Yu
  Cc: eric.auger.pro, iommu, linux-kernel, kvm, kvmarm, will, maz,
	robin.murphy, joro, alex.williamson, tn, zhukeqian1,
	jacob.jun.pan, yi.l.liu, wangxingang5, jiangkunkun,
	jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, nicoleotsuka, lushenming, vsethi,
	wanghaibin.wang

Hi Zenghui,

On 4/7/21 9:39 AM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> Up to now, when the type was UNMANAGED, we used to
>> allocate IOVA pages within a reserved IOVA MSI range.
>>
>> If both the host and the guest are exposed with SMMUs, each
>> would allocate an IOVA. The guest allocates an IOVA (gIOVA)
>> to map onto the guest MSI doorbell (gDB). The Host allocates
>> another IOVA (hIOVA) to map onto the physical doorbell (hDB).
>>
>> So we end up with 2 unrelated mappings, at S1 and S2:
>>           S1             S2
>> gIOVA    ->     gDB
>>                 hIOVA    ->    hDB
>>
>> The PCI device would be programmed with hIOVA.
>> No stage 1 mapping would existing, causing the MSIs to fault.
>>
>> iommu_dma_bind_guest_msi() allows to pass gIOVA/gDB
>> to the host so that gIOVA can be used by the host instead of
>> re-allocating a new hIOVA.
>>
>>           S1           S2
>> gIOVA    ->    gDB    ->    hDB
>>
>> this time, the PCI device can be programmed with the gIOVA MSI
>> doorbell which is correctly mapped through both stages.
>>
>> Nested mode is not compatible with HW MSI regions as in that
>> case gDB and hDB should have a 1-1 mapping. This check will
>> be done when attaching each device to the IOMMU domain.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> [...]
> 
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index f659395e7959..d25eb7cecaa7 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -19,6 +19,7 @@
>>   #include <linux/irq.h>
>>   #include <linux/mm.h>
>>   #include <linux/mutex.h>
>> +#include <linux/mutex.h>
> 
> Duplicated include.
sure
> 
>>   #include <linux/pci.h>
>>   #include <linux/swiotlb.h>
>>   #include <linux/scatterlist.h>
>> @@ -29,12 +30,15 @@
>>   struct iommu_dma_msi_page {
>>       struct list_head    list;
>>       dma_addr_t        iova;
>> +    dma_addr_t        gpa;
>>       phys_addr_t        phys;
>> +    size_t            s1_granule;
>>   };
>>     enum iommu_dma_cookie_type {
>>       IOMMU_DMA_IOVA_COOKIE,
>>       IOMMU_DMA_MSI_COOKIE,
>> +    IOMMU_DMA_NESTED_MSI_COOKIE,
>>   };
>>     struct iommu_dma_cookie {
>> @@ -46,6 +50,7 @@ struct iommu_dma_cookie {
>>           dma_addr_t        msi_iova;
> 
> msi_iova is unused in the nested mode, but we still set it to the start
> address of the RESV_SW_MSI region (in iommu_get_msi_cookie()), which
> looks a bit strange to me.
I agree with you
> 
>>       };
>>       struct list_head        msi_page_list;
>> +    spinlock_t            msi_lock;
> 
> Should msi_lock be grabbed everywhere msi_page_list is populated?
> Especially in iommu_dma_get_msi_page(), which can be invoked from the
> irqchip driver.
Yes I agree
> 
>>         /* Domain for flush queue callback; NULL if flush queue not in
>> use */
>>       struct iommu_domain        *fq_domain;
>> @@ -87,6 +92,7 @@ static struct iommu_dma_cookie *cookie_alloc(enum
>> iommu_dma_cookie_type type)
>>         cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
>>       if (cookie) {
>> +        spin_lock_init(&cookie->msi_lock);
>>           INIT_LIST_HEAD(&cookie->msi_page_list);
>>           cookie->type = type;
>>       }
>> @@ -120,14 +126,17 @@ EXPORT_SYMBOL(iommu_get_dma_cookie);
>>    *
>>    * Users who manage their own IOVA allocation and do not want DMA
>> API support,
>>    * but would still like to take advantage of automatic MSI
>> remapping, can use
>> - * this to initialise their own domain appropriately. Users should
>> reserve a
>> + * this to initialise their own domain appropriately. Users may
>> reserve a
>>    * contiguous IOVA region, starting at @base, large enough to
>> accommodate the
>>    * number of PAGE_SIZE mappings necessary to cover every MSI
>> doorbell address
>> - * used by the devices attached to @domain.
>> + * used by the devices attached to @domain. The other way round is to
>> provide
>> + * usable iova pages through the iommu_dma_bind_doorbell API (nested
>> stages
> 
> s/iommu_dma_bind_doorbell/iommu_dma_bind_guest_msi/ ?
correct
> 
>> + * use case)
>>    */
>>   int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
>>   {
>>       struct iommu_dma_cookie *cookie;
>> +    int nesting, ret;
>>         if (domain->type != IOMMU_DOMAIN_UNMANAGED)
>>           return -EINVAL;
>> @@ -135,7 +144,12 @@ int iommu_get_msi_cookie(struct iommu_domain
>> *domain, dma_addr_t base)
>>       if (domain->iova_cookie)
>>           return -EEXIST;
>>   -    cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
>> +    ret =  iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &nesting);
> 
> Redundant space.
yep
> 
>> +    if (!ret && nesting)
>> +        cookie = cookie_alloc(IOMMU_DMA_NESTED_MSI_COOKIE);
>> +    else
>> +        cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
>> +
>>       if (!cookie)
>>           return -ENOMEM;
>>   @@ -156,6 +170,7 @@ void iommu_put_dma_cookie(struct iommu_domain
>> *domain)
>>   {
>>       struct iommu_dma_cookie *cookie = domain->iova_cookie;
>>       struct iommu_dma_msi_page *msi, *tmp;
>> +    bool s2_unmap = false;
>>         if (!cookie)
>>           return;
>> @@ -163,7 +178,15 @@ void iommu_put_dma_cookie(struct iommu_domain
>> *domain)
>>       if (cookie->type == IOMMU_DMA_IOVA_COOKIE && cookie->iovad.granule)
>>           put_iova_domain(&cookie->iovad);
>>   +    if (cookie->type == IOMMU_DMA_NESTED_MSI_COOKIE)
>> +        s2_unmap = true;
>> +
>>       list_for_each_entry_safe(msi, tmp, &cookie->msi_page_list, list) {
>> +        if (s2_unmap && msi->phys) {
> 
> I don't think @s2_unmap is necessary. Checking 'cookie->type==NESTED'
> directly shouldn't be that expensive.
agreed
> 
>> +            size_t size = cookie_msi_granule(cookie);
>> +
>> +            WARN_ON(iommu_unmap(domain, msi->gpa, size) != size);
>> +        }
>>           list_del(&msi->list);
>>           kfree(msi);
>>       }
>> @@ -172,6 +195,92 @@ void iommu_put_dma_cookie(struct iommu_domain
>> *domain)
>>   }
>>   EXPORT_SYMBOL(iommu_put_dma_cookie);
>>   +/**
>> + * iommu_dma_bind_guest_msi - Allows to pass the stage 1
>> + * binding of a virtual MSI doorbell used by @dev.
>> + *
>> + * @domain: domain handle
>> + * @iova: guest iova
> 
> Can we change it to 'giova' (to match the unbind side)?
sure
> 
>> + * @gpa: gpa of the virtual doorbell
>> + * @size: size of the granule used for the stage1 mapping
>> + *
>> + * In nested stage use case, the user can provide IOVA/IPA bindings
>> + * corresponding to a guest MSI stage 1 mapping. When the host needs
>> + * to map its own MSI doorbells, it can use @gpa as stage 2 input
>> + * and map it onto the physical MSI doorbell.
>> + */
>> +int iommu_dma_bind_guest_msi(struct iommu_domain *domain,
>> +                 dma_addr_t iova, phys_addr_t gpa, size_t size)
>> +{
>> +    struct iommu_dma_cookie *cookie = domain->iova_cookie;
>> +    struct iommu_dma_msi_page *msi;
>> +    int ret = 0;
>> +
>> +    if (!cookie)
>> +        return -EINVAL;
>> +
>> +    if (cookie->type != IOMMU_DMA_NESTED_MSI_COOKIE)
>> +        return -EINVAL;
>> +
>> +    iova = iova & ~(dma_addr_t)(size - 1);
>> +    gpa = gpa & ~(phys_addr_t)(size - 1);
>> +
>> +    spin_lock(&cookie->msi_lock);
>> +
>> +    list_for_each_entry(msi, &cookie->msi_page_list, list) {
>> +        if (msi->iova == iova)
>> +            goto unlock; /* this page is already registered */
>> +    }
>> +
>> +    msi = kzalloc(sizeof(*msi), GFP_ATOMIC);
>> +    if (!msi) {
>> +        ret = -ENOMEM;
>> +        goto unlock;
>> +    }
>> +
>> +    msi->iova = iova;
>> +    msi->gpa = gpa;
>> +    msi->s1_granule = size;
>> +    list_add(&msi->list, &cookie->msi_page_list);
>> +unlock:
>> +    spin_unlock(&cookie->msi_lock);
>> +    return ret;
>> +}
>> +EXPORT_SYMBOL(iommu_dma_bind_guest_msi);
>> +
>> +void iommu_dma_unbind_guest_msi(struct iommu_domain *domain,
>> dma_addr_t giova)
>> +{
>> +    struct iommu_dma_cookie *cookie = domain->iova_cookie;
>> +    struct iommu_dma_msi_page *msi;
>> +
>> +    if (!cookie)
>> +        return;
>> +
>> +    if (cookie->type != IOMMU_DMA_NESTED_MSI_COOKIE)
>> +        return;
>> +
>> +    spin_lock(&cookie->msi_lock);
>> +
>> +    list_for_each_entry(msi, &cookie->msi_page_list, list) {
>> +        dma_addr_t aligned_giova =
>> +            giova & ~(dma_addr_t)(msi->s1_granule - 1);
>> +
>> +        if (msi->iova == aligned_giova) {
>> +            if (msi->phys) {
>> +                /* unmap the stage 2 */
>> +                size_t size = cookie_msi_granule(cookie);
>> +
>> +                WARN_ON(iommu_unmap(domain, msi->gpa, size) != size);
>> +            }
>> +            list_del(&msi->list);
>> +            kfree(msi);
>> +            break;
>> +        }
>> +    }
>> +    spin_unlock(&cookie->msi_lock);
>> +}
>> +EXPORT_SYMBOL(iommu_dma_unbind_guest_msi);
>> +
>>   /**
>>    * iommu_dma_get_resv_regions - Reserved region driver helper
>>    * @dev: Device from iommu_get_resv_regions()
>> @@ -1343,6 +1452,33 @@ static struct iommu_dma_msi_page
>> *iommu_dma_get_msi_page(struct device *dev,
>>           if (msi_page->phys == msi_addr)
>>               return msi_page;
>>   +    /*
>> +     * In nested stage mode, we do not allocate an MSI page in
>> +     * a range provided by the user. Instead, IOVA/IPA bindings are
>> +     * individually provided. We reuse thise IOVAs to build the
> 
> s/thise/these/
yep
> 
>> +     * GIOVA -> GPA -> MSI HPA nested stage mapping.
>> +     */
>> +    if (cookie->type == IOMMU_DMA_NESTED_MSI_COOKIE) {
>> +        list_for_each_entry(msi_page, &cookie->msi_page_list, list)
>> +            if (!msi_page->phys) {
>> +                int ret;
>> +
>> +                /* do the stage 2 mapping */
>> +                ret = iommu_map(domain,
>> +                        msi_page->gpa, msi_addr, size,
> 
> Shouldn't we make sure that the size of S2 mapping is not less than
> s1_granule? Although what we need is actually a 32-bit TRANSLATER
> register, we don't know where it is mapped in S1.
The GITS_TRANSLATER is at offset 0x40 of the page. That's what I had in
mind. But indeed the doorbell may belong to another MSI controller with
different layout. So yes that's wort checking.

I will add this check in iommu_dma_bind_guest_msi()

> 
>> +                        IOMMU_MMIO | IOMMU_WRITE);
> 
> Is it intentional to drop the IOMMU_NOEXEC flag (from @prot)?
no restoring it.
> 
>> +                if (ret) {
>> +                    pr_warn("MSI S2 mapping failed (%d)\n",
>> +                        ret);
>> +                    return NULL;
>> +                }
>> +                msi_page->phys = msi_addr;
>> +                return msi_page;
>> +            }
>> +        pr_warn("%s no MSI binding found\n", __func__);
>> +        return NULL;
>> +    }
>> +
>>       msi_page = kzalloc(sizeof(*msi_page), GFP_KERNEL);
>>       if (!msi_page)
>>           return NULL;
> 
> 
> Thanks,
> Zenghui
> 

Thank you for your careful review!

Best Regards

Eric


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
  2021-02-25 16:06 ` [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Auger Eric
@ 2021-04-22 15:04   ` Sumit Gupta
  2021-04-23 13:00     ` Jean-Philippe Brucker
  0 siblings, 1 reply; 46+ messages in thread
From: Sumit Gupta @ 2021-04-22 15:04 UTC (permalink / raw)
  To: eric.auger
  Cc: alex.williamson, eric.auger.pro, iommu, jean-philippe,
	jiangkunkun, joro, kvm, kvmarm, linux-kernel, lushenming, maz,
	robin.murphy, tn, vivek.gautam, vsethi, wangxingang5, will,
	zhangfei.gao, zhukeqian1, vdumpa, sumitg

Hi Eric,
I have validated the v14 of the patch series from branch "jean_sva_current_2stage_v14".
Verfied nested translations with NVMe PCI device assigned to Qemu 5.2 Guest.
Had to revert patch "mm: notify remote TLBs when dirtying a PTE".

Tested-by: Sumit Gupta <sumitg@nvidia.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
  2021-04-22 15:04   ` Sumit Gupta
@ 2021-04-23 13:00     ` Jean-Philippe Brucker
  2021-04-23 17:16       ` Sumit Gupta
  0 siblings, 1 reply; 46+ messages in thread
From: Jean-Philippe Brucker @ 2021-04-23 13:00 UTC (permalink / raw)
  To: Sumit Gupta
  Cc: eric.auger, alex.williamson, eric.auger.pro, iommu, jiangkunkun,
	joro, kvm, kvmarm, linux-kernel, lushenming, maz, robin.murphy,
	tn, vivek.gautam, vsethi, wangxingang5, will, zhangfei.gao,
	zhukeqian1, vdumpa

Hi Sumit,

On Thu, Apr 22, 2021 at 08:34:38PM +0530, Sumit Gupta wrote:
> Had to revert patch "mm: notify remote TLBs when dirtying a PTE".

Did that patch cause any issue, or is it just not needed on your system?
It fixes an hypothetical problem with the way ATS is implemented. Maybe I
actually observed it on an old software model, I don't remember. Either
way it's unlikely to go upstream but I'd like to know if I should drop it
from my tree.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
  2021-04-23 13:00     ` Jean-Philippe Brucker
@ 2021-04-23 17:16       ` Sumit Gupta
  2021-04-23 17:58         ` Krishna Reddy
  0 siblings, 1 reply; 46+ messages in thread
From: Sumit Gupta @ 2021-04-23 17:16 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: eric.auger, alex.williamson, eric.auger.pro, iommu, jiangkunkun,
	joro, kvm, kvmarm, linux-kernel, lushenming, maz, robin.murphy,
	tn, vivek.gautam, vsethi, wangxingang5, will, zhangfei.gao,
	zhukeqian1, Krishna Reddy, Sumit Gupta, Sachin Nikam, Bibek Basu

Hi Jean,


> Hi Sumit,
> 
> On Thu, Apr 22, 2021 at 08:34:38PM +0530, Sumit Gupta wrote:
>> Had to revert patch "mm: notify remote TLBs when dirtying a PTE".
> 
> Did that patch cause any issue, or is it just not needed on your system?
> It fixes an hypothetical problem with the way ATS is implemented. Maybe I
> actually observed it on an old software model, I don't remember. Either
> way it's unlikely to go upstream but I'd like to know if I should drop it
> from my tree.

I tried Nested SMMUv3 patches v15(Eric's branch: 
v5.12-rc6-jean-iopf-14-2stage-v15) on top of your current sva patches 
with Kernel-5.12.0-rc8.
Had to revert same patch "mm: notify remote TLBs when dirtying a PTE" to 
avoid below crash[1]. I am not sure about the cause yet.
Didn't get crash after reverting patch and nested translations worked.

[1]
[   11.730943] arm-smmu-v3 9050000.smmuv3: ias 44-bit, oas 44-bit 
(features 0x00008305)^M^M
[   11.833791] arm-smmu-v3 9050000.smmuv3: allocated 524288 entries for 
cmdq^M^M
[   11.979456] arm-smmu-v3 9050000.smmuv3: allocated 524288 entries for 
evtq^M^M
[   12.048895] cacheinfo: Unable to detect cache hierarchy for CPU 0^M^M
[   12.234175] loop: module loaded^M^M
[   12.279552] megasas: 07.714.04.00-rc1^M^M
[   12.408831] nvme 0000:00:02.0: Adding to iommu group 0^M^M
[   12.488063] nvme nvme0: pci function 0000:00:02.0^M^M
[   12.525887] nvme 0000:00:02.0: enabling device (0000 -> 0002)^M^M
[   12.612159] physmap-flash 0.flash: physmap platform flash device: 
[mem 0x00000000-0x03ffffff]^M^M
[ 1721.586943] Unable to handle kernel paging request at virtual address 
ffff617f80000000^M
[ 1721.587263] Mem abort info:^M
[ 1721.587776]   ESR = 0x96000145^M
[ 1721.587968]   EC = 0x25: DABT (current EL), IL = 32 bits^M
[ 1721.588416]   SET = 0, FnV = 0^M
[ 1721.588672]   EA = 0, S1PTW = 0^M
[ 1721.588863] Data abort info:^M
[ 1721.589120]   ISV = 0, ISS = 0x00000145^M
[ 1721.589311]   CM = 1, WnR = 1^M
[ 1721.589568] swapper pgtable: 64k pages, 48-bit VAs, 
pgdp=0000000111280000^M
[ 1721.589951] [ffff617f80000000] pgd=0000000000000000, 
p4d=0000000000000000, pud=0000000000000000^M
[ 1721.590592] Internal error: Oops: 96000145 [#1] PREEMPT SMP^M
[ 1721.590912] Modules linked in:^M
[ 1721.591232] CPU: 0 PID: 664 Comm: qemu-system-aar Not tainted 
5.12.0-rc8-tegra-229886-g4786d4a20d7 #22^M
[ 1721.591680] pstate: a0400005 (NzCv daif +PAN -UAO -TCO BTYPE=--)
[ 1721.592128] pc : __flush_dcache_area+0x20/0x38
[ 1721.592511] lr : kvm_set_spte_hva+0x64/0xc8
[ 1721.592832] sp : ffff8000145cfc30
[ 1721.593087] x29: ffff8000145cfc30 x28: ffff000095221c80
[ 1721.593599] x27: 0000000000000002 x26: ffff0000a3711c88
[ 1721.594112] x25: ffff00009333a740 x24: 01e8618000000f53
[ 1721.594624] x23: 0000ffffb8320000 x22: 0000000000000001
[ 1721.595136] x21: 0000ffffb8320000 x20: ffff0000a1268000
[ 1721.595647] x19: ffff800011c95000 x18: 0000000000000000
[ 1721.596160] x17: 0000000000000000 x16: 0000000000000000
[ 1721.596608] x15: 0000000000000000 x14: 0000000000000000
[ 1721.597120] x13: 0000000000000000 x12: 0000000000000000
[ 1721.597568] x11: 0000000000000000 x10: 0000000000000000
[ 1721.598080] x9 : 0000000000000000 x8 : ffff00009333a740
[ 1721.598592] x7 : 07fd000ffffb8320 x6 : ffff0000815bc190
[ 1721.599104] x5 : 0000000000011b06 x4 : 0000000000000000
[ 1721.599552] x3 : 000000000000003f x2 : 0000000000000040
[ 1721.600064] x1 : ffff617f80010000 x0 : ffff617f80000000
[ 1721.600576] Call trace:
[ 1721.600768]  __flush_dcache_area+0x20/0x38
[ 1721.601216]  kvm_mmu_notifier_change_pte+0x5c/0xa8
[ 1721.601601]  __mmu_notifier_change_pte+0x60/0xa0
[ 1721.601985]  __handle_mm_fault+0x740/0xde8
[ 1721.602367]  handle_mm_fault+0xe8/0x238
[ 1721.602751]  do_page_fault+0x160/0x3a8
[ 1721.603200]  do_mem_abort+0x40/0xb0
[ 1721.603520]  el0_da+0x20/0x30
[ 1721.603967]  el0_sync_handler+0x68/0xd0
[ 1721.604416]  el0_sync+0x154/0x180
[ 1721.604864] Code: 9ac32042 8b010001 d1000443 8a230000 (d50b7e20)
[ 1721.605184] ---[ end trace 7678eb97889b6fbd ]---
[ 1721.605504] Kernel panic - not syncing: Oops: Fatal exception
[ 1721.605824] Kernel Offset: disabled
[ 1721.606016] CPU features: 0x00340216,6280a018
[ 1721.606335] Memory Limit: 2909 MB
[ 1721.606656] ---[ end Kernel panic - not syncing: Oops: Fatal 
exception ]---

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
  2021-04-23 17:16       ` Sumit Gupta
@ 2021-04-23 17:58         ` Krishna Reddy
  2021-04-23 18:19           ` Sumit Gupta
  2021-04-24  9:06           ` Marc Zyngier
  0 siblings, 2 replies; 46+ messages in thread
From: Krishna Reddy @ 2021-04-23 17:58 UTC (permalink / raw)
  To: Sumit Gupta, Jean-Philippe Brucker
  Cc: eric.auger, alex.williamson, eric.auger.pro, iommu, jiangkunkun,
	joro, kvm, kvmarm, linux-kernel, lushenming, maz, robin.murphy,
	tn, vivek.gautam, Vikram Sethi, wangxingang5, will, zhangfei.gao,
	zhukeqian1, Sachin Nikam, Bibek Basu, Shanker Donthineni,
	Vikram Sethi

>> Did that patch cause any issue, or is it just not needed on your system?
>> It fixes an hypothetical problem with the way ATS is implemented. 
>> Maybe I actually observed it on an old software model, I don't 
>> remember. Either way it's unlikely to go upstream but I'd like to know 
>> if I should drop it from my tree.

> Had to revert same patch "mm: notify remote TLBs when dirtying a PTE" to
> avoid below crash[1]. I am not sure about the cause yet.

I have noticed this issue earlier with patch pointed here and root caused the issue as below.
It happens after vfio_mmap request from QEMU for the PCIe device and during the access of VA when
PTE access flags are updated. 

kvm_mmu_notifier_change_pte() --> kvm_set_spte_hve() --> kvm_set_spte_hva() --> clean_dcache_guest_page()

The validation model doesn't have FWB capability supported.
__clean_dcache_guest_page() attempts to perform dcache flush on pcie bar address(not a valid_pfn()) through page_address(),
which doesn't have page table mapping and leads to exception.

I have worked around the issue by filtering out the request if the pfn is not valid in  __clean_dcache_guest_page(). 
As the patch wasn't posted in the community, reverted it as well.

-KR


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
  2021-04-23 17:58         ` Krishna Reddy
@ 2021-04-23 18:19           ` Sumit Gupta
  2021-04-24  9:06           ` Marc Zyngier
  1 sibling, 0 replies; 46+ messages in thread
From: Sumit Gupta @ 2021-04-23 18:19 UTC (permalink / raw)
  To: Krishna Reddy, Jean-Philippe Brucker
  Cc: eric.auger, alex.williamson, eric.auger.pro, iommu, jiangkunkun,
	joro, kvm, kvmarm, linux-kernel, lushenming, maz, robin.murphy,
	tn, vivek.gautam, Vikram Sethi, wangxingang5, will, zhangfei.gao,
	zhukeqian1, Sachin Nikam, Bibek Basu, Shanker Donthineni,
	Sumit Gupta



>>> Did that patch cause any issue, or is it just not needed on your system?
>>> It fixes an hypothetical problem with the way ATS is implemented.
>>> Maybe I actually observed it on an old software model, I don't
>>> remember. Either way it's unlikely to go upstream but I'd like to know
>>> if I should drop it from my tree.
> 
>> Had to revert same patch "mm: notify remote TLBs when dirtying a PTE" to
>> avoid below crash[1]. I am not sure about the cause yet.
> 
> I have noticed this issue earlier with patch pointed here and root caused the issue as below.
> It happens after vfio_mmap request from QEMU for the PCIe device and during the access of VA when
> PTE access flags are updated.
> 
> kvm_mmu_notifier_change_pte() --> kvm_set_spte_hve() --> kvm_set_spte_hva() --> clean_dcache_guest_page()
> 
> The validation model doesn't have FWB capability supported.
> __clean_dcache_guest_page() attempts to perform dcache flush on pcie bar address(not a valid_pfn()) through page_address(),
> which doesn't have page table mapping and leads to exception.
> 
> I have worked around the issue by filtering out the request if the pfn is not valid in  __clean_dcache_guest_page().
> As the patch wasn't posted in the community, reverted it as well.

Thank you Krishna for sharing the analysis.

Best Regards,
Sumit Gupta

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
  2021-04-23 17:58         ` Krishna Reddy
  2021-04-23 18:19           ` Sumit Gupta
@ 2021-04-24  9:06           ` Marc Zyngier
  2021-04-24 11:29             ` Sumit Gupta
  1 sibling, 1 reply; 46+ messages in thread
From: Marc Zyngier @ 2021-04-24  9:06 UTC (permalink / raw)
  To: Krishna Reddy
  Cc: Sumit Gupta, Jean-Philippe Brucker, eric.auger, alex.williamson,
	eric.auger.pro, iommu, jiangkunkun, joro, kvm, kvmarm,
	linux-kernel, lushenming, robin.murphy, tn, vivek.gautam,
	Vikram Sethi, wangxingang5, will, zhangfei.gao, zhukeqian1,
	Sachin Nikam, Bibek Basu, Shanker Donthineni

On Fri, 23 Apr 2021 18:58:23 +0100,
Krishna Reddy <vdumpa@nvidia.com> wrote:
> 
> >> Did that patch cause any issue, or is it just not needed on your system?
> >> It fixes an hypothetical problem with the way ATS is implemented. 
> >> Maybe I actually observed it on an old software model, I don't 
> >> remember. Either way it's unlikely to go upstream but I'd like to know 
> >> if I should drop it from my tree.
> 
> > Had to revert same patch "mm: notify remote TLBs when dirtying a PTE" to
> > avoid below crash[1]. I am not sure about the cause yet.
> 
> I have noticed this issue earlier with patch pointed here and root
> caused the issue as below.  It happens after vfio_mmap request from
> QEMU for the PCIe device and during the access of VA when PTE access
> flags are updated.
> 
> kvm_mmu_notifier_change_pte() --> kvm_set_spte_hve() -->
> kvm_set_spte_hva() --> clean_dcache_guest_page()
> 
> The validation model doesn't have FWB capability supported.
> __clean_dcache_guest_page() attempts to perform dcache flush on pcie
> bar address(not a valid_pfn()) through page_address(), which doesn't
> have page table mapping and leads to exception.
> 
> I have worked around the issue by filtering out the request if the
> pfn is not valid in __clean_dcache_guest_page().  As the patch
> wasn't posted in the community, reverted it as well.

That's papering over the real issue, and this mapping path needs
fixing as it was only ever expected to be called for CoW.

Can you please try the following patch and let me know if that fixes
the issue for good?

Thanks,

	M.

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 77cb2d28f2a4..b62dd40a4083 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1147,7 +1147,8 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 	 * We've moved a page around, probably through CoW, so let's treat it
 	 * just like a translation fault and clean the cache to the PoC.
 	 */
-	clean_dcache_guest_page(pfn, PAGE_SIZE);
+	if (!kvm_is_device_pfn(pfn))
+		clean_dcache_guest_page(pfn, PAGE_SIZE);
 	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &pfn);
 	return 0;
 }


-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
  2021-04-24  9:06           ` Marc Zyngier
@ 2021-04-24 11:29             ` Sumit Gupta
  0 siblings, 0 replies; 46+ messages in thread
From: Sumit Gupta @ 2021-04-24 11:29 UTC (permalink / raw)
  To: Marc Zyngier, Krishna Reddy
  Cc: Jean-Philippe Brucker, eric.auger, alex.williamson,
	eric.auger.pro, iommu, jiangkunkun, joro, kvm, kvmarm,
	linux-kernel, lushenming, robin.murphy, tn, vivek.gautam,
	Vikram Sethi, wangxingang5, will, zhangfei.gao, zhukeqian1,
	Sachin Nikam, Bibek Basu, Shanker Donthineni, Sumit Gupta


>> I have worked around the issue by filtering out the request if the
>> pfn is not valid in __clean_dcache_guest_page().  As the patch
>> wasn't posted in the community, reverted it as well.
> 
> That's papering over the real issue, and this mapping path needs
> fixing as it was only ever expected to be called for CoW.
> 
> Can you please try the following patch and let me know if that fixes
> the issue for good?
> 

Hi Marc,

Thank you for the patch. This patch fixed the crash for me.
For the formal patch, please add:

Tested-by: Sumit Gupta <sumitg@nvidia.com>

> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 77cb2d28f2a4..b62dd40a4083 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1147,7 +1147,8 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
>           * We've moved a page around, probably through CoW, so let's treat it
>           * just like a translation fault and clean the cache to the PoC.
>           */
> -       clean_dcache_guest_page(pfn, PAGE_SIZE);
> +       if (!kvm_is_device_pfn(pfn))
> +               clean_dcache_guest_page(pfn, PAGE_SIZE);
>          handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &pfn);
>          return 0;
>   }
> 
> 
> --
> Without deviation from the norm, progress is not possible.
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2021-04-24 11:29 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-23 20:56 [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
2021-02-23 20:56 ` [PATCH v14 01/13] iommu: Introduce attach/detach_pasid_table API Eric Auger
2021-02-23 20:56 ` [PATCH v14 02/13] iommu: Introduce bind/unbind_guest_msi Eric Auger
2021-02-23 20:56 ` [PATCH v14 03/13] iommu/smmuv3: Allow s1 and s2 configs to coexist Eric Auger
2021-02-23 20:56 ` [PATCH v14 04/13] iommu/smmuv3: Get prepared for nested stage support Eric Auger
2021-02-23 20:56 ` [PATCH v14 05/13] iommu/smmuv3: Implement attach/detach_pasid_table Eric Auger
2021-03-02  8:35   ` Keqian Zhu
2021-03-19 13:15     ` Auger Eric
2021-03-22  6:23       ` Keqian Zhu
2021-02-23 20:56 ` [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs Eric Auger
2021-03-30  9:17   ` Zenghui Yu
2021-04-01  9:38     ` Auger Eric
2021-04-01 12:37   ` Kunkun Jiang
2021-04-08 12:30     ` Auger Eric
2021-04-09  4:48       ` Kunkun Jiang
2021-04-09  8:31         ` Auger Eric
2021-04-09  9:43           ` Kunkun Jiang
2021-02-23 20:56 ` [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate Eric Auger
     [not found]   ` <c10c6405-5efe-5a41-2b3a-f3af85a528ba@hisilicon.com>
2021-03-19 17:36     ` Auger Eric
2021-03-22  6:40       ` [Linuxarm] " chenxiang (M)
2021-03-22  9:05         ` Auger Eric
2021-03-23  1:28           ` chenxiang (M)
2021-04-01  6:11   ` Zenghui Yu
2021-04-01 12:06     ` Auger Eric
2021-02-23 20:56 ` [PATCH v14 08/13] dma-iommu: Implement NESTED_MSI cookie Eric Auger
2021-04-07  7:39   ` Zenghui Yu
2021-04-10 13:34     ` Auger Eric
2021-02-23 20:56 ` [PATCH v14 09/13] iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement Eric Auger
2021-02-23 20:56 ` [PATCH v14 10/13] iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI regions Eric Auger
2021-02-23 20:56 ` [PATCH v14 11/13] iommu/smmuv3: Implement bind/unbind_guest_msi Eric Auger
2021-02-23 20:56 ` [PATCH v14 12/13] iommu/smmuv3: report additional recoverable faults Eric Auger
2021-02-23 20:56 ` [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than one context descriptor Eric Auger
2021-03-30  9:23   ` Zenghui Yu
2021-04-01 11:48     ` Auger Eric
2021-04-01 12:38       ` Shameerali Kolothum Thodi
2021-04-01 13:20         ` Auger Eric
2021-02-25 16:06 ` [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) Auger Eric
2021-04-22 15:04   ` Sumit Gupta
2021-04-23 13:00     ` Jean-Philippe Brucker
2021-04-23 17:16       ` Sumit Gupta
2021-04-23 17:58         ` Krishna Reddy
2021-04-23 18:19           ` Sumit Gupta
2021-04-24  9:06           ` Marc Zyngier
2021-04-24 11:29             ` Sumit Gupta
2021-03-18  0:16 ` Krishna Reddy
2021-03-19 13:17   ` Auger Eric

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).