kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
@ 2020-11-18 11:21 Eric Auger
  2020-11-18 11:21 ` [PATCH v13 01/15] iommu: Introduce attach/detach_pasid_table API Eric Auger
                   ` (16 more replies)
  0 siblings, 17 replies; 57+ messages in thread
From: Eric Auger @ 2020-11-18 11:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, jacob.jun.pan, yi.l.liu, tn,
	nicoleotsuka, yuzenghui

This series brings the IOMMU part of HW nested paging support
in the SMMUv3. The VFIO part is submitted separately.

The IOMMU API is extended to support 2 new API functionalities:
1) pass the guest stage 1 configuration
2) pass stage 1 MSI bindings

Then those capabilities gets implemented in the SMMUv3 driver.

The virtualizer passes information through the VFIO user API
which cascades them to the iommu subsystem. This allows the guest
to own stage 1 tables and context descriptors (so-called PASID
table) while the host owns stage 2 tables and main configuration
structures (STE).

Best Regards

Eric

This series can be found at:
https://github.com/eauger/linux/tree/5.10-rc4-2stage-v13
(including the VFIO part in his last version: v11)

The series includes a patch from Jean-Philippe. It is better to
review the original patch:
[PATCH v8 2/9] iommu/arm-smmu-v3: Maintain a SID->device structure

The VFIO series is sent separately.

History:

v12 -> v13:
- fixed compilation issue with CONFIG_ARM_SMMU_V3_SVA
  reported by Shameer. This urged me to revisit patch 4 into
  iommu/smmuv3: Allow s1 and s2 configs to coexist where
  s1_cfg and s2_cfg are not dynamically allocated anymore.
  Instead I use a new set field in existing structs
- fixed 2 others config checks
- Updated "iommu/arm-smmu-v3: Maintain a SID->device structure"
  according to the last version

v11 -> v12:
- rebase on top of v5.10-rc4

Eric Auger (14):
  iommu: Introduce attach/detach_pasid_table API
  iommu: Introduce bind/unbind_guest_msi
  iommu/smmuv3: Allow s1 and s2 configs to coexist
  iommu/smmuv3: Get prepared for nested stage support
  iommu/smmuv3: Implement attach/detach_pasid_table
  iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  iommu/smmuv3: Implement cache_invalidate
  dma-iommu: Implement NESTED_MSI cookie
  iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
  iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI
    regions
  iommu/smmuv3: Implement bind/unbind_guest_msi
  iommu/smmuv3: Report non recoverable faults
  iommu/smmuv3: Accept configs with more than one context descriptor
  iommu/smmuv3: Add PASID cache invalidation per PASID

Jean-Philippe Brucker (1):
  iommu/arm-smmu-v3: Maintain a SID->device structure

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 659 ++++++++++++++++++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 103 ++-
 drivers/iommu/dma-iommu.c                   | 142 ++++-
 drivers/iommu/iommu.c                       | 105 ++++
 include/linux/dma-iommu.h                   |  16 +
 include/linux/iommu.h                       |  41 ++
 include/uapi/linux/iommu.h                  |  54 ++
 7 files changed, 1042 insertions(+), 78 deletions(-)

-- 
2.21.3


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v13 01/15] iommu: Introduce attach/detach_pasid_table API
  2020-11-18 11:21 [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
@ 2020-11-18 11:21 ` Eric Auger
  2020-11-18 16:19   ` Jacob Pan
  2021-02-01 11:27   ` Keqian Zhu
  2020-11-18 11:21 ` [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi Eric Auger
                   ` (15 subsequent siblings)
  16 siblings, 2 replies; 57+ messages in thread
From: Eric Auger @ 2020-11-18 11:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, jacob.jun.pan, yi.l.liu, tn,
	nicoleotsuka, yuzenghui

In virtualization use case, when a guest is assigned
a PCI host device, protected by a virtual IOMMU on the guest,
the physical IOMMU must be programmed to be consistent with
the guest mappings. If the physical IOMMU supports two
translation stages it makes sense to program guest mappings
onto the first stage/level (ARM/Intel terminology) while the host
owns the stage/level 2.

In that case, it is mandated to trap on guest configuration
settings and pass those to the physical iommu driver.

This patch adds a new API to the iommu subsystem that allows
to set/unset the pasid table information.

A generic iommu_pasid_table_config struct is introduced in
a new iommu.h uapi header. This is going to be used by the VFIO
user API.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v12 -> v13:
- Fix config check

v11 -> v12:
- add argsz, name the union
---
 drivers/iommu/iommu.c      | 68 ++++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h      | 21 ++++++++++++
 include/uapi/linux/iommu.h | 54 ++++++++++++++++++++++++++++++
 3 files changed, 143 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b53446bb8c6b..978fe34378fb 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2171,6 +2171,74 @@ int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev
 }
 EXPORT_SYMBOL_GPL(iommu_uapi_sva_unbind_gpasid);
 
+int iommu_attach_pasid_table(struct iommu_domain *domain,
+			     struct iommu_pasid_table_config *cfg)
+{
+	if (unlikely(!domain->ops->attach_pasid_table))
+		return -ENODEV;
+
+	return domain->ops->attach_pasid_table(domain, cfg);
+}
+
+int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
+				  void __user *uinfo)
+{
+	struct iommu_pasid_table_config pasid_table_data = { 0 };
+	u32 minsz;
+
+	if (unlikely(!domain->ops->attach_pasid_table))
+		return -ENODEV;
+
+	/*
+	 * No new spaces can be added before the variable sized union, the
+	 * minimum size is the offset to the union.
+	 */
+	minsz = offsetof(struct iommu_pasid_table_config, vendor_data);
+
+	/* Copy minsz from user to get flags and argsz */
+	if (copy_from_user(&pasid_table_data, uinfo, minsz))
+		return -EFAULT;
+
+	/* Fields before the variable size union are mandatory */
+	if (pasid_table_data.argsz < minsz)
+		return -EINVAL;
+
+	/* PASID and address granu require additional info beyond minsz */
+	if (pasid_table_data.version != PASID_TABLE_CFG_VERSION_1)
+		return -EINVAL;
+	if (pasid_table_data.format == IOMMU_PASID_FORMAT_SMMUV3 &&
+	    pasid_table_data.argsz <
+		offsetofend(struct iommu_pasid_table_config, vendor_data.smmuv3))
+		return -EINVAL;
+
+	/*
+	 * User might be using a newer UAPI header which has a larger data
+	 * size, we shall support the existing flags within the current
+	 * size. Copy the remaining user data _after_ minsz but not more
+	 * than the current kernel supported size.
+	 */
+	if (copy_from_user((void *)&pasid_table_data + minsz, uinfo + minsz,
+			   min_t(u32, pasid_table_data.argsz, sizeof(pasid_table_data)) - minsz))
+		return -EFAULT;
+
+	/* Now the argsz is validated, check the content */
+	if (pasid_table_data.config < IOMMU_PASID_CONFIG_TRANSLATE ||
+	    pasid_table_data.config > IOMMU_PASID_CONFIG_ABORT)
+		return -EINVAL;
+
+	return domain->ops->attach_pasid_table(domain, &pasid_table_data);
+}
+EXPORT_SYMBOL_GPL(iommu_uapi_attach_pasid_table);
+
+void iommu_detach_pasid_table(struct iommu_domain *domain)
+{
+	if (unlikely(!domain->ops->detach_pasid_table))
+		return;
+
+	domain->ops->detach_pasid_table(domain);
+}
+EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
 				  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b95a6f8db6ff..464fcbecf841 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -223,6 +223,8 @@ struct iommu_iotlb_gather {
  * @cache_invalidate: invalidate translation caches
  * @sva_bind_gpasid: bind guest pasid and mm
  * @sva_unbind_gpasid: unbind guest pasid and mm
+ * @attach_pasid_table: attach a pasid table
+ * @detach_pasid_table: detach the pasid table
  * @def_domain_type: device default domain type, return value:
  *		- IOMMU_DOMAIN_IDENTITY: must use an identity domain
  *		- IOMMU_DOMAIN_DMA: must use a dma domain
@@ -287,6 +289,9 @@ struct iommu_ops {
 				      void *drvdata);
 	void (*sva_unbind)(struct iommu_sva *handle);
 	u32 (*sva_get_pasid)(struct iommu_sva *handle);
+	int (*attach_pasid_table)(struct iommu_domain *domain,
+				  struct iommu_pasid_table_config *cfg);
+	void (*detach_pasid_table)(struct iommu_domain *domain);
 
 	int (*page_response)(struct device *dev,
 			     struct iommu_fault_event *evt,
@@ -434,6 +439,11 @@ extern int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain,
 					struct device *dev, void __user *udata);
 extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
 				   struct device *dev, ioasid_t pasid);
+extern int iommu_attach_pasid_table(struct iommu_domain *domain,
+				    struct iommu_pasid_table_config *cfg);
+extern int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
+					 void __user *udata);
+extern void iommu_detach_pasid_table(struct iommu_domain *domain);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -639,6 +649,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
 void iommu_sva_unbind_device(struct iommu_sva *handle);
 u32 iommu_sva_get_pasid(struct iommu_sva *handle);
 
+
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
@@ -1020,6 +1031,16 @@ iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
 	return -ENODEV;
 }
 
+static inline
+int iommu_attach_pasid_table(struct iommu_domain *domain,
+			     struct iommu_pasid_table_config *cfg)
+{
+	return -ENODEV;
+}
+
+static inline
+void iommu_detach_pasid_table(struct iommu_domain *domain) {}
+
 static inline struct iommu_sva *
 iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
 {
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index e1d9e75f2c94..082d758dd016 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -338,4 +338,58 @@ struct iommu_gpasid_bind_data {
 	} vendor;
 };
 
+/**
+ * struct iommu_pasid_smmuv3 - ARM SMMUv3 Stream Table Entry stage 1 related
+ *     information
+ * @version: API version of this structure
+ * @s1fmt: STE s1fmt (format of the CD table: single CD, linear table
+ *         or 2-level table)
+ * @s1dss: STE s1dss (specifies the behavior when @pasid_bits != 0
+ *         and no PASID is passed along with the incoming transaction)
+ * @padding: reserved for future use (should be zero)
+ *
+ * The PASID table is referred to as the Context Descriptor (CD) table on ARM
+ * SMMUv3. Please refer to the ARM SMMU 3.x spec (ARM IHI 0070A) for full
+ * details.
+ */
+struct iommu_pasid_smmuv3 {
+#define PASID_TABLE_SMMUV3_CFG_VERSION_1 1
+	__u32	version;
+	__u8	s1fmt;
+	__u8	s1dss;
+	__u8	padding[2];
+};
+
+/**
+ * struct iommu_pasid_table_config - PASID table data used to bind guest PASID
+ *     table to the host IOMMU
+ * @argsz: User filled size of this data
+ * @version: API version to prepare for future extensions
+ * @format: format of the PASID table
+ * @base_ptr: guest physical address of the PASID table
+ * @pasid_bits: number of PASID bits used in the PASID table
+ * @config: indicates whether the guest translation stage must
+ *          be translated, bypassed or aborted.
+ * @padding: reserved for future use (should be zero)
+ * @vendor_data.smmuv3: table information when @format is
+ * %IOMMU_PASID_FORMAT_SMMUV3
+ */
+struct iommu_pasid_table_config {
+	__u32	argsz;
+#define PASID_TABLE_CFG_VERSION_1 1
+	__u32	version;
+#define IOMMU_PASID_FORMAT_SMMUV3	1
+	__u32	format;
+	__u64	base_ptr;
+	__u8	pasid_bits;
+#define IOMMU_PASID_CONFIG_TRANSLATE	1
+#define IOMMU_PASID_CONFIG_BYPASS	2
+#define IOMMU_PASID_CONFIG_ABORT	3
+	__u8	config;
+	__u8    padding[2];
+	union {
+		struct iommu_pasid_smmuv3 smmuv3;
+	} vendor_data;
+};
+
 #endif /* _UAPI_IOMMU_H */
-- 
2.21.3


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi
  2020-11-18 11:21 [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
  2020-11-18 11:21 ` [PATCH v13 01/15] iommu: Introduce attach/detach_pasid_table API Eric Auger
@ 2020-11-18 11:21 ` Eric Auger
  2021-02-01 11:52   ` Keqian Zhu
  2020-11-18 11:21 ` [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure Eric Auger
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 57+ messages in thread
From: Eric Auger @ 2020-11-18 11:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, jacob.jun.pan, yi.l.liu, tn,
	nicoleotsuka, yuzenghui

On ARM, MSI are translated by the SMMU. An IOVA is allocated
for each MSI doorbell. If both the host and the guest are exposed
with SMMUs, we end up with 2 different IOVAs allocated by each.
guest allocates an IOVA (gIOVA) to map onto the guest MSI
doorbell (gDB). The Host allocates another IOVA (hIOVA) to map
onto the physical doorbell (hDB).

So we end up with 2 untied mappings:
         S1            S2
gIOVA    ->    gDB
              hIOVA    ->    hDB

Currently the PCI device is programmed by the host with hIOVA
as MSI doorbell. So this does not work.

This patch introduces an API to pass gIOVA/gDB to the host so
that gIOVA can be reused by the host instead of re-allocating
a new IOVA. So the goal is to create the following nested mapping:

         S1            S2
gIOVA    ->    gDB     ->    hDB

and program the PCI device with gIOVA MSI doorbell.

In case we have several devices attached to this nested domain
(devices belonging to the same group), they cannot be isolated
on guest side either. So they should also end up in the same domain
on guest side. We will enforce that all the devices attached to
the host iommu domain use the same physical doorbell and similarly
a single virtual doorbell mapping gets registered (1 single
virtual doorbell is used on guest as well).

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v7 -> v8:
- dummy iommu_unbind_guest_msi turned into a void function

v6 -> v7:
- remove the device handle parameter.
- Add comments saying there can only be a single MSI binding
  registered per iommu_domain
v5 -> v6:
-fix compile issue when IOMMU_API is not set

v3 -> v4:
- add unbind

v2 -> v3:
- add a struct device handle
---
 drivers/iommu/iommu.c | 37 +++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h | 20 ++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 978fe34378fb..0b1f458b444f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2252,6 +2252,43 @@ static void __iommu_detach_device(struct iommu_domain *domain,
 	trace_detach_device_from_domain(dev);
 }
 
+/**
+ * iommu_bind_guest_msi - Passes the stage1 GIOVA/GPA mapping of a
+ * virtual doorbell
+ *
+ * @domain: iommu domain the stage 1 mapping will be attached to
+ * @iova: iova allocated by the guest
+ * @gpa: guest physical address of the virtual doorbell
+ * @size: granule size used for the mapping
+ *
+ * The associated IOVA can be reused by the host to create a nested
+ * stage2 binding mapping translating into the physical doorbell used
+ * by the devices attached to the domain.
+ *
+ * All devices within the domain must share the same physical doorbell.
+ * A single MSI GIOVA/GPA mapping can be attached to an iommu_domain.
+ */
+
+int iommu_bind_guest_msi(struct iommu_domain *domain,
+			 dma_addr_t giova, phys_addr_t gpa, size_t size)
+{
+	if (unlikely(!domain->ops->bind_guest_msi))
+		return -ENODEV;
+
+	return domain->ops->bind_guest_msi(domain, giova, gpa, size);
+}
+EXPORT_SYMBOL_GPL(iommu_bind_guest_msi);
+
+void iommu_unbind_guest_msi(struct iommu_domain *domain,
+			    dma_addr_t iova)
+{
+	if (unlikely(!domain->ops->unbind_guest_msi))
+		return;
+
+	domain->ops->unbind_guest_msi(domain, iova);
+}
+EXPORT_SYMBOL_GPL(iommu_unbind_guest_msi);
+
 void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
 {
 	struct iommu_group *group;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 464fcbecf841..35819bff03bc 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -225,6 +225,8 @@ struct iommu_iotlb_gather {
  * @sva_unbind_gpasid: unbind guest pasid and mm
  * @attach_pasid_table: attach a pasid table
  * @detach_pasid_table: detach the pasid table
+ * @bind_guest_msi: provides a stage1 giova/gpa MSI doorbell mapping
+ * @unbind_guest_msi: withdraw a stage1 giova/gpa MSI doorbell mapping
  * @def_domain_type: device default domain type, return value:
  *		- IOMMU_DOMAIN_IDENTITY: must use an identity domain
  *		- IOMMU_DOMAIN_DMA: must use a dma domain
@@ -305,6 +307,10 @@ struct iommu_ops {
 
 	int (*def_domain_type)(struct device *dev);
 
+	int (*bind_guest_msi)(struct iommu_domain *domain,
+			      dma_addr_t giova, phys_addr_t gpa, size_t size);
+	void (*unbind_guest_msi)(struct iommu_domain *domain, dma_addr_t giova);
+
 	unsigned long pgsize_bitmap;
 	struct module *owner;
 };
@@ -444,6 +450,10 @@ extern int iommu_attach_pasid_table(struct iommu_domain *domain,
 extern int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
 					 void __user *udata);
 extern void iommu_detach_pasid_table(struct iommu_domain *domain);
+extern int iommu_bind_guest_msi(struct iommu_domain *domain,
+				dma_addr_t giova, phys_addr_t gpa, size_t size);
+extern void iommu_unbind_guest_msi(struct iommu_domain *domain,
+				   dma_addr_t giova);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -1087,6 +1097,16 @@ static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
 {
 	return NULL;
 }
+
+static inline
+int iommu_bind_guest_msi(struct iommu_domain *domain,
+			 dma_addr_t giova, phys_addr_t gpa, size_t size)
+{
+	return -ENODEV;
+}
+static inline
+void iommu_unbind_guest_msi(struct iommu_domain *domain, dma_addr_t giova) {}
+
 #endif /* CONFIG_IOMMU_API */
 
 /**
-- 
2.21.3


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure
  2020-11-18 11:21 [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
  2020-11-18 11:21 ` [PATCH v13 01/15] iommu: Introduce attach/detach_pasid_table API Eric Auger
  2020-11-18 11:21 ` [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi Eric Auger
@ 2020-11-18 11:21 ` Eric Auger
  2021-02-01 12:26   ` Keqian Zhu
  2020-11-18 11:21 ` [PATCH v13 04/15] iommu/smmuv3: Allow s1 and s2 configs to coexist Eric Auger
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 57+ messages in thread
From: Eric Auger @ 2020-11-18 11:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, jacob.jun.pan, yi.l.liu, tn,
	nicoleotsuka, yuzenghui

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

When handling faults from the event or PRI queue, we need to find the
struct device associated to a SID. Add a rb_tree to keep track of SIDs.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 161 ++++++++++++++++----
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  13 +-
 2 files changed, 144 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index e634bbe60573..1e4acc7f3d3c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -911,8 +911,8 @@ static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
 
 	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
-		for (i = 0; i < master->num_sids; i++) {
-			cmd.cfgi.sid = master->sids[i];
+		for (i = 0; i < master->num_streams; i++) {
+			cmd.cfgi.sid = master->streams[i].id;
 			arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
 		}
 	}
@@ -1350,6 +1350,32 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
 	return 0;
 }
 
+__maybe_unused
+static struct arm_smmu_master *
+arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
+{
+	struct rb_node *node;
+	struct arm_smmu_stream *stream;
+	struct arm_smmu_master *master = NULL;
+
+	mutex_lock(&smmu->streams_mutex);
+	node = smmu->streams.rb_node;
+	while (node) {
+		stream = rb_entry(node, struct arm_smmu_stream, node);
+		if (stream->id < sid) {
+			node = node->rb_right;
+		} else if (stream->id > sid) {
+			node = node->rb_left;
+		} else {
+			master = stream->master;
+			break;
+		}
+	}
+	mutex_unlock(&smmu->streams_mutex);
+
+	return master;
+}
+
 /* IRQ and event handlers */
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
@@ -1569,8 +1595,8 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master *master)
 
 	arm_smmu_atc_inv_to_cmd(0, 0, 0, &cmd);
 
-	for (i = 0; i < master->num_sids; i++) {
-		cmd.atc.sid = master->sids[i];
+	for (i = 0; i < master->num_streams; i++) {
+		cmd.atc.sid = master->streams[i].id;
 		arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
 	}
 
@@ -1613,8 +1639,8 @@ static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
 		if (!master->ats_enabled)
 			continue;
 
-		for (i = 0; i < master->num_sids; i++) {
-			cmd.atc.sid = master->sids[i];
+		for (i = 0; i < master->num_streams; i++) {
+			cmd.atc.sid = master->streams[i].id;
 			arm_smmu_cmdq_batch_add(smmu_domain->smmu, &cmds, &cmd);
 		}
 	}
@@ -2027,13 +2053,13 @@ static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master)
 	int i, j;
 	struct arm_smmu_device *smmu = master->smmu;
 
-	for (i = 0; i < master->num_sids; ++i) {
-		u32 sid = master->sids[i];
+	for (i = 0; i < master->num_streams; ++i) {
+		u32 sid = master->streams[i].id;
 		__le64 *step = arm_smmu_get_step_for_sid(smmu, sid);
 
 		/* Bridged PCI devices may end up with duplicated IDs */
 		for (j = 0; j < i; j++)
-			if (master->sids[j] == sid)
+			if (master->streams[j].id == sid)
 				break;
 		if (j < i)
 			continue;
@@ -2306,11 +2332,101 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
 	return sid < limit;
 }
 
+static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
+				  struct arm_smmu_master *master)
+{
+	int i;
+	int ret = 0;
+	struct arm_smmu_stream *new_stream, *cur_stream;
+	struct rb_node **new_node, *parent_node = NULL;
+	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
+
+	master->streams = kcalloc(fwspec->num_ids,
+				  sizeof(struct arm_smmu_stream), GFP_KERNEL);
+	if (!master->streams)
+		return -ENOMEM;
+	master->num_streams = fwspec->num_ids;
+
+	mutex_lock(&smmu->streams_mutex);
+	for (i = 0; i < fwspec->num_ids && !ret; i++) {
+		u32 sid = fwspec->ids[i];
+
+		new_stream = &master->streams[i];
+		new_stream->id = sid;
+		new_stream->master = master;
+
+		/*
+		 * Check the SIDs are in range of the SMMU and our stream table
+		 */
+		if (!arm_smmu_sid_in_range(smmu, sid)) {
+			ret = -ERANGE;
+			break;
+		}
+
+		/* Ensure l2 strtab is initialised */
+		if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
+			ret = arm_smmu_init_l2_strtab(smmu, sid);
+			if (ret)
+				break;
+		}
+
+		/* Insert into SID tree */
+		new_node = &(smmu->streams.rb_node);
+		while (*new_node) {
+			cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
+					      node);
+			parent_node = *new_node;
+			if (cur_stream->id > new_stream->id) {
+				new_node = &((*new_node)->rb_left);
+			} else if (cur_stream->id < new_stream->id) {
+				new_node = &((*new_node)->rb_right);
+			} else {
+				dev_warn(master->dev,
+					 "stream %u already in tree\n",
+					 cur_stream->id);
+				ret = -EINVAL;
+				break;
+			}
+		}
+
+		if (!ret) {
+			rb_link_node(&new_stream->node, parent_node, new_node);
+			rb_insert_color(&new_stream->node, &smmu->streams);
+		}
+	}
+
+	if (ret) {
+		for (; i > 0; i--)
+			rb_erase(&master->streams[i].node, &smmu->streams);
+		kfree(master->streams);
+	}
+	mutex_unlock(&smmu->streams_mutex);
+
+	return ret;
+}
+
+static void arm_smmu_remove_master(struct arm_smmu_master *master)
+{
+	int i;
+	struct arm_smmu_device *smmu = master->smmu;
+	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
+
+	if (!smmu || !master->streams)
+		return;
+
+	mutex_lock(&smmu->streams_mutex);
+	for (i = 0; i < fwspec->num_ids; i++)
+		rb_erase(&master->streams[i].node, &smmu->streams);
+	mutex_unlock(&smmu->streams_mutex);
+
+	kfree(master->streams);
+}
+
 static struct iommu_ops arm_smmu_ops;
 
 static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 {
-	int i, ret;
+	int ret;
 	struct arm_smmu_device *smmu;
 	struct arm_smmu_master *master;
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
@@ -2331,27 +2447,12 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 
 	master->dev = dev;
 	master->smmu = smmu;
-	master->sids = fwspec->ids;
-	master->num_sids = fwspec->num_ids;
 	INIT_LIST_HEAD(&master->bonds);
 	dev_iommu_priv_set(dev, master);
 
-	/* Check the SIDs are in range of the SMMU and our stream table */
-	for (i = 0; i < master->num_sids; i++) {
-		u32 sid = master->sids[i];
-
-		if (!arm_smmu_sid_in_range(smmu, sid)) {
-			ret = -ERANGE;
-			goto err_free_master;
-		}
-
-		/* Ensure l2 strtab is initialised */
-		if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
-			ret = arm_smmu_init_l2_strtab(smmu, sid);
-			if (ret)
-				goto err_free_master;
-		}
-	}
+	ret = arm_smmu_insert_master(smmu, master);
+	if (ret)
+		goto err_free_master;
 
 	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
 
@@ -2389,6 +2490,7 @@ static void arm_smmu_release_device(struct device *dev)
 	WARN_ON(arm_smmu_master_sva_enabled(master));
 	arm_smmu_detach_dev(master);
 	arm_smmu_disable_pasid(master);
+	arm_smmu_remove_master(master);
 	kfree(master);
 	iommu_fwspec_free(dev);
 }
@@ -2808,6 +2910,9 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
 {
 	int ret;
 
+	mutex_init(&smmu->streams_mutex);
+	smmu->streams = RB_ROOT;
+
 	ret = arm_smmu_init_queues(smmu);
 	if (ret)
 		return ret;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index d4b7f40ccb02..19196eea7c1d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -636,6 +636,15 @@ struct arm_smmu_device {
 
 	/* IOMMU core code handle */
 	struct iommu_device		iommu;
+
+	struct rb_root			streams;
+	struct mutex			streams_mutex;
+};
+
+struct arm_smmu_stream {
+	u32				id;
+	struct arm_smmu_master		*master;
+	struct rb_node			node;
 };
 
 /* SMMU private data for each master */
@@ -644,8 +653,8 @@ struct arm_smmu_master {
 	struct device			*dev;
 	struct arm_smmu_domain		*domain;
 	struct list_head		domain_head;
-	u32				*sids;
-	unsigned int			num_sids;
+	struct arm_smmu_stream		*streams;
+	unsigned int			num_streams;
 	bool				ats_enabled;
 	bool				sva_enabled;
 	struct list_head		bonds;
-- 
2.21.3


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v13 04/15] iommu/smmuv3: Allow s1 and s2 configs to coexist
  2020-11-18 11:21 [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (2 preceding siblings ...)
  2020-11-18 11:21 ` [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure Eric Auger
@ 2020-11-18 11:21 ` Eric Auger
  2021-02-01 12:35   ` Keqian Zhu
  2020-11-18 11:21 ` [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support Eric Auger
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 57+ messages in thread
From: Eric Auger @ 2020-11-18 11:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, jacob.jun.pan, yi.l.liu, tn,
	nicoleotsuka, yuzenghui

In true nested mode, both s1_cfg and s2_cfg will coexist.
Let's remove the union and add a "set" field in each
config structure telling whether the config is set and needs
to be applied when writing the STE. In legacy nested mode,
only the 2d stage is used. In true nested mode, the "set" field
will be set when the guest passes the pasid table.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v12 -> v13:
- does not dynamically allocate s1-cfg and s2_cfg anymore. Add
  the set field
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 43 +++++++++++++--------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  8 ++--
 2 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 1e4acc7f3d3c..18ac5af1b284 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1195,8 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 	u64 val = le64_to_cpu(dst[0]);
 	bool ste_live = false;
 	struct arm_smmu_device *smmu = NULL;
-	struct arm_smmu_s1_cfg *s1_cfg = NULL;
-	struct arm_smmu_s2_cfg *s2_cfg = NULL;
+	struct arm_smmu_s1_cfg *s1_cfg;
+	struct arm_smmu_s2_cfg *s2_cfg;
 	struct arm_smmu_domain *smmu_domain = NULL;
 	struct arm_smmu_cmdq_ent prefetch_cmd = {
 		.opcode		= CMDQ_OP_PREFETCH_CFG,
@@ -1211,13 +1211,24 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 	}
 
 	if (smmu_domain) {
+		s1_cfg = &smmu_domain->s1_cfg;
+		s2_cfg = &smmu_domain->s2_cfg;
+
 		switch (smmu_domain->stage) {
 		case ARM_SMMU_DOMAIN_S1:
-			s1_cfg = &smmu_domain->s1_cfg;
+			s1_cfg->set = true;
+			s2_cfg->set = false;
 			break;
 		case ARM_SMMU_DOMAIN_S2:
+			s1_cfg->set = false;
+			s2_cfg->set = true;
+			break;
 		case ARM_SMMU_DOMAIN_NESTED:
-			s2_cfg = &smmu_domain->s2_cfg;
+			/*
+			 * Actual usage of stage 1 depends on nested mode:
+			 * legacy (2d stage only) or true nested mode
+			 */
+			s2_cfg->set = true;
 			break;
 		default:
 			break;
@@ -1244,7 +1255,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 	val = STRTAB_STE_0_V;
 
 	/* Bypass/fault */
-	if (!smmu_domain || !(s1_cfg || s2_cfg)) {
+	if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
 		if (!smmu_domain && disable_bypass)
 			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
 		else
@@ -1263,7 +1274,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 		return;
 	}
 
-	if (s1_cfg) {
+	if (s1_cfg->set) {
 		BUG_ON(ste_live);
 		dst[1] = cpu_to_le64(
 			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
@@ -1282,7 +1293,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 			FIELD_PREP(STRTAB_STE_0_S1FMT, s1_cfg->s1fmt);
 	}
 
-	if (s2_cfg) {
+	if (s2_cfg->set) {
 		BUG_ON(ste_live);
 		dst[2] = cpu_to_le64(
 			 FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
@@ -1846,24 +1857,24 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 {
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct arm_smmu_s1_cfg *s1_cfg = &smmu_domain->s1_cfg;
+	struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
 
 	iommu_put_dma_cookie(domain);
 	free_io_pgtable_ops(smmu_domain->pgtbl_ops);
 
 	/* Free the CD and ASID, if we allocated them */
-	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
-		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
-
+	if (s1_cfg->set) {
 		/* Prevent SVA from touching the CD while we're freeing it */
 		mutex_lock(&arm_smmu_asid_lock);
-		if (cfg->cdcfg.cdtab)
+		if (s1_cfg->cdcfg.cdtab)
 			arm_smmu_free_cd_tables(smmu_domain);
-		arm_smmu_free_asid(&cfg->cd);
+		arm_smmu_free_asid(&s1_cfg->cd);
 		mutex_unlock(&arm_smmu_asid_lock);
-	} else {
-		struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
-		if (cfg->vmid)
-			arm_smmu_bitmap_free(smmu->vmid_map, cfg->vmid);
+	}
+	if (s2_cfg->set) {
+		if (s2_cfg->vmid)
+			arm_smmu_bitmap_free(smmu->vmid_map, s2_cfg->vmid);
 	}
 
 	kfree(smmu_domain);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 19196eea7c1d..07f59252dd21 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -562,12 +562,14 @@ struct arm_smmu_s1_cfg {
 	struct arm_smmu_ctx_desc	cd;
 	u8				s1fmt;
 	u8				s1cdmax;
+	bool				set;
 };
 
 struct arm_smmu_s2_cfg {
 	u16				vmid;
 	u64				vttbr;
 	u64				vtcr;
+	bool				set;
 };
 
 struct arm_smmu_strtab_cfg {
@@ -678,10 +680,8 @@ struct arm_smmu_domain {
 	atomic_t			nr_ats_masters;
 
 	enum arm_smmu_domain_stage	stage;
-	union {
-		struct arm_smmu_s1_cfg	s1_cfg;
-		struct arm_smmu_s2_cfg	s2_cfg;
-	};
+	struct arm_smmu_s1_cfg	s1_cfg;
+	struct arm_smmu_s2_cfg	s2_cfg;
 
 	struct iommu_domain		domain;
 
-- 
2.21.3


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support
  2020-11-18 11:21 [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (3 preceding siblings ...)
  2020-11-18 11:21 ` [PATCH v13 04/15] iommu/smmuv3: Allow s1 and s2 configs to coexist Eric Auger
@ 2020-11-18 11:21 ` Eric Auger
  2020-11-19  3:59   ` kernel test robot
                     ` (3 more replies)
  2020-11-18 11:21 ` [PATCH v13 06/15] iommu/smmuv3: Implement attach/detach_pasid_table Eric Auger
                   ` (11 subsequent siblings)
  16 siblings, 4 replies; 57+ messages in thread
From: Eric Auger @ 2020-11-18 11:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, jacob.jun.pan, yi.l.liu, tn,
	nicoleotsuka, yuzenghui

When nested stage translation is setup, both s1_cfg and
s2_cfg are set.

We introduce a new smmu domain abort field that will be set
upon guest stage1 configuration passing.

arm_smmu_write_strtab_ent() is modified to write both stage
fields in the STE and deal with the abort field.

In nested mode, only stage 2 is "finalized" as the host does
not own/configure the stage 1 context descriptor; guest does.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v10 -> v11:
- Fix an issue reported by Shameer when switching from with vSMMU
  to without vSMMU. Despite the spec does not seem to mention it
  seems to be needed to reset the 2 high 64b when switching from
  S1+S2 cfg to S1 only. Especially dst[3] needs to be reset (S2TTB).
  On some implementations, if the S2TTB is not reset, this causes
  a C_BAD_STE error
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 64 +++++++++++++++++----
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 +
 2 files changed, 56 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 18ac5af1b284..412ea1bafa50 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1181,8 +1181,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 	 * three cases at the moment:
 	 *
 	 * 1. Invalid (all zero) -> bypass/fault (init)
-	 * 2. Bypass/fault -> translation/bypass (attach)
-	 * 3. Translation/bypass -> bypass/fault (detach)
+	 * 2. Bypass/fault -> single stage translation/bypass (attach)
+	 * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
+	 * 4. S2 -> S1 + S2 (attach_pasid_table)
+	 * 5. S1 + S2 -> S2 (detach_pasid_table)
 	 *
 	 * Given that we can't update the STE atomically and the SMMU
 	 * doesn't read the thing in a defined order, that leaves us
@@ -1193,7 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 	 * 3. Update Config, sync
 	 */
 	u64 val = le64_to_cpu(dst[0]);
-	bool ste_live = false;
+	bool s1_live = false, s2_live = false, ste_live;
+	bool abort, nested = false, translate = false;
 	struct arm_smmu_device *smmu = NULL;
 	struct arm_smmu_s1_cfg *s1_cfg;
 	struct arm_smmu_s2_cfg *s2_cfg;
@@ -1233,6 +1236,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 		default:
 			break;
 		}
+		nested = s1_cfg->set && s2_cfg->set;
+		translate = s1_cfg->set || s2_cfg->set;
 	}
 
 	if (val & STRTAB_STE_0_V) {
@@ -1240,23 +1245,36 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 		case STRTAB_STE_0_CFG_BYPASS:
 			break;
 		case STRTAB_STE_0_CFG_S1_TRANS:
+			s1_live = true;
+			break;
 		case STRTAB_STE_0_CFG_S2_TRANS:
-			ste_live = true;
+			s2_live = true;
+			break;
+		case STRTAB_STE_0_CFG_NESTED:
+			s1_live = true;
+			s2_live = true;
 			break;
 		case STRTAB_STE_0_CFG_ABORT:
-			BUG_ON(!disable_bypass);
 			break;
 		default:
 			BUG(); /* STE corruption */
 		}
 	}
 
+	ste_live = s1_live || s2_live;
+
 	/* Nuke the existing STE_0 value, as we're going to rewrite it */
 	val = STRTAB_STE_0_V;
 
 	/* Bypass/fault */
-	if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
-		if (!smmu_domain && disable_bypass)
+
+	if (!smmu_domain)
+		abort = disable_bypass;
+	else
+		abort = smmu_domain->abort;
+
+	if (abort || !translate) {
+		if (abort)
 			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
 		else
 			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
@@ -1274,8 +1292,16 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 		return;
 	}
 
+	BUG_ON(ste_live && !nested);
+
+	if (ste_live) {
+		/* First invalidate the live STE */
+		dst[0] = cpu_to_le64(STRTAB_STE_0_CFG_ABORT);
+		arm_smmu_sync_ste_for_sid(smmu, sid);
+	}
+
 	if (s1_cfg->set) {
-		BUG_ON(ste_live);
+		BUG_ON(s1_live);
 		dst[1] = cpu_to_le64(
 			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
 			 FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
@@ -1294,7 +1320,14 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 	}
 
 	if (s2_cfg->set) {
-		BUG_ON(ste_live);
+		u64 vttbr = s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK;
+
+		if (s2_live) {
+			u64 s2ttb = le64_to_cpu(dst[3] & STRTAB_STE_3_S2TTB_MASK);
+
+			BUG_ON(s2ttb != vttbr);
+		}
+
 		dst[2] = cpu_to_le64(
 			 FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
 			 FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
@@ -1304,9 +1337,12 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 			 STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 |
 			 STRTAB_STE_2_S2R);
 
-		dst[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
+		dst[3] = cpu_to_le64(vttbr);
 
 		val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
+	} else {
+		dst[2] = 0;
+		dst[3] = 0;
 	}
 
 	if (master->ats_enabled)
@@ -1982,6 +2018,14 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
 		return 0;
 	}
 
+	if (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED &&
+	    (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1) ||
+	     !(smmu->features & ARM_SMMU_FEAT_TRANS_S2))) {
+		dev_info(smmu_domain->smmu->dev,
+			 "does not implement two stages\n");
+		return -EINVAL;
+	}
+
 	/* Restrict the stage to what we can actually support */
 	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
 		smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 07f59252dd21..269779dee8d1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -206,6 +206,7 @@
 #define STRTAB_STE_0_CFG_BYPASS		4
 #define STRTAB_STE_0_CFG_S1_TRANS	5
 #define STRTAB_STE_0_CFG_S2_TRANS	6
+#define STRTAB_STE_0_CFG_NESTED		7
 
 #define STRTAB_STE_0_S1FMT		GENMASK_ULL(5, 4)
 #define STRTAB_STE_0_S1FMT_LINEAR	0
@@ -682,6 +683,7 @@ struct arm_smmu_domain {
 	enum arm_smmu_domain_stage	stage;
 	struct arm_smmu_s1_cfg	s1_cfg;
 	struct arm_smmu_s2_cfg	s2_cfg;
+	bool				abort;
 
 	struct iommu_domain		domain;
 
-- 
2.21.3


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v13 06/15] iommu/smmuv3: Implement attach/detach_pasid_table
  2020-11-18 11:21 [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (4 preceding siblings ...)
  2020-11-18 11:21 ` [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support Eric Auger
@ 2020-11-18 11:21 ` Eric Auger
  2021-02-02  8:03   ` Keqian Zhu
  2020-11-18 11:21 ` [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs Eric Auger
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 57+ messages in thread
From: Eric Auger @ 2020-11-18 11:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, jacob.jun.pan, yi.l.liu, tn,
	nicoleotsuka, yuzenghui

On attach_pasid_table() we program STE S1 related info set
by the guest into the actual physical STEs. At minimum
we need to program the context descriptor GPA and compute
whether the stage1 is translated/bypassed or aborted.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v7 -> v8:
- remove smmu->features check, now done on domain finalize

v6 -> v7:
- check versions and comment the fact we don't need to take
  into account s1dss and s1fmt
v3 -> v4:
- adapt to changes in iommu_pasid_table_config
- different programming convention at s1_cfg/s2_cfg/ste.abort

v2 -> v3:
- callback now is named set_pasid_table and struct fields
  are laid out differently.

v1 -> v2:
- invalidate the STE before changing them
- hold init_mutex
- handle new fields
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 89 +++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 412ea1bafa50..805acdc18a3a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2661,6 +2661,93 @@ static void arm_smmu_get_resv_regions(struct device *dev,
 	iommu_dma_get_resv_regions(dev, head);
 }
 
+static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
+				       struct iommu_pasid_table_config *cfg)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_master *master;
+	struct arm_smmu_device *smmu;
+	unsigned long flags;
+	int ret = -EINVAL;
+
+	if (cfg->format != IOMMU_PASID_FORMAT_SMMUV3)
+		return -EINVAL;
+
+	if (cfg->version != PASID_TABLE_CFG_VERSION_1 ||
+	    cfg->vendor_data.smmuv3.version != PASID_TABLE_SMMUV3_CFG_VERSION_1)
+		return -EINVAL;
+
+	mutex_lock(&smmu_domain->init_mutex);
+
+	smmu = smmu_domain->smmu;
+
+	if (!smmu)
+		goto out;
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+		goto out;
+
+	switch (cfg->config) {
+	case IOMMU_PASID_CONFIG_ABORT:
+		smmu_domain->s1_cfg.set = false;
+		smmu_domain->abort = true;
+		break;
+	case IOMMU_PASID_CONFIG_BYPASS:
+		smmu_domain->s1_cfg.set = false;
+		smmu_domain->abort = false;
+		break;
+	case IOMMU_PASID_CONFIG_TRANSLATE:
+		/* we do not support S1 <-> S1 transitions */
+		if (smmu_domain->s1_cfg.set)
+			goto out;
+
+		/*
+		 * we currently support a single CD so s1fmt and s1dss
+		 * fields are also ignored
+		 */
+		if (cfg->pasid_bits)
+			goto out;
+
+		smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
+		smmu_domain->s1_cfg.set = true;
+		smmu_domain->abort = false;
+		break;
+	default:
+		goto out;
+	}
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_for_each_entry(master, &smmu_domain->devices, domain_head)
+		arm_smmu_install_ste_for_dev(master);
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+	ret = 0;
+out:
+	mutex_unlock(&smmu_domain->init_mutex);
+	return ret;
+}
+
+static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_master *master;
+	unsigned long flags;
+
+	mutex_lock(&smmu_domain->init_mutex);
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+		goto unlock;
+
+	smmu_domain->s1_cfg.set = false;
+	smmu_domain->abort = true;
+
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_for_each_entry(master, &smmu_domain->devices, domain_head)
+		arm_smmu_install_ste_for_dev(master);
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+
+unlock:
+	mutex_unlock(&smmu_domain->init_mutex);
+}
+
 static bool arm_smmu_dev_has_feature(struct device *dev,
 				     enum iommu_dev_features feat)
 {
@@ -2742,6 +2829,8 @@ static struct iommu_ops arm_smmu_ops = {
 	.of_xlate		= arm_smmu_of_xlate,
 	.get_resv_regions	= arm_smmu_get_resv_regions,
 	.put_resv_regions	= generic_iommu_put_resv_regions,
+	.attach_pasid_table	= arm_smmu_attach_pasid_table,
+	.detach_pasid_table	= arm_smmu_detach_pasid_table,
 	.dev_has_feat		= arm_smmu_dev_has_feature,
 	.dev_feat_enabled	= arm_smmu_dev_feature_enabled,
 	.dev_enable_feat	= arm_smmu_dev_enable_feature,
-- 
2.21.3


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2020-11-18 11:21 [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (5 preceding siblings ...)
  2020-11-18 11:21 ` [PATCH v13 06/15] iommu/smmuv3: Implement attach/detach_pasid_table Eric Auger
@ 2020-11-18 11:21 ` Eric Auger
  2020-12-01 13:33   ` Xingang Wang
  2020-11-18 11:21 ` [PATCH v13 08/15] iommu/smmuv3: Implement cache_invalidate Eric Auger
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 57+ messages in thread
From: Eric Auger @ 2020-11-18 11:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, jacob.jun.pan, yi.l.liu, tn,
	nicoleotsuka, yuzenghui

With nested stage support, soon we will need to invalidate
S1 contexts and ranges tagged with an unmanaged asid, this
latter being managed by the guest. So let's introduce 2 helpers
that allow to invalidate with externally managed ASIDs

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 35 +++++++++++++++++----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 805acdc18a3a..fdecc9f17b36 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1697,9 +1697,9 @@ static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
 }
 
 /* IO_PGTABLE API */
-static void arm_smmu_tlb_inv_context(void *cookie)
+static void __arm_smmu_tlb_inv_context(struct arm_smmu_domain *smmu_domain,
+				       int ext_asid)
 {
-	struct arm_smmu_domain *smmu_domain = cookie;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_cmdq_ent cmd;
 
@@ -1710,7 +1710,11 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 	 * insertion to guarantee those are observed before the TLBI. Do be
 	 * careful, 007.
 	 */
-	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
+	if (ext_asid >= 0) { /* guest stage 1 invalidation */
+		cmd.opcode	= CMDQ_OP_TLBI_NH_ASID;
+		cmd.tlbi.asid	= ext_asid;
+		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
+	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
 	} else {
 		cmd.opcode	= CMDQ_OP_TLBI_S12_VMALL;
@@ -1721,9 +1725,17 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 	arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
 }
 
-static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
+static void arm_smmu_tlb_inv_context(void *cookie)
+{
+	struct arm_smmu_domain *smmu_domain = cookie;
+
+	__arm_smmu_tlb_inv_context(smmu_domain, -1);
+}
+
+static void __arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
 				   size_t granule, bool leaf,
-				   struct arm_smmu_domain *smmu_domain)
+				   struct arm_smmu_domain *smmu_domain,
+				   int ext_asid)
 {
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	unsigned long start = iova, end = iova + size, num_pages = 0, tg = 0;
@@ -1738,7 +1750,11 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
 	if (!size)
 		return;
 
-	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
+	if (ext_asid >= 0) {  /* guest stage 1 invalidation */
+		cmd.opcode	= CMDQ_OP_TLBI_NH_VA;
+		cmd.tlbi.asid	= ext_asid;
+		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
+	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		cmd.opcode	= CMDQ_OP_TLBI_NH_VA;
 		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
 	} else {
@@ -1798,6 +1814,13 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
 	arm_smmu_atc_inv_domain(smmu_domain, 0, start, size);
 }
 
+static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
+				   size_t granule, bool leaf,
+				   struct arm_smmu_domain *smmu_domain)
+{
+	__arm_smmu_tlb_inv_range(iova, size, granule, leaf, smmu_domain, -1);
+}
+
 static void arm_smmu_tlb_inv_page_nosync(struct iommu_iotlb_gather *gather,
 					 unsigned long iova, size_t granule,
 					 void *cookie)
-- 
2.21.3


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v13 08/15] iommu/smmuv3: Implement cache_invalidate
  2020-11-18 11:21 [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (6 preceding siblings ...)
  2020-11-18 11:21 ` [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs Eric Auger
@ 2020-11-18 11:21 ` Eric Auger
  2020-11-18 11:21 ` [PATCH v13 09/15] dma-iommu: Implement NESTED_MSI cookie Eric Auger
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 57+ messages in thread
From: Eric Auger @ 2020-11-18 11:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, jacob.jun.pan, yi.l.liu, tn,
	nicoleotsuka, yuzenghui

Implement domain-selective and page-selective IOTLB invalidations.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v7 -> v8:
- ASID based invalidation using iommu_inv_pasid_info
- check ARCHID/PASID flags in addr based invalidation
- use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync

v6 -> v7
- check the uapi version

v3 -> v4:
- adapt to changes in the uapi
- add support for leaf parameter
- do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
  anymore

v2 -> v3:
- replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync

v1 -> v2:
- properly pass the asid
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 53 +++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index fdecc9f17b36..24124361dd3b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2771,6 +2771,58 @@ static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
 	mutex_unlock(&smmu_domain->init_mutex);
 }
 
+static int
+arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
+			  struct iommu_cache_invalidate_info *inv_info)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+		return -EINVAL;
+
+	if (!smmu)
+		return -EINVAL;
+
+	if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+		return -EINVAL;
+
+	if (inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB) {
+		if (inv_info->granularity == IOMMU_INV_GRANU_PASID) {
+			struct iommu_inv_pasid_info *info =
+				&inv_info->granu.pasid_info;
+
+			if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID) ||
+			     (info->flags & IOMMU_INV_PASID_FLAGS_PASID))
+				return -EINVAL;
+
+			__arm_smmu_tlb_inv_context(smmu_domain, info->archid);
+
+		} else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR) {
+			struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
+			size_t size = info->nb_granules * info->granule_size;
+			bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
+
+			if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID) ||
+			     (info->flags & IOMMU_INV_ADDR_FLAGS_PASID))
+				return -EINVAL;
+
+			__arm_smmu_tlb_inv_range(info->addr, size,
+						 info->granule_size, leaf,
+						  smmu_domain, info->archid);
+
+			arm_smmu_cmdq_issue_sync(smmu);
+		} else {
+			return -EINVAL;
+		}
+	}
+	if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
+	    inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
+		return -ENOENT;
+	}
+	return 0;
+}
+
 static bool arm_smmu_dev_has_feature(struct device *dev,
 				     enum iommu_dev_features feat)
 {
@@ -2854,6 +2906,7 @@ static struct iommu_ops arm_smmu_ops = {
 	.put_resv_regions	= generic_iommu_put_resv_regions,
 	.attach_pasid_table	= arm_smmu_attach_pasid_table,
 	.detach_pasid_table	= arm_smmu_detach_pasid_table,
+	.cache_invalidate	= arm_smmu_cache_invalidate,
 	.dev_has_feat		= arm_smmu_dev_has_feature,
 	.dev_feat_enabled	= arm_smmu_dev_feature_enabled,
 	.dev_enable_feat	= arm_smmu_dev_enable_feature,
-- 
2.21.3


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v13 09/15] dma-iommu: Implement NESTED_MSI cookie
  2020-11-18 11:21 [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (7 preceding siblings ...)
  2020-11-18 11:21 ` [PATCH v13 08/15] iommu/smmuv3: Implement cache_invalidate Eric Auger
@ 2020-11-18 11:21 ` Eric Auger
  2020-11-18 11:21 ` [PATCH v13 10/15] iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement Eric Auger
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 57+ messages in thread
From: Eric Auger @ 2020-11-18 11:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, jacob.jun.pan, yi.l.liu, tn,
	nicoleotsuka, yuzenghui

Up to now, when the type was UNMANAGED, we used to
allocate IOVA pages within a reserved IOVA MSI range.

If both the host and the guest are exposed with SMMUs, each
would allocate an IOVA. The guest allocates an IOVA (gIOVA)
to map onto the guest MSI doorbell (gDB). The Host allocates
another IOVA (hIOVA) to map onto the physical doorbell (hDB).

So we end up with 2 unrelated mappings, at S1 and S2:
         S1             S2
gIOVA    ->     gDB
               hIOVA    ->    hDB

The PCI device would be programmed with hIOVA.
No stage 1 mapping would existing, causing the MSIs to fault.

iommu_dma_bind_guest_msi() allows to pass gIOVA/gDB
to the host so that gIOVA can be used by the host instead of
re-allocating a new hIOVA.

         S1           S2
gIOVA    ->    gDB    ->    hDB

this time, the PCI device can be programmed with the gIOVA MSI
doorbell which is correctly mapped through both stages.

Nested mode is not compatible with HW MSI regions as in that
case gDB and hDB should have a 1-1 mapping. This check will
be done when attaching each device to the IOMMU domain.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v10 -> v11:
- fix compilation if !CONFIG_IOMMU_DMA

v7 -> v8:
- correct iommu_dma_(un)bind_guest_msi when
  !CONFIG_IOMMU_DMA
- Mentioned nested mode is not compatible with HW MSI regions
  in commit message
- protect with msi_lock on unbind

v6 -> v7:
- removed device handle

v3 -> v4:
- change function names; add unregister
- protect with msi_lock

v2 -> v3:
- also store the device handle on S1 mapping registration.
  This garantees we associate the associated S2 mapping binds
  to the correct physical MSI controller.

v1 -> v2:
- unmap stage2 on put()
---
 drivers/iommu/dma-iommu.c | 142 +++++++++++++++++++++++++++++++++++++-
 include/linux/dma-iommu.h |  16 +++++
 2 files changed, 155 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 0cbcd3fc3e7e..a14ecad6b79b 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -19,6 +19,7 @@
 #include <linux/irq.h>
 #include <linux/mm.h>
 #include <linux/mutex.h>
+#include <linux/mutex.h>
 #include <linux/pci.h>
 #include <linux/scatterlist.h>
 #include <linux/vmalloc.h>
@@ -27,12 +28,15 @@
 struct iommu_dma_msi_page {
 	struct list_head	list;
 	dma_addr_t		iova;
+	dma_addr_t		gpa;
 	phys_addr_t		phys;
+	size_t			s1_granule;
 };
 
 enum iommu_dma_cookie_type {
 	IOMMU_DMA_IOVA_COOKIE,
 	IOMMU_DMA_MSI_COOKIE,
+	IOMMU_DMA_NESTED_MSI_COOKIE,
 };
 
 struct iommu_dma_cookie {
@@ -44,6 +48,7 @@ struct iommu_dma_cookie {
 		dma_addr_t		msi_iova;
 	};
 	struct list_head		msi_page_list;
+	spinlock_t			msi_lock;
 
 	/* Domain for flush queue callback; NULL if flush queue not in use */
 	struct iommu_domain		*fq_domain;
@@ -62,6 +67,7 @@ static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
 
 	cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
 	if (cookie) {
+		spin_lock_init(&cookie->msi_lock);
 		INIT_LIST_HEAD(&cookie->msi_page_list);
 		cookie->type = type;
 	}
@@ -95,14 +101,17 @@ EXPORT_SYMBOL(iommu_get_dma_cookie);
  *
  * Users who manage their own IOVA allocation and do not want DMA API support,
  * but would still like to take advantage of automatic MSI remapping, can use
- * this to initialise their own domain appropriately. Users should reserve a
+ * this to initialise their own domain appropriately. Users may reserve a
  * contiguous IOVA region, starting at @base, large enough to accommodate the
  * number of PAGE_SIZE mappings necessary to cover every MSI doorbell address
- * used by the devices attached to @domain.
+ * used by the devices attached to @domain. The other way round is to provide
+ * usable iova pages through the iommu_dma_bind_doorbell API (nested stages
+ * use case)
  */
 int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
 {
 	struct iommu_dma_cookie *cookie;
+	int nesting, ret;
 
 	if (domain->type != IOMMU_DOMAIN_UNMANAGED)
 		return -EINVAL;
@@ -110,7 +119,12 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
 	if (domain->iova_cookie)
 		return -EEXIST;
 
-	cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
+	ret =  iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &nesting);
+	if (!ret && nesting)
+		cookie = cookie_alloc(IOMMU_DMA_NESTED_MSI_COOKIE);
+	else
+		cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
+
 	if (!cookie)
 		return -ENOMEM;
 
@@ -131,6 +145,7 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
 {
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iommu_dma_msi_page *msi, *tmp;
+	bool s2_unmap = false;
 
 	if (!cookie)
 		return;
@@ -138,7 +153,15 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
 	if (cookie->type == IOMMU_DMA_IOVA_COOKIE && cookie->iovad.granule)
 		put_iova_domain(&cookie->iovad);
 
+	if (cookie->type == IOMMU_DMA_NESTED_MSI_COOKIE)
+		s2_unmap = true;
+
 	list_for_each_entry_safe(msi, tmp, &cookie->msi_page_list, list) {
+		if (s2_unmap && msi->phys) {
+			size_t size = cookie_msi_granule(cookie);
+
+			WARN_ON(iommu_unmap(domain, msi->gpa, size) != size);
+		}
 		list_del(&msi->list);
 		kfree(msi);
 	}
@@ -147,6 +170,92 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL(iommu_put_dma_cookie);
 
+/**
+ * iommu_dma_bind_guest_msi - Allows to pass the stage 1
+ * binding of a virtual MSI doorbell used by @dev.
+ *
+ * @domain: domain handle
+ * @iova: guest iova
+ * @gpa: gpa of the virtual doorbell
+ * @size: size of the granule used for the stage1 mapping
+ *
+ * In nested stage use case, the user can provide IOVA/IPA bindings
+ * corresponding to a guest MSI stage 1 mapping. When the host needs
+ * to map its own MSI doorbells, it can use @gpa as stage 2 input
+ * and map it onto the physical MSI doorbell.
+ */
+int iommu_dma_bind_guest_msi(struct iommu_domain *domain,
+			     dma_addr_t iova, phys_addr_t gpa, size_t size)
+{
+	struct iommu_dma_cookie *cookie = domain->iova_cookie;
+	struct iommu_dma_msi_page *msi;
+	int ret = 0;
+
+	if (!cookie)
+		return -EINVAL;
+
+	if (cookie->type != IOMMU_DMA_NESTED_MSI_COOKIE)
+		return -EINVAL;
+
+	iova = iova & ~(dma_addr_t)(size - 1);
+	gpa = gpa & ~(phys_addr_t)(size - 1);
+
+	spin_lock(&cookie->msi_lock);
+
+	list_for_each_entry(msi, &cookie->msi_page_list, list) {
+		if (msi->iova == iova)
+			goto unlock; /* this page is already registered */
+	}
+
+	msi = kzalloc(sizeof(*msi), GFP_ATOMIC);
+	if (!msi) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	msi->iova = iova;
+	msi->gpa = gpa;
+	msi->s1_granule = size;
+	list_add(&msi->list, &cookie->msi_page_list);
+unlock:
+	spin_unlock(&cookie->msi_lock);
+	return ret;
+}
+EXPORT_SYMBOL(iommu_dma_bind_guest_msi);
+
+void iommu_dma_unbind_guest_msi(struct iommu_domain *domain, dma_addr_t giova)
+{
+	struct iommu_dma_cookie *cookie = domain->iova_cookie;
+	struct iommu_dma_msi_page *msi;
+
+	if (!cookie)
+		return;
+
+	if (cookie->type != IOMMU_DMA_NESTED_MSI_COOKIE)
+		return;
+
+	spin_lock(&cookie->msi_lock);
+
+	list_for_each_entry(msi, &cookie->msi_page_list, list) {
+		dma_addr_t aligned_giova =
+			giova & ~(dma_addr_t)(msi->s1_granule - 1);
+
+		if (msi->iova == aligned_giova) {
+			if (msi->phys) {
+				/* unmap the stage 2 */
+				size_t size = cookie_msi_granule(cookie);
+
+				WARN_ON(iommu_unmap(domain, msi->gpa, size) != size);
+			}
+			list_del(&msi->list);
+			kfree(msi);
+			break;
+		}
+	}
+	spin_unlock(&cookie->msi_lock);
+}
+EXPORT_SYMBOL(iommu_dma_unbind_guest_msi);
+
 /**
  * iommu_dma_get_resv_regions - Reserved region driver helper
  * @dev: Device from iommu_get_resv_regions()
@@ -1210,6 +1319,33 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
 		if (msi_page->phys == msi_addr)
 			return msi_page;
 
+	/*
+	 * In nested stage mode, we do not allocate an MSI page in
+	 * a range provided by the user. Instead, IOVA/IPA bindings are
+	 * individually provided. We reuse thise IOVAs to build the
+	 * GIOVA -> GPA -> MSI HPA nested stage mapping.
+	 */
+	if (cookie->type == IOMMU_DMA_NESTED_MSI_COOKIE) {
+		list_for_each_entry(msi_page, &cookie->msi_page_list, list)
+			if (!msi_page->phys) {
+				int ret;
+
+				/* do the stage 2 mapping */
+				ret = iommu_map(domain,
+						msi_page->gpa, msi_addr, size,
+						IOMMU_MMIO | IOMMU_WRITE);
+				if (ret) {
+					pr_warn("MSI S2 mapping failed (%d)\n",
+						ret);
+					return NULL;
+				}
+				msi_page->phys = msi_addr;
+				return msi_page;
+			}
+		pr_warn("%s no MSI binding found\n", __func__);
+		return NULL;
+	}
+
 	msi_page = kzalloc(sizeof(*msi_page), GFP_KERNEL);
 	if (!msi_page)
 		return NULL;
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 2112f21f73d8..f112ecdb4af6 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -12,6 +12,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/iommu.h>
 #include <linux/msi.h>
+#include <uapi/linux/iommu.h>
 
 /* Domain management interface for IOMMU drivers */
 int iommu_get_dma_cookie(struct iommu_domain *domain);
@@ -36,6 +37,9 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc,
 			       struct msi_msg *msg);
 
 void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list);
+int iommu_dma_bind_guest_msi(struct iommu_domain *domain,
+			     dma_addr_t iova, phys_addr_t gpa, size_t size);
+void iommu_dma_unbind_guest_msi(struct iommu_domain *domain, dma_addr_t giova);
 
 #else /* CONFIG_IOMMU_DMA */
 
@@ -74,6 +78,18 @@ static inline void iommu_dma_compose_msi_msg(struct msi_desc *desc,
 {
 }
 
+static inline int
+iommu_dma_bind_guest_msi(struct iommu_domain *domain,
+			 dma_addr_t iova, phys_addr_t gpa, size_t size)
+{
+	return -ENODEV;
+}
+
+static inline void
+iommu_dma_unbind_guest_msi(struct iommu_domain *domain, dma_addr_t giova)
+{
+}
+
 static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
 {
 }
-- 
2.21.3


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v13 10/15] iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
  2020-11-18 11:21 [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (8 preceding siblings ...)
  2020-11-18 11:21 ` [PATCH v13 09/15] dma-iommu: Implement NESTED_MSI cookie Eric Auger
@ 2020-11-18 11:21 ` Eric Auger
  2020-11-18 11:21 ` [PATCH v13 11/15] iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI regions Eric Auger
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 57+ messages in thread
From: Eric Auger @ 2020-11-18 11:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, jacob.jun.pan, yi.l.liu, tn,
	nicoleotsuka, yuzenghui

In nested mode we enforce the rule that all devices belonging
to the same iommu_domain share the same msi_domain.

Indeed if there were several physical MSI doorbells being used
within a single iommu_domain, it becomes really difficult to
resolve the nested stage mapping translating into the correct
physical doorbell. So let's forbid this situation.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 41 +++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 24124361dd3b..5a05c2074c8a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2265,6 +2265,37 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 	arm_smmu_install_ste_for_dev(master);
 }
 
+static bool arm_smmu_share_msi_domain(struct iommu_domain *domain,
+				      struct device *dev)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct irq_domain *irqd = dev_get_msi_domain(dev);
+	struct arm_smmu_master *master;
+	unsigned long flags;
+	bool share = false;
+
+	if (!irqd)
+		return true;
+
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
+		struct irq_domain *d = dev_get_msi_domain(master->dev);
+
+		if (!d)
+			continue;
+		if (irqd != d) {
+			dev_info(dev, "Nested mode forbids to attach devices "
+				 "using different physical MSI doorbells "
+				 "to the same iommu_domain");
+			goto unlock;
+		}
+	}
+	share = true;
+unlock:
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+	return share;
+}
+
 static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
 	int ret = 0;
@@ -2316,6 +2347,16 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 		ret = -EINVAL;
 		goto out_unlock;
 	}
+	/*
+	 * In nested mode we must check all devices belonging to the
+	 * domain share the same physical MSI doorbell. Otherwise nested
+	 * stage MSI binding is not supported.
+	 */
+	if (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED &&
+		!arm_smmu_share_msi_domain(domain, dev)) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
 
 	master->domain = smmu_domain;
 
-- 
2.21.3


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v13 11/15] iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI regions
  2020-11-18 11:21 [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (9 preceding siblings ...)
  2020-11-18 11:21 ` [PATCH v13 10/15] iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement Eric Auger
@ 2020-11-18 11:21 ` Eric Auger
  2020-11-18 11:21 ` [PATCH v13 12/15] iommu/smmuv3: Implement bind/unbind_guest_msi Eric Auger
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 57+ messages in thread
From: Eric Auger @ 2020-11-18 11:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, jacob.jun.pan, yi.l.liu, tn,
	nicoleotsuka, yuzenghui

Nested mode currently is not compatible with HW MSI reserved regions.
Indeed MSI transactions targeting this MSI doorbells bypass the SMMU.

Let's check nested mode is not attempted in such configuration.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 23 +++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 5a05c2074c8a..ccf2fef10b69 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2296,6 +2296,23 @@ static bool arm_smmu_share_msi_domain(struct iommu_domain *domain,
 	return share;
 }
 
+static bool arm_smmu_has_hw_msi_resv_region(struct device *dev)
+{
+	struct iommu_resv_region *region;
+	bool has_msi_resv_region = false;
+	LIST_HEAD(resv_regions);
+
+	iommu_get_resv_regions(dev, &resv_regions);
+	list_for_each_entry(region, &resv_regions, list) {
+		if (region->type == IOMMU_RESV_MSI) {
+			has_msi_resv_region = true;
+			break;
+		}
+	}
+	iommu_put_resv_regions(dev, &resv_regions);
+	return has_msi_resv_region;
+}
+
 static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
 	int ret = 0;
@@ -2350,10 +2367,12 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	/*
 	 * In nested mode we must check all devices belonging to the
 	 * domain share the same physical MSI doorbell. Otherwise nested
-	 * stage MSI binding is not supported.
+	 * stage MSI binding is not supported. Also nested mode is not
+	 * compatible with MSI HW reserved regions.
 	 */
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED &&
-		!arm_smmu_share_msi_domain(domain, dev)) {
+		(!arm_smmu_share_msi_domain(domain, dev) ||
+		 arm_smmu_has_hw_msi_resv_region(dev))) {
 		ret = -EINVAL;
 		goto out_unlock;
 	}
-- 
2.21.3


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v13 12/15] iommu/smmuv3: Implement bind/unbind_guest_msi
  2020-11-18 11:21 [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (10 preceding siblings ...)
  2020-11-18 11:21 ` [PATCH v13 11/15] iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI regions Eric Auger
@ 2020-11-18 11:21 ` Eric Auger
  2020-11-18 11:21 ` [PATCH v13 13/15] iommu/smmuv3: Report non recoverable faults Eric Auger
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 57+ messages in thread
From: Eric Auger @ 2020-11-18 11:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, jacob.jun.pan, yi.l.liu, tn,
	nicoleotsuka, yuzenghui

The bind/unbind_guest_msi() callbacks check the domain
is NESTED and redirect to the dma-iommu implementation.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v6 -> v7:
- remove device handle argument
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 43 +++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index ccf2fef10b69..1b8dad340899 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2744,6 +2744,47 @@ static void arm_smmu_get_resv_regions(struct device *dev,
 	iommu_dma_get_resv_regions(dev, head);
 }
 
+static int
+arm_smmu_bind_guest_msi(struct iommu_domain *domain,
+			dma_addr_t giova, phys_addr_t gpa, size_t size)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu;
+	int ret = -EINVAL;
+
+	mutex_lock(&smmu_domain->init_mutex);
+	smmu = smmu_domain->smmu;
+	if (!smmu)
+		goto out;
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+		goto out;
+
+	ret = iommu_dma_bind_guest_msi(domain, giova, gpa, size);
+out:
+	mutex_unlock(&smmu_domain->init_mutex);
+	return ret;
+}
+
+static void
+arm_smmu_unbind_guest_msi(struct iommu_domain *domain, dma_addr_t giova)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu;
+
+	mutex_lock(&smmu_domain->init_mutex);
+	smmu = smmu_domain->smmu;
+	if (!smmu)
+		goto unlock;
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+		goto unlock;
+
+	iommu_dma_unbind_guest_msi(domain, giova);
+unlock:
+	mutex_unlock(&smmu_domain->init_mutex);
+}
+
 static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
 				       struct iommu_pasid_table_config *cfg)
 {
@@ -2967,6 +3008,8 @@ static struct iommu_ops arm_smmu_ops = {
 	.attach_pasid_table	= arm_smmu_attach_pasid_table,
 	.detach_pasid_table	= arm_smmu_detach_pasid_table,
 	.cache_invalidate	= arm_smmu_cache_invalidate,
+	.bind_guest_msi		= arm_smmu_bind_guest_msi,
+	.unbind_guest_msi	= arm_smmu_unbind_guest_msi,
 	.dev_has_feat		= arm_smmu_dev_has_feature,
 	.dev_feat_enabled	= arm_smmu_dev_feature_enabled,
 	.dev_enable_feat	= arm_smmu_dev_enable_feature,
-- 
2.21.3


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v13 13/15] iommu/smmuv3: Report non recoverable faults
  2020-11-18 11:21 [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (11 preceding siblings ...)
  2020-11-18 11:21 ` [PATCH v13 12/15] iommu/smmuv3: Implement bind/unbind_guest_msi Eric Auger
@ 2020-11-18 11:21 ` Eric Auger
  2020-11-18 11:21 ` [PATCH v13 14/15] iommu/smmuv3: Accept configs with more than one context descriptor Eric Auger
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 57+ messages in thread
From: Eric Auger @ 2020-11-18 11:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, jacob.jun.pan, yi.l.liu, tn,
	nicoleotsuka, yuzenghui

When a stage 1 related fault event is read from the event queue,
let's propagate it to potential external fault listeners, ie. users
who registered a fault handler.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v8 -> v9:
- adapt to the removal of IOMMU_FAULT_UNRECOV_PERM_VALID:
  only look at IOMMU_FAULT_UNRECOV_ADDR_VALID which comes with
  perm
- do not advertise IOMMU_FAULT_UNRECOV_PASID_VALID faults for
  translation faults
- trace errors if !master
- test nested before calling iommu_report_device_fault
- call the fault handler unconditionnally in non nested mode

v4 -> v5:
- s/IOMMU_FAULT_PERM_INST/IOMMU_FAULT_PERM_EXEC
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 102 +++++++++++++++++---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  80 +++++++++++++++
 2 files changed, 171 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 1b8dad340899..977c22d08612 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1397,7 +1397,6 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
 	return 0;
 }
 
-__maybe_unused
 static struct arm_smmu_master *
 arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
 {
@@ -1423,25 +1422,106 @@ arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
 	return master;
 }
 
+/* Populates the record fields according to the input SMMU event */
+static bool arm_smmu_transcode_fault(u64 *evt, u8 type,
+				     struct iommu_fault_unrecoverable *record)
+{
+	const struct arm_smmu_fault_propagation_data *data;
+	u32 fields;
+
+	if (type >= ARRAY_SIZE(fault_propagation))
+		return false;
+
+	data = &fault_propagation[type];
+	if (!data->reason)
+		return false;
+
+	fields = data->fields;
+
+	if (data->s1_check & FIELD_GET(EVTQ_1_S2, evt[1]))
+		return false; /* S2 related fault, don't propagate */
+
+	if (fields & IOMMU_FAULT_UNRECOV_PASID_VALID)
+		record->pasid = FIELD_GET(EVTQ_0_SUBSTREAMID, evt[0]);
+	else {
+		/* all other transcoded errors have SSV */
+		if (FIELD_GET(EVTQ_0_SSV, evt[0])) {
+			record->pasid = FIELD_GET(EVTQ_0_SUBSTREAMID, evt[0]);
+			fields |= IOMMU_FAULT_UNRECOV_PASID_VALID;
+		}
+	}
+
+	if (fields & IOMMU_FAULT_UNRECOV_ADDR_VALID) {
+		if (FIELD_GET(EVTQ_1_RNW, evt[1]))
+			record->perm = IOMMU_FAULT_PERM_READ;
+		else
+			record->perm = IOMMU_FAULT_PERM_WRITE;
+		if (FIELD_GET(EVTQ_1_PNU, evt[1]))
+			record->perm |= IOMMU_FAULT_PERM_PRIV;
+		if (FIELD_GET(EVTQ_1_IND, evt[1]))
+			record->perm |= IOMMU_FAULT_PERM_EXEC;
+		record->addr = evt[2];
+	}
+
+	if (fields & IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID)
+		record->fetch_addr = FIELD_GET(EVTQ_3_FETCH_ADDR, evt[3]);
+
+	record->flags = fields;
+	record->reason = data->reason;
+	return true;
+}
+
+static void arm_smmu_report_event(struct arm_smmu_device *smmu, u64 *evt)
+{
+	u32 sid = FIELD_GET(EVTQ_0_STREAMID, evt[0]);
+	u8 type = FIELD_GET(EVTQ_0_ID, evt[0]);
+	struct arm_smmu_master *master;
+	struct iommu_fault_event event = {};
+	bool nested;
+	int i;
+
+	master = arm_smmu_find_master(smmu, sid);
+	if (!master || !master->domain)
+		goto out;
+
+	event.fault.type = IOMMU_FAULT_DMA_UNRECOV;
+
+	nested = (master->domain->stage == ARM_SMMU_DOMAIN_NESTED);
+
+	if (nested) {
+		if (arm_smmu_transcode_fault(evt, type, &event.fault.event)) {
+			/*
+			 * Only S1 related faults should be reported to the
+			 * guest and must not flood the host log.
+			 * Also a fault handler should have been registered
+			 * to guarantee the full nested functionality
+			 */
+			WARN_ON_ONCE(iommu_report_device_fault(master->dev,
+							       &event));
+			return;
+		}
+	} else {
+		iommu_report_device_fault(master->dev, &event);
+	}
+out:
+	dev_info(smmu->dev, "event 0x%02x received:\n", type);
+	for (i = 0; i < EVTQ_ENT_DWORDS; ++i) {
+		dev_info(smmu->dev, "\t0x%016llx\n",
+			 (unsigned long long)evt[i]);
+	}
+}
+
 /* IRQ and event handlers */
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
-	int i;
 	struct arm_smmu_device *smmu = dev;
 	struct arm_smmu_queue *q = &smmu->evtq.q;
 	struct arm_smmu_ll_queue *llq = &q->llq;
 	u64 evt[EVTQ_ENT_DWORDS];
 
 	do {
-		while (!queue_remove_raw(q, evt)) {
-			u8 id = FIELD_GET(EVTQ_0_ID, evt[0]);
-
-			dev_info(smmu->dev, "event 0x%02x received:\n", id);
-			for (i = 0; i < ARRAY_SIZE(evt); ++i)
-				dev_info(smmu->dev, "\t0x%016llx\n",
-					 (unsigned long long)evt[i]);
-
-		}
+		while (!queue_remove_raw(q, evt))
+			arm_smmu_report_event(smmu, evt);
 
 		/*
 		 * Not much we can do on overflow, so scream and pretend we're
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 269779dee8d1..452b094afac7 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -152,6 +152,26 @@
 #define ARM_SMMU_PRIQ_IRQ_CFG1		0xd8
 #define ARM_SMMU_PRIQ_IRQ_CFG2		0xdc
 
+/* Events */
+#define ARM_SMMU_EVT_F_UUT		0x01
+#define ARM_SMMU_EVT_C_BAD_STREAMID	0x02
+#define ARM_SMMU_EVT_F_STE_FETCH	0x03
+#define ARM_SMMU_EVT_C_BAD_STE		0x04
+#define ARM_SMMU_EVT_F_BAD_ATS_TREQ	0x05
+#define ARM_SMMU_EVT_F_STREAM_DISABLED	0x06
+#define ARM_SMMU_EVT_F_TRANSL_FORBIDDEN	0x07
+#define ARM_SMMU_EVT_C_BAD_SUBSTREAMID	0x08
+#define ARM_SMMU_EVT_F_CD_FETCH		0x09
+#define ARM_SMMU_EVT_C_BAD_CD		0x0a
+#define ARM_SMMU_EVT_F_WALK_EABT	0x0b
+#define ARM_SMMU_EVT_F_TRANSLATION	0x10
+#define ARM_SMMU_EVT_F_ADDR_SIZE	0x11
+#define ARM_SMMU_EVT_F_ACCESS		0x12
+#define ARM_SMMU_EVT_F_PERMISSION	0x13
+#define ARM_SMMU_EVT_F_TLB_CONFLICT	0x20
+#define ARM_SMMU_EVT_F_CFG_CONFLICT	0x21
+#define ARM_SMMU_EVT_E_PAGE_REQUEST	0x24
+
 #define ARM_SMMU_REG_SZ			0xe00
 
 /* Common MSI config fields */
@@ -370,6 +390,15 @@
 #define EVTQ_MAX_SZ_SHIFT		(Q_MAX_SZ_SHIFT - EVTQ_ENT_SZ_SHIFT)
 
 #define EVTQ_0_ID			GENMASK_ULL(7, 0)
+#define EVTQ_0_SSV			GENMASK_ULL(11, 11)
+#define EVTQ_0_SUBSTREAMID		GENMASK_ULL(31, 12)
+#define EVTQ_0_STREAMID			GENMASK_ULL(63, 32)
+#define EVTQ_1_PNU			GENMASK_ULL(33, 33)
+#define EVTQ_1_IND			GENMASK_ULL(34, 34)
+#define EVTQ_1_RNW			GENMASK_ULL(35, 35)
+#define EVTQ_1_S2			GENMASK_ULL(39, 39)
+#define EVTQ_1_CLASS			GENMASK_ULL(40, 41)
+#define EVTQ_3_FETCH_ADDR		GENMASK_ULL(51, 3)
 
 /* PRI queue */
 #define PRIQ_ENT_SZ_SHIFT		4
@@ -691,6 +720,57 @@ struct arm_smmu_domain {
 	spinlock_t			devices_lock;
 };
 
+/* fault propagation */
+struct arm_smmu_fault_propagation_data {
+	enum iommu_fault_reason reason;
+	bool s1_check;
+	u32 fields; /* IOMMU_FAULT_UNRECOV_*_VALID bits */
+};
+
+/*
+ * Describes how SMMU faults translate into generic IOMMU faults
+ * and if they need to be reported externally
+ */
+static const struct arm_smmu_fault_propagation_data fault_propagation[] = {
+[ARM_SMMU_EVT_F_UUT]			= { },
+[ARM_SMMU_EVT_C_BAD_STREAMID]		= { },
+[ARM_SMMU_EVT_F_STE_FETCH]		= { },
+[ARM_SMMU_EVT_C_BAD_STE]		= { },
+[ARM_SMMU_EVT_F_BAD_ATS_TREQ]		= { },
+[ARM_SMMU_EVT_F_STREAM_DISABLED]	= { },
+[ARM_SMMU_EVT_F_TRANSL_FORBIDDEN]	= { },
+[ARM_SMMU_EVT_C_BAD_SUBSTREAMID]	= {IOMMU_FAULT_REASON_PASID_INVALID,
+					   false,
+					   IOMMU_FAULT_UNRECOV_PASID_VALID
+					  },
+[ARM_SMMU_EVT_F_CD_FETCH]		= {IOMMU_FAULT_REASON_PASID_FETCH,
+					   false,
+					   IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID
+					  },
+[ARM_SMMU_EVT_C_BAD_CD]			= {IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
+					   false,
+					  },
+[ARM_SMMU_EVT_F_WALK_EABT]		= {IOMMU_FAULT_REASON_WALK_EABT, true,
+					   IOMMU_FAULT_UNRECOV_ADDR_VALID |
+					   IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID
+					  },
+[ARM_SMMU_EVT_F_TRANSLATION]		= {IOMMU_FAULT_REASON_PTE_FETCH, true,
+					   IOMMU_FAULT_UNRECOV_ADDR_VALID
+					  },
+[ARM_SMMU_EVT_F_ADDR_SIZE]		= {IOMMU_FAULT_REASON_OOR_ADDRESS, true,
+					   IOMMU_FAULT_UNRECOV_ADDR_VALID
+					  },
+[ARM_SMMU_EVT_F_ACCESS]			= {IOMMU_FAULT_REASON_ACCESS, true,
+					   IOMMU_FAULT_UNRECOV_ADDR_VALID
+					  },
+[ARM_SMMU_EVT_F_PERMISSION]		= {IOMMU_FAULT_REASON_PERMISSION, true,
+					   IOMMU_FAULT_UNRECOV_ADDR_VALID
+					  },
+[ARM_SMMU_EVT_F_TLB_CONFLICT]		= { },
+[ARM_SMMU_EVT_F_CFG_CONFLICT]		= { },
+[ARM_SMMU_EVT_E_PAGE_REQUEST]		= { },
+};
+
 extern struct xarray arm_smmu_asid_xa;
 extern struct mutex arm_smmu_asid_lock;
 
-- 
2.21.3


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v13 14/15] iommu/smmuv3: Accept configs with more than one context descriptor
  2020-11-18 11:21 [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (12 preceding siblings ...)
  2020-11-18 11:21 ` [PATCH v13 13/15] iommu/smmuv3: Report non recoverable faults Eric Auger
@ 2020-11-18 11:21 ` Eric Auger
  2020-11-18 11:21 ` [PATCH v13 15/15] iommu/smmuv3: Add PASID cache invalidation per PASID Eric Auger
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 57+ messages in thread
From: Eric Auger @ 2020-11-18 11:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, jacob.jun.pan, yi.l.liu, tn,
	nicoleotsuka, yuzenghui

In preparation for vSVA, let's accept userspace provided configs
with more than one CD. We check the max CD against the host iommu
capability and also the format (linear versus 2 level).

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 977c22d08612..ed64699a4a0d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2905,14 +2905,17 @@ static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
 		if (smmu_domain->s1_cfg.set)
 			goto out;
 
-		/*
-		 * we currently support a single CD so s1fmt and s1dss
-		 * fields are also ignored
-		 */
-		if (cfg->pasid_bits)
+		list_for_each_entry(master, &smmu_domain->devices, domain_head) {
+			if (cfg->pasid_bits > master->ssid_bits)
+				goto out;
+		}
+		if (cfg->vendor_data.smmuv3.s1fmt == STRTAB_STE_0_S1FMT_64K_L2 &&
+				!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
 			goto out;
 
 		smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
+		smmu_domain->s1_cfg.s1cdmax = cfg->pasid_bits;
+		smmu_domain->s1_cfg.s1fmt = cfg->vendor_data.smmuv3.s1fmt;
 		smmu_domain->s1_cfg.set = true;
 		smmu_domain->abort = false;
 		break;
-- 
2.21.3


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v13 15/15] iommu/smmuv3: Add PASID cache invalidation per PASID
  2020-11-18 11:21 [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (13 preceding siblings ...)
  2020-11-18 11:21 ` [PATCH v13 14/15] iommu/smmuv3: Accept configs with more than one context descriptor Eric Auger
@ 2020-11-18 11:21 ` Eric Auger
  2021-01-08 17:05 ` [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Shameerali Kolothum Thodi
  2021-03-15 18:04 ` Krishna Reddy
  16 siblings, 0 replies; 57+ messages in thread
From: Eric Auger @ 2020-11-18 11:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, jacob.jun.pan, yi.l.liu, tn,
	nicoleotsuka, yuzenghui

In order to cascade guest CFGI_CD, let's add PASID cache invalidation
per PASID.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v12 -> v13:
- Fix !(info->flags & IOMMU_INV_PASID_FLAGS_PASID) check
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index ed64699a4a0d..45adfe4da11b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2999,9 +2999,19 @@ arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
 		} else {
 			return -EINVAL;
 		}
-	}
-	if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
-	    inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
+	} else if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID) {
+		if (inv_info->granularity == IOMMU_INV_GRANU_PASID) {
+			struct iommu_inv_pasid_info *info =
+				&inv_info->granu.pasid_info;
+
+			if (!(info->flags & IOMMU_INV_PASID_FLAGS_PASID))
+				return -EINVAL;
+
+			arm_smmu_sync_cd(smmu_domain, info->pasid, true);
+		} else {
+			return -ENOENT;
+		}
+	} else { /* IOMMU_CACHE_INV_TYPE_DEV_IOTLB */
 		return -ENOENT;
 	}
 	return 0;
-- 
2.21.3


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 01/15] iommu: Introduce attach/detach_pasid_table API
  2020-11-18 11:21 ` [PATCH v13 01/15] iommu: Introduce attach/detach_pasid_table API Eric Auger
@ 2020-11-18 16:19   ` Jacob Pan
  2020-11-19 17:02     ` Auger Eric
  2021-02-01 11:27   ` Keqian Zhu
  1 sibling, 1 reply; 57+ messages in thread
From: Jacob Pan @ 2020-11-18 16:19 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, iommu, linux-kernel, kvm, kvmarm, will, joro,
	maz, robin.murphy, alex.williamson, jean-philippe, zhangfei.gao,
	zhangfei.gao, vivek.gautam, shameerali.kolothum.thodi, yi.l.liu,
	tn, nicoleotsuka, yuzenghui, jacob.jun.pan

Hi Eric,

On Wed, 18 Nov 2020 12:21:37 +0100, Eric Auger <eric.auger@redhat.com>
wrote:

> In virtualization use case, when a guest is assigned
> a PCI host device, protected by a virtual IOMMU on the guest,
> the physical IOMMU must be programmed to be consistent with
> the guest mappings. If the physical IOMMU supports two
> translation stages it makes sense to program guest mappings
> onto the first stage/level (ARM/Intel terminology) while the host
> owns the stage/level 2.
> 
> In that case, it is mandated to trap on guest configuration
> settings and pass those to the physical iommu driver.
> 
> This patch adds a new API to the iommu subsystem that allows
> to set/unset the pasid table information.
> 
> A generic iommu_pasid_table_config struct is introduced in
> a new iommu.h uapi header. This is going to be used by the VFIO
> user API.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> 
> v12 -> v13:
> - Fix config check
> 
> v11 -> v12:
> - add argsz, name the union
> ---
>  drivers/iommu/iommu.c      | 68 ++++++++++++++++++++++++++++++++++++++
>  include/linux/iommu.h      | 21 ++++++++++++
>  include/uapi/linux/iommu.h | 54 ++++++++++++++++++++++++++++++
>  3 files changed, 143 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index b53446bb8c6b..978fe34378fb 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2171,6 +2171,74 @@ int iommu_uapi_sva_unbind_gpasid(struct
> iommu_domain *domain, struct device *dev }
>  EXPORT_SYMBOL_GPL(iommu_uapi_sva_unbind_gpasid);
>  
> +int iommu_attach_pasid_table(struct iommu_domain *domain,
> +			     struct iommu_pasid_table_config *cfg)
> +{
> +	if (unlikely(!domain->ops->attach_pasid_table))
> +		return -ENODEV;
> +
> +	return domain->ops->attach_pasid_table(domain, cfg);
> +}
> +
> +int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
> +				  void __user *uinfo)
> +{
> +	struct iommu_pasid_table_config pasid_table_data = { 0 };
> +	u32 minsz;
> +
> +	if (unlikely(!domain->ops->attach_pasid_table))
> +		return -ENODEV;
> +
> +	/*
> +	 * No new spaces can be added before the variable sized union,
> the
> +	 * minimum size is the offset to the union.
> +	 */
> +	minsz = offsetof(struct iommu_pasid_table_config, vendor_data);
> +
> +	/* Copy minsz from user to get flags and argsz */
> +	if (copy_from_user(&pasid_table_data, uinfo, minsz))
> +		return -EFAULT;
> +
> +	/* Fields before the variable size union are mandatory */
> +	if (pasid_table_data.argsz < minsz)
> +		return -EINVAL;
> +
> +	/* PASID and address granu require additional info beyond minsz
> */
> +	if (pasid_table_data.version != PASID_TABLE_CFG_VERSION_1)
> +		return -EINVAL;
> +	if (pasid_table_data.format == IOMMU_PASID_FORMAT_SMMUV3 &&
> +	    pasid_table_data.argsz <
> +		offsetofend(struct iommu_pasid_table_config,
> vendor_data.smmuv3))
> +		return -EINVAL;
> +
> +	/*
> +	 * User might be using a newer UAPI header which has a larger
> data
> +	 * size, we shall support the existing flags within the current
> +	 * size. Copy the remaining user data _after_ minsz but not more
> +	 * than the current kernel supported size.
> +	 */
> +	if (copy_from_user((void *)&pasid_table_data + minsz, uinfo +
> minsz,
> +			   min_t(u32, pasid_table_data.argsz,
> sizeof(pasid_table_data)) - minsz))
> +		return -EFAULT;
> +
> +	/* Now the argsz is validated, check the content */
> +	if (pasid_table_data.config < IOMMU_PASID_CONFIG_TRANSLATE ||
> +	    pasid_table_data.config > IOMMU_PASID_CONFIG_ABORT)
> +		return -EINVAL;
> +
> +	return domain->ops->attach_pasid_table(domain,
> &pasid_table_data); +}
> +EXPORT_SYMBOL_GPL(iommu_uapi_attach_pasid_table);
> +
> +void iommu_detach_pasid_table(struct iommu_domain *domain)
> +{
> +	if (unlikely(!domain->ops->detach_pasid_table))
> +		return;
> +
> +	domain->ops->detach_pasid_table(domain);
> +}
> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
>  				  struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index b95a6f8db6ff..464fcbecf841 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -223,6 +223,8 @@ struct iommu_iotlb_gather {
>   * @cache_invalidate: invalidate translation caches
>   * @sva_bind_gpasid: bind guest pasid and mm
>   * @sva_unbind_gpasid: unbind guest pasid and mm
> + * @attach_pasid_table: attach a pasid table
> + * @detach_pasid_table: detach the pasid table
>   * @def_domain_type: device default domain type, return value:
>   *		- IOMMU_DOMAIN_IDENTITY: must use an identity domain
>   *		- IOMMU_DOMAIN_DMA: must use a dma domain
> @@ -287,6 +289,9 @@ struct iommu_ops {
>  				      void *drvdata);
>  	void (*sva_unbind)(struct iommu_sva *handle);
>  	u32 (*sva_get_pasid)(struct iommu_sva *handle);
> +	int (*attach_pasid_table)(struct iommu_domain *domain,
> +				  struct iommu_pasid_table_config *cfg);
> +	void (*detach_pasid_table)(struct iommu_domain *domain);
>  
>  	int (*page_response)(struct device *dev,
>  			     struct iommu_fault_event *evt,
> @@ -434,6 +439,11 @@ extern int iommu_uapi_sva_unbind_gpasid(struct
> iommu_domain *domain, struct device *dev, void __user *udata);
>  extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
>  				   struct device *dev, ioasid_t pasid);
> +extern int iommu_attach_pasid_table(struct iommu_domain *domain,
> +				    struct iommu_pasid_table_config
> *cfg); +extern int iommu_uapi_attach_pasid_table(struct iommu_domain
> *domain,
> +					 void __user *udata);
> +extern void iommu_detach_pasid_table(struct iommu_domain *domain);
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> @@ -639,6 +649,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device
> *dev, void iommu_sva_unbind_device(struct iommu_sva *handle);
>  u32 iommu_sva_get_pasid(struct iommu_sva *handle);
>  
> +
>  #else /* CONFIG_IOMMU_API */
>  
>  struct iommu_ops {};
> @@ -1020,6 +1031,16 @@ iommu_aux_get_pasid(struct iommu_domain *domain,
> struct device *dev) return -ENODEV;
>  }
>  
> +static inline
> +int iommu_attach_pasid_table(struct iommu_domain *domain,
> +			     struct iommu_pasid_table_config *cfg)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline
> +void iommu_detach_pasid_table(struct iommu_domain *domain) {}
> +
>  static inline struct iommu_sva *
>  iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void
> *drvdata) {
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index e1d9e75f2c94..082d758dd016 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -338,4 +338,58 @@ struct iommu_gpasid_bind_data {
>  	} vendor;
>  };
>  
> +/**
> + * struct iommu_pasid_smmuv3 - ARM SMMUv3 Stream Table Entry stage 1
> related
> + *     information
> + * @version: API version of this structure
> + * @s1fmt: STE s1fmt (format of the CD table: single CD, linear table
> + *         or 2-level table)
> + * @s1dss: STE s1dss (specifies the behavior when @pasid_bits != 0
> + *         and no PASID is passed along with the incoming transaction)
> + * @padding: reserved for future use (should be zero)
> + *
> + * The PASID table is referred to as the Context Descriptor (CD) table
> on ARM
> + * SMMUv3. Please refer to the ARM SMMU 3.x spec (ARM IHI 0070A) for full
> + * details.
> + */
> +struct iommu_pasid_smmuv3 {
> +#define PASID_TABLE_SMMUV3_CFG_VERSION_1 1
> +	__u32	version;
> +	__u8	s1fmt;
> +	__u8	s1dss;
> +	__u8	padding[2];
> +};
> +
> +/**
> + * struct iommu_pasid_table_config - PASID table data used to bind guest
> PASID
> + *     table to the host IOMMU
> + * @argsz: User filled size of this data
> + * @version: API version to prepare for future extensions
> + * @format: format of the PASID table
> + * @base_ptr: guest physical address of the PASID table
> + * @pasid_bits: number of PASID bits used in the PASID table
> + * @config: indicates whether the guest translation stage must
> + *          be translated, bypassed or aborted.
> + * @padding: reserved for future use (should be zero)
> + * @vendor_data.smmuv3: table information when @format is
> + * %IOMMU_PASID_FORMAT_SMMUV3
> + */
> +struct iommu_pasid_table_config {
> +	__u32	argsz;
> +#define PASID_TABLE_CFG_VERSION_1 1
> +	__u32	version;
> +#define IOMMU_PASID_FORMAT_SMMUV3	1
> +	__u32	format;
There will be a u32 gap here, right? perhaps another padding?

> +	__u64	base_ptr;
> +	__u8	pasid_bits;
> +#define IOMMU_PASID_CONFIG_TRANSLATE	1
> +#define IOMMU_PASID_CONFIG_BYPASS	2
> +#define IOMMU_PASID_CONFIG_ABORT	3
> +	__u8	config;
> +	__u8    padding[2];
> +	union {
> +		struct iommu_pasid_smmuv3 smmuv3;
> +	} vendor_data;
> +};
> +
>  #endif /* _UAPI_IOMMU_H */


Thanks,

Jacob

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support
  2020-11-18 11:21 ` [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support Eric Auger
@ 2020-11-19  3:59   ` kernel test robot
       [not found]   ` <a40b90bd-6756-c8cc-b455-c093d16d35f5@huawei.com>
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 57+ messages in thread
From: kernel test robot @ 2020-11-19  3:59 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy
  Cc: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 8673 bytes --]

Hi Eric,

I love your patch! Perhaps something to improve:

[auto build test WARNING on iommu/next]
[also build test WARNING on linus/master v5.10-rc4 next-20201118]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Eric-Auger/SMMUv3-Nested-Stage-Setup-IOMMU-part/20201118-192520
base:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
config: arm64-randconfig-s031-20201118 (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.3-123-g626c4742-dirty
        # https://github.com/0day-ci/linux/commit/7308cdb07384d807c5ef43e6bfe0cd61c35a121e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Eric-Auger/SMMUv3-Nested-Stage-Setup-IOMMU-part/20201118-192520
        git checkout 7308cdb07384d807c5ef43e6bfe0cd61c35a121e
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


"sparse warnings: (new ones prefixed by >>)"
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:1326:37: sparse: sparse: restricted __le64 degrades to integer
   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:1326:37: sparse: sparse: cast to restricted __le64
   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c: note: in included file (through arch/arm64/include/asm/atomic.h, include/linux/atomic.h, include/asm-generic/bitops/atomic.h, ...):
   arch/arm64/include/asm/cmpxchg.h:172:1: sparse: sparse: cast truncates bits from constant value (ffffffff80000000 becomes 0)
   arch/arm64/include/asm/cmpxchg.h:172:1: sparse: sparse: cast truncates bits from constant value (ffffffff80000000 becomes 0)

vim +1326 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c

  1175	
  1176	static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
  1177					      __le64 *dst)
  1178	{
  1179		/*
  1180		 * This is hideously complicated, but we only really care about
  1181		 * three cases at the moment:
  1182		 *
  1183		 * 1. Invalid (all zero) -> bypass/fault (init)
  1184		 * 2. Bypass/fault -> single stage translation/bypass (attach)
  1185		 * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
  1186		 * 4. S2 -> S1 + S2 (attach_pasid_table)
  1187		 * 5. S1 + S2 -> S2 (detach_pasid_table)
  1188		 *
  1189		 * Given that we can't update the STE atomically and the SMMU
  1190		 * doesn't read the thing in a defined order, that leaves us
  1191		 * with the following maintenance requirements:
  1192		 *
  1193		 * 1. Update Config, return (init time STEs aren't live)
  1194		 * 2. Write everything apart from dword 0, sync, write dword 0, sync
  1195		 * 3. Update Config, sync
  1196		 */
  1197		u64 val = le64_to_cpu(dst[0]);
  1198		bool s1_live = false, s2_live = false, ste_live;
  1199		bool abort, nested = false, translate = false;
  1200		struct arm_smmu_device *smmu = NULL;
  1201		struct arm_smmu_s1_cfg *s1_cfg;
  1202		struct arm_smmu_s2_cfg *s2_cfg;
  1203		struct arm_smmu_domain *smmu_domain = NULL;
  1204		struct arm_smmu_cmdq_ent prefetch_cmd = {
  1205			.opcode		= CMDQ_OP_PREFETCH_CFG,
  1206			.prefetch	= {
  1207				.sid	= sid,
  1208			},
  1209		};
  1210	
  1211		if (master) {
  1212			smmu_domain = master->domain;
  1213			smmu = master->smmu;
  1214		}
  1215	
  1216		if (smmu_domain) {
  1217			s1_cfg = &smmu_domain->s1_cfg;
  1218			s2_cfg = &smmu_domain->s2_cfg;
  1219	
  1220			switch (smmu_domain->stage) {
  1221			case ARM_SMMU_DOMAIN_S1:
  1222				s1_cfg->set = true;
  1223				s2_cfg->set = false;
  1224				break;
  1225			case ARM_SMMU_DOMAIN_S2:
  1226				s1_cfg->set = false;
  1227				s2_cfg->set = true;
  1228				break;
  1229			case ARM_SMMU_DOMAIN_NESTED:
  1230				/*
  1231				 * Actual usage of stage 1 depends on nested mode:
  1232				 * legacy (2d stage only) or true nested mode
  1233				 */
  1234				s2_cfg->set = true;
  1235				break;
  1236			default:
  1237				break;
  1238			}
  1239			nested = s1_cfg->set && s2_cfg->set;
  1240			translate = s1_cfg->set || s2_cfg->set;
  1241		}
  1242	
  1243		if (val & STRTAB_STE_0_V) {
  1244			switch (FIELD_GET(STRTAB_STE_0_CFG, val)) {
  1245			case STRTAB_STE_0_CFG_BYPASS:
  1246				break;
  1247			case STRTAB_STE_0_CFG_S1_TRANS:
  1248				s1_live = true;
  1249				break;
  1250			case STRTAB_STE_0_CFG_S2_TRANS:
  1251				s2_live = true;
  1252				break;
  1253			case STRTAB_STE_0_CFG_NESTED:
  1254				s1_live = true;
  1255				s2_live = true;
  1256				break;
  1257			case STRTAB_STE_0_CFG_ABORT:
  1258				break;
  1259			default:
  1260				BUG(); /* STE corruption */
  1261			}
  1262		}
  1263	
  1264		ste_live = s1_live || s2_live;
  1265	
  1266		/* Nuke the existing STE_0 value, as we're going to rewrite it */
  1267		val = STRTAB_STE_0_V;
  1268	
  1269		/* Bypass/fault */
  1270	
  1271		if (!smmu_domain)
  1272			abort = disable_bypass;
  1273		else
  1274			abort = smmu_domain->abort;
  1275	
  1276		if (abort || !translate) {
  1277			if (abort)
  1278				val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
  1279			else
  1280				val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
  1281	
  1282			dst[0] = cpu_to_le64(val);
  1283			dst[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG,
  1284							STRTAB_STE_1_SHCFG_INCOMING));
  1285			dst[2] = 0; /* Nuke the VMID */
  1286			/*
  1287			 * The SMMU can perform negative caching, so we must sync
  1288			 * the STE regardless of whether the old value was live.
  1289			 */
  1290			if (smmu)
  1291				arm_smmu_sync_ste_for_sid(smmu, sid);
  1292			return;
  1293		}
  1294	
  1295		BUG_ON(ste_live && !nested);
  1296	
  1297		if (ste_live) {
  1298			/* First invalidate the live STE */
  1299			dst[0] = cpu_to_le64(STRTAB_STE_0_CFG_ABORT);
  1300			arm_smmu_sync_ste_for_sid(smmu, sid);
  1301		}
  1302	
  1303		if (s1_cfg->set) {
  1304			BUG_ON(s1_live);
  1305			dst[1] = cpu_to_le64(
  1306				 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
  1307				 FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
  1308				 FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
  1309				 FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
  1310				 FIELD_PREP(STRTAB_STE_1_STRW, STRTAB_STE_1_STRW_NSEL1));
  1311	
  1312			if (smmu->features & ARM_SMMU_FEAT_STALLS &&
  1313			   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
  1314				dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
  1315	
  1316			val |= (s1_cfg->cdcfg.cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
  1317				FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
  1318				FIELD_PREP(STRTAB_STE_0_S1CDMAX, s1_cfg->s1cdmax) |
  1319				FIELD_PREP(STRTAB_STE_0_S1FMT, s1_cfg->s1fmt);
  1320		}
  1321	
  1322		if (s2_cfg->set) {
  1323			u64 vttbr = s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK;
  1324	
  1325			if (s2_live) {
> 1326				u64 s2ttb = le64_to_cpu(dst[3] & STRTAB_STE_3_S2TTB_MASK);
  1327	
  1328				BUG_ON(s2ttb != vttbr);
  1329			}
  1330	
  1331			dst[2] = cpu_to_le64(
  1332				 FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
  1333				 FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
  1334	#ifdef __BIG_ENDIAN
  1335				 STRTAB_STE_2_S2ENDI |
  1336	#endif
  1337				 STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 |
  1338				 STRTAB_STE_2_S2R);
  1339	
  1340			dst[3] = cpu_to_le64(vttbr);
  1341	
  1342			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
  1343		} else {
  1344			dst[2] = 0;
  1345			dst[3] = 0;
  1346		}
  1347	
  1348		if (master->ats_enabled)
  1349			dst[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS,
  1350							 STRTAB_STE_1_EATS_TRANS));
  1351	
  1352		arm_smmu_sync_ste_for_sid(smmu, sid);
  1353		/* See comment in arm_smmu_write_ctx_desc() */
  1354		WRITE_ONCE(dst[0], cpu_to_le64(val));
  1355		arm_smmu_sync_ste_for_sid(smmu, sid);
  1356	
  1357		/* It's likely that we'll want to use the new STE soon */
  1358		if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH))
  1359			arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd);
  1360	}
  1361	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29042 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 01/15] iommu: Introduce attach/detach_pasid_table API
  2020-11-18 16:19   ` Jacob Pan
@ 2020-11-19 17:02     ` Auger Eric
  0 siblings, 0 replies; 57+ messages in thread
From: Auger Eric @ 2020-11-19 17:02 UTC (permalink / raw)
  To: Jacob Pan
  Cc: eric.auger.pro, iommu, linux-kernel, kvm, kvmarm, will, joro,
	maz, robin.murphy, alex.williamson, jean-philippe, zhangfei.gao,
	zhangfei.gao, vivek.gautam, shameerali.kolothum.thodi, yi.l.liu,
	tn, nicoleotsuka, yuzenghui

Hi Jacob,
On 11/18/20 5:19 PM, Jacob Pan wrote:
> Hi Eric,
> 
> On Wed, 18 Nov 2020 12:21:37 +0100, Eric Auger <eric.auger@redhat.com>
> wrote:
> 
>> In virtualization use case, when a guest is assigned
>> a PCI host device, protected by a virtual IOMMU on the guest,
>> the physical IOMMU must be programmed to be consistent with
>> the guest mappings. If the physical IOMMU supports two
>> translation stages it makes sense to program guest mappings
>> onto the first stage/level (ARM/Intel terminology) while the host
>> owns the stage/level 2.
>>
>> In that case, it is mandated to trap on guest configuration
>> settings and pass those to the physical iommu driver.
>>
>> This patch adds a new API to the iommu subsystem that allows
>> to set/unset the pasid table information.
>>
>> A generic iommu_pasid_table_config struct is introduced in
>> a new iommu.h uapi header. This is going to be used by the VFIO
>> user API.
>>
>> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
>> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>>
>> v12 -> v13:
>> - Fix config check
>>
>> v11 -> v12:
>> - add argsz, name the union
>> ---
>>  drivers/iommu/iommu.c      | 68 ++++++++++++++++++++++++++++++++++++++
>>  include/linux/iommu.h      | 21 ++++++++++++
>>  include/uapi/linux/iommu.h | 54 ++++++++++++++++++++++++++++++
>>  3 files changed, 143 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index b53446bb8c6b..978fe34378fb 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2171,6 +2171,74 @@ int iommu_uapi_sva_unbind_gpasid(struct
>> iommu_domain *domain, struct device *dev }
>>  EXPORT_SYMBOL_GPL(iommu_uapi_sva_unbind_gpasid);
>>  
>> +int iommu_attach_pasid_table(struct iommu_domain *domain,
>> +			     struct iommu_pasid_table_config *cfg)
>> +{
>> +	if (unlikely(!domain->ops->attach_pasid_table))
>> +		return -ENODEV;
>> +
>> +	return domain->ops->attach_pasid_table(domain, cfg);
>> +}
>> +
>> +int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
>> +				  void __user *uinfo)
>> +{
>> +	struct iommu_pasid_table_config pasid_table_data = { 0 };
>> +	u32 minsz;
>> +
>> +	if (unlikely(!domain->ops->attach_pasid_table))
>> +		return -ENODEV;
>> +
>> +	/*
>> +	 * No new spaces can be added before the variable sized union,
>> the
>> +	 * minimum size is the offset to the union.
>> +	 */
>> +	minsz = offsetof(struct iommu_pasid_table_config, vendor_data);
>> +
>> +	/* Copy minsz from user to get flags and argsz */
>> +	if (copy_from_user(&pasid_table_data, uinfo, minsz))
>> +		return -EFAULT;
>> +
>> +	/* Fields before the variable size union are mandatory */
>> +	if (pasid_table_data.argsz < minsz)
>> +		return -EINVAL;
>> +
>> +	/* PASID and address granu require additional info beyond minsz
>> */
>> +	if (pasid_table_data.version != PASID_TABLE_CFG_VERSION_1)
>> +		return -EINVAL;
>> +	if (pasid_table_data.format == IOMMU_PASID_FORMAT_SMMUV3 &&
>> +	    pasid_table_data.argsz <
>> +		offsetofend(struct iommu_pasid_table_config,
>> vendor_data.smmuv3))
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * User might be using a newer UAPI header which has a larger
>> data
>> +	 * size, we shall support the existing flags within the current
>> +	 * size. Copy the remaining user data _after_ minsz but not more
>> +	 * than the current kernel supported size.
>> +	 */
>> +	if (copy_from_user((void *)&pasid_table_data + minsz, uinfo +
>> minsz,
>> +			   min_t(u32, pasid_table_data.argsz,
>> sizeof(pasid_table_data)) - minsz))
>> +		return -EFAULT;
>> +
>> +	/* Now the argsz is validated, check the content */
>> +	if (pasid_table_data.config < IOMMU_PASID_CONFIG_TRANSLATE ||
>> +	    pasid_table_data.config > IOMMU_PASID_CONFIG_ABORT)
>> +		return -EINVAL;
>> +
>> +	return domain->ops->attach_pasid_table(domain,
>> &pasid_table_data); +}
>> +EXPORT_SYMBOL_GPL(iommu_uapi_attach_pasid_table);
>> +
>> +void iommu_detach_pasid_table(struct iommu_domain *domain)
>> +{
>> +	if (unlikely(!domain->ops->detach_pasid_table))
>> +		return;
>> +
>> +	domain->ops->detach_pasid_table(domain);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
>> +
>>  static void __iommu_detach_device(struct iommu_domain *domain,
>>  				  struct device *dev)
>>  {
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index b95a6f8db6ff..464fcbecf841 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -223,6 +223,8 @@ struct iommu_iotlb_gather {
>>   * @cache_invalidate: invalidate translation caches
>>   * @sva_bind_gpasid: bind guest pasid and mm
>>   * @sva_unbind_gpasid: unbind guest pasid and mm
>> + * @attach_pasid_table: attach a pasid table
>> + * @detach_pasid_table: detach the pasid table
>>   * @def_domain_type: device default domain type, return value:
>>   *		- IOMMU_DOMAIN_IDENTITY: must use an identity domain
>>   *		- IOMMU_DOMAIN_DMA: must use a dma domain
>> @@ -287,6 +289,9 @@ struct iommu_ops {
>>  				      void *drvdata);
>>  	void (*sva_unbind)(struct iommu_sva *handle);
>>  	u32 (*sva_get_pasid)(struct iommu_sva *handle);
>> +	int (*attach_pasid_table)(struct iommu_domain *domain,
>> +				  struct iommu_pasid_table_config *cfg);
>> +	void (*detach_pasid_table)(struct iommu_domain *domain);
>>  
>>  	int (*page_response)(struct device *dev,
>>  			     struct iommu_fault_event *evt,
>> @@ -434,6 +439,11 @@ extern int iommu_uapi_sva_unbind_gpasid(struct
>> iommu_domain *domain, struct device *dev, void __user *udata);
>>  extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
>>  				   struct device *dev, ioasid_t pasid);
>> +extern int iommu_attach_pasid_table(struct iommu_domain *domain,
>> +				    struct iommu_pasid_table_config
>> *cfg); +extern int iommu_uapi_attach_pasid_table(struct iommu_domain
>> *domain,
>> +					 void __user *udata);
>> +extern void iommu_detach_pasid_table(struct iommu_domain *domain);
>>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>>  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
>> @@ -639,6 +649,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device
>> *dev, void iommu_sva_unbind_device(struct iommu_sva *handle);
>>  u32 iommu_sva_get_pasid(struct iommu_sva *handle);
>>  
>> +
>>  #else /* CONFIG_IOMMU_API */
>>  
>>  struct iommu_ops {};
>> @@ -1020,6 +1031,16 @@ iommu_aux_get_pasid(struct iommu_domain *domain,
>> struct device *dev) return -ENODEV;
>>  }
>>  
>> +static inline
>> +int iommu_attach_pasid_table(struct iommu_domain *domain,
>> +			     struct iommu_pasid_table_config *cfg)
>> +{
>> +	return -ENODEV;
>> +}
>> +
>> +static inline
>> +void iommu_detach_pasid_table(struct iommu_domain *domain) {}
>> +
>>  static inline struct iommu_sva *
>>  iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void
>> *drvdata) {
>> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
>> index e1d9e75f2c94..082d758dd016 100644
>> --- a/include/uapi/linux/iommu.h
>> +++ b/include/uapi/linux/iommu.h
>> @@ -338,4 +338,58 @@ struct iommu_gpasid_bind_data {
>>  	} vendor;
>>  };
>>  
>> +/**
>> + * struct iommu_pasid_smmuv3 - ARM SMMUv3 Stream Table Entry stage 1
>> related
>> + *     information
>> + * @version: API version of this structure
>> + * @s1fmt: STE s1fmt (format of the CD table: single CD, linear table
>> + *         or 2-level table)
>> + * @s1dss: STE s1dss (specifies the behavior when @pasid_bits != 0
>> + *         and no PASID is passed along with the incoming transaction)
>> + * @padding: reserved for future use (should be zero)
>> + *
>> + * The PASID table is referred to as the Context Descriptor (CD) table
>> on ARM
>> + * SMMUv3. Please refer to the ARM SMMU 3.x spec (ARM IHI 0070A) for full
>> + * details.
>> + */
>> +struct iommu_pasid_smmuv3 {
>> +#define PASID_TABLE_SMMUV3_CFG_VERSION_1 1
>> +	__u32	version;
>> +	__u8	s1fmt;
>> +	__u8	s1dss;
>> +	__u8	padding[2];
>> +};
>> +
>> +/**
>> + * struct iommu_pasid_table_config - PASID table data used to bind guest
>> PASID
>> + *     table to the host IOMMU
>> + * @argsz: User filled size of this data
>> + * @version: API version to prepare for future extensions
>> + * @format: format of the PASID table
>> + * @base_ptr: guest physical address of the PASID table
>> + * @pasid_bits: number of PASID bits used in the PASID table
>> + * @config: indicates whether the guest translation stage must
>> + *          be translated, bypassed or aborted.
>> + * @padding: reserved for future use (should be zero)
>> + * @vendor_data.smmuv3: table information when @format is
>> + * %IOMMU_PASID_FORMAT_SMMUV3
>> + */
>> +struct iommu_pasid_table_config {
>> +	__u32	argsz;
>> +#define PASID_TABLE_CFG_VERSION_1 1
>> +	__u32	version;
>> +#define IOMMU_PASID_FORMAT_SMMUV3	1
>> +	__u32	format;
> There will be a u32 gap here, right? perhaps another padding?
Yes you're right (as per
https://lists.cs.columbia.edu/pipermail/kvmarm/2019-January/034414.html).
I will add some padding here and add some additional padding below.

Thanks

Eric
> 
>> +	__u64	base_ptr;
>> +	__u8	pasid_bits;
>> +#define IOMMU_PASID_CONFIG_TRANSLATE	1
>> +#define IOMMU_PASID_CONFIG_BYPASS	2
>> +#define IOMMU_PASID_CONFIG_ABORT	3
>> +	__u8	config;
>> +	__u8    padding[2];
>> +	union {
>> +		struct iommu_pasid_smmuv3 smmuv3;
>> +	} vendor_data;
>> +};
>> +
>>  #endif /* _UAPI_IOMMU_H */
> 
> 
> Thanks,
> 
> Jacob
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2020-11-18 11:21 ` [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs Eric Auger
@ 2020-12-01 13:33   ` Xingang Wang
  2020-12-01 13:58     ` Auger Eric
  0 siblings, 1 reply; 57+ messages in thread
From: Xingang Wang @ 2020-12-01 13:33 UTC (permalink / raw)
  To: eric.auger
  Cc: alex.williamson, eric.auger.pro, iommu, jean-philippe, joro, kvm,
	kvmarm, linux-kernel, maz, robin.murphy, vivek.gautam, will,
	zhangfei.gao, xieyingtai

Hi Eric

On  Wed, 18 Nov 2020 12:21:43, Eric Auger wrote:
>@@ -1710,7 +1710,11 @@ static void arm_smmu_tlb_inv_context(void *cookie)
> 	 * insertion to guarantee those are observed before the TLBI. Do be
> 	 * careful, 007.
> 	 */
>-	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>+	if (ext_asid >= 0) { /* guest stage 1 invalidation */
>+		cmd.opcode	= CMDQ_OP_TLBI_NH_ASID;
>+		cmd.tlbi.asid	= ext_asid;
>+		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
>+	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {

Found a problem here, the cmd for guest stage 1 invalidation is built,
but it is not delivered to smmu.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2020-12-01 13:33   ` Xingang Wang
@ 2020-12-01 13:58     ` Auger Eric
  2020-12-02 12:59       ` Wang Xingang
  2020-12-03 18:42       ` Shameerali Kolothum Thodi
  0 siblings, 2 replies; 57+ messages in thread
From: Auger Eric @ 2020-12-01 13:58 UTC (permalink / raw)
  To: Xingang Wang
  Cc: alex.williamson, eric.auger.pro, iommu, jean-philippe, joro, kvm,
	kvmarm, linux-kernel, maz, robin.murphy, vivek.gautam, will,
	zhangfei.gao, xieyingtai

Hi Xingang,

On 12/1/20 2:33 PM, Xingang Wang wrote:
> Hi Eric
> 
> On  Wed, 18 Nov 2020 12:21:43, Eric Auger wrote:
>> @@ -1710,7 +1710,11 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>> 	 * insertion to guarantee those are observed before the TLBI. Do be
>> 	 * careful, 007.
>> 	 */
>> -	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>> +	if (ext_asid >= 0) { /* guest stage 1 invalidation */
>> +		cmd.opcode	= CMDQ_OP_TLBI_NH_ASID;
>> +		cmd.tlbi.asid	= ext_asid;
>> +		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
>> +	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> 
> Found a problem here, the cmd for guest stage 1 invalidation is built,
> but it is not delivered to smmu.
> 

Thank you for the report. I will fix that soon. With that fixed, have
you been able to run vSVA on top of the series. Do you need other stuff
to be fixed at SMMU level? As I am going to respin soon, please let me
know what is the best branch to rebase to alleviate your integration.

Best Regards

Eric


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2020-12-01 13:58     ` Auger Eric
@ 2020-12-02 12:59       ` Wang Xingang
  2020-12-03 18:42       ` Shameerali Kolothum Thodi
  1 sibling, 0 replies; 57+ messages in thread
From: Wang Xingang @ 2020-12-02 12:59 UTC (permalink / raw)
  To: Auger Eric
  Cc: alex.williamson, eric.auger.pro, iommu, jean-philippe, joro, kvm,
	kvmarm, linux-kernel, maz, robin.murphy, vivek.gautam, will,
	zhangfei.gao, xieyingtai

Thanks for your reply. We are testing vSVA, and will let you know if
other problems are found.

On 2020/12/1 21:58, Auger Eric wrote:
> Hi Xingang,
> 
> On 12/1/20 2:33 PM, Xingang Wang wrote:
>> Hi Eric
>>
>> On  Wed, 18 Nov 2020 12:21:43, Eric Auger wrote:
>>> @@ -1710,7 +1710,11 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>>> 	 * insertion to guarantee those are observed before the TLBI. Do be
>>> 	 * careful, 007.
>>> 	 */
>>> -	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>> +	if (ext_asid >= 0) { /* guest stage 1 invalidation */
>>> +		cmd.opcode	= CMDQ_OP_TLBI_NH_ASID;
>>> +		cmd.tlbi.asid	= ext_asid;
>>> +		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
>>> +	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>
>> Found a problem here, the cmd for guest stage 1 invalidation is built,
>> but it is not delivered to smmu.
>>
> 
> Thank you for the report. I will fix that soon. With that fixed, have
> you been able to run vSVA on top of the series. Do you need other stuff
> to be fixed at SMMU level? As I am going to respin soon, please let me
> know what is the best branch to rebase to alleviate your integration.
> 
> Best Regards
> 
> Eric
> 
> .
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support
       [not found]   ` <a40b90bd-6756-c8cc-b455-c093d16d35f5@huawei.com>
@ 2020-12-03 13:01     ` Auger Eric
  2020-12-03 13:23       ` Kunkun Jiang
  0 siblings, 1 reply; 57+ messages in thread
From: Auger Eric @ 2020-12-03 13:01 UTC (permalink / raw)
  To: Kunkun Jiang, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, jacob.jun.pan, yi.l.liu, tn,
	nicoleotsuka, yuzenghui, wanghaibin.wang, Keqian Zhu

Hi Kunkun,

On 12/3/20 1:32 PM, Kunkun Jiang wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> When nested stage translation is setup, both s1_cfg and
>> s2_cfg are set.
>>
>> We introduce a new smmu domain abort field that will be set
>> upon guest stage1 configuration passing.
>>
>> arm_smmu_write_strtab_ent() is modified to write both stage
>> fields in the STE and deal with the abort field.
>>
>> In nested mode, only stage 2 is "finalized" as the host does
>> not own/configure the stage 1 context descriptor; guest does.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>> v10 -> v11:
>> - Fix an issue reported by Shameer when switching from with vSMMU
>>   to without vSMMU. Despite the spec does not seem to mention it
>>   seems to be needed to reset the 2 high 64b when switching from
>>   S1+S2 cfg to S1 only. Especially dst[3] needs to be reset (S2TTB).
>>   On some implementations, if the S2TTB is not reset, this causes
>>   a C_BAD_STE error
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 64 +++++++++++++++++----
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 +
>>  2 files changed, 56 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 18ac5af1b284..412ea1bafa50 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1181,8 +1181,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>  	 * three cases at the moment:
> Now, it should be *five cases*.
>>  	 *
>>  	 * 1. Invalid (all zero) -> bypass/fault (init)
>> -	 * 2. Bypass/fault -> translation/bypass (attach)
>> -	 * 3. Translation/bypass -> bypass/fault (detach)
>> +	 * 2. Bypass/fault -> single stage translation/bypass (attach)
>> +	 * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
>> +	 * 4. S2 -> S1 + S2 (attach_pasid_table)
> 
> I was testing this series on one of our hardware board with SMMUv3. And
> I found while trying to /"//attach_pasid_table//"/,
> 
> the sequence of STE (host) config(bit[3:1]) is /"S2->abort->S1 + S2"/.
> Because the maintenance is  /"Write everything apart///
> 
> /from dword 0, sync, write dword 0, sync"/ when we update the STE
> (guest). Dose the sequence meet your expectation?

yes it does. I will fix the comments accordingly.

Is there anything to correct in the code or was it functional?

Thanks

Eric
> 
>> +	 * 5. S1 + S2 -> S2 (detach_pasid_table)
>>  	 *
>>  	 * Given that we can't update the STE atomically and the SMMU
>>  	 * doesn't read the thing in a defined order, that leaves us
>> @@ -1193,7 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>  	 * 3. Update Config, sync
>>  	 */
>>  	u64 val = le64_to_cpu(dst[0]);
>> -	bool ste_live = false;
>> +	bool s1_live = false, s2_live = false, ste_live;
>> +	bool abort, nested = false, translate = false;
>>  	struct arm_smmu_device *smmu = NULL;
>>  	struct arm_smmu_s1_cfg *s1_cfg;
>>  	struct arm_smmu_s2_cfg *s2_cfg;
>> @@ -1233,6 +1236,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>  		default:
>>  			break;
>>  		}
>> +		nested = s1_cfg->set && s2_cfg->set;
>> +		translate = s1_cfg->set || s2_cfg->set;
>>  	}
>>  
>>  	if (val & STRTAB_STE_0_V) {
>> @@ -1240,23 +1245,36 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>  		case STRTAB_STE_0_CFG_BYPASS:
>>  			break;
>>  		case STRTAB_STE_0_CFG_S1_TRANS:
>> +			s1_live = true;
>> +			break;
>>  		case STRTAB_STE_0_CFG_S2_TRANS:
>> -			ste_live = true;
>> +			s2_live = true;
>> +			break;
>> +		case STRTAB_STE_0_CFG_NESTED:
>> +			s1_live = true;
>> +			s2_live = true;
>>  			break;
>>  		case STRTAB_STE_0_CFG_ABORT:
>> -			BUG_ON(!disable_bypass);
>>  			break;
>>  		default:
>>  			BUG(); /* STE corruption */
>>  		}
>>  	}
>>  
>> +	ste_live = s1_live || s2_live;
>> +
>>  	/* Nuke the existing STE_0 value, as we're going to rewrite it */
>>  	val = STRTAB_STE_0_V;
>>  
>>  	/* Bypass/fault */
>> -	if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
>> -		if (!smmu_domain && disable_bypass)
>> +
>> +	if (!smmu_domain)
>> +		abort = disable_bypass;
>> +	else
>> +		abort = smmu_domain->abort;
>> +
>> +	if (abort || !translate) {
>> +		if (abort)
>>  			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
>>  		else
>>  			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
>> @@ -1274,8 +1292,16 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>  		return;
>>  	}
>>  
>> +	BUG_ON(ste_live && !nested);
>> +
>> +	if (ste_live) {
>> +		/* First invalidate the live STE */
>> +		dst[0] = cpu_to_le64(STRTAB_STE_0_CFG_ABORT);
>> +		arm_smmu_sync_ste_for_sid(smmu, sid);
>> +	}
>> +
>>  	if (s1_cfg->set) {
>> -		BUG_ON(ste_live);
>> +		BUG_ON(s1_live);
>>  		dst[1] = cpu_to_le64(
>>  			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
>>  			 FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
>> @@ -1294,7 +1320,14 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>  	}
>>  
>>  	if (s2_cfg->set) {
>> -		BUG_ON(ste_live);
>> +		u64 vttbr = s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK;
>> +
>> +		if (s2_live) {
>> +			u64 s2ttb = le64_to_cpu(dst[3] & STRTAB_STE_3_S2TTB_MASK);
>> +
>> +			BUG_ON(s2ttb != vttbr);
>> +		}
>> +
>>  		dst[2] = cpu_to_le64(
>>  			 FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
>>  			 FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
>> @@ -1304,9 +1337,12 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>  			 STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 |
>>  			 STRTAB_STE_2_S2R);
>>  
>> -		dst[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
>> +		dst[3] = cpu_to_le64(vttbr);
>>  
>>  		val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
>> +	} else {
>> +		dst[2] = 0;
>> +		dst[3] = 0;
>>  	}
>>  
>>  	if (master->ats_enabled)
>> @@ -1982,6 +2018,14 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
>>  		return 0;
>>  	}
>>  
>> +	if (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED &&
>> +	    (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1) ||
>> +	     !(smmu->features & ARM_SMMU_FEAT_TRANS_S2))) {
>> +		dev_info(smmu_domain->smmu->dev,
>> +			 "does not implement two stages\n");
>> +		return -EINVAL;
>> +	}
>> +
>>  	/* Restrict the stage to what we can actually support */
>>  	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
>>  		smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> index 07f59252dd21..269779dee8d1 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> @@ -206,6 +206,7 @@
>>  #define STRTAB_STE_0_CFG_BYPASS		4
>>  #define STRTAB_STE_0_CFG_S1_TRANS	5
>>  #define STRTAB_STE_0_CFG_S2_TRANS	6
>> +#define STRTAB_STE_0_CFG_NESTED		7
>>  
>>  #define STRTAB_STE_0_S1FMT		GENMASK_ULL(5, 4)
>>  #define STRTAB_STE_0_S1FMT_LINEAR	0
>> @@ -682,6 +683,7 @@ struct arm_smmu_domain {
>>  	enum arm_smmu_domain_stage	stage;
>>  	struct arm_smmu_s1_cfg	s1_cfg;
>>  	struct arm_smmu_s2_cfg	s2_cfg;
>> +	bool				abort;
>>  
>>  	struct iommu_domain		domain;
> 
> Thanks,
> 
> Kunkun Jiang
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support
  2020-12-03 13:01     ` Auger Eric
@ 2020-12-03 13:23       ` Kunkun Jiang
  0 siblings, 0 replies; 57+ messages in thread
From: Kunkun Jiang @ 2020-12-03 13:23 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	shameerali.kolothum.thodi, jacob.jun.pan, yi.l.liu, tn,
	nicoleotsuka, yuzenghui, wanghaibin.wang, Keqian Zhu


On 2020/12/3 21:01, Auger Eric wrote:
> Hi Kunkun,
>
> On 12/3/20 1:32 PM, Kunkun Jiang wrote:
>> Hi Eric,
>>
>> On 2020/11/18 19:21, Eric Auger wrote:
>>> When nested stage translation is setup, both s1_cfg and
>>> s2_cfg are set.
>>>
>>> We introduce a new smmu domain abort field that will be set
>>> upon guest stage1 configuration passing.
>>>
>>> arm_smmu_write_strtab_ent() is modified to write both stage
>>> fields in the STE and deal with the abort field.
>>>
>>> In nested mode, only stage 2 is "finalized" as the host does
>>> not own/configure the stage 1 context descriptor; guest does.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>
>>> ---
>>> v10 -> v11:
>>> - Fix an issue reported by Shameer when switching from with vSMMU
>>>    to without vSMMU. Despite the spec does not seem to mention it
>>>    seems to be needed to reset the 2 high 64b when switching from
>>>    S1+S2 cfg to S1 only. Especially dst[3] needs to be reset (S2TTB).
>>>    On some implementations, if the S2TTB is not reset, this causes
>>>    a C_BAD_STE error
>>> ---
>>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 64 +++++++++++++++++----
>>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 +
>>>   2 files changed, 56 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> index 18ac5af1b284..412ea1bafa50 100644
>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> @@ -1181,8 +1181,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>>   	 * three cases at the moment:
>> Now, it should be *five cases*.
>>>   	 *
>>>   	 * 1. Invalid (all zero) -> bypass/fault (init)
>>> -	 * 2. Bypass/fault -> translation/bypass (attach)
>>> -	 * 3. Translation/bypass -> bypass/fault (detach)
>>> +	 * 2. Bypass/fault -> single stage translation/bypass (attach)
>>> +	 * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
>>> +	 * 4. S2 -> S1 + S2 (attach_pasid_table)
>> I was testing this series on one of our hardware board with SMMUv3. And
>> I found while trying to /"//attach_pasid_table//"/,
>>
>> the sequence of STE (host) config(bit[3:1]) is /"S2->abort->S1 + S2"/.
>> Because the maintenance is  /"Write everything apart///
>>
>> /from dword 0, sync, write dword 0, sync"/ when we update the STE
>> (guest). Dose the sequence meet your expectation?
> yes it does. I will fix the comments accordingly.
>
> Is there anything to correct in the code or was it functional?
>
> Thanks
>
> Eric
No, thanks. I just want to clarify the sequence.   :)
>>> +	 * 5. S1 + S2 -> S2 (detach_pasid_table)
>>>   	 *
>>>   	 * Given that we can't update the STE atomically and the SMMU
>>>   	 * doesn't read the thing in a defined order, that leaves us
>>> @@ -1193,7 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>>   	 * 3. Update Config, sync
>>>   	 */
>>>   	u64 val = le64_to_cpu(dst[0]);
>>> -	bool ste_live = false;
>>> +	bool s1_live = false, s2_live = false, ste_live;
>>> +	bool abort, nested = false, translate = false;
>>>   	struct arm_smmu_device *smmu = NULL;
>>>   	struct arm_smmu_s1_cfg *s1_cfg;
>>>   	struct arm_smmu_s2_cfg *s2_cfg;
>>> @@ -1233,6 +1236,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>>   		default:
>>>   			break;
>>>   		}
>>> +		nested = s1_cfg->set && s2_cfg->set;
>>> +		translate = s1_cfg->set || s2_cfg->set;
>>>   	}
>>>   
>>>   	if (val & STRTAB_STE_0_V) {
>>> @@ -1240,23 +1245,36 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>>   		case STRTAB_STE_0_CFG_BYPASS:
>>>   			break;
>>>   		case STRTAB_STE_0_CFG_S1_TRANS:
>>> +			s1_live = true;
>>> +			break;
>>>   		case STRTAB_STE_0_CFG_S2_TRANS:
>>> -			ste_live = true;
>>> +			s2_live = true;
>>> +			break;
>>> +		case STRTAB_STE_0_CFG_NESTED:
>>> +			s1_live = true;
>>> +			s2_live = true;
>>>   			break;
>>>   		case STRTAB_STE_0_CFG_ABORT:
>>> -			BUG_ON(!disable_bypass);
>>>   			break;
>>>   		default:
>>>   			BUG(); /* STE corruption */
>>>   		}
>>>   	}
>>>   
>>> +	ste_live = s1_live || s2_live;
>>> +
>>>   	/* Nuke the existing STE_0 value, as we're going to rewrite it */
>>>   	val = STRTAB_STE_0_V;
>>>   
>>>   	/* Bypass/fault */
>>> -	if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
>>> -		if (!smmu_domain && disable_bypass)
>>> +
>>> +	if (!smmu_domain)
>>> +		abort = disable_bypass;
>>> +	else
>>> +		abort = smmu_domain->abort;
>>> +
>>> +	if (abort || !translate) {
>>> +		if (abort)
>>>   			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
>>>   		else
>>>   			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
>>> @@ -1274,8 +1292,16 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>>   		return;
>>>   	}
>>>   
>>> +	BUG_ON(ste_live && !nested);
>>> +
>>> +	if (ste_live) {
>>> +		/* First invalidate the live STE */
>>> +		dst[0] = cpu_to_le64(STRTAB_STE_0_CFG_ABORT);
>>> +		arm_smmu_sync_ste_for_sid(smmu, sid);
>>> +	}
>>> +
>>>   	if (s1_cfg->set) {
>>> -		BUG_ON(ste_live);
>>> +		BUG_ON(s1_live);
>>>   		dst[1] = cpu_to_le64(
>>>   			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
>>>   			 FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
>>> @@ -1294,7 +1320,14 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>>   	}
>>>   
>>>   	if (s2_cfg->set) {
>>> -		BUG_ON(ste_live);
>>> +		u64 vttbr = s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK;
>>> +
>>> +		if (s2_live) {
>>> +			u64 s2ttb = le64_to_cpu(dst[3] & STRTAB_STE_3_S2TTB_MASK);
>>> +
>>> +			BUG_ON(s2ttb != vttbr);
>>> +		}
>>> +
>>>   		dst[2] = cpu_to_le64(
>>>   			 FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
>>>   			 FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
>>> @@ -1304,9 +1337,12 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>>   			 STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 |
>>>   			 STRTAB_STE_2_S2R);
>>>   
>>> -		dst[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
>>> +		dst[3] = cpu_to_le64(vttbr);
>>>   
>>>   		val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
>>> +	} else {
>>> +		dst[2] = 0;
>>> +		dst[3] = 0;
>>>   	}
>>>   
>>>   	if (master->ats_enabled)
>>> @@ -1982,6 +2018,14 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
>>>   		return 0;
>>>   	}
>>>   
>>> +	if (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED &&
>>> +	    (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1) ||
>>> +	     !(smmu->features & ARM_SMMU_FEAT_TRANS_S2))) {
>>> +		dev_info(smmu_domain->smmu->dev,
>>> +			 "does not implement two stages\n");
>>> +		return -EINVAL;
>>> +	}
>>> +
>>>   	/* Restrict the stage to what we can actually support */
>>>   	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
>>>   		smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>> index 07f59252dd21..269779dee8d1 100644
>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>> @@ -206,6 +206,7 @@
>>>   #define STRTAB_STE_0_CFG_BYPASS		4
>>>   #define STRTAB_STE_0_CFG_S1_TRANS	5
>>>   #define STRTAB_STE_0_CFG_S2_TRANS	6
>>> +#define STRTAB_STE_0_CFG_NESTED		7
>>>   
>>>   #define STRTAB_STE_0_S1FMT		GENMASK_ULL(5, 4)
>>>   #define STRTAB_STE_0_S1FMT_LINEAR	0
>>> @@ -682,6 +683,7 @@ struct arm_smmu_domain {
>>>   	enum arm_smmu_domain_stage	stage;
>>>   	struct arm_smmu_s1_cfg	s1_cfg;
>>>   	struct arm_smmu_s2_cfg	s2_cfg;
>>> +	bool				abort;
>>>   
>>>   	struct iommu_domain		domain;
>> Thanks,
>>
>> Kunkun Jiang
>>
> .

Thanks

Kunkun Jiang


^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2020-12-01 13:58     ` Auger Eric
  2020-12-02 12:59       ` Wang Xingang
@ 2020-12-03 18:42       ` Shameerali Kolothum Thodi
  2020-12-04  9:53         ` Jean-Philippe Brucker
  2021-02-15 13:17         ` Auger Eric
  1 sibling, 2 replies; 57+ messages in thread
From: Shameerali Kolothum Thodi @ 2020-12-03 18:42 UTC (permalink / raw)
  To: Auger Eric, wangxingang
  Cc: Xieyingtai, jean-philippe, kvm, maz, joro, will, iommu,
	linux-kernel, vivek.gautam, alex.williamson, zhangfei.gao,
	robin.murphy, kvmarm, eric.auger.pro, Zengtao (B),
	qubingbing

Hi Eric,

> -----Original Message-----
> From: kvmarm-bounces@lists.cs.columbia.edu
> [mailto:kvmarm-bounces@lists.cs.columbia.edu] On Behalf Of Auger Eric
> Sent: 01 December 2020 13:59
> To: wangxingang <wangxingang5@huawei.com>
> Cc: Xieyingtai <xieyingtai@huawei.com>; jean-philippe@linaro.org;
> kvm@vger.kernel.org; maz@kernel.org; joro@8bytes.org; will@kernel.org;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> vivek.gautam@arm.com; alex.williamson@redhat.com;
> zhangfei.gao@linaro.org; robin.murphy@arm.com;
> kvmarm@lists.cs.columbia.edu; eric.auger.pro@gmail.com
> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with
> unmanaged ASIDs
> 
> Hi Xingang,
> 
> On 12/1/20 2:33 PM, Xingang Wang wrote:
> > Hi Eric
> >
> > On  Wed, 18 Nov 2020 12:21:43, Eric Auger wrote:
> >> @@ -1710,7 +1710,11 @@ static void arm_smmu_tlb_inv_context(void
> *cookie)
> >> 	 * insertion to guarantee those are observed before the TLBI. Do be
> >> 	 * careful, 007.
> >> 	 */
> >> -	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> >> +	if (ext_asid >= 0) { /* guest stage 1 invalidation */
> >> +		cmd.opcode	= CMDQ_OP_TLBI_NH_ASID;
> >> +		cmd.tlbi.asid	= ext_asid;
> >> +		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
> >> +	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> >
> > Found a problem here, the cmd for guest stage 1 invalidation is built,
> > but it is not delivered to smmu.
> >
> 
> Thank you for the report. I will fix that soon. With that fixed, have
> you been able to run vSVA on top of the series. Do you need other stuff
> to be fixed at SMMU level? 

I am seeing another issue with this series. This is when you have the vSMMU
in non-strict mode(iommu.strict=0). Any network pass-through dev with iperf run 
will be enough to reproduce the issue. It may randomly stop/hang.

It looks like the .flush_iotlb_all from guest is not propagated down to the host
correctly. I have a temp hack to fix this in Qemu wherein CMDQ_OP_TLBI_NH_ASID
will result in a CACHE_INVALIDATE with IOMMU_INV_GRANU_PASID flag and archid
set.

Please take a look and let me know. 

As I am going to respin soon, please let me
> know what is the best branch to rebase to alleviate your integration.

Please find the latest kernel and Qemu branch with vSVA support added here,

https://github.com/hisilicon/kernel-dev/tree/5.10-rc4-2stage-v13-vsva
https://github.com/hisilicon/qemu/tree/v5.2.0-rc1-2stage-rfcv7-vsva

I have done some basic minimum vSVA tests on a HiSilicon D06 board with
a zip dev that supports STALL. All looks good so far apart from the issues
that have been already reported/discussed.

The kernel branch is actually a rebase of sva/uacce related patches from a
Linaro branch here,

https://github.com/Linaro/linux-kernel-uadk/tree/uacce-devel-5.10

I think going forward it will be good(if possible) to respin your series on top of
a sva branch with STALL/PRI support added. 

Hi Jean/zhangfei,
Is it possible to have a branch with minimum required SVA/UACCE related patches
that are already public and can be a "stable" candidate for future respin of Eric's series?
Please share your thoughts.

Thanks,
Shameer 

> Best Regards
> 
> Eric
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2020-12-03 18:42       ` Shameerali Kolothum Thodi
@ 2020-12-04  9:53         ` Jean-Philippe Brucker
  2020-12-04 10:20           ` Shameerali Kolothum Thodi
  2021-02-15 13:17         ` Auger Eric
  1 sibling, 1 reply; 57+ messages in thread
From: Jean-Philippe Brucker @ 2020-12-04  9:53 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Auger Eric, wangxingang, Xieyingtai, kvm, maz, joro, will, iommu,
	linux-kernel, vivek.gautam, alex.williamson, zhangfei.gao,
	robin.murphy, kvmarm, eric.auger.pro, Zengtao (B),
	qubingbing

Hi Shameer,

On Thu, Dec 03, 2020 at 06:42:57PM +0000, Shameerali Kolothum Thodi wrote:
> Hi Jean/zhangfei,
> Is it possible to have a branch with minimum required SVA/UACCE related patches
> that are already public and can be a "stable" candidate for future respin of Eric's series?
> Please share your thoughts.

By "stable" you mean a fixed branch with the latest SVA/UACCE patches
based on mainline?  The uacce-devel branches from
https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
(they track the latest sva/zip-devel branch
https://jpbrucker.net/git/linux/ which is roughly based on mainline.)

Thanks,
Jean


^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2020-12-04  9:53         ` Jean-Philippe Brucker
@ 2020-12-04 10:20           ` Shameerali Kolothum Thodi
  2020-12-04 10:23             ` Auger Eric
  0 siblings, 1 reply; 57+ messages in thread
From: Shameerali Kolothum Thodi @ 2020-12-04 10:20 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Auger Eric, wangxingang, Xieyingtai, kvm, maz, joro, will, iommu,
	linux-kernel, vivek.gautam, alex.williamson, zhangfei.gao,
	robin.murphy, kvmarm, eric.auger.pro, Zengtao (B),
	qubingbing

Hi Jean,

> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe@linaro.org]
> Sent: 04 December 2020 09:54
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Auger Eric <eric.auger@redhat.com>; wangxingang
> <wangxingang5@huawei.com>; Xieyingtai <xieyingtai@huawei.com>;
> kvm@vger.kernel.org; maz@kernel.org; joro@8bytes.org; will@kernel.org;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> vivek.gautam@arm.com; alex.williamson@redhat.com;
> zhangfei.gao@linaro.org; robin.murphy@arm.com;
> kvmarm@lists.cs.columbia.edu; eric.auger.pro@gmail.com; Zengtao (B)
> <prime.zeng@hisilicon.com>; qubingbing <qubingbing@hisilicon.com>
> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with
> unmanaged ASIDs
> 
> Hi Shameer,
> 
> On Thu, Dec 03, 2020 at 06:42:57PM +0000, Shameerali Kolothum Thodi wrote:
> > Hi Jean/zhangfei,
> > Is it possible to have a branch with minimum required SVA/UACCE related
> patches
> > that are already public and can be a "stable" candidate for future respin of
> Eric's series?
> > Please share your thoughts.
> 
> By "stable" you mean a fixed branch with the latest SVA/UACCE patches
> based on mainline? 

Yes. 

 The uacce-devel branches from
> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
> (they track the latest sva/zip-devel branch
> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)

Thanks. 

Hi Eric,

Could you please take a look at the above branches and see whether it make sense
to rebase on top of either of those?

From vSVA point of view, it will be less rebase hassle if we can do that.

Thanks,
Shameer

> Thanks,
> Jean


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2020-12-04 10:20           ` Shameerali Kolothum Thodi
@ 2020-12-04 10:23             ` Auger Eric
  2021-01-14 16:58               ` Auger Eric
  0 siblings, 1 reply; 57+ messages in thread
From: Auger Eric @ 2020-12-04 10:23 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, Jean-Philippe Brucker
  Cc: wangxingang, Xieyingtai, kvm, maz, joro, will, iommu,
	linux-kernel, vivek.gautam, alex.williamson, zhangfei.gao,
	robin.murphy, kvmarm, eric.auger.pro, Zengtao (B),
	qubingbing

Hi Shameer, Jean-Philippe,

On 12/4/20 11:20 AM, Shameerali Kolothum Thodi wrote:
> Hi Jean,
> 
>> -----Original Message-----
>> From: Jean-Philippe Brucker [mailto:jean-philippe@linaro.org]
>> Sent: 04 December 2020 09:54
>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>> Cc: Auger Eric <eric.auger@redhat.com>; wangxingang
>> <wangxingang5@huawei.com>; Xieyingtai <xieyingtai@huawei.com>;
>> kvm@vger.kernel.org; maz@kernel.org; joro@8bytes.org; will@kernel.org;
>> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>> vivek.gautam@arm.com; alex.williamson@redhat.com;
>> zhangfei.gao@linaro.org; robin.murphy@arm.com;
>> kvmarm@lists.cs.columbia.edu; eric.auger.pro@gmail.com; Zengtao (B)
>> <prime.zeng@hisilicon.com>; qubingbing <qubingbing@hisilicon.com>
>> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with
>> unmanaged ASIDs
>>
>> Hi Shameer,
>>
>> On Thu, Dec 03, 2020 at 06:42:57PM +0000, Shameerali Kolothum Thodi wrote:
>>> Hi Jean/zhangfei,
>>> Is it possible to have a branch with minimum required SVA/UACCE related
>> patches
>>> that are already public and can be a "stable" candidate for future respin of
>> Eric's series?
>>> Please share your thoughts.
>>
>> By "stable" you mean a fixed branch with the latest SVA/UACCE patches
>> based on mainline? 
> 
> Yes. 
> 
>  The uacce-devel branches from
>> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
>> (they track the latest sva/zip-devel branch
>> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)
> 
> Thanks. 
> 
> Hi Eric,
> 
> Could you please take a look at the above branches and see whether it make sense
> to rebase on top of either of those?
> 
> From vSVA point of view, it will be less rebase hassle if we can do that.

Sure. I will rebase on top of this ;-)

Thanks

Eric
> 
> Thanks,
> Shameer
> 
>> Thanks,
>> Jean
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support
  2020-11-18 11:21 ` [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support Eric Auger
  2020-11-19  3:59   ` kernel test robot
       [not found]   ` <a40b90bd-6756-c8cc-b455-c093d16d35f5@huawei.com>
@ 2020-12-09 14:26   ` Shameerali Kolothum Thodi
  2021-02-02  7:14   ` Keqian Zhu
  3 siblings, 0 replies; 57+ messages in thread
From: Shameerali Kolothum Thodi @ 2020-12-09 14:26 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	jacob.jun.pan, yi.l.liu, tn, nicoleotsuka, yuzenghui, Zengtao (B),
	qubingbing

Hi Eric,

> -----Original Message-----
> From: Eric Auger [mailto:eric.auger@redhat.com]
> Sent: 18 November 2020 11:22
> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> kvm@vger.kernel.org; kvmarm@lists.cs.columbia.edu; will@kernel.org;
> joro@8bytes.org; maz@kernel.org; robin.murphy@arm.com;
> alex.williamson@redhat.com
> Cc: jean-philippe@linaro.org; zhangfei.gao@linaro.org;
> zhangfei.gao@gmail.com; vivek.gautam@arm.com; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi@huawei.com>;
> jacob.jun.pan@linux.intel.com; yi.l.liu@intel.com; tn@semihalf.com;
> nicoleotsuka@gmail.com; yuzenghui <yuzenghui@huawei.com>
> Subject: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage
> support
> 
> When nested stage translation is setup, both s1_cfg and
> s2_cfg are set.
> 
> We introduce a new smmu domain abort field that will be set
> upon guest stage1 configuration passing.
> 
> arm_smmu_write_strtab_ent() is modified to write both stage
> fields in the STE and deal with the abort field.
> 
> In nested mode, only stage 2 is "finalized" as the host does
> not own/configure the stage 1 context descriptor; guest does.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> v10 -> v11:
> - Fix an issue reported by Shameer when switching from with vSMMU
>   to without vSMMU. Despite the spec does not seem to mention it
>   seems to be needed to reset the 2 high 64b when switching from
>   S1+S2 cfg to S1 only. Especially dst[3] needs to be reset (S2TTB).
>   On some implementations, if the S2TTB is not reset, this causes
>   a C_BAD_STE error
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 64
> +++++++++++++++++----
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 +
>  2 files changed, 56 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 18ac5af1b284..412ea1bafa50 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1181,8 +1181,10 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
>  	 * three cases at the moment:
>  	 *
>  	 * 1. Invalid (all zero) -> bypass/fault (init)
> -	 * 2. Bypass/fault -> translation/bypass (attach)
> -	 * 3. Translation/bypass -> bypass/fault (detach)
> +	 * 2. Bypass/fault -> single stage translation/bypass (attach)
> +	 * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
> +	 * 4. S2 -> S1 + S2 (attach_pasid_table)
> +	 * 5. S1 + S2 -> S2 (detach_pasid_table)
>  	 *
>  	 * Given that we can't update the STE atomically and the SMMU
>  	 * doesn't read the thing in a defined order, that leaves us
> @@ -1193,7 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
>  	 * 3. Update Config, sync
>  	 */
>  	u64 val = le64_to_cpu(dst[0]);
> -	bool ste_live = false;
> +	bool s1_live = false, s2_live = false, ste_live;
> +	bool abort, nested = false, translate = false;
>  	struct arm_smmu_device *smmu = NULL;
>  	struct arm_smmu_s1_cfg *s1_cfg;
>  	struct arm_smmu_s2_cfg *s2_cfg;
> @@ -1233,6 +1236,8 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
>  		default:
>  			break;
>  		}
> +		nested = s1_cfg->set && s2_cfg->set;

This is a problem when the Guest is booted with iommu.passthrough = 1 as we
set s1_cfg.set = false for IOMMU_PASID_CONFIG_BYPASS. 

Results in BUG_ON(ste_live && !nested).

Can we instead have nested = true set a bit above in the code, where we set
s2_cfg->set = true for the ARM_SMMU_DOMAIN_NESTED case?

Please take a look.

Thanks,
Shameer

> +		translate = s1_cfg->set || s2_cfg->set;
>  	}
> 
>  	if (val & STRTAB_STE_0_V) {
> @@ -1240,23 +1245,36 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
>  		case STRTAB_STE_0_CFG_BYPASS:
>  			break;
>  		case STRTAB_STE_0_CFG_S1_TRANS:
> +			s1_live = true;
> +			break;
>  		case STRTAB_STE_0_CFG_S2_TRANS:
> -			ste_live = true;
> +			s2_live = true;
> +			break;
> +		case STRTAB_STE_0_CFG_NESTED:
> +			s1_live = true;
> +			s2_live = true;
>  			break;
>  		case STRTAB_STE_0_CFG_ABORT:
> -			BUG_ON(!disable_bypass);
>  			break;
>  		default:
>  			BUG(); /* STE corruption */
>  		}
>  	}
> 
> +	ste_live = s1_live || s2_live;
> +
>  	/* Nuke the existing STE_0 value, as we're going to rewrite it */
>  	val = STRTAB_STE_0_V;
> 
>  	/* Bypass/fault */
> -	if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
> -		if (!smmu_domain && disable_bypass)
> +
> +	if (!smmu_domain)
> +		abort = disable_bypass;
> +	else
> +		abort = smmu_domain->abort;
> +
> +	if (abort || !translate) {
> +		if (abort)
>  			val |= FIELD_PREP(STRTAB_STE_0_CFG,
> STRTAB_STE_0_CFG_ABORT);
>  		else
>  			val |= FIELD_PREP(STRTAB_STE_0_CFG,
> STRTAB_STE_0_CFG_BYPASS);
> @@ -1274,8 +1292,16 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
>  		return;
>  	}
> 
> +	BUG_ON(ste_live && !nested);
> +
> +	if (ste_live) {
> +		/* First invalidate the live STE */
> +		dst[0] = cpu_to_le64(STRTAB_STE_0_CFG_ABORT);
> +		arm_smmu_sync_ste_for_sid(smmu, sid);
> +	}
> +
>  	if (s1_cfg->set) {
> -		BUG_ON(ste_live);
> +		BUG_ON(s1_live);
>  		dst[1] = cpu_to_le64(
>  			 FIELD_PREP(STRTAB_STE_1_S1DSS,
> STRTAB_STE_1_S1DSS_SSID0) |
>  			 FIELD_PREP(STRTAB_STE_1_S1CIR,
> STRTAB_STE_1_S1C_CACHE_WBRA) |
> @@ -1294,7 +1320,14 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
>  	}
> 
>  	if (s2_cfg->set) {
> -		BUG_ON(ste_live);
> +		u64 vttbr = s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK;
> +
> +		if (s2_live) {
> +			u64 s2ttb = le64_to_cpu(dst[3] & STRTAB_STE_3_S2TTB_MASK);
> +
> +			BUG_ON(s2ttb != vttbr);
> +		}
> +
>  		dst[2] = cpu_to_le64(
>  			 FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
>  			 FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
> @@ -1304,9 +1337,12 @@ static void arm_smmu_write_strtab_ent(struct
> arm_smmu_master *master, u32 sid,
>  			 STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 |
>  			 STRTAB_STE_2_S2R);
> 
> -		dst[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
> +		dst[3] = cpu_to_le64(vttbr);
> 
>  		val |= FIELD_PREP(STRTAB_STE_0_CFG,
> STRTAB_STE_0_CFG_S2_TRANS);
> +	} else {
> +		dst[2] = 0;
> +		dst[3] = 0;
>  	}
> 
>  	if (master->ats_enabled)
> @@ -1982,6 +2018,14 @@ static int arm_smmu_domain_finalise(struct
> iommu_domain *domain,
>  		return 0;
>  	}
> 
> +	if (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED &&
> +	    (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1) ||
> +	     !(smmu->features & ARM_SMMU_FEAT_TRANS_S2))) {
> +		dev_info(smmu_domain->smmu->dev,
> +			 "does not implement two stages\n");
> +		return -EINVAL;
> +	}
> +
>  	/* Restrict the stage to what we can actually support */
>  	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
>  		smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 07f59252dd21..269779dee8d1 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -206,6 +206,7 @@
>  #define STRTAB_STE_0_CFG_BYPASS		4
>  #define STRTAB_STE_0_CFG_S1_TRANS	5
>  #define STRTAB_STE_0_CFG_S2_TRANS	6
> +#define STRTAB_STE_0_CFG_NESTED		7
> 
>  #define STRTAB_STE_0_S1FMT		GENMASK_ULL(5, 4)
>  #define STRTAB_STE_0_S1FMT_LINEAR	0
> @@ -682,6 +683,7 @@ struct arm_smmu_domain {
>  	enum arm_smmu_domain_stage	stage;
>  	struct arm_smmu_s1_cfg	s1_cfg;
>  	struct arm_smmu_s2_cfg	s2_cfg;
> +	bool				abort;
> 
>  	struct iommu_domain		domain;
> 
> --
> 2.21.3


^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
  2020-11-18 11:21 [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (14 preceding siblings ...)
  2020-11-18 11:21 ` [PATCH v13 15/15] iommu/smmuv3: Add PASID cache invalidation per PASID Eric Auger
@ 2021-01-08 17:05 ` Shameerali Kolothum Thodi
  2021-01-13 15:37   ` Auger Eric
  2021-02-21 18:21   ` Auger Eric
  2021-03-15 18:04 ` Krishna Reddy
  16 siblings, 2 replies; 57+ messages in thread
From: Shameerali Kolothum Thodi @ 2021-01-08 17:05 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	jacob.jun.pan, yi.l.liu, tn, nicoleotsuka, yuzenghui, Zengtao (B),
	linuxarm

Hi Eric,

> -----Original Message-----
> From: Eric Auger [mailto:eric.auger@redhat.com]
> Sent: 18 November 2020 11:22
> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> kvm@vger.kernel.org; kvmarm@lists.cs.columbia.edu; will@kernel.org;
> joro@8bytes.org; maz@kernel.org; robin.murphy@arm.com;
> alex.williamson@redhat.com
> Cc: jean-philippe@linaro.org; zhangfei.gao@linaro.org;
> zhangfei.gao@gmail.com; vivek.gautam@arm.com; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi@huawei.com>;
> jacob.jun.pan@linux.intel.com; yi.l.liu@intel.com; tn@semihalf.com;
> nicoleotsuka@gmail.com; yuzenghui <yuzenghui@huawei.com>
> Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
> 
> This series brings the IOMMU part of HW nested paging support
> in the SMMUv3. The VFIO part is submitted separately.
> 
> The IOMMU API is extended to support 2 new API functionalities:
> 1) pass the guest stage 1 configuration
> 2) pass stage 1 MSI bindings
> 
> Then those capabilities gets implemented in the SMMUv3 driver.
> 
> The virtualizer passes information through the VFIO user API
> which cascades them to the iommu subsystem. This allows the guest
> to own stage 1 tables and context descriptors (so-called PASID
> table) while the host owns stage 2 tables and main configuration
> structures (STE).

I am seeing an issue with Guest testpmd run with this series.
I have two different setups and testpmd works fine with the
first one but not with the second.

1). Guest doesn't have kernel driver built-in for pass-through dev.

root@ubuntu:/# lspci -v
...
00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
Subsystem: Huawei Technologies Co., Ltd. Device 0000
Flags: fast devsel
Memory at 8000100000 (64-bit, prefetchable) [disabled] [size=64K]
Memory at 8000000000 (64-bit, prefetchable) [disabled] [size=1M]
Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [a0] MSI-X: Enable- Count=67 Masked-
Capabilities: [b0] Power Management version 3
Capabilities: [100] Access Control Services
Capabilities: [300] Transaction Processing Hints

root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe

root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0  -l 0-1 -n 2 -- -i
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-32768kB
EAL: No available hugepages reported in hugepages-64kB
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL:   Invalid NUMA socket, default to 0
EAL:   using IOMMU type 1 (Type 1)
EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket 0)
EAL: No legacy callbacks, legacy socket not created
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc

Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.

Configuring Port 0 (socket 0)
Port 0: 8E:A6:8C:43:43:45
Checking link statuses...
Done
testpmd>

2). Guest have kernel driver built-in for pass-through dev.

root@ubuntu:/# lspci -v
...
00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
Subsystem: Huawei Technologies Co., Ltd. Device 0000
Flags: bus master, fast devsel, latency 0
Memory at 8000100000 (64-bit, prefetchable) [size=64K]
Memory at 8000000000 (64-bit, prefetchable) [size=1M]
Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [a0] MSI-X: Enable+ Count=67 Masked-
Capabilities: [b0] Power Management version 3
Capabilities: [100] Access Control Services
Capabilities: [300] Transaction Processing Hints
Kernel driver in use: hns3

root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers/hns3/unbind
root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe

root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0 -l 0-1 -n 2 -- -i
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-32768kB
EAL: No available hugepages reported in hugepages-64kB
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL:   Invalid NUMA socket, default to 0
EAL:   using IOMMU type 1 (Type 1)
EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket 0)
0000:00:02.0 hns3_get_mbx_resp(): VF could not get mbx(11,0) head(1) tail(0) lost(1) from PF in_irq:0
hns3vf_get_queue_info(): Failed to get tqp info from PF: -62
hns3vf_init_vf(): Failed to fetch configuration: -62
hns3vf_dev_init(): Failed to init vf: -62
EAL: Releasing pci mapped resource for 0000:00:02.0
EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100800000
EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100810000
EAL: Requested device 0000:00:02.0 cannot be used
EAL: Bus (pci) probe failed.
EAL: No legacy callbacks, legacy socket not created
testpmd: No probed ethernet devices
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Done
testpmd>

And in this case, smmu(host) reports a translation fault,

[ 6542.670624] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
[ 6542.670630] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00007d1200000010
[ 6542.670631] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000012000000007c
[ 6542.670633] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef040
[ 6542.670634] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef000

Tested with Intel 82599 card(ixgbevf) as well. but same errror.

Not able to root cause the problem yet. With the hope that, this is 
related to tlb entries not being invlaidated properly, I tried explicitly
issuing CMD_TLBI_NSNH_ALL and CMD_CFGI_CD_ALL just before
the STE update, but no luck yet :(

Please let me know if I am missing something here or has any clue if you
can replicate this on your setup.

Thanks,
Shameer

> 
> Best Regards
> 
> Eric
> 
> This series can be found at:
> https://github.com/eauger/linux/tree/5.10-rc4-2stage-v13
> (including the VFIO part in his last version: v11)
> 
> The series includes a patch from Jean-Philippe. It is better to
> review the original patch:
> [PATCH v8 2/9] iommu/arm-smmu-v3: Maintain a SID->device structure
> 
> The VFIO series is sent separately.
> 
> History:
> 
> v12 -> v13:
> - fixed compilation issue with CONFIG_ARM_SMMU_V3_SVA
>   reported by Shameer. This urged me to revisit patch 4 into
>   iommu/smmuv3: Allow s1 and s2 configs to coexist where
>   s1_cfg and s2_cfg are not dynamically allocated anymore.
>   Instead I use a new set field in existing structs
> - fixed 2 others config checks
> - Updated "iommu/arm-smmu-v3: Maintain a SID->device structure"
>   according to the last version
> 
> v11 -> v12:
> - rebase on top of v5.10-rc4
> 
> Eric Auger (14):
>   iommu: Introduce attach/detach_pasid_table API
>   iommu: Introduce bind/unbind_guest_msi
>   iommu/smmuv3: Allow s1 and s2 configs to coexist
>   iommu/smmuv3: Get prepared for nested stage support
>   iommu/smmuv3: Implement attach/detach_pasid_table
>   iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
>   iommu/smmuv3: Implement cache_invalidate
>   dma-iommu: Implement NESTED_MSI cookie
>   iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
>   iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI
>     regions
>   iommu/smmuv3: Implement bind/unbind_guest_msi
>   iommu/smmuv3: Report non recoverable faults
>   iommu/smmuv3: Accept configs with more than one context descriptor
>   iommu/smmuv3: Add PASID cache invalidation per PASID
> 
> Jean-Philippe Brucker (1):
>   iommu/arm-smmu-v3: Maintain a SID->device structure
> 
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 659
> ++++++++++++++++++--
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 103 ++-
>  drivers/iommu/dma-iommu.c                   | 142 ++++-
>  drivers/iommu/iommu.c                       | 105 ++++
>  include/linux/dma-iommu.h                   |  16 +
>  include/linux/iommu.h                       |  41 ++
>  include/uapi/linux/iommu.h                  |  54 ++
>  7 files changed, 1042 insertions(+), 78 deletions(-)
> 
> --
> 2.21.3


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
  2021-01-08 17:05 ` [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Shameerali Kolothum Thodi
@ 2021-01-13 15:37   ` Auger Eric
  2021-02-21 18:21   ` Auger Eric
  1 sibling, 0 replies; 57+ messages in thread
From: Auger Eric @ 2021-01-13 15:37 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, eric.auger.pro, iommu, linux-kernel,
	kvm, kvmarm, will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	jacob.jun.pan, yi.l.liu, tn, nicoleotsuka, yuzenghui, Zengtao (B),
	linuxarm

Hi Shameer,

On 1/8/21 6:05 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Eric Auger [mailto:eric.auger@redhat.com]
>> Sent: 18 November 2020 11:22
>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
>> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>> kvm@vger.kernel.org; kvmarm@lists.cs.columbia.edu; will@kernel.org;
>> joro@8bytes.org; maz@kernel.org; robin.murphy@arm.com;
>> alex.williamson@redhat.com
>> Cc: jean-philippe@linaro.org; zhangfei.gao@linaro.org;
>> zhangfei.gao@gmail.com; vivek.gautam@arm.com; Shameerali Kolothum
>> Thodi <shameerali.kolothum.thodi@huawei.com>;
>> jacob.jun.pan@linux.intel.com; yi.l.liu@intel.com; tn@semihalf.com;
>> nicoleotsuka@gmail.com; yuzenghui <yuzenghui@huawei.com>
>> Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
>>
>> This series brings the IOMMU part of HW nested paging support
>> in the SMMUv3. The VFIO part is submitted separately.
>>
>> The IOMMU API is extended to support 2 new API functionalities:
>> 1) pass the guest stage 1 configuration
>> 2) pass stage 1 MSI bindings
>>
>> Then those capabilities gets implemented in the SMMUv3 driver.
>>
>> The virtualizer passes information through the VFIO user API
>> which cascades them to the iommu subsystem. This allows the guest
>> to own stage 1 tables and context descriptors (so-called PASID
>> table) while the host owns stage 2 tables and main configuration
>> structures (STE).
> 
> I am seeing an issue with Guest testpmd run with this series.
> I have two different setups and testpmd works fine with the
> first one but not with the second.
> 
> 1). Guest doesn't have kernel driver built-in for pass-through dev.
> 
> root@ubuntu:/# lspci -v
> ...
> 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
> Subsystem: Huawei Technologies Co., Ltd. Device 0000
> Flags: fast devsel
> Memory at 8000100000 (64-bit, prefetchable) [disabled] [size=64K]
> Memory at 8000000000 (64-bit, prefetchable) [disabled] [size=1M]
> Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [a0] MSI-X: Enable- Count=67 Masked-
> Capabilities: [b0] Power Management version 3
> Capabilities: [100] Access Control Services
> Capabilities: [300] Transaction Processing Hints
> 
> root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
> root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
> 
> root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0  -l 0-1 -n 2 -- -i
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-32768kB
> EAL: No available hugepages reported in hugepages-64kB
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL:   Invalid NUMA socket, default to 0
> EAL:   using IOMMU type 1 (Type 1)
> EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket 0)
> EAL: No legacy callbacks, legacy socket not created
> Interactive-mode selected
> testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> 
> Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
> 
> Configuring Port 0 (socket 0)
> Port 0: 8E:A6:8C:43:43:45
> Checking link statuses...
> Done
> testpmd>
> 
> 2). Guest have kernel driver built-in for pass-through dev.
> 
> root@ubuntu:/# lspci -v
> ...
> 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
> Subsystem: Huawei Technologies Co., Ltd. Device 0000
> Flags: bus master, fast devsel, latency 0
> Memory at 8000100000 (64-bit, prefetchable) [size=64K]
> Memory at 8000000000 (64-bit, prefetchable) [size=1M]
> Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [a0] MSI-X: Enable+ Count=67 Masked-
> Capabilities: [b0] Power Management version 3
> Capabilities: [100] Access Control Services
> Capabilities: [300] Transaction Processing Hints
> Kernel driver in use: hns3
> 
> root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
> root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers/hns3/unbind
> root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
> 
> root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0 -l 0-1 -n 2 -- -i
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-32768kB
> EAL: No available hugepages reported in hugepages-64kB
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL:   Invalid NUMA socket, default to 0
> EAL:   using IOMMU type 1 (Type 1)
> EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket 0)
> 0000:00:02.0 hns3_get_mbx_resp(): VF could not get mbx(11,0) head(1) tail(0) lost(1) from PF in_irq:0
> hns3vf_get_queue_info(): Failed to get tqp info from PF: -62
> hns3vf_init_vf(): Failed to fetch configuration: -62
> hns3vf_dev_init(): Failed to init vf: -62
> EAL: Releasing pci mapped resource for 0000:00:02.0
> EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100800000
> EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100810000
> EAL: Requested device 0000:00:02.0 cannot be used
> EAL: Bus (pci) probe failed.
> EAL: No legacy callbacks, legacy socket not created
> testpmd: No probed ethernet devices
> Interactive-mode selected
> testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> Done
> testpmd>
> 
> And in this case, smmu(host) reports a translation fault,
> 
> [ 6542.670624] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
> [ 6542.670630] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00007d1200000010
> [ 6542.670631] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000012000000007c
> [ 6542.670633] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef040
> [ 6542.670634] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef000
> 
> Tested with Intel 82599 card(ixgbevf) as well. but same errror.
> 
> Not able to root cause the problem yet. With the hope that, this is 
> related to tlb entries not being invlaidated properly, I tried explicitly
> issuing CMD_TLBI_NSNH_ALL and CMD_CFGI_CD_ALL just before
> the STE update, but no luck yet :(
> 
> Please let me know if I am missing something here or has any clue if you
> can replicate this on your setup.

Thank for for the report. I need to setup the DPDK environment again and
I will test on my end. I plan to respin next week and study all the
issues you reported up to now. I will rebase on Jean's last branch.

Thanks

Eric
> 
> Thanks,
> Shameer
> 
>>
>> Best Regards
>>
>> Eric
>>
>> This series can be found at:
>> https://github.com/eauger/linux/tree/5.10-rc4-2stage-v13
>> (including the VFIO part in his last version: v11)
>>
>> The series includes a patch from Jean-Philippe. It is better to
>> review the original patch:
>> [PATCH v8 2/9] iommu/arm-smmu-v3: Maintain a SID->device structure
>>
>> The VFIO series is sent separately.
>>
>> History:
>>
>> v12 -> v13:
>> - fixed compilation issue with CONFIG_ARM_SMMU_V3_SVA
>>   reported by Shameer. This urged me to revisit patch 4 into
>>   iommu/smmuv3: Allow s1 and s2 configs to coexist where
>>   s1_cfg and s2_cfg are not dynamically allocated anymore.
>>   Instead I use a new set field in existing structs
>> - fixed 2 others config checks
>> - Updated "iommu/arm-smmu-v3: Maintain a SID->device structure"
>>   according to the last version
>>
>> v11 -> v12:
>> - rebase on top of v5.10-rc4
>>
>> Eric Auger (14):
>>   iommu: Introduce attach/detach_pasid_table API
>>   iommu: Introduce bind/unbind_guest_msi
>>   iommu/smmuv3: Allow s1 and s2 configs to coexist
>>   iommu/smmuv3: Get prepared for nested stage support
>>   iommu/smmuv3: Implement attach/detach_pasid_table
>>   iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
>>   iommu/smmuv3: Implement cache_invalidate
>>   dma-iommu: Implement NESTED_MSI cookie
>>   iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
>>   iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI
>>     regions
>>   iommu/smmuv3: Implement bind/unbind_guest_msi
>>   iommu/smmuv3: Report non recoverable faults
>>   iommu/smmuv3: Accept configs with more than one context descriptor
>>   iommu/smmuv3: Add PASID cache invalidation per PASID
>>
>> Jean-Philippe Brucker (1):
>>   iommu/arm-smmu-v3: Maintain a SID->device structure
>>
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 659
>> ++++++++++++++++++--
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 103 ++-
>>  drivers/iommu/dma-iommu.c                   | 142 ++++-
>>  drivers/iommu/iommu.c                       | 105 ++++
>>  include/linux/dma-iommu.h                   |  16 +
>>  include/linux/iommu.h                       |  41 ++
>>  include/uapi/linux/iommu.h                  |  54 ++
>>  7 files changed, 1042 insertions(+), 78 deletions(-)
>>
>> --
>> 2.21.3
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2020-12-04 10:23             ` Auger Eric
@ 2021-01-14 16:58               ` Auger Eric
  2021-01-14 17:09                 ` Shameerali Kolothum Thodi
  2021-01-14 17:33                 ` Jean-Philippe Brucker
  0 siblings, 2 replies; 57+ messages in thread
From: Auger Eric @ 2021-01-14 16:58 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, Jean-Philippe Brucker
  Cc: Xieyingtai, alex.williamson, wangxingang, kvm, maz, linux-kernel,
	vivek.gautam, iommu, qubingbing, Zengtao (B),
	zhangfei.gao, eric.auger.pro, will, kvmarm, robin.murphy

Hi Shameer, Jean-Philippe,

On 12/4/20 11:23 AM, Auger Eric wrote:
> Hi Shameer, Jean-Philippe,
> 
> On 12/4/20 11:20 AM, Shameerali Kolothum Thodi wrote:
>> Hi Jean,
>>
>>> -----Original Message-----
>>> From: Jean-Philippe Brucker [mailto:jean-philippe@linaro.org]
>>> Sent: 04 December 2020 09:54
>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>>> Cc: Auger Eric <eric.auger@redhat.com>; wangxingang
>>> <wangxingang5@huawei.com>; Xieyingtai <xieyingtai@huawei.com>;
>>> kvm@vger.kernel.org; maz@kernel.org; joro@8bytes.org; will@kernel.org;
>>> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>>> vivek.gautam@arm.com; alex.williamson@redhat.com;
>>> zhangfei.gao@linaro.org; robin.murphy@arm.com;
>>> kvmarm@lists.cs.columbia.edu; eric.auger.pro@gmail.com; Zengtao (B)
>>> <prime.zeng@hisilicon.com>; qubingbing <qubingbing@hisilicon.com>
>>> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with
>>> unmanaged ASIDs
>>>
>>> Hi Shameer,
>>>
>>> On Thu, Dec 03, 2020 at 06:42:57PM +0000, Shameerali Kolothum Thodi wrote:
>>>> Hi Jean/zhangfei,
>>>> Is it possible to have a branch with minimum required SVA/UACCE related
>>> patches
>>>> that are already public and can be a "stable" candidate for future respin of
>>> Eric's series?
>>>> Please share your thoughts.
>>>
>>> By "stable" you mean a fixed branch with the latest SVA/UACCE patches
>>> based on mainline? 
>>
>> Yes. 
>>
>>  The uacce-devel branches from
>>> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
>>> (they track the latest sva/zip-devel branch
>>> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)
As I plan to respin shortly, please could you confirm the best branch to
rebase on still is that one (uacce-devel from the linux-kernel-uadk git
repo). Is it up to date? Commits seem to be quite old there.

Thanks

Eric
>>
>> Thanks. 
>>
>> Hi Eric,
>>
>> Could you please take a look at the above branches and see whether it make sense
>> to rebase on top of either of those?
>>
>> From vSVA point of view, it will be less rebase hassle if we can do that.
> 
> Sure. I will rebase on top of this ;-)
> 
> Thanks
> 
> Eric
>>
>> Thanks,
>> Shameer
>>
>>> Thanks,
>>> Jean
>>
> 
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2021-01-14 16:58               ` Auger Eric
@ 2021-01-14 17:09                 ` Shameerali Kolothum Thodi
  2021-01-14 17:33                 ` Jean-Philippe Brucker
  1 sibling, 0 replies; 57+ messages in thread
From: Shameerali Kolothum Thodi @ 2021-01-14 17:09 UTC (permalink / raw)
  To: Auger Eric, Jean-Philippe Brucker
  Cc: Xieyingtai, alex.williamson, wangxingang, kvm, maz, linux-kernel,
	vivek.gautam, iommu, qubingbing, Zengtao (B),
	zhangfei.gao, eric.auger.pro, will, kvmarm, robin.murphy

Hi Eric,

> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: 14 January 2021 16:58
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> Jean-Philippe Brucker <jean-philippe@linaro.org>
> Cc: Xieyingtai <xieyingtai@huawei.com>; alex.williamson@redhat.com;
> wangxingang <wangxingang5@huawei.com>; kvm@vger.kernel.org;
> maz@kernel.org; linux-kernel@vger.kernel.org; vivek.gautam@arm.com;
> iommu@lists.linux-foundation.org; qubingbing <qubingbing@hisilicon.com>;
> Zengtao (B) <prime.zeng@hisilicon.com>; zhangfei.gao@linaro.org;
> eric.auger.pro@gmail.com; will@kernel.org; kvmarm@lists.cs.columbia.edu;
> robin.murphy@arm.com
> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with
> unmanaged ASIDs
> 
> Hi Shameer, Jean-Philippe,
> 
> On 12/4/20 11:23 AM, Auger Eric wrote:
> > Hi Shameer, Jean-Philippe,
> >
> > On 12/4/20 11:20 AM, Shameerali Kolothum Thodi wrote:
> >> Hi Jean,
> >>
> >>> -----Original Message-----
> >>> From: Jean-Philippe Brucker [mailto:jean-philippe@linaro.org]
> >>> Sent: 04 December 2020 09:54
> >>> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>
> >>> Cc: Auger Eric <eric.auger@redhat.com>; wangxingang
> >>> <wangxingang5@huawei.com>; Xieyingtai <xieyingtai@huawei.com>;
> >>> kvm@vger.kernel.org; maz@kernel.org; joro@8bytes.org;
> will@kernel.org;
> >>> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> >>> vivek.gautam@arm.com; alex.williamson@redhat.com;
> >>> zhangfei.gao@linaro.org; robin.murphy@arm.com;
> >>> kvmarm@lists.cs.columbia.edu; eric.auger.pro@gmail.com; Zengtao (B)
> >>> <prime.zeng@hisilicon.com>; qubingbing <qubingbing@hisilicon.com>
> >>> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation
> with
> >>> unmanaged ASIDs
> >>>
> >>> Hi Shameer,
> >>>
> >>> On Thu, Dec 03, 2020 at 06:42:57PM +0000, Shameerali Kolothum Thodi
> wrote:
> >>>> Hi Jean/zhangfei,
> >>>> Is it possible to have a branch with minimum required SVA/UACCE related
> >>> patches
> >>>> that are already public and can be a "stable" candidate for future respin
> of
> >>> Eric's series?
> >>>> Please share your thoughts.
> >>>
> >>> By "stable" you mean a fixed branch with the latest SVA/UACCE patches
> >>> based on mainline?
> >>
> >> Yes.
> >>
> >>  The uacce-devel branches from
> >>> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
> >>> (they track the latest sva/zip-devel branch
> >>> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)
> As I plan to respin shortly, please could you confirm the best branch to
> rebase on still is that one (uacce-devel from the linux-kernel-uadk git
> repo). Is it up to date? Commits seem to be quite old there.

I think it is the uacce-devel-5.11 branch, but will wait for Jean or Zhangfei
to confirm.

Thanks,
Shameer

> Thanks
> 
> Eric
> >>
> >> Thanks.
> >>
> >> Hi Eric,
> >>
> >> Could you please take a look at the above branches and see whether it make
> sense
> >> to rebase on top of either of those?
> >>
> >> From vSVA point of view, it will be less rebase hassle if we can do that.
> >
> > Sure. I will rebase on top of this ;-)
> >
> > Thanks
> >
> > Eric
> >>
> >> Thanks,
> >> Shameer
> >>
> >>> Thanks,
> >>> Jean
> >>
> >
> > _______________________________________________
> > iommu mailing list
> > iommu@lists.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/iommu
> >


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2021-01-14 16:58               ` Auger Eric
  2021-01-14 17:09                 ` Shameerali Kolothum Thodi
@ 2021-01-14 17:33                 ` Jean-Philippe Brucker
  2021-01-14 18:00                   ` Auger Eric
  1 sibling, 1 reply; 57+ messages in thread
From: Jean-Philippe Brucker @ 2021-01-14 17:33 UTC (permalink / raw)
  To: Auger Eric
  Cc: Shameerali Kolothum Thodi, Xieyingtai, alex.williamson,
	wangxingang, kvm, maz, linux-kernel, vivek.gautam, iommu,
	qubingbing, Zengtao (B),
	zhangfei.gao, eric.auger.pro, will, kvmarm, robin.murphy

Hi Eric,

On Thu, Jan 14, 2021 at 05:58:27PM +0100, Auger Eric wrote:
> >>  The uacce-devel branches from
> >>> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
> >>> (they track the latest sva/zip-devel branch
> >>> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)
> As I plan to respin shortly, please could you confirm the best branch to
> rebase on still is that one (uacce-devel from the linux-kernel-uadk git
> repo). Is it up to date? Commits seem to be quite old there.

Right I meant the uacce-devel-X branches. The uacce-devel-5.11 branch
currently has the latest patches

Thanks,
Jean

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2021-01-14 17:33                 ` Jean-Philippe Brucker
@ 2021-01-14 18:00                   ` Auger Eric
  0 siblings, 0 replies; 57+ messages in thread
From: Auger Eric @ 2021-01-14 18:00 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Shameerali Kolothum Thodi, Xieyingtai, alex.williamson,
	wangxingang, kvm, maz, linux-kernel, vivek.gautam, iommu,
	qubingbing, Zengtao (B),
	zhangfei.gao, eric.auger.pro, will, kvmarm, robin.murphy

Hi Jean,

On 1/14/21 6:33 PM, Jean-Philippe Brucker wrote:
> Hi Eric,
> 
> On Thu, Jan 14, 2021 at 05:58:27PM +0100, Auger Eric wrote:
>>>>  The uacce-devel branches from
>>>>> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
>>>>> (they track the latest sva/zip-devel branch
>>>>> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)
>> As I plan to respin shortly, please could you confirm the best branch to
>> rebase on still is that one (uacce-devel from the linux-kernel-uadk git
>> repo). Is it up to date? Commits seem to be quite old there.
> 
> Right I meant the uacce-devel-X branches. The uacce-devel-5.11 branch
> currently has the latest patches

OK thanks!

Eric
> 
> Thanks,
> Jean
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 01/15] iommu: Introduce attach/detach_pasid_table API
  2020-11-18 11:21 ` [PATCH v13 01/15] iommu: Introduce attach/detach_pasid_table API Eric Auger
  2020-11-18 16:19   ` Jacob Pan
@ 2021-02-01 11:27   ` Keqian Zhu
  2021-02-01 17:18     ` Auger Eric
  1 sibling, 1 reply; 57+ messages in thread
From: Keqian Zhu @ 2021-02-01 11:27 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, jacob.jun.pan, nicoleotsuka, vivek.gautam,
	yi.l.liu, zhangfei.gao

Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> In virtualization use case, when a guest is assigned
> a PCI host device, protected by a virtual IOMMU on the guest,
> the physical IOMMU must be programmed to be consistent with
> the guest mappings. If the physical IOMMU supports two
> translation stages it makes sense to program guest mappings
> onto the first stage/level (ARM/Intel terminology) while the host
> owns the stage/level 2.
> 
> In that case, it is mandated to trap on guest configuration
> settings and pass those to the physical iommu driver.
> 
> This patch adds a new API to the iommu subsystem that allows
> to set/unset the pasid table information.
> 
> A generic iommu_pasid_table_config struct is introduced in
> a new iommu.h uapi header. This is going to be used by the VFIO
> user API.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> 
> v12 -> v13:
> - Fix config check
> 
> v11 -> v12:
> - add argsz, name the union
> ---
>  drivers/iommu/iommu.c      | 68 ++++++++++++++++++++++++++++++++++++++
>  include/linux/iommu.h      | 21 ++++++++++++
>  include/uapi/linux/iommu.h | 54 ++++++++++++++++++++++++++++++
>  3 files changed, 143 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index b53446bb8c6b..978fe34378fb 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2171,6 +2171,74 @@ int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev
>  }
>  EXPORT_SYMBOL_GPL(iommu_uapi_sva_unbind_gpasid);
>  
> +int iommu_attach_pasid_table(struct iommu_domain *domain,
> +			     struct iommu_pasid_table_config *cfg)
> +{
> +	if (unlikely(!domain->ops->attach_pasid_table))
> +		return -ENODEV;
> +
> +	return domain->ops->attach_pasid_table(domain, cfg);
> +}
miss export symbol?

> +
> +int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
> +				  void __user *uinfo)
> +{
> +	struct iommu_pasid_table_config pasid_table_data = { 0 };
> +	u32 minsz;
> +
> +	if (unlikely(!domain->ops->attach_pasid_table))
> +		return -ENODEV;
> +
> +	/*
> +	 * No new spaces can be added before the variable sized union, the
> +	 * minimum size is the offset to the union.
> +	 */
> +	minsz = offsetof(struct iommu_pasid_table_config, vendor_data);
> +
> +	/* Copy minsz from user to get flags and argsz */
> +	if (copy_from_user(&pasid_table_data, uinfo, minsz))
> +		return -EFAULT;
> +
> +	/* Fields before the variable size union are mandatory */
> +	if (pasid_table_data.argsz < minsz)
> +		return -EINVAL;
> +
> +	/* PASID and address granu require additional info beyond minsz */
> +	if (pasid_table_data.version != PASID_TABLE_CFG_VERSION_1)
> +		return -EINVAL;
> +	if (pasid_table_data.format == IOMMU_PASID_FORMAT_SMMUV3 &&
> +	    pasid_table_data.argsz <
> +		offsetofend(struct iommu_pasid_table_config, vendor_data.smmuv3))
> +		return -EINVAL;
> +
> +	/*
> +	 * User might be using a newer UAPI header which has a larger data
> +	 * size, we shall support the existing flags within the current
> +	 * size. Copy the remaining user data _after_ minsz but not more
> +	 * than the current kernel supported size.
> +	 */
> +	if (copy_from_user((void *)&pasid_table_data + minsz, uinfo + minsz,
> +			   min_t(u32, pasid_table_data.argsz, sizeof(pasid_table_data)) - minsz))
> +		return -EFAULT;
> +
> +	/* Now the argsz is validated, check the content */
> +	if (pasid_table_data.config < IOMMU_PASID_CONFIG_TRANSLATE ||
> +	    pasid_table_data.config > IOMMU_PASID_CONFIG_ABORT)
> +		return -EINVAL;
> +
> +	return domain->ops->attach_pasid_table(domain, &pasid_table_data);
> +}
> +EXPORT_SYMBOL_GPL(iommu_uapi_attach_pasid_table);
> +
> +void iommu_detach_pasid_table(struct iommu_domain *domain)
> +{
> +	if (unlikely(!domain->ops->detach_pasid_table))
> +		return;
> +
> +	domain->ops->detach_pasid_table(domain);
> +}
> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
>  				  struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index b95a6f8db6ff..464fcbecf841 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -223,6 +223,8 @@ struct iommu_iotlb_gather {
>   * @cache_invalidate: invalidate translation caches
>   * @sva_bind_gpasid: bind guest pasid and mm
>   * @sva_unbind_gpasid: unbind guest pasid and mm
> + * @attach_pasid_table: attach a pasid table
> + * @detach_pasid_table: detach the pasid table
>   * @def_domain_type: device default domain type, return value:
>   *		- IOMMU_DOMAIN_IDENTITY: must use an identity domain
>   *		- IOMMU_DOMAIN_DMA: must use a dma domain
> @@ -287,6 +289,9 @@ struct iommu_ops {
>  				      void *drvdata);
>  	void (*sva_unbind)(struct iommu_sva *handle);
>  	u32 (*sva_get_pasid)(struct iommu_sva *handle);
> +	int (*attach_pasid_table)(struct iommu_domain *domain,
> +				  struct iommu_pasid_table_config *cfg);
> +	void (*detach_pasid_table)(struct iommu_domain *domain);
>  
>  	int (*page_response)(struct device *dev,
>  			     struct iommu_fault_event *evt,
> @@ -434,6 +439,11 @@ extern int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain,
>  					struct device *dev, void __user *udata);
>  extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
>  				   struct device *dev, ioasid_t pasid);
> +extern int iommu_attach_pasid_table(struct iommu_domain *domain,
> +				    struct iommu_pasid_table_config *cfg);
> +extern int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
> +					 void __user *udata);
> +extern void iommu_detach_pasid_table(struct iommu_domain *domain);
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> @@ -639,6 +649,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
>  void iommu_sva_unbind_device(struct iommu_sva *handle);
>  u32 iommu_sva_get_pasid(struct iommu_sva *handle);
>  
> +
extra blank line.

>  #else /* CONFIG_IOMMU_API */
>  
>  struct iommu_ops {};
> @@ -1020,6 +1031,16 @@ iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
>  	return -ENODEV;
>  }
>  
> +static inline
> +int iommu_attach_pasid_table(struct iommu_domain *domain,
> +			     struct iommu_pasid_table_config *cfg)
> +{
> +	return -ENODEV;
> +}

miss dummy iommu_uapi_attach_pasid_table?

> +
> +static inline
> +void iommu_detach_pasid_table(struct iommu_domain *domain) {}
> +
>  static inline struct iommu_sva *
>  iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
>  {
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index e1d9e75f2c94..082d758dd016 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -338,4 +338,58 @@ struct iommu_gpasid_bind_data {
>  	} vendor;
>  };
>  
> +/**
> + * struct iommu_pasid_smmuv3 - ARM SMMUv3 Stream Table Entry stage 1 related
> + *     information
> + * @version: API version of this structure
> + * @s1fmt: STE s1fmt (format of the CD table: single CD, linear table
> + *         or 2-level table)
> + * @s1dss: STE s1dss (specifies the behavior when @pasid_bits != 0
> + *         and no PASID is passed along with the incoming transaction)
> + * @padding: reserved for future use (should be zero)
> + *
> + * The PASID table is referred to as the Context Descriptor (CD) table on ARM
> + * SMMUv3. Please refer to the ARM SMMU 3.x spec (ARM IHI 0070A) for full
> + * details.
> + */
> +struct iommu_pasid_smmuv3 {
> +#define PASID_TABLE_SMMUV3_CFG_VERSION_1 1
> +	__u32	version;
> +	__u8	s1fmt;
> +	__u8	s1dss;
> +	__u8	padding[2];
> +};
> +
> +/**
> + * struct iommu_pasid_table_config - PASID table data used to bind guest PASID
> + *     table to the host IOMMU
> + * @argsz: User filled size of this data
> + * @version: API version to prepare for future extensions
> + * @format: format of the PASID table
> + * @base_ptr: guest physical address of the PASID table
> + * @pasid_bits: number of PASID bits used in the PASID table
> + * @config: indicates whether the guest translation stage must
> + *          be translated, bypassed or aborted.
> + * @padding: reserved for future use (should be zero)
> + * @vendor_data.smmuv3: table information when @format is
> + * %IOMMU_PASID_FORMAT_SMMUV3
> + */
> +struct iommu_pasid_table_config {
> +	__u32	argsz;
> +#define PASID_TABLE_CFG_VERSION_1 1
> +	__u32	version;
> +#define IOMMU_PASID_FORMAT_SMMUV3	1
> +	__u32	format;
> +	__u64	base_ptr;
put @base_ptr between @version and @format can save some memory.

> +	__u8	pasid_bits;
> +#define IOMMU_PASID_CONFIG_TRANSLATE	1
> +#define IOMMU_PASID_CONFIG_BYPASS	2
> +#define IOMMU_PASID_CONFIG_ABORT	3
> +	__u8	config;
> +	__u8    padding[2];
> +	union {
> +		struct iommu_pasid_smmuv3 smmuv3;
> +	} vendor_data;
> +};
> +
>  #endif /* _UAPI_IOMMU_H */
> 

Thanks,
Keqian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi
  2020-11-18 11:21 ` [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi Eric Auger
@ 2021-02-01 11:52   ` Keqian Zhu
  2021-02-12  8:55     ` Auger Eric
  0 siblings, 1 reply; 57+ messages in thread
From: Keqian Zhu @ 2021-02-01 11:52 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, jacob.jun.pan, nicoleotsuka, vivek.gautam,
	yi.l.liu, zhangfei.gao

Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> On ARM, MSI are translated by the SMMU. An IOVA is allocated
> for each MSI doorbell. If both the host and the guest are exposed
> with SMMUs, we end up with 2 different IOVAs allocated by each.
> guest allocates an IOVA (gIOVA) to map onto the guest MSI
> doorbell (gDB). The Host allocates another IOVA (hIOVA) to map
> onto the physical doorbell (hDB).
> 
> So we end up with 2 untied mappings:
>          S1            S2
> gIOVA    ->    gDB
>               hIOVA    ->    hDB
> 
> Currently the PCI device is programmed by the host with hIOVA
> as MSI doorbell. So this does not work.
> 
> This patch introduces an API to pass gIOVA/gDB to the host so
> that gIOVA can be reused by the host instead of re-allocating
> a new IOVA. So the goal is to create the following nested mapping:
Does the gDB can be reused under non-nested mode?

> 
>          S1            S2
> gIOVA    ->    gDB     ->    hDB
> 
> and program the PCI device with gIOVA MSI doorbell.
> 
> In case we have several devices attached to this nested domain
> (devices belonging to the same group), they cannot be isolated
> on guest side either. So they should also end up in the same domain
> on guest side. We will enforce that all the devices attached to
> the host iommu domain use the same physical doorbell and similarly
> a single virtual doorbell mapping gets registered (1 single
> virtual doorbell is used on guest as well).
> 
[...]

> + *
> + * The associated IOVA can be reused by the host to create a nested
> + * stage2 binding mapping translating into the physical doorbell used
> + * by the devices attached to the domain.
> + *
> + * All devices within the domain must share the same physical doorbell.
> + * A single MSI GIOVA/GPA mapping can be attached to an iommu_domain.
> + */
> +
> +int iommu_bind_guest_msi(struct iommu_domain *domain,
> +			 dma_addr_t giova, phys_addr_t gpa, size_t size)
> +{
> +	if (unlikely(!domain->ops->bind_guest_msi))
> +		return -ENODEV;
> +
> +	return domain->ops->bind_guest_msi(domain, giova, gpa, size);
> +}
> +EXPORT_SYMBOL_GPL(iommu_bind_guest_msi);
> +
> +void iommu_unbind_guest_msi(struct iommu_domain *domain,
> +			    dma_addr_t iova)
nit: s/iova/giova

> +{
> +	if (unlikely(!domain->ops->unbind_guest_msi))
> +		return;
> +
> +	domain->ops->unbind_guest_msi(domain, iova);
> +}
> +EXPORT_SYMBOL_GPL(iommu_unbind_guest_msi);
> +
[...]

Thanks,
Keqian


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure
  2020-11-18 11:21 ` [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure Eric Auger
@ 2021-02-01 12:26   ` Keqian Zhu
  2021-02-01 15:15     ` Jean-Philippe Brucker
  2021-02-01 17:19     ` Auger Eric
  0 siblings, 2 replies; 57+ messages in thread
From: Keqian Zhu @ 2021-02-01 12:26 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, jacob.jun.pan, nicoleotsuka, vivek.gautam,
	yi.l.liu, zhangfei.gao

Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> 
> When handling faults from the event or PRI queue, we need to find the
> struct device associated to a SID. Add a rb_tree to keep track of SIDs.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
[...]

>  }
>  
> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
> +				  struct arm_smmu_master *master)
> +{
> +	int i;
> +	int ret = 0;
> +	struct arm_smmu_stream *new_stream, *cur_stream;
> +	struct rb_node **new_node, *parent_node = NULL;
> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
> +
> +	master->streams = kcalloc(fwspec->num_ids,
> +				  sizeof(struct arm_smmu_stream), GFP_KERNEL);
> +	if (!master->streams)
> +		return -ENOMEM;
> +	master->num_streams = fwspec->num_ids;
This is not roll-backed when fail.

> +
> +	mutex_lock(&smmu->streams_mutex);
> +	for (i = 0; i < fwspec->num_ids && !ret; i++) {
Check ret at here, makes it hard to decide the start index of rollback.

If we fail at here, then start index is (i-2).
If we fail in the loop, then start index is (i-1).

> +		u32 sid = fwspec->ids[i];
> +
> +		new_stream = &master->streams[i];
> +		new_stream->id = sid;
> +		new_stream->master = master;
> +
> +		/*
> +		 * Check the SIDs are in range of the SMMU and our stream table
> +		 */
> +		if (!arm_smmu_sid_in_range(smmu, sid)) {
> +			ret = -ERANGE;
> +			break;
> +		}
> +
> +		/* Ensure l2 strtab is initialised */
> +		if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
> +			ret = arm_smmu_init_l2_strtab(smmu, sid);
> +			if (ret)
> +				break;
> +		}
> +
> +		/* Insert into SID tree */
> +		new_node = &(smmu->streams.rb_node);
> +		while (*new_node) {
> +			cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
> +					      node);
> +			parent_node = *new_node;
> +			if (cur_stream->id > new_stream->id) {
> +				new_node = &((*new_node)->rb_left);
> +			} else if (cur_stream->id < new_stream->id) {
> +				new_node = &((*new_node)->rb_right);
> +			} else {
> +				dev_warn(master->dev,
> +					 "stream %u already in tree\n",
> +					 cur_stream->id);
> +				ret = -EINVAL;
> +				break;
> +			}
> +		}
> +
> +		if (!ret) {
> +			rb_link_node(&new_stream->node, parent_node, new_node);
> +			rb_insert_color(&new_stream->node, &smmu->streams);
> +		}
> +	}
> +
> +	if (ret) {
> +		for (; i > 0; i--)
should be (i >= 0)?
And the start index seems not correct.

> +			rb_erase(&master->streams[i].node, &smmu->streams);
> +		kfree(master->streams);
> +	}
> +	mutex_unlock(&smmu->streams_mutex);
> +
> +	return ret;
> +}
> +
> +static void arm_smmu_remove_master(struct arm_smmu_master *master)
> +{
> +	int i;
> +	struct arm_smmu_device *smmu = master->smmu;
> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
> +
> +	if (!smmu || !master->streams)
> +		return;
> +
> +	mutex_lock(&smmu->streams_mutex);
> +	for (i = 0; i < fwspec->num_ids; i++)
> +		rb_erase(&master->streams[i].node, &smmu->streams);
> +	mutex_unlock(&smmu->streams_mutex);
> +
> +	kfree(master->streams);
> +}
> +
>  static struct iommu_ops arm_smmu_ops;
>  
>  static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>  {
> -	int i, ret;
> +	int ret;
>  	struct arm_smmu_device *smmu;
>  	struct arm_smmu_master *master;
>  	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> @@ -2331,27 +2447,12 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>  
>  	master->dev = dev;
>  	master->smmu = smmu;
> -	master->sids = fwspec->ids;
> -	master->num_sids = fwspec->num_ids;
>  	INIT_LIST_HEAD(&master->bonds);
>  	dev_iommu_priv_set(dev, master);
>  
> -	/* Check the SIDs are in range of the SMMU and our stream table */
> -	for (i = 0; i < master->num_sids; i++) {
> -		u32 sid = master->sids[i];
> -
> -		if (!arm_smmu_sid_in_range(smmu, sid)) {
> -			ret = -ERANGE;
> -			goto err_free_master;
> -		}
> -
> -		/* Ensure l2 strtab is initialised */
> -		if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
> -			ret = arm_smmu_init_l2_strtab(smmu, sid);
> -			if (ret)
> -				goto err_free_master;
> -		}
> -	}
> +	ret = arm_smmu_insert_master(smmu, master);
> +	if (ret)
> +		goto err_free_master;
>  
>  	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
>  
> @@ -2389,6 +2490,7 @@ static void arm_smmu_release_device(struct device *dev)
>  	WARN_ON(arm_smmu_master_sva_enabled(master));
>  	arm_smmu_detach_dev(master);
>  	arm_smmu_disable_pasid(master);
> +	arm_smmu_remove_master(master);
>  	kfree(master);

Thanks,
Keqian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 04/15] iommu/smmuv3: Allow s1 and s2 configs to coexist
  2020-11-18 11:21 ` [PATCH v13 04/15] iommu/smmuv3: Allow s1 and s2 configs to coexist Eric Auger
@ 2021-02-01 12:35   ` Keqian Zhu
  0 siblings, 0 replies; 57+ messages in thread
From: Keqian Zhu @ 2021-02-01 12:35 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, jacob.jun.pan, nicoleotsuka, vivek.gautam,
	yi.l.liu, zhangfei.gao

Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> In true nested mode, both s1_cfg and s2_cfg will coexist.
> Let's remove the union and add a "set" field in each
> config structure telling whether the config is set and needs
> to be applied when writing the STE. In legacy nested mode,
> only the 2d stage is used. In true nested mode, the "set" field
nit: s/2d/2nd

> will be set when the guest passes the pasid table.
nit: ... the "set" filed of s1_cfg and s2_cfg will be set ...

> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> v12 -> v13:
> - does not dynamically allocate s1-cfg and s2_cfg anymore. Add
>   the set field
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 43 +++++++++++++--------
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  8 ++--
>  2 files changed, 31 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 1e4acc7f3d3c..18ac5af1b284 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1195,8 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  	u64 val = le64_to_cpu(dst[0]);
>  	bool ste_live = false;
>  	struct arm_smmu_device *smmu = NULL;
> -	struct arm_smmu_s1_cfg *s1_cfg = NULL;
> -	struct arm_smmu_s2_cfg *s2_cfg = NULL;
> +	struct arm_smmu_s1_cfg *s1_cfg;
> +	struct arm_smmu_s2_cfg *s2_cfg;
>  	struct arm_smmu_domain *smmu_domain = NULL;
>  	struct arm_smmu_cmdq_ent prefetch_cmd = {
>  		.opcode		= CMDQ_OP_PREFETCH_CFG,
> @@ -1211,13 +1211,24 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  	}
>  
>  	if (smmu_domain) {
> +		s1_cfg = &smmu_domain->s1_cfg;
> +		s2_cfg = &smmu_domain->s2_cfg;
> +
>  		switch (smmu_domain->stage) {
>  		case ARM_SMMU_DOMAIN_S1:
> -			s1_cfg = &smmu_domain->s1_cfg;
> +			s1_cfg->set = true;
> +			s2_cfg->set = false;
>  			break;
>  		case ARM_SMMU_DOMAIN_S2:
> +			s1_cfg->set = false;
> +			s2_cfg->set = true;
> +			break;
>  		case ARM_SMMU_DOMAIN_NESTED:
> -			s2_cfg = &smmu_domain->s2_cfg;
> +			/*
> +			 * Actual usage of stage 1 depends on nested mode:
> +			 * legacy (2d stage only) or true nested mode
> +			 */
> +			s2_cfg->set = true;
>  			break;
>  		default:
>  			break;
> @@ -1244,7 +1255,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  	val = STRTAB_STE_0_V;
>  
>  	/* Bypass/fault */
> -	if (!smmu_domain || !(s1_cfg || s2_cfg)) {
> +	if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
>  		if (!smmu_domain && disable_bypass)
>  			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
>  		else
> @@ -1263,7 +1274,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  		return;
>  	}
>  
> -	if (s1_cfg) {
> +	if (s1_cfg->set) {
>  		BUG_ON(ste_live);
>  		dst[1] = cpu_to_le64(
>  			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
> @@ -1282,7 +1293,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  			FIELD_PREP(STRTAB_STE_0_S1FMT, s1_cfg->s1fmt);
>  	}
>  
> -	if (s2_cfg) {
> +	if (s2_cfg->set) {
>  		BUG_ON(ste_live);
>  		dst[2] = cpu_to_le64(
>  			 FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
> @@ -1846,24 +1857,24 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
>  {
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	struct arm_smmu_s1_cfg *s1_cfg = &smmu_domain->s1_cfg;
> +	struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
>  
>  	iommu_put_dma_cookie(domain);
>  	free_io_pgtable_ops(smmu_domain->pgtbl_ops);
>  
>  	/* Free the CD and ASID, if we allocated them */
> -	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> -		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> -
> +	if (s1_cfg->set) {
>  		/* Prevent SVA from touching the CD while we're freeing it */
>  		mutex_lock(&arm_smmu_asid_lock);
> -		if (cfg->cdcfg.cdtab)
> +		if (s1_cfg->cdcfg.cdtab)
>  			arm_smmu_free_cd_tables(smmu_domain);
> -		arm_smmu_free_asid(&cfg->cd);
> +		arm_smmu_free_asid(&s1_cfg->cd);
>  		mutex_unlock(&arm_smmu_asid_lock);
> -	} else {
> -		struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
> -		if (cfg->vmid)
> -			arm_smmu_bitmap_free(smmu->vmid_map, cfg->vmid);
> +	}
> +	if (s2_cfg->set) {
> +		if (s2_cfg->vmid)
> +			arm_smmu_bitmap_free(smmu->vmid_map, s2_cfg->vmid);
>  	}
>  
>  	kfree(smmu_domain);
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 19196eea7c1d..07f59252dd21 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -562,12 +562,14 @@ struct arm_smmu_s1_cfg {
>  	struct arm_smmu_ctx_desc	cd;
>  	u8				s1fmt;
>  	u8				s1cdmax;
> +	bool				set;
>  };
>  
>  struct arm_smmu_s2_cfg {
>  	u16				vmid;
>  	u64				vttbr;
>  	u64				vtcr;
> +	bool				set;
>  };
>  
>  struct arm_smmu_strtab_cfg {
> @@ -678,10 +680,8 @@ struct arm_smmu_domain {
>  	atomic_t			nr_ats_masters;
>  
>  	enum arm_smmu_domain_stage	stage;
> -	union {
> -		struct arm_smmu_s1_cfg	s1_cfg;
> -		struct arm_smmu_s2_cfg	s2_cfg;
> -	};
> +	struct arm_smmu_s1_cfg	s1_cfg;
> +	struct arm_smmu_s2_cfg	s2_cfg;
>  
>  	struct iommu_domain		domain;
>  
Other looks good to me. ;-)
> 

Thanks,
Keqian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure
  2021-02-01 12:26   ` Keqian Zhu
@ 2021-02-01 15:15     ` Jean-Philippe Brucker
  2021-02-02  6:39       ` Keqian Zhu
  2021-02-01 17:19     ` Auger Eric
  1 sibling, 1 reply; 57+ messages in thread
From: Jean-Philippe Brucker @ 2021-02-01 15:15 UTC (permalink / raw)
  To: Keqian Zhu
  Cc: Eric Auger, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson, jacob.jun.pan,
	nicoleotsuka, vivek.gautam, yi.l.liu, zhangfei.gao

On Mon, Feb 01, 2021 at 08:26:41PM +0800, Keqian Zhu wrote:
> > +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
> > +				  struct arm_smmu_master *master)
> > +{
> > +	int i;
> > +	int ret = 0;
> > +	struct arm_smmu_stream *new_stream, *cur_stream;
> > +	struct rb_node **new_node, *parent_node = NULL;
> > +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
> > +
> > +	master->streams = kcalloc(fwspec->num_ids,
> > +				  sizeof(struct arm_smmu_stream), GFP_KERNEL);
> > +	if (!master->streams)
> > +		return -ENOMEM;
> > +	master->num_streams = fwspec->num_ids;
> This is not roll-backed when fail.

No need, the caller frees master

> > +
> > +	mutex_lock(&smmu->streams_mutex);
> > +	for (i = 0; i < fwspec->num_ids && !ret; i++) {
> Check ret at here, makes it hard to decide the start index of rollback.
> 
> If we fail at here, then start index is (i-2).
> If we fail in the loop, then start index is (i-1).
> 
[...]
> > +	if (ret) {
> > +		for (; i > 0; i--)
> should be (i >= 0)?
> And the start index seems not correct.

Indeed, this whole bit is wrong. I'll fix it while resending the IOPF
series.

Thanks,
Jean


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 01/15] iommu: Introduce attach/detach_pasid_table API
  2021-02-01 11:27   ` Keqian Zhu
@ 2021-02-01 17:18     ` Auger Eric
  0 siblings, 0 replies; 57+ messages in thread
From: Auger Eric @ 2021-02-01 17:18 UTC (permalink / raw)
  To: Keqian Zhu, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, jacob.jun.pan, nicoleotsuka, vivek.gautam,
	yi.l.liu, zhangfei.gao

Hi Keqian,

On 2/1/21 12:27 PM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> In virtualization use case, when a guest is assigned
>> a PCI host device, protected by a virtual IOMMU on the guest,
>> the physical IOMMU must be programmed to be consistent with
>> the guest mappings. If the physical IOMMU supports two
>> translation stages it makes sense to program guest mappings
>> onto the first stage/level (ARM/Intel terminology) while the host
>> owns the stage/level 2.
>>
>> In that case, it is mandated to trap on guest configuration
>> settings and pass those to the physical iommu driver.
>>
>> This patch adds a new API to the iommu subsystem that allows
>> to set/unset the pasid table information.
>>
>> A generic iommu_pasid_table_config struct is introduced in
>> a new iommu.h uapi header. This is going to be used by the VFIO
>> user API.
>>
>> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
>> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>>
>> v12 -> v13:
>> - Fix config check
>>
>> v11 -> v12:
>> - add argsz, name the union
>> ---
>>  drivers/iommu/iommu.c      | 68 ++++++++++++++++++++++++++++++++++++++
>>  include/linux/iommu.h      | 21 ++++++++++++
>>  include/uapi/linux/iommu.h | 54 ++++++++++++++++++++++++++++++
>>  3 files changed, 143 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index b53446bb8c6b..978fe34378fb 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2171,6 +2171,74 @@ int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev
>>  }
>>  EXPORT_SYMBOL_GPL(iommu_uapi_sva_unbind_gpasid);
>>  
>> +int iommu_attach_pasid_table(struct iommu_domain *domain,
>> +			     struct iommu_pasid_table_config *cfg)
>> +{
>> +	if (unlikely(!domain->ops->attach_pasid_table))
>> +		return -ENODEV;
>> +
>> +	return domain->ops->attach_pasid_table(domain, cfg);
>> +}
> miss export symbol?
yes we do
> 
>> +
>> +int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
>> +				  void __user *uinfo)
>> +{
>> +	struct iommu_pasid_table_config pasid_table_data = { 0 };
>> +	u32 minsz;
>> +
>> +	if (unlikely(!domain->ops->attach_pasid_table))
>> +		return -ENODEV;
>> +
>> +	/*
>> +	 * No new spaces can be added before the variable sized union, the
>> +	 * minimum size is the offset to the union.
>> +	 */
>> +	minsz = offsetof(struct iommu_pasid_table_config, vendor_data);
>> +
>> +	/* Copy minsz from user to get flags and argsz */
>> +	if (copy_from_user(&pasid_table_data, uinfo, minsz))
>> +		return -EFAULT;
>> +
>> +	/* Fields before the variable size union are mandatory */
>> +	if (pasid_table_data.argsz < minsz)
>> +		return -EINVAL;
>> +
>> +	/* PASID and address granu require additional info beyond minsz */
>> +	if (pasid_table_data.version != PASID_TABLE_CFG_VERSION_1)
>> +		return -EINVAL;
>> +	if (pasid_table_data.format == IOMMU_PASID_FORMAT_SMMUV3 &&
>> +	    pasid_table_data.argsz <
>> +		offsetofend(struct iommu_pasid_table_config, vendor_data.smmuv3))
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * User might be using a newer UAPI header which has a larger data
>> +	 * size, we shall support the existing flags within the current
>> +	 * size. Copy the remaining user data _after_ minsz but not more
>> +	 * than the current kernel supported size.
>> +	 */
>> +	if (copy_from_user((void *)&pasid_table_data + minsz, uinfo + minsz,
>> +			   min_t(u32, pasid_table_data.argsz, sizeof(pasid_table_data)) - minsz))
>> +		return -EFAULT;
>> +
>> +	/* Now the argsz is validated, check the content */
>> +	if (pasid_table_data.config < IOMMU_PASID_CONFIG_TRANSLATE ||
>> +	    pasid_table_data.config > IOMMU_PASID_CONFIG_ABORT)
>> +		return -EINVAL;
>> +
>> +	return domain->ops->attach_pasid_table(domain, &pasid_table_data);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_uapi_attach_pasid_table);
>> +
>> +void iommu_detach_pasid_table(struct iommu_domain *domain)
>> +{
>> +	if (unlikely(!domain->ops->detach_pasid_table))
>> +		return;
>> +
>> +	domain->ops->detach_pasid_table(domain);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
>> +
>>  static void __iommu_detach_device(struct iommu_domain *domain,
>>  				  struct device *dev)
>>  {
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index b95a6f8db6ff..464fcbecf841 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -223,6 +223,8 @@ struct iommu_iotlb_gather {
>>   * @cache_invalidate: invalidate translation caches
>>   * @sva_bind_gpasid: bind guest pasid and mm
>>   * @sva_unbind_gpasid: unbind guest pasid and mm
>> + * @attach_pasid_table: attach a pasid table
>> + * @detach_pasid_table: detach the pasid table
>>   * @def_domain_type: device default domain type, return value:
>>   *		- IOMMU_DOMAIN_IDENTITY: must use an identity domain
>>   *		- IOMMU_DOMAIN_DMA: must use a dma domain
>> @@ -287,6 +289,9 @@ struct iommu_ops {
>>  				      void *drvdata);
>>  	void (*sva_unbind)(struct iommu_sva *handle);
>>  	u32 (*sva_get_pasid)(struct iommu_sva *handle);
>> +	int (*attach_pasid_table)(struct iommu_domain *domain,
>> +				  struct iommu_pasid_table_config *cfg);
>> +	void (*detach_pasid_table)(struct iommu_domain *domain);
>>  
>>  	int (*page_response)(struct device *dev,
>>  			     struct iommu_fault_event *evt,
>> @@ -434,6 +439,11 @@ extern int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain,
>>  					struct device *dev, void __user *udata);
>>  extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
>>  				   struct device *dev, ioasid_t pasid);
>> +extern int iommu_attach_pasid_table(struct iommu_domain *domain,
>> +				    struct iommu_pasid_table_config *cfg);
>> +extern int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
>> +					 void __user *udata);
>> +extern void iommu_detach_pasid_table(struct iommu_domain *domain);
>>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>>  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
>> @@ -639,6 +649,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
>>  void iommu_sva_unbind_device(struct iommu_sva *handle);
>>  u32 iommu_sva_get_pasid(struct iommu_sva *handle);
>>  
>> +
> extra blank line.
yup
> 
>>  #else /* CONFIG_IOMMU_API */
>>  
>>  struct iommu_ops {};
>> @@ -1020,6 +1031,16 @@ iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
>>  	return -ENODEV;
>>  }
>>  
>> +static inline
>> +int iommu_attach_pasid_table(struct iommu_domain *domain,
>> +			     struct iommu_pasid_table_config *cfg)
>> +{
>> +	return -ENODEV;
>> +}
> 
> miss dummy iommu_uapi_attach_pasid_table?
yes we do
> 
>> +
>> +static inline
>> +void iommu_detach_pasid_table(struct iommu_domain *domain) {}
>> +
>>  static inline struct iommu_sva *
>>  iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
>>  {
>> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
>> index e1d9e75f2c94..082d758dd016 100644
>> --- a/include/uapi/linux/iommu.h
>> +++ b/include/uapi/linux/iommu.h
>> @@ -338,4 +338,58 @@ struct iommu_gpasid_bind_data {
>>  	} vendor;
>>  };
>>  
>> +/**
>> + * struct iommu_pasid_smmuv3 - ARM SMMUv3 Stream Table Entry stage 1 related
>> + *     information
>> + * @version: API version of this structure
>> + * @s1fmt: STE s1fmt (format of the CD table: single CD, linear table
>> + *         or 2-level table)
>> + * @s1dss: STE s1dss (specifies the behavior when @pasid_bits != 0
>> + *         and no PASID is passed along with the incoming transaction)
>> + * @padding: reserved for future use (should be zero)
>> + *
>> + * The PASID table is referred to as the Context Descriptor (CD) table on ARM
>> + * SMMUv3. Please refer to the ARM SMMU 3.x spec (ARM IHI 0070A) for full
>> + * details.
>> + */
>> +struct iommu_pasid_smmuv3 {
>> +#define PASID_TABLE_SMMUV3_CFG_VERSION_1 1
>> +	__u32	version;
>> +	__u8	s1fmt;
>> +	__u8	s1dss;
>> +	__u8	padding[2];
>> +};
>> +
>> +/**
>> + * struct iommu_pasid_table_config - PASID table data used to bind guest PASID
>> + *     table to the host IOMMU
>> + * @argsz: User filled size of this data
>> + * @version: API version to prepare for future extensions
>> + * @format: format of the PASID table
>> + * @base_ptr: guest physical address of the PASID table
>> + * @pasid_bits: number of PASID bits used in the PASID table
>> + * @config: indicates whether the guest translation stage must
>> + *          be translated, bypassed or aborted.
>> + * @padding: reserved for future use (should be zero)
>> + * @vendor_data.smmuv3: table information when @format is
>> + * %IOMMU_PASID_FORMAT_SMMUV3
>> + */
>> +struct iommu_pasid_table_config {
>> +	__u32	argsz;
>> +#define PASID_TABLE_CFG_VERSION_1 1
>> +	__u32	version;
>> +#define IOMMU_PASID_FORMAT_SMMUV3	1
>> +	__u32	format;
>> +	__u64	base_ptr;
> put @base_ptr between @version and @format can save some memory.
yes. This padding issue was also reported by Jacob. I will swap both
format and base_ptr.
> 
>> +	__u8	pasid_bits;
>> +#define IOMMU_PASID_CONFIG_TRANSLATE	1
>> +#define IOMMU_PASID_CONFIG_BYPASS	2
>> +#define IOMMU_PASID_CONFIG_ABORT	3
>> +	__u8	config;
>> +	__u8    padding[2];
>> +	union {
>> +		struct iommu_pasid_smmuv3 smmuv3;
>> +	} vendor_data;
>> +};
>> +
>>  #endif /* _UAPI_IOMMU_H */
>>
> 
> Thanks,
> Keqian
> 
Thanks!

Eric


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure
  2021-02-01 12:26   ` Keqian Zhu
  2021-02-01 15:15     ` Jean-Philippe Brucker
@ 2021-02-01 17:19     ` Auger Eric
  2021-02-02  7:20       ` Keqian Zhu
  1 sibling, 1 reply; 57+ messages in thread
From: Auger Eric @ 2021-02-01 17:19 UTC (permalink / raw)
  To: Keqian Zhu, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, jacob.jun.pan, nicoleotsuka, vivek.gautam,
	yi.l.liu, zhangfei.gao

Hi Keqian,

On 2/1/21 1:26 PM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
>>
>> When handling faults from the event or PRI queue, we need to find the
>> struct device associated to a SID. Add a rb_tree to keep track of SIDs.
>>
>> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> [...]
> 
>>  }
>>  
>> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>> +				  struct arm_smmu_master *master)
>> +{
>> +	int i;
>> +	int ret = 0;
>> +	struct arm_smmu_stream *new_stream, *cur_stream;
>> +	struct rb_node **new_node, *parent_node = NULL;
>> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
>> +
>> +	master->streams = kcalloc(fwspec->num_ids,
>> +				  sizeof(struct arm_smmu_stream), GFP_KERNEL);
>> +	if (!master->streams)
>> +		return -ENOMEM;
>> +	master->num_streams = fwspec->num_ids;
> This is not roll-backed when fail.
> 
>> +
>> +	mutex_lock(&smmu->streams_mutex);
>> +	for (i = 0; i < fwspec->num_ids && !ret; i++) {
> Check ret at here, makes it hard to decide the start index of rollback.
> 
> If we fail at here, then start index is (i-2).
> If we fail in the loop, then start index is (i-1).
> 
>> +		u32 sid = fwspec->ids[i];
>> +
>> +		new_stream = &master->streams[i];
>> +		new_stream->id = sid;
>> +		new_stream->master = master;
>> +
>> +		/*
>> +		 * Check the SIDs are in range of the SMMU and our stream table
>> +		 */
>> +		if (!arm_smmu_sid_in_range(smmu, sid)) {
>> +			ret = -ERANGE;
>> +			break;
>> +		}
>> +
>> +		/* Ensure l2 strtab is initialised */
>> +		if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
>> +			ret = arm_smmu_init_l2_strtab(smmu, sid);
>> +			if (ret)
>> +				break;
>> +		}
>> +
>> +		/* Insert into SID tree */
>> +		new_node = &(smmu->streams.rb_node);
>> +		while (*new_node) {
>> +			cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
>> +					      node);
>> +			parent_node = *new_node;
>> +			if (cur_stream->id > new_stream->id) {
>> +				new_node = &((*new_node)->rb_left);
>> +			} else if (cur_stream->id < new_stream->id) {
>> +				new_node = &((*new_node)->rb_right);
>> +			} else {
>> +				dev_warn(master->dev,
>> +					 "stream %u already in tree\n",
>> +					 cur_stream->id);
>> +				ret = -EINVAL;
>> +				break;
>> +			}
>> +		}
>> +
>> +		if (!ret) {
>> +			rb_link_node(&new_stream->node, parent_node, new_node);
>> +			rb_insert_color(&new_stream->node, &smmu->streams);
>> +		}
>> +	}
>> +
>> +	if (ret) {
>> +		for (; i > 0; i--)
> should be (i >= 0)?
> And the start index seems not correct.
> 
>> +			rb_erase(&master->streams[i].node, &smmu->streams);
>> +		kfree(master->streams);
>> +	}
>> +	mutex_unlock(&smmu->streams_mutex);
>> +
>> +	return ret;
>> +}
>> +
>> +static void arm_smmu_remove_master(struct arm_smmu_master *master)
>> +{
>> +	int i;
>> +	struct arm_smmu_device *smmu = master->smmu;
>> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
>> +
>> +	if (!smmu || !master->streams)
>> +		return;
>> +
>> +	mutex_lock(&smmu->streams_mutex);
>> +	for (i = 0; i < fwspec->num_ids; i++)
>> +		rb_erase(&master->streams[i].node, &smmu->streams);
>> +	mutex_unlock(&smmu->streams_mutex);
>> +
>> +	kfree(master->streams);
>> +}
>> +
>>  static struct iommu_ops arm_smmu_ops;
>>  
>>  static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>>  {
>> -	int i, ret;
>> +	int ret;
>>  	struct arm_smmu_device *smmu;
>>  	struct arm_smmu_master *master;
>>  	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>> @@ -2331,27 +2447,12 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>>  
>>  	master->dev = dev;
>>  	master->smmu = smmu;
>> -	master->sids = fwspec->ids;
>> -	master->num_sids = fwspec->num_ids;
>>  	INIT_LIST_HEAD(&master->bonds);
>>  	dev_iommu_priv_set(dev, master);
>>  
>> -	/* Check the SIDs are in range of the SMMU and our stream table */
>> -	for (i = 0; i < master->num_sids; i++) {
>> -		u32 sid = master->sids[i];
>> -
>> -		if (!arm_smmu_sid_in_range(smmu, sid)) {
>> -			ret = -ERANGE;
>> -			goto err_free_master;
>> -		}
>> -
>> -		/* Ensure l2 strtab is initialised */
>> -		if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
>> -			ret = arm_smmu_init_l2_strtab(smmu, sid);
>> -			if (ret)
>> -				goto err_free_master;
>> -		}
>> -	}
>> +	ret = arm_smmu_insert_master(smmu, master);
>> +	if (ret)
>> +		goto err_free_master;
>>  
>>  	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
>>  
>> @@ -2389,6 +2490,7 @@ static void arm_smmu_release_device(struct device *dev)
>>  	WARN_ON(arm_smmu_master_sva_enabled(master));
>>  	arm_smmu_detach_dev(master);
>>  	arm_smmu_disable_pasid(master);
>> +	arm_smmu_remove_master(master);
>>  	kfree(master);
> 
> Thanks,
> Keqian
> 
Thank you for the review. Jean will address this issues in his own
series and on my end I will rebase on this latter.

Best Regards

Eric



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure
  2021-02-01 15:15     ` Jean-Philippe Brucker
@ 2021-02-02  6:39       ` Keqian Zhu
  0 siblings, 0 replies; 57+ messages in thread
From: Keqian Zhu @ 2021-02-02  6:39 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Eric Auger
  Cc: eric.auger.pro, iommu, linux-kernel, kvm, kvmarm, will, joro,
	maz, robin.murphy, alex.williamson, jacob.jun.pan, nicoleotsuka,
	vivek.gautam, yi.l.liu, zhangfei.gao

Hi Jean,

On 2021/2/1 23:15, Jean-Philippe Brucker wrote:
> On Mon, Feb 01, 2021 at 08:26:41PM +0800, Keqian Zhu wrote:
>>> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>>> +				  struct arm_smmu_master *master)
>>> +{
>>> +	int i;
>>> +	int ret = 0;
>>> +	struct arm_smmu_stream *new_stream, *cur_stream;
>>> +	struct rb_node **new_node, *parent_node = NULL;
>>> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
>>> +
>>> +	master->streams = kcalloc(fwspec->num_ids,
>>> +				  sizeof(struct arm_smmu_stream), GFP_KERNEL);
>>> +	if (!master->streams)
>>> +		return -ENOMEM;
>>> +	master->num_streams = fwspec->num_ids;
>> This is not roll-backed when fail.
> 
> No need, the caller frees master
OK.

> 
>>> +
>>> +	mutex_lock(&smmu->streams_mutex);
>>> +	for (i = 0; i < fwspec->num_ids && !ret; i++) {
>> Check ret at here, makes it hard to decide the start index of rollback.
>>
>> If we fail at here, then start index is (i-2).
>> If we fail in the loop, then start index is (i-1).
>>
> [...]
>>> +	if (ret) {
>>> +		for (; i > 0; i--)
>> should be (i >= 0)?
>> And the start index seems not correct.
> 
> Indeed, this whole bit is wrong. I'll fix it while resending the IOPF
> series.
> 
> Thanks,
> Jean
OK, I am glad it helps.

Thanks,
Keqian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support
  2020-11-18 11:21 ` [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support Eric Auger
                     ` (2 preceding siblings ...)
  2020-12-09 14:26   ` Shameerali Kolothum Thodi
@ 2021-02-02  7:14   ` Keqian Zhu
  2021-02-11 17:36     ` Auger Eric
  3 siblings, 1 reply; 57+ messages in thread
From: Keqian Zhu @ 2021-02-02  7:14 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, jacob.jun.pan, nicoleotsuka, vivek.gautam,
	yi.l.liu, zhangfei.gao

Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> When nested stage translation is setup, both s1_cfg and
> s2_cfg are set.
> 
> We introduce a new smmu domain abort field that will be set
> upon guest stage1 configuration passing.
> 
> arm_smmu_write_strtab_ent() is modified to write both stage
> fields in the STE and deal with the abort field.
> 
> In nested mode, only stage 2 is "finalized" as the host does
> not own/configure the stage 1 context descriptor; guest does.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> v10 -> v11:
> - Fix an issue reported by Shameer when switching from with vSMMU
>   to without vSMMU. Despite the spec does not seem to mention it
>   seems to be needed to reset the 2 high 64b when switching from
>   S1+S2 cfg to S1 only. Especially dst[3] needs to be reset (S2TTB).
>   On some implementations, if the S2TTB is not reset, this causes
>   a C_BAD_STE error
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 64 +++++++++++++++++----
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 +
>  2 files changed, 56 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 18ac5af1b284..412ea1bafa50 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1181,8 +1181,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  	 * three cases at the moment:
>  	 *
>  	 * 1. Invalid (all zero) -> bypass/fault (init)
> -	 * 2. Bypass/fault -> translation/bypass (attach)
> -	 * 3. Translation/bypass -> bypass/fault (detach)
> +	 * 2. Bypass/fault -> single stage translation/bypass (attach)
> +	 * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
> +	 * 4. S2 -> S1 + S2 (attach_pasid_table)
> +	 * 5. S1 + S2 -> S2 (detach_pasid_table)

The following line "BUG_ON(ste_live && !nested);" forbids this transform.
And I have a look at the 6th patch, the transform seems S1 + S2 -> abort.
So after detach, the status is not the same as that before attach. Does it
match our expectation?

>  	 *
>  	 * Given that we can't update the STE atomically and the SMMU
>  	 * doesn't read the thing in a defined order, that leaves us
> @@ -1193,7 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  	 * 3. Update Config, sync
>  	 */
>  	u64 val = le64_to_cpu(dst[0]);
> -	bool ste_live = false;
> +	bool s1_live = false, s2_live = false, ste_live;
> +	bool abort, nested = false, translate = false;
>  	struct arm_smmu_device *smmu = NULL;
>  	struct arm_smmu_s1_cfg *s1_cfg;
>  	struct arm_smmu_s2_cfg *s2_cfg;
> @@ -1233,6 +1236,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  		default:
>  			break;
>  		}
> +		nested = s1_cfg->set && s2_cfg->set;
> +		translate = s1_cfg->set || s2_cfg->set;
>  	}
>  
>  	if (val & STRTAB_STE_0_V) {
> @@ -1240,23 +1245,36 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  		case STRTAB_STE_0_CFG_BYPASS:
>  			break;
>  		case STRTAB_STE_0_CFG_S1_TRANS:
> +			s1_live = true;
> +			break;
>  		case STRTAB_STE_0_CFG_S2_TRANS:
> -			ste_live = true;
> +			s2_live = true;
> +			break;
> +		case STRTAB_STE_0_CFG_NESTED:
> +			s1_live = true;
> +			s2_live = true;
>  			break;
>  		case STRTAB_STE_0_CFG_ABORT:
> -			BUG_ON(!disable_bypass);
>  			break;
>  		default:
>  			BUG(); /* STE corruption */
>  		}
>  	}
>  
> +	ste_live = s1_live || s2_live;
> +
>  	/* Nuke the existing STE_0 value, as we're going to rewrite it */
>  	val = STRTAB_STE_0_V;
>  
>  	/* Bypass/fault */
> -	if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
> -		if (!smmu_domain && disable_bypass)
> +
> +	if (!smmu_domain)
> +		abort = disable_bypass;
> +	else
> +		abort = smmu_domain->abort;
> +
> +	if (abort || !translate) {
> +		if (abort)
>  			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
>  		else
>  			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
> @@ -1274,8 +1292,16 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  		return;
>  	}
>  
> +	BUG_ON(ste_live && !nested);
> +
> +	if (ste_live) {
> +		/* First invalidate the live STE */
> +		dst[0] = cpu_to_le64(STRTAB_STE_0_CFG_ABORT);
> +		arm_smmu_sync_ste_for_sid(smmu, sid);
> +	}
> +
[...]

Thanks,
Keqian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure
  2021-02-01 17:19     ` Auger Eric
@ 2021-02-02  7:20       ` Keqian Zhu
  0 siblings, 0 replies; 57+ messages in thread
From: Keqian Zhu @ 2021-02-02  7:20 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, jacob.jun.pan, nicoleotsuka, vivek.gautam,
	yi.l.liu, zhangfei.gao

Hi Eric,

On 2021/2/2 1:19, Auger Eric wrote:
> Hi Keqian,
> 
> On 2/1/21 1:26 PM, Keqian Zhu wrote:
>> Hi Eric,
>>
>> On 2020/11/18 19:21, Eric Auger wrote:
>>> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
>>>
>>> When handling faults from the event or PRI queue, we need to find the
>>> struct device associated to a SID. Add a rb_tree to keep track of SIDs.
>>>
>>> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
>> [...]
>>
>>>  }
>>>  
>>> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>>> +				  struct arm_smmu_master *master)
[...]

>>>  	kfree(master);
>>
>> Thanks,
>> Keqian
>>
> Thank you for the review. Jean will address this issues in his own
> series and on my end I will rebase on this latter.
> 
> Best Regards
> 
> Eric
>

Yeah, and hope this series can be accepted earlier ;-)

Thanks,
Keqian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 06/15] iommu/smmuv3: Implement attach/detach_pasid_table
  2020-11-18 11:21 ` [PATCH v13 06/15] iommu/smmuv3: Implement attach/detach_pasid_table Eric Auger
@ 2021-02-02  8:03   ` Keqian Zhu
  2021-02-11 17:35     ` Auger Eric
  0 siblings, 1 reply; 57+ messages in thread
From: Keqian Zhu @ 2021-02-02  8:03 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, jacob.jun.pan, nicoleotsuka, vivek.gautam,
	yi.l.liu, zhangfei.gao

Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> On attach_pasid_table() we program STE S1 related info set
> by the guest into the actual physical STEs. At minimum
> we need to program the context descriptor GPA and compute
> whether the stage1 is translated/bypassed or aborted.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> v7 -> v8:
> - remove smmu->features check, now done on domain finalize
> 
> v6 -> v7:
> - check versions and comment the fact we don't need to take
>   into account s1dss and s1fmt
> v3 -> v4:
> - adapt to changes in iommu_pasid_table_config
> - different programming convention at s1_cfg/s2_cfg/ste.abort
> 
> v2 -> v3:
> - callback now is named set_pasid_table and struct fields
>   are laid out differently.
> 
> v1 -> v2:
> - invalidate the STE before changing them
> - hold init_mutex
> - handle new fields
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 89 +++++++++++++++++++++
>  1 file changed, 89 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 412ea1bafa50..805acdc18a3a 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2661,6 +2661,93 @@ static void arm_smmu_get_resv_regions(struct device *dev,
>  	iommu_dma_get_resv_regions(dev, head);
>  }
>  
> +static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
> +				       struct iommu_pasid_table_config *cfg)
> +{
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct arm_smmu_master *master;
> +	struct arm_smmu_device *smmu;
> +	unsigned long flags;
> +	int ret = -EINVAL;
> +
> +	if (cfg->format != IOMMU_PASID_FORMAT_SMMUV3)
> +		return -EINVAL;
> +
> +	if (cfg->version != PASID_TABLE_CFG_VERSION_1 ||
> +	    cfg->vendor_data.smmuv3.version != PASID_TABLE_SMMUV3_CFG_VERSION_1)
> +		return -EINVAL;
> +
> +	mutex_lock(&smmu_domain->init_mutex);
> +
> +	smmu = smmu_domain->smmu;
> +
> +	if (!smmu)
> +		goto out;
> +
> +	if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> +		goto out;
> +
> +	switch (cfg->config) {
> +	case IOMMU_PASID_CONFIG_ABORT:
> +		smmu_domain->s1_cfg.set = false;
> +		smmu_domain->abort = true;
> +		break;
> +	case IOMMU_PASID_CONFIG_BYPASS:
> +		smmu_domain->s1_cfg.set = false;
> +		smmu_domain->abort = false;
I didn't test it, but it seems that this will cause BUG() in arm_smmu_write_strtab_ent().
At the line "BUG_ON(ste_live && !nested);". Maybe I miss something?

> +		break;
> +	case IOMMU_PASID_CONFIG_TRANSLATE:
> +		/* we do not support S1 <-> S1 transitions */
> +		if (smmu_domain->s1_cfg.set)
> +			goto out;
> +
> +		/*
> +		 * we currently support a single CD so s1fmt and s1dss
> +		 * fields are also ignored
> +		 */
> +		if (cfg->pasid_bits)
> +			goto out;
> +
> +		smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
> +		smmu_domain->s1_cfg.set = true;
> +		smmu_domain->abort = false;
> +		break;
> +	default:
> +		goto out;
> +	}
> +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> +	list_for_each_entry(master, &smmu_domain->devices, domain_head)
> +		arm_smmu_install_ste_for_dev(master);
> +	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> +	ret = 0;
> +out:
> +	mutex_unlock(&smmu_domain->init_mutex);
> +	return ret;
> +}
> +
[...]

Thanks,
Keqian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 06/15] iommu/smmuv3: Implement attach/detach_pasid_table
  2021-02-02  8:03   ` Keqian Zhu
@ 2021-02-11 17:35     ` Auger Eric
  0 siblings, 0 replies; 57+ messages in thread
From: Auger Eric @ 2021-02-11 17:35 UTC (permalink / raw)
  To: Keqian Zhu, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, jacob.jun.pan, nicoleotsuka, vivek.gautam,
	yi.l.liu, zhangfei.gao

Hi Keqian,

On 2/2/21 9:03 AM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> On attach_pasid_table() we program STE S1 related info set
>> by the guest into the actual physical STEs. At minimum
>> we need to program the context descriptor GPA and compute
>> whether the stage1 is translated/bypassed or aborted.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>> v7 -> v8:
>> - remove smmu->features check, now done on domain finalize
>>
>> v6 -> v7:
>> - check versions and comment the fact we don't need to take
>>   into account s1dss and s1fmt
>> v3 -> v4:
>> - adapt to changes in iommu_pasid_table_config
>> - different programming convention at s1_cfg/s2_cfg/ste.abort
>>
>> v2 -> v3:
>> - callback now is named set_pasid_table and struct fields
>>   are laid out differently.
>>
>> v1 -> v2:
>> - invalidate the STE before changing them
>> - hold init_mutex
>> - handle new fields
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 89 +++++++++++++++++++++
>>  1 file changed, 89 insertions(+)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 412ea1bafa50..805acdc18a3a 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -2661,6 +2661,93 @@ static void arm_smmu_get_resv_regions(struct device *dev,
>>  	iommu_dma_get_resv_regions(dev, head);
>>  }
>>  
>> +static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
>> +				       struct iommu_pasid_table_config *cfg)
>> +{
>> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +	struct arm_smmu_master *master;
>> +	struct arm_smmu_device *smmu;
>> +	unsigned long flags;
>> +	int ret = -EINVAL;
>> +
>> +	if (cfg->format != IOMMU_PASID_FORMAT_SMMUV3)
>> +		return -EINVAL;
>> +
>> +	if (cfg->version != PASID_TABLE_CFG_VERSION_1 ||
>> +	    cfg->vendor_data.smmuv3.version != PASID_TABLE_SMMUV3_CFG_VERSION_1)
>> +		return -EINVAL;
>> +
>> +	mutex_lock(&smmu_domain->init_mutex);
>> +
>> +	smmu = smmu_domain->smmu;
>> +
>> +	if (!smmu)
>> +		goto out;
>> +
>> +	if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>> +		goto out;
>> +
>> +	switch (cfg->config) {
>> +	case IOMMU_PASID_CONFIG_ABORT:
>> +		smmu_domain->s1_cfg.set = false;
>> +		smmu_domain->abort = true;
>> +		break;
>> +	case IOMMU_PASID_CONFIG_BYPASS:
>> +		smmu_domain->s1_cfg.set = false;
>> +		smmu_domain->abort = false;
> I didn't test it, but it seems that this will cause BUG() in arm_smmu_write_strtab_ent().
> At the line "BUG_ON(ste_live && !nested);". Maybe I miss something?

No you are fully correct. Shammeer hit the BUG_ON() when booting the
guest with iommu.passthrough = 1. So I removed the BUG_ON(). The legacy
BUG_ON(ste_live) still is there under the form of BUG_ON(s1_live).

Thanks!

Eric


> 
>> +		break;
>> +	case IOMMU_PASID_CONFIG_TRANSLATE:
>> +		/* we do not support S1 <-> S1 transitions */
>> +		if (smmu_domain->s1_cfg.set)
>> +			goto out;
>> +
>> +		/*
>> +		 * we currently support a single CD so s1fmt and s1dss
>> +		 * fields are also ignored
>> +		 */
>> +		if (cfg->pasid_bits)
>> +			goto out;
>> +
>> +		smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
>> +		smmu_domain->s1_cfg.set = true;
>> +		smmu_domain->abort = false;
>> +		break;
>> +	default:
>> +		goto out;
>> +	}
>> +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>> +	list_for_each_entry(master, &smmu_domain->devices, domain_head)
>> +		arm_smmu_install_ste_for_dev(master);
>> +	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>> +	ret = 0;
>> +out:
>> +	mutex_unlock(&smmu_domain->init_mutex);
>> +	return ret;
>> +}
>> +
> [...]
> 
> Thanks,
> Keqian
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support
  2021-02-02  7:14   ` Keqian Zhu
@ 2021-02-11 17:36     ` Auger Eric
  0 siblings, 0 replies; 57+ messages in thread
From: Auger Eric @ 2021-02-11 17:36 UTC (permalink / raw)
  To: Keqian Zhu, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, jacob.jun.pan, nicoleotsuka, vivek.gautam,
	yi.l.liu, zhangfei.gao

Hi Keqian,

On 2/2/21 8:14 AM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> When nested stage translation is setup, both s1_cfg and
>> s2_cfg are set.
>>
>> We introduce a new smmu domain abort field that will be set
>> upon guest stage1 configuration passing.
>>
>> arm_smmu_write_strtab_ent() is modified to write both stage
>> fields in the STE and deal with the abort field.
>>
>> In nested mode, only stage 2 is "finalized" as the host does
>> not own/configure the stage 1 context descriptor; guest does.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>> v10 -> v11:
>> - Fix an issue reported by Shameer when switching from with vSMMU
>>   to without vSMMU. Despite the spec does not seem to mention it
>>   seems to be needed to reset the 2 high 64b when switching from
>>   S1+S2 cfg to S1 only. Especially dst[3] needs to be reset (S2TTB).
>>   On some implementations, if the S2TTB is not reset, this causes
>>   a C_BAD_STE error
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 64 +++++++++++++++++----
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 +
>>  2 files changed, 56 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 18ac5af1b284..412ea1bafa50 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1181,8 +1181,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>  	 * three cases at the moment:
>>  	 *
>>  	 * 1. Invalid (all zero) -> bypass/fault (init)
>> -	 * 2. Bypass/fault -> translation/bypass (attach)
>> -	 * 3. Translation/bypass -> bypass/fault (detach)
>> +	 * 2. Bypass/fault -> single stage translation/bypass (attach)
>> +	 * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
>> +	 * 4. S2 -> S1 + S2 (attach_pasid_table)
>> +	 * 5. S1 + S2 -> S2 (detach_pasid_table)
> 
> The following line "BUG_ON(ste_live && !nested);" forbids this transform.

Yes as pointed out by Kunkun, there is always an abort in-between. I
will restore the original comment.

> And I have a look at the 6th patch, the transform seems S1 + S2 -> abort.
> So after detach, the status is not the same as that before attach. Does it
> match our expectation?

Indeed at detach time I think I should reset the abort() flag as this
latter is not imposed anymore by the guest.

Thanks!

Eric


> 
>>  	 *
>>  	 * Given that we can't update the STE atomically and the SMMU
>>  	 * doesn't read the thing in a defined order, that leaves us
>> @@ -1193,7 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>  	 * 3. Update Config, sync
>>  	 */
>>  	u64 val = le64_to_cpu(dst[0]);
>> -	bool ste_live = false;
>> +	bool s1_live = false, s2_live = false, ste_live;
>> +	bool abort, nested = false, translate = false;
>>  	struct arm_smmu_device *smmu = NULL;
>>  	struct arm_smmu_s1_cfg *s1_cfg;
>>  	struct arm_smmu_s2_cfg *s2_cfg;
>> @@ -1233,6 +1236,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>  		default:
>>  			break;
>>  		}
>> +		nested = s1_cfg->set && s2_cfg->set;
>> +		translate = s1_cfg->set || s2_cfg->set;
>>  	}
>>  
>>  	if (val & STRTAB_STE_0_V) {
>> @@ -1240,23 +1245,36 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>  		case STRTAB_STE_0_CFG_BYPASS:
>>  			break;
>>  		case STRTAB_STE_0_CFG_S1_TRANS:
>> +			s1_live = true;
>> +			break;
>>  		case STRTAB_STE_0_CFG_S2_TRANS:
>> -			ste_live = true;
>> +			s2_live = true;
>> +			break;
>> +		case STRTAB_STE_0_CFG_NESTED:
>> +			s1_live = true;
>> +			s2_live = true;
>>  			break;
>>  		case STRTAB_STE_0_CFG_ABORT:
>> -			BUG_ON(!disable_bypass);
>>  			break;
>>  		default:
>>  			BUG(); /* STE corruption */
>>  		}
>>  	}
>>  
>> +	ste_live = s1_live || s2_live;
>> +
>>  	/* Nuke the existing STE_0 value, as we're going to rewrite it */
>>  	val = STRTAB_STE_0_V;
>>  
>>  	/* Bypass/fault */
>> -	if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
>> -		if (!smmu_domain && disable_bypass)
>> +
>> +	if (!smmu_domain)
>> +		abort = disable_bypass;
>> +	else
>> +		abort = smmu_domain->abort;
>> +
>> +	if (abort || !translate) {
>> +		if (abort)
>>  			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
>>  		else
>>  			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
>> @@ -1274,8 +1292,16 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>  		return;
>>  	}
>>  
>> +	BUG_ON(ste_live && !nested);
>> +
>> +	if (ste_live) {
>> +		/* First invalidate the live STE */
>> +		dst[0] = cpu_to_le64(STRTAB_STE_0_CFG_ABORT);
>> +		arm_smmu_sync_ste_for_sid(smmu, sid);
>> +	}
>> +
> [...]
> 
> Thanks,
> Keqian
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi
  2021-02-01 11:52   ` Keqian Zhu
@ 2021-02-12  8:55     ` Auger Eric
  2021-02-18  8:43       ` Keqian Zhu
  0 siblings, 1 reply; 57+ messages in thread
From: Auger Eric @ 2021-02-12  8:55 UTC (permalink / raw)
  To: Keqian Zhu, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, jacob.jun.pan, nicoleotsuka, vivek.gautam,
	yi.l.liu, zhangfei.gao

Hi Keqian,

On 2/1/21 12:52 PM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> On ARM, MSI are translated by the SMMU. An IOVA is allocated
>> for each MSI doorbell. If both the host and the guest are exposed
>> with SMMUs, we end up with 2 different IOVAs allocated by each.
>> guest allocates an IOVA (gIOVA) to map onto the guest MSI
>> doorbell (gDB). The Host allocates another IOVA (hIOVA) to map
>> onto the physical doorbell (hDB).
>>
>> So we end up with 2 untied mappings:
>>          S1            S2
>> gIOVA    ->    gDB
>>               hIOVA    ->    hDB
>>
>> Currently the PCI device is programmed by the host with hIOVA
>> as MSI doorbell. So this does not work.
>>
>> This patch introduces an API to pass gIOVA/gDB to the host so
>> that gIOVA can be reused by the host instead of re-allocating
>> a new IOVA. So the goal is to create the following nested mapping:
> Does the gDB can be reused under non-nested mode?

Under non nested mode the hIOVA is allocated within the MSI reserved
region exposed by the SMMU driver, [0x8000000, 80fffff]. see
iommu_dma_prepare_msi/iommu_dma_get_msi_page in dma_iommu.c. this hIOVA
is programmed in the physical device so that the physical SMMU
translates it into the physical doorbell (hDB = host physical ITS
doorbell). The gDB is not used at pIOMMU programming level. It is only
used when setting up the KVM irq route.

Hope this answers your question.

> 
>>
>>          S1            S2
>> gIOVA    ->    gDB     ->    hDB
>>
>> and program the PCI device with gIOVA MSI doorbell.
>>
>> In case we have several devices attached to this nested domain
>> (devices belonging to the same group), they cannot be isolated
>> on guest side either. So they should also end up in the same domain
>> on guest side. We will enforce that all the devices attached to
>> the host iommu domain use the same physical doorbell and similarly
>> a single virtual doorbell mapping gets registered (1 single
>> virtual doorbell is used on guest as well).
>>
> [...]
> 
>> + *
>> + * The associated IOVA can be reused by the host to create a nested
>> + * stage2 binding mapping translating into the physical doorbell used
>> + * by the devices attached to the domain.
>> + *
>> + * All devices within the domain must share the same physical doorbell.
>> + * A single MSI GIOVA/GPA mapping can be attached to an iommu_domain.
>> + */
>> +
>> +int iommu_bind_guest_msi(struct iommu_domain *domain,
>> +			 dma_addr_t giova, phys_addr_t gpa, size_t size)
>> +{
>> +	if (unlikely(!domain->ops->bind_guest_msi))
>> +		return -ENODEV;
>> +
>> +	return domain->ops->bind_guest_msi(domain, giova, gpa, size);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_bind_guest_msi);
>> +
>> +void iommu_unbind_guest_msi(struct iommu_domain *domain,
>> +			    dma_addr_t iova)
> nit: s/iova/giova
sure
> 
>> +{
>> +	if (unlikely(!domain->ops->unbind_guest_msi))
>> +		return;
>> +
>> +	domain->ops->unbind_guest_msi(domain, iova);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_unbind_guest_msi);
>> +
> [...]
> 
> Thanks,
> Keqian
> 

Thanks

Eric


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  2020-12-03 18:42       ` Shameerali Kolothum Thodi
  2020-12-04  9:53         ` Jean-Philippe Brucker
@ 2021-02-15 13:17         ` Auger Eric
  1 sibling, 0 replies; 57+ messages in thread
From: Auger Eric @ 2021-02-15 13:17 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, wangxingang
  Cc: Xieyingtai, jean-philippe, kvm, maz, joro, will, iommu,
	linux-kernel, vivek.gautam, alex.williamson, zhangfei.gao,
	robin.murphy, kvmarm, eric.auger.pro, Zengtao (B),
	qubingbing

Hi Shameer,

On 12/3/20 7:42 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: kvmarm-bounces@lists.cs.columbia.edu
>> [mailto:kvmarm-bounces@lists.cs.columbia.edu] On Behalf Of Auger Eric
>> Sent: 01 December 2020 13:59
>> To: wangxingang <wangxingang5@huawei.com>
>> Cc: Xieyingtai <xieyingtai@huawei.com>; jean-philippe@linaro.org;
>> kvm@vger.kernel.org; maz@kernel.org; joro@8bytes.org; will@kernel.org;
>> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>> vivek.gautam@arm.com; alex.williamson@redhat.com;
>> zhangfei.gao@linaro.org; robin.murphy@arm.com;
>> kvmarm@lists.cs.columbia.edu; eric.auger.pro@gmail.com
>> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with
>> unmanaged ASIDs
>>
>> Hi Xingang,
>>
>> On 12/1/20 2:33 PM, Xingang Wang wrote:
>>> Hi Eric
>>>
>>> On  Wed, 18 Nov 2020 12:21:43, Eric Auger wrote:
>>>> @@ -1710,7 +1710,11 @@ static void arm_smmu_tlb_inv_context(void
>> *cookie)
>>>> 	 * insertion to guarantee those are observed before the TLBI. Do be
>>>> 	 * careful, 007.
>>>> 	 */
>>>> -	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>> +	if (ext_asid >= 0) { /* guest stage 1 invalidation */
>>>> +		cmd.opcode	= CMDQ_OP_TLBI_NH_ASID;
>>>> +		cmd.tlbi.asid	= ext_asid;
>>>> +		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
>>>> +	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>
>>> Found a problem here, the cmd for guest stage 1 invalidation is built,
>>> but it is not delivered to smmu.
>>>
>>
>> Thank you for the report. I will fix that soon. With that fixed, have
>> you been able to run vSVA on top of the series. Do you need other stuff
>> to be fixed at SMMU level? 
> 
> I am seeing another issue with this series. This is when you have the vSMMU
> in non-strict mode(iommu.strict=0). Any network pass-through dev with iperf run 
> will be enough to reproduce the issue. It may randomly stop/hang.
> 
> It looks like the .flush_iotlb_all from guest is not propagated down to the host
> correctly. I have a temp hack to fix this in Qemu wherein CMDQ_OP_TLBI_NH_ASID
> will result in a CACHE_INVALIDATE with IOMMU_INV_GRANU_PASID flag and archid
> set.

Thank you for the analysis. Indeed the NH_ASID was not properly handled
as asid info was not passed down. I fixed domain invalidation and added
asid based invalidation.

Thanks

Eric
> 
> Please take a look and let me know. 
> 
> As I am going to respin soon, please let me
>> know what is the best branch to rebase to alleviate your integration.
> 
> Please find the latest kernel and Qemu branch with vSVA support added here,
> 
> https://github.com/hisilicon/kernel-dev/tree/5.10-rc4-2stage-v13-vsva
> https://github.com/hisilicon/qemu/tree/v5.2.0-rc1-2stage-rfcv7-vsva
> 
> I have done some basic minimum vSVA tests on a HiSilicon D06 board with
> a zip dev that supports STALL. All looks good so far apart from the issues
> that have been already reported/discussed.
> 
> The kernel branch is actually a rebase of sva/uacce related patches from a
> Linaro branch here,
> 
> https://github.com/Linaro/linux-kernel-uadk/tree/uacce-devel-5.10
> 
> I think going forward it will be good(if possible) to respin your series on top of
> a sva branch with STALL/PRI support added. 
> 
> Hi Jean/zhangfei,
> Is it possible to have a branch with minimum required SVA/UACCE related patches
> that are already public and can be a "stable" candidate for future respin of Eric's series?
> Please share your thoughts.
> 
> Thanks,
> Shameer 
> 
>> Best Regards
>>
>> Eric
>>
>> _______________________________________________
>> kvmarm mailing list
>> kvmarm@lists.cs.columbia.edu
>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi
  2021-02-12  8:55     ` Auger Eric
@ 2021-02-18  8:43       ` Keqian Zhu
  2021-02-18 10:35         ` Auger Eric
  0 siblings, 1 reply; 57+ messages in thread
From: Keqian Zhu @ 2021-02-18  8:43 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, jacob.jun.pan, nicoleotsuka, vivek.gautam,
	yi.l.liu, zhangfei.gao

Hi Eric,

On 2021/2/12 16:55, Auger Eric wrote:
> Hi Keqian,
> 
> On 2/1/21 12:52 PM, Keqian Zhu wrote:
>> Hi Eric,
>>
>> On 2020/11/18 19:21, Eric Auger wrote:
>>> On ARM, MSI are translated by the SMMU. An IOVA is allocated
>>> for each MSI doorbell. If both the host and the guest are exposed
>>> with SMMUs, we end up with 2 different IOVAs allocated by each.
>>> guest allocates an IOVA (gIOVA) to map onto the guest MSI
>>> doorbell (gDB). The Host allocates another IOVA (hIOVA) to map
>>> onto the physical doorbell (hDB).
>>>
>>> So we end up with 2 untied mappings:
>>>          S1            S2
>>> gIOVA    ->    gDB
>>>               hIOVA    ->    hDB
>>>
>>> Currently the PCI device is programmed by the host with hIOVA
>>> as MSI doorbell. So this does not work.
>>>
>>> This patch introduces an API to pass gIOVA/gDB to the host so
>>> that gIOVA can be reused by the host instead of re-allocating
>>> a new IOVA. So the goal is to create the following nested mapping:
>> Does the gDB can be reused under non-nested mode?
> 
> Under non nested mode the hIOVA is allocated within the MSI reserved
> region exposed by the SMMU driver, [0x8000000, 80fffff]. see
> iommu_dma_prepare_msi/iommu_dma_get_msi_page in dma_iommu.c. this hIOVA
> is programmed in the physical device so that the physical SMMU
> translates it into the physical doorbell (hDB = host physical ITS
So, AFAIU, under non-nested mode, at smmu side, we reuse the workflow of non-virtualization scenario.

> doorbell). The gDB is not used at pIOMMU programming level. It is only
> used when setting up the KVM irq route.
> 
> Hope this answers your question.
Thanks for your explanation!
> 

Thanks,
Keqian

>>
>>>
>>>          S1            S2
>>> gIOVA    ->    gDB     ->    hDB
>>>
>>> and program the PCI device with gIOVA MSI doorbell.
>>>
>>> In case we have several devices attached to this nested domain
>>> (devices belonging to the same group), they cannot be isolated
>>> on guest side either. So they should also end up in the same domain
>>> on guest side. We will enforce that all the devices attached to
>>> the host iommu domain use the same physical doorbell and similarly
>>> a single virtual doorbell mapping gets registered (1 single
>>> virtual doorbell is used on guest as well).
>>>
>> [...]
>>
>>> + *
>>> + * The associated IOVA can be reused by the host to create a nested
>>> + * stage2 binding mapping translating into the physical doorbell used
>>> + * by the devices attached to the domain.
>>> + *
>>> + * All devices within the domain must share the same physical doorbell.
>>> + * A single MSI GIOVA/GPA mapping can be attached to an iommu_domain.
>>> + */
>>> +
>>> +int iommu_bind_guest_msi(struct iommu_domain *domain,
>>> +			 dma_addr_t giova, phys_addr_t gpa, size_t size)
>>> +{
>>> +	if (unlikely(!domain->ops->bind_guest_msi))
>>> +		return -ENODEV;
>>> +
>>> +	return domain->ops->bind_guest_msi(domain, giova, gpa, size);
>>> +}
>>> +EXPORT_SYMBOL_GPL(iommu_bind_guest_msi);
>>> +
>>> +void iommu_unbind_guest_msi(struct iommu_domain *domain,
>>> +			    dma_addr_t iova)
>> nit: s/iova/giova
> sure
>>
>>> +{
>>> +	if (unlikely(!domain->ops->unbind_guest_msi))
>>> +		return;
>>> +
>>> +	domain->ops->unbind_guest_msi(domain, iova);
>>> +}
>>> +EXPORT_SYMBOL_GPL(iommu_unbind_guest_msi);
>>> +
>> [...]
>>
>> Thanks,
>> Keqian
>>
> 
> Thanks
> 
> Eric
> 
> .
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi
  2021-02-18  8:43       ` Keqian Zhu
@ 2021-02-18 10:35         ` Auger Eric
  0 siblings, 0 replies; 57+ messages in thread
From: Auger Eric @ 2021-02-18 10:35 UTC (permalink / raw)
  To: Keqian Zhu, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, jacob.jun.pan, nicoleotsuka, vivek.gautam,
	yi.l.liu, zhangfei.gao

Hi Keqian,

On 2/18/21 9:43 AM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2021/2/12 16:55, Auger Eric wrote:
>> Hi Keqian,
>>
>> On 2/1/21 12:52 PM, Keqian Zhu wrote:
>>> Hi Eric,
>>>
>>> On 2020/11/18 19:21, Eric Auger wrote:
>>>> On ARM, MSI are translated by the SMMU. An IOVA is allocated
>>>> for each MSI doorbell. If both the host and the guest are exposed
>>>> with SMMUs, we end up with 2 different IOVAs allocated by each.
>>>> guest allocates an IOVA (gIOVA) to map onto the guest MSI
>>>> doorbell (gDB). The Host allocates another IOVA (hIOVA) to map
>>>> onto the physical doorbell (hDB).
>>>>
>>>> So we end up with 2 untied mappings:
>>>>          S1            S2
>>>> gIOVA    ->    gDB
>>>>               hIOVA    ->    hDB
>>>>
>>>> Currently the PCI device is programmed by the host with hIOVA
>>>> as MSI doorbell. So this does not work.
>>>>
>>>> This patch introduces an API to pass gIOVA/gDB to the host so
>>>> that gIOVA can be reused by the host instead of re-allocating
>>>> a new IOVA. So the goal is to create the following nested mapping:
>>> Does the gDB can be reused under non-nested mode?
>>
>> Under non nested mode the hIOVA is allocated within the MSI reserved
>> region exposed by the SMMU driver, [0x8000000, 80fffff]. see
>> iommu_dma_prepare_msi/iommu_dma_get_msi_page in dma_iommu.c. this hIOVA
>> is programmed in the physical device so that the physical SMMU
>> translates it into the physical doorbell (hDB = host physical ITS
> So, AFAIU, under non-nested mode, at smmu side, we reuse the workflow of non-virtualization scenario.
Without virtualization, the host kernel also transparently allocates an
iova to map the doorbell. With standard passthrough withou vIOMMU, the
iova window is different (MSI RESV region).

Thanks

Eric
> 
>> doorbell). The gDB is not used at pIOMMU programming level. It is only
>> used when setting up the KVM irq route.
>>
>> Hope this answers your question.
> Thanks for your explanation!
>>
> 
> Thanks,
> Keqian
> 
>>>
>>>>
>>>>          S1            S2
>>>> gIOVA    ->    gDB     ->    hDB
>>>>
>>>> and program the PCI device with gIOVA MSI doorbell.
>>>>
>>>> In case we have several devices attached to this nested domain
>>>> (devices belonging to the same group), they cannot be isolated
>>>> on guest side either. So they should also end up in the same domain
>>>> on guest side. We will enforce that all the devices attached to
>>>> the host iommu domain use the same physical doorbell and similarly
>>>> a single virtual doorbell mapping gets registered (1 single
>>>> virtual doorbell is used on guest as well).
>>>>
>>> [...]
>>>
>>>> + *
>>>> + * The associated IOVA can be reused by the host to create a nested
>>>> + * stage2 binding mapping translating into the physical doorbell used
>>>> + * by the devices attached to the domain.
>>>> + *
>>>> + * All devices within the domain must share the same physical doorbell.
>>>> + * A single MSI GIOVA/GPA mapping can be attached to an iommu_domain.
>>>> + */
>>>> +
>>>> +int iommu_bind_guest_msi(struct iommu_domain *domain,
>>>> +			 dma_addr_t giova, phys_addr_t gpa, size_t size)
>>>> +{
>>>> +	if (unlikely(!domain->ops->bind_guest_msi))
>>>> +		return -ENODEV;
>>>> +
>>>> +	return domain->ops->bind_guest_msi(domain, giova, gpa, size);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(iommu_bind_guest_msi);
>>>> +
>>>> +void iommu_unbind_guest_msi(struct iommu_domain *domain,
>>>> +			    dma_addr_t iova)
>>> nit: s/iova/giova
>> sure
>>>
>>>> +{
>>>> +	if (unlikely(!domain->ops->unbind_guest_msi))
>>>> +		return;
>>>> +
>>>> +	domain->ops->unbind_guest_msi(domain, iova);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(iommu_unbind_guest_msi);
>>>> +
>>> [...]
>>>
>>> Thanks,
>>> Keqian
>>>
>>
>> Thanks
>>
>> Eric
>>
>> .
>>
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
  2021-01-08 17:05 ` [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Shameerali Kolothum Thodi
  2021-01-13 15:37   ` Auger Eric
@ 2021-02-21 18:21   ` Auger Eric
  2021-02-22  8:56     ` Shameerali Kolothum Thodi
  1 sibling, 1 reply; 57+ messages in thread
From: Auger Eric @ 2021-02-21 18:21 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, eric.auger.pro, iommu, linux-kernel,
	kvm, kvmarm, will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	jacob.jun.pan, yi.l.liu, tn, nicoleotsuka, yuzenghui, Zengtao (B),
	linuxarm

Hi Shameer,
On 1/8/21 6:05 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Eric Auger [mailto:eric.auger@redhat.com]
>> Sent: 18 November 2020 11:22
>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
>> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>> kvm@vger.kernel.org; kvmarm@lists.cs.columbia.edu; will@kernel.org;
>> joro@8bytes.org; maz@kernel.org; robin.murphy@arm.com;
>> alex.williamson@redhat.com
>> Cc: jean-philippe@linaro.org; zhangfei.gao@linaro.org;
>> zhangfei.gao@gmail.com; vivek.gautam@arm.com; Shameerali Kolothum
>> Thodi <shameerali.kolothum.thodi@huawei.com>;
>> jacob.jun.pan@linux.intel.com; yi.l.liu@intel.com; tn@semihalf.com;
>> nicoleotsuka@gmail.com; yuzenghui <yuzenghui@huawei.com>
>> Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
>>
>> This series brings the IOMMU part of HW nested paging support
>> in the SMMUv3. The VFIO part is submitted separately.
>>
>> The IOMMU API is extended to support 2 new API functionalities:
>> 1) pass the guest stage 1 configuration
>> 2) pass stage 1 MSI bindings
>>
>> Then those capabilities gets implemented in the SMMUv3 driver.
>>
>> The virtualizer passes information through the VFIO user API
>> which cascades them to the iommu subsystem. This allows the guest
>> to own stage 1 tables and context descriptors (so-called PASID
>> table) while the host owns stage 2 tables and main configuration
>> structures (STE).
> 
> I am seeing an issue with Guest testpmd run with this series.
> I have two different setups and testpmd works fine with the
> first one but not with the second.
> 
> 1). Guest doesn't have kernel driver built-in for pass-through dev.
> 
> root@ubuntu:/# lspci -v
> ...
> 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
> Subsystem: Huawei Technologies Co., Ltd. Device 0000
> Flags: fast devsel
> Memory at 8000100000 (64-bit, prefetchable) [disabled] [size=64K]
> Memory at 8000000000 (64-bit, prefetchable) [disabled] [size=1M]
> Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [a0] MSI-X: Enable- Count=67 Masked-
> Capabilities: [b0] Power Management version 3
> Capabilities: [100] Access Control Services
> Capabilities: [300] Transaction Processing Hints
> 
> root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
> root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
> 
> root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0  -l 0-1 -n 2 -- -i
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-32768kB
> EAL: No available hugepages reported in hugepages-64kB
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL:   Invalid NUMA socket, default to 0
> EAL:   using IOMMU type 1 (Type 1)
> EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket 0)
> EAL: No legacy callbacks, legacy socket not created
> Interactive-mode selected
> testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> 
> Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
> 
> Configuring Port 0 (socket 0)
> Port 0: 8E:A6:8C:43:43:45
> Checking link statuses...
> Done
> testpmd>
> 
> 2). Guest have kernel driver built-in for pass-through dev.
> 
> root@ubuntu:/# lspci -v
> ...
> 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
> Subsystem: Huawei Technologies Co., Ltd. Device 0000
> Flags: bus master, fast devsel, latency 0
> Memory at 8000100000 (64-bit, prefetchable) [size=64K]
> Memory at 8000000000 (64-bit, prefetchable) [size=1M]
> Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [a0] MSI-X: Enable+ Count=67 Masked-
> Capabilities: [b0] Power Management version 3
> Capabilities: [100] Access Control Services
> Capabilities: [300] Transaction Processing Hints
> Kernel driver in use: hns3
> 
> root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
> root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers/hns3/unbind
> root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
> 
> root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix socket0 -l 0-1 -n 2 -- -i
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-32768kB
> EAL: No available hugepages reported in hugepages-64kB
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL:   Invalid NUMA socket, default to 0
> EAL:   using IOMMU type 1 (Type 1)
> EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket 0)
> 0000:00:02.0 hns3_get_mbx_resp(): VF could not get mbx(11,0) head(1) tail(0) lost(1) from PF in_irq:0
> hns3vf_get_queue_info(): Failed to get tqp info from PF: -62
> hns3vf_init_vf(): Failed to fetch configuration: -62
> hns3vf_dev_init(): Failed to init vf: -62
> EAL: Releasing pci mapped resource for 0000:00:02.0
> EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100800000
> EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100810000
> EAL: Requested device 0000:00:02.0 cannot be used
> EAL: Bus (pci) probe failed.
> EAL: No legacy callbacks, legacy socket not created
> testpmd: No probed ethernet devices
> Interactive-mode selected
> testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> Done
> testpmd>
> 
> And in this case, smmu(host) reports a translation fault,
> 
> [ 6542.670624] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
> [ 6542.670630] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00007d1200000010
> [ 6542.670631] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000012000000007c
> [ 6542.670633] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef040
> [ 6542.670634] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef000
> 
> Tested with Intel 82599 card(ixgbevf) as well. but same errror.

So this should be fixed in the next release. The problem came from the
fact the MSI giova was not duly unregistered. When vfio is not in used
on guest side, the guest kernel allocates giovas for MSIs @fffef000 - 40
is the ITS translater offset ;-) - When passthrough is in use, the iova
is allocated @0x8000000. As fffef000 MSI giova was not properly
unregistered, the host kernel used it - despite it has been unmapped by
the guest kernel -, hence the translation fault. So the fix is to
unregister the MSI in the VFIO QEMU code when msix are disabled. So to
me this is a QEMU integration issue.

Thank you very much for testing and reporting!

Thanks

Eric
> 
> Not able to root cause the problem yet. With the hope that, this is 
> related to tlb entries not being invlaidated properly, I tried explicitly
> issuing CMD_TLBI_NSNH_ALL and CMD_CFGI_CD_ALL just before
> the STE update, but no luck yet :(
> 
> Please let me know if I am missing something here or has any clue if you
> can replicate this on your setup.
> 
> Thanks,
> Shameer
> 
>>
>> Best Regards
>>
>> Eric
>>
>> This series can be found at:
>> https://github.com/eauger/linux/tree/5.10-rc4-2stage-v13
>> (including the VFIO part in his last version: v11)
>>
>> The series includes a patch from Jean-Philippe. It is better to
>> review the original patch:
>> [PATCH v8 2/9] iommu/arm-smmu-v3: Maintain a SID->device structure
>>
>> The VFIO series is sent separately.
>>
>> History:
>>
>> v12 -> v13:
>> - fixed compilation issue with CONFIG_ARM_SMMU_V3_SVA
>>   reported by Shameer. This urged me to revisit patch 4 into
>>   iommu/smmuv3: Allow s1 and s2 configs to coexist where
>>   s1_cfg and s2_cfg are not dynamically allocated anymore.
>>   Instead I use a new set field in existing structs
>> - fixed 2 others config checks
>> - Updated "iommu/arm-smmu-v3: Maintain a SID->device structure"
>>   according to the last version
>>
>> v11 -> v12:
>> - rebase on top of v5.10-rc4
>>
>> Eric Auger (14):
>>   iommu: Introduce attach/detach_pasid_table API
>>   iommu: Introduce bind/unbind_guest_msi
>>   iommu/smmuv3: Allow s1 and s2 configs to coexist
>>   iommu/smmuv3: Get prepared for nested stage support
>>   iommu/smmuv3: Implement attach/detach_pasid_table
>>   iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
>>   iommu/smmuv3: Implement cache_invalidate
>>   dma-iommu: Implement NESTED_MSI cookie
>>   iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
>>   iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI
>>     regions
>>   iommu/smmuv3: Implement bind/unbind_guest_msi
>>   iommu/smmuv3: Report non recoverable faults
>>   iommu/smmuv3: Accept configs with more than one context descriptor
>>   iommu/smmuv3: Add PASID cache invalidation per PASID
>>
>> Jean-Philippe Brucker (1):
>>   iommu/arm-smmu-v3: Maintain a SID->device structure
>>
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 659
>> ++++++++++++++++++--
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 103 ++-
>>  drivers/iommu/dma-iommu.c                   | 142 ++++-
>>  drivers/iommu/iommu.c                       | 105 ++++
>>  include/linux/dma-iommu.h                   |  16 +
>>  include/linux/iommu.h                       |  41 ++
>>  include/uapi/linux/iommu.h                  |  54 ++
>>  7 files changed, 1042 insertions(+), 78 deletions(-)
>>
>> --
>> 2.21.3
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
  2021-02-21 18:21   ` Auger Eric
@ 2021-02-22  8:56     ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 57+ messages in thread
From: Shameerali Kolothum Thodi @ 2021-02-22  8:56 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, zhangfei.gao, zhangfei.gao, vivek.gautam,
	jacob.jun.pan, yi.l.liu, tn, nicoleotsuka, yuzenghui, Zengtao (B),
	linuxarm



> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: 21 February 2021 18:21
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> eric.auger.pro@gmail.com; iommu@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org; kvm@vger.kernel.org;
> kvmarm@lists.cs.columbia.edu; will@kernel.org; joro@8bytes.org;
> maz@kernel.org; robin.murphy@arm.com; alex.williamson@redhat.com
> Cc: jean-philippe@linaro.org; zhangfei.gao@linaro.org;
> zhangfei.gao@gmail.com; vivek.gautam@arm.com;
> jacob.jun.pan@linux.intel.com; yi.l.liu@intel.com; tn@semihalf.com;
> nicoleotsuka@gmail.com; yuzenghui <yuzenghui@huawei.com>; Zengtao (B)
> <prime.zeng@hisilicon.com>; linuxarm@openeuler.org
> Subject: Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
> 
> Hi Shameer,
> On 1/8/21 6:05 PM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> >
> >> -----Original Message-----
> >> From: Eric Auger [mailto:eric.auger@redhat.com]
> >> Sent: 18 November 2020 11:22
> >> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
> >> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> >> kvm@vger.kernel.org; kvmarm@lists.cs.columbia.edu; will@kernel.org;
> >> joro@8bytes.org; maz@kernel.org; robin.murphy@arm.com;
> >> alex.williamson@redhat.com
> >> Cc: jean-philippe@linaro.org; zhangfei.gao@linaro.org;
> >> zhangfei.gao@gmail.com; vivek.gautam@arm.com; Shameerali Kolothum
> >> Thodi <shameerali.kolothum.thodi@huawei.com>;
> >> jacob.jun.pan@linux.intel.com; yi.l.liu@intel.com; tn@semihalf.com;
> >> nicoleotsuka@gmail.com; yuzenghui <yuzenghui@huawei.com>
> >> Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
> >>
> >> This series brings the IOMMU part of HW nested paging support
> >> in the SMMUv3. The VFIO part is submitted separately.
> >>
> >> The IOMMU API is extended to support 2 new API functionalities:
> >> 1) pass the guest stage 1 configuration
> >> 2) pass stage 1 MSI bindings
> >>
> >> Then those capabilities gets implemented in the SMMUv3 driver.
> >>
> >> The virtualizer passes information through the VFIO user API
> >> which cascades them to the iommu subsystem. This allows the guest
> >> to own stage 1 tables and context descriptors (so-called PASID
> >> table) while the host owns stage 2 tables and main configuration
> >> structures (STE).
> >
> > I am seeing an issue with Guest testpmd run with this series.
> > I have two different setups and testpmd works fine with the
> > first one but not with the second.
> >
> > 1). Guest doesn't have kernel driver built-in for pass-through dev.
> >
> > root@ubuntu:/# lspci -v
> > ...
> > 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev
> 21)
> > Subsystem: Huawei Technologies Co., Ltd. Device 0000
> > Flags: fast devsel
> > Memory at 8000100000 (64-bit, prefetchable) [disabled] [size=64K]
> > Memory at 8000000000 (64-bit, prefetchable) [disabled] [size=1M]
> > Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> > Capabilities: [a0] MSI-X: Enable- Count=67 Masked-
> > Capabilities: [b0] Power Management version 3
> > Capabilities: [100] Access Control Services
> > Capabilities: [300] Transaction Processing Hints
> >
> > root@ubuntu:/# echo vfio-pci >
> /sys/bus/pci/devices/0000:00:02.0/driver_override
> > root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
> >
> > root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix
> socket0  -l 0-1 -n 2 -- -i
> > EAL: Detected 8 lcore(s)
> > EAL: Detected 1 NUMA nodes
> > EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> > EAL: Selected IOVA mode 'VA'
> > EAL: No available hugepages reported in hugepages-32768kB
> > EAL: No available hugepages reported in hugepages-64kB
> > EAL: No available hugepages reported in hugepages-1048576kB
> > EAL: Probing VFIO support...
> > EAL: VFIO support initialized
> > EAL:   Invalid NUMA socket, default to 0
> > EAL:   using IOMMU type 1 (Type 1)
> > EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket
> 0)
> > EAL: No legacy callbacks, legacy socket not created
> > Interactive-mode selected
> > testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456,
> size=2176, socket=0
> > testpmd: preferred mempool ops selected: ring_mp_mc
> >
> > Warning! port-topology=paired and odd forward ports number, the last port
> will pair with itself.
> >
> > Configuring Port 0 (socket 0)
> > Port 0: 8E:A6:8C:43:43:45
> > Checking link statuses...
> > Done
> > testpmd>
> >
> > 2). Guest have kernel driver built-in for pass-through dev.
> >
> > root@ubuntu:/# lspci -v
> > ...
> > 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev
> 21)
> > Subsystem: Huawei Technologies Co., Ltd. Device 0000
> > Flags: bus master, fast devsel, latency 0
> > Memory at 8000100000 (64-bit, prefetchable) [size=64K]
> > Memory at 8000000000 (64-bit, prefetchable) [size=1M]
> > Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> > Capabilities: [a0] MSI-X: Enable+ Count=67 Masked-
> > Capabilities: [b0] Power Management version 3
> > Capabilities: [100] Access Control Services
> > Capabilities: [300] Transaction Processing Hints
> > Kernel driver in use: hns3
> >
> > root@ubuntu:/# echo vfio-pci >
> /sys/bus/pci/devices/0000:00:02.0/driver_override
> > root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers/hns3/unbind
> > root@ubuntu:/# echo 0000:00:02.0 > /sys/bus/pci/drivers_probe
> >
> > root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w 0000:00:02.0 --file-prefix
> socket0 -l 0-1 -n 2 -- -i
> > EAL: Detected 8 lcore(s)
> > EAL: Detected 1 NUMA nodes
> > EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> > EAL: Selected IOVA mode 'VA'
> > EAL: No available hugepages reported in hugepages-32768kB
> > EAL: No available hugepages reported in hugepages-64kB
> > EAL: No available hugepages reported in hugepages-1048576kB
> > EAL: Probing VFIO support...
> > EAL: VFIO support initialized
> > EAL:   Invalid NUMA socket, default to 0
> > EAL:   using IOMMU type 1 (Type 1)
> > EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: 0000:00:02.0 (socket
> 0)
> > 0000:00:02.0 hns3_get_mbx_resp(): VF could not get mbx(11,0) head(1) tail(0)
> lost(1) from PF in_irq:0
> > hns3vf_get_queue_info(): Failed to get tqp info from PF: -62
> > hns3vf_init_vf(): Failed to fetch configuration: -62
> > hns3vf_dev_init(): Failed to init vf: -62
> > EAL: Releasing pci mapped resource for 0000:00:02.0
> > EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100800000
> > EAL: Calling pci_unmap_resource for 0000:00:02.0 at 0x1100810000
> > EAL: Requested device 0000:00:02.0 cannot be used
> > EAL: Bus (pci) probe failed.
> > EAL: No legacy callbacks, legacy socket not created
> > testpmd: No probed ethernet devices
> > Interactive-mode selected
> > testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456,
> size=2176, socket=0
> > testpmd: preferred mempool ops selected: ring_mp_mc
> > Done
> > testpmd>
> >
> > And in this case, smmu(host) reports a translation fault,
> >
> > [ 6542.670624] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
> > [ 6542.670630] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00007d1200000010
> > [ 6542.670631] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000012000000007c
> > [ 6542.670633] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef040
> > [ 6542.670634] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fffef000
> >
> > Tested with Intel 82599 card(ixgbevf) as well. but same errror.
> 
> So this should be fixed in the next release. The problem came from the
> fact the MSI giova was not duly unregistered. When vfio is not in used
> on guest side, the guest kernel allocates giovas for MSIs @fffef000 - 40
> is the ITS translater offset ;-) - When passthrough is in use, the iova
> is allocated @0x8000000. As fffef000 MSI giova was not properly
> unregistered, the host kernel used it - despite it has been unmapped by
> the guest kernel -, hence the translation fault. So the fix is to
> unregister the MSI in the VFIO QEMU code when msix are disabled. So to
> me this is a QEMU integration issue.

Super!. I was focusing on the TLBI side and was slightly worried it is somehow
related our specific hardware. That’s a relief :).

Thanks,
Shameer 



^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
  2020-11-18 11:21 [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
                   ` (15 preceding siblings ...)
  2021-01-08 17:05 ` [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Shameerali Kolothum Thodi
@ 2021-03-15 18:04 ` Krishna Reddy
  2021-03-16  8:22   ` Auger Eric
  16 siblings, 1 reply; 57+ messages in thread
From: Krishna Reddy @ 2021-03-15 18:04 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, vivek.gautam, zhangfei.gao, Sachin Nikam,
	Yu-Huan Hsu, Bryan Huntsman, Vikram Sethi

Tested-by: Krishna Reddy <vdumpa@nvidia.com>

> 1) pass the guest stage 1 configuration

Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM along with patch series "v11 SMMUv3 Nested Stage Setup (VFIO part)" and QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration" from v5.2.0-2stage-rfcv8. 
NVMe PCIe device is functional with 2-stage translations and no issues observed.

-KR


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
  2021-03-15 18:04 ` Krishna Reddy
@ 2021-03-16  8:22   ` Auger Eric
  2021-03-16 18:10     ` Krishna Reddy
  0 siblings, 1 reply; 57+ messages in thread
From: Auger Eric @ 2021-03-16  8:22 UTC (permalink / raw)
  To: Krishna Reddy, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, vivek.gautam, zhangfei.gao, Sachin Nikam,
	Yu-Huan Hsu, Bryan Huntsman, Vikram Sethi

Hi Krishna,
On 3/15/21 7:04 PM, Krishna Reddy wrote:
> Tested-by: Krishna Reddy <vdumpa@nvidia.com>
> 
>> 1) pass the guest stage 1 configuration
> 
> Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM along with patch series "v11 SMMUv3 Nested Stage Setup (VFIO part)" and QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration" from v5.2.0-2stage-rfcv8. 
> NVMe PCIe device is functional with 2-stage translations and no issues observed.
Thank you very much for your testing efforts. For your info, there are
more recent kernel series:
[PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) (Feb 23)
[PATCH v12 00/13] SMMUv3 Nested Stage Setup (VFIO part) (Feb 23)

working along with QEMU RFC
[RFC v8 00/28] vSMMUv3/pSMMUv3 2 stage VFIO integration (Feb 25)

If you have cycles to test with those, this would be higly appreciated.

Thanks

Eric
> 
> -KR
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
  2021-03-16  8:22   ` Auger Eric
@ 2021-03-16 18:10     ` Krishna Reddy
  0 siblings, 0 replies; 57+ messages in thread
From: Krishna Reddy @ 2021-03-16 18:10 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, iommu, linux-kernel, kvm, kvmarm,
	will, joro, maz, robin.murphy, alex.williamson
  Cc: jean-philippe, vivek.gautam, zhangfei.gao, Sachin Nikam,
	Yu-Huan Hsu, Bryan Huntsman, Vikram Sethi, Nicolin Chen,
	Nate Watterson, Pritesh Raithatha

> Hi Krishna,
> On 3/15/21 7:04 PM, Krishna Reddy wrote:
> > Tested-by: Krishna Reddy <vdumpa@nvidia.com>
> >
> >> 1) pass the guest stage 1 configuration
> >
> > Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM
> along with patch series "v11 SMMUv3 Nested Stage Setup (VFIO part)" and
> QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration" from
> v5.2.0-2stage-rfcv8.
> > NVMe PCIe device is functional with 2-stage translations and no issues
> observed.
> Thank you very much for your testing efforts. For your info, there are more
> recent kernel series:
> [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) (Feb 23) [PATCH
> v12 00/13] SMMUv3 Nested Stage Setup (VFIO part) (Feb 23)
> 
> working along with QEMU RFC
> [RFC v8 00/28] vSMMUv3/pSMMUv3 2 stage VFIO integration (Feb 25)
> 
> If you have cycles to test with those, this would be higly appreciated.
 
Thanks Eric for the latest patches. Will validate and update. 
Feel free to reach out me for validating future patch sets as necessary.

-KR

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2021-03-16 18:11 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-18 11:21 [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
2020-11-18 11:21 ` [PATCH v13 01/15] iommu: Introduce attach/detach_pasid_table API Eric Auger
2020-11-18 16:19   ` Jacob Pan
2020-11-19 17:02     ` Auger Eric
2021-02-01 11:27   ` Keqian Zhu
2021-02-01 17:18     ` Auger Eric
2020-11-18 11:21 ` [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi Eric Auger
2021-02-01 11:52   ` Keqian Zhu
2021-02-12  8:55     ` Auger Eric
2021-02-18  8:43       ` Keqian Zhu
2021-02-18 10:35         ` Auger Eric
2020-11-18 11:21 ` [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure Eric Auger
2021-02-01 12:26   ` Keqian Zhu
2021-02-01 15:15     ` Jean-Philippe Brucker
2021-02-02  6:39       ` Keqian Zhu
2021-02-01 17:19     ` Auger Eric
2021-02-02  7:20       ` Keqian Zhu
2020-11-18 11:21 ` [PATCH v13 04/15] iommu/smmuv3: Allow s1 and s2 configs to coexist Eric Auger
2021-02-01 12:35   ` Keqian Zhu
2020-11-18 11:21 ` [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support Eric Auger
2020-11-19  3:59   ` kernel test robot
     [not found]   ` <a40b90bd-6756-c8cc-b455-c093d16d35f5@huawei.com>
2020-12-03 13:01     ` Auger Eric
2020-12-03 13:23       ` Kunkun Jiang
2020-12-09 14:26   ` Shameerali Kolothum Thodi
2021-02-02  7:14   ` Keqian Zhu
2021-02-11 17:36     ` Auger Eric
2020-11-18 11:21 ` [PATCH v13 06/15] iommu/smmuv3: Implement attach/detach_pasid_table Eric Auger
2021-02-02  8:03   ` Keqian Zhu
2021-02-11 17:35     ` Auger Eric
2020-11-18 11:21 ` [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs Eric Auger
2020-12-01 13:33   ` Xingang Wang
2020-12-01 13:58     ` Auger Eric
2020-12-02 12:59       ` Wang Xingang
2020-12-03 18:42       ` Shameerali Kolothum Thodi
2020-12-04  9:53         ` Jean-Philippe Brucker
2020-12-04 10:20           ` Shameerali Kolothum Thodi
2020-12-04 10:23             ` Auger Eric
2021-01-14 16:58               ` Auger Eric
2021-01-14 17:09                 ` Shameerali Kolothum Thodi
2021-01-14 17:33                 ` Jean-Philippe Brucker
2021-01-14 18:00                   ` Auger Eric
2021-02-15 13:17         ` Auger Eric
2020-11-18 11:21 ` [PATCH v13 08/15] iommu/smmuv3: Implement cache_invalidate Eric Auger
2020-11-18 11:21 ` [PATCH v13 09/15] dma-iommu: Implement NESTED_MSI cookie Eric Auger
2020-11-18 11:21 ` [PATCH v13 10/15] iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement Eric Auger
2020-11-18 11:21 ` [PATCH v13 11/15] iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI regions Eric Auger
2020-11-18 11:21 ` [PATCH v13 12/15] iommu/smmuv3: Implement bind/unbind_guest_msi Eric Auger
2020-11-18 11:21 ` [PATCH v13 13/15] iommu/smmuv3: Report non recoverable faults Eric Auger
2020-11-18 11:21 ` [PATCH v13 14/15] iommu/smmuv3: Accept configs with more than one context descriptor Eric Auger
2020-11-18 11:21 ` [PATCH v13 15/15] iommu/smmuv3: Add PASID cache invalidation per PASID Eric Auger
2021-01-08 17:05 ` [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part) Shameerali Kolothum Thodi
2021-01-13 15:37   ` Auger Eric
2021-02-21 18:21   ` Auger Eric
2021-02-22  8:56     ` Shameerali Kolothum Thodi
2021-03-15 18:04 ` Krishna Reddy
2021-03-16  8:22   ` Auger Eric
2021-03-16 18:10     ` Krishna Reddy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).