linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/8] iommu: Add auxiliary domain and PASID support to Arm SMMUv3
@ 2019-06-10 18:47 Jean-Philippe Brucker
  2019-06-10 18:47 ` [PATCH 1/8] iommu: Add I/O ASID allocator Jean-Philippe Brucker
                   ` (7 more replies)
  0 siblings, 8 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-10 18:47 UTC (permalink / raw)
  To: will.deacon
  Cc: joro, robh+dt, mark.rutland, robin.murphy, jacob.jun.pan, iommu,
	devicetree, linux-kernel, linux-arm-kernel, eric.auger

Add substreams and PCI PASID support to the SMMUv3 driver. At the moment
the driver supports a single address space per device. PASID enables
multiple address spaces per device, up to a million in theory (1 << 20).

Two kernel features will make use of PASIDs, auxiliary domains (AUXD)
and Shared Virtual Addressing (SVA). Auxiliary domains allow to program
PASID contexts using IOMMU domains. SVA allows to bind process address
spaces to device contexts and relieve device drivers of DMA management.

Since SVA support for SMMUv3 has a lot more dependencies (new fault API,
ASID pinning, generic bind, PRI or stall support, and so on),
introducing PASID support to the SMMUv3 driver is easier with auxiliary
domains.

The AUXD API allows device drivers to easily test PASID support of their
devices, although they need to allocate IOVA and pages themselves
because the DMA API doesn't support AUXD for the moment:

	iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX);
	domain = iommu_domain_alloc(dev->bus);
	iommu_aux_attach_device(domain, dev);
	iommu_map(domain, iova, phys_addr, size, prot);
	pasid = iommu_aux_get_pasid(domain);
	/* Then launch DMA with the PASID and IOVA */

Auxiliary domains also allow to split devices into multiple contexts
assignable to guest, with vfio-mdev.

Past discussions for these patches:
* Auxiliary domains (patch 6)
  [RFC PATCH 0/6] Auxiliary IOMMU domains and Arm SMMUv3
  https://www.spinics.net/lists/iommu/msg30637.html
* SSID support for the SMMU (patches 2, 3, 4, 5, 7 and 8)
  [PATCH v2 00/40] Shared Virtual Addressing for the IOMMU
  https://lists.linuxfoundation.org/pipermail/iommu/2018-May/027595.html
* I/O ASID (patch 1)
  [PATCH v3 00/16] Shared virtual address IOMMU and VT-d support
  https://lkml.kernel.org/lkml/1556922737-76313-4-git-send-email-jacob.jun.pan@linux.intel.com/

Jean-Philippe Brucker (8):
  iommu: Add I/O ASID allocator
  dt-bindings: document PASID property for IOMMU masters
  iommu/arm-smmu-v3: Support platform SSID
  iommu/arm-smmu-v3: Add support for Substream IDs
  iommu/arm-smmu-v3: Add second level of context descriptor table
  iommu/arm-smmu-v3: Support auxiliary domains
  iommu/arm-smmu-v3: Improve add_device() error handling
  iommu/arm-smmu-v3: Add support for PCI PASID

 .../devicetree/bindings/iommu/iommu.txt       |   6 +
 drivers/iommu/Kconfig                         |   5 +
 drivers/iommu/Makefile                        |   1 +
 drivers/iommu/arm-smmu-v3.c                   | 714 ++++++++++++++++--
 drivers/iommu/ioasid.c                        | 150 ++++
 drivers/iommu/of_iommu.c                      |   6 +-
 include/linux/ioasid.h                        |  49 ++
 include/linux/iommu.h                         |   1 +
 8 files changed, 865 insertions(+), 67 deletions(-)
 create mode 100644 drivers/iommu/ioasid.c
 create mode 100644 include/linux/ioasid.h

-- 
2.21.0


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 1/8] iommu: Add I/O ASID allocator
  2019-06-10 18:47 [PATCH 0/8] iommu: Add auxiliary domain and PASID support to Arm SMMUv3 Jean-Philippe Brucker
@ 2019-06-10 18:47 ` Jean-Philippe Brucker
  2019-06-11  9:36   ` Jonathan Cameron
  2019-06-11 12:26   ` Jacob Pan
  2019-06-10 18:47 ` [PATCH 2/8] dt-bindings: document PASID property for IOMMU masters Jean-Philippe Brucker
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-10 18:47 UTC (permalink / raw)
  To: will.deacon
  Cc: joro, robh+dt, mark.rutland, robin.murphy, jacob.jun.pan, iommu,
	devicetree, linux-kernel, linux-arm-kernel, eric.auger

Some devices might support multiple DMA address spaces, in particular
those that have the PCI PASID feature. PASID (Process Address Space ID)
allows to share process address spaces with devices (SVA), partition a
device into VM-assignable entities (VFIO mdev) or simply provide
multiple DMA address space to kernel drivers. Add a global PASID
allocator usable by different drivers at the same time. Name it I/O ASID
to avoid confusion with ASIDs allocated by arch code, which are usually
a separate ID space.

The IOASID space is global. Each device can have its own PASID space,
but by convention the IOMMU ended up having a global PASID space, so
that with SVA, each mm_struct is associated to a single PASID.

The allocator is primarily used by IOMMU subsystem but in rare occasions
drivers would like to allocate PASIDs for devices that aren't managed by
an IOMMU, using the same ID space as IOMMU.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
The most recent discussion on this patch was at:
https://lkml.kernel.org/lkml/1556922737-76313-4-git-send-email-jacob.jun.pan@linux.intel.com/
I fixed it up a bit following comments in that series, and removed the
definitions for the custom allocator for now.

There also is a new version that includes the custom allocator into this
patch, but is currently missing the RCU fixes, at:
https://lore.kernel.org/lkml/1560087862-57608-13-git-send-email-jacob.jun.pan@linux.intel.com/
---
 drivers/iommu/Kconfig  |   4 ++
 drivers/iommu/Makefile |   1 +
 drivers/iommu/ioasid.c | 150 +++++++++++++++++++++++++++++++++++++++++
 include/linux/ioasid.h |  49 ++++++++++++++
 4 files changed, 204 insertions(+)
 create mode 100644 drivers/iommu/ioasid.c
 create mode 100644 include/linux/ioasid.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 83664db5221d..9b45f70549a7 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -3,6 +3,10 @@
 config IOMMU_IOVA
 	tristate
 
+# The IOASID library may also be used by non-IOMMU_API users
+config IOASID
+	tristate
+
 # IOMMU_API always gets selected by whoever wants it.
 config IOMMU_API
 	bool
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 8c71a15e986b..0efac6f1ec73 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
+obj-$(CONFIG_IOASID) += ioasid.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
new file mode 100644
index 000000000000..bbb771214fa9
--- /dev/null
+++ b/drivers/iommu/ioasid.c
@@ -0,0 +1,150 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * I/O Address Space ID allocator. There is one global IOASID space, split into
+ * subsets. Users create a subset with DECLARE_IOASID_SET, then allocate and
+ * free IOASIDs with ioasid_alloc and ioasid_free.
+ */
+#include <linux/ioasid.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/xarray.h>
+
+struct ioasid_data {
+	ioasid_t id;
+	struct ioasid_set *set;
+	void *private;
+	struct rcu_head rcu;
+};
+
+static DEFINE_XARRAY_ALLOC(ioasid_xa);
+
+/**
+ * ioasid_set_data - Set private data for an allocated ioasid
+ * @ioasid: the ID to set data
+ * @data:   the private data
+ *
+ * For IOASID that is already allocated, private data can be set
+ * via this API. Future lookup can be done via ioasid_find.
+ */
+int ioasid_set_data(ioasid_t ioasid, void *data)
+{
+	struct ioasid_data *ioasid_data;
+	int ret = 0;
+
+	xa_lock(&ioasid_xa);
+	ioasid_data = xa_load(&ioasid_xa, ioasid);
+	if (ioasid_data)
+		rcu_assign_pointer(ioasid_data->private, data);
+	else
+		ret = -ENOENT;
+	xa_unlock(&ioasid_xa);
+
+	/*
+	 * Wait for readers to stop accessing the old private data, so the
+	 * caller can free it.
+	 */
+	if (!ret)
+		synchronize_rcu();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ioasid_set_data);
+
+/**
+ * ioasid_alloc - Allocate an IOASID
+ * @set: the IOASID set
+ * @min: the minimum ID (inclusive)
+ * @max: the maximum ID (inclusive)
+ * @private: data private to the caller
+ *
+ * Allocate an ID between @min and @max. The @private pointer is stored
+ * internally and can be retrieved with ioasid_find().
+ *
+ * Return: the allocated ID on success, or %INVALID_IOASID on failure.
+ */
+ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
+		      void *private)
+{
+	u32 id = INVALID_IOASID;
+	struct ioasid_data *data;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return INVALID_IOASID;
+
+	data->set = set;
+	data->private = private;
+
+	if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max), GFP_KERNEL)) {
+		pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
+		goto exit_free;
+	}
+	data->id = id;
+
+exit_free:
+	if (id == INVALID_IOASID) {
+		kfree(data);
+		return INVALID_IOASID;
+	}
+	return id;
+}
+EXPORT_SYMBOL_GPL(ioasid_alloc);
+
+/**
+ * ioasid_free - Free an IOASID
+ * @ioasid: the ID to remove
+ */
+void ioasid_free(ioasid_t ioasid)
+{
+	struct ioasid_data *ioasid_data;
+
+	ioasid_data = xa_erase(&ioasid_xa, ioasid);
+
+	kfree_rcu(ioasid_data, rcu);
+}
+EXPORT_SYMBOL_GPL(ioasid_free);
+
+/**
+ * ioasid_find - Find IOASID data
+ * @set: the IOASID set
+ * @ioasid: the IOASID to find
+ * @getter: function to call on the found object
+ *
+ * The optional getter function allows to take a reference to the found object
+ * under the rcu lock. The function can also check if the object is still valid:
+ * if @getter returns false, then the object is invalid and NULL is returned.
+ *
+ * If the IOASID has been allocated for this set, return the private pointer
+ * passed to ioasid_alloc. Private data can be NULL if not set. Return an error
+ * if the IOASID is not found or does not belong to the set.
+ */
+void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
+		  bool (*getter)(void *))
+{
+	void *priv = NULL;
+	struct ioasid_data *ioasid_data;
+
+	rcu_read_lock();
+	ioasid_data = xa_load(&ioasid_xa, ioasid);
+	if (!ioasid_data) {
+		priv = ERR_PTR(-ENOENT);
+		goto unlock;
+	}
+	if (set && ioasid_data->set != set) {
+		/* data found but does not belong to the set */
+		priv = ERR_PTR(-EACCES);
+		goto unlock;
+	}
+	/* Now IOASID and its set is verified, we can return the private data */
+	priv = rcu_dereference(ioasid_data->private);
+	if (getter && !getter(priv))
+		priv = NULL;
+unlock:
+	rcu_read_unlock();
+
+	return priv;
+}
+EXPORT_SYMBOL_GPL(ioasid_find);
+
+MODULE_LICENSE("GPL");
diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
new file mode 100644
index 000000000000..940212422b8f
--- /dev/null
+++ b/include/linux/ioasid.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_IOASID_H
+#define __LINUX_IOASID_H
+
+#include <linux/types.h>
+
+#define INVALID_IOASID ((ioasid_t)-1)
+typedef unsigned int ioasid_t;
+
+struct ioasid_set {
+	int dummy;
+};
+
+#define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }
+
+#if IS_ENABLED(CONFIG_IOASID)
+ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
+		      void *private);
+void ioasid_free(ioasid_t ioasid);
+
+void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
+		  bool (*getter)(void *));
+
+int ioasid_set_data(ioasid_t ioasid, void *data);
+
+#else /* !CONFIG_IOASID */
+static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
+				    ioasid_t max, void *private)
+{
+	return INVALID_IOASID;
+}
+
+static inline void ioasid_free(ioasid_t ioasid)
+{
+}
+
+static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
+				bool (*getter)(void *))
+{
+	return NULL;
+}
+
+static inline int ioasid_set_data(ioasid_t ioasid, void *data)
+{
+	return -ENODEV;
+}
+
+#endif /* CONFIG_IOASID */
+#endif /* __LINUX_IOASID_H */
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 2/8] dt-bindings: document PASID property for IOMMU masters
  2019-06-10 18:47 [PATCH 0/8] iommu: Add auxiliary domain and PASID support to Arm SMMUv3 Jean-Philippe Brucker
  2019-06-10 18:47 ` [PATCH 1/8] iommu: Add I/O ASID allocator Jean-Philippe Brucker
@ 2019-06-10 18:47 ` Jean-Philippe Brucker
  2019-07-08  7:58   ` Auger Eric
  2019-06-10 18:47 ` [PATCH 3/8] iommu/arm-smmu-v3: Support platform SSID Jean-Philippe Brucker
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-10 18:47 UTC (permalink / raw)
  To: will.deacon
  Cc: joro, robh+dt, mark.rutland, robin.murphy, jacob.jun.pan, iommu,
	devicetree, linux-kernel, linux-arm-kernel, eric.auger

On Arm systems, some platform devices behind an SMMU may support the PASID
feature, which offers multiple address space. Let the firmware tell us
when a device supports PASID.

Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
Previous discussion on this patch last year:
https://patchwork.ozlabs.org/patch/872275/
I split PASID and stall definitions, keeping only PASID here.
---
 Documentation/devicetree/bindings/iommu/iommu.txt | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/iommu.txt b/Documentation/devicetree/bindings/iommu/iommu.txt
index 5a8b4624defc..3c36334e4f94 100644
--- a/Documentation/devicetree/bindings/iommu/iommu.txt
+++ b/Documentation/devicetree/bindings/iommu/iommu.txt
@@ -86,6 +86,12 @@ have a means to turn off translation. But it is invalid in such cases to
 disable the IOMMU's device tree node in the first place because it would
 prevent any driver from properly setting up the translations.
 
+Optional properties:
+--------------------
+- pasid-num-bits: Some masters support multiple address spaces for DMA, by
+  tagging DMA transactions with an address space identifier. By default,
+  this is 0, which means that the device only has one address space.
+
 
 Notes:
 ======
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 3/8] iommu/arm-smmu-v3: Support platform SSID
  2019-06-10 18:47 [PATCH 0/8] iommu: Add auxiliary domain and PASID support to Arm SMMUv3 Jean-Philippe Brucker
  2019-06-10 18:47 ` [PATCH 1/8] iommu: Add I/O ASID allocator Jean-Philippe Brucker
  2019-06-10 18:47 ` [PATCH 2/8] dt-bindings: document PASID property for IOMMU masters Jean-Philippe Brucker
@ 2019-06-10 18:47 ` Jean-Philippe Brucker
  2019-06-11  9:42   ` Jonathan Cameron
                     ` (2 more replies)
  2019-06-10 18:47 ` [PATCH 4/8] iommu/arm-smmu-v3: Add support for Substream IDs Jean-Philippe Brucker
                   ` (4 subsequent siblings)
  7 siblings, 3 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-10 18:47 UTC (permalink / raw)
  To: will.deacon
  Cc: joro, robh+dt, mark.rutland, robin.murphy, jacob.jun.pan, iommu,
	devicetree, linux-kernel, linux-arm-kernel, eric.auger

For platform devices that support SubstreamID (SSID), firmware provides
the number of supported SSID bits. Restrict it to what the SMMU supports
and cache it into master->ssid_bits.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 11 +++++++++++
 drivers/iommu/of_iommu.c    |  6 +++++-
 include/linux/iommu.h       |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 4d5a694f02c2..3254f473e681 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -604,6 +604,7 @@ struct arm_smmu_master {
 	struct list_head		domain_head;
 	u32				*sids;
 	unsigned int			num_sids;
+	unsigned int			ssid_bits;
 	bool				ats_enabled		:1;
 };
 
@@ -2097,6 +2098,16 @@ static int arm_smmu_add_device(struct device *dev)
 		}
 	}
 
+	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
+
+	/*
+	 * If the SMMU doesn't support 2-stage CD, limit the linear
+	 * tables to a reasonable number of contexts, let's say
+	 * 64kB / sizeof(ctx_desc) = 1024 = 2^10
+	 */
+	if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
+		master->ssid_bits = min(master->ssid_bits, 10U);
+
 	group = iommu_group_get_for_dev(dev);
 	if (!IS_ERR(group)) {
 		iommu_group_put(group);
diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index f04a6df65eb8..04f4f6b95d82 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -206,8 +206,12 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
 			if (err)
 				break;
 		}
-	}
 
+		fwspec = dev_iommu_fwspec_get(dev);
+		if (!err && fwspec)
+			of_property_read_u32(master_np, "pasid-num-bits",
+					     &fwspec->num_pasid_bits);
+	}
 
 	/*
 	 * Two success conditions can be represented by non-negative err here:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 519e40fb23ce..b91df613385f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -536,6 +536,7 @@ struct iommu_fwspec {
 	struct fwnode_handle	*iommu_fwnode;
 	void			*iommu_priv;
 	u32			flags;
+	u32			num_pasid_bits;
 	unsigned int		num_ids;
 	u32			ids[1];
 };
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 4/8] iommu/arm-smmu-v3: Add support for Substream IDs
  2019-06-10 18:47 [PATCH 0/8] iommu: Add auxiliary domain and PASID support to Arm SMMUv3 Jean-Philippe Brucker
                   ` (2 preceding siblings ...)
  2019-06-10 18:47 ` [PATCH 3/8] iommu/arm-smmu-v3: Support platform SSID Jean-Philippe Brucker
@ 2019-06-10 18:47 ` Jean-Philippe Brucker
  2019-06-11 10:19   ` Jonathan Cameron
                     ` (2 more replies)
  2019-06-10 18:47 ` [PATCH 5/8] iommu/arm-smmu-v3: Add second level of context descriptor table Jean-Philippe Brucker
                   ` (3 subsequent siblings)
  7 siblings, 3 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-10 18:47 UTC (permalink / raw)
  To: will.deacon
  Cc: joro, robh+dt, mark.rutland, robin.murphy, jacob.jun.pan, iommu,
	devicetree, linux-kernel, linux-arm-kernel, eric.auger

At the moment, the SMMUv3 driver implements only one stage-1 or stage-2
page directory per device. However SMMUv3 allows more than one address
space for some devices, by providing multiple stage-1 page directories. In
addition to the Stream ID (SID), that identifies a device, we can now have
Substream IDs (SSID) identifying an address space. In PCIe, SID is called
Requester ID (RID) and SSID is called Process Address-Space ID (PASID).

Prepare the driver for SSID support, by adding context descriptor tables
in STEs (previously a single static context descriptor). A complete
stage-1 walk is now performed like this by the SMMU:

      Stream tables          Ctx. tables          Page tables
        +--------+   ,------->+-------+   ,------->+-------+
        :        :   |        :       :   |        :       :
        +--------+   |        +-------+   |        +-------+
   SID->|  STE   |---'  SSID->|  CD   |---'  IOVA->|  PTE  |--> IPA
        +--------+            +-------+            +-------+
        :        :            :       :            :       :
        +--------+            +-------+            +-------+

Implement a single level of context descriptor table for now, but as with
stream and page tables, an SSID can be split to index multiple levels of
tables.

In all stream table entries, we set S1DSS=SSID0 mode, making translations
without an SSID use context descriptor 0. Although it would be possible by
setting S1DSS=BYPASS, we don't currently support SSID when user selects
iommu.passthrough.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 238 +++++++++++++++++++++++++++++-------
 1 file changed, 192 insertions(+), 46 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 3254f473e681..d90eb604b65d 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -219,6 +219,11 @@
 #define STRTAB_STE_0_S1CTXPTR_MASK	GENMASK_ULL(51, 6)
 #define STRTAB_STE_0_S1CDMAX		GENMASK_ULL(63, 59)
 
+#define STRTAB_STE_1_S1DSS		GENMASK_ULL(1, 0)
+#define STRTAB_STE_1_S1DSS_TERMINATE	0x0
+#define STRTAB_STE_1_S1DSS_BYPASS	0x1
+#define STRTAB_STE_1_S1DSS_SSID0	0x2
+
 #define STRTAB_STE_1_S1C_CACHE_NC	0UL
 #define STRTAB_STE_1_S1C_CACHE_WBRA	1UL
 #define STRTAB_STE_1_S1C_CACHE_WT	2UL
@@ -305,6 +310,7 @@
 #define CMDQ_PREFETCH_1_SIZE		GENMASK_ULL(4, 0)
 #define CMDQ_PREFETCH_1_ADDR_MASK	GENMASK_ULL(63, 12)
 
+#define CMDQ_CFGI_0_SSID		GENMASK_ULL(31, 12)
 #define CMDQ_CFGI_0_SID			GENMASK_ULL(63, 32)
 #define CMDQ_CFGI_1_LEAF		(1UL << 0)
 #define CMDQ_CFGI_1_RANGE		GENMASK_ULL(4, 0)
@@ -421,8 +427,11 @@ struct arm_smmu_cmdq_ent {
 
 		#define CMDQ_OP_CFGI_STE	0x3
 		#define CMDQ_OP_CFGI_ALL	0x4
+		#define CMDQ_OP_CFGI_CD		0x5
+		#define CMDQ_OP_CFGI_CD_ALL	0x6
 		struct {
 			u32			sid;
+			u32			ssid;
 			union {
 				bool		leaf;
 				u8		span;
@@ -506,16 +515,25 @@ struct arm_smmu_strtab_l1_desc {
 	dma_addr_t			l2ptr_dma;
 };
 
+struct arm_smmu_cd_table {
+	__le64				*ptr;
+	dma_addr_t			ptr_dma;
+};
+
+struct arm_smmu_ctx_desc {
+	u16				asid;
+	u64				ttbr;
+	u64				tcr;
+	u64				mair;
+};
+
 struct arm_smmu_s1_cfg {
-	__le64				*cdptr;
-	dma_addr_t			cdptr_dma;
-
-	struct arm_smmu_ctx_desc {
-		u16	asid;
-		u64	ttbr;
-		u64	tcr;
-		u64	mair;
-	}				cd;
+	u8				s1fmt;
+	u8				s1cdmax;
+	struct arm_smmu_cd_table	table;
+
+	/* Context descriptor 0, when substreams are disabled or s1dss = 0b10 */
+	struct arm_smmu_ctx_desc	cd;
 };
 
 struct arm_smmu_s2_cfg {
@@ -811,10 +829,16 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		cmd[1] |= FIELD_PREP(CMDQ_PREFETCH_1_SIZE, ent->prefetch.size);
 		cmd[1] |= ent->prefetch.addr & CMDQ_PREFETCH_1_ADDR_MASK;
 		break;
+	case CMDQ_OP_CFGI_CD:
+		cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SSID, ent->cfgi.ssid);
+		/* Fallthrough */
 	case CMDQ_OP_CFGI_STE:
 		cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, ent->cfgi.sid);
 		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_LEAF, ent->cfgi.leaf);
 		break;
+	case CMDQ_OP_CFGI_CD_ALL:
+		cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, ent->cfgi.sid);
+		break;
 	case CMDQ_OP_CFGI_ALL:
 		/* Cover the entire SID range */
 		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
@@ -1045,6 +1069,63 @@ static int arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
 }
 
 /* Context descriptor manipulation functions */
+static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
+			     int ssid, bool leaf)
+{
+	size_t i;
+	unsigned long flags;
+	struct arm_smmu_master *master;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct arm_smmu_cmdq_ent cmd = {
+		.opcode	= CMDQ_OP_CFGI_CD,
+		.cfgi	= {
+			.ssid	= ssid,
+			.leaf	= leaf,
+		},
+	};
+
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
+		for (i = 0; i < master->num_sids; i++) {
+			cmd.cfgi.sid = master->sids[i];
+			arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+		}
+	}
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+
+	arm_smmu_cmdq_issue_sync(smmu);
+}
+
+static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
+					struct arm_smmu_cd_table *table,
+					size_t num_entries)
+{
+	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
+
+	table->ptr = dmam_alloc_coherent(smmu->dev, size, &table->ptr_dma,
+					 GFP_KERNEL | __GFP_ZERO);
+	if (!table->ptr) {
+		dev_warn(smmu->dev,
+			 "failed to allocate context descriptor table\n");
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+static void arm_smmu_free_cd_leaf_table(struct arm_smmu_device *smmu,
+					struct arm_smmu_cd_table *table,
+					size_t num_entries)
+{
+	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
+
+	dmam_free_coherent(smmu->dev, size, table->ptr, table->ptr_dma);
+}
+
+static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_s1_cfg *cfg, u32 ssid)
+{
+	return cfg->table.ptr + ssid * CTXDESC_CD_DWORDS;
+}
+
 static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
 {
 	u64 val = 0;
@@ -1062,33 +1143,90 @@ static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
 	return val;
 }
 
-static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
-				    struct arm_smmu_s1_cfg *cfg)
+static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
+				   int ssid, struct arm_smmu_ctx_desc *cd)
 {
 	u64 val;
+	bool cd_live;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	__le64 *cdptr = arm_smmu_get_cd_ptr(&smmu_domain->s1_cfg, ssid);
 
 	/*
-	 * We don't need to issue any invalidation here, as we'll invalidate
-	 * the STE when installing the new entry anyway.
+	 * This function handles the following cases:
+	 *
+	 * (1) Install primary CD, for normal DMA traffic (SSID = 0).
+	 * (2) Install a secondary CD, for SID+SSID traffic.
+	 * (3) Update ASID of a CD. Atomically write the first 64 bits of the
+	 *     CD, then invalidate the old entry and mappings.
+	 * (4) Remove a secondary CD.
 	 */
-	val = arm_smmu_cpu_tcr_to_cd(cfg->cd.tcr) |
+
+	if (!cdptr)
+		return -ENOMEM;
+
+	val = le64_to_cpu(cdptr[0]);
+	cd_live = !!(val & CTXDESC_CD_0_V);
+
+	if (!cd) { /* (4) */
+		cdptr[0] = 0;
+	} else if (cd_live) { /* (3) */
+		val &= ~CTXDESC_CD_0_ASID;
+		val |= FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid);
+
+		cdptr[0] = cpu_to_le64(val);
+		/*
+		 * Until CD+TLB invalidation, both ASIDs may be used for tagging
+		 * this substream's traffic
+		 */
+	} else { /* (1) and (2) */
+		cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
+		cdptr[2] = 0;
+		cdptr[3] = cpu_to_le64(cd->mair);
+
+		/*
+		 * STE is live, and the SMMU might fetch this CD at any
+		 * time. Ensure that it observes the rest of the CD before we
+		 * enable it.
+		 */
+		arm_smmu_sync_cd(smmu_domain, ssid, true);
+
+		val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
 #ifdef __BIG_ENDIAN
-	      CTXDESC_CD_0_ENDI |
+			CTXDESC_CD_0_ENDI |
 #endif
-	      CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET |
-	      CTXDESC_CD_0_AA64 | FIELD_PREP(CTXDESC_CD_0_ASID, cfg->cd.asid) |
-	      CTXDESC_CD_0_V;
+			CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET |
+			CTXDESC_CD_0_AA64 |
+			FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) |
+			CTXDESC_CD_0_V;
+
+		/* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
+		if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
+			val |= CTXDESC_CD_0_S;
+
+		cdptr[0] = cpu_to_le64(val);
+	}
 
-	/* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
-	if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
-		val |= CTXDESC_CD_0_S;
+	arm_smmu_sync_cd(smmu_domain, ssid, true);
+	return 0;
+}
+
+static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain,
+				    struct arm_smmu_master *master)
+{
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 
-	cfg->cdptr[0] = cpu_to_le64(val);
+	cfg->s1fmt = STRTAB_STE_0_S1FMT_LINEAR;
+	cfg->s1cdmax = master->ssid_bits;
+	return arm_smmu_alloc_cd_leaf_table(smmu, &cfg->table, 1 << cfg->s1cdmax);
+}
 
-	val = cfg->cd.ttbr & CTXDESC_CD_1_TTB0_MASK;
-	cfg->cdptr[1] = cpu_to_le64(val);
+static void arm_smmu_free_cd_tables(struct arm_smmu_domain *smmu_domain)
+{
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 
-	cfg->cdptr[3] = cpu_to_le64(cfg->cd.mair);
+	arm_smmu_free_cd_leaf_table(smmu, &cfg->table, 1 << cfg->s1cdmax);
 }
 
 /* Stream table manipulation functions */
@@ -1210,6 +1348,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 	if (s1_cfg) {
 		BUG_ON(ste_live);
 		dst[1] = cpu_to_le64(
+			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
 			 FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
 			 FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
 			 FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
@@ -1219,8 +1358,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
 			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 
-		val |= (s1_cfg->cdptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
-			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS);
+		val |= (s1_cfg->table.ptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
+			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
+			FIELD_PREP(STRTAB_STE_0_S1CDMAX, s1_cfg->s1cdmax) |
+			FIELD_PREP(STRTAB_STE_0_S1FMT, s1_cfg->s1fmt);
 	}
 
 	if (s2_cfg) {
@@ -1674,12 +1815,8 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 
-		if (cfg->cdptr) {
-			dmam_free_coherent(smmu_domain->smmu->dev,
-					   CTXDESC_CD_DWORDS << 3,
-					   cfg->cdptr,
-					   cfg->cdptr_dma);
-
+		if (cfg->table.ptr) {
+			arm_smmu_free_cd_tables(smmu_domain);
 			arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
 		}
 	} else {
@@ -1692,6 +1829,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 }
 
 static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
+				       struct arm_smmu_master *master,
 				       struct io_pgtable_cfg *pgtbl_cfg)
 {
 	int ret;
@@ -1703,27 +1841,29 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 	if (asid < 0)
 		return asid;
 
-	cfg->cdptr = dmam_alloc_coherent(smmu->dev, CTXDESC_CD_DWORDS << 3,
-					 &cfg->cdptr_dma,
-					 GFP_KERNEL | __GFP_ZERO);
-	if (!cfg->cdptr) {
-		dev_warn(smmu->dev, "failed to allocate context descriptor\n");
-		ret = -ENOMEM;
+	ret = arm_smmu_alloc_cd_tables(smmu_domain, master);
+	if (ret)
 		goto out_free_asid;
-	}
 
 	cfg->cd.asid	= (u16)asid;
 	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
 	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
 	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
+
+	ret = arm_smmu_write_ctx_desc(smmu_domain, 0, &smmu_domain->s1_cfg.cd);
+	if (ret)
+		goto out_free_table;
 	return 0;
 
+out_free_table:
+	arm_smmu_free_cd_tables(smmu_domain);
 out_free_asid:
 	arm_smmu_bitmap_free(smmu->asid_map, asid);
 	return ret;
 }
 
 static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
+				       struct arm_smmu_master *master,
 				       struct io_pgtable_cfg *pgtbl_cfg)
 {
 	int vmid;
@@ -1740,7 +1880,8 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
 	return 0;
 }
 
-static int arm_smmu_domain_finalise(struct iommu_domain *domain)
+static int arm_smmu_domain_finalise(struct iommu_domain *domain,
+				    struct arm_smmu_master *master)
 {
 	int ret;
 	unsigned long ias, oas;
@@ -1748,6 +1889,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
 	struct io_pgtable_cfg pgtbl_cfg;
 	struct io_pgtable_ops *pgtbl_ops;
 	int (*finalise_stage_fn)(struct arm_smmu_domain *,
+				 struct arm_smmu_master *,
 				 struct io_pgtable_cfg *);
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
@@ -1804,7 +1946,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
 	domain->geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1;
 	domain->geometry.force_aperture = true;
 
-	ret = finalise_stage_fn(smmu_domain, &pgtbl_cfg);
+	ret = finalise_stage_fn(smmu_domain, master, &pgtbl_cfg);
 	if (ret < 0) {
 		free_io_pgtable_ops(pgtbl_ops);
 		return ret;
@@ -1932,7 +2074,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 
 	if (!smmu_domain->smmu) {
 		smmu_domain->smmu = smmu;
-		ret = arm_smmu_domain_finalise(domain);
+		ret = arm_smmu_domain_finalise(domain, master);
 		if (ret) {
 			smmu_domain->smmu = NULL;
 			goto out_unlock;
@@ -1944,6 +2086,13 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 			dev_name(smmu->dev));
 		ret = -ENXIO;
 		goto out_unlock;
+	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 &&
+		   master->ssid_bits != smmu_domain->s1_cfg.s1cdmax) {
+		dev_err(dev,
+			"cannot attach to incompatible domain (%u SSID bits != %u)\n",
+			smmu_domain->s1_cfg.s1cdmax, master->ssid_bits);
+		ret = -EINVAL;
+		goto out_unlock;
 	}
 
 	master->domain = smmu_domain;
@@ -1955,9 +2104,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	if (smmu_domain->stage != ARM_SMMU_DOMAIN_BYPASS)
 		arm_smmu_enable_ats(master);
 
-	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1)
-		arm_smmu_write_ctx_desc(smmu, &smmu_domain->s1_cfg);
-
 	arm_smmu_install_ste_for_dev(master);
 out_unlock:
 	mutex_unlock(&smmu_domain->init_mutex);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 5/8] iommu/arm-smmu-v3: Add second level of context descriptor table
  2019-06-10 18:47 [PATCH 0/8] iommu: Add auxiliary domain and PASID support to Arm SMMUv3 Jean-Philippe Brucker
                   ` (3 preceding siblings ...)
  2019-06-10 18:47 ` [PATCH 4/8] iommu/arm-smmu-v3: Add support for Substream IDs Jean-Philippe Brucker
@ 2019-06-10 18:47 ` Jean-Philippe Brucker
  2019-06-11 10:24   ` Jonathan Cameron
  2019-07-08 15:13   ` Auger Eric
  2019-06-10 18:47 ` [PATCH 6/8] iommu/arm-smmu-v3: Support auxiliary domains Jean-Philippe Brucker
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-10 18:47 UTC (permalink / raw)
  To: will.deacon
  Cc: joro, robh+dt, mark.rutland, robin.murphy, jacob.jun.pan, iommu,
	devicetree, linux-kernel, linux-arm-kernel, eric.auger

The SMMU can support up to 20 bits of SSID. Add a second level of page
tables to accommodate this. Devices that support more than 1024 SSIDs now
have a table of 1024 L1 entries (8kB), pointing to tables of 1024 context
descriptors (64kB), allocated on demand.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 136 +++++++++++++++++++++++++++++++++---
 1 file changed, 128 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index d90eb604b65d..326b71793336 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -216,6 +216,8 @@
 
 #define STRTAB_STE_0_S1FMT		GENMASK_ULL(5, 4)
 #define STRTAB_STE_0_S1FMT_LINEAR	0
+#define STRTAB_STE_0_S1FMT_4K_L2	1
+#define STRTAB_STE_0_S1FMT_64K_L2	2
 #define STRTAB_STE_0_S1CTXPTR_MASK	GENMASK_ULL(51, 6)
 #define STRTAB_STE_0_S1CDMAX		GENMASK_ULL(63, 59)
 
@@ -255,6 +257,18 @@
 
 #define STRTAB_STE_3_S2TTB_MASK		GENMASK_ULL(51, 4)
 
+/*
+ * Linear: when less than 1024 SSIDs are supported
+ * 2lvl: at most 1024 L1 entrie,
+ *      1024 lazy entries per table.
+ */
+#define CTXDESC_SPLIT			10
+#define CTXDESC_NUM_L2_ENTRIES		(1 << CTXDESC_SPLIT)
+
+#define CTXDESC_L1_DESC_DWORD		1
+#define CTXDESC_L1_DESC_VALID		1
+#define CTXDESC_L1_DESC_L2PTR_MASK	GENMASK_ULL(51, 12)
+
 /* Context descriptor (stage-1 only) */
 #define CTXDESC_CD_DWORDS		8
 #define CTXDESC_CD_0_TCR_T0SZ		GENMASK_ULL(5, 0)
@@ -530,7 +544,10 @@ struct arm_smmu_ctx_desc {
 struct arm_smmu_s1_cfg {
 	u8				s1fmt;
 	u8				s1cdmax;
-	struct arm_smmu_cd_table	table;
+	struct arm_smmu_cd_table	*tables;
+	size_t				num_tables;
+	__le64				*l1ptr;
+	dma_addr_t			l1ptr_dma;
 
 	/* Context descriptor 0, when substreams are disabled or s1dss = 0b10 */
 	struct arm_smmu_ctx_desc	cd;
@@ -1118,12 +1135,51 @@ static void arm_smmu_free_cd_leaf_table(struct arm_smmu_device *smmu,
 {
 	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
 
+	if (!table->ptr)
+		return;
 	dmam_free_coherent(smmu->dev, size, table->ptr, table->ptr_dma);
 }
 
-static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_s1_cfg *cfg, u32 ssid)
+static void arm_smmu_write_cd_l1_desc(__le64 *dst,
+				      struct arm_smmu_cd_table *table)
 {
-	return cfg->table.ptr + ssid * CTXDESC_CD_DWORDS;
+	u64 val = (table->ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) |
+		  CTXDESC_L1_DESC_VALID;
+
+	*dst = cpu_to_le64(val);
+}
+
+static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_domain *smmu_domain,
+				   u32 ssid)
+{
+	unsigned int idx;
+	struct arm_smmu_cd_table *table;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
+
+	if (cfg->s1fmt == STRTAB_STE_0_S1FMT_LINEAR) {
+		table = &cfg->tables[0];
+		idx = ssid;
+	} else {
+		idx = ssid >> CTXDESC_SPLIT;
+		if (idx >= cfg->num_tables)
+			return NULL;
+
+		table = &cfg->tables[idx];
+		if (!table->ptr) {
+			__le64 *l1ptr = cfg->l1ptr + idx * CTXDESC_L1_DESC_DWORD;
+
+			if (arm_smmu_alloc_cd_leaf_table(smmu, table,
+							 CTXDESC_NUM_L2_ENTRIES))
+				return NULL;
+
+			arm_smmu_write_cd_l1_desc(l1ptr, table);
+			/* An invalid L1 entry is allowed to be cached */
+			arm_smmu_sync_cd(smmu_domain, ssid, false);
+		}
+		idx = ssid & (CTXDESC_NUM_L2_ENTRIES - 1);
+	}
+	return table->ptr + idx * CTXDESC_CD_DWORDS;
 }
 
 static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
@@ -1149,7 +1205,7 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 	u64 val;
 	bool cd_live;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
-	__le64 *cdptr = arm_smmu_get_cd_ptr(&smmu_domain->s1_cfg, ssid);
+	__le64 *cdptr = arm_smmu_get_cd_ptr(smmu_domain, ssid);
 
 	/*
 	 * This function handles the following cases:
@@ -1213,20 +1269,81 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain,
 				    struct arm_smmu_master *master)
 {
+	int ret;
+	size_t size = 0;
+	size_t max_contexts, num_leaf_entries;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 
 	cfg->s1fmt = STRTAB_STE_0_S1FMT_LINEAR;
 	cfg->s1cdmax = master->ssid_bits;
-	return arm_smmu_alloc_cd_leaf_table(smmu, &cfg->table, 1 << cfg->s1cdmax);
+
+	max_contexts = 1 << cfg->s1cdmax;
+	if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB) ||
+	    max_contexts <= CTXDESC_NUM_L2_ENTRIES) {
+		cfg->s1fmt = STRTAB_STE_0_S1FMT_LINEAR;
+		cfg->num_tables = 1;
+		num_leaf_entries = max_contexts;
+	} else {
+		cfg->s1fmt = STRTAB_STE_0_S1FMT_64K_L2;
+		/*
+		 * SSID[S1CDmax-1:10] indexes 1st-level table, SSID[9:0] indexes
+		 * 2nd-level
+		 */
+		cfg->num_tables = max_contexts / CTXDESC_NUM_L2_ENTRIES;
+
+		size = cfg->num_tables * (CTXDESC_L1_DESC_DWORD << 3);
+		cfg->l1ptr = dmam_alloc_coherent(smmu->dev, size,
+						 &cfg->l1ptr_dma,
+						 GFP_KERNEL | __GFP_ZERO);
+		if (!cfg->l1ptr) {
+			dev_warn(smmu->dev, "failed to allocate L1 context table\n");
+			return -ENOMEM;
+		}
+
+		num_leaf_entries = CTXDESC_NUM_L2_ENTRIES;
+	}
+
+	cfg->tables = devm_kzalloc(smmu->dev, sizeof(struct arm_smmu_cd_table) *
+				   cfg->num_tables, GFP_KERNEL);
+	if (!cfg->tables)
+		return -ENOMEM;
+
+	ret = arm_smmu_alloc_cd_leaf_table(smmu, &cfg->tables[0], num_leaf_entries);
+	if (ret)
+		goto err_free_l1;
+
+	if (cfg->l1ptr)
+		arm_smmu_write_cd_l1_desc(cfg->l1ptr, &cfg->tables[0]);
+
+	return 0;
+
+err_free_l1:
+	if (cfg->l1ptr)
+		dmam_free_coherent(smmu->dev, size, cfg->l1ptr, cfg->l1ptr_dma);
+	devm_kfree(smmu->dev, cfg->tables);
+	return ret;
 }
 
 static void arm_smmu_free_cd_tables(struct arm_smmu_domain *smmu_domain)
 {
+	int i;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
+	size_t num_leaf_entries = 1 << cfg->s1cdmax;
+	struct arm_smmu_cd_table *table = cfg->tables;
+
+	if (cfg->l1ptr) {
+		size_t size = cfg->num_tables * (CTXDESC_L1_DESC_DWORD << 3);
 
-	arm_smmu_free_cd_leaf_table(smmu, &cfg->table, 1 << cfg->s1cdmax);
+		dmam_free_coherent(smmu->dev, size, cfg->l1ptr,
+				   cfg->l1ptr_dma);
+		num_leaf_entries = CTXDESC_NUM_L2_ENTRIES;
+	}
+
+	for (i = 0; i < cfg->num_tables; i++, table++)
+		arm_smmu_free_cd_leaf_table(smmu, table, num_leaf_entries);
+	devm_kfree(smmu->dev, cfg->tables);
 }
 
 /* Stream table manipulation functions */
@@ -1346,6 +1463,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 	}
 
 	if (s1_cfg) {
+		dma_addr_t ptr_dma = s1_cfg->l1ptr ? s1_cfg->l1ptr_dma :
+			             s1_cfg->tables[0].ptr_dma;
+
 		BUG_ON(ste_live);
 		dst[1] = cpu_to_le64(
 			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
@@ -1358,7 +1478,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
 			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 
-		val |= (s1_cfg->table.ptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
+		val |= (ptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
 			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
 			FIELD_PREP(STRTAB_STE_0_S1CDMAX, s1_cfg->s1cdmax) |
 			FIELD_PREP(STRTAB_STE_0_S1FMT, s1_cfg->s1fmt);
@@ -1815,7 +1935,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 
-		if (cfg->table.ptr) {
+		if (cfg->tables) {
 			arm_smmu_free_cd_tables(smmu_domain);
 			arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
 		}
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 6/8] iommu/arm-smmu-v3: Support auxiliary domains
  2019-06-10 18:47 [PATCH 0/8] iommu: Add auxiliary domain and PASID support to Arm SMMUv3 Jean-Philippe Brucker
                   ` (4 preceding siblings ...)
  2019-06-10 18:47 ` [PATCH 5/8] iommu/arm-smmu-v3: Add second level of context descriptor table Jean-Philippe Brucker
@ 2019-06-10 18:47 ` Jean-Philippe Brucker
  2019-06-26 17:59   ` Will Deacon
  2019-06-10 18:47 ` [PATCH 7/8] iommu/arm-smmu-v3: Improve add_device() error handling Jean-Philippe Brucker
  2019-06-10 18:47 ` [PATCH 8/8] iommu/arm-smmu-v3: Add support for PCI PASID Jean-Philippe Brucker
  7 siblings, 1 reply; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-10 18:47 UTC (permalink / raw)
  To: will.deacon
  Cc: joro, robh+dt, mark.rutland, robin.murphy, jacob.jun.pan, iommu,
	devicetree, linux-kernel, linux-arm-kernel, eric.auger

In commit a3a195929d40 ("iommu: Add APIs for multiple domains per
device"), the IOMMU API gained the concept of auxiliary domains (AUXD),
which allows to control the PASID-tagged address spaces of a device. With
AUXD the PASID address space are not shared with the CPU, but are instead
modified with iommu_map() and iommu_unmap() calls on auxiliary domains.

Add auxiliary domain support to the SMMUv3 driver. Device drivers allocate
an unmanaged IOMMU domain with iommu_domain_alloc(), and attach it to the
device with iommu_aux_attach_domain().

The AUXD API is fairly permissive, and allows to attach an IOMMU domain in
both normal and auxiliary mode at the same time - one device can be
attached to the domain normally, and another device can be attached
through one of its PASIDs. To avoid excessive complexity in the SMMU
implementation we pose some restrictions on supported AUXD usage:

* A domain is either in auxiliary mode or normal mode. And that state is
  sticky. Once detached the domain has to be re-attached in the same mode.

* An auxiliary domain can have a single parent domain. Two devices can be
  attached to the same auxiliary domain only if they are attached to the
  same parent domain.

In practice these shouldn't be problematic, since we have the same kind of
restriction on normal domains and users have been able to cope so far: at
the moment a domain cannot be attached to two devices behind different
SMMUs. When VFIO puts two such devices in the same container, it simply
falls back to allocating two separate IOMMU domains.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/Kconfig       |   1 +
 drivers/iommu/arm-smmu-v3.c | 276 +++++++++++++++++++++++++++++++++---
 2 files changed, 260 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 9b45f70549a7..d326fef3d3a6 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -393,6 +393,7 @@ config ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT
 config ARM_SMMU_V3
 	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
 	depends on ARM64
+	select IOASID
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
 	select GENERIC_MSI_IRQ_DOMAIN
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 326b71793336..633d829f246f 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -19,6 +19,7 @@
 #include <linux/err.h>
 #include <linux/interrupt.h>
 #include <linux/io-pgtable.h>
+#include <linux/ioasid.h>
 #include <linux/iommu.h>
 #include <linux/iopoll.h>
 #include <linux/init.h>
@@ -641,6 +642,7 @@ struct arm_smmu_master {
 	unsigned int			num_sids;
 	unsigned int			ssid_bits;
 	bool				ats_enabled		:1;
+	bool				auxd_enabled		:1;
 };
 
 /* SMMU private data for an IOMMU domain */
@@ -666,8 +668,14 @@ struct arm_smmu_domain {
 
 	struct iommu_domain		domain;
 
+	/* Unused in aux domains */
 	struct list_head		devices;
 	spinlock_t			devices_lock;
+
+	/* Auxiliary domain stuff */
+	struct arm_smmu_domain		*parent;
+	ioasid_t			ssid;
+	unsigned long			aux_nr_devs;
 };
 
 struct arm_smmu_option_prop {
@@ -675,6 +683,8 @@ struct arm_smmu_option_prop {
 	const char *prop;
 };
 
+static DECLARE_IOASID_SET(private_ioasid);
+
 static struct arm_smmu_option_prop arm_smmu_options[] = {
 	{ ARM_SMMU_OPT_SKIP_PREFETCH, "hisilicon,broken-prefetch-cmd" },
 	{ ARM_SMMU_OPT_PAGE0_REGS_ONLY, "cavium,cn9900-broken-page1-regspace"},
@@ -696,6 +706,15 @@ static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
 	return container_of(dom, struct arm_smmu_domain, domain);
 }
 
+static struct arm_smmu_master *dev_to_master(struct device *dev)
+{
+	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+
+	if (!fwspec)
+		return NULL;
+	return fwspec->iommu_priv;
+}
+
 static void parse_driver_options(struct arm_smmu_device *smmu)
 {
 	int i = 0;
@@ -1776,13 +1795,19 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master *master,
 }
 
 static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
-				   int ssid, unsigned long iova, size_t size)
+				   unsigned long iova, size_t size)
 {
 	int ret = 0;
+	unsigned int ssid = 0;
 	unsigned long flags;
 	struct arm_smmu_cmdq_ent cmd;
 	struct arm_smmu_master *master;
 
+	if (smmu_domain->parent) {
+		ssid = smmu_domain->ssid;
+		smmu_domain = smmu_domain->parent;
+	}
+
 	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS))
 		return 0;
 
@@ -1935,10 +1960,12 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 
-		if (cfg->tables) {
+		if (cfg->tables)
 			arm_smmu_free_cd_tables(smmu_domain);
+		if (cfg->cd.asid)
 			arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
-		}
+		if (smmu_domain->ssid)
+			ioasid_free(smmu_domain->ssid);
 	} else {
 		struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
 		if (cfg->vmid)
@@ -1948,11 +1975,10 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 	kfree(smmu_domain);
 }
 
-static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
+static int arm_smmu_domain_finalise_cd(struct arm_smmu_domain *smmu_domain,
 				       struct arm_smmu_master *master,
 				       struct io_pgtable_cfg *pgtbl_cfg)
 {
-	int ret;
 	int asid;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
@@ -1961,16 +1987,30 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 	if (asid < 0)
 		return asid;
 
-	ret = arm_smmu_alloc_cd_tables(smmu_domain, master);
-	if (ret)
-		goto out_free_asid;
-
 	cfg->cd.asid	= (u16)asid;
 	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
 	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
 	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
+	return 0;
+}
+
+static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
+				       struct arm_smmu_master *master,
+				       struct io_pgtable_cfg *pgtbl_cfg)
+{
+	int ret;
+	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+	ret = arm_smmu_domain_finalise_cd(smmu_domain, master, pgtbl_cfg);
+	if (ret)
+		return ret;
+
+	ret = arm_smmu_alloc_cd_tables(smmu_domain, master);
+	if (ret)
+		goto out_free_asid;
 
-	ret = arm_smmu_write_ctx_desc(smmu_domain, 0, &smmu_domain->s1_cfg.cd);
+	ret = arm_smmu_write_ctx_desc(smmu_domain, 0, &cfg->cd);
 	if (ret)
 		goto out_free_table;
 	return 0;
@@ -1978,7 +2018,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 out_free_table:
 	arm_smmu_free_cd_tables(smmu_domain);
 out_free_asid:
-	arm_smmu_bitmap_free(smmu->asid_map, asid);
+	arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
 	return ret;
 }
 
@@ -2031,7 +2071,10 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
 		ias = min_t(unsigned long, ias, VA_BITS);
 		oas = smmu->ias;
 		fmt = ARM_64_LPAE_S1;
-		finalise_stage_fn = arm_smmu_domain_finalise_s1;
+		if (smmu_domain->parent)
+			finalise_stage_fn = arm_smmu_domain_finalise_cd;
+		else
+			finalise_stage_fn = arm_smmu_domain_finalise_s1;
 		break;
 	case ARM_SMMU_DOMAIN_NESTED:
 	case ARM_SMMU_DOMAIN_S2:
@@ -2177,15 +2220,13 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
 	int ret = 0;
 	unsigned long flags;
-	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
 	struct arm_smmu_device *smmu;
+	struct arm_smmu_master *master = dev_to_master(dev);
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
-	struct arm_smmu_master *master;
 
-	if (!fwspec)
+	if (!master)
 		return -ENOENT;
 
-	master = fwspec->iommu_priv;
 	smmu = master->smmu;
 
 	arm_smmu_detach_dev(master);
@@ -2213,6 +2254,10 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 			smmu_domain->s1_cfg.s1cdmax, master->ssid_bits);
 		ret = -EINVAL;
 		goto out_unlock;
+	} else if (smmu_domain->parent) {
+		dev_err(dev, "cannot attach auxiliary domain\n");
+		ret = -EINVAL;
+		goto out_unlock;
 	}
 
 	master->domain = smmu_domain;
@@ -2252,7 +2297,7 @@ arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 		return 0;
 
 	ret = ops->unmap(ops, iova, size);
-	if (ret && arm_smmu_atc_inv_domain(smmu_domain, 0, iova, size))
+	if (ret && arm_smmu_atc_inv_domain(smmu_domain, iova, size))
 		return 0;
 
 	return ret;
@@ -2521,6 +2566,194 @@ static void arm_smmu_put_resv_regions(struct device *dev,
 		kfree(entry);
 }
 
+static bool arm_smmu_dev_has_feature(struct device *dev,
+				     enum iommu_dev_features feat)
+{
+	struct arm_smmu_master *master = dev_to_master(dev);
+
+	if (!master)
+		return false;
+
+	switch (feat) {
+	case IOMMU_DEV_FEAT_AUX:
+		return master->ssid_bits != 0;
+	default:
+		return false;
+	}
+}
+
+static bool arm_smmu_dev_feature_enabled(struct device *dev,
+					 enum iommu_dev_features feat)
+{
+	struct arm_smmu_master *master = dev_to_master(dev);
+
+	if (!master)
+		return false;
+
+	switch (feat) {
+	case IOMMU_DEV_FEAT_AUX:
+		return master->auxd_enabled;
+	default:
+		return false;
+	}
+}
+
+static int arm_smmu_dev_enable_feature(struct device *dev,
+				       enum iommu_dev_features feat)
+{
+	struct arm_smmu_master *master = dev_to_master(dev);
+
+	if (!arm_smmu_dev_has_feature(dev, feat))
+		return -ENODEV;
+
+	if (arm_smmu_dev_feature_enabled(dev, feat))
+		return -EBUSY;
+
+	switch (feat) {
+	case IOMMU_DEV_FEAT_AUX:
+		master->auxd_enabled = true;
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
+static int arm_smmu_dev_disable_feature(struct device *dev,
+					enum iommu_dev_features feat)
+{
+	struct arm_smmu_master *master = dev_to_master(dev);
+
+	if (!arm_smmu_dev_feature_enabled(dev, feat))
+		return -EINVAL;
+
+	switch (feat) {
+	case IOMMU_DEV_FEAT_AUX:
+		/* TODO: check if aux domains are still attached? */
+		master->auxd_enabled = false;
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
+static int arm_smmu_aux_attach_dev(struct iommu_domain *domain, struct device *dev)
+{
+	int ret;
+	struct iommu_domain *parent_domain;
+	struct arm_smmu_domain *parent_smmu_domain;
+	struct arm_smmu_master *master = dev_to_master(dev);
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+	if (!arm_smmu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX))
+		return -EINVAL;
+
+	parent_domain = iommu_get_domain_for_dev(dev);
+	if (!parent_domain)
+		return -EINVAL;
+	parent_smmu_domain = to_smmu_domain(parent_domain);
+
+	mutex_lock(&smmu_domain->init_mutex);
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1 ||
+	    parent_smmu_domain->stage != ARM_SMMU_DOMAIN_S1) {
+		ret = -EINVAL;
+		goto out_unlock;
+	} else if (smmu_domain->s1_cfg.tables) {
+		/* Already attached as a normal domain */
+		dev_err(dev, "cannot attach domain in auxiliary mode\n");
+		ret = -EINVAL;
+		goto out_unlock;
+	} else if (!smmu_domain->smmu) {
+		ioasid_t ssid = ioasid_alloc(&private_ioasid, 1,
+					     (1UL << master->ssid_bits) - 1,
+					     NULL);
+		if (ssid == INVALID_IOASID) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+		smmu_domain->smmu = master->smmu;
+		smmu_domain->parent = parent_smmu_domain;
+		smmu_domain->ssid = ssid;
+
+		ret = arm_smmu_domain_finalise(domain, master);
+		if (ret) {
+			smmu_domain->smmu = NULL;
+			smmu_domain->ssid = 0;
+			smmu_domain->parent = NULL;
+			ioasid_free(ssid);
+			goto out_unlock;
+		}
+	} else if (smmu_domain->parent != parent_smmu_domain) {
+		/* Additional restriction: an aux domain has a single parent */
+		dev_err(dev, "cannot attach aux domain with different parent\n");
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	if (!smmu_domain->aux_nr_devs++)
+		arm_smmu_write_ctx_desc(parent_smmu_domain, smmu_domain->ssid,
+					&smmu_domain->s1_cfg.cd);
+	/*
+	 * Note that all other devices attached to the parent domain can now
+	 * access this context as well.
+	 */
+
+out_unlock:
+	mutex_unlock(&smmu_domain->init_mutex);
+	return ret;
+}
+
+static void arm_smmu_aux_detach_dev(struct iommu_domain *domain, struct device *dev)
+{
+	struct iommu_domain *parent_domain;
+	struct arm_smmu_domain *parent_smmu_domain;
+	struct arm_smmu_master *master = dev_to_master(dev);
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+	if (!arm_smmu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX))
+		return;
+
+	parent_domain = iommu_get_domain_for_dev(dev);
+	if (!parent_domain)
+		return;
+	parent_smmu_domain = to_smmu_domain(parent_domain);
+
+	mutex_lock(&smmu_domain->init_mutex);
+	if (!smmu_domain->aux_nr_devs)
+		goto out_unlock;
+
+	if (!--smmu_domain->aux_nr_devs) {
+		arm_smmu_write_ctx_desc(parent_smmu_domain, smmu_domain->ssid,
+					NULL);
+		/*
+		 * TLB doesn't need invalidation since accesses from the device
+		 * can't use this domain's ASID once the CD is clear.
+		 *
+		 * Sadly that doesn't apply to ATCs, which are PASID tagged.
+		 * Invalidate all other devices as well, because even though
+		 * they weren't 'officially' attached to the auxiliary domain,
+		 * they could have formed ATC entries.
+		 */
+		arm_smmu_atc_inv_domain(smmu_domain, 0, 0);
+	} else {
+		struct arm_smmu_cmdq_ent cmd;
+
+		/* Invalidate only this device's ATC */
+		if (master->ats_enabled) {
+			arm_smmu_atc_inv_to_cmd(smmu_domain->ssid, 0, 0, &cmd);
+			arm_smmu_atc_inv_master(master, &cmd);
+		}
+	}
+out_unlock:
+	mutex_unlock(&smmu_domain->init_mutex);
+}
+
+static int arm_smmu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+	return smmu_domain->ssid ?: -EINVAL;
+}
+
 static struct iommu_ops arm_smmu_ops = {
 	.capable		= arm_smmu_capable,
 	.domain_alloc		= arm_smmu_domain_alloc,
@@ -2539,6 +2772,13 @@ static struct iommu_ops arm_smmu_ops = {
 	.of_xlate		= arm_smmu_of_xlate,
 	.get_resv_regions	= arm_smmu_get_resv_regions,
 	.put_resv_regions	= arm_smmu_put_resv_regions,
+	.dev_has_feat		= arm_smmu_dev_has_feature,
+	.dev_feat_enabled	= arm_smmu_dev_feature_enabled,
+	.dev_enable_feat	= arm_smmu_dev_enable_feature,
+	.dev_disable_feat	= arm_smmu_dev_disable_feature,
+	.aux_attach_dev		= arm_smmu_aux_attach_dev,
+	.aux_detach_dev		= arm_smmu_aux_detach_dev,
+	.aux_get_pasid		= arm_smmu_aux_get_pasid,
 	.pgsize_bitmap		= -1UL, /* Restricted during device attach */
 };
 
@@ -3332,6 +3572,8 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 		return -ENOMEM;
 	}
 	smmu->dev = dev;
+	/* Reserve ASID 0 for validity check */
+	set_bit(0, smmu->asid_map);
 
 	if (dev->of_node) {
 		ret = arm_smmu_device_dt_probe(pdev, smmu);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 7/8] iommu/arm-smmu-v3: Improve add_device() error handling
  2019-06-10 18:47 [PATCH 0/8] iommu: Add auxiliary domain and PASID support to Arm SMMUv3 Jean-Philippe Brucker
                   ` (5 preceding siblings ...)
  2019-06-10 18:47 ` [PATCH 6/8] iommu/arm-smmu-v3: Support auxiliary domains Jean-Philippe Brucker
@ 2019-06-10 18:47 ` Jean-Philippe Brucker
  2019-07-08  7:58   ` Auger Eric
  2019-06-10 18:47 ` [PATCH 8/8] iommu/arm-smmu-v3: Add support for PCI PASID Jean-Philippe Brucker
  7 siblings, 1 reply; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-10 18:47 UTC (permalink / raw)
  To: will.deacon
  Cc: joro, robh+dt, mark.rutland, robin.murphy, jacob.jun.pan, iommu,
	devicetree, linux-kernel, linux-arm-kernel, eric.auger

Let add_device() clean up behind itself. The iommu_bus_init() function
does call remove_device() on error, but other sites (e.g. of_iommu) do
not.

Don't free level-2 stream tables because we'd have to track if we
allocated each of them or if they are used by other endpoints. It's not
worth the hassle since they are managed resources.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 633d829f246f..972bfb80f964 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2398,14 +2398,16 @@ static int arm_smmu_add_device(struct device *dev)
 	for (i = 0; i < master->num_sids; i++) {
 		u32 sid = master->sids[i];
 
-		if (!arm_smmu_sid_in_range(smmu, sid))
-			return -ERANGE;
+		if (!arm_smmu_sid_in_range(smmu, sid)) {
+			ret = -ERANGE;
+			goto err_free_master;
+		}
 
 		/* Ensure l2 strtab is initialised */
 		if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
 			ret = arm_smmu_init_l2_strtab(smmu, sid);
 			if (ret)
-				return ret;
+				goto err_free_master;
 		}
 	}
 
@@ -2419,13 +2421,25 @@ static int arm_smmu_add_device(struct device *dev)
 	if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
 		master->ssid_bits = min(master->ssid_bits, 10U);
 
+	ret = iommu_device_link(&smmu->iommu, dev);
+	if (ret)
+		goto err_free_master;
+
 	group = iommu_group_get_for_dev(dev);
-	if (!IS_ERR(group)) {
-		iommu_group_put(group);
-		iommu_device_link(&smmu->iommu, dev);
+	if (IS_ERR(group)) {
+		ret = PTR_ERR(group);
+		goto err_unlink;
 	}
 
-	return PTR_ERR_OR_ZERO(group);
+	iommu_group_put(group);
+	return 0;
+
+err_unlink:
+	iommu_device_unlink(&smmu->iommu, dev);
+err_free_master:
+	kfree(master);
+	fwspec->iommu_priv = NULL;
+	return ret;
 }
 
 static void arm_smmu_remove_device(struct device *dev)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 8/8] iommu/arm-smmu-v3: Add support for PCI PASID
  2019-06-10 18:47 [PATCH 0/8] iommu: Add auxiliary domain and PASID support to Arm SMMUv3 Jean-Philippe Brucker
                   ` (6 preceding siblings ...)
  2019-06-10 18:47 ` [PATCH 7/8] iommu/arm-smmu-v3: Improve add_device() error handling Jean-Philippe Brucker
@ 2019-06-10 18:47 ` Jean-Philippe Brucker
  2019-06-11 10:45   ` Jonathan Cameron
  2019-07-08  7:58   ` Auger Eric
  7 siblings, 2 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-10 18:47 UTC (permalink / raw)
  To: will.deacon
  Cc: joro, robh+dt, mark.rutland, robin.murphy, jacob.jun.pan, iommu,
	devicetree, linux-kernel, linux-arm-kernel, eric.auger

Enable PASID for PCI devices that support it. Since the SSID tables are
allocated by arm_smmu_attach_dev(), PASID has to be enabled early enough.
arm_smmu_dev_feature_enable() would be too late, since by that time the
main DMA domain has already been attached. Do it in add_device() instead.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 51 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 972bfb80f964..a8a516d9ff10 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2197,6 +2197,49 @@ static void arm_smmu_disable_ats(struct arm_smmu_master *master)
 	master->ats_enabled = false;
 }
 
+static int arm_smmu_enable_pasid(struct arm_smmu_master *master)
+{
+	int ret;
+	int features;
+	int num_pasids;
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(master->dev))
+		return -ENOSYS;
+
+	pdev = to_pci_dev(master->dev);
+
+	features = pci_pasid_features(pdev);
+	if (features < 0)
+		return -ENOSYS;
+
+	num_pasids = pci_max_pasids(pdev);
+	if (num_pasids <= 0)
+		return -ENOSYS;
+
+	ret = pci_enable_pasid(pdev, features);
+	if (!ret)
+		master->ssid_bits = min_t(u8, ilog2(num_pasids),
+					  master->smmu->ssid_bits);
+	return ret;
+}
+
+static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
+{
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(master->dev))
+		return;
+
+	pdev = to_pci_dev(master->dev);
+
+	if (!pdev->pasid_enabled)
+		return;
+
+	pci_disable_pasid(pdev);
+	master->ssid_bits = 0;
+}
+
 static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 {
 	unsigned long flags;
@@ -2413,6 +2456,9 @@ static int arm_smmu_add_device(struct device *dev)
 
 	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
 
+	/* Note that PASID must be enabled before, and disabled after ATS */
+	arm_smmu_enable_pasid(master);
+
 	/*
 	 * If the SMMU doesn't support 2-stage CD, limit the linear
 	 * tables to a reasonable number of contexts, let's say
@@ -2423,7 +2469,7 @@ static int arm_smmu_add_device(struct device *dev)
 
 	ret = iommu_device_link(&smmu->iommu, dev);
 	if (ret)
-		goto err_free_master;
+		goto err_disable_pasid;
 
 	group = iommu_group_get_for_dev(dev);
 	if (IS_ERR(group)) {
@@ -2436,6 +2482,8 @@ static int arm_smmu_add_device(struct device *dev)
 
 err_unlink:
 	iommu_device_unlink(&smmu->iommu, dev);
+err_disable_pasid:
+	arm_smmu_disable_pasid(master);
 err_free_master:
 	kfree(master);
 	fwspec->iommu_priv = NULL;
@@ -2456,6 +2504,7 @@ static void arm_smmu_remove_device(struct device *dev)
 	arm_smmu_detach_dev(master);
 	iommu_group_remove_device(dev);
 	iommu_device_unlink(&smmu->iommu, dev);
+	arm_smmu_disable_pasid(master);
 	kfree(master);
 	iommu_fwspec_free(dev);
 }
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/8] iommu: Add I/O ASID allocator
  2019-06-10 18:47 ` [PATCH 1/8] iommu: Add I/O ASID allocator Jean-Philippe Brucker
@ 2019-06-11  9:36   ` Jonathan Cameron
  2019-06-11 14:35     ` Jean-Philippe Brucker
  2019-06-11 12:26   ` Jacob Pan
  1 sibling, 1 reply; 44+ messages in thread
From: Jonathan Cameron @ 2019-06-11  9:36 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: will.deacon, mark.rutland, devicetree, linux-kernel, iommu,
	robh+dt, robin.murphy, linux-arm-kernel

On Mon, 10 Jun 2019 19:47:07 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> Some devices might support multiple DMA address spaces, in particular
> those that have the PCI PASID feature. PASID (Process Address Space ID)
> allows to share process address spaces with devices (SVA), partition a
> device into VM-assignable entities (VFIO mdev) or simply provide
> multiple DMA address space to kernel drivers. Add a global PASID
> allocator usable by different drivers at the same time. Name it I/O ASID
> to avoid confusion with ASIDs allocated by arch code, which are usually
> a separate ID space.
> 
> The IOASID space is global. Each device can have its own PASID space,
> but by convention the IOMMU ended up having a global PASID space, so
> that with SVA, each mm_struct is associated to a single PASID.
> 
> The allocator is primarily used by IOMMU subsystem but in rare occasions
> drivers would like to allocate PASIDs for devices that aren't managed by
> an IOMMU, using the same ID space as IOMMU.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Hi,

A few trivial comments inline.  May be more because I'm not that familiar
with xa_array than anything else.

Jonathan

> ---
> The most recent discussion on this patch was at:
> https://lkml.kernel.org/lkml/1556922737-76313-4-git-send-email-jacob.jun.pan@linux.intel.com/
> I fixed it up a bit following comments in that series, and removed the
> definitions for the custom allocator for now.
> 
> There also is a new version that includes the custom allocator into this
> patch, but is currently missing the RCU fixes, at:
> https://lore.kernel.org/lkml/1560087862-57608-13-git-send-email-jacob.jun.pan@linux.intel.com/
> ---

...

> +
> +/**
> + * ioasid_alloc - Allocate an IOASID
> + * @set: the IOASID set
> + * @min: the minimum ID (inclusive)
> + * @max: the maximum ID (inclusive)
> + * @private: data private to the caller
> + *
> + * Allocate an ID between @min and @max. The @private pointer is stored
> + * internally and can be retrieved with ioasid_find().
> + *
> + * Return: the allocated ID on success, or %INVALID_IOASID on failure.
> + */
> +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
> +		      void *private)
> +{
> +	u32 id = INVALID_IOASID;
> +	struct ioasid_data *data;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		return INVALID_IOASID;
> +
> +	data->set = set;
> +	data->private = private;
> +
> +	if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max), GFP_KERNEL)) {
> +		pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
> +		goto exit_free;
> +	}
> +	data->id = id;
> +
> +exit_free:

This error flow is perhaps a little more confusing than it needs to be?

My assumption (perhaps wrong) is that we only have an id == INVALID_IOASID
if the xa_alloc fails, and that we will always have such an id value if
it does (I'm not totally sure this second element is true in __xa_alloc).

If I'm missing something perhaps a comment on how else we'd get here.

> +	if (id == INVALID_IOASID) {
> +		kfree(data);
> +		return INVALID_IOASID;
> +	}
> +	return id;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_alloc);
> +
> +/**
> + * ioasid_free - Free an IOASID
> + * @ioasid: the ID to remove
> + */
> +void ioasid_free(ioasid_t ioasid)
> +{
> +	struct ioasid_data *ioasid_data;
> +
> +	ioasid_data = xa_erase(&ioasid_xa, ioasid);
> +
> +	kfree_rcu(ioasid_data, rcu);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_free);
> +
> +/**
> + * ioasid_find - Find IOASID data
> + * @set: the IOASID set
> + * @ioasid: the IOASID to find
> + * @getter: function to call on the found object
> + *
> + * The optional getter function allows to take a reference to the found object
> + * under the rcu lock. The function can also check if the object is still valid:
> + * if @getter returns false, then the object is invalid and NULL is returned.
> + *
> + * If the IOASID has been allocated for this set, return the private pointer
> + * passed to ioasid_alloc. Private data can be NULL if not set. Return an error
> + * if the IOASID is not found or does not belong to the set.

Perhaps should make it clear that @set can be null.

> + */
> +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> +		  bool (*getter)(void *))
> +{
> +	void *priv = NULL;

Set in all paths, so does need to be set here.

> +	struct ioasid_data *ioasid_data;
> +
> +	rcu_read_lock();
> +	ioasid_data = xa_load(&ioasid_xa, ioasid);
> +	if (!ioasid_data) {
> +		priv = ERR_PTR(-ENOENT);
> +		goto unlock;
> +	}
> +	if (set && ioasid_data->set != set) {
> +		/* data found but does not belong to the set */
> +		priv = ERR_PTR(-EACCES);
> +		goto unlock;
> +	}
> +	/* Now IOASID and its set is verified, we can return the private data */
> +	priv = rcu_dereference(ioasid_data->private);
> +	if (getter && !getter(priv))
> +		priv = NULL;
> +unlock:
> +	rcu_read_unlock();
> +
> +	return priv;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_find);
> +
> +MODULE_LICENSE("GPL");
...


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/8] iommu/arm-smmu-v3: Support platform SSID
  2019-06-10 18:47 ` [PATCH 3/8] iommu/arm-smmu-v3: Support platform SSID Jean-Philippe Brucker
@ 2019-06-11  9:42   ` Jonathan Cameron
  2019-06-11 14:35     ` Jean-Philippe Brucker
  2019-06-18 18:08   ` Will Deacon
  2019-07-08  7:58   ` Auger Eric
  2 siblings, 1 reply; 44+ messages in thread
From: Jonathan Cameron @ 2019-06-11  9:42 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: will.deacon, mark.rutland, devicetree, linux-kernel, iommu,
	robh+dt, robin.murphy, linux-arm-kernel

On Mon, 10 Jun 2019 19:47:09 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> For platform devices that support SubstreamID (SSID), firmware provides
> the number of supported SSID bits. Restrict it to what the SMMU supports
> and cache it into master->ssid_bits.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>

Missing kernel-doc.

Thanks,

Jonathan

> ---
>  drivers/iommu/arm-smmu-v3.c | 11 +++++++++++
>  drivers/iommu/of_iommu.c    |  6 +++++-
>  include/linux/iommu.h       |  1 +
>  3 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 4d5a694f02c2..3254f473e681 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -604,6 +604,7 @@ struct arm_smmu_master {
>  	struct list_head		domain_head;
>  	u32				*sids;
>  	unsigned int			num_sids;
> +	unsigned int			ssid_bits;
>  	bool				ats_enabled		:1;
>  };
>  
> @@ -2097,6 +2098,16 @@ static int arm_smmu_add_device(struct device *dev)
>  		}
>  	}
>  
> +	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
> +
> +	/*
> +	 * If the SMMU doesn't support 2-stage CD, limit the linear
> +	 * tables to a reasonable number of contexts, let's say
> +	 * 64kB / sizeof(ctx_desc) = 1024 = 2^10
> +	 */
> +	if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
> +		master->ssid_bits = min(master->ssid_bits, 10U);
> +
>  	group = iommu_group_get_for_dev(dev);
>  	if (!IS_ERR(group)) {
>  		iommu_group_put(group);
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index f04a6df65eb8..04f4f6b95d82 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -206,8 +206,12 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
>  			if (err)
>  				break;
>  		}
> -	}
>  
> +		fwspec = dev_iommu_fwspec_get(dev);
> +		if (!err && fwspec)
> +			of_property_read_u32(master_np, "pasid-num-bits",
> +					     &fwspec->num_pasid_bits);
> +	}
>  
>  	/*
>  	 * Two success conditions can be represented by non-negative err here:
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 519e40fb23ce..b91df613385f 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -536,6 +536,7 @@ struct iommu_fwspec {
>  	struct fwnode_handle	*iommu_fwnode;
>  	void			*iommu_priv;
>  	u32			flags;
> +	u32			num_pasid_bits;

This structure has kernel doc so you need to add something for this.

>  	unsigned int		num_ids;
>  	u32			ids[1];
>  };



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/8] iommu/arm-smmu-v3: Add support for Substream IDs
  2019-06-10 18:47 ` [PATCH 4/8] iommu/arm-smmu-v3: Add support for Substream IDs Jean-Philippe Brucker
@ 2019-06-11 10:19   ` Jonathan Cameron
  2019-06-11 14:35     ` Jean-Philippe Brucker
  2019-06-26 18:00   ` Will Deacon
  2019-07-08 15:31   ` Auger Eric
  2 siblings, 1 reply; 44+ messages in thread
From: Jonathan Cameron @ 2019-06-11 10:19 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: will.deacon, mark.rutland, devicetree, linux-kernel, iommu,
	robh+dt, robin.murphy, linux-arm-kernel

On Mon, 10 Jun 2019 19:47:10 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> At the moment, the SMMUv3 driver implements only one stage-1 or stage-2
> page directory per device. However SMMUv3 allows more than one address
> space for some devices, by providing multiple stage-1 page directories. In
> addition to the Stream ID (SID), that identifies a device, we can now have
> Substream IDs (SSID) identifying an address space. In PCIe, SID is called
> Requester ID (RID) and SSID is called Process Address-Space ID (PASID).
> 
> Prepare the driver for SSID support, by adding context descriptor tables
> in STEs (previously a single static context descriptor). A complete
> stage-1 walk is now performed like this by the SMMU:
> 
>       Stream tables          Ctx. tables          Page tables
>         +--------+   ,------->+-------+   ,------->+-------+
>         :        :   |        :       :   |        :       :
>         +--------+   |        +-------+   |        +-------+
>    SID->|  STE   |---'  SSID->|  CD   |---'  IOVA->|  PTE  |--> IPA
>         +--------+            +-------+            +-------+
>         :        :            :       :            :       :
>         +--------+            +-------+            +-------+
> 
> Implement a single level of context descriptor table for now, but as with
> stream and page tables, an SSID can be split to index multiple levels of
> tables.
> 
> In all stream table entries, we set S1DSS=SSID0 mode, making translations
> without an SSID use context descriptor 0. Although it would be possible by
> setting S1DSS=BYPASS, we don't currently support SSID when user selects
> iommu.passthrough.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>

Hi Jean-Phillipe,

A few trivial comments inline, mostly around wondering if a few bits
of refactoring can get pulled out before this and hopefully stop diff
making such a mess of this patch from a readability point of view!

Thanks,

Jonathan

> ---
>  drivers/iommu/arm-smmu-v3.c | 238 +++++++++++++++++++++++++++++-------
>  1 file changed, 192 insertions(+), 46 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 3254f473e681..d90eb604b65d 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -219,6 +219,11 @@
>  #define STRTAB_STE_0_S1CTXPTR_MASK	GENMASK_ULL(51, 6)
>  #define STRTAB_STE_0_S1CDMAX		GENMASK_ULL(63, 59)
>  
> +#define STRTAB_STE_1_S1DSS		GENMASK_ULL(1, 0)
> +#define STRTAB_STE_1_S1DSS_TERMINATE	0x0
> +#define STRTAB_STE_1_S1DSS_BYPASS	0x1
> +#define STRTAB_STE_1_S1DSS_SSID0	0x2
> +
>  #define STRTAB_STE_1_S1C_CACHE_NC	0UL
>  #define STRTAB_STE_1_S1C_CACHE_WBRA	1UL
>  #define STRTAB_STE_1_S1C_CACHE_WT	2UL
> @@ -305,6 +310,7 @@
>  #define CMDQ_PREFETCH_1_SIZE		GENMASK_ULL(4, 0)
>  #define CMDQ_PREFETCH_1_ADDR_MASK	GENMASK_ULL(63, 12)
>  
> +#define CMDQ_CFGI_0_SSID		GENMASK_ULL(31, 12)
>  #define CMDQ_CFGI_0_SID			GENMASK_ULL(63, 32)
>  #define CMDQ_CFGI_1_LEAF		(1UL << 0)
>  #define CMDQ_CFGI_1_RANGE		GENMASK_ULL(4, 0)
> @@ -421,8 +427,11 @@ struct arm_smmu_cmdq_ent {
>  
>  		#define CMDQ_OP_CFGI_STE	0x3
>  		#define CMDQ_OP_CFGI_ALL	0x4
> +		#define CMDQ_OP_CFGI_CD		0x5
> +		#define CMDQ_OP_CFGI_CD_ALL	0x6
>  		struct {
>  			u32			sid;
> +			u32			ssid;
>  			union {
>  				bool		leaf;
>  				u8		span;
> @@ -506,16 +515,25 @@ struct arm_smmu_strtab_l1_desc {
>  	dma_addr_t			l2ptr_dma;
>  };
>  
> +struct arm_smmu_cd_table {
> +	__le64				*ptr;
> +	dma_addr_t			ptr_dma;
> +};
> +
> +struct arm_smmu_ctx_desc {
> +	u16				asid;
> +	u64				ttbr;
> +	u64				tcr;
> +	u64				mair;
> +};
> +
>  struct arm_smmu_s1_cfg {
> -	__le64				*cdptr;
> -	dma_addr_t			cdptr_dma;
> -
> -	struct arm_smmu_ctx_desc {
> -		u16	asid;
> -		u64	ttbr;
> -		u64	tcr;
> -		u64	mair;
> -	}				cd;
> +	u8				s1fmt;
> +	u8				s1cdmax;
> +	struct arm_smmu_cd_table	table;

This new structure is a sensible addition and makes the code more readable,
but it's not directly tied to the main flow of this patch, perhaps pull it out?

> +
> +	/* Context descriptor 0, when substreams are disabled or s1dss = 0b10 */
> +	struct arm_smmu_ctx_desc	cd;
I've not checked in detail but feels like you could pull this refactor out as well
as a trivial precursor and make this patch easier to read as a result?

>  };
>  
>  struct arm_smmu_s2_cfg {
> @@ -811,10 +829,16 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>  		cmd[1] |= FIELD_PREP(CMDQ_PREFETCH_1_SIZE, ent->prefetch.size);
>  		cmd[1] |= ent->prefetch.addr & CMDQ_PREFETCH_1_ADDR_MASK;
>  		break;
> +	case CMDQ_OP_CFGI_CD:
> +		cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SSID, ent->cfgi.ssid);
> +		/* Fallthrough */
>  	case CMDQ_OP_CFGI_STE:
>  		cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, ent->cfgi.sid);
>  		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_LEAF, ent->cfgi.leaf);
>  		break;
> +	case CMDQ_OP_CFGI_CD_ALL:
> +		cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, ent->cfgi.sid);
> +		break;
>  	case CMDQ_OP_CFGI_ALL:
>  		/* Cover the entire SID range */
>  		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
> @@ -1045,6 +1069,63 @@ static int arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
>  }
>  
>  /* Context descriptor manipulation functions */
> +static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
> +			     int ssid, bool leaf)
> +{
> +	size_t i;
> +	unsigned long flags;
> +	struct arm_smmu_master *master;
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	struct arm_smmu_cmdq_ent cmd = {
> +		.opcode	= CMDQ_OP_CFGI_CD,
> +		.cfgi	= {
> +			.ssid	= ssid,
> +			.leaf	= leaf,
> +		},
> +	};
> +
> +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> +	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
> +		for (i = 0; i < master->num_sids; i++) {
> +			cmd.cfgi.sid = master->sids[i];
> +			arm_smmu_cmdq_issue_cmd(smmu, &cmd);
> +		}
> +	}
> +	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> +
> +	arm_smmu_cmdq_issue_sync(smmu);
> +}
> +
> +static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
> +					struct arm_smmu_cd_table *table,
> +					size_t num_entries)
> +{
> +	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
> +
> +	table->ptr = dmam_alloc_coherent(smmu->dev, size, &table->ptr_dma,
> +					 GFP_KERNEL | __GFP_ZERO);
> +	if (!table->ptr) {
> +		dev_warn(smmu->dev,
> +			 "failed to allocate context descriptor table\n");
> +		return -ENOMEM;
> +	}
> +	return 0;
> +}
> +
> +static void arm_smmu_free_cd_leaf_table(struct arm_smmu_device *smmu,
> +					struct arm_smmu_cd_table *table,
> +					size_t num_entries)
> +{
> +	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
> +
> +	dmam_free_coherent(smmu->dev, size, table->ptr, table->ptr_dma);
> +}
> +
> +static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_s1_cfg *cfg, u32 ssid)
> +{
> +	return cfg->table.ptr + ssid * CTXDESC_CD_DWORDS;
> +}
> +
>  static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
>  {
>  	u64 val = 0;
> @@ -1062,33 +1143,90 @@ static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
>  	return val;
>  }
>  
> -static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
> -				    struct arm_smmu_s1_cfg *cfg)
> +static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
> +				   int ssid, struct arm_smmu_ctx_desc *cd)
>  {
>  	u64 val;
> +	bool cd_live;
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	__le64 *cdptr = arm_smmu_get_cd_ptr(&smmu_domain->s1_cfg, ssid);
>  
>  	/*
> -	 * We don't need to issue any invalidation here, as we'll invalidate
> -	 * the STE when installing the new entry anyway.
> +	 * This function handles the following cases:
> +	 *
> +	 * (1) Install primary CD, for normal DMA traffic (SSID = 0).
> +	 * (2) Install a secondary CD, for SID+SSID traffic.
> +	 * (3) Update ASID of a CD. Atomically write the first 64 bits of the
> +	 *     CD, then invalidate the old entry and mappings.
> +	 * (4) Remove a secondary CD.
>  	 */
> -	val = arm_smmu_cpu_tcr_to_cd(cfg->cd.tcr) |
> +
> +	if (!cdptr)
> +		return -ENOMEM;
> +
> +	val = le64_to_cpu(cdptr[0]);
> +	cd_live = !!(val & CTXDESC_CD_0_V);
> +
> +	if (!cd) { /* (4) */
> +		cdptr[0] = 0;
> +	} else if (cd_live) { /* (3) */
> +		val &= ~CTXDESC_CD_0_ASID;
> +		val |= FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid);
> +
> +		cdptr[0] = cpu_to_le64(val);
> +		/*
> +		 * Until CD+TLB invalidation, both ASIDs may be used for tagging
> +		 * this substream's traffic
> +		 */
> +	} else { /* (1) and (2) */
> +		cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
> +		cdptr[2] = 0;
> +		cdptr[3] = cpu_to_le64(cd->mair);
> +
> +		/*
> +		 * STE is live, and the SMMU might fetch this CD at any
> +		 * time. Ensure that it observes the rest of the CD before we
> +		 * enable it.
> +		 */
> +		arm_smmu_sync_cd(smmu_domain, ssid, true);
> +
> +		val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
>  #ifdef __BIG_ENDIAN
> -	      CTXDESC_CD_0_ENDI |
> +			CTXDESC_CD_0_ENDI |
>  #endif
> -	      CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET |
> -	      CTXDESC_CD_0_AA64 | FIELD_PREP(CTXDESC_CD_0_ASID, cfg->cd.asid) |
> -	      CTXDESC_CD_0_V;
> +			CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET |
> +			CTXDESC_CD_0_AA64 |
> +			FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) |
> +			CTXDESC_CD_0_V;
> +
> +		/* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
> +		if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
> +			val |= CTXDESC_CD_0_S;
> +
> +		cdptr[0] = cpu_to_le64(val);
> +	}
>  
> -	/* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
> -	if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
> -		val |= CTXDESC_CD_0_S;
> +	arm_smmu_sync_cd(smmu_domain, ssid, true);
> +	return 0;
> +}
> +
> +static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain,
> +				    struct arm_smmu_master *master)
> +{
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
>  
> -	cfg->cdptr[0] = cpu_to_le64(val);
> +	cfg->s1fmt = STRTAB_STE_0_S1FMT_LINEAR;
> +	cfg->s1cdmax = master->ssid_bits;
> +	return arm_smmu_alloc_cd_leaf_table(smmu, &cfg->table, 1 << cfg->s1cdmax);
> +}
>  
> -	val = cfg->cd.ttbr & CTXDESC_CD_1_TTB0_MASK;
> -	cfg->cdptr[1] = cpu_to_le64(val);

Hmm. Diff was having a field day in trying to make the patch as unreadable as possible..

> +static void arm_smmu_free_cd_tables(struct arm_smmu_domain *smmu_domain)
> +{
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
>  
> -	cfg->cdptr[3] = cpu_to_le64(cfg->cd.mair);
> +	arm_smmu_free_cd_leaf_table(smmu, &cfg->table, 1 << cfg->s1cdmax);
>  }
>  
>  /* Stream table manipulation functions */
> @@ -1210,6 +1348,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  	if (s1_cfg) {
>  		BUG_ON(ste_live);
>  		dst[1] = cpu_to_le64(
> +			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
>  			 FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
>  			 FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
>  			 FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
> @@ -1219,8 +1358,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
>  			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
>  
> -		val |= (s1_cfg->cdptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
> -			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS);
> +		val |= (s1_cfg->table.ptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
> +			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
> +			FIELD_PREP(STRTAB_STE_0_S1CDMAX, s1_cfg->s1cdmax) |
> +			FIELD_PREP(STRTAB_STE_0_S1FMT, s1_cfg->s1fmt);
>  	}
>  
>  	if (s2_cfg) {
> @@ -1674,12 +1815,8 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
>  	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>  		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
>  
> -		if (cfg->cdptr) {
> -			dmam_free_coherent(smmu_domain->smmu->dev,
> -					   CTXDESC_CD_DWORDS << 3,
> -					   cfg->cdptr,
> -					   cfg->cdptr_dma);
> -
> +		if (cfg->table.ptr) {
> +			arm_smmu_free_cd_tables(smmu_domain);
>  			arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
>  		}
>  	} else {
> @@ -1692,6 +1829,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
>  }
>  
>  static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
> +				       struct arm_smmu_master *master,
>  				       struct io_pgtable_cfg *pgtbl_cfg)
>  {
>  	int ret;
> @@ -1703,27 +1841,29 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>  	if (asid < 0)
>  		return asid;
>  
> -	cfg->cdptr = dmam_alloc_coherent(smmu->dev, CTXDESC_CD_DWORDS << 3,
> -					 &cfg->cdptr_dma,
> -					 GFP_KERNEL | __GFP_ZERO);
> -	if (!cfg->cdptr) {
> -		dev_warn(smmu->dev, "failed to allocate context descriptor\n");
> -		ret = -ENOMEM;
> +	ret = arm_smmu_alloc_cd_tables(smmu_domain, master);
> +	if (ret)
>  		goto out_free_asid;
> -	}

I wonder if it is worth pulling this refactor out as a precursor patch?
That is provide the allocation functions without the new support first, then
simply add the new stuff in this patch?

>  
>  	cfg->cd.asid	= (u16)asid;
>  	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
>  	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
>  	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
> +
> +	ret = arm_smmu_write_ctx_desc(smmu_domain, 0, &smmu_domain->s1_cfg.cd);
> +	if (ret)
> +		goto out_free_table;
>  	return 0;
>  
> +out_free_table:
> +	arm_smmu_free_cd_tables(smmu_domain);
>  out_free_asid:
>  	arm_smmu_bitmap_free(smmu->asid_map, asid);
>  	return ret;
>  }
>  
>  static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
> +				       struct arm_smmu_master *master,
>  				       struct io_pgtable_cfg *pgtbl_cfg)
>  {
>  	int vmid;
> @@ -1740,7 +1880,8 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
>  	return 0;
>  }
>  
> -static int arm_smmu_domain_finalise(struct iommu_domain *domain)
> +static int arm_smmu_domain_finalise(struct iommu_domain *domain,
> +				    struct arm_smmu_master *master)
>  {
>  	int ret;
>  	unsigned long ias, oas;
> @@ -1748,6 +1889,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
>  	struct io_pgtable_cfg pgtbl_cfg;
>  	struct io_pgtable_ops *pgtbl_ops;
>  	int (*finalise_stage_fn)(struct arm_smmu_domain *,
> +				 struct arm_smmu_master *,
>  				 struct io_pgtable_cfg *);
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
> @@ -1804,7 +1946,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
>  	domain->geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1;
>  	domain->geometry.force_aperture = true;
>  
> -	ret = finalise_stage_fn(smmu_domain, &pgtbl_cfg);
> +	ret = finalise_stage_fn(smmu_domain, master, &pgtbl_cfg);
>  	if (ret < 0) {
>  		free_io_pgtable_ops(pgtbl_ops);
>  		return ret;
> @@ -1932,7 +2074,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  
>  	if (!smmu_domain->smmu) {
>  		smmu_domain->smmu = smmu;
> -		ret = arm_smmu_domain_finalise(domain);
> +		ret = arm_smmu_domain_finalise(domain, master);
>  		if (ret) {
>  			smmu_domain->smmu = NULL;
>  			goto out_unlock;
> @@ -1944,6 +2086,13 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  			dev_name(smmu->dev));
>  		ret = -ENXIO;
>  		goto out_unlock;
> +	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 &&
> +		   master->ssid_bits != smmu_domain->s1_cfg.s1cdmax) {
> +		dev_err(dev,
> +			"cannot attach to incompatible domain (%u SSID bits != %u)\n",
> +			smmu_domain->s1_cfg.s1cdmax, master->ssid_bits);
> +		ret = -EINVAL;
> +		goto out_unlock;
>  	}
>  
>  	master->domain = smmu_domain;
> @@ -1955,9 +2104,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  	if (smmu_domain->stage != ARM_SMMU_DOMAIN_BYPASS)
>  		arm_smmu_enable_ats(master);
>  
> -	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1)
> -		arm_smmu_write_ctx_desc(smmu, &smmu_domain->s1_cfg);
> -
>  	arm_smmu_install_ste_for_dev(master);
>  out_unlock:
>  	mutex_unlock(&smmu_domain->init_mutex);



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 5/8] iommu/arm-smmu-v3: Add second level of context descriptor table
  2019-06-10 18:47 ` [PATCH 5/8] iommu/arm-smmu-v3: Add second level of context descriptor table Jean-Philippe Brucker
@ 2019-06-11 10:24   ` Jonathan Cameron
  2019-07-08 15:13   ` Auger Eric
  1 sibling, 0 replies; 44+ messages in thread
From: Jonathan Cameron @ 2019-06-11 10:24 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: will.deacon, mark.rutland, devicetree, linux-kernel, iommu,
	robh+dt, robin.murphy, linux-arm-kernel

On Mon, 10 Jun 2019 19:47:11 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> The SMMU can support up to 20 bits of SSID. Add a second level of page
> tables to accommodate this. Devices that support more than 1024 SSIDs now
> have a table of 1024 L1 entries (8kB), pointing to tables of 1024 context
> descriptors (64kB), allocated on demand.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
One trivial typo.

Thanks,

Jonathan
> ---
>  drivers/iommu/arm-smmu-v3.c | 136 +++++++++++++++++++++++++++++++++---
>  1 file changed, 128 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index d90eb604b65d..326b71793336 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -216,6 +216,8 @@
>  
>  #define STRTAB_STE_0_S1FMT		GENMASK_ULL(5, 4)
>  #define STRTAB_STE_0_S1FMT_LINEAR	0
> +#define STRTAB_STE_0_S1FMT_4K_L2	1
> +#define STRTAB_STE_0_S1FMT_64K_L2	2
>  #define STRTAB_STE_0_S1CTXPTR_MASK	GENMASK_ULL(51, 6)
>  #define STRTAB_STE_0_S1CDMAX		GENMASK_ULL(63, 59)
>  
> @@ -255,6 +257,18 @@
>  
>  #define STRTAB_STE_3_S2TTB_MASK		GENMASK_ULL(51, 4)
>  
> +/*
> + * Linear: when less than 1024 SSIDs are supported
> + * 2lvl: at most 1024 L1 entrie,

entries?

> + *      1024 lazy entries per table.
> + */
> +#define CTXDESC_SPLIT			10
> +#define CTXDESC_NUM_L2_ENTRIES		(1 << CTXDESC_SPLIT)
> +
> +#define CTXDESC_L1_DESC_DWORD		1
> +#define CTXDESC_L1_DESC_VALID		1
> +#define CTXDESC_L1_DESC_L2PTR_MASK	GENMASK_ULL(51, 12)
> +
>  /* Context descriptor (stage-1 only) */
>  #define CTXDESC_CD_DWORDS		8
>  #define CTXDESC_CD_0_TCR_T0SZ		GENMASK_ULL(5, 0)
> @@ -530,7 +544,10 @@ struct arm_smmu_ctx_desc {
>  struct arm_smmu_s1_cfg {
>  	u8				s1fmt;
>  	u8				s1cdmax;
> -	struct arm_smmu_cd_table	table;
> +	struct arm_smmu_cd_table	*tables;
> +	size_t				num_tables;
> +	__le64				*l1ptr;
> +	dma_addr_t			l1ptr_dma;
>  
>  	/* Context descriptor 0, when substreams are disabled or s1dss = 0b10 */
>  	struct arm_smmu_ctx_desc	cd;
> @@ -1118,12 +1135,51 @@ static void arm_smmu_free_cd_leaf_table(struct arm_smmu_device *smmu,
>  {
>  	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
>  
> +	if (!table->ptr)
> +		return;
>  	dmam_free_coherent(smmu->dev, size, table->ptr, table->ptr_dma);
>  }
>  
> -static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_s1_cfg *cfg, u32 ssid)
> +static void arm_smmu_write_cd_l1_desc(__le64 *dst,
> +				      struct arm_smmu_cd_table *table)
>  {
> -	return cfg->table.ptr + ssid * CTXDESC_CD_DWORDS;
> +	u64 val = (table->ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) |
> +		  CTXDESC_L1_DESC_VALID;
> +
> +	*dst = cpu_to_le64(val);
> +}
> +
> +static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_domain *smmu_domain,
> +				   u32 ssid)
> +{
> +	unsigned int idx;
> +	struct arm_smmu_cd_table *table;
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> +
> +	if (cfg->s1fmt == STRTAB_STE_0_S1FMT_LINEAR) {
> +		table = &cfg->tables[0];
> +		idx = ssid;
> +	} else {
> +		idx = ssid >> CTXDESC_SPLIT;
> +		if (idx >= cfg->num_tables)
> +			return NULL;
> +
> +		table = &cfg->tables[idx];
> +		if (!table->ptr) {
> +			__le64 *l1ptr = cfg->l1ptr + idx * CTXDESC_L1_DESC_DWORD;
> +
> +			if (arm_smmu_alloc_cd_leaf_table(smmu, table,
> +							 CTXDESC_NUM_L2_ENTRIES))
> +				return NULL;
> +
> +			arm_smmu_write_cd_l1_desc(l1ptr, table);
> +			/* An invalid L1 entry is allowed to be cached */
> +			arm_smmu_sync_cd(smmu_domain, ssid, false);
> +		}
> +		idx = ssid & (CTXDESC_NUM_L2_ENTRIES - 1);
> +	}
> +	return table->ptr + idx * CTXDESC_CD_DWORDS;
>  }
>  
>  static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
> @@ -1149,7 +1205,7 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
>  	u64 val;
>  	bool cd_live;
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
> -	__le64 *cdptr = arm_smmu_get_cd_ptr(&smmu_domain->s1_cfg, ssid);
> +	__le64 *cdptr = arm_smmu_get_cd_ptr(smmu_domain, ssid);
>  
>  	/*
>  	 * This function handles the following cases:
> @@ -1213,20 +1269,81 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
>  static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain,
>  				    struct arm_smmu_master *master)
>  {
> +	int ret;
> +	size_t size = 0;
> +	size_t max_contexts, num_leaf_entries;
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
>  
>  	cfg->s1fmt = STRTAB_STE_0_S1FMT_LINEAR;
>  	cfg->s1cdmax = master->ssid_bits;
> -	return arm_smmu_alloc_cd_leaf_table(smmu, &cfg->table, 1 << cfg->s1cdmax);
> +
> +	max_contexts = 1 << cfg->s1cdmax;
> +	if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB) ||
> +	    max_contexts <= CTXDESC_NUM_L2_ENTRIES) {
> +		cfg->s1fmt = STRTAB_STE_0_S1FMT_LINEAR;
> +		cfg->num_tables = 1;
> +		num_leaf_entries = max_contexts;
> +	} else {
> +		cfg->s1fmt = STRTAB_STE_0_S1FMT_64K_L2;
> +		/*
> +		 * SSID[S1CDmax-1:10] indexes 1st-level table, SSID[9:0] indexes
> +		 * 2nd-level
> +		 */
> +		cfg->num_tables = max_contexts / CTXDESC_NUM_L2_ENTRIES;
> +
> +		size = cfg->num_tables * (CTXDESC_L1_DESC_DWORD << 3);
> +		cfg->l1ptr = dmam_alloc_coherent(smmu->dev, size,
> +						 &cfg->l1ptr_dma,
> +						 GFP_KERNEL | __GFP_ZERO);
> +		if (!cfg->l1ptr) {
> +			dev_warn(smmu->dev, "failed to allocate L1 context table\n");
> +			return -ENOMEM;
> +		}
> +
> +		num_leaf_entries = CTXDESC_NUM_L2_ENTRIES;
> +	}
> +
> +	cfg->tables = devm_kzalloc(smmu->dev, sizeof(struct arm_smmu_cd_table) *
> +				   cfg->num_tables, GFP_KERNEL);
> +	if (!cfg->tables)
> +		return -ENOMEM;
> +
> +	ret = arm_smmu_alloc_cd_leaf_table(smmu, &cfg->tables[0], num_leaf_entries);
> +	if (ret)
> +		goto err_free_l1;
> +
> +	if (cfg->l1ptr)
> +		arm_smmu_write_cd_l1_desc(cfg->l1ptr, &cfg->tables[0]);
> +
> +	return 0;
> +
> +err_free_l1:
> +	if (cfg->l1ptr)
> +		dmam_free_coherent(smmu->dev, size, cfg->l1ptr, cfg->l1ptr_dma);
> +	devm_kfree(smmu->dev, cfg->tables);
> +	return ret;
>  }
>  
>  static void arm_smmu_free_cd_tables(struct arm_smmu_domain *smmu_domain)
>  {
> +	int i;
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> +	size_t num_leaf_entries = 1 << cfg->s1cdmax;
> +	struct arm_smmu_cd_table *table = cfg->tables;
> +
> +	if (cfg->l1ptr) {
> +		size_t size = cfg->num_tables * (CTXDESC_L1_DESC_DWORD << 3);
>  
> -	arm_smmu_free_cd_leaf_table(smmu, &cfg->table, 1 << cfg->s1cdmax);
> +		dmam_free_coherent(smmu->dev, size, cfg->l1ptr,
> +				   cfg->l1ptr_dma);
> +		num_leaf_entries = CTXDESC_NUM_L2_ENTRIES;
> +	}
> +
> +	for (i = 0; i < cfg->num_tables; i++, table++)
> +		arm_smmu_free_cd_leaf_table(smmu, table, num_leaf_entries);
> +	devm_kfree(smmu->dev, cfg->tables);
>  }
>  
>  /* Stream table manipulation functions */
> @@ -1346,6 +1463,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  	}
>  
>  	if (s1_cfg) {
> +		dma_addr_t ptr_dma = s1_cfg->l1ptr ? s1_cfg->l1ptr_dma :
> +			             s1_cfg->tables[0].ptr_dma;
> +
>  		BUG_ON(ste_live);
>  		dst[1] = cpu_to_le64(
>  			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
> @@ -1358,7 +1478,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
>  			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
>  
> -		val |= (s1_cfg->table.ptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
> +		val |= (ptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
>  			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
>  			FIELD_PREP(STRTAB_STE_0_S1CDMAX, s1_cfg->s1cdmax) |
>  			FIELD_PREP(STRTAB_STE_0_S1FMT, s1_cfg->s1fmt);
> @@ -1815,7 +1935,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
>  	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>  		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
>  
> -		if (cfg->table.ptr) {
> +		if (cfg->tables) {
>  			arm_smmu_free_cd_tables(smmu_domain);
>  			arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
>  		}



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] iommu/arm-smmu-v3: Add support for PCI PASID
  2019-06-10 18:47 ` [PATCH 8/8] iommu/arm-smmu-v3: Add support for PCI PASID Jean-Philippe Brucker
@ 2019-06-11 10:45   ` Jonathan Cameron
  2019-06-11 14:35     ` Jean-Philippe Brucker
  2019-07-08  7:58   ` Auger Eric
  1 sibling, 1 reply; 44+ messages in thread
From: Jonathan Cameron @ 2019-06-11 10:45 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: will.deacon, mark.rutland, devicetree, linux-kernel, iommu,
	robh+dt, robin.murphy, linux-arm-kernel

On Mon, 10 Jun 2019 19:47:14 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> Enable PASID for PCI devices that support it. Since the SSID tables are
> allocated by arm_smmu_attach_dev(), PASID has to be enabled early enough.
> arm_smmu_dev_feature_enable() would be too late, since by that time the
> main DMA domain has already been attached. Do it in add_device() instead.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Nitpick in line.

Thanks,

Jonathan
> ---
>  drivers/iommu/arm-smmu-v3.c | 51 ++++++++++++++++++++++++++++++++++++-
>  1 file changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 972bfb80f964..a8a516d9ff10 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2197,6 +2197,49 @@ static void arm_smmu_disable_ats(struct arm_smmu_master *master)
>  	master->ats_enabled = false;
>  }
>  
> +static int arm_smmu_enable_pasid(struct arm_smmu_master *master)
> +{
> +	int ret;
> +	int features;
> +	int num_pasids;
> +	struct pci_dev *pdev;
> +
> +	if (!dev_is_pci(master->dev))
> +		return -ENOSYS;
> +
> +	pdev = to_pci_dev(master->dev);
> +
> +	features = pci_pasid_features(pdev);
> +	if (features < 0)
> +		return -ENOSYS;
> +
> +	num_pasids = pci_max_pasids(pdev);
> +	if (num_pasids <= 0)
> +		return -ENOSYS;
> +
> +	ret = pci_enable_pasid(pdev, features);
> +	if (!ret)
> +		master->ssid_bits = min_t(u8, ilog2(num_pasids),
> +					  master->smmu->ssid_bits);
> +	return ret;
> +}
> +
> +static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
> +{
> +	struct pci_dev *pdev;
> +
> +	if (!dev_is_pci(master->dev))
> +		return;
> +
> +	pdev = to_pci_dev(master->dev);
> +
> +	if (!pdev->pasid_enabled)
> +		return;
> +
> +	pci_disable_pasid(pdev);
> +	master->ssid_bits = 0;

If we are being really fussy about ordering, why have this set of
ssid_bits after pci_disable_pasid rather than before (to reverse order
of .._enable_pasid)?

> +}
> +
>  static void arm_smmu_detach_dev(struct arm_smmu_master *master)
>  {
>  	unsigned long flags;
> @@ -2413,6 +2456,9 @@ static int arm_smmu_add_device(struct device *dev)
>  
>  	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
>  
> +	/* Note that PASID must be enabled before, and disabled after ATS */
> +	arm_smmu_enable_pasid(master);
> +
>  	/*
>  	 * If the SMMU doesn't support 2-stage CD, limit the linear
>  	 * tables to a reasonable number of contexts, let's say
> @@ -2423,7 +2469,7 @@ static int arm_smmu_add_device(struct device *dev)
>  
>  	ret = iommu_device_link(&smmu->iommu, dev);
>  	if (ret)
> -		goto err_free_master;
> +		goto err_disable_pasid;
>  
>  	group = iommu_group_get_for_dev(dev);
>  	if (IS_ERR(group)) {
> @@ -2436,6 +2482,8 @@ static int arm_smmu_add_device(struct device *dev)
>  
>  err_unlink:
>  	iommu_device_unlink(&smmu->iommu, dev);
> +err_disable_pasid:
> +	arm_smmu_disable_pasid(master);
>  err_free_master:
>  	kfree(master);
>  	fwspec->iommu_priv = NULL;
> @@ -2456,6 +2504,7 @@ static void arm_smmu_remove_device(struct device *dev)
>  	arm_smmu_detach_dev(master);
>  	iommu_group_remove_device(dev);
>  	iommu_device_unlink(&smmu->iommu, dev);
> +	arm_smmu_disable_pasid(master);
>  	kfree(master);
>  	iommu_fwspec_free(dev);
>  }



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/8] iommu: Add I/O ASID allocator
  2019-06-10 18:47 ` [PATCH 1/8] iommu: Add I/O ASID allocator Jean-Philippe Brucker
  2019-06-11  9:36   ` Jonathan Cameron
@ 2019-06-11 12:26   ` Jacob Pan
  2019-06-11 14:37     ` Jean-Philippe Brucker
  1 sibling, 1 reply; 44+ messages in thread
From: Jacob Pan @ 2019-06-11 12:26 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: will.deacon, joro, robh+dt, mark.rutland, robin.murphy, iommu,
	devicetree, linux-kernel, linux-arm-kernel, eric.auger,
	jacob.jun.pan

On Mon, 10 Jun 2019 19:47:07 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> Some devices might support multiple DMA address spaces, in particular
> those that have the PCI PASID feature. PASID (Process Address Space
> ID) allows to share process address spaces with devices (SVA),
> partition a device into VM-assignable entities (VFIO mdev) or simply
> provide multiple DMA address space to kernel drivers. Add a global
> PASID allocator usable by different drivers at the same time. Name it
> I/O ASID to avoid confusion with ASIDs allocated by arch code, which
> are usually a separate ID space.
> 
> The IOASID space is global. Each device can have its own PASID space,
> but by convention the IOMMU ended up having a global PASID space, so
> that with SVA, each mm_struct is associated to a single PASID.
> 
> The allocator is primarily used by IOMMU subsystem but in rare
> occasions drivers would like to allocate PASIDs for devices that
> aren't managed by an IOMMU, using the same ID space as IOMMU.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> The most recent discussion on this patch was at:
> https://lkml.kernel.org/lkml/1556922737-76313-4-git-send-email-jacob.jun.pan@linux.intel.com/
> I fixed it up a bit following comments in that series, and removed the
> definitions for the custom allocator for now.
> 
> There also is a new version that includes the custom allocator into
> this patch, but is currently missing the RCU fixes, at:
> https://lore.kernel.org/lkml/1560087862-57608-13-git-send-email-jacob.jun.pan@linux.intel.com/
> ---
>  drivers/iommu/Kconfig  |   4 ++
>  drivers/iommu/Makefile |   1 +
>  drivers/iommu/ioasid.c | 150
> +++++++++++++++++++++++++++++++++++++++++ include/linux/ioasid.h |
> 49 ++++++++++++++ 4 files changed, 204 insertions(+)
>  create mode 100644 drivers/iommu/ioasid.c
>  create mode 100644 include/linux/ioasid.h
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 83664db5221d..9b45f70549a7 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -3,6 +3,10 @@
>  config IOMMU_IOVA
>  	tristate
>  
> +# The IOASID library may also be used by non-IOMMU_API users
> +config IOASID
> +	tristate
> +
>  # IOMMU_API always gets selected by whoever wants it.
>  config IOMMU_API
>  	bool
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 8c71a15e986b..0efac6f1ec73 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -7,6 +7,7 @@ obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> +obj-$(CONFIG_IOASID) += ioasid.o
>  obj-$(CONFIG_IOMMU_IOVA) += iova.o
>  obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
>  obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> new file mode 100644
> index 000000000000..bbb771214fa9
> --- /dev/null
> +++ b/drivers/iommu/ioasid.c
> @@ -0,0 +1,150 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * I/O Address Space ID allocator. There is one global IOASID space,
> split into
> + * subsets. Users create a subset with DECLARE_IOASID_SET, then
> allocate and
> + * free IOASIDs with ioasid_alloc and ioasid_free.
> + */
> +#include <linux/ioasid.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/xarray.h>
> +
> +struct ioasid_data {
> +	ioasid_t id;
> +	struct ioasid_set *set;
> +	void *private;
> +	struct rcu_head rcu;
> +};
> +
> +static DEFINE_XARRAY_ALLOC(ioasid_xa);
> +
> +/**
> + * ioasid_set_data - Set private data for an allocated ioasid
> + * @ioasid: the ID to set data
> + * @data:   the private data
> + *
> + * For IOASID that is already allocated, private data can be set
> + * via this API. Future lookup can be done via ioasid_find.
> + */
> +int ioasid_set_data(ioasid_t ioasid, void *data)
> +{
> +	struct ioasid_data *ioasid_data;
> +	int ret = 0;
> +
> +	xa_lock(&ioasid_xa);
Just wondering if this is necessary, since xa_load is under
rcu_read_lock and we are not changing anything internal to xa. For
custom allocator I still need to have the mutex against allocator
removal.
> +	ioasid_data = xa_load(&ioasid_xa, ioasid);
> +	if (ioasid_data)
> +		rcu_assign_pointer(ioasid_data->private, data);
it is good to publish and have barrier here. But I just wonder even for
weakly ordered machine, this pointer update is quite far away from its
data update.
> +	else
> +		ret = -ENOENT;
> +	xa_unlock(&ioasid_xa);
> +
> +	/*
> +	 * Wait for readers to stop accessing the old private data,
> so the
> +	 * caller can free it.
> +	 */
> +	if (!ret)
> +		synchronize_rcu();
> +
I will add that to my next version to check ret value.

Thanks,

Jacob
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_set_data);
> +
> +/**
> + * ioasid_alloc - Allocate an IOASID
> + * @set: the IOASID set
> + * @min: the minimum ID (inclusive)
> + * @max: the maximum ID (inclusive)
> + * @private: data private to the caller
> + *
> + * Allocate an ID between @min and @max. The @private pointer is
> stored
> + * internally and can be retrieved with ioasid_find().
> + *
> + * Return: the allocated ID on success, or %INVALID_IOASID on
> failure.
> + */
> +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t
> max,
> +		      void *private)
> +{
> +	u32 id = INVALID_IOASID;
> +	struct ioasid_data *data;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		return INVALID_IOASID;
> +
> +	data->set = set;
> +	data->private = private;
> +
> +	if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max),
> GFP_KERNEL)) {
> +		pr_err("Failed to alloc ioasid from %d to %d\n",
> min, max);
> +		goto exit_free;
> +	}
> +	data->id = id;
> +
> +exit_free:
> +	if (id == INVALID_IOASID) {
> +		kfree(data);
> +		return INVALID_IOASID;
> +	}
> +	return id;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_alloc);
> +
> +/**
> + * ioasid_free - Free an IOASID
> + * @ioasid: the ID to remove
> + */
> +void ioasid_free(ioasid_t ioasid)
> +{
> +	struct ioasid_data *ioasid_data;
> +
> +	ioasid_data = xa_erase(&ioasid_xa, ioasid);
> +
> +	kfree_rcu(ioasid_data, rcu);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_free);
> +
> +/**
> + * ioasid_find - Find IOASID data
> + * @set: the IOASID set
> + * @ioasid: the IOASID to find
> + * @getter: function to call on the found object
> + *
> + * The optional getter function allows to take a reference to the
> found object
> + * under the rcu lock. The function can also check if the object is
> still valid:
> + * if @getter returns false, then the object is invalid and NULL is
> returned.
> + *
> + * If the IOASID has been allocated for this set, return the private
> pointer
> + * passed to ioasid_alloc. Private data can be NULL if not set.
> Return an error
> + * if the IOASID is not found or does not belong to the set.
> + */
> +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> +		  bool (*getter)(void *))
> +{
> +	void *priv = NULL;
> +	struct ioasid_data *ioasid_data;
> +
> +	rcu_read_lock();
> +	ioasid_data = xa_load(&ioasid_xa, ioasid);
> +	if (!ioasid_data) {
> +		priv = ERR_PTR(-ENOENT);
> +		goto unlock;
> +	}
> +	if (set && ioasid_data->set != set) {
> +		/* data found but does not belong to the set */
> +		priv = ERR_PTR(-EACCES);
> +		goto unlock;
> +	}
> +	/* Now IOASID and its set is verified, we can return the
> private data */
> +	priv = rcu_dereference(ioasid_data->private);
> +	if (getter && !getter(priv))
> +		priv = NULL;
> +unlock:
> +	rcu_read_unlock();
> +
> +	return priv;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_find);
> +
> +MODULE_LICENSE("GPL");
> diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> new file mode 100644
> index 000000000000..940212422b8f
> --- /dev/null
> +++ b/include/linux/ioasid.h
> @@ -0,0 +1,49 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __LINUX_IOASID_H
> +#define __LINUX_IOASID_H
> +
> +#include <linux/types.h>
> +
> +#define INVALID_IOASID ((ioasid_t)-1)
> +typedef unsigned int ioasid_t;
> +
> +struct ioasid_set {
> +	int dummy;
> +};
> +
> +#define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }
> +
> +#if IS_ENABLED(CONFIG_IOASID)
> +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t
> max,
> +		      void *private);
> +void ioasid_free(ioasid_t ioasid);
> +
> +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> +		  bool (*getter)(void *));
> +
> +int ioasid_set_data(ioasid_t ioasid, void *data);
> +
> +#else /* !CONFIG_IOASID */
> +static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t
> min,
> +				    ioasid_t max, void *private)
> +{
> +	return INVALID_IOASID;
> +}
> +
> +static inline void ioasid_free(ioasid_t ioasid)
> +{
> +}
> +
> +static inline void *ioasid_find(struct ioasid_set *set, ioasid_t
> ioasid,
> +				bool (*getter)(void *))
> +{
> +	return NULL;
> +}
> +
> +static inline int ioasid_set_data(ioasid_t ioasid, void *data)
> +{
> +	return -ENODEV;
> +}
> +
> +#endif /* CONFIG_IOASID */
> +#endif /* __LINUX_IOASID_H */

[Jacob Pan]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/8] iommu: Add I/O ASID allocator
  2019-06-11  9:36   ` Jonathan Cameron
@ 2019-06-11 14:35     ` Jean-Philippe Brucker
  2019-06-11 18:13       ` Jacob Pan
  0 siblings, 1 reply; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-11 14:35 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: mark.rutland, devicetree, will.deacon, linux-kernel, iommu,
	robh+dt, robin.murphy, linux-arm-kernel, jacob.jun.pan

On 11/06/2019 10:36, Jonathan Cameron wrote:
>> +/**
>> + * ioasid_alloc - Allocate an IOASID
>> + * @set: the IOASID set
>> + * @min: the minimum ID (inclusive)
>> + * @max: the maximum ID (inclusive)
>> + * @private: data private to the caller
>> + *
>> + * Allocate an ID between @min and @max. The @private pointer is stored
>> + * internally and can be retrieved with ioasid_find().
>> + *
>> + * Return: the allocated ID on success, or %INVALID_IOASID on failure.
>> + */
>> +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
>> +		      void *private)
>> +{
>> +	u32 id = INVALID_IOASID;
>> +	struct ioasid_data *data;
>> +
>> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
>> +	if (!data)
>> +		return INVALID_IOASID;
>> +
>> +	data->set = set;
>> +	data->private = private;
>> +
>> +	if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max), GFP_KERNEL)) {
>> +		pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
>> +		goto exit_free;
>> +	}
>> +	data->id = id;
>> +
>> +exit_free:
> 
> This error flow is perhaps a little more confusing than it needs to be?
> 
> My assumption (perhaps wrong) is that we only have an id == INVALID_IOASID
> if the xa_alloc fails, and that we will always have such an id value if
> it does (I'm not totally sure this second element is true in __xa_alloc).
> 
> If I'm missing something perhaps a comment on how else we'd get here.

Yes we can simplify this:

		return id;
	exit_free:
		kfree(data)
		return INVALID_IOASID;
	}

The XA API doesn't say that @id passed to xa_alloc() won't be modified
in case of error. It's true in the current implementation, but won't
necessarily stay that way. On the other hand I think it's safe to expect
@id to always be set when xa_alloc() succeeds.

>> +/**
>> + * ioasid_find - Find IOASID data
>> + * @set: the IOASID set
>> + * @ioasid: the IOASID to find
>> + * @getter: function to call on the found object
>> + *
>> + * The optional getter function allows to take a reference to the found object
>> + * under the rcu lock. The function can also check if the object is still valid:
>> + * if @getter returns false, then the object is invalid and NULL is returned.
>> + *
>> + * If the IOASID has been allocated for this set, return the private pointer
>> + * passed to ioasid_alloc. Private data can be NULL if not set. Return an error
>> + * if the IOASID is not found or does not belong to the set.
> 
> Perhaps should make it clear that @set can be null.

Indeed. But I'm not sure allowing @set to be NULL is such a good idea,
because the data type associated to an ioasid depends on its set. For
example SVA will put an mm_struct in there, and auxiliary domains use
some structure private to the IOMMU domain.

Jacob, could me make @set mandatory, or do you see a use for a global
search? If @set is NULL, then callers can check if the return pointer is
NULL, but will run into trouble if they try to dereference it.

> 
>> + */
>> +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
>> +		  bool (*getter)(void *))
>> +{
>> +	void *priv = NULL;
> 
> Set in all paths, so does need to be set here.

Right

Thanks,
Jean

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/8] iommu/arm-smmu-v3: Support platform SSID
  2019-06-11  9:42   ` Jonathan Cameron
@ 2019-06-11 14:35     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-11 14:35 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: mark.rutland, devicetree, will.deacon, linux-kernel, iommu,
	robh+dt, robin.murphy, linux-arm-kernel

On 11/06/2019 10:42, Jonathan Cameron wrote:
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 519e40fb23ce..b91df613385f 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -536,6 +536,7 @@ struct iommu_fwspec {
>>  	struct fwnode_handle	*iommu_fwnode;
>>  	void			*iommu_priv;
>>  	u32			flags;
>> +	u32			num_pasid_bits;
> 
> This structure has kernel doc so you need to add something for this.

Good catch

Thanks,
Jean

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/8] iommu/arm-smmu-v3: Add support for Substream IDs
  2019-06-11 10:19   ` Jonathan Cameron
@ 2019-06-11 14:35     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-11 14:35 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: mark.rutland, devicetree, will.deacon, linux-kernel, iommu,
	robh+dt, robin.murphy, linux-arm-kernel

On 11/06/2019 11:19, Jonathan Cameron wrote:
>> +static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain,
>> +				    struct arm_smmu_master *master)
>> +{
>> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
>> +	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
>>  
>> -	cfg->cdptr[0] = cpu_to_le64(val);
>> +	cfg->s1fmt = STRTAB_STE_0_S1FMT_LINEAR;
>> +	cfg->s1cdmax = master->ssid_bits;
>> +	return arm_smmu_alloc_cd_leaf_table(smmu, &cfg->table, 1 << cfg->s1cdmax);
>> +}
>>  
>> -	val = cfg->cd.ttbr & CTXDESC_CD_1_TTB0_MASK;
>> -	cfg->cdptr[1] = cpu_to_le64(val);
> 
> Hmm. Diff was having a field day in trying to make the patch as unreadable as possible..

Ugh, yes. This part is a bit more readable with --patience, but I'll
also try to split the patch as you suggest

Thanks,
Jean


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] iommu/arm-smmu-v3: Add support for PCI PASID
  2019-06-11 10:45   ` Jonathan Cameron
@ 2019-06-11 14:35     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-11 14:35 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: mark.rutland, devicetree, will.deacon, linux-kernel, iommu,
	robh+dt, robin.murphy, linux-arm-kernel

On 11/06/2019 11:45, Jonathan Cameron wrote:
>> +	pci_disable_pasid(pdev);
>> +	master->ssid_bits = 0;
> 
> If we are being really fussy about ordering, why have this set of
> ssid_bits after pci_disable_pasid rather than before (to reverse order
> of .._enable_pasid)?

Sure, I'll change that

Thanks,
Jean

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/8] iommu: Add I/O ASID allocator
  2019-06-11 12:26   ` Jacob Pan
@ 2019-06-11 14:37     ` Jean-Philippe Brucker
  2019-06-11 17:10       ` Jacob Pan
  0 siblings, 1 reply; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-11 14:37 UTC (permalink / raw)
  To: Jacob Pan
  Cc: mark.rutland, devicetree, will.deacon, linux-kernel, iommu,
	robh+dt, robin.murphy, linux-arm-kernel

On 11/06/2019 13:26, Jacob Pan wrote:
>> +/**
>> + * ioasid_set_data - Set private data for an allocated ioasid
>> + * @ioasid: the ID to set data
>> + * @data:   the private data
>> + *
>> + * For IOASID that is already allocated, private data can be set
>> + * via this API. Future lookup can be done via ioasid_find.
>> + */
>> +int ioasid_set_data(ioasid_t ioasid, void *data)
>> +{
>> +	struct ioasid_data *ioasid_data;
>> +	int ret = 0;
>> +
>> +	xa_lock(&ioasid_xa);
> Just wondering if this is necessary, since xa_load is under
> rcu_read_lock and we are not changing anything internal to xa. For
> custom allocator I still need to have the mutex against allocator
> removal.

I think we do need this because of a possible race with ioasid_free():

         CPU1                      CPU2
  ioasid_free(ioasid)     ioasid_set_data(ioasid, foo)
                            data = xa_load(...)
    xa_erase(...)
    kfree_rcu(data)           (no RCU lock held)
    ...free(data)
                            data->private = foo;

The issue is theoretical at the moment because no users do this, but I'd
be more comfortable taking the xa_lock, which prevents a concurrent
xa_erase()+free(). (I commented on your v3 but you might have missed it)

>> +	ioasid_data = xa_load(&ioasid_xa, ioasid);
>> +	if (ioasid_data)
>> +		rcu_assign_pointer(ioasid_data->private, data);
> it is good to publish and have barrier here. But I just wonder even for
> weakly ordered machine, this pointer update is quite far away from its
> data update.

I don't know, it could be right before calling ioasid_set_data():

	mydata = kzalloc(sizeof(*mydata));
	mydata->ops = &my_ops;			(1)
	ioasid_set_data(ioasid, mydata);
		... /* no write barrier here */
		data->private = mydata;		(2)

And then another thread calls ioasid_find():

	mydata = ioasid_find(ioasid);
	if (mydata)
		mydata->ops->do_something();

On a weakly ordered machine, this thread could observe the pointer
assignment (2) before the ops assignment (1), and dereference NULL.
Using rcu_assign_pointer() should fix that

Thanks,
Jean

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/8] iommu: Add I/O ASID allocator
  2019-06-11 14:37     ` Jean-Philippe Brucker
@ 2019-06-11 17:10       ` Jacob Pan
  2019-06-12 11:30         ` Jean-Philippe Brucker
  0 siblings, 1 reply; 44+ messages in thread
From: Jacob Pan @ 2019-06-11 17:10 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: mark.rutland, devicetree, will.deacon, linux-kernel, iommu,
	robh+dt, robin.murphy, linux-arm-kernel, jacob.jun.pan

On Tue, 11 Jun 2019 15:37:42 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> On 11/06/2019 13:26, Jacob Pan wrote:
> >> +/**
> >> + * ioasid_set_data - Set private data for an allocated ioasid
> >> + * @ioasid: the ID to set data
> >> + * @data:   the private data
> >> + *
> >> + * For IOASID that is already allocated, private data can be set
> >> + * via this API. Future lookup can be done via ioasid_find.
> >> + */
> >> +int ioasid_set_data(ioasid_t ioasid, void *data)
> >> +{
> >> +	struct ioasid_data *ioasid_data;
> >> +	int ret = 0;
> >> +
> >> +	xa_lock(&ioasid_xa);  
> > Just wondering if this is necessary, since xa_load is under
> > rcu_read_lock and we are not changing anything internal to xa. For
> > custom allocator I still need to have the mutex against allocator
> > removal.  
> 
> I think we do need this because of a possible race with ioasid_free():
> 
>          CPU1                      CPU2
>   ioasid_free(ioasid)     ioasid_set_data(ioasid, foo)
>                             data = xa_load(...)
>     xa_erase(...)
>     kfree_rcu(data)           (no RCU lock held)
>     ...free(data)
>                             data->private = foo;
> 
make sense, thanks for explaining.

> The issue is theoretical at the moment because no users do this, but
> I'd be more comfortable taking the xa_lock, which prevents a
> concurrent xa_erase()+free(). (I commented on your v3 but you might
> have missed it)
> 
Did you reply to my v3? I did not see it. I only saw your comments about
v3 in your commit message.

> >> +	ioasid_data = xa_load(&ioasid_xa, ioasid);
> >> +	if (ioasid_data)
> >> +		rcu_assign_pointer(ioasid_data->private, data);  
> > it is good to publish and have barrier here. But I just wonder even
> > for weakly ordered machine, this pointer update is quite far away
> > from its data update.  
> 
> I don't know, it could be right before calling ioasid_set_data():
> 
> 	mydata = kzalloc(sizeof(*mydata));
> 	mydata->ops = &my_ops;			(1)
> 	ioasid_set_data(ioasid, mydata);
> 		... /* no write barrier here */
> 		data->private = mydata;		(2)
> 
> And then another thread calls ioasid_find():
> 
> 	mydata = ioasid_find(ioasid);
> 	if (mydata)
> 		mydata->ops->do_something();
> 
> On a weakly ordered machine, this thread could observe the pointer
> assignment (2) before the ops assignment (1), and dereference NULL.
> Using rcu_assign_pointer() should fix that
> 
I agree it is better to have the barrier. Just thought there is already
a rcu_read_lock() in xa_load() in between. rcu_read_lock() may have
barrier in some case but better not count on it. No issues here. I will
integrate this in the next version.

> Thanks,
> Jean

[Jacob Pan]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/8] iommu: Add I/O ASID allocator
  2019-06-11 14:35     ` Jean-Philippe Brucker
@ 2019-06-11 18:13       ` Jacob Pan
  2019-06-18 14:22         ` Jean-Philippe Brucker
  0 siblings, 1 reply; 44+ messages in thread
From: Jacob Pan @ 2019-06-11 18:13 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Jonathan Cameron, mark.rutland, devicetree, will.deacon,
	linux-kernel, iommu, robh+dt, robin.murphy, linux-arm-kernel,
	jacob.jun.pan

On Tue, 11 Jun 2019 15:35:22 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> On 11/06/2019 10:36, Jonathan Cameron wrote:
> >> +/**
> >> + * ioasid_alloc - Allocate an IOASID
> >> + * @set: the IOASID set
> >> + * @min: the minimum ID (inclusive)
> >> + * @max: the maximum ID (inclusive)
> >> + * @private: data private to the caller
> >> + *
> >> + * Allocate an ID between @min and @max. The @private pointer is
> >> stored
> >> + * internally and can be retrieved with ioasid_find().
> >> + *
> >> + * Return: the allocated ID on success, or %INVALID_IOASID on
> >> failure.
> >> + */
> >> +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
> >> ioasid_t max,
> >> +		      void *private)
> >> +{
> >> +	u32 id = INVALID_IOASID;
> >> +	struct ioasid_data *data;
> >> +
> >> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> >> +	if (!data)
> >> +		return INVALID_IOASID;
> >> +
> >> +	data->set = set;
> >> +	data->private = private;
> >> +
> >> +	if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max),
> >> GFP_KERNEL)) {
> >> +		pr_err("Failed to alloc ioasid from %d to %d\n",
> >> min, max);
> >> +		goto exit_free;
> >> +	}
> >> +	data->id = id;
> >> +
> >> +exit_free:  
> > 
> > This error flow is perhaps a little more confusing than it needs to
> > be?
> > 
> > My assumption (perhaps wrong) is that we only have an id ==
> > INVALID_IOASID if the xa_alloc fails, and that we will always have
> > such an id value if it does (I'm not totally sure this second
> > element is true in __xa_alloc).
> > 
> > If I'm missing something perhaps a comment on how else we'd get
> > here.  
> 
> Yes we can simplify this:
> 
> 		return id;
> 	exit_free:
> 		kfree(data)
> 		return INVALID_IOASID;
> 	}
> 
> The XA API doesn't say that @id passed to xa_alloc() won't be modified
> in case of error. It's true in the current implementation, but won't
> necessarily stay that way. On the other hand I think it's safe to
> expect @id to always be set when xa_alloc() succeeds.
> 
the flow with custom allocator is slightly different, but you are right
I can simplify it as you suggested.
Jonathan, I will add you to the cc list in next version. If you could
also review the current version, it would be greatly appreciated.

https://lore.kernel.org/lkml/1560087862-57608-13-git-send-email-jacob.jun.pan@linux.intel.com/

> >> +/**
> >> + * ioasid_find - Find IOASID data
> >> + * @set: the IOASID set
> >> + * @ioasid: the IOASID to find
> >> + * @getter: function to call on the found object
> >> + *
> >> + * The optional getter function allows to take a reference to the
> >> found object
> >> + * under the rcu lock. The function can also check if the object
> >> is still valid:
> >> + * if @getter returns false, then the object is invalid and NULL
> >> is returned.
> >> + *
> >> + * If the IOASID has been allocated for this set, return the
> >> private pointer
> >> + * passed to ioasid_alloc. Private data can be NULL if not set.
> >> Return an error
> >> + * if the IOASID is not found or does not belong to the set.  
> > 
> > Perhaps should make it clear that @set can be null.  
> 
> Indeed. But I'm not sure allowing @set to be NULL is such a good idea,
> because the data type associated to an ioasid depends on its set. For
> example SVA will put an mm_struct in there, and auxiliary domains use
> some structure private to the IOMMU domain.
> 
I am not sure we need to count on @set to decipher data type. Whoever
does the allocation and owns the IOASID should knows its own data type.
My thought was that @set is only used to group IDs, permission check
etc.

> Jacob, could me make @set mandatory, or do you see a use for a global
> search? If @set is NULL, then callers can check if the return pointer
> is NULL, but will run into trouble if they try to dereference it.
> 
A global search use case can be for PRQ. IOMMU driver gets a IOASID
(first interrupt then retrieve from a queue), it has no idea which
@set it belongs to. But the data types are the same for all IOASIDs
used by the IOMMU.
If @set is NULL, the search does not check set match. It is separate
from return pointer. Sorry i am not seeing the problems here.

> >   
> >> + */
> >> +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> >> +		  bool (*getter)(void *))
> >> +{
> >> +	void *priv = NULL;  
> > 
> > Set in all paths, so does need to be set here.  
> 
> Right
> 
> Thanks,
> Jean

[Jacob Pan]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/8] iommu: Add I/O ASID allocator
  2019-06-11 17:10       ` Jacob Pan
@ 2019-06-12 11:30         ` Jean-Philippe Brucker
  0 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-12 11:30 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Mark Rutland, devicetree, Will Deacon, linux-kernel, iommu,
	robh+dt, Robin Murphy, linux-arm-kernel

On 11/06/2019 18:10, Jacob Pan wrote:
>> The issue is theoretical at the moment because no users do this, but
>> I'd be more comfortable taking the xa_lock, which prevents a
>> concurrent xa_erase()+free(). (I commented on your v3 but you might
>> have missed it)
>>
> Did you reply to my v3? I did not see it. I only saw your comments about
> v3 in your commit message.

My fault, I sneaked the comments in a random reply three levels down the
thread:
https://lore.kernel.org/linux-iommu/836caf0d-699e-33ba-5303-b1c9c949c9ca@arm.com/

(Great, linux-iommu is indexed by lore! I won't have to Cc lkml anymore)

>>>> +	ioasid_data = xa_load(&ioasid_xa, ioasid);
>>>> +	if (ioasid_data)
>>>> +		rcu_assign_pointer(ioasid_data->private, data);  
>>> it is good to publish and have barrier here. But I just wonder even
>>> for weakly ordered machine, this pointer update is quite far away
>>> from its data update.  
>>
>> I don't know, it could be right before calling ioasid_set_data():
>>
>> 	mydata = kzalloc(sizeof(*mydata));
>> 	mydata->ops = &my_ops;			(1)
>> 	ioasid_set_data(ioasid, mydata);
>> 		... /* no write barrier here */
>> 		data->private = mydata;		(2)
>>
>> And then another thread calls ioasid_find():
>>
>> 	mydata = ioasid_find(ioasid);
>> 	if (mydata)
>> 		mydata->ops->do_something();
>>
>> On a weakly ordered machine, this thread could observe the pointer
>> assignment (2) before the ops assignment (1), and dereference NULL.
>> Using rcu_assign_pointer() should fix that
>>
> I agree it is better to have the barrier. Just thought there is already
> a rcu_read_lock() in xa_load() in between. rcu_read_lock() may have
> barrier in some case but better not count on it. 

Yes, and even if rcu_read_lock() provided a barrier I don't think it
would be sufficient, because acquire semantics don't guarantee that
prior writes appear to happen before the barrier, only the other way
round. A lock operation with release semantics, for example
spin_unlock(), should work.

Thanks,
Jean

> No issues here. I will
> integrate this in the next version.
> 
>> Thanks,
>> Jean
> 
> [Jacob Pan]
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/8] iommu: Add I/O ASID allocator
  2019-06-11 18:13       ` Jacob Pan
@ 2019-06-18 14:22         ` Jean-Philippe Brucker
  2019-06-18 17:05           ` Jacob Pan
  0 siblings, 1 reply; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-18 14:22 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Jonathan Cameron, Mark Rutland, devicetree, Will Deacon,
	linux-kernel, iommu, robh+dt, Robin Murphy, linux-arm-kernel

On 11/06/2019 19:13, Jacob Pan wrote:
>>>> +/**
>>>> + * ioasid_find - Find IOASID data
>>>> + * @set: the IOASID set
>>>> + * @ioasid: the IOASID to find
>>>> + * @getter: function to call on the found object
>>>> + *
>>>> + * The optional getter function allows to take a reference to the
>>>> found object
>>>> + * under the rcu lock. The function can also check if the object
>>>> is still valid:
>>>> + * if @getter returns false, then the object is invalid and NULL
>>>> is returned.
>>>> + *
>>>> + * If the IOASID has been allocated for this set, return the
>>>> private pointer
>>>> + * passed to ioasid_alloc. Private data can be NULL if not set.
>>>> Return an error
>>>> + * if the IOASID is not found or does not belong to the set.  
>>>
>>> Perhaps should make it clear that @set can be null.  
>>
>> Indeed. But I'm not sure allowing @set to be NULL is such a good idea,
>> because the data type associated to an ioasid depends on its set. For
>> example SVA will put an mm_struct in there, and auxiliary domains use
>> some structure private to the IOMMU domain.
>>
> I am not sure we need to count on @set to decipher data type. Whoever
> does the allocation and owns the IOASID should knows its own data type.
> My thought was that @set is only used to group IDs, permission check
> etc.
> 
>> Jacob, could me make @set mandatory, or do you see a use for a global
>> search? If @set is NULL, then callers can check if the return pointer
>> is NULL, but will run into trouble if they try to dereference it.
>>
> A global search use case can be for PRQ. IOMMU driver gets a IOASID
> (first interrupt then retrieve from a queue), it has no idea which
> @set it belongs to. But the data types are the same for all IOASIDs
> used by the IOMMU.

They aren't when we use a generic SVA handler. Following a call to
iommu_sva_bind_device(), iommu-sva.c allocates an IOASID and store an
mm_struct. If auxiliary domains are also enabled for the device,
following a call to iommu_aux_attach_device() the IOMMU driver allocates
an IOASID and stores some private object.

Now for example the IOMMU driver receives a PPR and calls ioasid_find()
with @set = NULL. ioasid_find() may return either an mm_struct or a
private object, and the driver cannot know which it is so the returned
value is unusable.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/8] iommu: Add I/O ASID allocator
  2019-06-18 14:22         ` Jean-Philippe Brucker
@ 2019-06-18 17:05           ` Jacob Pan
  2019-06-19 14:26             ` Jean-Philippe Brucker
  0 siblings, 1 reply; 44+ messages in thread
From: Jacob Pan @ 2019-06-18 17:05 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Jonathan Cameron, Mark Rutland, devicetree, Will Deacon,
	linux-kernel, iommu, robh+dt, Robin Murphy, linux-arm-kernel,
	jacob.jun.pan

On Tue, 18 Jun 2019 15:22:20 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> On 11/06/2019 19:13, Jacob Pan wrote:
> >>>> +/**
> >>>> + * ioasid_find - Find IOASID data
> >>>> + * @set: the IOASID set
> >>>> + * @ioasid: the IOASID to find
> >>>> + * @getter: function to call on the found object
> >>>> + *
> >>>> + * The optional getter function allows to take a reference to
> >>>> the found object
> >>>> + * under the rcu lock. The function can also check if the object
> >>>> is still valid:
> >>>> + * if @getter returns false, then the object is invalid and NULL
> >>>> is returned.
> >>>> + *
> >>>> + * If the IOASID has been allocated for this set, return the
> >>>> private pointer
> >>>> + * passed to ioasid_alloc. Private data can be NULL if not set.
> >>>> Return an error
> >>>> + * if the IOASID is not found or does not belong to the set.    
> >>>
> >>> Perhaps should make it clear that @set can be null.    
> >>
> >> Indeed. But I'm not sure allowing @set to be NULL is such a good
> >> idea, because the data type associated to an ioasid depends on its
> >> set. For example SVA will put an mm_struct in there, and auxiliary
> >> domains use some structure private to the IOMMU domain.
> >>  
> > I am not sure we need to count on @set to decipher data type.
> > Whoever does the allocation and owns the IOASID should knows its
> > own data type. My thought was that @set is only used to group IDs,
> > permission check etc.
> >   
> >> Jacob, could me make @set mandatory, or do you see a use for a
> >> global search? If @set is NULL, then callers can check if the
> >> return pointer is NULL, but will run into trouble if they try to
> >> dereference it. 
> > A global search use case can be for PRQ. IOMMU driver gets a IOASID
> > (first interrupt then retrieve from a queue), it has no idea which
> > @set it belongs to. But the data types are the same for all IOASIDs
> > used by the IOMMU.  
> 
> They aren't when we use a generic SVA handler. Following a call to
> iommu_sva_bind_device(), iommu-sva.c allocates an IOASID and store an
> mm_struct. If auxiliary domains are also enabled for the device,
> following a call to iommu_aux_attach_device() the IOMMU driver
> allocates an IOASID and stores some private object.
> 
> Now for example the IOMMU driver receives a PPR and calls
> ioasid_find() with @set = NULL. ioasid_find() may return either an
> mm_struct or a private object, and the driver cannot know which it is
> so the returned value is unusable.
I think we might have a misunderstanding of what ioasid_set is. Or i
have misused your original intention for it:) So my understanding of an
ioasid_set is to identify a sub set of IOASIDs that
1. shares the same data type
2. belongs to the same permission group, owner.

Our usage is #2., we put a mm_struct as the set to do permission
check. E.g VFIO can check guest PASID ownership based on QEMU process
mm. This will make sure IOASID allocated by one guest can only be used
by the same guest.

For IOASID used for sva bind, let it be native or guest sva, we always
have the same data type. Therefore, when page request handler gets
called to search with ioasid_find(), it knows its data type. An
additional flag will tell if it is a guest bind or native bind.

I was under the assumption that aux domain and its IOASID is a 1:1
mapping, there is no need for a search. Or even it needs to search, it
will be searched by the same caller that did the allocation, therefore
it knows what private data type is.

In addition, aux domain is used for request w/o PASID. And PPR for
request w/o PASID is not to be supported. So there would not be a need
to search from page request handler.

Now if we take the above assumption away, and use ioasid_set strictly
to differentiate the data types (Usage #1). Then I agree we can get
rid of NULL set and global search.

That would require we converge on the generic sva_bind for both
native and guest. The additional implication is that VFIO layer has to
be SVA struct aware in order to sanitize the mm_struct for guest bind.
i.e. check guest ownership by using QEMU process mm. Whereas we have
today, VFIO just use IOASID set as mm to check ownership, no need to
know the type.

Can we keep the NULL set for now and remove it __after__ we converge on
the sva bind flows?

Thanks,

Jacob

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/8] iommu/arm-smmu-v3: Support platform SSID
  2019-06-10 18:47 ` [PATCH 3/8] iommu/arm-smmu-v3: Support platform SSID Jean-Philippe Brucker
  2019-06-11  9:42   ` Jonathan Cameron
@ 2019-06-18 18:08   ` Will Deacon
  2019-06-19 11:53     ` Jean-Philippe Brucker
  2019-07-08  7:58   ` Auger Eric
  2 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2019-06-18 18:08 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: joro, robh+dt, mark.rutland, robin.murphy, jacob.jun.pan, iommu,
	devicetree, linux-kernel, linux-arm-kernel, eric.auger

On Mon, Jun 10, 2019 at 07:47:09PM +0100, Jean-Philippe Brucker wrote:
> For platform devices that support SubstreamID (SSID), firmware provides
> the number of supported SSID bits. Restrict it to what the SMMU supports
> and cache it into master->ssid_bits.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  drivers/iommu/arm-smmu-v3.c | 11 +++++++++++
>  drivers/iommu/of_iommu.c    |  6 +++++-
>  include/linux/iommu.h       |  1 +
>  3 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 4d5a694f02c2..3254f473e681 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -604,6 +604,7 @@ struct arm_smmu_master {
>  	struct list_head		domain_head;
>  	u32				*sids;
>  	unsigned int			num_sids;
> +	unsigned int			ssid_bits;
>  	bool				ats_enabled		:1;
>  };
>  
> @@ -2097,6 +2098,16 @@ static int arm_smmu_add_device(struct device *dev)
>  		}
>  	}
>  
> +	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
> +
> +	/*
> +	 * If the SMMU doesn't support 2-stage CD, limit the linear
> +	 * tables to a reasonable number of contexts, let's say
> +	 * 64kB / sizeof(ctx_desc) = 1024 = 2^10
> +	 */
> +	if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
> +		master->ssid_bits = min(master->ssid_bits, 10U);

Please introduce a #define for the 10, so that it is computed in the way
you describe in the comment (a bit like we do for things like queue sizes).

> +
>  	group = iommu_group_get_for_dev(dev);
>  	if (!IS_ERR(group)) {
>  		iommu_group_put(group);
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index f04a6df65eb8..04f4f6b95d82 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -206,8 +206,12 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
>  			if (err)
>  				break;
>  		}
> -	}
>  
> +		fwspec = dev_iommu_fwspec_get(dev);
> +		if (!err && fwspec)
> +			of_property_read_u32(master_np, "pasid-num-bits",
> +					     &fwspec->num_pasid_bits);
> +	}

Hmm. Do you know if there's anything in ACPI for this?

Otherwise, patch looks fine. Thanks.

Will

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/8] iommu/arm-smmu-v3: Support platform SSID
  2019-06-18 18:08   ` Will Deacon
@ 2019-06-19 11:53     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-19 11:53 UTC (permalink / raw)
  To: Will Deacon
  Cc: joro, robh+dt, Mark Rutland, Robin Murphy, jacob.jun.pan, iommu,
	devicetree, linux-kernel, linux-arm-kernel, eric.auger

On 18/06/2019 19:08, Will Deacon wrote:
>> +	/*
>> +	 * If the SMMU doesn't support 2-stage CD, limit the linear
>> +	 * tables to a reasonable number of contexts, let's say
>> +	 * 64kB / sizeof(ctx_desc) = 1024 = 2^10
>> +	 */
>> +	if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
>> +		master->ssid_bits = min(master->ssid_bits, 10U);
> 
> Please introduce a #define for the 10, so that it is computed in the way
> you describe in the comment (a bit like we do for things like queue sizes).

Ok

>> +
>>  	group = iommu_group_get_for_dev(dev);
>>  	if (!IS_ERR(group)) {
>>  		iommu_group_put(group);
>> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
>> index f04a6df65eb8..04f4f6b95d82 100644
>> --- a/drivers/iommu/of_iommu.c
>> +++ b/drivers/iommu/of_iommu.c
>> @@ -206,8 +206,12 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
>>  			if (err)
>>  				break;
>>  		}
>> -	}
>>  
>> +		fwspec = dev_iommu_fwspec_get(dev);
>> +		if (!err && fwspec)
>> +			of_property_read_u32(master_np, "pasid-num-bits",
>> +					     &fwspec->num_pasid_bits);
>> +	}
> 
> Hmm. Do you know if there's anything in ACPI for this?

Yes, IORT version D introduced a "substream width" field for the Named
component node (platform device). I don't think it existed last time I
checked, so I'll see about supporting it.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/8] iommu: Add I/O ASID allocator
  2019-06-18 17:05           ` Jacob Pan
@ 2019-06-19 14:26             ` Jean-Philippe Brucker
  0 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-19 14:26 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Mark Rutland, devicetree, Will Deacon, linux-kernel, iommu,
	robh+dt, Robin Murphy, linux-arm-kernel

On 18/06/2019 18:05, Jacob Pan wrote:
> On Tue, 18 Jun 2019 15:22:20 +0100
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> 
>> On 11/06/2019 19:13, Jacob Pan wrote:
>>>>>> +/**
>>>>>> + * ioasid_find - Find IOASID data
>>>>>> + * @set: the IOASID set
>>>>>> + * @ioasid: the IOASID to find
>>>>>> + * @getter: function to call on the found object
>>>>>> + *
>>>>>> + * The optional getter function allows to take a reference to
>>>>>> the found object
>>>>>> + * under the rcu lock. The function can also check if the object
>>>>>> is still valid:
>>>>>> + * if @getter returns false, then the object is invalid and NULL
>>>>>> is returned.
>>>>>> + *
>>>>>> + * If the IOASID has been allocated for this set, return the
>>>>>> private pointer
>>>>>> + * passed to ioasid_alloc. Private data can be NULL if not set.
>>>>>> Return an error
>>>>>> + * if the IOASID is not found or does not belong to the set.    
>>>>>
>>>>> Perhaps should make it clear that @set can be null.    
>>>>
>>>> Indeed. But I'm not sure allowing @set to be NULL is such a good
>>>> idea, because the data type associated to an ioasid depends on its
>>>> set. For example SVA will put an mm_struct in there, and auxiliary
>>>> domains use some structure private to the IOMMU domain.
>>>>  
>>> I am not sure we need to count on @set to decipher data type.
>>> Whoever does the allocation and owns the IOASID should knows its
>>> own data type. My thought was that @set is only used to group IDs,
>>> permission check etc.
>>>   
>>>> Jacob, could me make @set mandatory, or do you see a use for a
>>>> global search? If @set is NULL, then callers can check if the
>>>> return pointer is NULL, but will run into trouble if they try to
>>>> dereference it. 
>>> A global search use case can be for PRQ. IOMMU driver gets a IOASID
>>> (first interrupt then retrieve from a queue), it has no idea which
>>> @set it belongs to. But the data types are the same for all IOASIDs
>>> used by the IOMMU.  
>>
>> They aren't when we use a generic SVA handler. Following a call to
>> iommu_sva_bind_device(), iommu-sva.c allocates an IOASID and store an
>> mm_struct. If auxiliary domains are also enabled for the device,
>> following a call to iommu_aux_attach_device() the IOMMU driver
>> allocates an IOASID and stores some private object.
>>
>> Now for example the IOMMU driver receives a PPR and calls
>> ioasid_find() with @set = NULL. ioasid_find() may return either an
>> mm_struct or a private object, and the driver cannot know which it is
>> so the returned value is unusable.
> I think we might have a misunderstanding of what ioasid_set is. Or i
> have misused your original intention for it:) So my understanding of an
> ioasid_set is to identify a sub set of IOASIDs that
> 1. shares the same data type
> 2. belongs to the same permission group, owner.

In my case it's mostly #1. The two IOASID sets (SVA and AUX) are managed
by different modules (iommu-sva and arm-smmu-v3). Since we don't do
scalable IOV, the two sets are shared_ioasid and private_ioasid, with
either an mm or NULL as private data (but we might need to store a
domain for private IOASIDs at some point). So at the moment passing a
NULL set to ioasid_find() would work for us as well.

However in a non-virtualization scenario, a device driver could define
its own set and allocate an IOASID for its own use, associating some
private structure with it. If it somehow causes a PRQ on that IOASID,
the IOMMU driver shouldn't attempt to access the device driver's private
structure. So I do think we need to be careful with global
ioasid_find(). Given that any driver in the system can use the
allocator, we won't be able to keep assuming that all objects stored in
there are of one type.

> Our usage is #2., we put a mm_struct as the set to do permission
> check. E.g VFIO can check guest PASID ownership based on QEMU process
> mm. This will make sure IOASID allocated by one guest can only be used
> by the same guest.
> 
> For IOASID used for sva bind, let it be native or guest sva, we always
> have the same data type. Therefore, when page request handler gets
> called to search with ioasid_find(), it knows its data type. An
> additional flag will tell if it is a guest bind or native bind.
> 
> I was under the assumption that aux domain and its IOASID is a 1:1
> mapping, there is no need for a search. Or even it needs to search, it
> will be searched by the same caller that did the allocation, therefore
> it knows what private data type is.
>
> In addition, aux domain is used for request w/o PASID. And PPR for
> request w/o PASID is not to be supported. So there would not be a need
> to search from page request handler.
> 
> Now if we take the above assumption away, and use ioasid_set strictly
> to differentiate the data types (Usage #1). Then I agree we can get
> rid of NULL set and global search.
> 
> That would require we converge on the generic sva_bind for both
> native and guest. The additional implication is that VFIO layer has to
> be SVA struct aware in order to sanitize the mm_struct for guest bind.
> i.e. check guest ownership by using QEMU process mm. Whereas we have
> today, VFIO just use IOASID set as mm to check ownership, no need to
> know the type.

Thanks for the explanation, I think I understand a little better after
taking a closer look at your v4.

> Can we keep the NULL set for now and remove it __after__ we converge on
> the sva bind flows?

Sure, let's revisit this later

Thanks,
Jean

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 6/8] iommu/arm-smmu-v3: Support auxiliary domains
  2019-06-10 18:47 ` [PATCH 6/8] iommu/arm-smmu-v3: Support auxiliary domains Jean-Philippe Brucker
@ 2019-06-26 17:59   ` Will Deacon
  2019-07-05 16:29     ` Jean-Philippe Brucker
  2019-09-19 15:06     ` Jean-Philippe Brucker
  0 siblings, 2 replies; 44+ messages in thread
From: Will Deacon @ 2019-06-26 17:59 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: will.deacon, joro, robh+dt, mark.rutland, robin.murphy,
	jacob.jun.pan, iommu, devicetree, linux-kernel, linux-arm-kernel,
	eric.auger

Hi Jean-Philippe,

On Mon, Jun 10, 2019 at 07:47:12PM +0100, Jean-Philippe Brucker wrote:
> In commit a3a195929d40 ("iommu: Add APIs for multiple domains per
> device"), the IOMMU API gained the concept of auxiliary domains (AUXD),
> which allows to control the PASID-tagged address spaces of a device. With
> AUXD the PASID address space are not shared with the CPU, but are instead
> modified with iommu_map() and iommu_unmap() calls on auxiliary domains.
> 
> Add auxiliary domain support to the SMMUv3 driver. Device drivers allocate
> an unmanaged IOMMU domain with iommu_domain_alloc(), and attach it to the
> device with iommu_aux_attach_domain().

[...]

> 
> The AUXD API is fairly permissive, and allows to attach an IOMMU domain in
> both normal and auxiliary mode at the same time - one device can be
> attached to the domain normally, and another device can be attached
> through one of its PASIDs. To avoid excessive complexity in the SMMU
> implementation we pose some restrictions on supported AUXD usage:
> 
> * A domain is either in auxiliary mode or normal mode. And that state is
>   sticky. Once detached the domain has to be re-attached in the same mode.
> 
> * An auxiliary domain can have a single parent domain. Two devices can be
>   attached to the same auxiliary domain only if they are attached to the
>   same parent domain.
> 
> In practice these shouldn't be problematic, since we have the same kind of
> restriction on normal domains and users have been able to cope so far: at
> the moment a domain cannot be attached to two devices behind different
> SMMUs. When VFIO puts two such devices in the same container, it simply
> falls back to allocating two separate IOMMU domains.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  drivers/iommu/Kconfig       |   1 +
>  drivers/iommu/arm-smmu-v3.c | 276 +++++++++++++++++++++++++++++++++---
>  2 files changed, 260 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 9b45f70549a7..d326fef3d3a6 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -393,6 +393,7 @@ config ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT
>  config ARM_SMMU_V3
>  	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
>  	depends on ARM64
> +	select IOASID
>  	select IOMMU_API
>  	select IOMMU_IO_PGTABLE_LPAE
>  	select GENERIC_MSI_IRQ_DOMAIN
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 326b71793336..633d829f246f 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -19,6 +19,7 @@
>  #include <linux/err.h>
>  #include <linux/interrupt.h>
>  #include <linux/io-pgtable.h>
> +#include <linux/ioasid.h>
>  #include <linux/iommu.h>
>  #include <linux/iopoll.h>
>  #include <linux/init.h>
> @@ -641,6 +642,7 @@ struct arm_smmu_master {
>  	unsigned int			num_sids;
>  	unsigned int			ssid_bits;
>  	bool				ats_enabled		:1;
> +	bool				auxd_enabled		:1;
>  };
>  
>  /* SMMU private data for an IOMMU domain */
> @@ -666,8 +668,14 @@ struct arm_smmu_domain {
>  
>  	struct iommu_domain		domain;
>  
> +	/* Unused in aux domains */
>  	struct list_head		devices;
>  	spinlock_t			devices_lock;
> +
> +	/* Auxiliary domain stuff */
> +	struct arm_smmu_domain		*parent;
> +	ioasid_t			ssid;
> +	unsigned long			aux_nr_devs;

Maybe use a union to avoid comments about what is used/unused?

> +static void arm_smmu_aux_detach_dev(struct iommu_domain *domain, struct device *dev)
> +{
> +	struct iommu_domain *parent_domain;
> +	struct arm_smmu_domain *parent_smmu_domain;
> +	struct arm_smmu_master *master = dev_to_master(dev);
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +
> +	if (!arm_smmu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX))
> +		return;
> +
> +	parent_domain = iommu_get_domain_for_dev(dev);
> +	if (!parent_domain)
> +		return;
> +	parent_smmu_domain = to_smmu_domain(parent_domain);
> +
> +	mutex_lock(&smmu_domain->init_mutex);
> +	if (!smmu_domain->aux_nr_devs)
> +		goto out_unlock;
> +
> +	if (!--smmu_domain->aux_nr_devs) {
> +		arm_smmu_write_ctx_desc(parent_smmu_domain, smmu_domain->ssid,
> +					NULL);
> +		/*
> +		 * TLB doesn't need invalidation since accesses from the device
> +		 * can't use this domain's ASID once the CD is clear.
> +		 *
> +		 * Sadly that doesn't apply to ATCs, which are PASID tagged.
> +		 * Invalidate all other devices as well, because even though
> +		 * they weren't 'officially' attached to the auxiliary domain,
> +		 * they could have formed ATC entries.
> +		 */
> +		arm_smmu_atc_inv_domain(smmu_domain, 0, 0);

I've been struggling to understand the locking here, since both
arm_smmu_write_ctx_desc and arm_smmu_atc_inv_domain take and release the
devices_lock for the domain. Is there not a problem with devices coming and
going in-between the two calls?

> +	} else {
> +		struct arm_smmu_cmdq_ent cmd;
> +
> +		/* Invalidate only this device's ATC */
> +		if (master->ats_enabled) {
> +			arm_smmu_atc_inv_to_cmd(smmu_domain->ssid, 0, 0, &cmd);
> +			arm_smmu_atc_inv_master(master, &cmd);
> +		}
> +	}
> +out_unlock:
> +	mutex_unlock(&smmu_domain->init_mutex);
> +}
> +
> +static int arm_smmu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
> +{
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +
> +	return smmu_domain->ssid ?: -EINVAL;
> +}
> +
>  static struct iommu_ops arm_smmu_ops = {
>  	.capable		= arm_smmu_capable,
>  	.domain_alloc		= arm_smmu_domain_alloc,
> @@ -2539,6 +2772,13 @@ static struct iommu_ops arm_smmu_ops = {
>  	.of_xlate		= arm_smmu_of_xlate,
>  	.get_resv_regions	= arm_smmu_get_resv_regions,
>  	.put_resv_regions	= arm_smmu_put_resv_regions,
> +	.dev_has_feat		= arm_smmu_dev_has_feature,
> +	.dev_feat_enabled	= arm_smmu_dev_feature_enabled,
> +	.dev_enable_feat	= arm_smmu_dev_enable_feature,
> +	.dev_disable_feat	= arm_smmu_dev_disable_feature,

Why can't we use the existing ->capable and ->dev_{get,set}_attr callbacks
for this?

Will

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/8] iommu/arm-smmu-v3: Add support for Substream IDs
  2019-06-10 18:47 ` [PATCH 4/8] iommu/arm-smmu-v3: Add support for Substream IDs Jean-Philippe Brucker
  2019-06-11 10:19   ` Jonathan Cameron
@ 2019-06-26 18:00   ` Will Deacon
  2019-07-04  9:33     ` Jean-Philippe Brucker
  2019-09-19 14:57     ` Jean-Philippe Brucker
  2019-07-08 15:31   ` Auger Eric
  2 siblings, 2 replies; 44+ messages in thread
From: Will Deacon @ 2019-06-26 18:00 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: will.deacon, joro, robh+dt, mark.rutland, robin.murphy,
	jacob.jun.pan, iommu, devicetree, linux-kernel, linux-arm-kernel,
	eric.auger

On Mon, Jun 10, 2019 at 07:47:10PM +0100, Jean-Philippe Brucker wrote:
> At the moment, the SMMUv3 driver implements only one stage-1 or stage-2
> page directory per device. However SMMUv3 allows more than one address
> space for some devices, by providing multiple stage-1 page directories. In
> addition to the Stream ID (SID), that identifies a device, we can now have
> Substream IDs (SSID) identifying an address space. In PCIe, SID is called
> Requester ID (RID) and SSID is called Process Address-Space ID (PASID).
> 
> Prepare the driver for SSID support, by adding context descriptor tables
> in STEs (previously a single static context descriptor). A complete
> stage-1 walk is now performed like this by the SMMU:
> 
>       Stream tables          Ctx. tables          Page tables
>         +--------+   ,------->+-------+   ,------->+-------+
>         :        :   |        :       :   |        :       :
>         +--------+   |        +-------+   |        +-------+
>    SID->|  STE   |---'  SSID->|  CD   |---'  IOVA->|  PTE  |--> IPA
>         +--------+            +-------+            +-------+
>         :        :            :       :            :       :
>         +--------+            +-------+            +-------+
> 
> Implement a single level of context descriptor table for now, but as with
> stream and page tables, an SSID can be split to index multiple levels of
> tables.
> 
> In all stream table entries, we set S1DSS=SSID0 mode, making translations
> without an SSID use context descriptor 0. Although it would be possible by
> setting S1DSS=BYPASS, we don't currently support SSID when user selects
> iommu.passthrough.

I don't understand your comment here: iommu.passthrough works just as it did
before, right, since we set bypass in the STE config field so S1DSS is not
relevant? I also notice that SSID0 causes transactions with SSID==0 to
abort. Is a PASID of 0 reserved, so this doesn't matter?

> @@ -1062,33 +1143,90 @@ static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
>  	return val;
>  }
>  
> -static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
> -				    struct arm_smmu_s1_cfg *cfg)
> +static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
> +				   int ssid, struct arm_smmu_ctx_desc *cd)
>  {
>  	u64 val;
> +	bool cd_live;
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	__le64 *cdptr = arm_smmu_get_cd_ptr(&smmu_domain->s1_cfg, ssid);
>  
>  	/*
> -	 * We don't need to issue any invalidation here, as we'll invalidate
> -	 * the STE when installing the new entry anyway.
> +	 * This function handles the following cases:
> +	 *
> +	 * (1) Install primary CD, for normal DMA traffic (SSID = 0).
> +	 * (2) Install a secondary CD, for SID+SSID traffic.
> +	 * (3) Update ASID of a CD. Atomically write the first 64 bits of the
> +	 *     CD, then invalidate the old entry and mappings.
> +	 * (4) Remove a secondary CD.
>  	 */
> -	val = arm_smmu_cpu_tcr_to_cd(cfg->cd.tcr) |
> +
> +	if (!cdptr)
> +		return -ENOMEM;
> +
> +	val = le64_to_cpu(cdptr[0]);
> +	cd_live = !!(val & CTXDESC_CD_0_V);
> +
> +	if (!cd) { /* (4) */
> +		cdptr[0] = 0;

Should we be using WRITE_ONCE here? (although I notice we don't seem to
bother for STEs either...)

Will

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/8] iommu/arm-smmu-v3: Add support for Substream IDs
  2019-06-26 18:00   ` Will Deacon
@ 2019-07-04  9:33     ` Jean-Philippe Brucker
  2019-09-19 14:57     ` Jean-Philippe Brucker
  1 sibling, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-07-04  9:33 UTC (permalink / raw)
  To: Will Deacon
  Cc: mark.rutland, devicetree, jacob.jun.pan, joro, will.deacon,
	linux-kernel, eric.auger, iommu, robh+dt, robin.murphy,
	linux-arm-kernel

On 26/06/2019 19:00, Will Deacon wrote:
> On Mon, Jun 10, 2019 at 07:47:10PM +0100, Jean-Philippe Brucker wrote:
>> At the moment, the SMMUv3 driver implements only one stage-1 or stage-2
>> page directory per device. However SMMUv3 allows more than one address
>> space for some devices, by providing multiple stage-1 page directories. In
>> addition to the Stream ID (SID), that identifies a device, we can now have
>> Substream IDs (SSID) identifying an address space. In PCIe, SID is called
>> Requester ID (RID) and SSID is called Process Address-Space ID (PASID).
>>
>> Prepare the driver for SSID support, by adding context descriptor tables
>> in STEs (previously a single static context descriptor). A complete
>> stage-1 walk is now performed like this by the SMMU:
>>
>>       Stream tables          Ctx. tables          Page tables
>>         +--------+   ,------->+-------+   ,------->+-------+
>>         :        :   |        :       :   |        :       :
>>         +--------+   |        +-------+   |        +-------+
>>    SID->|  STE   |---'  SSID->|  CD   |---'  IOVA->|  PTE  |--> IPA
>>         +--------+            +-------+            +-------+
>>         :        :            :       :            :       :
>>         +--------+            +-------+            +-------+
>>
>> Implement a single level of context descriptor table for now, but as with
>> stream and page tables, an SSID can be split to index multiple levels of
>> tables.
>>
>> In all stream table entries, we set S1DSS=SSID0 mode, making translations
>> without an SSID use context descriptor 0. Although it would be possible by
>> setting S1DSS=BYPASS, we don't currently support SSID when user selects
>> iommu.passthrough.
> 
> I don't understand your comment here: iommu.passthrough works just as it did
> before, right, since we set bypass in the STE config field so S1DSS is not
> relevant?

Yes the comment is wrong, or at least unclear. It isn't well defined how
SSID is supposed to work with iommu.passthrough, but I guess keeping the
same behavior as non-PASID DMA is what we want (any PASID-tagged DMA
also bypasses the SMMU.)

In the comment I was referring to another possibility, supporting SVA
and auxiliary domains even when iommu.passthrough is set. That would
require allocating context tables and setting S1DSS=BYPASS. But I don't
think it's a feature anyone needs at the moment.

> I also notice that SSID0 causes transactions with SSID==0 to
> abort. Is a PASID of 0 reserved, so this doesn't matter?

Yes, PASID 0 is reserved, we start allocation at 1

> 
>> @@ -1062,33 +1143,90 @@ static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
>>  	return val;
>>  }
>>  
>> -static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
>> -				    struct arm_smmu_s1_cfg *cfg)
>> +static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
>> +				   int ssid, struct arm_smmu_ctx_desc *cd)
>>  {
>>  	u64 val;
>> +	bool cd_live;
>> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
>> +	__le64 *cdptr = arm_smmu_get_cd_ptr(&smmu_domain->s1_cfg, ssid);
>>  
>>  	/*
>> -	 * We don't need to issue any invalidation here, as we'll invalidate
>> -	 * the STE when installing the new entry anyway.
>> +	 * This function handles the following cases:
>> +	 *
>> +	 * (1) Install primary CD, for normal DMA traffic (SSID = 0).
>> +	 * (2) Install a secondary CD, for SID+SSID traffic.
>> +	 * (3) Update ASID of a CD. Atomically write the first 64 bits of the
>> +	 *     CD, then invalidate the old entry and mappings.
>> +	 * (4) Remove a secondary CD.
>>  	 */
>> -	val = arm_smmu_cpu_tcr_to_cd(cfg->cd.tcr) |
>> +
>> +	if (!cdptr)
>> +		return -ENOMEM;
>> +
>> +	val = le64_to_cpu(cdptr[0]);
>> +	cd_live = !!(val & CTXDESC_CD_0_V);
>> +
>> +	if (!cd) { /* (4) */
>> +		cdptr[0] = 0;
> 
> Should we be using WRITE_ONCE here? (although I notice we don't seem to
> bother for STEs either...)

Sure, that's safer

Thanks,
Jean

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 6/8] iommu/arm-smmu-v3: Support auxiliary domains
  2019-06-26 17:59   ` Will Deacon
@ 2019-07-05 16:29     ` Jean-Philippe Brucker
  2019-09-19 15:06     ` Jean-Philippe Brucker
  1 sibling, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-07-05 16:29 UTC (permalink / raw)
  To: Will Deacon
  Cc: Will Deacon, joro, robh+dt, Mark Rutland, Robin Murphy,
	jacob.jun.pan, iommu, devicetree, linux-kernel, linux-arm-kernel,
	eric.auger

On 26/06/2019 18:59, Will Deacon wrote:
>> +static void arm_smmu_aux_detach_dev(struct iommu_domain *domain, struct device *dev)
>> +{
>> +	struct iommu_domain *parent_domain;
>> +	struct arm_smmu_domain *parent_smmu_domain;
>> +	struct arm_smmu_master *master = dev_to_master(dev);
>> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +
>> +	if (!arm_smmu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX))
>> +		return;
>> +
>> +	parent_domain = iommu_get_domain_for_dev(dev);
>> +	if (!parent_domain)
>> +		return;
>> +	parent_smmu_domain = to_smmu_domain(parent_domain);
>> +
>> +	mutex_lock(&smmu_domain->init_mutex);
>> +	if (!smmu_domain->aux_nr_devs)
>> +		goto out_unlock;
>> +
>> +	if (!--smmu_domain->aux_nr_devs) {
>> +		arm_smmu_write_ctx_desc(parent_smmu_domain, smmu_domain->ssid,
>> +					NULL);
>> +		/*
>> +		 * TLB doesn't need invalidation since accesses from the device
>> +		 * can't use this domain's ASID once the CD is clear.
>> +		 *
>> +		 * Sadly that doesn't apply to ATCs, which are PASID tagged.
>> +		 * Invalidate all other devices as well, because even though
>> +		 * they weren't 'officially' attached to the auxiliary domain,
>> +		 * they could have formed ATC entries.
>> +		 */
>> +		arm_smmu_atc_inv_domain(smmu_domain, 0, 0);
> 
> I've been struggling to understand the locking here, since both
> arm_smmu_write_ctx_desc and arm_smmu_atc_inv_domain take and release the
> devices_lock for the domain. Is there not a problem with devices coming and
> going in-between the two calls?

Yes it's a problem. I suppose we could take the parent's init_mutex
(making sure that it protects detach_dev() as well.

First I need to figure out how to prevent the parent domain from
disappearing when auxiliary domains are attached, I seem to have forgotten
that. I think checking if AUXD is enabled in the device passed to
attach_dev() should be sufficient - that's what I do for SVA. But the
IOMMU API isn't quite ready to handle failure in iommu_detach_device() at
the moment. VFIO will free the domain even if it's still attached.

> 
>> +	} else {
>> +		struct arm_smmu_cmdq_ent cmd;
>> +
>> +		/* Invalidate only this device's ATC */
>> +		if (master->ats_enabled) {
>> +			arm_smmu_atc_inv_to_cmd(smmu_domain->ssid, 0, 0, &cmd);
>> +			arm_smmu_atc_inv_master(master, &cmd);
>> +		}
>> +	}
>> +out_unlock:
>> +	mutex_unlock(&smmu_domain->init_mutex);
>> +}
>> +
>> +static int arm_smmu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
>> +{
>> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +
>> +	return smmu_domain->ssid ?: -EINVAL;
>> +}
>> +
>>  static struct iommu_ops arm_smmu_ops = {
>>  	.capable		= arm_smmu_capable,
>>  	.domain_alloc		= arm_smmu_domain_alloc,
>> @@ -2539,6 +2772,13 @@ static struct iommu_ops arm_smmu_ops = {
>>  	.of_xlate		= arm_smmu_of_xlate,
>>  	.get_resv_regions	= arm_smmu_get_resv_regions,
>>  	.put_resv_regions	= arm_smmu_put_resv_regions,
>> +	.dev_has_feat		= arm_smmu_dev_has_feature,
>> +	.dev_feat_enabled	= arm_smmu_dev_feature_enabled,
>> +	.dev_enable_feat	= arm_smmu_dev_enable_feature,
>> +	.dev_disable_feat	= arm_smmu_dev_disable_feature,
> 
> Why can't we use the existing ->capable and ->dev_{get,set}_attr callbacks
> for this?

->capable isn't very useful because it applies to all SMMUs in the
system. The existing ->{get,set}_attr callbacks apply to an
iommu_domain. The main reason for doing it on endpoints was that it
would be tedious to keep track of capabilities when attaching and
detaching devices to a domain, especially for drivers that allow
multiple IOMMUs per domain [1]. There were more discussions, and in the
end Joerg proposed the current API for device attributes [2]

[1]
https://lore.kernel.org/lkml/aa1ff748-c2ec-acc0-f1d9-cdff2b131e58@linux.intel.com/
[2] https://lore.kernel.org/linux-iommu/20181207102926.GM16835@8bytes.org/

Thanks,
Jean

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] iommu/arm-smmu-v3: Add support for PCI PASID
  2019-06-10 18:47 ` [PATCH 8/8] iommu/arm-smmu-v3: Add support for PCI PASID Jean-Philippe Brucker
  2019-06-11 10:45   ` Jonathan Cameron
@ 2019-07-08  7:58   ` Auger Eric
  2019-09-19 15:10     ` Jean-Philippe Brucker
  1 sibling, 1 reply; 44+ messages in thread
From: Auger Eric @ 2019-07-08  7:58 UTC (permalink / raw)
  To: Jean-Philippe Brucker, will.deacon
  Cc: joro, robh+dt, mark.rutland, robin.murphy, jacob.jun.pan, iommu,
	devicetree, linux-kernel, linux-arm-kernel

Hi Jean,

On 6/10/19 8:47 PM, Jean-Philippe Brucker wrote:
> Enable PASID for PCI devices that support it. Since the SSID tables are
> allocated by arm_smmu_attach_dev(), PASID has to be enabled early enough.
> arm_smmu_dev_feature_enable() would be too late, since by that time the
> main DMA domain has already been attached. Do it in add_device() instead.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  drivers/iommu/arm-smmu-v3.c | 51 ++++++++++++++++++++++++++++++++++++-
>  1 file changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 972bfb80f964..a8a516d9ff10 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2197,6 +2197,49 @@ static void arm_smmu_disable_ats(struct arm_smmu_master *master)
>  	master->ats_enabled = false;
>  }
>  
> +static int arm_smmu_enable_pasid(struct arm_smmu_master *master)
> +{
> +	int ret;
> +	int features;
> +	int num_pasids;
> +	struct pci_dev *pdev;
> +
> +	if (!dev_is_pci(master->dev))
> +		return -ENOSYS;
> +
> +	pdev = to_pci_dev(master->dev);
> +
> +	features = pci_pasid_features(pdev);
> +	if (features < 0)
> +		return -ENOSYS;
> +
> +	num_pasids = pci_max_pasids(pdev);
> +	if (num_pasids <= 0)
> +		return -ENOSYS;
> +
> +	ret = pci_enable_pasid(pdev, features);
> +	if (!ret)
> +		master->ssid_bits = min_t(u8, ilog2(num_pasids),
> +					  master->smmu->ssid_bits);
I don't really get why this setting is conditional to the success of
pci_enabled_pasid and not num_pasids > 0.

If it fails the ssid_bits is set to min(smmu->ssid_bits,
fwspec->num_pasid_bits) anyway.

> +	return ret;
> +}
> +
> +static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
> +{
> +	struct pci_dev *pdev;
> +
> +	if (!dev_is_pci(master->dev))
> +		return;
> +
> +	pdev = to_pci_dev(master->dev);
> +
> +	if (!pdev->pasid_enabled)
> +		return;
> +
> +	pci_disable_pasid(pdev);
> +	master->ssid_bits = 0;
in case of a platform device you leave the ssid_bits to a value != 0. Is
that what you want?
> +}
> +
>  static void arm_smmu_detach_dev(struct arm_smmu_master *master)
>  {
>  	unsigned long flags;
> @@ -2413,6 +2456,9 @@ static int arm_smmu_add_device(struct device *dev)
>  
>  	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
>  
> +	/* Note that PASID must be enabled before, and disabled after ATS */
> +	arm_smmu_enable_pasid(master);
In case the call fails, don't you want to handle the error and reset the
ssid_bits?
> +
>  	/*
>  	 * If the SMMU doesn't support 2-stage CD, limit the linear
>  	 * tables to a reasonable number of contexts, let's say
> @@ -2423,7 +2469,7 @@ static int arm_smmu_add_device(struct device *dev)
>  
>  	ret = iommu_device_link(&smmu->iommu, dev);
>  	if (ret)
> -		goto err_free_master;
> +		goto err_disable_pasid;
>  
>  	group = iommu_group_get_for_dev(dev);
>  	if (IS_ERR(group)) {
> @@ -2436,6 +2482,8 @@ static int arm_smmu_add_device(struct device *dev)
>  
>  err_unlink:
>  	iommu_device_unlink(&smmu->iommu, dev);
> +err_disable_pasid:
> +	arm_smmu_disable_pasid(master);
>  err_free_master:
>  	kfree(master);
>  	fwspec->iommu_priv = NULL;
> @@ -2456,6 +2504,7 @@ static void arm_smmu_remove_device(struct device *dev)
>  	arm_smmu_detach_dev(master);
>  	iommu_group_remove_device(dev);
>  	iommu_device_unlink(&smmu->iommu, dev);
> +	arm_smmu_disable_pasid(master);
>  	kfree(master);
>  	iommu_fwspec_free(dev);
>  }
> 
Thanks

Eric

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/8] iommu/arm-smmu-v3: Support platform SSID
  2019-06-10 18:47 ` [PATCH 3/8] iommu/arm-smmu-v3: Support platform SSID Jean-Philippe Brucker
  2019-06-11  9:42   ` Jonathan Cameron
  2019-06-18 18:08   ` Will Deacon
@ 2019-07-08  7:58   ` Auger Eric
  2019-09-19 14:51     ` Jean-Philippe Brucker
  2 siblings, 1 reply; 44+ messages in thread
From: Auger Eric @ 2019-07-08  7:58 UTC (permalink / raw)
  To: Jean-Philippe Brucker, will.deacon
  Cc: joro, robh+dt, mark.rutland, robin.murphy, jacob.jun.pan, iommu,
	devicetree, linux-kernel, linux-arm-kernel

Hi Jean,

On 6/10/19 8:47 PM, Jean-Philippe Brucker wrote:
> For platform devices that support SubstreamID (SSID), firmware provides
> the number of supported SSID bits. Restrict it to what the SMMU supports
> and cache it into master->ssid_bits.
The commit message may give the impression the master's ssid_bits field
only is used for platform devices.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  drivers/iommu/arm-smmu-v3.c | 11 +++++++++++
>  drivers/iommu/of_iommu.c    |  6 +++++-
>  include/linux/iommu.h       |  1 +
>  3 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 4d5a694f02c2..3254f473e681 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -604,6 +604,7 @@ struct arm_smmu_master {
>  	struct list_head		domain_head;
>  	u32				*sids;
>  	unsigned int			num_sids;
> +	unsigned int			ssid_bits;
>  	bool				ats_enabled		:1;
>  };
>  
> @@ -2097,6 +2098,16 @@ static int arm_smmu_add_device(struct device *dev)
>  		}
>  	}
>  
> +	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
In case the device is a PCI device, what is the value taken by
fwspec->num_pasid_bits?
> +
> +	/*
> +	 * If the SMMU doesn't support 2-stage CD, limit the linear
> +	 * tables to a reasonable number of contexts, let's say
> +	 * 64kB / sizeof(ctx_desc) = 1024 = 2^10
ctx_desc is 26B so 11bits would be OK
> +	 */
> +	if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
> +		master->ssid_bits = min(master->ssid_bits, 10U);
> +
>  	group = iommu_group_get_for_dev(dev);
>  	if (!IS_ERR(group)) {
>  		iommu_group_put(group);
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index f04a6df65eb8..04f4f6b95d82 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -206,8 +206,12 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
>  			if (err)
>  				break;
>  		}
> -	}
>  
> +		fwspec = dev_iommu_fwspec_get(dev);
> +		if (!err && fwspec)
> +			of_property_read_u32(master_np, "pasid-num-bits",
> +					     &fwspec->num_pasid_bits);
> +	}
>  
>  	/*
>  	 * Two success conditions can be represented by non-negative err here:
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 519e40fb23ce..b91df613385f 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -536,6 +536,7 @@ struct iommu_fwspec {
>  	struct fwnode_handle	*iommu_fwnode;
>  	void			*iommu_priv;
>  	u32			flags;
> +	u32			num_pasid_bits;
>  	unsigned int		num_ids;
>  	u32			ids[1];
>  };
> 
Thanks

Eric

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 7/8] iommu/arm-smmu-v3: Improve add_device() error handling
  2019-06-10 18:47 ` [PATCH 7/8] iommu/arm-smmu-v3: Improve add_device() error handling Jean-Philippe Brucker
@ 2019-07-08  7:58   ` Auger Eric
  0 siblings, 0 replies; 44+ messages in thread
From: Auger Eric @ 2019-07-08  7:58 UTC (permalink / raw)
  To: Jean-Philippe Brucker, will.deacon
  Cc: joro, robh+dt, mark.rutland, robin.murphy, jacob.jun.pan, iommu,
	devicetree, linux-kernel, linux-arm-kernel

Hi Jean,

On 6/10/19 8:47 PM, Jean-Philippe Brucker wrote:
> Let add_device() clean up behind itself. The iommu_bus_init() function
> does call remove_device() on error, but other sites (e.g. of_iommu) do
> not.
> 
> Don't free level-2 stream tables because we'd have to track if we
> allocated each of them or if they are used by other endpoints. It's not
> worth the hassle since they are managed resources.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric
> ---
>  drivers/iommu/arm-smmu-v3.c | 28 +++++++++++++++++++++-------
>  1 file changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 633d829f246f..972bfb80f964 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2398,14 +2398,16 @@ static int arm_smmu_add_device(struct device *dev)
>  	for (i = 0; i < master->num_sids; i++) {
>  		u32 sid = master->sids[i];
>  
> -		if (!arm_smmu_sid_in_range(smmu, sid))
> -			return -ERANGE;
> +		if (!arm_smmu_sid_in_range(smmu, sid)) {
> +			ret = -ERANGE;
> +			goto err_free_master;
> +		}
>  
>  		/* Ensure l2 strtab is initialised */
>  		if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
>  			ret = arm_smmu_init_l2_strtab(smmu, sid);
>  			if (ret)
> -				return ret;
> +				goto err_free_master;
>  		}
>  	}
>  
> @@ -2419,13 +2421,25 @@ static int arm_smmu_add_device(struct device *dev)
>  	if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
>  		master->ssid_bits = min(master->ssid_bits, 10U);
>  
> +	ret = iommu_device_link(&smmu->iommu, dev);
> +	if (ret)
> +		goto err_free_master;
> +
>  	group = iommu_group_get_for_dev(dev);
> -	if (!IS_ERR(group)) {
> -		iommu_group_put(group);
> -		iommu_device_link(&smmu->iommu, dev);
> +	if (IS_ERR(group)) {
> +		ret = PTR_ERR(group);
> +		goto err_unlink;
>  	}
>  
> -	return PTR_ERR_OR_ZERO(group);
> +	iommu_group_put(group);
> +	return 0;
> +
> +err_unlink:
> +	iommu_device_unlink(&smmu->iommu, dev);
> +err_free_master:
> +	kfree(master);
> +	fwspec->iommu_priv = NULL;
> +	return ret;
>  }
>  
>  static void arm_smmu_remove_device(struct device *dev)
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/8] dt-bindings: document PASID property for IOMMU masters
  2019-06-10 18:47 ` [PATCH 2/8] dt-bindings: document PASID property for IOMMU masters Jean-Philippe Brucker
@ 2019-07-08  7:58   ` Auger Eric
  0 siblings, 0 replies; 44+ messages in thread
From: Auger Eric @ 2019-07-08  7:58 UTC (permalink / raw)
  To: Jean-Philippe Brucker, will.deacon
  Cc: joro, robh+dt, mark.rutland, robin.murphy, jacob.jun.pan, iommu,
	devicetree, linux-kernel, linux-arm-kernel

Hi Jean,

On 6/10/19 8:47 PM, Jean-Philippe Brucker wrote:
> On Arm systems, some platform devices behind an SMMU may support the PASID
> feature, which offers multiple address space. Let the firmware tell us
> when a device supports PASID.
> 
> Reviewed-by: Rob Herring <robh@kernel.org>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
> Previous discussion on this patch last year:
> https://patchwork.ozlabs.org/patch/872275/
> I split PASID and stall definitions, keeping only PASID here.
> ---
>  Documentation/devicetree/bindings/iommu/iommu.txt | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/iommu/iommu.txt b/Documentation/devicetree/bindings/iommu/iommu.txt
> index 5a8b4624defc..3c36334e4f94 100644
> --- a/Documentation/devicetree/bindings/iommu/iommu.txt
> +++ b/Documentation/devicetree/bindings/iommu/iommu.txt
> @@ -86,6 +86,12 @@ have a means to turn off translation. But it is invalid in such cases to
>  disable the IOMMU's device tree node in the first place because it would
>  prevent any driver from properly setting up the translations.
>  
> +Optional properties:
> +--------------------
> +- pasid-num-bits: Some masters support multiple address spaces for DMA, by
> +  tagging DMA transactions with an address space identifier. By default,
> +  this is 0, which means that the device only has one address space.
> +
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric
>  
>  Notes:
>  ======
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 5/8] iommu/arm-smmu-v3: Add second level of context descriptor table
  2019-06-10 18:47 ` [PATCH 5/8] iommu/arm-smmu-v3: Add second level of context descriptor table Jean-Philippe Brucker
  2019-06-11 10:24   ` Jonathan Cameron
@ 2019-07-08 15:13   ` Auger Eric
  2019-09-19 15:05     ` Jean-Philippe Brucker
  1 sibling, 1 reply; 44+ messages in thread
From: Auger Eric @ 2019-07-08 15:13 UTC (permalink / raw)
  To: Jean-Philippe Brucker, will.deacon
  Cc: joro, robh+dt, mark.rutland, robin.murphy, jacob.jun.pan, iommu,
	devicetree, linux-kernel, linux-arm-kernel

Hi Jean,

On 6/10/19 8:47 PM, Jean-Philippe Brucker wrote:
> The SMMU can support up to 20 bits of SSID. Add a second level of page
> tables to accommodate this. Devices that support more than 1024 SSIDs now
> have a table of 1024 L1 entries (8kB), pointing to tables of 1024 context
> descriptors (64kB), allocated on demand.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  drivers/iommu/arm-smmu-v3.c | 136 +++++++++++++++++++++++++++++++++---
>  1 file changed, 128 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index d90eb604b65d..326b71793336 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -216,6 +216,8 @@
>  
>  #define STRTAB_STE_0_S1FMT		GENMASK_ULL(5, 4)
>  #define STRTAB_STE_0_S1FMT_LINEAR	0
> +#define STRTAB_STE_0_S1FMT_4K_L2	1
As you only use 64kB L2, I guess you can remove the 4K define?
> +#define STRTAB_STE_0_S1FMT_64K_L2	2
>  #define STRTAB_STE_0_S1CTXPTR_MASK	GENMASK_ULL(51, 6)
>  #define STRTAB_STE_0_S1CDMAX		GENMASK_ULL(63, 59)
>  
> @@ -255,6 +257,18 @@
>  
>  #define STRTAB_STE_3_S2TTB_MASK		GENMASK_ULL(51, 4)
>  
> +/*
> + * Linear: when less than 1024 SSIDs are supported
> + * 2lvl: at most 1024 L1 entrie,
entries
> + *      1024 lazy entries per table.
> + */
> +#define CTXDESC_SPLIT			10
> +#define CTXDESC_NUM_L2_ENTRIES		(1 << CTXDESC_SPLIT)
> +
> +#define CTXDESC_L1_DESC_DWORD		1
> +#define CTXDESC_L1_DESC_VALID		1
> +#define CTXDESC_L1_DESC_L2PTR_MASK	GENMASK_ULL(51, 12)
> +
>  /* Context descriptor (stage-1 only) */
>  #define CTXDESC_CD_DWORDS		8
>  #define CTXDESC_CD_0_TCR_T0SZ		GENMASK_ULL(5, 0)
> @@ -530,7 +544,10 @@ struct arm_smmu_ctx_desc {
>  struct arm_smmu_s1_cfg {
>  	u8				s1fmt;
>  	u8				s1cdmax;
> -	struct arm_smmu_cd_table	table;
> +	struct arm_smmu_cd_table	*tables;
> +	size_t				num_tables;
> +	__le64				*l1ptr;
> +	dma_addr_t			l1ptr_dma;
>  
>  	/* Context descriptor 0, when substreams are disabled or s1dss = 0b10 */
>  	struct arm_smmu_ctx_desc	cd;
> @@ -1118,12 +1135,51 @@ static void arm_smmu_free_cd_leaf_table(struct arm_smmu_device *smmu,
>  {
>  	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
>  
> +	if (!table->ptr)
> +		return;
>  	dmam_free_coherent(smmu->dev, size, table->ptr, table->ptr_dma);
>  }
>  
> -static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_s1_cfg *cfg, u32 ssid)
> +static void arm_smmu_write_cd_l1_desc(__le64 *dst,
> +				      struct arm_smmu_cd_table *table)
>  {
> -	return cfg->table.ptr + ssid * CTXDESC_CD_DWORDS;
> +	u64 val = (table->ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) |
> +		  CTXDESC_L1_DESC_VALID;
> +
> +	*dst = cpu_to_le64(val);
> +}
> +
> +static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_domain *smmu_domain,
> +				   u32 ssid)> +{
> +	unsigned int idx;
> +	struct arm_smmu_cd_table *table;
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> +
> +	if (cfg->s1fmt == STRTAB_STE_0_S1FMT_LINEAR) {
> +		table = &cfg->tables[0];
> +		idx = ssid;
> +	} else {
> +		idx = ssid >> CTXDESC_SPLIT;
> +		if (idx >= cfg->num_tables)
> +			return NULL;
> +
> +		table = &cfg->tables[idx];
> +		if (!table->ptr) {
> +			__le64 *l1ptr = cfg->l1ptr + idx * CTXDESC_L1_DESC_DWORD;
> +
> +			if (arm_smmu_alloc_cd_leaf_table(smmu, table,
> +							 CTXDESC_NUM_L2_ENTRIES))
> +				return NULL;
> +
> +			arm_smmu_write_cd_l1_desc(l1ptr, table);
> +			/* An invalid L1 entry is allowed to be cached */
> +			arm_smmu_sync_cd(smmu_domain, ssid, false);
> +		}
> +		idx = ssid & (CTXDESC_NUM_L2_ENTRIES - 1);
> +	}
> +	return table->ptr + idx * CTXDESC_CD_DWORDS;
>  }
>  
>  static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
> @@ -1149,7 +1205,7 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
>  	u64 val;
>  	bool cd_live;
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
> -	__le64 *cdptr = arm_smmu_get_cd_ptr(&smmu_domain->s1_cfg, ssid);
> +	__le64 *cdptr = arm_smmu_get_cd_ptr(smmu_domain, ssid);
>  
>  	/*
>  	 * This function handles the following cases:
> @@ -1213,20 +1269,81 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
>  static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain,
>  				    struct arm_smmu_master *master)
>  {
> +	int ret;
> +	size_t size = 0;
> +	size_t max_contexts, num_leaf_entries;
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
>  
>  	cfg->s1fmt = STRTAB_STE_0_S1FMT_LINEAR;
>  	cfg->s1cdmax = master->ssid_bits;
> -	return arm_smmu_alloc_cd_leaf_table(smmu, &cfg->table, 1 << cfg->s1cdmax);
> +
> +	max_contexts = 1 << cfg->s1cdmax;
> +	if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB) ||
> +	    max_contexts <= CTXDESC_NUM_L2_ENTRIES) {
> +		cfg->s1fmt = STRTAB_STE_0_S1FMT_LINEAR;
> +		cfg->num_tables = 1;
> +		num_leaf_entries = max_contexts;
> +	} else {
> +		cfg->s1fmt = STRTAB_STE_0_S1FMT_64K_L2;
> +		/*
> +		 * SSID[S1CDmax-1:10] indexes 1st-level table, SSID[9:0] indexes
> +		 * 2nd-level
> +		 */
> +		cfg->num_tables = max_contexts / CTXDESC_NUM_L2_ENTRIES;
> +
> +		size = cfg->num_tables * (CTXDESC_L1_DESC_DWORD << 3);
> +		cfg->l1ptr = dmam_alloc_coherent(smmu->dev, size,
> +						 &cfg->l1ptr_dma,
> +						 GFP_KERNEL | __GFP_ZERO);
> +		if (!cfg->l1ptr) {
> +			dev_warn(smmu->dev, "failed to allocate L1 context table\n");
> +			return -ENOMEM;
> +		}
> +
> +		num_leaf_entries = CTXDESC_NUM_L2_ENTRIES;
> +	}
> +
> +	cfg->tables = devm_kzalloc(smmu->dev, sizeof(struct arm_smmu_cd_table) *
> +				   cfg->num_tables, GFP_KERNEL);
> +	if (!cfg->tables)
> +		return -ENOMEM;
goto err_free_l1
> +
> +	ret = arm_smmu_alloc_cd_leaf_table(smmu, &cfg->tables[0], num_leaf_entries);
don't you want to do that only in linear case. In 2-level mode, I
understand arm_smmu_get_cd_ptr() will do the job.

> +	if (ret)
> +		goto err_free_l1;
> +
> +	if (cfg->l1ptr)
> +		arm_smmu_write_cd_l1_desc(cfg->l1ptr, &cfg->tables[0]);
that stuff could be removed as well? By the way I can see that
arm_smmu_get_cd_ptr() does a arm_smmu_sync_cd after. wouldn't it be
needed here as well?
> +
> +	return 0;
> +
> +err_free_l1:
> +	if (cfg->l1ptr)
> +		dmam_free_coherent(smmu->dev, size, cfg->l1ptr, cfg->l1ptr_dma);
> +	devm_kfree(smmu->dev, cfg->tables);
> +	return ret;
>  }
>  
>  static void arm_smmu_free_cd_tables(struct arm_smmu_domain *smmu_domain)
>  {
> +	int i;
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> +	size_t num_leaf_entries = 1 << cfg->s1cdmax;
> +	struct arm_smmu_cd_table *table = cfg->tables;
> +
> +	if (cfg->l1ptr) {
> +		size_t size = cfg->num_tables * (CTXDESC_L1_DESC_DWORD << 3);
>  
> -	arm_smmu_free_cd_leaf_table(smmu, &cfg->table, 1 << cfg->s1cdmax);
> +		dmam_free_coherent(smmu->dev, size, cfg->l1ptr,
> +				   cfg->l1ptr_dma);
> +		num_leaf_entries = CTXDESC_NUM_L2_ENTRIES;
> +	}
> +
> +	for (i = 0; i < cfg->num_tables; i++, table++)
> +		arm_smmu_free_cd_leaf_table(smmu, table, num_leaf_entries);
> +	devm_kfree(smmu->dev, cfg->tables);
>  }
>  
>  /* Stream table manipulation functions */
> @@ -1346,6 +1463,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  	}
>  
>  	if (s1_cfg) {
> +		dma_addr_t ptr_dma = s1_cfg->l1ptr ? s1_cfg->l1ptr_dma :
> +			             s1_cfg->tables[0].ptr_dma;
> +
>  		BUG_ON(ste_live);
>  		dst[1] = cpu_to_le64(
>  			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
> @@ -1358,7 +1478,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
>  			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
>  
> -		val |= (s1_cfg->table.ptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
> +		val |= (ptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
>  			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
>  			FIELD_PREP(STRTAB_STE_0_S1CDMAX, s1_cfg->s1cdmax) |
>  			FIELD_PREP(STRTAB_STE_0_S1FMT, s1_cfg->s1fmt);
> @@ -1815,7 +1935,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
>  	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>  		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
>  
> -		if (cfg->table.ptr) {
> +		if (cfg->tables) {
>  			arm_smmu_free_cd_tables(smmu_domain);
>  			arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
I don't get why the arm_smmu_bitmap_free is dependent on cfg->tables.

Thanks

Eric
>  		}
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/8] iommu/arm-smmu-v3: Add support for Substream IDs
  2019-06-10 18:47 ` [PATCH 4/8] iommu/arm-smmu-v3: Add support for Substream IDs Jean-Philippe Brucker
  2019-06-11 10:19   ` Jonathan Cameron
  2019-06-26 18:00   ` Will Deacon
@ 2019-07-08 15:31   ` Auger Eric
  2019-09-19 15:01     ` Jean-Philippe Brucker
  2 siblings, 1 reply; 44+ messages in thread
From: Auger Eric @ 2019-07-08 15:31 UTC (permalink / raw)
  To: Jean-Philippe Brucker, will.deacon
  Cc: joro, robh+dt, mark.rutland, robin.murphy, jacob.jun.pan, iommu,
	devicetree, linux-kernel, linux-arm-kernel

Hi Jean,

On 6/10/19 8:47 PM, Jean-Philippe Brucker wrote:
> At the moment, the SMMUv3 driver implements only one stage-1 or stage-2
> page directory per device. However SMMUv3 allows more than one address
> space for some devices, by providing multiple stage-1 page directories. In
> addition to the Stream ID (SID), that identifies a device, we can now have
> Substream IDs (SSID) identifying an address space. In PCIe, SID is called
> Requester ID (RID) and SSID is called Process Address-Space ID (PASID).
> 
> Prepare the driver for SSID support, by adding context descriptor tables
> in STEs (previously a single static context descriptor). A complete
> stage-1 walk is now performed like this by the SMMU:
> 
>       Stream tables          Ctx. tables          Page tables
>         +--------+   ,------->+-------+   ,------->+-------+
>         :        :   |        :       :   |        :       :
>         +--------+   |        +-------+   |        +-------+
>    SID->|  STE   |---'  SSID->|  CD   |---'  IOVA->|  PTE  |--> IPA
>         +--------+            +-------+            +-------+
>         :        :            :       :            :       :
>         +--------+            +-------+            +-------+
> 
> Implement a single level of context descriptor table for now, but as with
> stream and page tables, an SSID can be split to index multiple levels of
> tables.
> 
> In all stream table entries, we set S1DSS=SSID0 mode, making translations
> without an SSID use context descriptor 0. Although it would be possible by
> setting S1DSS=BYPASS, we don't currently support SSID when user selects
> iommu.passthrough.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  drivers/iommu/arm-smmu-v3.c | 238 +++++++++++++++++++++++++++++-------
>  1 file changed, 192 insertions(+), 46 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 3254f473e681..d90eb604b65d 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -219,6 +219,11 @@
>  #define STRTAB_STE_0_S1CTXPTR_MASK	GENMASK_ULL(51, 6)
>  #define STRTAB_STE_0_S1CDMAX		GENMASK_ULL(63, 59)
>  
> +#define STRTAB_STE_1_S1DSS		GENMASK_ULL(1, 0)
> +#define STRTAB_STE_1_S1DSS_TERMINATE	0x0
> +#define STRTAB_STE_1_S1DSS_BYPASS	0x1
> +#define STRTAB_STE_1_S1DSS_SSID0	0x2
> +
>  #define STRTAB_STE_1_S1C_CACHE_NC	0UL
>  #define STRTAB_STE_1_S1C_CACHE_WBRA	1UL
>  #define STRTAB_STE_1_S1C_CACHE_WT	2UL
> @@ -305,6 +310,7 @@
>  #define CMDQ_PREFETCH_1_SIZE		GENMASK_ULL(4, 0)
>  #define CMDQ_PREFETCH_1_ADDR_MASK	GENMASK_ULL(63, 12)
>  
> +#define CMDQ_CFGI_0_SSID		GENMASK_ULL(31, 12)
>  #define CMDQ_CFGI_0_SID			GENMASK_ULL(63, 32)
>  #define CMDQ_CFGI_1_LEAF		(1UL << 0)
>  #define CMDQ_CFGI_1_RANGE		GENMASK_ULL(4, 0)
> @@ -421,8 +427,11 @@ struct arm_smmu_cmdq_ent {
>  
>  		#define CMDQ_OP_CFGI_STE	0x3
>  		#define CMDQ_OP_CFGI_ALL	0x4
> +		#define CMDQ_OP_CFGI_CD		0x5
> +		#define CMDQ_OP_CFGI_CD_ALL	0x6
>  		struct {
>  			u32			sid;
> +			u32			ssid;
>  			union {
>  				bool		leaf;
>  				u8		span;
> @@ -506,16 +515,25 @@ struct arm_smmu_strtab_l1_desc {
>  	dma_addr_t			l2ptr_dma;
>  };
>  
> +struct arm_smmu_cd_table {
> +	__le64				*ptr;
> +	dma_addr_t			ptr_dma;
> +};
> +
> +struct arm_smmu_ctx_desc {
> +	u16				asid;
> +	u64				ttbr;
> +	u64				tcr;
> +	u64				mair;
> +};
> +
>  struct arm_smmu_s1_cfg {
> -	__le64				*cdptr;
> -	dma_addr_t			cdptr_dma;
> -
> -	struct arm_smmu_ctx_desc {
> -		u16	asid;
> -		u64	ttbr;
> -		u64	tcr;
> -		u64	mair;
> -	}				cd;
> +	u8				s1fmt;
> +	u8				s1cdmax;
> +	struct arm_smmu_cd_table	table;
> +
> +	/* Context descriptor 0, when substreams are disabled or s1dss = 0b10 */
> +	struct arm_smmu_ctx_desc	cd;
>  };
>  
>  struct arm_smmu_s2_cfg {
> @@ -811,10 +829,16 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>  		cmd[1] |= FIELD_PREP(CMDQ_PREFETCH_1_SIZE, ent->prefetch.size);
>  		cmd[1] |= ent->prefetch.addr & CMDQ_PREFETCH_1_ADDR_MASK;
>  		break;
> +	case CMDQ_OP_CFGI_CD:
> +		cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SSID, ent->cfgi.ssid);
> +		/* Fallthrough */
>  	case CMDQ_OP_CFGI_STE:
>  		cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, ent->cfgi.sid);
>  		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_LEAF, ent->cfgi.leaf);
>  		break;
> +	case CMDQ_OP_CFGI_CD_ALL:
> +		cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, ent->cfgi.sid);
> +		break;
>  	case CMDQ_OP_CFGI_ALL:
>  		/* Cover the entire SID range */
>  		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
> @@ -1045,6 +1069,63 @@ static int arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
>  }
>  
>  /* Context descriptor manipulation functions */
> +static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
> +			     int ssid, bool leaf)
> +{
> +	size_t i;
> +	unsigned long flags;
> +	struct arm_smmu_master *master;
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	struct arm_smmu_cmdq_ent cmd = {
> +		.opcode	= CMDQ_OP_CFGI_CD,
> +		.cfgi	= {
> +			.ssid	= ssid,
> +			.leaf	= leaf,
> +		},
> +	};
> +
> +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> +	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
> +		for (i = 0; i < master->num_sids; i++) {
> +			cmd.cfgi.sid = master->sids[i];
> +			arm_smmu_cmdq_issue_cmd(smmu, &cmd);
> +		}
> +	}
> +	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> +
> +	arm_smmu_cmdq_issue_sync(smmu);
> +}
> +
> +static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
> +					struct arm_smmu_cd_table *table,
> +					size_t num_entries)
> +{
> +	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
> +
> +	table->ptr = dmam_alloc_coherent(smmu->dev, size, &table->ptr_dma,
> +					 GFP_KERNEL | __GFP_ZERO);
> +	if (!table->ptr) {
> +		dev_warn(smmu->dev,
> +			 "failed to allocate context descriptor table\n");
> +		return -ENOMEM;
> +	}
> +	return 0;
> +}
> +
> +static void arm_smmu_free_cd_leaf_table(struct arm_smmu_device *smmu,
> +					struct arm_smmu_cd_table *table,
> +					size_t num_entries)
> +{
> +	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
> +
> +	dmam_free_coherent(smmu->dev, size, table->ptr, table->ptr_dma);
> +}
> +
> +static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_s1_cfg *cfg, u32 ssid)
> +{
> +	return cfg->table.ptr + ssid * CTXDESC_CD_DWORDS;
> +}
> +
>  static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
>  {
>  	u64 val = 0;
> @@ -1062,33 +1143,90 @@ static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
>  	return val;
>  }
>  
> -static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
> -				    struct arm_smmu_s1_cfg *cfg)
> +static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
> +				   int ssid, struct arm_smmu_ctx_desc *cd)
>  {
>  	u64 val;
> +	bool cd_live;
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	__le64 *cdptr = arm_smmu_get_cd_ptr(&smmu_domain->s1_cfg, ssid);
>  
>  	/*
> -	 * We don't need to issue any invalidation here, as we'll invalidate
> -	 * the STE when installing the new entry anyway.
> +	 * This function handles the following cases:
> +	 *
> +	 * (1) Install primary CD, for normal DMA traffic (SSID = 0).
> +	 * (2) Install a secondary CD, for SID+SSID traffic.
> +	 * (3) Update ASID of a CD. Atomically write the first 64 bits of the
> +	 *     CD, then invalidate the old entry and mappings.
Can you explain when (3) does occur?
> +	 * (4) Remove a secondary CD.
>  	 */
> -	val = arm_smmu_cpu_tcr_to_cd(cfg->cd.tcr) |
> +
> +	if (!cdptr)
> +		return -ENOMEM;
Is that relevant? arm_smmu_get_cd_ptr() does not test ssid is within the
cfg->s1cdmax range and always return smthg != NULL AFAIU.
> +
> +	val = le64_to_cpu(cdptr[0]);
> +	cd_live = !!(val & CTXDESC_CD_0_V);
> +
> +	if (!cd) { /* (4) */
> +		cdptr[0] = 0;
> +	} else if (cd_live) { /* (3) */
> +		val &= ~CTXDESC_CD_0_ASID;
> +		val |= FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid);
> +
> +		cdptr[0] = cpu_to_le64(val);
> +		/*
> +		 * Until CD+TLB invalidation, both ASIDs may be used for tagging
> +		 * this substream's traffic
> +		 */
> +	} else { /* (1) and (2) */
> +		cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
> +		cdptr[2] = 0;
> +		cdptr[3] = cpu_to_le64(cd->mair);
> +
> +		/*
> +		 * STE is live, and the SMMU might fetch this CD at any
> +		 * time. Ensure that it observes the rest of the CD before we
> +		 * enable it.
> +		 */
> +		arm_smmu_sync_cd(smmu_domain, ssid, true);
> +
> +		val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
>  #ifdef __BIG_ENDIAN
> -	      CTXDESC_CD_0_ENDI |
> +			CTXDESC_CD_0_ENDI |
>  #endif
> -	      CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET |
> -	      CTXDESC_CD_0_AA64 | FIELD_PREP(CTXDESC_CD_0_ASID, cfg->cd.asid) |
> -	      CTXDESC_CD_0_V;
> +			CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET |
> +			CTXDESC_CD_0_AA64 |
> +			FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) |
> +			CTXDESC_CD_0_V;
> +
> +		/* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
> +		if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
> +			val |= CTXDESC_CD_0_S;
> +
> +		cdptr[0] = cpu_to_le64(val);
> +	}
>  
> -	/* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
> -	if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
> -		val |= CTXDESC_CD_0_S;
> +	arm_smmu_sync_cd(smmu_domain, ssid, true);
> +	return 0;
> +}
> +
> +static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain,
> +				    struct arm_smmu_master *master> +{
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
>  
> -	cfg->cdptr[0] = cpu_to_le64(val);
> +	cfg->s1fmt = STRTAB_STE_0_S1FMT_LINEAR;
> +	cfg->s1cdmax = master->ssid_bits;
> +	return arm_smmu_alloc_cd_leaf_table(smmu, &cfg->table, 1 << cfg->s1cdmax);
> +}
>  
> -	val = cfg->cd.ttbr & CTXDESC_CD_1_TTB0_MASK;
> -	cfg->cdptr[1] = cpu_to_le64(val);
> +static void arm_smmu_free_cd_tables(struct arm_smmu_domain *smmu_domain)
> +{
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
>  
> -	cfg->cdptr[3] = cpu_to_le64(cfg->cd.mair);
> +	arm_smmu_free_cd_leaf_table(smmu, &cfg->table, 1 << cfg->s1cdmax);
>  }
>  
>  /* Stream table manipulation functions */
> @@ -1210,6 +1348,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  	if (s1_cfg) {
>  		BUG_ON(ste_live);
>  		dst[1] = cpu_to_le64(
> +			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
>  			 FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
>  			 FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
>  			 FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
> @@ -1219,8 +1358,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>  		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
>  			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
>  
> -		val |= (s1_cfg->cdptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
> -			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS);
> +		val |= (s1_cfg->table.ptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
> +			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
> +			FIELD_PREP(STRTAB_STE_0_S1CDMAX, s1_cfg->s1cdmax) |
> +			FIELD_PREP(STRTAB_STE_0_S1FMT, s1_cfg->s1fmt);
>  	}
>  
>  	if (s2_cfg) {
> @@ -1674,12 +1815,8 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
>  	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>  		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
>  
> -		if (cfg->cdptr) {
> -			dmam_free_coherent(smmu_domain->smmu->dev,
> -					   CTXDESC_CD_DWORDS << 3,
> -					   cfg->cdptr,
> -					   cfg->cdptr_dma);
> -
> +		if (cfg->table.ptr) {
> +			arm_smmu_free_cd_tables(smmu_domain);
>  			arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
>  		}
>  	} else {
> @@ -1692,6 +1829,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
>  }
>  
>  static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
> +				       struct arm_smmu_master *master,
>  				       struct io_pgtable_cfg *pgtbl_cfg)
>  {
>  	int ret;
> @@ -1703,27 +1841,29 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>  	if (asid < 0)
>  		return asid;
>  
> -	cfg->cdptr = dmam_alloc_coherent(smmu->dev, CTXDESC_CD_DWORDS << 3,
> -					 &cfg->cdptr_dma,
> -					 GFP_KERNEL | __GFP_ZERO);
> -	if (!cfg->cdptr) {
> -		dev_warn(smmu->dev, "failed to allocate context descriptor\n");
> -		ret = -ENOMEM;
> +	ret = arm_smmu_alloc_cd_tables(smmu_domain, master);
> +	if (ret)
>  		goto out_free_asid;
> -	}
>  
>  	cfg->cd.asid	= (u16)asid;
>  	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
>  	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
>  	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
> +
> +	ret = arm_smmu_write_ctx_desc(smmu_domain, 0, &smmu_domain->s1_cfg.cd);
cfg.cd
> +	if (ret)
> +		goto out_free_table;
>  	return 0;
>  
> +out_free_table:
> +	arm_smmu_free_cd_tables(smmu_domain);
>  out_free_asid:
>  	arm_smmu_bitmap_free(smmu->asid_map, asid);
>  	return ret;
>  }
>  
>  static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
> +				       struct arm_smmu_master *master,
>  				       struct io_pgtable_cfg *pgtbl_cfg)
>  {
>  	int vmid;
> @@ -1740,7 +1880,8 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
>  	return 0;
>  }
>  
> -static int arm_smmu_domain_finalise(struct iommu_domain *domain)
> +static int arm_smmu_domain_finalise(struct iommu_domain *domain,
> +				    struct arm_smmu_master *master)
>  {
>  	int ret;
>  	unsigned long ias, oas;
> @@ -1748,6 +1889,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
>  	struct io_pgtable_cfg pgtbl_cfg;
>  	struct io_pgtable_ops *pgtbl_ops;
>  	int (*finalise_stage_fn)(struct arm_smmu_domain *,
> +				 struct arm_smmu_master *,
>  				 struct io_pgtable_cfg *);
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
> @@ -1804,7 +1946,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
>  	domain->geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1;
>  	domain->geometry.force_aperture = true;
>  
> -	ret = finalise_stage_fn(smmu_domain, &pgtbl_cfg);
> +	ret = finalise_stage_fn(smmu_domain, master, &pgtbl_cfg);
>  	if (ret < 0) {
>  		free_io_pgtable_ops(pgtbl_ops);
>  		return ret;
> @@ -1932,7 +2074,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  
>  	if (!smmu_domain->smmu) {
>  		smmu_domain->smmu = smmu;
> -		ret = arm_smmu_domain_finalise(domain);
> +		ret = arm_smmu_domain_finalise(domain, master);
>  		if (ret) {
>  			smmu_domain->smmu = NULL;
>  			goto out_unlock;
> @@ -1944,6 +2086,13 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  			dev_name(smmu->dev));
>  		ret = -ENXIO;
>  		goto out_unlock;
> +	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 &&
> +		   master->ssid_bits != smmu_domain->s1_cfg.s1cdmax) {
> +		dev_err(dev,
> +			"cannot attach to incompatible domain (%u SSID bits != %u)\n",
> +			smmu_domain->s1_cfg.s1cdmax, master->ssid_bits);
> +		ret = -EINVAL;
> +		goto out_unlock;
>  	}
>  
>  	master->domain = smmu_domain;
> @@ -1955,9 +2104,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  	if (smmu_domain->stage != ARM_SMMU_DOMAIN_BYPASS)
>  		arm_smmu_enable_ats(master);
>  
> -	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1)
> -		arm_smmu_write_ctx_desc(smmu, &smmu_domain->s1_cfg);
> -
>  	arm_smmu_install_ste_for_dev(master);
>  out_unlock:
>  	mutex_unlock(&smmu_domain->init_mutex);
> 
Thanks

Eric

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/8] iommu/arm-smmu-v3: Support platform SSID
  2019-07-08  7:58   ` Auger Eric
@ 2019-09-19 14:51     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-09-19 14:51 UTC (permalink / raw)
  To: Auger Eric
  Cc: will, mark.rutland, devicetree, jacob.jun.pan, joro,
	linux-kernel, iommu, robh+dt, robin.murphy, linux-arm-kernel

Hi Eric,

Sorry for the delay. I'll see if I can resend this for v5.5, although I
can't do much testing at the moment.

On Mon, Jul 08, 2019 at 09:58:22AM +0200, Auger Eric wrote:
> Hi Jean,
> 
> On 6/10/19 8:47 PM, Jean-Philippe Brucker wrote:
> > For platform devices that support SubstreamID (SSID), firmware provides
> > the number of supported SSID bits. Restrict it to what the SMMU supports
> > and cache it into master->ssid_bits.
> The commit message may give the impression the master's ssid_bits field
> only is used for platform devices.

Ok maybe I should add that this field will be used for PCI PASID as
well.

> > @@ -2097,6 +2098,16 @@ static int arm_smmu_add_device(struct device *dev)
> >  		}
> >  	}
> >  
> > +	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
> In case the device is a PCI device, what is the value taken by
> fwspec->num_pasid_bits?

It would be zero, as firmware only specifies a value for platform
devices. For a PCI device, patch 8/8 fills master->ssid_bits from the
PCIe PASID capability.

> > +	/*
> > +	 * If the SMMU doesn't support 2-stage CD, limit the linear
> > +	 * tables to a reasonable number of contexts, let's say
> > +	 * 64kB / sizeof(ctx_desc) = 1024 = 2^10
> ctx_desc is 26B so 11bits would be OK

This refers to the size of the hardware context descriptor, not struct
arm_smmu_ctx_desc. Next version moves this to a define and makes it
clearer.

Thanks,
Jean


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/8] iommu/arm-smmu-v3: Add support for Substream IDs
  2019-06-26 18:00   ` Will Deacon
  2019-07-04  9:33     ` Jean-Philippe Brucker
@ 2019-09-19 14:57     ` Jean-Philippe Brucker
  1 sibling, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-09-19 14:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: joro, robh+dt, mark.rutland, robin.murphy, jacob.jun.pan, iommu,
	devicetree, linux-kernel, linux-arm-kernel, eric.auger

On Wed, Jun 26, 2019 at 07:00:26PM +0100, Will Deacon wrote:
> On Mon, Jun 10, 2019 at 07:47:10PM +0100, Jean-Philippe Brucker wrote:
> > In all stream table entries, we set S1DSS=SSID0 mode, making translations
> > without an SSID use context descriptor 0. Although it would be possible by
> > setting S1DSS=BYPASS, we don't currently support SSID when user selects
> > iommu.passthrough.
> 
> I don't understand your comment here: iommu.passthrough works just as it did
> before, right, since we set bypass in the STE config field so S1DSS is not
> relevant?

What isn't supported is bypassing translation *only* for transactions
without SSID, and using context descriptors for anything with SSID. I
don't know if such a mode would be useful, but I can drop that sentence
to avoid confusion.

> I also notice that SSID0 causes transactions with SSID==0 to
> abort. Is a PASID of 0 reserved, so this doesn't matter?

Yes, we never allocate PASID 0.

> 
> > @@ -1062,33 +1143,90 @@ static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
> >  	return val;
> >  }
> >  
> > -static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
> > -				    struct arm_smmu_s1_cfg *cfg)
> > +static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
> > +				   int ssid, struct arm_smmu_ctx_desc *cd)
> >  {
> >  	u64 val;
> > +	bool cd_live;
> > +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> > +	__le64 *cdptr = arm_smmu_get_cd_ptr(&smmu_domain->s1_cfg, ssid);
> >  
> >  	/*
> > -	 * We don't need to issue any invalidation here, as we'll invalidate
> > -	 * the STE when installing the new entry anyway.
> > +	 * This function handles the following cases:
> > +	 *
> > +	 * (1) Install primary CD, for normal DMA traffic (SSID = 0).
> > +	 * (2) Install a secondary CD, for SID+SSID traffic.
> > +	 * (3) Update ASID of a CD. Atomically write the first 64 bits of the
> > +	 *     CD, then invalidate the old entry and mappings.
> > +	 * (4) Remove a secondary CD.
> >  	 */
> > -	val = arm_smmu_cpu_tcr_to_cd(cfg->cd.tcr) |
> > +
> > +	if (!cdptr)
> > +		return -ENOMEM;
> > +
> > +	val = le64_to_cpu(cdptr[0]);
> > +	cd_live = !!(val & CTXDESC_CD_0_V);
> > +
> > +	if (!cd) { /* (4) */
> > +		cdptr[0] = 0;
> 
> Should we be using WRITE_ONCE here? (although I notice we don't seem to
> bother for STEs either...)

Yes, I think it makes sense

Thanks,
Jean

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/8] iommu/arm-smmu-v3: Add support for Substream IDs
  2019-07-08 15:31   ` Auger Eric
@ 2019-09-19 15:01     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-09-19 15:01 UTC (permalink / raw)
  To: Auger Eric
  Cc: Jean-Philippe Brucker, will, mark.rutland, devicetree,
	jacob.jun.pan, joro, linux-kernel, iommu, robh+dt, robin.murphy,
	linux-arm-kernel

On Mon, Jul 08, 2019 at 05:31:53PM +0200, Auger Eric wrote:
> Hi Jean,
> 
> On 6/10/19 8:47 PM, Jean-Philippe Brucker wrote:
> >  	/*
> > -	 * We don't need to issue any invalidation here, as we'll invalidate
> > -	 * the STE when installing the new entry anyway.
> > +	 * This function handles the following cases:
> > +	 *
> > +	 * (1) Install primary CD, for normal DMA traffic (SSID = 0).
> > +	 * (2) Install a secondary CD, for SID+SSID traffic.
> > +	 * (3) Update ASID of a CD. Atomically write the first 64 bits of the
> > +	 *     CD, then invalidate the old entry and mappings.
> Can you explain when (3) does occur?

When sharing a process context with devices (SVA), we write in that
context descriptor the ASID allocated by the arch ASID allocator for
that process. But that ASID might already have been allocated locally by
the SMMU driver for a private context. As there is a single ASID space
per SMMU for both private and shared ASIDs, we reallocated the private
ASID and update it here. See
https://lore.kernel.org/linux-iommu/20180511190641.23008-25-jean-philippe.brucker@arm.com/

> > +	 * (4) Remove a secondary CD.
> >  	 */
> > -	val = arm_smmu_cpu_tcr_to_cd(cfg->cd.tcr) |
> > +
> > +	if (!cdptr)
> > +		return -ENOMEM;
> Is that relevant? arm_smmu_get_cd_ptr() does not test ssid is within the
> cfg->s1cdmax range and always return smthg != NULL AFAIU.

It might return NULL with patch 5/8, when we can't allocate a 2nd-level
table. I can move the check over to that patch.

> > +	ret = arm_smmu_write_ctx_desc(smmu_domain, 0, &smmu_domain->s1_cfg.cd);
> cfg.cd

Right.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 5/8] iommu/arm-smmu-v3: Add second level of context descriptor table
  2019-07-08 15:13   ` Auger Eric
@ 2019-09-19 15:05     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-09-19 15:05 UTC (permalink / raw)
  To: Auger Eric
  Cc: mark.rutland, devicetree, jacob.jun.pan, joro, linux-kernel,
	iommu, robh+dt, robin.murphy

On Mon, Jul 08, 2019 at 05:13:05PM +0200, Auger Eric wrote:
> >  #define STRTAB_STE_0_S1FMT		GENMASK_ULL(5, 4)
> >  #define STRTAB_STE_0_S1FMT_LINEAR	0
> > +#define STRTAB_STE_0_S1FMT_4K_L2	1
> As you only use 64kB L2, I guess you can remove the 4K define?

I prefer defining all values, but I suppose I can get rid of it.

> > +	cfg->tables = devm_kzalloc(smmu->dev, sizeof(struct arm_smmu_cd_table) *
> > +				   cfg->num_tables, GFP_KERNEL);
> > +	if (!cfg->tables)
> > +		return -ENOMEM;
> goto err_free_l1
> > +
> > +	ret = arm_smmu_alloc_cd_leaf_table(smmu, &cfg->tables[0], num_leaf_entries);
> don't you want to do that only in linear case. In 2-level mode, I
> understand arm_smmu_get_cd_ptr() will do the job.

OK, that might be better

> 
> > +	if (ret)
> > +		goto err_free_l1;
> > +
> > +	if (cfg->l1ptr)
> > +		arm_smmu_write_cd_l1_desc(cfg->l1ptr, &cfg->tables[0]);
> that stuff could be removed as well?

Yes

> By the way I can see that
> arm_smmu_get_cd_ptr() does a arm_smmu_sync_cd after. wouldn't it be
> needed here as well?

No context table is reachable from a STE at this point, so we don't have
to invalidate anything.

> > @@ -1815,7 +1935,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
> >  	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> >  		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> >  
> > -		if (cfg->table.ptr) {
> > +		if (cfg->tables) {
> >  			arm_smmu_free_cd_tables(smmu_domain);
> >  			arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
> I don't get why the arm_smmu_bitmap_free is dependent on cfg->tables.

Simply because arm_smmu_bitmap_alloc() and arm_smmu_alloc_cd_tables()
are both performed in arm_smmu_domain_finalise_s1(). A domain returned
by arm_smmu_domain_alloc() is not fully initialized, it still needs to
be finalized by arm_smmu_attach_dev(). So here we check whether the
domain has been finalized or not. Zero being a valid ASID in this
driver, we can't check whether cfg->cd.asid is valid, so we check
cfg->tables instead.

And I forgot to clear cfg->tables after failure of domain_finalise in
this series, I'll need to fix it.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 6/8] iommu/arm-smmu-v3: Support auxiliary domains
  2019-06-26 17:59   ` Will Deacon
  2019-07-05 16:29     ` Jean-Philippe Brucker
@ 2019-09-19 15:06     ` Jean-Philippe Brucker
  1 sibling, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-09-19 15:06 UTC (permalink / raw)
  To: Will Deacon
  Cc: Jean-Philippe Brucker, joro, robh+dt, mark.rutland, robin.murphy,
	jacob.jun.pan, iommu, devicetree, linux-kernel, linux-arm-kernel,
	eric.auger

On Wed, Jun 26, 2019 at 06:59:59PM +0100, Will Deacon wrote:
> > @@ -666,8 +668,14 @@ struct arm_smmu_domain {
> >  
> >  	struct iommu_domain		domain;
> >  
> > +	/* Unused in aux domains */
> >  	struct list_head		devices;
> >  	spinlock_t			devices_lock;
> > +
> > +	/* Auxiliary domain stuff */
> > +	struct arm_smmu_domain		*parent;
> > +	ioasid_t			ssid;
> > +	unsigned long			aux_nr_devs;
> 
> Maybe use a union to avoid comments about what is used/unused?

OK

> > +static void arm_smmu_aux_detach_dev(struct iommu_domain *domain, struct device *dev)
> > +{
> > +	struct iommu_domain *parent_domain;
> > +	struct arm_smmu_domain *parent_smmu_domain;
> > +	struct arm_smmu_master *master = dev_to_master(dev);
> > +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> > +
> > +	if (!arm_smmu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX))
> > +		return;
> > +
> > +	parent_domain = iommu_get_domain_for_dev(dev);
> > +	if (!parent_domain)
> > +		return;
> > +	parent_smmu_domain = to_smmu_domain(parent_domain);
> > +
> > +	mutex_lock(&smmu_domain->init_mutex);
> > +	if (!smmu_domain->aux_nr_devs)
> > +		goto out_unlock;
> > +
> > +	if (!--smmu_domain->aux_nr_devs) {
> > +		arm_smmu_write_ctx_desc(parent_smmu_domain, smmu_domain->ssid,
> > +					NULL);
> > +		/*
> > +		 * TLB doesn't need invalidation since accesses from the device
> > +		 * can't use this domain's ASID once the CD is clear.
> > +		 *
> > +		 * Sadly that doesn't apply to ATCs, which are PASID tagged.
> > +		 * Invalidate all other devices as well, because even though
> > +		 * they weren't 'officially' attached to the auxiliary domain,
> > +		 * they could have formed ATC entries.
> > +		 */
> > +		arm_smmu_atc_inv_domain(smmu_domain, 0, 0);
> 
> I've been struggling to understand the locking here, since both
> arm_smmu_write_ctx_desc and arm_smmu_atc_inv_domain take and release the
> devices_lock for the domain. Is there not a problem with devices coming and
> going in-between the two calls?

Yes, I need to think about this more. I bet there are plenty more issues
like this. For example I don't think I currently prevent the parent
domain from disappearing while auxiliary domains are attached.

> >  static struct iommu_ops arm_smmu_ops = {
> >  	.capable		= arm_smmu_capable,
> >  	.domain_alloc		= arm_smmu_domain_alloc,
> > @@ -2539,6 +2772,13 @@ static struct iommu_ops arm_smmu_ops = {
> >  	.of_xlate		= arm_smmu_of_xlate,
> >  	.get_resv_regions	= arm_smmu_get_resv_regions,
> >  	.put_resv_regions	= arm_smmu_put_resv_regions,
> > +	.dev_has_feat		= arm_smmu_dev_has_feature,
> > +	.dev_feat_enabled	= arm_smmu_dev_feature_enabled,
> > +	.dev_enable_feat	= arm_smmu_dev_enable_feature,
> > +	.dev_disable_feat	= arm_smmu_dev_disable_feature,
> 
> Why can't we use the existing ->capable and ->dev_{get,set}_attr callbacks
> for this?

->capable isn't very useful because it applies to all SMMUs in the
system. The existing ->{get,set}_attr callbacks apply to an
iommu_domain. I think the main reason for doing it on endpoints was that
it would be tedious to keep track of capabilities when attaching and
detaching devices to a domain, especially for drivers that allow
multiple IOMMUs per domain [1]. There were more discussions, and in the
end we agreed on this API for device attributes [2].

Thanks,
Jean

[1] https://lore.kernel.org/lkml/aa1ff748-c2ec-acc0-f1d9-cdff2b131e58@linux.intel.com/
[2] https://lore.kernel.org/linux-iommu/20181207102926.GM16835@8bytes.org/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] iommu/arm-smmu-v3: Add support for PCI PASID
  2019-07-08  7:58   ` Auger Eric
@ 2019-09-19 15:10     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2019-09-19 15:10 UTC (permalink / raw)
  To: Auger Eric
  Cc: Jean-Philippe Brucker, will.deacon, mark.rutland, devicetree,
	jacob.jun.pan, joro, linux-kernel, iommu, robh+dt, robin.murphy,
	linux-arm-kernel

On Mon, Jul 08, 2019 at 09:58:16AM +0200, Auger Eric wrote:
> > +	ret = pci_enable_pasid(pdev, features);
> > +	if (!ret)
> > +		master->ssid_bits = min_t(u8, ilog2(num_pasids),
> > +					  master->smmu->ssid_bits);
> I don't really get why this setting is conditional to the success of
> pci_enabled_pasid and not num_pasids > 0.

num_pasids only contains the value of the PCIe PASID capability. If
pci_enable_pasid() fails then we want to leave master->ssid_bits to 0 so
that we report to users that SVA and AUXD aren't supported.

> If it fails the ssid_bits is set to min(smmu->ssid_bits,
> fwspec->num_pasid_bits) anyway.
>
> > +	return ret;
> > +}
> > +
> > +static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
> > +{
> > +	struct pci_dev *pdev;
> > +
> > +	if (!dev_is_pci(master->dev))
> > +		return;
> > +
> > +	pdev = to_pci_dev(master->dev);
> > +
> > +	if (!pdev->pasid_enabled)
> > +		return;
> > +
> > +	pci_disable_pasid(pdev);
> > +	master->ssid_bits = 0;
> in case of a platform device you leave the ssid_bits to a value != 0. Is
> that what you want?

Yes, this is only for PCI devices, there is no standard way of disabling
PASID in platform devices. We just take whatever the firmware gives us.

> > +}
> > +
> >  static void arm_smmu_detach_dev(struct arm_smmu_master *master)
> >  {
> >  	unsigned long flags;
> > @@ -2413,6 +2456,9 @@ static int arm_smmu_add_device(struct device *dev)
> >  
> >  	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
> >  
> > +	/* Note that PASID must be enabled before, and disabled after ATS */
> > +	arm_smmu_enable_pasid(master);
> In case the call fails, don't you want to handle the error and reset the
> ssid_bits?

This function fails if the device doesn't support PASID, and we leave
ssid_bits to 0. That said, I think it would be nicer to move the above
line (that deals with fwspec) into arm_smmu_enable_pasid()

Thanks,
Jean

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2019-09-19 15:10 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-10 18:47 [PATCH 0/8] iommu: Add auxiliary domain and PASID support to Arm SMMUv3 Jean-Philippe Brucker
2019-06-10 18:47 ` [PATCH 1/8] iommu: Add I/O ASID allocator Jean-Philippe Brucker
2019-06-11  9:36   ` Jonathan Cameron
2019-06-11 14:35     ` Jean-Philippe Brucker
2019-06-11 18:13       ` Jacob Pan
2019-06-18 14:22         ` Jean-Philippe Brucker
2019-06-18 17:05           ` Jacob Pan
2019-06-19 14:26             ` Jean-Philippe Brucker
2019-06-11 12:26   ` Jacob Pan
2019-06-11 14:37     ` Jean-Philippe Brucker
2019-06-11 17:10       ` Jacob Pan
2019-06-12 11:30         ` Jean-Philippe Brucker
2019-06-10 18:47 ` [PATCH 2/8] dt-bindings: document PASID property for IOMMU masters Jean-Philippe Brucker
2019-07-08  7:58   ` Auger Eric
2019-06-10 18:47 ` [PATCH 3/8] iommu/arm-smmu-v3: Support platform SSID Jean-Philippe Brucker
2019-06-11  9:42   ` Jonathan Cameron
2019-06-11 14:35     ` Jean-Philippe Brucker
2019-06-18 18:08   ` Will Deacon
2019-06-19 11:53     ` Jean-Philippe Brucker
2019-07-08  7:58   ` Auger Eric
2019-09-19 14:51     ` Jean-Philippe Brucker
2019-06-10 18:47 ` [PATCH 4/8] iommu/arm-smmu-v3: Add support for Substream IDs Jean-Philippe Brucker
2019-06-11 10:19   ` Jonathan Cameron
2019-06-11 14:35     ` Jean-Philippe Brucker
2019-06-26 18:00   ` Will Deacon
2019-07-04  9:33     ` Jean-Philippe Brucker
2019-09-19 14:57     ` Jean-Philippe Brucker
2019-07-08 15:31   ` Auger Eric
2019-09-19 15:01     ` Jean-Philippe Brucker
2019-06-10 18:47 ` [PATCH 5/8] iommu/arm-smmu-v3: Add second level of context descriptor table Jean-Philippe Brucker
2019-06-11 10:24   ` Jonathan Cameron
2019-07-08 15:13   ` Auger Eric
2019-09-19 15:05     ` Jean-Philippe Brucker
2019-06-10 18:47 ` [PATCH 6/8] iommu/arm-smmu-v3: Support auxiliary domains Jean-Philippe Brucker
2019-06-26 17:59   ` Will Deacon
2019-07-05 16:29     ` Jean-Philippe Brucker
2019-09-19 15:06     ` Jean-Philippe Brucker
2019-06-10 18:47 ` [PATCH 7/8] iommu/arm-smmu-v3: Improve add_device() error handling Jean-Philippe Brucker
2019-07-08  7:58   ` Auger Eric
2019-06-10 18:47 ` [PATCH 8/8] iommu/arm-smmu-v3: Add support for PCI PASID Jean-Philippe Brucker
2019-06-11 10:45   ` Jonathan Cameron
2019-06-11 14:35     ` Jean-Philippe Brucker
2019-07-08  7:58   ` Auger Eric
2019-09-19 15:10     ` Jean-Philippe Brucker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).