* [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support
@ 2020-04-14 17:02 Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 01/25] mm/mmu_notifiers: pass private data down to alloc_notifier() Jean-Philippe Brucker
                   ` (24 more replies)
  0 siblings, 25 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker

Shared Virtual Addressing (SVA) allows process page tables to be shared
with devices through the IOMMU. Add a generic implementation of the
IOMMU SVA API, and add support in the Arm SMMUv3 driver.

Since v4 [1] I changed the PASID lifetime. It is no longer released when
the corresponding process address space dies, but when the device driver
calls unbind. This lightens the mmput() path, since we no longer need to
ensure that the device driver stops DMA at that point. For more details
see my proposal from last week [2], which is a requirement for this
series. As a result patch 2 has separate clear() and detach() operations,
and patch 17 has a new context descriptor state.

Other changes are a simplification of the locking in patch 2 and overall
cleanups following review comments.

[1] [PATCH v4 00/26] iommu: Shared Virtual Addressing and SMMUv3 support
    https://lore.kernel.org/linux-iommu/20200224182401.353359-1-jean-philippe@linaro.org/
[2] [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()
    https://lore.kernel.org/linux-iommu/20200408140427.212807-1-jean-philippe@linaro.org/

Jean-Philippe Brucker (25):
  mm/mmu_notifiers: pass private data down to alloc_notifier()
  iommu/sva: Manage process address spaces
  iommu: Add a page fault handler
  iommu/sva: Search mm by PASID
  iommu/iopf: Handle mm faults
  iommu/sva: Register page fault handler
  arm64: mm: Add asid_gen_match() helper
  arm64: mm: Pin down ASIDs for sharing mm with devices
  iommu/io-pgtable-arm: Move some definitions to a header
  iommu/arm-smmu-v3: Manage ASIDs with xarray
  arm64: cpufeature: Export symbol read_sanitised_ftr_reg()
  iommu/arm-smmu-v3: Share process page tables
  iommu/arm-smmu-v3: Seize private ASID
  iommu/arm-smmu-v3: Add support for VHE
  iommu/arm-smmu-v3: Enable broadcast TLB maintenance
  iommu/arm-smmu-v3: Add SVA feature checking
  iommu/arm-smmu-v3: Implement mm operations
  iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops
  iommu/arm-smmu-v3: Add support for Hardware Translation Table Update
  iommu/arm-smmu-v3: Maintain a SID->device structure
  dt-bindings: document stall property for IOMMU masters
  iommu/arm-smmu-v3: Add stall support for platform devices
  PCI/ATS: Add PRI stubs
  PCI/ATS: Export PRI functions
  iommu/arm-smmu-v3: Add support for PRI

 drivers/iommu/Kconfig                         |   13 +
 drivers/iommu/Makefile                        |    2 +
 .../devicetree/bindings/iommu/iommu.txt       |   18 +
 arch/arm64/include/asm/mmu.h                  |    1 +
 arch/arm64/include/asm/mmu_context.h          |   11 +-
 drivers/iommu/io-pgtable-arm.h                |   30 +
 drivers/iommu/iommu-sva.h                     |   78 +
 include/linux/iommu.h                         |   75 +
 include/linux/mmu_notifier.h                  |   11 +-
 include/linux/pci-ats.h                       |    8 +
 arch/arm64/kernel/cpufeature.c                |    1 +
 arch/arm64/mm/context.c                       |  103 +-
 drivers/iommu/arm-smmu-v3.c                   | 1398 +++++++++++++++--
 drivers/iommu/io-pgfault.c                    |  525 +++++++
 drivers/iommu/io-pgtable-arm.c                |   27 +-
 drivers/iommu/iommu-sva.c                     |  557 +++++++
 drivers/iommu/iommu.c                         |    1 +
 drivers/iommu/of_iommu.c                      |    5 +-
 drivers/misc/sgi-gru/grutlbpurge.c            |    5 +-
 drivers/pci/ats.c                             |    4 +
 mm/mmu_notifier.c                             |    6 +-
 21 files changed, 2716 insertions(+), 163 deletions(-)
 create mode 100644 drivers/iommu/io-pgtable-arm.h
 create mode 100644 drivers/iommu/iommu-sva.h
 create mode 100644 drivers/iommu/io-pgfault.c
 create mode 100644 drivers/iommu/iommu-sva.c

-- 
2.26.0




* [PATCH v5 01/25] mm/mmu_notifiers: pass private data down to alloc_notifier()
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 18:09   ` Jason Gunthorpe
  2020-04-14 17:02 ` [PATCH v5 02/25] iommu/sva: Manage process address spaces Jean-Philippe Brucker
                   ` (23 subsequent siblings)
  24 siblings, 1 reply; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker, Andrew Morton,
	Arnd Bergmann, Christoph Hellwig, Dimitri Sivanich,
	Greg Kroah-Hartman

The new allocation scheme introduced by commit 2c7933f53f6b
("mm/mmu_notifiers: add a get/put scheme for the registration") provides
a convenient way for users to attach notifier data to an mm. However, it
would be even better to create this notifier data atomically.

Since the alloc_notifier() callback currently only takes an mm argument,
some users have to perform the allocation in two steps.
alloc_notifier() initially creates an incomplete structure, which is
then finalized using more context once mmu_notifier_get() returns. This
second step requires extra care to order memory accesses against live
invalidation.

The IOMMU SVA module, which attaches an mm to multiple devices,
exemplifies this situation. In essence it does:

	mmu_notifier_get()
	  alloc_notifier()
	     A = kzalloc()
	  /* MMU notifier is published */
	A->ctx = ctx;				// (1)
	device->A = A;
	list_add_rcu(device, A->devices);	// (2)

The invalidate notifier, which may start running before A is fully
initialized, does the following:

	io_mm_invalidate(A)
	  list_for_each_entry_rcu(device, A->devices)
	    device->invalidate(A->ctx)

The invalidate() thread must observe the initialization (1) before (2),
which is easily solved by fully initializing object A in
alloc_notifier(), before publishing the MMU notifier.
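
With the privdata argument, the caller can pass the context down to
alloc_notifier() and fully initialize A before it is published. A
minimal sketch of the resulting pattern (the my_* names are
hypothetical, for illustration only):

	struct my_notifier {
		struct mmu_notifier	mn;
		void			*ctx;
	};

	static struct mmu_notifier *my_alloc_notifier(struct mm_struct *mm,
						      void *privdata)
	{
		struct my_notifier *A;

		A = kzalloc(sizeof(*A), GFP_KERNEL);
		if (!A)
			return ERR_PTR(-ENOMEM);

		/* Initialize everything before the notifier is published */
		A->ctx = privdata;

		return &A->mn;
	}

	/* ... */
	mn = mmu_notifier_get(&my_ops, mm, ctx);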

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
v4->v5: provide example in commit message, fix style.
---
 include/linux/mmu_notifier.h       | 11 +++++++----
 drivers/misc/sgi-gru/grutlbpurge.c |  5 +++--
 mm/mmu_notifier.c                  |  6 ++++--
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 736f6918335ed..0536fe85e7457 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -207,7 +207,8 @@ struct mmu_notifier_ops {
 	 * callbacks are currently running. It is called from a SRCU callback
 	 * and cannot sleep.
 	 */
-	struct mmu_notifier *(*alloc_notifier)(struct mm_struct *mm);
+	struct mmu_notifier *(*alloc_notifier)(struct mm_struct *mm,
+					       void *privdata);
 	void (*free_notifier)(struct mmu_notifier *subscription);
 };
 
@@ -271,14 +272,16 @@ static inline int mm_has_notifiers(struct mm_struct *mm)
 }
 
 struct mmu_notifier *mmu_notifier_get_locked(const struct mmu_notifier_ops *ops,
-					     struct mm_struct *mm);
+					     struct mm_struct *mm,
+					     void *privdata);
 static inline struct mmu_notifier *
-mmu_notifier_get(const struct mmu_notifier_ops *ops, struct mm_struct *mm)
+mmu_notifier_get(const struct mmu_notifier_ops *ops, struct mm_struct *mm,
+		 void *privdata)
 {
 	struct mmu_notifier *ret;
 
 	down_write(&mm->mmap_sem);
-	ret = mmu_notifier_get_locked(ops, mm);
+	ret = mmu_notifier_get_locked(ops, mm, privdata);
 	up_write(&mm->mmap_sem);
 	return ret;
 }
diff --git a/drivers/misc/sgi-gru/grutlbpurge.c b/drivers/misc/sgi-gru/grutlbpurge.c
index 10921cd2608df..336e1b1df072f 100644
--- a/drivers/misc/sgi-gru/grutlbpurge.c
+++ b/drivers/misc/sgi-gru/grutlbpurge.c
@@ -235,7 +235,8 @@ static void gru_invalidate_range_end(struct mmu_notifier *mn,
 		gms, range->start, range->end);
 }
 
-static struct mmu_notifier *gru_alloc_notifier(struct mm_struct *mm)
+static struct mmu_notifier *gru_alloc_notifier(struct mm_struct *mm,
+					       void *privdata)
 {
 	struct gru_mm_struct *gms;
 
@@ -266,7 +267,7 @@ struct gru_mm_struct *gru_register_mmu_notifier(void)
 {
 	struct mmu_notifier *mn;
 
-	mn = mmu_notifier_get_locked(&gru_mmuops, current->mm);
+	mn = mmu_notifier_get_locked(&gru_mmuops, current->mm, NULL);
 	if (IS_ERR(mn))
 		return ERR_CAST(mn);
 
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 06852b896fa63..6b9bfb8ca94d2 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -743,6 +743,7 @@ find_get_mmu_notifier(struct mm_struct *mm, const struct mmu_notifier_ops *ops)
  *                           the mm & ops
  * @ops: The operations struct being subscribe with
  * @mm : The mm to attach notifiers too
+ * @privdata: Initialization data passed down to ops->alloc_notifier()
  *
  * This function either allocates a new mmu_notifier via
  * ops->alloc_notifier(), or returns an already existing notifier on the
@@ -756,7 +757,8 @@ find_get_mmu_notifier(struct mm_struct *mm, const struct mmu_notifier_ops *ops)
  * and can be converted to an active mm pointer via mmget_not_zero().
  */
 struct mmu_notifier *mmu_notifier_get_locked(const struct mmu_notifier_ops *ops,
-					     struct mm_struct *mm)
+					     struct mm_struct *mm,
+					     void *privdata)
 {
 	struct mmu_notifier *subscription;
 	int ret;
@@ -769,7 +771,7 @@ struct mmu_notifier *mmu_notifier_get_locked(const struct mmu_notifier_ops *ops,
 			return subscription;
 	}
 
-	subscription = ops->alloc_notifier(mm);
+	subscription = ops->alloc_notifier(mm, privdata);
 	if (IS_ERR(subscription))
 		return subscription;
 	subscription->ops = ops;
-- 
2.26.0




* [PATCH v5 02/25] iommu/sva: Manage process address spaces
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 01/25] mm/mmu_notifiers: pass private data down to alloc_notifier() Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-16  7:28   ` Christoph Hellwig
  2020-04-14 17:02 ` [PATCH v5 03/25] iommu: Add a page fault handler Jean-Philippe Brucker
                   ` (22 subsequent siblings)
  24 siblings, 1 reply; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker

Add a small library to help IOMMU drivers manage process address spaces
bound to their devices. Register an MMU notifier to track modifications
to each address space bound to one or more devices.

IOMMU drivers must implement the io_mm_ops and can then use the helpers
provided by this library to easily implement the SVA API introduced by
commit 26b25a2b98e4 ("iommu: Bind process address spaces to devices").
The io_mm_ops are:

alloc: Allocate a PASID context private to the IOMMU driver. There is a
  single context per mm. IOMMU drivers may perform arch-specific
  operations there, for example pinning down a CPU ASID (on Arm).

attach: Attach a context to the device, by setting up the PASID table
  entry.

invalidate: Invalidate TLB entries for this address range.

clear: Clear the context and invalidate IOTLBs. Called if the mm exits
  before unbind(). DMA may still be issued.

detach: Detach a context from the device. Unlike clear() this is always
  called, at unbind(), and DMA is no longer issued.

free: Free a context.
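
For illustration, a minimal sketch of the driver side (the my_* names
are hypothetical and error handling is elided):

	static const struct io_mm_ops my_mm_ops = {
		.alloc		= my_mm_alloc,		/* e.g. pin a CPU ASID */
		.attach		= my_mm_attach,		/* write PASID table entry */
		.invalidate	= my_mm_invalidate,	/* send TLB invalidations */
		.clear		= my_mm_clear,		/* mm exited before unbind() */
		.detach		= my_mm_detach,		/* clear PASID table entry */
		.free		= my_mm_free,
	};

	static struct iommu_sva *my_sva_bind(struct device *dev,
					     struct mm_struct *mm,
					     void *drvdata)
	{
		return iommu_sva_bind_generic(dev, mm, &my_mm_ops, drvdata);
	}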

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
v4->v5:
* Simplify locking
* Add clear() op
* Improve doc
---
 drivers/iommu/Kconfig     |   7 +
 drivers/iommu/Makefile    |   1 +
 drivers/iommu/iommu-sva.h |  78 ++++++
 include/linux/iommu.h     |   4 +
 drivers/iommu/iommu-sva.c | 527 ++++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu.c     |   1 +
 6 files changed, 618 insertions(+)
 create mode 100644 drivers/iommu/iommu-sva.h
 create mode 100644 drivers/iommu/iommu-sva.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 58b4a4dbfc78b..e81842f59b037 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -102,6 +102,13 @@ config IOMMU_DMA
 	select IRQ_MSI_IOMMU
 	select NEED_SG_DMA_LENGTH
 
+# Shared Virtual Addressing library
+config IOMMU_SVA
+	bool
+	select IOASID
+	select IOMMU_API
+	select MMU_NOTIFIER
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PCI
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 9f33fdb3bb051..40c800dd4e3ef 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -37,3 +37,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
 obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
 obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
+obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
diff --git a/drivers/iommu/iommu-sva.h b/drivers/iommu/iommu-sva.h
new file mode 100644
index 0000000000000..3c4c7e886a6be
--- /dev/null
+++ b/drivers/iommu/iommu-sva.h
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * SVA library for IOMMU drivers
+ */
+#ifndef _IOMMU_SVA_H
+#define _IOMMU_SVA_H
+
+#include <linux/iommu.h>
+#include <linux/kref.h>
+#include <linux/mmu_notifier.h>
+
+struct io_mm_ops {
+	/* Allocate a PASID context for an mm */
+	void *(*alloc)(struct mm_struct *mm);
+
+	/*
+	 * Attach a PASID context to a device. Write the entry into the PASID
+	 * table.
+	 *
+	 * @attach_domain is true when no other device in the IOMMU domain is
+	 *   already attached to this context. IOMMU drivers that share the
+	 *   PASID tables within a domain don't need to write the PASID entry
+	 *   when @attach_domain is false.
+	 */
+	int (*attach)(struct device *dev, int pasid, void *ctx,
+		      bool attach_domain);
+
+	/* Invalidate a range of addresses. Cannot sleep. */
+	void (*invalidate)(struct device *dev, int pasid, void *ctx,
+			   unsigned long vaddr, size_t size);
+
+	/*
+	 * Clear a PASID context, invalidate IOTLBs. Called when the address
+	 * space attached to this context exits. Until detach() is called, the
+	 * PASID is not freed. The IOMMU driver should expect incoming DMA
+	 * transactions for this PASID and abort them quietly. The IOMMU driver
+	 * can still queue incoming page faults for this PASID, they will be
+	 * silently aborted.
+	 */
+	void (*clear)(struct device *dev, int pasid, void *ctx);
+
+	/*
+	 * Detach a PASID context from a device. Unlike clear() this is final.
+	 * There are no more incoming DMA transactions, and page faults have
+	 * been flushed.
+	 *
+	 * @detach_domain is true when no other device in the IOMMU domain is
+	 *   still attached to this context. IOMMU drivers that share the PASID
+	 *   table within a domain don't need to clear the PASID entry when
+	 *   @detach_domain is false, only invalidate the caches.
+	 *
+	 * @cleared is true if the clear() op has already been called for this
+	 *   context. In this case there is no need to invalidate IOTLBs.
+	 */
+	void (*detach)(struct device *dev, int pasid, void *ctx,
+		       bool detach_domain, bool cleared);
+
+	/* Free a context. Cannot sleep. */
+	void (*free)(void *ctx);
+};
+
+struct iommu_sva_param {
+	u32			min_pasid;
+	u32			max_pasid;
+	int			nr_bonds;
+};
+
+struct iommu_sva *
+iommu_sva_bind_generic(struct device *dev, struct mm_struct *mm,
+		       const struct io_mm_ops *ops, void *drvdata);
+void iommu_sva_unbind_generic(struct iommu_sva *handle);
+int iommu_sva_get_pasid_generic(struct iommu_sva *handle);
+
+int iommu_sva_enable(struct device *dev, struct iommu_sva_param *sva_param);
+int iommu_sva_disable(struct device *dev);
+bool iommu_sva_enabled(struct device *dev);
+
+#endif /* _IOMMU_SVA_H */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b62525747bd91..167e468dd3510 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -347,6 +347,8 @@ struct iommu_fault_param {
  * struct dev_iommu - Collection of per-device IOMMU data
  *
  * @fault_param: IOMMU detected device fault reporting data
+ * @sva_param:	 IOMMU parameter for SVA
+ * @sva_lock:	 protects @sva_param
  * @fwspec:	 IOMMU fwspec data
  * @priv:	 IOMMU Driver private data
  *
@@ -356,6 +358,8 @@ struct iommu_fault_param {
 struct dev_iommu {
 	struct mutex lock;
 	struct iommu_fault_param	*fault_param;
+	struct iommu_sva_param		*sva_param;
+	struct mutex			sva_lock;
 	struct iommu_fwspec		*fwspec;
 	void				*priv;
 };
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
new file mode 100644
index 0000000000000..7fecc74a9f7d6
--- /dev/null
+++ b/drivers/iommu/iommu-sva.c
@@ -0,0 +1,527 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Manage PASIDs and bind process address spaces to devices.
+ *
+ * Copyright (C) 2020 ARM Ltd.
+ */
+
+#include <linux/idr.h>
+#include <linux/ioasid.h>
+#include <linux/iommu.h>
+#include <linux/sched/mm.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+#include "iommu-sva.h"
+
+/**
+ * DOC: io_mm model
+ *
+ * The io_mm keeps track of process address spaces shared between CPU and IOMMU.
+ * The following example illustrates the relation between structures
+ * iommu_domain, io_mm and iommu_sva. The iommu_sva struct is a bond between
+ * io_mm and device. A device can have multiple io_mm and an io_mm may be bound
+ * to multiple devices.
+ *              ___________________________
+ *             |  IOMMU domain A           |
+ *             |  ________________         |
+ *             | |  IOMMU group   |        |
+ *             | |                |        |
+ *             | |   dev 00:00.0 ----+------- bond 1 --- io_mm X
+ *             | |________________|   \    |
+ *             |                       '----- bond 2 ---.
+ *             |___________________________|             \
+ *              ___________________________               \
+ *             |  IOMMU domain B           |             io_mm Y
+ *             |  ________________         |             / /
+ *             | |  IOMMU group   |        |            / /
+ *             | |                |        |           / /
+ *             | |   dev 00:01.0 ------------ bond 3 -' /
+ *             | |   dev 00:01.1 ------------ bond 4 --'
+ *             | |________________|        |
+ *             |___________________________|
+ *
+ * In this example, device 00:00.0 is in domain A, devices 00:01.* are in domain
+ * B. All devices within the same domain access the same address spaces. Device
+ * 00:00.0 accesses address spaces X and Y, each corresponding to an mm_struct.
+ * Devices 00:01.* only access address space Y.
+ *
+ * To obtain the above configuration, users would for instance issue the
+ * following calls:
+ *
+ *     iommu_sva_bind_device(dev 00:00.0, mm X, ...) -> bond 1
+ *     iommu_sva_bind_device(dev 00:00.0, mm Y, ...) -> bond 2
+ *     iommu_sva_bind_device(dev 00:01.0, mm Y, ...) -> bond 3
+ *     iommu_sva_bind_device(dev 00:01.1, mm Y, ...) -> bond 4
+ *
+ * A single Process Address Space ID (PASID) is allocated for each mm. It is a
+ * choice made for the Linux SVA implementation, not a hardware restriction. In
+ * the example, devices use PASID 1 to read/write into address space X and PASID
+ * 2 to read/write into address space Y. Calling iommu_sva_get_pasid() on bond 1
+ * returns 1, and calling it on bonds 2-4 returns 2.
+ *
+ * Hardware tables describing this configuration in the IOMMU would typically
+ * look like this:
+ *
+ *                                PASID tables
+ *                                 of domain A
+ *                              .->+--------+
+ *                             / 0 |        |-------> io_pgtable
+ *                            /    +--------+
+ *            Device tables  /   1 |        |-------> pgd X
+ *              +--------+  /      +--------+
+ *      00:00.0 |      A |-'     2 |        |--.
+ *              +--------+         +--------+   \
+ *              :        :       3 |        |    \
+ *              +--------+         +--------+     --> pgd Y
+ *      00:01.0 |      B |--.                    /
+ *              +--------+   \                  |
+ *      00:01.1 |      B |----+   PASID tables  |
+ *              +--------+     \   of domain B  |
+ *                              '->+--------+   |
+ *                               0 |        |-- | --> io_pgtable
+ *                                 +--------+   |
+ *                               1 |        |   |
+ *                                 +--------+   |
+ *                               2 |        |---'
+ *                                 +--------+
+ *                               3 |        |
+ *                                 +--------+
+ *
+ * With this model, a single call binds all devices in a given domain to an
+ * address space. Other devices in the domain will get the same bond implicitly.
+ * However, users must issue one bind() for each device, because IOMMUs may
+ * implement SVA differently. Furthermore, mandating one bind() per device
+ * allows the driver to perform sanity checks on device capabilities.
+ *
+ * In some IOMMUs, one entry of the PASID table (typically the first one) can
+ * hold non-PASID translations. In this case PASID 0 is reserved and the first
+ * entry points to the io_pgtable pointer. In other IOMMUs PASID 0 is available
+ * to the allocator.
+ */
+
+struct io_mm {
+	struct hlist_head		devices;
+	struct mm_struct		*mm;
+	struct mmu_notifier		notifier;
+
+	/* Late initialization */
+	const struct io_mm_ops		*ops;
+	void				*ctx;
+	int				pasid;
+};
+
+#define mn_to_io_mm(mmu_notifier) \
+	container_of(mmu_notifier, struct io_mm, notifier)
+
+struct iommu_bond {
+	struct iommu_sva		sva;
+	struct io_mm			*io_mm;
+
+	struct hlist_node		mm_node;
+	void				*drvdata;
+	struct rcu_head			rcu_head;
+	refcount_t			refs;
+	bool				cleared;
+};
+
+#define to_iommu_bond(handle) container_of(handle, struct iommu_bond, sva)
+
+static DECLARE_IOASID_SET(shared_pasid);
+
+/*
+ * Serializes modifications of bonds.
+ * Lock order: Device SVA mutex; global SVA mutex; IOASID lock
+ */
+static DEFINE_MUTEX(iommu_sva_lock);
+
+struct io_mm_alloc_params {
+	const struct io_mm_ops *ops;
+	int min_pasid, max_pasid;
+};
+
+static struct mmu_notifier *io_mm_alloc(struct mm_struct *mm, void *privdata)
+{
+	int ret;
+	struct io_mm *io_mm;
+	struct io_mm_alloc_params *params = privdata;
+
+	io_mm = kzalloc(sizeof(*io_mm), GFP_KERNEL);
+	if (!io_mm)
+		return ERR_PTR(-ENOMEM);
+
+	io_mm->mm = mm;
+	io_mm->ops = params->ops;
+	INIT_HLIST_HEAD(&io_mm->devices);
+
+	io_mm->pasid = ioasid_alloc(&shared_pasid, params->min_pasid,
+				    params->max_pasid, io_mm->mm);
+	if (io_mm->pasid == INVALID_IOASID) {
+		ret = -ENOSPC;
+		goto err_free_io_mm;
+	}
+
+	io_mm->ctx = params->ops->alloc(mm);
+	if (IS_ERR(io_mm->ctx)) {
+		ret = PTR_ERR(io_mm->ctx);
+		goto err_free_pasid;
+	}
+	return &io_mm->notifier;
+
+err_free_pasid:
+	ioasid_free(io_mm->pasid);
+err_free_io_mm:
+	kfree(io_mm);
+	return ERR_PTR(ret);
+}
+
+static void io_mm_free(struct mmu_notifier *mn)
+{
+	struct io_mm *io_mm = mn_to_io_mm(mn);
+
+	WARN_ON(!hlist_empty(&io_mm->devices));
+
+	io_mm->ops->free(io_mm->ctx);
+	ioasid_free(io_mm->pasid);
+	kfree(io_mm);
+}
+
+static void io_mm_invalidate_range(struct mmu_notifier *mn,
+				   struct mm_struct *mm, unsigned long start,
+				   unsigned long end)
+{
+	struct iommu_bond *bond;
+	struct io_mm *io_mm = mn_to_io_mm(mn);
+
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(bond, &io_mm->devices, mm_node)
+		io_mm->ops->invalidate(bond->sva.dev, io_mm->pasid, io_mm->ctx,
+				       start, end - start);
+	rcu_read_unlock();
+}
+
+/*
+ * io_mm_release - release MMU notifier
+ *
+ * Called when the mm exits. To avoid spending too much time here, we only
+ * clear the page table pointers and invalidate IOTLBs; we don't stop DMA or
+ * free anything.
+ */
+static void io_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
+{
+	struct iommu_bond *bond;
+	struct io_mm *io_mm = mn_to_io_mm(mn);
+
+	mutex_lock(&iommu_sva_lock);
+	hlist_for_each_entry(bond, &io_mm->devices, mm_node) {
+		/* The release notifier could fire multiple times. */
+		if (bond->cleared)
+			continue;
+
+		io_mm->ops->clear(bond->sva.dev, io_mm->pasid, io_mm->ctx);
+		bond->cleared = true;
+	}
+	mutex_unlock(&iommu_sva_lock);
+}
+
+static struct mmu_notifier_ops iommu_mmu_notifier_ops = {
+	.alloc_notifier		= io_mm_alloc,
+	.free_notifier		= io_mm_free,
+	.invalidate_range	= io_mm_invalidate_range,
+	.release		= io_mm_release,
+};
+
+/*
+ * io_mm_get - Allocate an io_mm or get the existing one for the given mm
+ * @mm: the mm
+ * @ops: callbacks for the IOMMU driver
+ * @min_pasid: minimum PASID value (inclusive)
+ * @max_pasid: maximum PASID value (inclusive)
+ *
+ * Returns a valid io_mm or an error pointer.
+ */
+static struct io_mm *io_mm_get(struct mm_struct *mm,
+			       const struct io_mm_ops *ops,
+			       int min_pasid, int max_pasid)
+{
+	struct io_mm *io_mm;
+	struct mmu_notifier *mn;
+	struct io_mm_alloc_params params = {
+		.ops		= ops,
+		.min_pasid	= min_pasid,
+		.max_pasid	= max_pasid,
+	};
+
+	/*
+	 * A single notifier can exist for this (ops, mm) pair. Allocate it if
+	 * necessary.
+	 */
+	mn = mmu_notifier_get(&iommu_mmu_notifier_ops, mm, &params);
+	if (IS_ERR(mn))
+		return ERR_CAST(mn);
+
+	io_mm = mn_to_io_mm(mn);
+	if (WARN_ON(io_mm->ops != ops)) {
+		mmu_notifier_put(mn);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return io_mm;
+}
+
+static void io_mm_put(struct io_mm *io_mm)
+{
+	mmu_notifier_put(&io_mm->notifier);
+}
+
+static struct iommu_sva *
+io_mm_attach(struct device *dev, struct io_mm *io_mm, void *drvdata)
+{
+	int ret;
+	bool attach_domain = true;
+	struct iommu_bond *bond, *tmp;
+	struct iommu_domain *domain, *other;
+	struct iommu_sva_param *param = dev->iommu->sva_param;
+
+	domain = iommu_get_domain_for_dev(dev);
+
+	/* Is it already bound to the device or domain? */
+	hlist_for_each_entry(tmp, &io_mm->devices, mm_node) {
+		if (tmp->sva.dev == dev) {
+			if (WARN_ON(tmp->drvdata != drvdata))
+				return ERR_PTR(-EINVAL);
+
+			/*
+			 * Hold a single io_mm reference per bond. Note that we
+			 * can't return an error after this, otherwise the
+			 * caller would drop an additional reference to the
+			 * io_mm.
+			 */
+			refcount_inc(&tmp->refs);
+			io_mm_put(io_mm);
+			return &tmp->sva;
+		}
+
+		if (!attach_domain)
+			continue;
+
+		other = iommu_get_domain_for_dev(tmp->sva.dev);
+		if (domain == other)
+			attach_domain = false;
+	}
+
+	bond = kzalloc(sizeof(*bond), GFP_KERNEL);
+	if (!bond)
+		return ERR_PTR(-ENOMEM);
+
+	bond->sva.dev	= dev;
+	bond->drvdata	= drvdata;
+	bond->io_mm	= io_mm;
+	refcount_set(&bond->refs, 1);
+
+	hlist_add_head_rcu(&bond->mm_node, &io_mm->devices);
+
+	ret = io_mm->ops->attach(bond->sva.dev, io_mm->pasid, io_mm->ctx,
+				 attach_domain);
+	if (ret)
+		goto err_remove;
+
+	param->nr_bonds++;
+	return &bond->sva;
+
+err_remove:
+	/*
+	 * At this point concurrent threads may have started to access the
+	 * io_mm->devices list in order to invalidate address ranges, which
+	 * requires to free the bond via kfree_rcu()
+	 */
+	hlist_del_init_rcu(&bond->mm_node);
+	kfree_rcu(bond, rcu_head);
+	return ERR_PTR(ret);
+}
+
+static void io_mm_detach(struct iommu_bond *bond)
+{
+	struct iommu_bond *tmp;
+	bool detach_domain = true;
+	struct io_mm *io_mm = bond->io_mm;
+	struct device *dev = bond->sva.dev;
+	struct iommu_domain *domain, *other;
+	struct iommu_sva_param *param = dev->iommu->sva_param;
+
+	if (!refcount_dec_and_test(&bond->refs))
+		return;
+
+	param->nr_bonds--;
+
+	domain = iommu_get_domain_for_dev(bond->sva.dev);
+
+	/* Are other devices in the same domain still attached to this mm? */
+	hlist_for_each_entry(tmp, &io_mm->devices, mm_node) {
+		if (tmp == bond)
+			continue;
+		other = iommu_get_domain_for_dev(tmp->sva.dev);
+		if (domain == other) {
+			detach_domain = false;
+			break;
+		}
+	}
+
+	io_mm->ops->detach(bond->sva.dev, io_mm->pasid, io_mm->ctx,
+			   detach_domain, bond->cleared);
+
+	hlist_del_init_rcu(&bond->mm_node);
+	kfree_rcu(bond, rcu_head);
+	io_mm_put(io_mm);
+}
+
+
+struct iommu_sva *
+iommu_sva_bind_generic(struct device *dev, struct mm_struct *mm,
+		       const struct io_mm_ops *ops, void *drvdata)
+{
+	struct io_mm *io_mm;
+	struct iommu_sva *handle;
+	struct dev_iommu *param = dev->iommu;
+
+	if (!param)
+		return ERR_PTR(-ENODEV);
+
+	mutex_lock(&param->sva_lock);
+	mutex_lock(&iommu_sva_lock);
+	if (!param->sva_param) {
+		handle = ERR_PTR(-ENODEV);
+		goto out_unlock;
+	}
+
+	io_mm = io_mm_get(mm, ops, param->sva_param->min_pasid,
+			  param->sva_param->max_pasid);
+	if (IS_ERR(io_mm)) {
+		handle = ERR_CAST(io_mm);
+		goto out_unlock;
+	}
+
+	handle = io_mm_attach(dev, io_mm, drvdata);
+	if (IS_ERR(handle))
+		io_mm_put(io_mm);
+
+out_unlock:
+	mutex_unlock(&iommu_sva_lock);
+	mutex_unlock(&param->sva_lock);
+	return handle;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_bind_generic);
+
+void iommu_sva_unbind_generic(struct iommu_sva *handle)
+{
+	struct iommu_bond *bond = to_iommu_bond(handle);
+	struct dev_iommu *param = handle->dev->iommu;
+
+	if (WARN_ON(!param))
+		return;
+
+	mutex_lock(&param->sva_lock);
+	mutex_lock(&iommu_sva_lock);
+	io_mm_detach(bond);
+	mutex_unlock(&iommu_sva_lock);
+	mutex_unlock(&param->sva_lock);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unbind_generic);
+
+/**
+ * iommu_sva_enable() - Enable Shared Virtual Addressing for a device
+ * @dev: the device
+ * @sva_param: the parameters.
+ *
+ * Called by an IOMMU driver to set up the SVA parameters.
+ * @sva_param is duplicated and can be freed when this function returns.
+ *
+ * Return 0 if initialization succeeded, or an error.
+ */
+int iommu_sva_enable(struct device *dev, struct iommu_sva_param *sva_param)
+{
+	int ret;
+	struct iommu_sva_param *new_param;
+	struct dev_iommu *param = dev->iommu;
+
+	if (!param)
+		return -ENODEV;
+
+	new_param = kmemdup(sva_param, sizeof(*new_param), GFP_KERNEL);
+	if (!new_param)
+		return -ENOMEM;
+
+	mutex_lock(&param->sva_lock);
+	if (param->sva_param) {
+		ret = -EEXIST;
+		goto err_unlock;
+	}
+
+	dev->iommu->sva_param = new_param;
+	mutex_unlock(&param->sva_lock);
+	return 0;
+
+err_unlock:
+	mutex_unlock(&param->sva_lock);
+	kfree(new_param);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_enable);
+
+/**
+ * iommu_sva_disable() - Disable Shared Virtual Addressing for a device
+ * @dev: the device
+ *
+ * IOMMU drivers call this to disable SVA.
+ */
+int iommu_sva_disable(struct device *dev)
+{
+	int ret = 0;
+	struct dev_iommu *param = dev->iommu;
+
+	if (!param)
+		return -EINVAL;
+
+	mutex_lock(&param->sva_lock);
+	if (!param->sva_param) {
+		ret = -ENODEV;
+		goto out_unlock;
+	}
+
+	/* Require that all contexts are unbound */
+	if (param->sva_param->nr_bonds) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	kfree(param->sva_param);
+	param->sva_param = NULL;
+out_unlock:
+	mutex_unlock(&param->sva_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_disable);
+
+bool iommu_sva_enabled(struct device *dev)
+{
+	bool enabled;
+	struct dev_iommu *param = dev->iommu;
+
+	if (!param)
+		return false;
+
+	mutex_lock(&param->sva_lock);
+	enabled = !!param->sva_param;
+	mutex_unlock(&param->sva_lock);
+	return enabled;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_enabled);
+
+int iommu_sva_get_pasid_generic(struct iommu_sva *handle)
+{
+	struct iommu_bond *bond = to_iommu_bond(handle);
+
+	return bond->io_mm->pasid;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_get_pasid_generic);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index dfed12328c032..6cc2ab6889e50 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -164,6 +164,7 @@ static struct dev_iommu *dev_iommu_get(struct device *dev)
 		return NULL;
 
 	mutex_init(&param->lock);
+	mutex_init(&param->sva_lock);
 	dev->iommu = param;
 	return param;
 }
-- 
2.26.0




* [PATCH v5 03/25] iommu: Add a page fault handler
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 01/25] mm/mmu_notifiers: pass private data down to alloc_notifier() Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 02/25] iommu/sva: Manage process address spaces Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 04/25] iommu/sva: Search mm by PASID Jean-Philippe Brucker
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker

Some systems allow devices to handle I/O Page Faults in the core mm, for
example systems implementing the PCI PRI extension or the Arm SMMU stall
model. Infrastructure for reporting these recoverable page faults was
recently added to the IOMMU core. Add a page fault handler for host SVA.

IOMMU drivers can now instantiate several fault workqueues and link them
to IOPF-capable devices. Drivers can choose between a single global
workqueue, one per IOMMU device, one per low-level fault queue, one per
domain, etc.

When it receives a fault event, typically in an IRQ handler, the IOMMU
driver reports the fault using iommu_report_device_fault(), which calls
the registered handler. The page fault handler then calls the mm fault
handler, and reports either success or failure with
iommu_page_response(). When the handler succeeds, the IOMMU retries the
access.

The iopf_param pointer could be embedded into iommu_fault_param. But
putting iopf_param directly into the dev_iommu structure means we don't
have to care about ordering between calls to iopf_queue_add_device() and
iommu_register_device_fault_handler().
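
For illustration, the expected driver flow is roughly the following
sketch (the my_* names are hypothetical; iommu_queue_iopf() is
registered as the device fault handler, with the struct device as
cookie):

	static int my_flush(void *cookie, struct device *dev, int pasid)
	{
		/* Commit the low-level (e.g. PRI) queue into the IOPF queue */
		return 0;
	}

	queue = iopf_queue_alloc("my-iommu", my_flush, my_iommu);
	iopf_queue_add_device(queue, dev);
	iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);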

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
v4->v5: Fix 'busy' refcount
---
 drivers/iommu/Kconfig      |   4 +
 drivers/iommu/Makefile     |   1 +
 include/linux/iommu.h      |  60 +++++
 drivers/iommu/io-pgfault.c | 452 +++++++++++++++++++++++++++++++++++++
 4 files changed, 517 insertions(+)
 create mode 100644 drivers/iommu/io-pgfault.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e81842f59b037..bf620bf48da03 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -109,6 +109,10 @@ config IOMMU_SVA
 	select IOMMU_API
 	select MMU_NOTIFIER
 
+config IOMMU_PAGE_FAULT
+	bool
+	select IOMMU_API
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PCI
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 40c800dd4e3ef..bf5cb4ee84093 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_DEBUGFS) += iommu-debugfs.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
+obj-$(CONFIG_IOMMU_PAGE_FAULT) += io-pgfault.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 167e468dd3510..5a3d092c2568a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -343,12 +343,21 @@ struct iommu_fault_param {
 	struct mutex lock;
 };
 
+/**
+ * iopf_queue_flush_t - Flush low-level page fault queue
+ *
+ * Report all faults currently pending in the low-level page fault queue
+ */
+struct iopf_queue;
+typedef int (*iopf_queue_flush_t)(void *cookie, struct device *dev, int pasid);
+
 /**
  * struct dev_iommu - Collection of per-device IOMMU data
  *
  * @fault_param: IOMMU detected device fault reporting data
  * @sva_param:	 IOMMU parameter for SVA
  * @sva_lock:	 protects @sva_param
+ * @iopf_param:	 I/O Page Fault queue and data
  * @fwspec:	 IOMMU fwspec data
  * @priv:	 IOMMU Driver private data
  *
@@ -360,6 +369,7 @@ struct dev_iommu {
 	struct iommu_fault_param	*fault_param;
 	struct iommu_sva_param		*sva_param;
 	struct mutex			sva_lock;
+	struct iopf_device_param	*iopf_param;
 	struct iommu_fwspec		*fwspec;
 	void				*priv;
 };
@@ -1071,4 +1081,54 @@ void iommu_debugfs_setup(void);
 static inline void iommu_debugfs_setup(void) {}
 #endif
 
+#ifdef CONFIG_IOMMU_PAGE_FAULT
+extern int iommu_queue_iopf(struct iommu_fault *fault, void *cookie);
+
+extern int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
+extern int iopf_queue_remove_device(struct iopf_queue *queue,
+				    struct device *dev);
+extern int iopf_queue_flush_dev(struct device *dev, int pasid);
+extern struct iopf_queue *
+iopf_queue_alloc(const char *name, iopf_queue_flush_t flush, void *cookie);
+extern void iopf_queue_free(struct iopf_queue *queue);
+extern int iopf_queue_discard_partial(struct iopf_queue *queue);
+#else /* CONFIG_IOMMU_PAGE_FAULT */
+static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
+{
+	return -ENODEV;
+}
+
+static inline int iopf_queue_add_device(struct iopf_queue *queue,
+					struct device *dev)
+{
+	return -ENODEV;
+}
+
+static inline int iopf_queue_remove_device(struct iopf_queue *queue,
+					   struct device *dev)
+{
+	return -ENODEV;
+}
+
+static inline int iopf_queue_flush_dev(struct device *dev, int pasid)
+{
+	return -ENODEV;
+}
+
+static inline struct iopf_queue *
+iopf_queue_alloc(const char *name, iopf_queue_flush_t flush, void *cookie)
+{
+	return NULL;
+}
+
+static inline void iopf_queue_free(struct iopf_queue *queue)
+{
+}
+
+static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
+{
+	return -ENODEV;
+}
+#endif /* CONFIG_IOMMU_PAGE_FAULT */
+
 #endif /* __LINUX_IOMMU_H */
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
new file mode 100644
index 0000000000000..5bba8e6a13be2
--- /dev/null
+++ b/drivers/iommu/io-pgfault.c
@@ -0,0 +1,452 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Handle device page faults
+ *
+ * Copyright (C) 2020 ARM Ltd.
+ */
+
+#include <linux/iommu.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+
+/**
+ * struct iopf_queue - IO Page Fault queue
+ * @wq: the fault workqueue
+ * @flush: low-level flush callback
+ * @flush_arg: flush() argument
+ * @devices: devices attached to this queue
+ * @lock: protects the device list
+ */
+struct iopf_queue {
+	struct workqueue_struct		*wq;
+	iopf_queue_flush_t		flush;
+	void				*flush_arg;
+	struct list_head		devices;
+	struct mutex			lock;
+};
+
+/**
+ * struct iopf_device_param - IO Page Fault data attached to a device
+ * @dev: the device that owns this param
+ * @queue: IOPF queue
+ * @queue_list: index into queue->devices
+ * @partial: faults that are part of a Page Request Group for which the last
+ *           request hasn't been submitted yet.
+ * @busy: the param is being used
+ * @wq_head: signal a change to @busy
+ */
+struct iopf_device_param {
+	struct device			*dev;
+	struct iopf_queue		*queue;
+	struct list_head		queue_list;
+	struct list_head		partial;
+	unsigned long			busy;
+	wait_queue_head_t		wq_head;
+};
+
+struct iopf_fault {
+	struct iommu_fault		fault;
+	struct list_head		head;
+};
+
+struct iopf_group {
+	struct iopf_fault		last_fault;
+	struct list_head		faults;
+	struct work_struct		work;
+	struct device			*dev;
+};
+
+static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,
+			       enum iommu_page_response_code status)
+{
+	struct iommu_page_response resp = {
+		.version		= IOMMU_PAGE_RESP_VERSION_1,
+		.pasid			= iopf->fault.prm.pasid,
+		.grpid			= iopf->fault.prm.grpid,
+		.code			= status,
+	};
+
+	if (iopf->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID)
+		resp.flags = IOMMU_PAGE_RESP_PASID_VALID;
+
+	return iommu_page_response(dev, &resp);
+}
+
+static enum iommu_page_response_code
+iopf_handle_single(struct iopf_fault *iopf)
+{
+	/* TODO */
+	return -ENODEV;
+}
+
+static void iopf_handle_group(struct work_struct *work)
+{
+	struct iopf_group *group;
+	struct iopf_fault *iopf, *next;
+	enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
+
+	group = container_of(work, struct iopf_group, work);
+
+	list_for_each_entry_safe(iopf, next, &group->faults, head) {
+		/*
+		 * For the moment, errors are sticky: don't handle subsequent
+		 * faults in the group if there is an error.
+		 */
+		if (status == IOMMU_PAGE_RESP_SUCCESS)
+			status = iopf_handle_single(iopf);
+
+		if (!(iopf->fault.prm.flags &
+		      IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))
+			kfree(iopf);
+	}
+
+	iopf_complete_group(group->dev, &group->last_fault, status);
+	kfree(group);
+}
+
+/**
+ * iommu_queue_iopf - IO Page Fault handler
+ * @fault: fault event
+ * @cookie: struct device, passed to iommu_register_device_fault_handler.
+ *
+ * Add a fault to the device workqueue, to be handled by mm.
+ *
+ * Return: 0 on success and <0 on error.
+ */
+int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
+{
+	int ret;
+	struct iopf_group *group;
+	struct iopf_fault *iopf, *next;
+	struct iopf_device_param *iopf_param;
+
+	struct device *dev = cookie;
+	struct dev_iommu *param = dev->iommu;
+
+	lockdep_assert_held(&param->lock);
+
+	if (fault->type != IOMMU_FAULT_PAGE_REQ)
+		/* Not a recoverable page fault */
+		return -EOPNOTSUPP;
+
+	/*
+	 * As long as we're holding param->lock, the queue can't be unlinked
+	 * from the device and therefore cannot disappear.
+	 */
+	iopf_param = param->iopf_param;
+	if (!iopf_param)
+		return -ENODEV;
+
+	if (!(fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
+		iopf = kzalloc(sizeof(*iopf), GFP_KERNEL);
+		if (!iopf)
+			return -ENOMEM;
+
+		iopf->fault = *fault;
+
+		/* Non-last request of a group. Postpone until the last one */
+		list_add(&iopf->head, &iopf_param->partial);
+
+		return 0;
+	}
+
+	group = kzalloc(sizeof(*group), GFP_KERNEL);
+	if (!group) {
+		/*
+		 * The caller will send a response to the hardware. But we do
+		 * need to clean up before leaving, otherwise partial faults
+		 * will be stuck.
+		 */
+		ret = -ENOMEM;
+		goto cleanup_partial;
+	}
+
+	group->dev = dev;
+	group->last_fault.fault = *fault;
+	INIT_LIST_HEAD(&group->faults);
+	list_add(&group->last_fault.head, &group->faults);
+	INIT_WORK(&group->work, iopf_handle_group);
+
+	/* See if we have partial faults for this group */
+	list_for_each_entry_safe(iopf, next, &iopf_param->partial, head) {
+		if (iopf->fault.prm.grpid == fault->prm.grpid)
+			/* Insert *before* the last fault */
+			list_move(&iopf->head, &group->faults);
+	}
+
+	queue_work(iopf_param->queue->wq, &group->work);
+	return 0;
+
+cleanup_partial:
+	list_for_each_entry_safe(iopf, next, &iopf_param->partial, head) {
+		if (iopf->fault.prm.grpid == fault->prm.grpid) {
+			list_del(&iopf->head);
+			kfree(iopf);
+		}
+	}
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_queue_iopf);
+
+/**
+ * iopf_queue_flush_dev - Ensure that all queued faults have been processed
+ * @dev: the endpoint whose faults need to be flushed.
+ * @pasid: the PASID affected by this flush
+ *
+ * Users must call this function when releasing a PASID, to ensure that all
+ * pending faults for this PASID have been handled, and won't hit the address
+ * space of the next process that uses this PASID.
+ *
+ * This function can also be called before shutting down the device, in which
+ * case @pasid should be IOMMU_PASID_INVALID.
+ *
+ * Return: 0 on success and <0 on error.
+ */
+int iopf_queue_flush_dev(struct device *dev, int pasid)
+{
+	int ret = 0;
+	struct iopf_queue *queue;
+	struct iopf_device_param *iopf_param;
+	struct dev_iommu *param = dev->iommu;
+
+	if (!param)
+		return -ENODEV;
+
+	/*
+	 * It is incredibly easy to find ourselves in a deadlock situation if
+	 * we're not careful, because we're taking the opposite path from
+	 * iommu_queue_iopf:
+	 *
+	 *   iopf_queue_flush_dev()   |  PRI queue handler
+	 *    lock(&param->lock)      |   iommu_queue_iopf()
+	 *     queue->flush()         |    lock(&param->lock)
+	 *      wait PRI queue empty  |
+	 *
+	 * So we can't hold the device param lock while flushing. Take a
+	 * reference to the device param instead, to prevent the queue from
+	 * going away.
+	 */
+	mutex_lock(&param->lock);
+	iopf_param = param->iopf_param;
+	if (iopf_param) {
+		queue = param->iopf_param->queue;
+		iopf_param->busy++;
+	} else {
+		ret = -ENODEV;
+	}
+	mutex_unlock(&param->lock);
+	if (ret)
+		return ret;
+
+	/*
+	 * When removing a PASID, the device driver tells the device to stop
+	 * using it, and flush any pending fault to the IOMMU. In this flush
+	 * callback, the IOMMU driver makes sure that there are no such faults
+	 * left in the low-level queue.
+	 */
+	queue->flush(queue->flush_arg, dev, pasid);
+
+	flush_workqueue(queue->wq);
+
+	mutex_lock(&param->lock);
+	iopf_param->busy--;
+	wake_up(&iopf_param->wq_head);
+	mutex_unlock(&param->lock);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iopf_queue_flush_dev);
+
+/**
+ * iopf_queue_discard_partial - Remove all pending partial faults
+ * @queue: the queue whose partial faults need to be discarded
+ *
+ * When the hardware queue overflows, the last page faults in a group may have
+ * been lost and the IOMMU driver calls this to discard all partial faults. The
+ * driver shouldn't be adding new faults to this queue concurrently.
+ *
+ * Return: 0 on success and <0 on error.
+ */
+int iopf_queue_discard_partial(struct iopf_queue *queue)
+{
+	struct iopf_fault *iopf, *next;
+	struct iopf_device_param *iopf_param;
+
+	if (!queue)
+		return -EINVAL;
+
+	mutex_lock(&queue->lock);
+	list_for_each_entry(iopf_param, &queue->devices, queue_list) {
+		list_for_each_entry_safe(iopf, next, &iopf_param->partial,
+					 head) {
+			/* Unlink before freeing, so the list stays consistent */
+			list_del(&iopf->head);
+			kfree(iopf);
+		}
+	}
+	mutex_unlock(&queue->lock);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iopf_queue_discard_partial);
+
+/**
+ * iopf_queue_add_device - Add producer to the fault queue
+ * @queue: IOPF queue
+ * @dev: device to add
+ *
+ * Return: 0 on success and <0 on error.
+ */
+int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
+{
+	int ret = -EINVAL;
+	struct iopf_device_param *iopf_param;
+	struct dev_iommu *param = dev->iommu;
+
+	if (!param)
+		return -ENODEV;
+
+	iopf_param = kzalloc(sizeof(*iopf_param), GFP_KERNEL);
+	if (!iopf_param)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&iopf_param->partial);
+	iopf_param->queue = queue;
+	iopf_param->dev = dev;
+	init_waitqueue_head(&iopf_param->wq_head);
+
+	mutex_lock(&queue->lock);
+	mutex_lock(&param->lock);
+	if (!param->iopf_param) {
+		list_add(&iopf_param->queue_list, &queue->devices);
+		param->iopf_param = iopf_param;
+		ret = 0;
+	}
+	mutex_unlock(&param->lock);
+	mutex_unlock(&queue->lock);
+
+	if (ret)
+		kfree(iopf_param);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iopf_queue_add_device);
+
+/**
+ * iopf_queue_remove_device - Remove producer from fault queue
+ * @queue: IOPF queue
+ * @dev: device to remove
+ *
+ * Caller makes sure that no more faults are reported for this device.
+ *
+ * Return: 0 on success and <0 on error.
+ */
+int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev)
+{
+	bool busy;
+	int ret = 0;
+	struct iopf_fault *iopf, *next;
+	struct iopf_device_param *iopf_param;
+	struct dev_iommu *param = dev->iommu;
+
+	if (!param || !queue)
+		return -EINVAL;
+
+	do {
+		mutex_lock(&queue->lock);
+		mutex_lock(&param->lock);
+		iopf_param = param->iopf_param;
+		if (iopf_param && iopf_param->queue == queue) {
+			busy = iopf_param->busy;
+			if (!busy) {
+				list_del(&iopf_param->queue_list);
+				param->iopf_param = NULL;
+			}
+		} else {
+			ret = -EINVAL;
+		}
+		mutex_unlock(&param->lock);
+		mutex_unlock(&queue->lock);
+		if (ret)
+			return ret;
+
+		/*
+		 * If there is an ongoing flush, wait for it to complete and
+		 * then retry. iopf_param isn't going away since we're the only
+		 * thread that can free it.
+		 */
+		if (busy)
+			wait_event(iopf_param->wq_head, !iopf_param->busy);
+	} while (busy);
+
+	/* Just in case some faults are still stuck */
+	list_for_each_entry_safe(iopf, next, &iopf_param->partial, head)
+		kfree(iopf);
+
+	kfree(iopf_param);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iopf_queue_remove_device);
+
+/**
+ * iopf_queue_alloc - Allocate and initialize a fault queue
+ * @name: a unique string identifying the queue (for workqueue)
+ * @flush: a callback that flushes the low-level queue
+ * @cookie: driver-private data passed to the flush callback
+ *
+ * The callback is called before the workqueue is flushed. The IOMMU driver must
+ * commit all faults that are pending in its low-level queues at the time of the
+ * call, into the IOPF queue (with iommu_report_device_fault). The callback
+ * takes a device pointer as argument, hinting what endpoint is causing the
+ * flush. When the device is NULL, all faults should be committed.
+ *
+ * Return: the queue on success and NULL on error.
+ */
+struct iopf_queue *
+iopf_queue_alloc(const char *name, iopf_queue_flush_t flush, void *cookie)
+{
+	struct iopf_queue *queue;
+
+	queue = kzalloc(sizeof(*queue), GFP_KERNEL);
+	if (!queue)
+		return NULL;
+
+	/*
+	 * The WQ is unordered because the low-level handler enqueues faults by
+	 * group. PRI requests within a group have to be ordered, but once
+	 * that's dealt with, the high-level function can handle groups out of
+	 * order.
+	 */
+	queue->wq = alloc_workqueue("iopf_queue/%s", WQ_UNBOUND, 0, name);
+	if (!queue->wq) {
+		kfree(queue);
+		return NULL;
+	}
+
+	queue->flush = flush;
+	queue->flush_arg = cookie;
+	INIT_LIST_HEAD(&queue->devices);
+	mutex_init(&queue->lock);
+
+	return queue;
+}
+EXPORT_SYMBOL_GPL(iopf_queue_alloc);
+
+/**
+ * iopf_queue_free - Free IOPF queue
+ * @queue: queue to free
+ *
+ * Counterpart to iopf_queue_alloc(). The driver must not be queuing faults or
+ * adding/removing devices on this queue anymore.
+ */
+void iopf_queue_free(struct iopf_queue *queue)
+{
+	struct iopf_device_param *iopf_param, *next;
+
+	if (!queue)
+		return;
+
+	list_for_each_entry_safe(iopf_param, next, &queue->devices, queue_list)
+		iopf_queue_remove_device(queue, iopf_param->dev);
+
+	destroy_workqueue(queue->wq);
+	kfree(queue);
+}
+EXPORT_SYMBOL_GPL(iopf_queue_free);
-- 
2.26.0




* [PATCH v5 04/25] iommu/sva: Search mm by PASID
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (2 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 03/25] iommu: Add a page fault handler Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 05/25] iommu/iopf: Handle mm faults Jean-Philippe Brucker
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker

The fault handler will need to find an mm given its PASID. This is the
reason we store the mm in the IOASID set when allocating the PASID, so
hook up the lookup.
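
For illustration, a sketch of how the fault handler is expected to use
it (mirroring the following patches):

	struct mm_struct *mm = iommu_sva_find(pasid);

	if (IS_ERR_OR_NULL(mm))
		return IOMMU_PAGE_RESP_INVALID;
	/* ... handle the fault within mm ... */
	mmput(mm);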

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 include/linux/iommu.h     |  9 +++++++++
 drivers/iommu/iommu-sva.c | 19 +++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5a3d092c2568a..4b9c25d7246d5 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -1081,6 +1081,15 @@ void iommu_debugfs_setup(void);
 static inline void iommu_debugfs_setup(void) {}
 #endif
 
+#ifdef CONFIG_IOMMU_SVA
+extern struct mm_struct *iommu_sva_find(int pasid);
+#else /* !CONFIG_IOMMU_SVA */
+static inline struct mm_struct *iommu_sva_find(int pasid)
+{
+	return NULL;
+}
+#endif /* !CONFIG_IOMMU_SVA */
+
 #ifdef CONFIG_IOMMU_PAGE_FAULT
 extern int iommu_queue_iopf(struct iommu_fault *fault, void *cookie);
 
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 7fecc74a9f7d6..b177d6cbf4fff 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -525,3 +525,22 @@ int iommu_sva_get_pasid_generic(struct iommu_sva *handle)
 	return bond->io_mm->pasid;
 }
 EXPORT_SYMBOL_GPL(iommu_sva_get_pasid_generic);
+
+/* ioasid wants a void * argument */
+static bool __mmget_not_zero(void *mm)
+{
+	return mmget_not_zero(mm);
+}
+
+/**
+ * iommu_sva_find() - Find mm associated to the given PASID
+ * @pasid: Process Address Space ID assigned to the mm
+ *
+ * Returns the mm corresponding to this PASID, or an error if not found. A
+ * reference to the mm is taken, and must be released with mmput().
+ */
+struct mm_struct *iommu_sva_find(int pasid)
+{
+	return ioasid_find(&shared_pasid, pasid, __mmget_not_zero);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_find);
-- 
2.26.0




* [PATCH v5 05/25] iommu/iopf: Handle mm faults
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (3 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 04/25] iommu/sva: Search mm by PASID Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 06/25] iommu/sva: Register page fault handler Jean-Philippe Brucker
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker

When a recoverable page fault is handled by the fault workqueue, find the
associated mm and call handle_mm_fault.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
v4->v5: no need to call mmput_async() anymore, since the MMU release()
        doesn't flush the IOPF queue anymore.
---
 drivers/iommu/io-pgfault.c | 77 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 75 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 5bba8e6a13be2..fd4244023b33f 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -7,6 +7,7 @@
 
 #include <linux/iommu.h>
 #include <linux/list.h>
+#include <linux/sched/mm.h>
 #include <linux/slab.h>
 #include <linux/workqueue.h>
 
@@ -76,8 +77,57 @@ static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,
 static enum iommu_page_response_code
 iopf_handle_single(struct iopf_fault *iopf)
 {
-	/* TODO */
-	return -ENODEV;
+	vm_fault_t ret;
+	struct mm_struct *mm;
+	struct vm_area_struct *vma;
+	unsigned int access_flags = 0;
+	unsigned int fault_flags = FAULT_FLAG_REMOTE;
+	struct iommu_fault_page_request *prm = &iopf->fault.prm;
+	enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
+
+	if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
+		return status;
+
+	mm = iommu_sva_find(prm->pasid);
+	if (IS_ERR_OR_NULL(mm))
+		return status;
+
+	down_read(&mm->mmap_sem);
+
+	vma = find_extend_vma(mm, prm->addr);
+	if (!vma)
+		/* Unmapped area */
+		goto out_put_mm;
+
+	if (prm->perm & IOMMU_FAULT_PERM_READ)
+		access_flags |= VM_READ;
+
+	if (prm->perm & IOMMU_FAULT_PERM_WRITE) {
+		access_flags |= VM_WRITE;
+		fault_flags |= FAULT_FLAG_WRITE;
+	}
+
+	if (prm->perm & IOMMU_FAULT_PERM_EXEC) {
+		access_flags |= VM_EXEC;
+		fault_flags |= FAULT_FLAG_INSTRUCTION;
+	}
+
+	if (!(prm->perm & IOMMU_FAULT_PERM_PRIV))
+		fault_flags |= FAULT_FLAG_USER;
+
+	if (access_flags & ~vma->vm_flags)
+		/* Access fault */
+		goto out_put_mm;
+
+	ret = handle_mm_fault(vma, prm->addr, fault_flags);
+	status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
+		IOMMU_PAGE_RESP_SUCCESS;
+
+out_put_mm:
+	up_read(&mm->mmap_sem);
+	mmput(mm);
+
+	return status;
 }
 
 static void iopf_handle_group(struct work_struct *work)
@@ -112,6 +162,29 @@ static void iopf_handle_group(struct work_struct *work)
  *
  * Add a fault to the device workqueue, to be handled by mm.
  *
+ * This module doesn't handle PCI PASID Stop Marker; IOMMU drivers must discard
+ * them before reporting faults. A PASID Stop Marker (LRW = 0b100) doesn't
+ * expect a response. It may be generated when disabling a PASID (issuing a
+ * PASID stop request) by some PCI devices.
+ *
+ * The PASID stop request is issued by the device driver before unbind(). Once
+ * it completes, no page request is generated for this PASID anymore and
+ * outstanding ones have been pushed to the IOMMU (as per PCIe 4.0r1.0 - 6.20.1
+ * and 10.4.1.2 - Managing PASID TLP Prefix Usage). Some PCI devices will wait
+ * for all outstanding page requests to come back with a response before
+ * completing the PASID stop request. Others do not wait for page responses, and
+ * instead issue this Stop Marker that tells us when the PASID can be
+ * reallocated.
+ *
+ * It is safe to discard the Stop Marker because it is an optimization.
+ * a. Page requests, which are posted requests, have been flushed to the IOMMU
+ *    when the stop request completes.
+ * b. We flush all fault queues on unbind() before freeing the PASID.
+ *
+ * So even though the Stop Marker might be issued by the device *after* the stop
+ * request completes, outstanding faults will have been dealt with by the time
+ * we free the PASID.
+ *
  * Return: 0 on success and <0 on error.
  */
 int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
-- 
2.26.0




* [PATCH v5 06/25] iommu/sva: Register page fault handler
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (4 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 05/25] iommu/iopf: Handle mm faults Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 07/25] arm64: mm: Add asid_gen_match() helper Jean-Philippe Brucker
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker

When enabling SVA, register the fault handler. The device driver will
register an I/O page fault queue before or after calling
iommu_sva_enable(). The fault queue must be flushed before any io_mm is
freed, to make sure that its PASID isn't referenced by any fault queue
and can be reallocated.
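
For illustration, the teardown ordering on the unbind() path then looks
like this (a sketch using the names from the diff below; the first step
is the device driver's responsibility, outside this series):

    /* 1. The device driver stops the device from issuing new PASIDs */
    /* 2. Flush faults still referencing this PASID out of the queue */
    iopf_queue_flush_dev(handle->dev, bond->io_mm->pasid);
    /* 3. Only then detach, so the PASID can safely be reallocated */
    io_mm_detach(bond);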

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/Kconfig     |  1 +
 drivers/iommu/iommu-sva.c | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index bf620bf48da03..411a7ee2ab12d 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -106,6 +106,7 @@ config IOMMU_DMA
 config IOMMU_SVA
 	bool
 	select IOASID
+	select IOMMU_PAGE_FAULT
 	select IOMMU_API
 	select MMU_NOTIFIER
 
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index b177d6cbf4fff..00d5e7e895e80 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -420,6 +420,12 @@ void iommu_sva_unbind_generic(struct iommu_sva *handle)
 	if (WARN_ON(!param))
 		return;
 
+	/*
+	 * Caller stopped the device from issuing PASIDs, now make sure they are
+	 * out of the fault queue.
+	 */
+	iopf_queue_flush_dev(handle->dev, bond->io_mm->pasid);
+
 	mutex_lock(&param->sva_lock);
 	mutex_lock(&iommu_sva_lock);
 	io_mm_detach(bond);
@@ -457,6 +463,10 @@ int iommu_sva_enable(struct device *dev, struct iommu_sva_param *sva_param)
 		goto err_unlock;
 	}
 
+	ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);
+	if (ret)
+		goto err_unlock;
+
 	dev->iommu->sva_param = new_param;
 	mutex_unlock(&param->sva_lock);
 	return 0;
@@ -494,6 +504,7 @@ int iommu_sva_disable(struct device *dev)
 		goto out_unlock;
 	}
 
+	iommu_unregister_device_fault_handler(dev);
 	kfree(param->sva_param);
 	param->sva_param = NULL;
 out_unlock:
-- 
2.26.0




* [PATCH v5 07/25] arm64: mm: Add asid_gen_match() helper
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (5 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 06/25] iommu/sva: Register page fault handler Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 08/25] arm64: mm: Pin down ASIDs for sharing mm with devices Jean-Philippe Brucker
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker

Add a macro to check if an ASID is from the current generation, since a
subsequent patch will introduce a third user for this test.
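
As a worked example, with asid_bits == 16 the generation lives in the
upper bits of mm->context.id:

    /*
     * asid            = 0x00020005  (generation 2, hardware ASID 5)
     * asid_generation = 0x00020000
     * (asid ^ asid_generation) >> asid_bits == 0, so the ASID is from
     * the current generation and can be used as is.
     */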

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
v4->v5: new
---
 arch/arm64/mm/context.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 9b26f9a88724f..d702d60e64dab 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -92,6 +92,9 @@ static void set_reserved_asid_bits(void)
 		bitmap_clear(asid_map, 0, NUM_USER_ASIDS);
 }
 
+#define asid_gen_match(asid) \
+	(!(((asid) ^ atomic64_read(&asid_generation)) >> asid_bits))
+
 static void flush_context(void)
 {
 	int i;
@@ -220,8 +223,7 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
 	 *   because atomic RmWs are totally ordered for a given location.
 	 */
 	old_active_asid = atomic64_read(&per_cpu(active_asids, cpu));
-	if (old_active_asid &&
-	    !((asid ^ atomic64_read(&asid_generation)) >> asid_bits) &&
+	if (old_active_asid && asid_gen_match(asid) &&
 	    atomic64_cmpxchg_relaxed(&per_cpu(active_asids, cpu),
 				     old_active_asid, asid))
 		goto switch_mm_fastpath;
@@ -229,7 +231,7 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
 	raw_spin_lock_irqsave(&cpu_asid_lock, flags);
 	/* Check that our ASID belongs to the current generation. */
 	asid = atomic64_read(&mm->context.id);
-	if ((asid ^ atomic64_read(&asid_generation)) >> asid_bits) {
+	if (!asid_gen_match(asid)) {
 		asid = new_context(mm);
 		atomic64_set(&mm->context.id, asid);
 	}
-- 
2.26.0




* [PATCH v5 08/25] arm64: mm: Pin down ASIDs for sharing mm with devices
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (6 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 07/25] arm64: mm: Add asid_gen_match() helper Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 09/25] iommu/io-pgtable-arm: Move some definitions to a header Jean-Philippe Brucker
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker

To enable address space sharing with the IOMMU, introduce mm_context_get()
and mm_context_put(), which pin down a context and ensure that it keeps
its ASID after a rollover. Export the symbols so that the modular SMMUv3
driver can use them.

Pinning is necessary because a device constantly needs a valid ASID,
unlike tasks that only require one when running. Without pinning, we would
need to notify the IOMMU when we're about to use a new ASID for a task,
and it would get complicated when a new task is assigned a shared ASID.
Consider the following scenario with no ASID pinned:

1. Task t1 is running on CPUx with shared ASID (gen=1, asid=1)
2. Task t2 is scheduled on CPUx, gets ASID (1, 2)
3. Task tn is scheduled on CPUy, a rollover occurs, tn gets ASID (2, 1)
   We would now have to immediately generate a new ASID for t1, notify
   the IOMMU, and finally enable task tn. We are holding the lock during
   all that time, since we can't afford having another CPU trigger a
   rollover. The IOMMU issues invalidation commands that can take tens of
   milliseconds.

It gets needlessly complicated. All we wanted to do was schedule task tn,
which has no business with the IOMMU. By letting the IOMMU pin tasks when
needed, we avoid stalling the slow path, and let the pinning fail when
we're out of shareable ASIDs.

After a rollover, the allocator expects at least one ASID to be available
in addition to the reserved ones (one per CPU). So (NR_ASIDS - NR_CPUS -
1) is the maximum number of ASIDs that can be shared with the IOMMU.
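
A minimal usage sketch (the SMMUv3 driver, later in this series, is the
real consumer):

    unsigned long asid = mm_context_get(mm);
    if (!asid)
            return -ENOSPC; /* no shareable ASID left */

    /* ... program the ASID into the IOMMU and run DMA ... */

    mm_context_put(mm); /* unpin on unbind */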

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
v4->v5: extract helper macro
---
 arch/arm64/include/asm/mmu.h         |  1 +
 arch/arm64/include/asm/mmu_context.h | 11 +++-
 arch/arm64/mm/context.c              | 95 +++++++++++++++++++++++++++-
 3 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 68140fdd89d6b..bbdd291e31d59 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -19,6 +19,7 @@
 
 typedef struct {
 	atomic64_t	id;
+	unsigned long	pinned;
 	void		*vdso;
 	unsigned long	flags;
 } mm_context_t;
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index ab46187c63001..69599a64945b0 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -177,7 +177,13 @@ static inline void cpu_replace_ttbr1(pgd_t *pgdp)
 #define destroy_context(mm)		do { } while(0)
 void check_and_switch_context(struct mm_struct *mm, unsigned int cpu);
 
-#define init_new_context(tsk,mm)	({ atomic64_set(&(mm)->context.id, 0); 0; })
+static inline int
+init_new_context(struct task_struct *tsk, struct mm_struct *mm)
+{
+	atomic64_set(&mm->context.id, 0);
+	mm->context.pinned = 0;
+	return 0;
+}
 
 #ifdef CONFIG_ARM64_SW_TTBR0_PAN
 static inline void update_saved_ttbr0(struct task_struct *tsk,
@@ -250,6 +256,9 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 void verify_cpu_asid_bits(void);
 void post_ttbr_update_workaround(void);
 
+unsigned long mm_context_get(struct mm_struct *mm);
+void mm_context_put(struct mm_struct *mm);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* !__ASM_MMU_CONTEXT_H */
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index d702d60e64dab..d0ddd413f5645 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -27,6 +27,10 @@ static DEFINE_PER_CPU(atomic64_t, active_asids);
 static DEFINE_PER_CPU(u64, reserved_asids);
 static cpumask_t tlb_flush_pending;
 
+static unsigned long max_pinned_asids;
+static unsigned long nr_pinned_asids;
+static unsigned long *pinned_asid_map;
+
 #define ASID_MASK		(~GENMASK(asid_bits - 1, 0))
 #define ASID_FIRST_VERSION	(1UL << asid_bits)
 
@@ -74,6 +78,9 @@ void verify_cpu_asid_bits(void)
 
 static void set_kpti_asid_bits(void)
 {
+	unsigned int k;
+	u8 *dst = (u8 *)asid_map;
+	u8 *src = (u8 *)pinned_asid_map;
 	unsigned int len = BITS_TO_LONGS(NUM_USER_ASIDS) * sizeof(unsigned long);
 	/*
 	 * In case of KPTI kernel/user ASIDs are allocated in
@@ -81,7 +88,8 @@ static void set_kpti_asid_bits(void)
 	 * is set, then the ASID will map only userspace. Thus
 	 * mark even as reserved for kernel.
 	 */
-	memset(asid_map, 0xaa, len);
+	for (k = 0; k < len; k++)
+		dst[k] = src[k] | 0xaa;
 }
 
 static void set_reserved_asid_bits(void)
@@ -89,7 +97,7 @@ static void set_reserved_asid_bits(void)
 	if (arm64_kernel_unmapped_at_el0())
 		set_kpti_asid_bits();
 	else
-		bitmap_clear(asid_map, 0, NUM_USER_ASIDS);
+		bitmap_copy(asid_map, pinned_asid_map, NUM_USER_ASIDS);
 }
 
 #define asid_gen_match(asid) \
@@ -165,6 +173,14 @@ static u64 new_context(struct mm_struct *mm)
 		if (check_update_reserved_asid(asid, newasid))
 			return newasid;
 
+		/*
+		 * If it is pinned, we can keep using it. Note that reserved
+		 * takes priority, because even if it is also pinned, we need to
+		 * update the generation into the reserved_asids.
+		 */
+		if (mm->context.pinned)
+			return newasid;
+
 		/*
 		 * We had a valid ASID in a previous life, so try to re-use
 		 * it if possible.
@@ -254,6 +270,68 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
 		cpu_switch_mm(mm->pgd, mm);
 }
 
+unsigned long mm_context_get(struct mm_struct *mm)
+{
+	unsigned long flags;
+	u64 asid;
+
+	raw_spin_lock_irqsave(&cpu_asid_lock, flags);
+
+	asid = atomic64_read(&mm->context.id);
+
+	if (mm->context.pinned) {
+		mm->context.pinned++;
+		asid &= ~ASID_MASK;
+		goto out_unlock;
+	}
+
+	if (nr_pinned_asids >= max_pinned_asids) {
+		asid = 0;
+		goto out_unlock;
+	}
+
+	if (!asid_gen_match(asid)) {
+		/*
+		 * We went through one or more rollover since that ASID was
+		 * used. Ensure that it is still valid, or generate a new one.
+		 */
+		asid = new_context(mm);
+		atomic64_set(&mm->context.id, asid);
+	}
+
+	asid &= ~ASID_MASK;
+
+	nr_pinned_asids++;
+	__set_bit(asid2idx(asid), pinned_asid_map);
+	mm->context.pinned++;
+
+out_unlock:
+	raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
+
+	/* Set the equivalent of USER_ASID_BIT */
+	if (asid && IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0))
+		asid |= 1;
+
+	return asid;
+}
+EXPORT_SYMBOL_GPL(mm_context_get);
+
+void mm_context_put(struct mm_struct *mm)
+{
+	unsigned long flags;
+	u64 asid = atomic64_read(&mm->context.id) & ~ASID_MASK;
+
+	raw_spin_lock_irqsave(&cpu_asid_lock, flags);
+
+	if (--mm->context.pinned == 0) {
+		__clear_bit(asid2idx(asid), pinned_asid_map);
+		nr_pinned_asids--;
+	}
+
+	raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
+}
+EXPORT_SYMBOL_GPL(mm_context_put);
+
 /* Errata workaround post TTBRx_EL1 update. */
 asmlinkage void post_ttbr_update_workaround(void)
 {
@@ -303,6 +381,13 @@ static int asids_update_limit(void)
 	WARN_ON(num_available_asids - 1 <= num_possible_cpus());
 	pr_info("ASID allocator initialised with %lu entries\n",
 		num_available_asids);
+
+	/*
+	 * We assume that an ASID is always available after a rollover. This
+	 * means that even if all CPUs have a reserved ASID, there still is at
+	 * least one slot available in the asid map.
+	 */
+	max_pinned_asids = num_available_asids - num_possible_cpus() - 2;
 	return 0;
 }
 arch_initcall(asids_update_limit);
@@ -317,6 +402,12 @@ static int asids_init(void)
 		panic("Failed to allocate bitmap for %lu ASIDs\n",
 		      NUM_USER_ASIDS);
 
+	pinned_asid_map = kcalloc(BITS_TO_LONGS(NUM_USER_ASIDS),
+				  sizeof(*pinned_asid_map), GFP_KERNEL);
+	if (!pinned_asid_map)
+		panic("Failed to allocate pinned ASID bitmap\n");
+	nr_pinned_asids = 0;
+
 	/*
 	 * We cannot call set_reserved_asid_bits() here because CPU
 	 * caps are not finalized yet, so it is safer to assume KPTI
-- 
2.26.0




* [PATCH v5 09/25] iommu/io-pgtable-arm: Move some definitions to a header
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (7 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 08/25] arm64: mm: Pin down ASIDs for sharing mm with devices Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 10/25] iommu/arm-smmu-v3: Manage ASIDs with xarray Jean-Philippe Brucker
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker

Extract some of the most generic TCR defines, so they can be reused by
the page table sharing code.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/io-pgtable-arm.h | 30 ++++++++++++++++++++++++++++++
 drivers/iommu/io-pgtable-arm.c | 27 ++-------------------------
 2 files changed, 32 insertions(+), 25 deletions(-)
 create mode 100644 drivers/iommu/io-pgtable-arm.h

diff --git a/drivers/iommu/io-pgtable-arm.h b/drivers/iommu/io-pgtable-arm.h
new file mode 100644
index 0000000000000..ba7cfdf7afa03
--- /dev/null
+++ b/drivers/iommu/io-pgtable-arm.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef IO_PGTABLE_ARM_H_
+#define IO_PGTABLE_ARM_H_
+
+#define ARM_LPAE_TCR_TG0_4K		0
+#define ARM_LPAE_TCR_TG0_64K		1
+#define ARM_LPAE_TCR_TG0_16K		2
+
+#define ARM_LPAE_TCR_TG1_16K		1
+#define ARM_LPAE_TCR_TG1_4K		2
+#define ARM_LPAE_TCR_TG1_64K		3
+
+#define ARM_LPAE_TCR_SH_NS		0
+#define ARM_LPAE_TCR_SH_OS		2
+#define ARM_LPAE_TCR_SH_IS		3
+
+#define ARM_LPAE_TCR_RGN_NC		0
+#define ARM_LPAE_TCR_RGN_WBWA		1
+#define ARM_LPAE_TCR_RGN_WT		2
+#define ARM_LPAE_TCR_RGN_WB		3
+
+#define ARM_LPAE_TCR_PS_32_BIT		0x0ULL
+#define ARM_LPAE_TCR_PS_36_BIT		0x1ULL
+#define ARM_LPAE_TCR_PS_40_BIT		0x2ULL
+#define ARM_LPAE_TCR_PS_42_BIT		0x3ULL
+#define ARM_LPAE_TCR_PS_44_BIT		0x4ULL
+#define ARM_LPAE_TCR_PS_48_BIT		0x5ULL
+#define ARM_LPAE_TCR_PS_52_BIT		0x6ULL
+
+#endif /* IO_PGTABLE_ARM_H_ */
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 04fbd4bf0ff9f..f71a2eade04ab 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -20,6 +20,8 @@
 
 #include <asm/barrier.h>
 
+#include "io-pgtable-arm.h"
+
 #define ARM_LPAE_MAX_ADDR_BITS		52
 #define ARM_LPAE_S2_MAX_CONCAT_PAGES	16
 #define ARM_LPAE_MAX_LEVELS		4
@@ -100,23 +102,6 @@
 #define ARM_LPAE_PTE_MEMATTR_DEV	(((arm_lpae_iopte)0x1) << 2)
 
 /* Register bits */
-#define ARM_LPAE_TCR_TG0_4K		0
-#define ARM_LPAE_TCR_TG0_64K		1
-#define ARM_LPAE_TCR_TG0_16K		2
-
-#define ARM_LPAE_TCR_TG1_16K		1
-#define ARM_LPAE_TCR_TG1_4K		2
-#define ARM_LPAE_TCR_TG1_64K		3
-
-#define ARM_LPAE_TCR_SH_NS		0
-#define ARM_LPAE_TCR_SH_OS		2
-#define ARM_LPAE_TCR_SH_IS		3
-
-#define ARM_LPAE_TCR_RGN_NC		0
-#define ARM_LPAE_TCR_RGN_WBWA		1
-#define ARM_LPAE_TCR_RGN_WT		2
-#define ARM_LPAE_TCR_RGN_WB		3
-
 #define ARM_LPAE_VTCR_SL0_MASK		0x3
 
 #define ARM_LPAE_TCR_T0SZ_SHIFT		0
@@ -124,14 +109,6 @@
 #define ARM_LPAE_VTCR_PS_SHIFT		16
 #define ARM_LPAE_VTCR_PS_MASK		0x7
 
-#define ARM_LPAE_TCR_PS_32_BIT		0x0ULL
-#define ARM_LPAE_TCR_PS_36_BIT		0x1ULL
-#define ARM_LPAE_TCR_PS_40_BIT		0x2ULL
-#define ARM_LPAE_TCR_PS_42_BIT		0x3ULL
-#define ARM_LPAE_TCR_PS_44_BIT		0x4ULL
-#define ARM_LPAE_TCR_PS_48_BIT		0x5ULL
-#define ARM_LPAE_TCR_PS_52_BIT		0x6ULL
-
 #define ARM_LPAE_MAIR_ATTR_SHIFT(n)	((n) << 3)
 #define ARM_LPAE_MAIR_ATTR_MASK		0xff
 #define ARM_LPAE_MAIR_ATTR_DEVICE	0x04
-- 
2.26.0




* [PATCH v5 10/25] iommu/arm-smmu-v3: Manage ASIDs with xarray
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (8 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 09/25] iommu/io-pgtable-arm: Move some definitions to a header Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 11/25] arm64: cpufeature: Export symbol read_sanitised_ftr_reg() Jean-Philippe Brucker
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker

In preparation for sharing some ASIDs with the CPU, use a global xarray to
store ASIDs and their context. ASID#0 is now reserved, and the ASID
space is global.
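
The allocation pattern, as used in the hunks below, becomes:

    u32 asid;
    int ret;

    /* ASID#0 is reserved, so allocate in [1, (1 << asid_bits) - 1] */
    ret = xa_alloc(&asid_xa, &asid, &cfg->cd,
                   XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL);
    if (ret)
            return ret;

    /* ... and on free: */
    xa_erase(&asid_xa, cd->asid);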

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/arm-smmu-v3.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 60a415e8e2b6f..96ee60002e85e 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -664,7 +664,6 @@ struct arm_smmu_device {
 
 #define ARM_SMMU_MAX_ASIDS		(1 << 16)
 	unsigned int			asid_bits;
-	DECLARE_BITMAP(asid_map, ARM_SMMU_MAX_ASIDS);
 
 #define ARM_SMMU_MAX_VMIDS		(1 << 16)
 	unsigned int			vmid_bits;
@@ -724,6 +723,8 @@ struct arm_smmu_option_prop {
 	const char *prop;
 };
 
+static DEFINE_XARRAY_ALLOC1(asid_xa);
+
 static struct arm_smmu_option_prop arm_smmu_options[] = {
 	{ ARM_SMMU_OPT_SKIP_PREFETCH, "hisilicon,broken-prefetch-cmd" },
 	{ ARM_SMMU_OPT_PAGE0_REGS_ONLY, "cavium,cn9900-broken-page1-regspace"},
@@ -1763,6 +1764,14 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_domain *smmu_domain)
 	cdcfg->cdtab = NULL;
 }
 
+static void arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd)
+{
+	if (!cd->asid)
+		return;
+
+	xa_erase(&asid_xa, cd->asid);
+}
+
 /* Stream table manipulation functions */
 static void
 arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
@@ -2448,10 +2457,9 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 
-		if (cfg->cdcfg.cdtab) {
+		if (cfg->cdcfg.cdtab)
 			arm_smmu_free_cd_tables(smmu_domain);
-			arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
-		}
+		arm_smmu_free_asid(&cfg->cd);
 	} else {
 		struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
 		if (cfg->vmid)
@@ -2466,14 +2474,15 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 				       struct io_pgtable_cfg *pgtbl_cfg)
 {
 	int ret;
-	int asid;
+	u32 asid;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 	typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = &pgtbl_cfg->arm_lpae_s1_cfg.tcr;
 
-	asid = arm_smmu_bitmap_alloc(smmu->asid_map, smmu->asid_bits);
-	if (asid < 0)
-		return asid;
+	ret = xa_alloc(&asid_xa, &asid, &cfg->cd,
+		       XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL);
+	if (ret)
+		return ret;
 
 	cfg->s1cdmax = master->ssid_bits;
 
@@ -2506,7 +2515,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 out_free_cd_tables:
 	arm_smmu_free_cd_tables(smmu_domain);
 out_free_asid:
-	arm_smmu_bitmap_free(smmu->asid_map, asid);
+	arm_smmu_free_asid(&cfg->cd);
 	return ret;
 }
 
-- 
2.26.0




* [PATCH v5 11/25] arm64: cpufeature: Export symbol read_sanitised_ftr_reg()
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (9 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 10/25] iommu/arm-smmu-v3: Manage ASIDs with xarray Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 12/25] iommu/arm-smmu-v3: Share process page tables Jean-Philippe Brucker
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker, Suzuki K Poulose

The SMMUv3 driver would like to read the MMFR0 PARANGE field in order to
share CPU page tables with devices. Allow the driver to be built as a
module by exporting the read_sanitised_ftr_reg() cpufeature symbol.
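
For reference, patch 12 will use it roughly as follows:

    u64 reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
    u64 parange = cpuid_feature_extract_unsigned_field(reg,
                                ID_AA64MMFR0_PARANGE_SHIFT);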

Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kernel/cpufeature.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9fac745aa7bb2..5f6adbf4ae893 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -841,6 +841,7 @@ u64 read_sanitised_ftr_reg(u32 id)
 	BUG_ON(!regp);
 	return regp->sys_val;
 }
+EXPORT_SYMBOL_GPL(read_sanitised_ftr_reg);
 
 #define read_sysreg_case(r)	\
 	case r:		return read_sysreg_s(r)
-- 
2.26.0




* [PATCH v5 12/25] iommu/arm-smmu-v3: Share process page tables
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (10 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 11/25] arm64: cpufeature: Export symbol read_sanitised_ftr_reg() Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 13/25] iommu/arm-smmu-v3: Seize private ASID Jean-Philippe Brucker
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker, Suzuki K Poulose

With Shared Virtual Addressing (SVA), we need to mirror CPU TTBR, TCR,
MAIR and ASIDs in SMMU contexts. Each SMMU has a single ASID space split
into two sets, shared and private. Shared ASIDs correspond to those
obtained from the arch ASID allocator, and private ASIDs are used for
"classic" map/unmap DMA.

Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/arm-smmu-v3.c | 161 +++++++++++++++++++++++++++++++++++-
 1 file changed, 157 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 96ee60002e85e..09f4f712fb103 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -22,6 +22,7 @@
 #include <linux/iommu.h>
 #include <linux/iopoll.h>
 #include <linux/module.h>
+#include <linux/mmu_context.h>
 #include <linux/msi.h>
 #include <linux/of.h>
 #include <linux/of_address.h>
@@ -33,6 +34,8 @@
 
 #include <linux/amba/bus.h>
 
+#include "io-pgtable-arm.h"
+
 /* MMIO registers */
 #define ARM_SMMU_IDR0			0x0
 #define IDR0_ST_LVL			GENMASK(28, 27)
@@ -587,6 +590,9 @@ struct arm_smmu_ctx_desc {
 	u64				ttbr;
 	u64				tcr;
 	u64				mair;
+
+	refcount_t			refs;
+	struct mm_struct		*mm;
 };
 
 struct arm_smmu_l1_ctx_desc {
@@ -1660,7 +1666,8 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 #ifdef __BIG_ENDIAN
 			CTXDESC_CD_0_ENDI |
 #endif
-			CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET |
+			CTXDESC_CD_0_R | CTXDESC_CD_0_A |
+			(cd->mm ? 0 : CTXDESC_CD_0_ASET) |
 			CTXDESC_CD_0_AA64 |
 			FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) |
 			CTXDESC_CD_0_V;
@@ -1764,12 +1771,156 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_domain *smmu_domain)
 	cdcfg->cdtab = NULL;
 }
 
-static void arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd)
+static void arm_smmu_init_cd(struct arm_smmu_ctx_desc *cd)
 {
+	refcount_set(&cd->refs, 1);
+}
+
+static bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd)
+{
+	bool free;
+	struct arm_smmu_ctx_desc *old_cd;
+
 	if (!cd->asid)
-		return;
+		return false;
+
+	xa_lock(&asid_xa);
+	free = refcount_dec_and_test(&cd->refs);
+	if (free) {
+		old_cd = __xa_erase(&asid_xa, cd->asid);
+		WARN_ON(old_cd != cd);
+	}
+	xa_unlock(&asid_xa);
+	return free;
+}
+
+static struct arm_smmu_ctx_desc *arm_smmu_share_asid(u16 asid)
+{
+	struct arm_smmu_ctx_desc *cd;
+
+	cd = xa_load(&asid_xa, asid);
+	if (!cd)
+		return NULL;
+
+	if (cd->mm) {
+		/*
+		 * It's pretty common to find a stale CD when doing unbind-bind,
+		 * given that the release happens after a RCU grace period.
+		 * arm_smmu_free_asid() hasn't gone through yet, so reuse it.
+		 */
+		refcount_inc(&cd->refs);
+		return cd;
+	}
+
+	/*
+	 * Ouch, ASID is already in use for a private cd.
+	 * TODO: seize it.
+	 */
+	return ERR_PTR(-EEXIST);
+}
+
+__maybe_unused
+static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm)
+{
+	u16 asid;
+	int ret = 0;
+	u64 tcr, par, reg;
+	struct arm_smmu_ctx_desc *cd;
+	struct arm_smmu_ctx_desc *old_cd = NULL;
+
+	asid = mm_context_get(mm);
+	if (!asid)
+		return ERR_PTR(-ESRCH);
+
+	cd = kzalloc(sizeof(*cd), GFP_KERNEL);
+	if (!cd) {
+		ret = -ENOMEM;
+		goto err_put_context;
+	}
+
+	arm_smmu_init_cd(cd);
+
+	xa_lock(&asid_xa);
+	old_cd = arm_smmu_share_asid(asid);
+	if (!old_cd) {
+		old_cd = __xa_store(&asid_xa, asid, cd, GFP_ATOMIC);
+		/*
+		 * Keep error, clear valid pointers. If there was an old entry
+		 * it has been moved already by arm_smmu_share_asid().
+		 */
+		old_cd = ERR_PTR(xa_err(old_cd));
+		cd->asid = asid;
+	}
+	xa_unlock(&asid_xa);
+
+	if (IS_ERR(old_cd)) {
+		ret = PTR_ERR(old_cd);
+		goto err_free_cd;
+	} else if (old_cd) {
+		if (WARN_ON(old_cd->mm != mm)) {
+			ret = -EINVAL;
+			goto err_free_cd;
+		}
+		kfree(cd);
+		mm_context_put(mm);
+		return old_cd;
+	}
+
+	tcr = FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, 64ULL - VA_BITS) |
+	      FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, ARM_LPAE_TCR_RGN_WBWA) |
+	      FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, ARM_LPAE_TCR_RGN_WBWA) |
+	      FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS) |
+	      CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
+
+	switch (PAGE_SIZE) {
+	case SZ_4K:
+		tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_TG0, ARM_LPAE_TCR_TG0_4K);
+		break;
+	case SZ_16K:
+		tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_TG0, ARM_LPAE_TCR_TG0_16K);
+		break;
+	case SZ_64K:
+		tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_TG0, ARM_LPAE_TCR_TG0_64K);
+		break;
+	default:
+		WARN_ON(1);
+		ret = -EINVAL;
+		goto err_free_asid;
+	}
+
+	reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
+	par = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_PARANGE_SHIFT);
+	tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_IPS, par);
+
+	cd->ttbr = virt_to_phys(mm->pgd);
+	cd->tcr = tcr;
+	/*
+	 * MAIR value is pretty much constant and global, so we can just get it
+	 * from the current CPU register
+	 */
+	cd->mair = read_sysreg(mair_el1);
 
-	xa_erase(&asid_xa, cd->asid);
+	cd->mm = mm;
+
+	return cd;
+
+err_free_asid:
+	arm_smmu_free_asid(cd);
+err_free_cd:
+	kfree(cd);
+err_put_context:
+	mm_context_put(mm);
+	return ERR_PTR(ret);
+}
+
+__maybe_unused
+static void arm_smmu_free_shared_cd(struct arm_smmu_ctx_desc *cd)
+{
+	if (arm_smmu_free_asid(cd)) {
+		/* Unpin ASID */
+		mm_context_put(cd->mm);
+		kfree(cd);
+	}
 }
 
 /* Stream table manipulation functions */
@@ -2479,6 +2630,8 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 	typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = &pgtbl_cfg->arm_lpae_s1_cfg.tcr;
 
+	arm_smmu_init_cd(&cfg->cd);
+
 	ret = xa_alloc(&asid_xa, &asid, &cfg->cd,
 		       XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL);
 	if (ret)
-- 
2.26.0




* [PATCH v5 13/25] iommu/arm-smmu-v3: Seize private ASID
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (11 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 12/25] iommu/arm-smmu-v3: Share process page tables Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 14/25] iommu/arm-smmu-v3: Add support for VHE Jean-Philippe Brucker
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker

The SMMU has a single ASID space, the union of shared and private ASID
sets. This means that the SMMU driver competes with the arch allocator
for ASIDs. Shared ASIDs are those of Linux processes, allocated by the
arch, and participate in broadcast TLB maintenance. Private ASIDs are
allocated by the SMMU driver and used for "classic" map/unmap DMA. They
require explicit TLB invalidations.

When we pin down an mm_context and get an ASID that is already in use by
the SMMU, it belongs to a private context. We used to simply abort the
bind, but this is unfair to users, who would be unable to bind a few
seemingly random processes. Try to allocate a new private ASID for the
context, and make the old ASID shared.

Introduce a new lock to prevent races when rewriting context
descriptors. Unfortunately it has to be a spinlock since we take it
while holding the asid lock, which will be held in non-sleepable context
(freeing ASIDs from an RCU callback).
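
The seizing sequence boils down to this (condensed from
arm_smmu_share_asid() below):

    /* 1. Move the private context to a fresh ASID */
    ret = __xa_alloc(&asid_xa, &new_asid, cd, ...);
    cd->asid = new_asid;
    /* 2. Rewrite its context descriptor on all associated masters */
    arm_smmu_write_ctx_desc(smmu_domain, 0, cd);
    /* 3. Invalidate TLB entries left under the old ASID, now the mm's */
    arm_smmu_tlb_inv_asid(smmu, asid);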

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/arm-smmu-v3.c | 83 +++++++++++++++++++++++++++++--------
 1 file changed, 66 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 09f4f712fb103..8fbc5da133ae4 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -730,6 +730,7 @@ struct arm_smmu_option_prop {
 };
 
 static DEFINE_XARRAY_ALLOC1(asid_xa);
+static DEFINE_SPINLOCK(contexts_lock);
 
 static struct arm_smmu_option_prop arm_smmu_options[] = {
 	{ ARM_SMMU_OPT_SKIP_PREFETCH, "hisilicon,broken-prefetch-cmd" },
@@ -1534,6 +1535,17 @@ static int arm_smmu_cmdq_batch_submit(struct arm_smmu_device *smmu,
 }
 
 /* Context descriptor manipulation functions */
+static void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid)
+{
+	struct arm_smmu_cmdq_ent cmd = {
+		.opcode = CMDQ_OP_TLBI_NH_ASID,
+		.tlbi.asid = asid,
+	};
+
+	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+	arm_smmu_cmdq_issue_sync(smmu);
+}
+
 static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
 			     int ssid, bool leaf)
 {
@@ -1568,7 +1580,7 @@ static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
 	size_t size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3);
 
 	l1_desc->l2ptr = dmam_alloc_coherent(smmu->dev, size,
-					     &l1_desc->l2ptr_dma, GFP_KERNEL);
+					     &l1_desc->l2ptr_dma, GFP_ATOMIC);
 	if (!l1_desc->l2ptr) {
 		dev_warn(smmu->dev,
 			 "failed to allocate context descriptor table\n");
@@ -1614,8 +1626,8 @@ static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_domain *smmu_domain,
 	return l1_desc->l2ptr + idx * CTXDESC_CD_DWORDS;
 }
 
-static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
-				   int ssid, struct arm_smmu_ctx_desc *cd)
+static int __arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
+				     int ssid, struct arm_smmu_ctx_desc *cd)
 {
 	/*
 	 * This function handles the following cases:
@@ -1691,6 +1703,17 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 	return 0;
 }
 
+static int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
+				   int ssid, struct arm_smmu_ctx_desc *cd)
+{
+	int ret;
+
+	spin_lock(&contexts_lock);
+	ret = __arm_smmu_write_ctx_desc(smmu_domain, ssid, cd);
+	spin_unlock(&contexts_lock);
+	return ret;
+}
+
 static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain)
 {
 	int ret;
@@ -1794,9 +1817,18 @@ static bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd)
 	return free;
 }
 
+/*
+ * Try to reserve this ASID in the SMMU. If it is in use, try to steal it from
+ * the private entry. Careful here, we may be modifying the context tables of
+ * another SMMU!
+ */
 static struct arm_smmu_ctx_desc *arm_smmu_share_asid(u16 asid)
 {
+	int ret;
+	u32 new_asid;
 	struct arm_smmu_ctx_desc *cd;
+	struct arm_smmu_device *smmu;
+	struct arm_smmu_domain *smmu_domain;
 
 	cd = xa_load(&asid_xa, asid);
 	if (!cd)
@@ -1812,11 +1844,31 @@ static struct arm_smmu_ctx_desc *arm_smmu_share_asid(u16 asid)
 		return cd;
 	}
 
+	smmu_domain = container_of(cd, struct arm_smmu_domain, s1_cfg.cd);
+	smmu = smmu_domain->smmu;
+
+	/*
+	 * Race with unmap: TLB invalidations will start targeting the new ASID,
+	 * which isn't assigned yet. We'll do an invalidate-all on the old ASID
+	 * later, so it doesn't matter.
+	 */
+	ret = __xa_alloc(&asid_xa, &new_asid, cd,
+			 XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_ATOMIC);
+	if (ret)
+		return ERR_PTR(-ENOSPC);
+	cd->asid = new_asid;
+
 	/*
-	 * Ouch, ASID is already in use for a private cd.
-	 * TODO: seize it.
+	 * Update ASID and invalidate CD in all associated masters. There will
+	 * be some overlap between use of both ASIDs, until we invalidate the
+	 * TLB.
 	 */
-	return ERR_PTR(-EEXIST);
+	arm_smmu_write_ctx_desc(smmu_domain, 0, cd);
+
+	/* Invalidate TLB entries previously associated with that context */
+	arm_smmu_tlb_inv_asid(smmu, asid);
+
+	return NULL;
 }
 
 __maybe_unused
@@ -2407,15 +2459,6 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_cmdq_ent cmd;
 
-	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
-		cmd.opcode	= CMDQ_OP_TLBI_NH_ASID;
-		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
-		cmd.tlbi.vmid	= 0;
-	} else {
-		cmd.opcode	= CMDQ_OP_TLBI_S12_VMALL;
-		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
-	}
-
 	/*
 	 * NOTE: when io-pgtable is in non-strict mode, we may get here with
 	 * PTEs previously cleared by unmaps on the current CPU not yet visible
@@ -2423,8 +2466,14 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 	 * insertion to guarantee those are observed before the TLBI. Do be
 	 * careful, 007.
 	 */
-	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
-	arm_smmu_cmdq_issue_sync(smmu);
+	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
+		arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
+	} else {
+		cmd.opcode	= CMDQ_OP_TLBI_S12_VMALL;
+		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
+		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+		arm_smmu_cmdq_issue_sync(smmu);
+	}
 	arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
 }
 
-- 
2.26.0




* [PATCH v5 14/25] iommu/arm-smmu-v3: Add support for VHE
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (12 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 13/25] iommu/arm-smmu-v3: Seize private ASID Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 15/25] iommu/arm-smmu-v3: Enable broadcast TLB maintenance Jean-Philippe Brucker
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker

The ARMv8.1 extensions added Virtualization Host Extensions (VHE), which
allow running a host kernel at EL2. When using normal DMA, device and CPU
address spaces are dissociated and do not need to implement the same
capabilities, so VHE hasn't been used in the SMMU until now.

With shared address spaces however, ASIDs are shared between MMU and SMMU,
and broadcast TLB invalidations issued by a CPU are taken into account by
the SMMU. TLB entries on both sides need to have identical exception level
in order to be cleared with a single invalidation.

When the CPU is using VHE, enable VHE in the SMMU for all STEs. Normal DMA
mappings will need to use TLBI_EL2 commands instead of TLBI_NH, but
shouldn't be otherwise affected by this change.
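
In practice the invalidation opcode is simply picked based on the new
feature bit, as in the hunks below:

    cmd.opcode = smmu->features & ARM_SMMU_FEAT_E2H ?
                 CMDQ_OP_TLBI_EL2_ASID : CMDQ_OP_TLBI_NH_ASID;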

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
v4->v5: bump feature bit
---
 drivers/iommu/arm-smmu-v3.c | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 8fbc5da133ae4..21d458d817fc2 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -13,6 +13,7 @@
 #include <linux/acpi_iort.h>
 #include <linux/bitfield.h>
 #include <linux/bitops.h>
+#include <linux/cpufeature.h>
 #include <linux/crash_dump.h>
 #include <linux/delay.h>
 #include <linux/dma-iommu.h>
@@ -480,6 +481,8 @@ struct arm_smmu_cmdq_ent {
 		#define CMDQ_OP_TLBI_NH_ASID	0x11
 		#define CMDQ_OP_TLBI_NH_VA	0x12
 		#define CMDQ_OP_TLBI_EL2_ALL	0x20
+		#define CMDQ_OP_TLBI_EL2_ASID	0x21
+		#define CMDQ_OP_TLBI_EL2_VA	0x22
 		#define CMDQ_OP_TLBI_S12_VMALL	0x28
 		#define CMDQ_OP_TLBI_S2_IPA	0x2a
 		#define CMDQ_OP_TLBI_NSNH_ALL	0x30
@@ -651,6 +654,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_STALL_FORCE	(1 << 13)
 #define ARM_SMMU_FEAT_VAX		(1 << 14)
 #define ARM_SMMU_FEAT_RANGE_INV		(1 << 15)
+#define ARM_SMMU_FEAT_E2H		(1 << 16)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -924,6 +928,8 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_NUM, ent->tlbi.num);
 		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_SCALE, ent->tlbi.scale);
 		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
+		/* Fallthrough */
+	case CMDQ_OP_TLBI_EL2_VA:
 		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
 		cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf);
 		cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_TTL, ent->tlbi.ttl);
@@ -945,6 +951,9 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 	case CMDQ_OP_TLBI_S12_VMALL:
 		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
 		break;
+	case CMDQ_OP_TLBI_EL2_ASID:
+		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
+		break;
 	case CMDQ_OP_ATC_INV:
 		cmd[0] |= FIELD_PREP(CMDQ_0_SSV, ent->substream_valid);
 		cmd[0] |= FIELD_PREP(CMDQ_ATC_0_GLOBAL, ent->atc.global);
@@ -1538,7 +1547,8 @@ static int arm_smmu_cmdq_batch_submit(struct arm_smmu_device *smmu,
 static void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid)
 {
 	struct arm_smmu_cmdq_ent cmd = {
-		.opcode = CMDQ_OP_TLBI_NH_ASID,
+		.opcode	= smmu->features & ARM_SMMU_FEAT_E2H ?
+			CMDQ_OP_TLBI_EL2_ASID : CMDQ_OP_TLBI_NH_ASID,
 		.tlbi.asid = asid,
 	};
 
@@ -2093,13 +2103,16 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 	}
 
 	if (s1_cfg) {
+		int strw = smmu->features & ARM_SMMU_FEAT_E2H ?
+			STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1;
+
 		BUG_ON(ste_live);
 		dst[1] = cpu_to_le64(
 			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
 			 FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
 			 FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
 			 FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
-			 FIELD_PREP(STRTAB_STE_1_STRW, STRTAB_STE_1_STRW_NSEL1));
+			 FIELD_PREP(STRTAB_STE_1_STRW, strw));
 
 		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
 		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
@@ -2495,7 +2508,8 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
 		return;
 
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
-		cmd.opcode	= CMDQ_OP_TLBI_NH_VA;
+		cmd.opcode	= smmu->features & ARM_SMMU_FEAT_E2H ?
+				  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
 		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
 	} else {
 		cmd.opcode	= CMDQ_OP_TLBI_S2_IPA;
@@ -3800,7 +3814,11 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR1);
 
 	/* CR2 (random crap) */
-	reg = CR2_PTM | CR2_RECINVSID | CR2_E2H;
+	reg = CR2_PTM | CR2_RECINVSID;
+
+	if (smmu->features & ARM_SMMU_FEAT_E2H)
+		reg |= CR2_E2H;
+
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR2);
 
 	/* Stream table */
@@ -3958,8 +3976,11 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 	if (reg & IDR0_MSI)
 		smmu->features |= ARM_SMMU_FEAT_MSI;
 
-	if (reg & IDR0_HYP)
+	if (reg & IDR0_HYP) {
 		smmu->features |= ARM_SMMU_FEAT_HYP;
+		if (cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN))
+			smmu->features |= ARM_SMMU_FEAT_E2H;
+	}
 
 	/*
 	 * The coherency feature as set by FW is used in preference to the ID
-- 
2.26.0




* [PATCH v5 15/25] iommu/arm-smmu-v3: Enable broadcast TLB maintenance
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (13 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 14/25] iommu/arm-smmu-v3: Add support for VHE Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 16/25] iommu/arm-smmu-v3: Add SVA feature checking Jean-Philippe Brucker
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker

The SMMUv3 can handle invalidation targeted at TLB entries with shared
ASIDs. If the implementation supports broadcast TLB maintenance, enable it
and keep track of it in a feature bit. The SMMU will then be affected by
inner-shareable TLB invalidations from other agents.

A major side-effect of this change is that stage-2 translation contexts
are now affected by all invalidations by VMID. All VMIDs are shared, and
since the stage-2 page tables are not shared between CPU and SMMU, the
only ways to prevent over-invalidation are to disable BTM or to allocate
different VMIDs. This patch does not address the problem.
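
The reset path now programs CR2 accordingly (condensed from the hunk
below):

    reg = CR2_RECINVSID;
    if (smmu->features & ARM_SMMU_FEAT_E2H)
            reg |= CR2_E2H;
    if (!(smmu->features & ARM_SMMU_FEAT_BTM))
            reg |= CR2_PTM; /* keep TLB maintenance private */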

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
v4->v5: bump feature bit
---
 drivers/iommu/arm-smmu-v3.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 21d458d817fc2..e7de8a7459fa4 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -56,6 +56,7 @@
 #define IDR0_ASID16			(1 << 12)
 #define IDR0_ATS			(1 << 10)
 #define IDR0_HYP			(1 << 9)
+#define IDR0_BTM			(1 << 5)
 #define IDR0_COHACC			(1 << 4)
 #define IDR0_TTF			GENMASK(3, 2)
 #define IDR0_TTF_AARCH64		2
@@ -655,6 +656,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_VAX		(1 << 14)
 #define ARM_SMMU_FEAT_RANGE_INV		(1 << 15)
 #define ARM_SMMU_FEAT_E2H		(1 << 16)
+#define ARM_SMMU_FEAT_BTM		(1 << 17)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -3814,11 +3816,14 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR1);
 
 	/* CR2 (random crap) */
-	reg = CR2_PTM | CR2_RECINVSID;
+	reg = CR2_RECINVSID;
 
 	if (smmu->features & ARM_SMMU_FEAT_E2H)
 		reg |= CR2_E2H;
 
+	if (!(smmu->features & ARM_SMMU_FEAT_BTM))
+		reg |= CR2_PTM;
+
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR2);
 
 	/* Stream table */
@@ -3929,6 +3934,7 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
 	u32 reg;
 	bool coherent = smmu->features & ARM_SMMU_FEAT_COHERENCY;
+	bool vhe = cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN);
 
 	/* IDR0 */
 	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR0);
@@ -3978,10 +3984,19 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 
 	if (reg & IDR0_HYP) {
 		smmu->features |= ARM_SMMU_FEAT_HYP;
-		if (cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN))
+		if (vhe)
 			smmu->features |= ARM_SMMU_FEAT_E2H;
 	}
 
+	/*
+	 * If the CPU is using VHE, but the SMMU doesn't support it, the SMMU
+	 * will create TLB entries for NH-EL1 world and will miss the
+	 * broadcasted TLB invalidations that target EL2-E2H world. Don't enable
+	 * BTM in that case.
+	 */
+	if (reg & IDR0_BTM && (!vhe || reg & IDR0_HYP))
+		smmu->features |= ARM_SMMU_FEAT_BTM;
+
 	/*
 	 * The coherency feature as set by FW is used in preference to the ID
 	 * register, but warn on mismatch.
-- 
2.26.0




* [PATCH v5 16/25] iommu/arm-smmu-v3: Add SVA feature checking
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (14 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 15/25] iommu/arm-smmu-v3: Enable broadcast TLB maintenance Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 17/25] iommu/arm-smmu-v3: Implement mm operations Jean-Philippe Brucker
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker, Suzuki K Poulose

Aggregate all sanity-checks for sharing CPU page tables with the SMMU
under a single ARM_SMMU_FEAT_SVA bit. For PCIe SVA, users also need to
check FEAT_ATS and FEAT_PRI. For platform SVA, they will most likely have
to check FEAT_STALLS.
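
A consumer would then check, for example (a sketch assuming a PCIe
master; the ATS and PRI feature bits already exist in the driver):

    if (!(smmu->features & ARM_SMMU_FEAT_SVA))
            return false;
    /* PCIe SVA additionally needs ATS and PRI */
    if (!(smmu->features & ARM_SMMU_FEAT_ATS) ||
        !(smmu->features & ARM_SMMU_FEAT_PRI))
            return false;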

Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
v4->v5: bump feature bit
---
 drivers/iommu/arm-smmu-v3.c | 72 +++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index e7de8a7459fa4..d209d85402a83 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -657,6 +657,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_RANGE_INV		(1 << 15)
 #define ARM_SMMU_FEAT_E2H		(1 << 16)
 #define ARM_SMMU_FEAT_BTM		(1 << 17)
+#define ARM_SMMU_FEAT_SVA		(1 << 18)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -3930,6 +3931,74 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	return 0;
 }
 
+static bool arm_smmu_supports_sva(struct arm_smmu_device *smmu)
+{
+	unsigned long reg, fld;
+	unsigned long oas;
+	unsigned long asid_bits;
+
+	u32 feat_mask = ARM_SMMU_FEAT_BTM | ARM_SMMU_FEAT_COHERENCY;
+
+	if ((smmu->features & feat_mask) != feat_mask)
+		return false;
+
+	if (!(smmu->pgsize_bitmap & PAGE_SIZE))
+		return false;
+
+	/*
+	 * Get the smallest PA size of all CPUs (sanitized by cpufeature). We're
+	 * not even pretending to support AArch32 here.
+	 */
+	reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
+	fld = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_PARANGE_SHIFT);
+	switch (fld) {
+	case 0x0:
+		oas = 32;
+		break;
+	case 0x1:
+		oas = 36;
+		break;
+	case 0x2:
+		oas = 40;
+		break;
+	case 0x3:
+		oas = 42;
+		break;
+	case 0x4:
+		oas = 44;
+		break;
+	case 0x5:
+		oas = 48;
+		break;
+	case 0x6:
+		oas = 52;
+		break;
+	default:
+		return false;
+	}
+
+	/* abort if MMU outputs addresses greater than what we support. */
+	if (smmu->oas < oas)
+		return false;
+
+	/* We can support bigger ASIDs than the CPU, but not smaller */
+	fld = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_ASID_SHIFT);
+	asid_bits = fld ? 16 : 8;
+	if (smmu->asid_bits < asid_bits)
+		return false;
+
+	/*
+	 * See max_pinned_asids in arch/arm64/mm/context.c. The following is
+	 * generally the maximum number of bindable processes.
+	 */
+	if (IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0))
+		asid_bits--;
+	dev_dbg(smmu->dev, "%d shared contexts\n", (1 << asid_bits) -
+		num_possible_cpus() - 2);
+
+	return true;
+}
+
 static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
 	u32 reg;
@@ -4142,6 +4211,9 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 
 	smmu->ias = max(smmu->ias, smmu->oas);
 
+	if (arm_smmu_supports_sva(smmu))
+		smmu->features |= ARM_SMMU_FEAT_SVA;
+
 	dev_info(smmu->dev, "ias %lu-bit, oas %lu-bit (features 0x%08x)\n",
 		 smmu->ias, smmu->oas, smmu->features);
 	return 0;
-- 
2.26.0




* [PATCH v5 17/25] iommu/arm-smmu-v3: Implement mm operations
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (15 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 16/25] iommu/arm-smmu-v3: Add SVA feature checking Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 18/25] iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops Jean-Philippe Brucker
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker

Hook SVA operations to support sharing page tables with the SMMUv3:

* dev_enable/disable/has_feature for device drivers to modify the SVA
  state.
* sva_bind/unbind and sva_get_pasid to bind device and address spaces.
* The mm_attach/detach/clear/invalidate/free callbacks from iommu-sva

The clear() operation has to detach the page tables while DMA may still
be running (because the process died). To avoid printing event 0x0a
(C_BAD_CD), we disable translation and error reporting without clearing
CD.V. PCIe Translation Requests and Page Requests are silently denied.
The detach() operation always happens whether or not clear() was
invoked, and properly disables the CD.
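
Quiescing without clearing CD.V boils down to case (4) in the hunk
below:

    val &= ~(CTXDESC_CD_0_S | CTXDESC_CD_0_R); /* no stall, no record */
    val |= CTXDESC_CD_0_TCR_EPD0;              /* disable TTB0 walks  */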

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
v4->v5: Add clear() operation
---
 drivers/iommu/Kconfig       |   1 +
 drivers/iommu/arm-smmu-v3.c | 214 +++++++++++++++++++++++++++++++++++-
 2 files changed, 211 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 411a7ee2ab12d..8118f090a51b3 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -435,6 +435,7 @@ config ARM_SMMU_V3
 	tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
 	depends on ARM64
 	select IOMMU_API
+	select IOMMU_SVA
 	select IOMMU_IO_PGTABLE_LPAE
 	select GENERIC_MSI_IRQ_DOMAIN
 	help
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index d209d85402a83..6640c2ac2a7c5 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -36,6 +36,7 @@
 #include <linux/amba/bus.h>
 
 #include "io-pgtable-arm.h"
+#include "iommu-sva.h"
 
 /* MMIO registers */
 #define ARM_SMMU_IDR0			0x0
@@ -739,6 +740,13 @@ struct arm_smmu_option_prop {
 static DEFINE_XARRAY_ALLOC1(asid_xa);
 static DEFINE_SPINLOCK(contexts_lock);
 
+/*
+ * When a process dies, DMA is still running but we need to clear the pgd. If we
+ * simply cleared the valid bit from the context descriptor, we'd get event 0x0a
+ * which is not recoverable.
+ */
+static struct arm_smmu_ctx_desc invalid_cd = { 0 };
+
 static struct arm_smmu_option_prop arm_smmu_options[] = {
 	{ ARM_SMMU_OPT_SKIP_PREFETCH, "hisilicon,broken-prefetch-cmd" },
 	{ ARM_SMMU_OPT_PAGE0_REGS_ONLY, "cavium,cn9900-broken-page1-regspace"},
@@ -1649,7 +1657,9 @@ static int __arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 	 * (2) Install a secondary CD, for SID+SSID traffic.
 	 * (3) Update ASID of a CD. Atomically write the first 64 bits of the
 	 *     CD, then invalidate the old entry and mappings.
-	 * (4) Remove a secondary CD.
+	 * (4) Quiesce the context without clearing the valid bit. Disable
+	 *     translation, and ignore any translation fault.
+	 * (5) Remove a secondary CD.
 	 */
 	u64 val;
 	bool cd_live;
@@ -1666,8 +1676,11 @@ static int __arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 	val = le64_to_cpu(cdptr[0]);
 	cd_live = !!(val & CTXDESC_CD_0_V);
 
-	if (!cd) { /* (4) */
+	if (!cd) { /* (5) */
 		val = 0;
+	} else if (cd == &invalid_cd) { /* (4) */
+		val &= ~(CTXDESC_CD_0_S | CTXDESC_CD_0_R);
+		val |= CTXDESC_CD_0_TCR_EPD0;
 	} else if (cd_live) { /* (3) */
 		val &= ~CTXDESC_CD_0_ASID;
 		val |= FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid);
@@ -1884,7 +1897,6 @@ static struct arm_smmu_ctx_desc *arm_smmu_share_asid(u16 asid)
 	return NULL;
 }
 
-__maybe_unused
 static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm)
 {
 	u16 asid;
@@ -1978,7 +1990,6 @@ static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm)
 	return ERR_PTR(ret);
 }
 
-__maybe_unused
 static void arm_smmu_free_shared_cd(struct arm_smmu_ctx_desc *cd)
 {
 	if (arm_smmu_free_asid(cd)) {
@@ -3008,6 +3019,16 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	master = dev_iommu_priv_get(dev);
 	smmu = master->smmu;
 
+	/*
+	 * Checking that SVA is disabled ensures that this device isn't bound to
+	 * any mm, and can be safely detached from its old domain. Bonds cannot
+	 * be removed concurrently since we're holding the group mutex.
+	 */
+	if (iommu_sva_enabled(dev)) {
+		dev_err(dev, "cannot attach - SVA enabled\n");
+		return -EBUSY;
+	}
+
 	arm_smmu_detach_dev(master);
 
 	mutex_lock(&smmu_domain->init_mutex);
@@ -3107,6 +3128,99 @@ arm_smmu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
 	return ops->iova_to_phys(ops, iova);
 }
 
+static void arm_smmu_mm_invalidate(struct device *dev, int pasid, void *entry,
+				   unsigned long iova, size_t size)
+{
+	/* TODO: Invalidate ATC */
+}
+
+static int arm_smmu_mm_attach(struct device *dev, int pasid, void *entry,
+			      bool attach_domain)
+{
+	struct arm_smmu_ctx_desc *cd = entry;
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+	/*
+	 * If another device in the domain has already been attached, the
+	 * context descriptor is already valid.
+	 */
+	if (!attach_domain)
+		return 0;
+
+	return arm_smmu_write_ctx_desc(smmu_domain, pasid, cd);
+}
+
+static void arm_smmu_mm_clear(struct device *dev, int pasid, void *entry)
+{
+	struct arm_smmu_ctx_desc *cd = entry;
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+	/*
+	 * DMA may still be running. Keep the CD valid but disable
+	 * translation, so that new events will still result in stall.
+	 */
+	arm_smmu_write_ctx_desc(smmu_domain, pasid, &invalid_cd);
+
+	/*
+	 * The ASID allocator won't broadcast the final TLB invalidations
+	 * for this ASID, so we need to do it manually.
+	 */
+	arm_smmu_tlb_inv_asid(smmu_domain->smmu, cd->asid);
+
+	/* TODO: invalidate ATC */
+}
+
+static void arm_smmu_mm_detach(struct device *dev, int pasid, void *entry,
+			       bool detach_domain, bool cleared)
+{
+	struct arm_smmu_ctx_desc *cd = entry;
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+	if (detach_domain) {
+		arm_smmu_write_ctx_desc(smmu_domain, pasid, NULL);
+
+		if (!cleared)
+			/* See comment in arm_smmu_mm_clear() */
+			arm_smmu_tlb_inv_asid(smmu_domain->smmu, cd->asid);
+	}
+
+	/* TODO: invalidate ATC */
+}
+
+static void *arm_smmu_mm_alloc(struct mm_struct *mm)
+{
+	return arm_smmu_alloc_shared_cd(mm);
+}
+
+static void arm_smmu_mm_free(void *entry)
+{
+	arm_smmu_free_shared_cd(entry);
+}
+
+static struct io_mm_ops arm_smmu_mm_ops = {
+	.alloc		= arm_smmu_mm_alloc,
+	.invalidate	= arm_smmu_mm_invalidate,
+	.attach		= arm_smmu_mm_attach,
+	.clear		= arm_smmu_mm_clear,
+	.detach		= arm_smmu_mm_detach,
+	.free		= arm_smmu_mm_free,
+};
+
+static struct iommu_sva *
+arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+		return ERR_PTR(-EINVAL);
+
+	return iommu_sva_bind_generic(dev, mm, &arm_smmu_mm_ops, drvdata);
+}
+
 static struct platform_driver arm_smmu_driver;
 
 static
@@ -3225,6 +3339,7 @@ static void arm_smmu_remove_device(struct device *dev)
 
 	master = dev_iommu_priv_get(dev);
 	smmu = master->smmu;
+	WARN_ON(iommu_sva_disable(dev));
 	arm_smmu_detach_dev(master);
 	iommu_group_remove_device(dev);
 	iommu_device_unlink(&smmu->iommu, dev);
@@ -3344,6 +3459,90 @@ static void arm_smmu_get_resv_regions(struct device *dev,
 	iommu_dma_get_resv_regions(dev, head);
 }
 
+static bool arm_smmu_iopf_supported(struct arm_smmu_master *master)
+{
+	return false;
+}
+
+static bool arm_smmu_dev_has_feature(struct device *dev,
+				     enum iommu_dev_features feat)
+{
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+
+	if (!master)
+		return false;
+
+	switch (feat) {
+	case IOMMU_DEV_FEAT_SVA:
+		if (!(master->smmu->features & ARM_SMMU_FEAT_SVA))
+			return false;
+
+		/* SSID and IOPF support are mandatory for the moment */
+		return master->ssid_bits && arm_smmu_iopf_supported(master);
+	default:
+		return false;
+	}
+}
+
+static bool arm_smmu_dev_feature_enabled(struct device *dev,
+					 enum iommu_dev_features feat)
+{
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+
+	if (!master)
+		return false;
+
+	switch (feat) {
+	case IOMMU_DEV_FEAT_SVA:
+		return iommu_sva_enabled(dev);
+	default:
+		return false;
+	}
+}
+
+static int arm_smmu_dev_enable_sva(struct device *dev)
+{
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+	struct iommu_sva_param param = {
+		.min_pasid = 1,
+		.max_pasid = 0xfffffU,
+	};
+
+	param.max_pasid = min(param.max_pasid, (1U << master->ssid_bits) - 1);
+	return iommu_sva_enable(dev, &param);
+}
+
+static int arm_smmu_dev_enable_feature(struct device *dev,
+				       enum iommu_dev_features feat)
+{
+	if (!arm_smmu_dev_has_feature(dev, feat))
+		return -ENODEV;
+
+	if (arm_smmu_dev_feature_enabled(dev, feat))
+		return -EBUSY;
+
+	switch (feat) {
+	case IOMMU_DEV_FEAT_SVA:
+		return arm_smmu_dev_enable_sva(dev);
+	default:
+		return -EINVAL;
+	}
+}
+
+static int arm_smmu_dev_disable_feature(struct device *dev,
+					enum iommu_dev_features feat)
+{
+	if (!arm_smmu_dev_feature_enabled(dev, feat))
+		return -EINVAL;
+
+	switch (feat) {
+	case IOMMU_DEV_FEAT_SVA:
+		return iommu_sva_disable(dev);
+	default:
+		return -EINVAL;
+	}
+}
+
 static struct iommu_ops arm_smmu_ops = {
 	.capable		= arm_smmu_capable,
 	.domain_alloc		= arm_smmu_domain_alloc,
@@ -3362,6 +3561,13 @@ static struct iommu_ops arm_smmu_ops = {
 	.of_xlate		= arm_smmu_of_xlate,
 	.get_resv_regions	= arm_smmu_get_resv_regions,
 	.put_resv_regions	= generic_iommu_put_resv_regions,
+	.dev_has_feat		= arm_smmu_dev_has_feature,
+	.dev_feat_enabled	= arm_smmu_dev_feature_enabled,
+	.dev_enable_feat	= arm_smmu_dev_enable_feature,
+	.dev_disable_feat	= arm_smmu_dev_disable_feature,
+	.sva_bind		= arm_smmu_sva_bind,
+	.sva_unbind		= iommu_sva_unbind_generic,
+	.sva_get_pasid		= iommu_sva_get_pasid_generic,
 	.pgsize_bitmap		= -1UL, /* Restricted during device attach */
 };
 
-- 
2.26.0




* [PATCH v5 18/25] iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (16 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 17/25] iommu/arm-smmu-v3: Implement mm operations Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 19/25] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update Jean-Philippe Brucker
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker

iommu-sva calls us when an mm is modified. Perform the required ATC
invalidations.
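
For reference, CMD_ATC_INV encodes a naturally aligned power-of-two span
of 4kB pages, which is what the existing arm_smmu_atc_inv_to_cmd()
computes (the size == 0 "invalidate all" case is handled separately). A
standalone sketch of the span arithmetic, written as a userspace demo
for illustration only:

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint64_t iova = 0x11000, size = 0x3000; /* pages 0x11..0x13 */
		uint64_t start = iova >> 12;
		uint64_t end = (iova + size - 1) >> 12;
		/* fls_long(start ^ end): highest differing page-number bit */
		unsigned int log2_span = (start ^ end) ?
				64 - __builtin_clzll(start ^ end) : 0;
		uint64_t span_mask = (1ULL << log2_span) - 1;

		start &= ~span_mask; /* align the base down to the span */

		/* Prints addr=0x10000 size=2: a 4-page (16kB) invalidation
		 * covering the requested 3 pages plus one neighbour. */
		printf("addr=0x%llx size=%u\n",
		       (unsigned long long)(start << 12), log2_span);
		return 0;
	}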

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
v4->v5: more comments
---
 drivers/iommu/arm-smmu-v3.c | 70 ++++++++++++++++++++++++++++++-------
 1 file changed, 58 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 6640c2ac2a7c5..c4bffb14461aa 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2375,6 +2375,20 @@ arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size,
 	size_t inval_grain_shift = 12;
 	unsigned long page_start, page_end;
 
+	/*
+	 * ATS and PASID:
+	 *
+	 * If substream_valid is clear, the PCIe TLP is sent without a PASID
+	 * prefix. In that case all ATC entries within the address range are
+	 * invalidated, including those that were requested with a PASID! There
+	 * is no way to invalidate only entries without PASID.
+	 *
+	 * When using STRTAB_STE_1_S1DSS_SSID0 (reserving CD 0 for non-PASID
+	 * traffic), translation requests without PASID create ATC entries
+	 * without PASID, which must be invalidated with substream_valid clear.
+	 * This has the unpleasant side-effect of invalidating all PASID-tagged
+	 * ATC entries within the address range.
+	 */
 	*cmd = (struct arm_smmu_cmdq_ent) {
 		.opcode			= CMDQ_OP_ATC_INV,
 		.substream_valid	= !!ssid,
@@ -2418,12 +2432,12 @@ arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size,
 	cmd->atc.size	= log2_span;
 }
 
-static int arm_smmu_atc_inv_master(struct arm_smmu_master *master)
+static int arm_smmu_atc_inv_master(struct arm_smmu_master *master, int ssid)
 {
 	int i;
 	struct arm_smmu_cmdq_ent cmd;
 
-	arm_smmu_atc_inv_to_cmd(0, 0, 0, &cmd);
+	arm_smmu_atc_inv_to_cmd(ssid, 0, 0, &cmd);
 
 	for (i = 0; i < master->num_sids; i++) {
 		cmd.atc.sid = master->sids[i];
@@ -2934,7 +2948,7 @@ static void arm_smmu_disable_ats(struct arm_smmu_master *master)
 	 * ATC invalidation via the SMMU.
 	 */
 	wmb();
-	arm_smmu_atc_inv_master(master);
+	arm_smmu_atc_inv_master(master, 0);
 	atomic_dec(&smmu_domain->nr_ats_masters);
 }
 
@@ -3131,7 +3145,22 @@ arm_smmu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
 static void arm_smmu_mm_invalidate(struct device *dev, int pasid, void *entry,
 				   unsigned long iova, size_t size)
 {
-	/* TODO: Invalidate ATC */
+	int i;
+	struct arm_smmu_cmdq_ent cmd;
+	struct arm_smmu_cmdq_batch cmds = {};
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+
+	if (!master->ats_enabled)
+		return;
+
+	arm_smmu_atc_inv_to_cmd(pasid, iova, size, &cmd);
+
+	for (i = 0; i < master->num_sids; i++) {
+		cmd.atc.sid = master->sids[i];
+		arm_smmu_cmdq_batch_add(master->smmu, &cmds, &cmd);
+	}
+
+	arm_smmu_cmdq_batch_submit(master->smmu, &cmds);
 }
 
 static int arm_smmu_mm_attach(struct device *dev, int pasid, void *entry,
@@ -3168,26 +3197,43 @@ static void arm_smmu_mm_clear(struct device *dev, int pasid, void *entry)
 	 * for this ASID, so we need to do it manually.
 	 */
 	arm_smmu_tlb_inv_asid(smmu_domain->smmu, cd->asid);
-
-	/* TODO: invalidate ATC */
+	arm_smmu_atc_inv_domain(smmu_domain, pasid, 0, 0);
 }
 
 static void arm_smmu_mm_detach(struct device *dev, int pasid, void *entry,
 			       bool detach_domain, bool cleared)
 {
 	struct arm_smmu_ctx_desc *cd = entry;
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
 	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-	if (detach_domain) {
+	if (detach_domain)
 		arm_smmu_write_ctx_desc(smmu_domain, pasid, NULL);
 
-		if (!cleared)
-			/* See comment in arm_smmu_mm_clear() */
-			arm_smmu_tlb_inv_asid(smmu_domain->smmu, cd->asid);
-	}
+	/*
+	 * If we went through clear(), we've already invalidated, and no new TLB
+	 * entry can have been formed.
+	 */
+	if (cleared)
+		return;
+
+	if (detach_domain) {
+		/* See comment in arm_smmu_mm_clear() */
+		arm_smmu_tlb_inv_asid(smmu_domain->smmu, cd->asid);
+		arm_smmu_atc_inv_domain(smmu_domain, pasid, 0, 0);
 
-	/* TODO: invalidate ATC */
+	} else if (master->ats_enabled) {
+		/*
+		 * There are more devices bound with this PASID in this domain,
+		 * so we cannot yet clear the PASID entry, and this device could
+		 * create new ATC entries. Invalidate the ATC for the sake of
+		 * it. On unbinding the last device we'll properly invalidate
+		 * all ATCs in the domain. Alternatively, an early detach_dev()
+		 * on this device will also flush the ATC.
+		 */
+		arm_smmu_atc_inv_master(master, pasid);
+	}
 }
 
 static void *arm_smmu_mm_alloc(struct mm_struct *mm)
-- 
2.26.0




* [PATCH v5 19/25] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (17 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 18/25] iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 20/25] iommu/arm-smmu-v3: Maintain a SID->device structure Jean-Philippe Brucker
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker

If the SMMU supports it and the kernel was built with HTTU support,
enable hardware update of the access and dirty flags. This is essential
for shared page tables, since it reduces the number of access faults
hitting the fault queue.

We can enable HTTU even if the CPUs don't support it, because the kernel
always checks for the HW dirty bit and updates the PTE flags atomically.
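
For context, the atomicity claim refers to how arm64 folds hardware DBM
into the generic PTE helpers. Paraphrased from the existing
arch/arm64/include/asm/pgtable.h (simplified, not part of this patch):

	static inline int pte_sw_dirty(pte_t pte)
	{
		return !!(pte_val(pte) & PTE_DIRTY);
	}

	static inline int pte_hw_dirty(pte_t pte)
	{
		/* With DBM, hardware clears RDONLY on the first write */
		return pte_write(pte) && !(pte_val(pte) & PTE_RDONLY);
	}

	static inline int pte_dirty(pte_t pte)
	{
		return pte_sw_dirty(pte) || pte_hw_dirty(pte);
	}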

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
v4->v5: bump feature bits
---
 drivers/iommu/arm-smmu-v3.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index c4bffb14461aa..4ed9df15581af 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -57,6 +57,8 @@
 #define IDR0_ASID16			(1 << 12)
 #define IDR0_ATS			(1 << 10)
 #define IDR0_HYP			(1 << 9)
+#define IDR0_HD				(1 << 7)
+#define IDR0_HA				(1 << 6)
 #define IDR0_BTM			(1 << 5)
 #define IDR0_COHACC			(1 << 4)
 #define IDR0_TTF			GENMASK(3, 2)
@@ -308,6 +310,9 @@
 #define CTXDESC_CD_0_TCR_IPS		GENMASK_ULL(34, 32)
 #define CTXDESC_CD_0_TCR_TBI0		(1ULL << 38)
 
+#define CTXDESC_CD_0_TCR_HA		(1UL << 43)
+#define CTXDESC_CD_0_TCR_HD		(1UL << 42)
+
 #define CTXDESC_CD_0_AA64		(1UL << 41)
 #define CTXDESC_CD_0_S			(1UL << 44)
 #define CTXDESC_CD_0_R			(1UL << 45)
@@ -659,6 +664,8 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_E2H		(1 << 16)
 #define ARM_SMMU_FEAT_BTM		(1 << 17)
 #define ARM_SMMU_FEAT_SVA		(1 << 18)
+#define ARM_SMMU_FEAT_HA		(1 << 19)
+#define ARM_SMMU_FEAT_HD		(1 << 20)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -1689,10 +1696,17 @@ static int __arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 		 * this substream's traffic
 		 */
 	} else { /* (1) and (2) */
+		u64 tcr = cd->tcr;
+
 		cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
 		cdptr[2] = 0;
 		cdptr[3] = cpu_to_le64(cd->mair);
 
+		if (!(smmu->features & ARM_SMMU_FEAT_HD))
+			tcr &= ~CTXDESC_CD_0_TCR_HD;
+		if (!(smmu->features & ARM_SMMU_FEAT_HA))
+			tcr &= ~CTXDESC_CD_0_TCR_HA;
+
 		/*
 		 * STE is live, and the SMMU might read dwords of this CD in any
 		 * order. Ensure that it observes valid values before reading
@@ -1700,7 +1714,7 @@ static int __arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 		 */
 		arm_smmu_sync_cd(smmu_domain, ssid, true);
 
-		val = cd->tcr |
+		val = tcr |
 #ifdef __BIG_ENDIAN
 			CTXDESC_CD_0_ENDI |
 #endif
@@ -1943,10 +1957,12 @@ static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm)
 		return old_cd;
 	}
 
+	/* HA and HD will be filtered out later if not supported by the SMMU */
 	tcr = FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, 64ULL - VA_BITS) |
 	      FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, ARM_LPAE_TCR_RGN_WBWA) |
 	      FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, ARM_LPAE_TCR_RGN_WBWA) |
 	      FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS) |
+	      CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD |
 	      CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
 
 	switch (PAGE_SIZE) {
@@ -4309,6 +4325,12 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 			smmu->features |= ARM_SMMU_FEAT_E2H;
 	}
 
+	if (reg & (IDR0_HA | IDR0_HD)) {
+		smmu->features |= ARM_SMMU_FEAT_HA;
+		if (reg & IDR0_HD)
+			smmu->features |= ARM_SMMU_FEAT_HD;
+	}
+
 	/*
 	 * If the CPU is using VHE, but the SMMU doesn't support it, the SMMU
 	 * will create TLB entries for NH-EL1 world and will miss the
-- 
2.26.0




* [PATCH v5 20/25] iommu/arm-smmu-v3: Maintain a SID->device structure
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (18 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 19/25] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 21/25] dt-bindings: document stall property for IOMMU masters Jean-Philippe Brucker
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker

When handling faults from the event or PRI queue, we need to find the
struct device associated with a SID. Add an rb-tree to keep track of SIDs.
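
The lookup is meant to be driven from the fault paths added later in
this series, along these lines (a sketch):

	/* In the event/PRI queue thread, once a fault record arrives: */
	struct arm_smmu_master *master;
	u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]);

	master = arm_smmu_find_master(smmu, sid);
	if (!master)
		return -EINVAL; /* unknown SID, nobody to report to */

	return iommu_report_device_fault(master->dev, &fault_evt);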

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/arm-smmu-v3.c | 179 +++++++++++++++++++++++++++++-------
 1 file changed, 147 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 4ed9df15581af..7a4c5914a2fe2 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -697,6 +697,15 @@ struct arm_smmu_device {
 
 	/* IOMMU core code handle */
 	struct iommu_device		iommu;
+
+	struct rb_root			streams;
+	struct mutex			streams_mutex;
+};
+
+struct arm_smmu_stream {
+	u32				id;
+	struct arm_smmu_master		*master;
+	struct rb_node			node;
 };
 
 /* SMMU private data for each master */
@@ -705,8 +714,8 @@ struct arm_smmu_master {
 	struct device			*dev;
 	struct arm_smmu_domain		*domain;
 	struct list_head		domain_head;
-	u32				*sids;
-	unsigned int			num_sids;
+	struct arm_smmu_stream		*streams;
+	unsigned int			num_streams;
 	bool				ats_enabled;
 	unsigned int			ssid_bits;
 };
@@ -1592,8 +1601,8 @@ static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
 
 	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
-		for (i = 0; i < master->num_sids; i++) {
-			cmd.cfgi.sid = master->sids[i];
+		for (i = 0; i < master->num_streams; i++) {
+			cmd.cfgi.sid = master->streams[i].id;
 			arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
 		}
 	}
@@ -2222,6 +2231,32 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
 	return 0;
 }
 
+__maybe_unused
+static struct arm_smmu_master *
+arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
+{
+	struct rb_node *node;
+	struct arm_smmu_stream *stream;
+	struct arm_smmu_master *master = NULL;
+
+	mutex_lock(&smmu->streams_mutex);
+	node = smmu->streams.rb_node;
+	while (node) {
+		stream = rb_entry(node, struct arm_smmu_stream, node);
+		if (stream->id < sid) {
+			node = node->rb_right;
+		} else if (stream->id > sid) {
+			node = node->rb_left;
+		} else {
+			master = stream->master;
+			break;
+		}
+	}
+	mutex_unlock(&smmu->streams_mutex);
+
+	return master;
+}
+
 /* IRQ and event handlers */
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
@@ -2455,8 +2490,8 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master *master, int ssid)
 
 	arm_smmu_atc_inv_to_cmd(ssid, 0, 0, &cmd);
 
-	for (i = 0; i < master->num_sids; i++) {
-		cmd.atc.sid = master->sids[i];
+	for (i = 0; i < master->num_streams; i++) {
+		cmd.atc.sid = master->streams[i].id;
 		arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
 	}
 
@@ -2499,8 +2534,8 @@ static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
 		if (!master->ats_enabled)
 			continue;
 
-		for (i = 0; i < master->num_sids; i++) {
-			cmd.atc.sid = master->sids[i];
+		for (i = 0; i < master->num_streams; i++) {
+			cmd.atc.sid = master->streams[i].id;
 			arm_smmu_cmdq_batch_add(smmu_domain->smmu, &cmds, &cmd);
 		}
 	}
@@ -2906,13 +2941,13 @@ static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master)
 	int i, j;
 	struct arm_smmu_device *smmu = master->smmu;
 
-	for (i = 0; i < master->num_sids; ++i) {
-		u32 sid = master->sids[i];
+	for (i = 0; i < master->num_streams; ++i) {
+		u32 sid = master->streams[i].id;
 		__le64 *step = arm_smmu_get_step_for_sid(smmu, sid);
 
 		/* Bridged PCI devices may end up with duplicated IDs */
 		for (j = 0; j < i; j++)
-			if (master->sids[j] == sid)
+			if (master->streams[j].id == sid)
 				break;
 		if (j < i)
 			continue;
@@ -3171,8 +3206,8 @@ static void arm_smmu_mm_invalidate(struct device *dev, int pasid, void *entry,
 
 	arm_smmu_atc_inv_to_cmd(pasid, iova, size, &cmd);
 
-	for (i = 0; i < master->num_sids; i++) {
-		cmd.atc.sid = master->sids[i];
+	for (i = 0; i < master->num_streams; i++) {
+		cmd.atc.sid = master->streams[i].id;
 		arm_smmu_cmdq_batch_add(master->smmu, &cmds, &cmd);
 	}
 
@@ -3304,11 +3339,101 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
 	return sid < limit;
 }
 
+static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
+				  struct arm_smmu_master *master)
+{
+	int i;
+	int ret = 0;
+	struct arm_smmu_stream *new_stream, *cur_stream;
+	struct rb_node **new_node, *parent_node = NULL;
+	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
+
+	master->streams = kcalloc(fwspec->num_ids,
+				  sizeof(struct arm_smmu_stream), GFP_KERNEL);
+	if (!master->streams)
+		return -ENOMEM;
+	master->num_streams = fwspec->num_ids;
+
+	mutex_lock(&smmu->streams_mutex);
+	for (i = 0; i < fwspec->num_ids; i++) {
+		u32 sid = fwspec->ids[i];
+
+		new_stream = &master->streams[i];
+		new_stream->id = sid;
+		new_stream->master = master;
+
+		/*
+		 * Check the SIDs are in range of the SMMU and our stream table
+		 */
+		if (!arm_smmu_sid_in_range(smmu, sid)) {
+			ret = -ERANGE;
+			break;
+		}
+
+		/* Ensure l2 strtab is initialised */
+		if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
+			ret = arm_smmu_init_l2_strtab(smmu, sid);
+			if (ret)
+				break;
+		}
+
+		/* Insert into SID tree */
+		new_node = &(smmu->streams.rb_node);
+		while (*new_node) {
+			cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
+					      node);
+			parent_node = *new_node;
+			if (cur_stream->id > new_stream->id) {
+				new_node = &((*new_node)->rb_left);
+			} else if (cur_stream->id < new_stream->id) {
+				new_node = &((*new_node)->rb_right);
+			} else {
+				dev_warn(master->dev,
+					 "stream %u already in tree\n",
+					 cur_stream->id);
+				ret = -EINVAL;
+				break;
+			}
+		}
+
+		if (ret)
+			break;
+		rb_link_node(&new_stream->node, parent_node, new_node);
+		rb_insert_color(&new_stream->node, &smmu->streams);
+	}
+
+	if (ret) {
+		for (i--; i >= 0; i--)
+			rb_erase(&master->streams[i].node, &smmu->streams);
+		kfree(master->streams);
+	}
+	mutex_unlock(&smmu->streams_mutex);
+
+	return ret;
+}
+
+static void arm_smmu_remove_master(struct arm_smmu_device *smmu,
+				   struct arm_smmu_master *master)
+{
+	int i;
+	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
+
+	if (!master->streams)
+		return;
+
+	mutex_lock(&smmu->streams_mutex);
+	for (i = 0; i < fwspec->num_ids; i++)
+		rb_erase(&master->streams[i].node, &smmu->streams);
+	mutex_unlock(&smmu->streams_mutex);
+
+	kfree(master->streams);
+}
+
 static struct iommu_ops arm_smmu_ops;
 
 static int arm_smmu_add_device(struct device *dev)
 {
-	int i, ret;
+	int ret;
 	struct arm_smmu_device *smmu;
 	struct arm_smmu_master *master;
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
@@ -3330,26 +3455,11 @@ static int arm_smmu_add_device(struct device *dev)
 
 	master->dev = dev;
 	master->smmu = smmu;
-	master->sids = fwspec->ids;
-	master->num_sids = fwspec->num_ids;
 	dev_iommu_priv_set(dev, master);
 
-	/* Check the SIDs are in range of the SMMU and our stream table */
-	for (i = 0; i < master->num_sids; i++) {
-		u32 sid = master->sids[i];
-
-		if (!arm_smmu_sid_in_range(smmu, sid)) {
-			ret = -ERANGE;
-			goto err_free_master;
-		}
-
-		/* Ensure l2 strtab is initialised */
-		if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
-			ret = arm_smmu_init_l2_strtab(smmu, sid);
-			if (ret)
-				goto err_free_master;
-		}
-	}
+	ret = arm_smmu_insert_master(smmu, master);
+	if (ret)
+		goto err_free_master;
 
 	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
 
@@ -3384,6 +3494,7 @@ static int arm_smmu_add_device(struct device *dev)
 	iommu_device_unlink(&smmu->iommu, dev);
 err_disable_pasid:
 	arm_smmu_disable_pasid(master);
+	arm_smmu_remove_master(smmu, master);
 err_free_master:
 	kfree(master);
 	dev_iommu_priv_set(dev, NULL);
@@ -3406,6 +3517,7 @@ static void arm_smmu_remove_device(struct device *dev)
 	iommu_group_remove_device(dev);
 	iommu_device_unlink(&smmu->iommu, dev);
 	arm_smmu_disable_pasid(master);
+	arm_smmu_remove_master(smmu, master);
 	kfree(master);
 	iommu_fwspec_free(dev);
 }
@@ -3849,6 +3961,9 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
 {
 	int ret;
 
+	mutex_init(&smmu->streams_mutex);
+	smmu->streams = RB_ROOT;
+
 	ret = arm_smmu_init_queues(smmu);
 	if (ret)
 		return ret;
-- 
2.26.0




* [PATCH v5 21/25] dt-bindings: document stall property for IOMMU masters
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (19 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 20/25] iommu/arm-smmu-v3: Maintain a SID->device structure Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 22/25] iommu/arm-smmu-v3: Add stall support for platform devices Jean-Philippe Brucker
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker, Rob Herring

On ARM systems, some platform devices behind an IOMMU may support stall,
which is the ability to recover from page faults. Let the firmware tell us
when a device supports stall.
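
A master node using the new property might look like this (a sketch;
the compatible string, addresses and IDs are made up):

	dma: dma-controller@80000000 {
		compatible = "vendor,stall-capable-dma";
		reg = <0x80000000 0x10000>;
		iommus = <&smmu 0x24>;
		pasid-num-bits = <16>;
		dma-can-stall;
	};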

Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 .../devicetree/bindings/iommu/iommu.txt        | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/iommu.txt b/Documentation/devicetree/bindings/iommu/iommu.txt
index 3c36334e4f942..26ba9e530f138 100644
--- a/Documentation/devicetree/bindings/iommu/iommu.txt
+++ b/Documentation/devicetree/bindings/iommu/iommu.txt
@@ -92,6 +92,24 @@ Optional properties:
   tagging DMA transactions with an address space identifier. By default,
   this is 0, which means that the device only has one address space.
 
+- dma-can-stall: When present, the master can wait for a transaction to
+  complete for an indefinite amount of time. Upon a translation fault some
+  IOMMUs, instead of aborting the translation immediately, may first
+  notify the driver and keep the transaction in flight. This allows the OS
+  to inspect the fault and, for example, make physical pages resident
+  before updating the mappings and completing the transaction. Such an
+  IOMMU accepts a limited number of simultaneous stalled transactions
+  before having to either put back-pressure on the master or abort new
+  faulting transactions.
+
+  Firmware has to opt in to stalling, because most buses and masters don't
+  support it. In particular it isn't compatible with PCI, where
+  transactions have to complete within a time limit. More generally it
+  won't work in systems and masters that haven't been designed for
+  stalling. For example, in order to handle a stalled transaction the OS
+  may attempt to retrieve pages from secondary storage through a device
+  that is itself in a stalled domain, leading to a deadlock.
+
 
 Notes:
 ======
-- 
2.26.0




* [PATCH v5 22/25] iommu/arm-smmu-v3: Add stall support for platform devices
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (20 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 21/25] dt-bindings: document stall property for IOMMU masters Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 23/25] PCI/ATS: Add PRI stubs Jean-Philippe Brucker
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker

The SMMU provides a Stall model for handling page faults in platform
devices. It is similar to PCI PRI, but doesn't require devices to have
their own translation cache. Instead, faulting transactions are parked and
the OS is given a chance to fix the page tables and retry the transaction.

Enable stall for devices that support it (opt-in by firmware). When an
event corresponds to a translation error, call the IOMMU fault handler. If
the fault is recoverable, it will call us back to terminate or continue
the stall.
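
The resulting round-trip for a recoverable fault looks roughly like this
(a sketch; the iopf worker internals live in the fault handler patch
earlier in this series):

	device stalls the faulting transaction
	      |
	SMMU event queue: event 0x10 (F_TRANSLATION), STALL=1, STAG
	      |
	arm_smmu_evtq_thread() -> arm_smmu_handle_evt()
	      |
	iommu_report_device_fault() queues the fault on evtq.iopf
	      |
	iopf worker: handle_mm_fault() on the bound mm
	      |
	iommu_page_response() -> arm_smmu_page_response()
	      |
	CMDQ_OP_RESUME { sid, stag, RETRY or ABORT } releases the stall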

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
v4->v5: Improve comment for flush()
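
A worked example of the flush() batch counter: with a queue of size 4
and q->batch == 7 on entry, the waiter sleeps until the queue drains or
q->batch reaches 9. The increment to 8 may count events handled before
flush() was entered, but the increment to 9 proves that a full queue's
worth of events was consumed entirely after entry, which necessarily
includes everything queued beforehand. In code (as in the patch below):

	u64 batch = q->batch; /* snapshot on entry, under q->wq.lock */

	ret = wait_event_interruptible_locked(q->wq,
					      queue_empty(&q->llq) ||
					      q->batch >= batch + 2);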
---
 include/linux/iommu.h       |   2 +
 drivers/iommu/arm-smmu-v3.c | 282 ++++++++++++++++++++++++++++++++++--
 drivers/iommu/of_iommu.c    |   5 +-
 3 files changed, 277 insertions(+), 12 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 4b9c25d7246d5..7dd615954e8c7 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -578,6 +578,7 @@ struct iommu_group *fsl_mc_device_group(struct device *dev);
  * @iommu_fwnode: firmware handle for this device's IOMMU
  * @iommu_priv: IOMMU driver private data for this device
  * @num_pasid_bits: number of PASID bits supported by this device
+ * @can_stall: the device is allowed to stall
  * @num_ids: number of associated device IDs
  * @ids: IDs which this device may present to the IOMMU
  */
@@ -585,6 +586,7 @@ struct iommu_fwspec {
 	const struct iommu_ops	*ops;
 	struct fwnode_handle	*iommu_fwnode;
 	u32			num_pasid_bits;
+	bool			can_stall;
 	unsigned int		num_ids;
 	u32			ids[];
 };
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 7a4c5914a2fe2..a7becf1c5347e 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -382,6 +382,13 @@
 #define CMDQ_PRI_1_GRPID		GENMASK_ULL(8, 0)
 #define CMDQ_PRI_1_RESP			GENMASK_ULL(13, 12)
 
+#define CMDQ_RESUME_0_SID		GENMASK_ULL(63, 32)
+#define CMDQ_RESUME_0_RESP_TERM		0UL
+#define CMDQ_RESUME_0_RESP_RETRY	1UL
+#define CMDQ_RESUME_0_RESP_ABORT	2UL
+#define CMDQ_RESUME_0_RESP		GENMASK_ULL(13, 12)
+#define CMDQ_RESUME_1_STAG		GENMASK_ULL(15, 0)
+
 #define CMDQ_SYNC_0_CS			GENMASK_ULL(13, 12)
 #define CMDQ_SYNC_0_CS_NONE		0
 #define CMDQ_SYNC_0_CS_IRQ		1
@@ -398,6 +405,25 @@
 
 #define EVTQ_0_ID			GENMASK_ULL(7, 0)
 
+#define EVT_ID_TRANSLATION_FAULT	0x10
+#define EVT_ID_ADDR_SIZE_FAULT		0x11
+#define EVT_ID_ACCESS_FAULT		0x12
+#define EVT_ID_PERMISSION_FAULT		0x13
+
+#define EVTQ_0_SSV			(1UL << 11)
+#define EVTQ_0_SSID			GENMASK_ULL(31, 12)
+#define EVTQ_0_SID			GENMASK_ULL(63, 32)
+#define EVTQ_1_STAG			GENMASK_ULL(15, 0)
+#define EVTQ_1_STALL			(1UL << 31)
+#define EVTQ_1_PRIV			(1UL << 33)
+#define EVTQ_1_EXEC			(1UL << 34)
+#define EVTQ_1_READ			(1UL << 35)
+#define EVTQ_1_S2			(1UL << 39)
+#define EVTQ_1_CLASS			GENMASK_ULL(41, 40)
+#define EVTQ_1_TT_READ			(1UL << 44)
+#define EVTQ_2_ADDR			GENMASK_ULL(63, 0)
+#define EVTQ_3_IPA			GENMASK_ULL(51, 12)
+
 /* PRI queue */
 #define PRIQ_ENT_SZ_SHIFT		4
 #define PRIQ_ENT_DWORDS			((1 << PRIQ_ENT_SZ_SHIFT) >> 3)
@@ -522,6 +548,13 @@ struct arm_smmu_cmdq_ent {
 			enum pri_resp		resp;
 		} pri;
 
+		#define CMDQ_OP_RESUME		0x44
+		struct {
+			u32			sid;
+			u16			stag;
+			u8			resp;
+		} resume;
+
 		#define CMDQ_OP_CMD_SYNC	0x46
 		struct {
 			u64			msiaddr;
@@ -557,6 +590,10 @@ struct arm_smmu_queue {
 
 	u32 __iomem			*prod_reg;
 	u32 __iomem			*cons_reg;
+
+	/* Event and PRI */
+	u64				batch;
+	wait_queue_head_t		wq;
 };
 
 struct arm_smmu_queue_poll {
@@ -580,6 +617,7 @@ struct arm_smmu_cmdq_batch {
 
 struct arm_smmu_evtq {
 	struct arm_smmu_queue		q;
+	struct iopf_queue		*iopf;
 	u32				max_stalls;
 };
 
@@ -717,6 +755,7 @@ struct arm_smmu_master {
 	struct arm_smmu_stream		*streams;
 	unsigned int			num_streams;
 	bool				ats_enabled;
+	bool				stall_enabled;
 	unsigned int			ssid_bits;
 };
 
@@ -734,6 +773,7 @@ struct arm_smmu_domain {
 
 	struct io_pgtable_ops		*pgtbl_ops;
 	bool				non_strict;
+	bool				stall_enabled;
 	atomic_t			nr_ats_masters;
 
 	enum arm_smmu_domain_stage	stage;
@@ -1004,6 +1044,11 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		}
 		cmd[1] |= FIELD_PREP(CMDQ_PRI_1_RESP, ent->pri.resp);
 		break;
+	case CMDQ_OP_RESUME:
+		cmd[0] |= FIELD_PREP(CMDQ_RESUME_0_SID, ent->resume.sid);
+		cmd[0] |= FIELD_PREP(CMDQ_RESUME_0_RESP, ent->resume.resp);
+		cmd[1] |= FIELD_PREP(CMDQ_RESUME_1_STAG, ent->resume.stag);
+		break;
 	case CMDQ_OP_CMD_SYNC:
 		if (ent->sync.msiaddr) {
 			cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_IRQ);
@@ -1570,6 +1615,45 @@ static int arm_smmu_cmdq_batch_submit(struct arm_smmu_device *smmu,
 	return arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, true);
 }
 
+static int arm_smmu_page_response(struct device *dev,
+				  struct iommu_fault_event *unused,
+				  struct iommu_page_response *resp)
+{
+	struct arm_smmu_cmdq_ent cmd = {0};
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+	int sid = master->streams[0].id;
+
+	if (master->stall_enabled) {
+		cmd.opcode		= CMDQ_OP_RESUME;
+		cmd.resume.sid		= sid;
+		cmd.resume.stag		= resp->grpid;
+		switch (resp->code) {
+		case IOMMU_PAGE_RESP_INVALID:
+		case IOMMU_PAGE_RESP_FAILURE:
+			cmd.resume.resp = CMDQ_RESUME_0_RESP_ABORT;
+			break;
+		case IOMMU_PAGE_RESP_SUCCESS:
+			cmd.resume.resp = CMDQ_RESUME_0_RESP_RETRY;
+			break;
+		default:
+			return -EINVAL;
+		}
+	} else {
+		/* TODO: insert PRI response here */
+		return -ENODEV;
+	}
+
+	arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
+	/*
+	 * Don't send a SYNC, it doesn't do anything for RESUME or PRI_RESP.
+	 * RESUME consumption guarantees that the stalled transaction will be
+	 * terminated... at some point in the future. PRI_RESP is fire and
+	 * forget.
+	 */
+
+	return 0;
+}
+
 /* Context descriptor manipulation functions */
 static void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid)
 {
@@ -1733,8 +1817,7 @@ static int __arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 			FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) |
 			CTXDESC_CD_0_V;
 
-		/* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
-		if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
+		if (smmu_domain->stall_enabled)
 			val |= CTXDESC_CD_0_S;
 	}
 
@@ -2154,7 +2237,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 			 FIELD_PREP(STRTAB_STE_1_STRW, strw));
 
 		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
-		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
+		    !master->stall_enabled)
 			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 
 		val |= (s1_cfg->cdcfg.cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
@@ -2231,7 +2314,6 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
 	return 0;
 }
 
-__maybe_unused
 static struct arm_smmu_master *
 arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
 {
@@ -2258,23 +2340,123 @@ arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
 }
 
 /* IRQ and event handlers */
+static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
+{
+	int ret;
+	u32 perm = 0;
+	struct arm_smmu_master *master;
+	bool ssid_valid = evt[0] & EVTQ_0_SSV;
+	u8 type = FIELD_GET(EVTQ_0_ID, evt[0]);
+	u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]);
+	struct iommu_fault_event fault_evt = { };
+	struct iommu_fault *flt = &fault_evt.fault;
+
+	/* Stage-2 is always pinned at the moment */
+	if (evt[1] & EVTQ_1_S2)
+		return -EFAULT;
+
+	master = arm_smmu_find_master(smmu, sid);
+	if (!master)
+		return -EINVAL;
+
+	if (evt[1] & EVTQ_1_READ)
+		perm |= IOMMU_FAULT_PERM_READ;
+	else
+		perm |= IOMMU_FAULT_PERM_WRITE;
+
+	if (evt[1] & EVTQ_1_EXEC)
+		perm |= IOMMU_FAULT_PERM_EXEC;
+
+	if (evt[1] & EVTQ_1_PRIV)
+		perm |= IOMMU_FAULT_PERM_PRIV;
+
+	if (evt[1] & EVTQ_1_STALL) {
+		flt->type = IOMMU_FAULT_PAGE_REQ;
+		flt->prm = (struct iommu_fault_page_request) {
+			.flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE,
+			.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]),
+			.grpid = FIELD_GET(EVTQ_1_STAG, evt[1]),
+			.perm = perm,
+			.addr = FIELD_GET(EVTQ_2_ADDR, evt[2]),
+		};
+
+		if (ssid_valid)
+			flt->prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+	} else {
+		flt->type = IOMMU_FAULT_DMA_UNRECOV;
+		flt->event = (struct iommu_fault_unrecoverable) {
+			.flags = IOMMU_FAULT_UNRECOV_ADDR_VALID |
+				 IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID,
+			.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]),
+			.perm = perm,
+			.addr = FIELD_GET(EVTQ_2_ADDR, evt[2]),
+			.fetch_addr = FIELD_GET(EVTQ_3_IPA, evt[3]),
+		};
+
+		if (ssid_valid)
+			flt->event.flags |= IOMMU_FAULT_UNRECOV_PASID_VALID;
+
+		switch (type) {
+		case EVT_ID_TRANSLATION_FAULT:
+		case EVT_ID_ADDR_SIZE_FAULT:
+		case EVT_ID_ACCESS_FAULT:
+			flt->event.reason = IOMMU_FAULT_REASON_PTE_FETCH;
+			break;
+		case EVT_ID_PERMISSION_FAULT:
+			flt->event.reason = IOMMU_FAULT_REASON_PERMISSION;
+			break;
+		default:
+			/* TODO: report other unrecoverable faults. */
+			return -EFAULT;
+		}
+	}
+
+	ret = iommu_report_device_fault(master->dev, &fault_evt);
+	if (ret && flt->type == IOMMU_FAULT_PAGE_REQ) {
+		/* Nobody cared, abort the access */
+		struct iommu_page_response resp = {
+			.pasid		= flt->prm.pasid,
+			.grpid		= flt->prm.grpid,
+			.code		= IOMMU_PAGE_RESP_FAILURE,
+		};
+		arm_smmu_page_response(master->dev, NULL, &resp);
+	}
+
+	return ret;
+}
+
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
-	int i;
+	int i, ret;
+	int num_handled = 0;
 	struct arm_smmu_device *smmu = dev;
 	struct arm_smmu_queue *q = &smmu->evtq.q;
 	struct arm_smmu_ll_queue *llq = &q->llq;
+	size_t queue_size = 1 << llq->max_n_shift;
 	u64 evt[EVTQ_ENT_DWORDS];
 
+	spin_lock(&q->wq.lock);
 	do {
 		while (!queue_remove_raw(q, evt)) {
 			u8 id = FIELD_GET(EVTQ_0_ID, evt[0]);
 
-			dev_info(smmu->dev, "event 0x%02x received:\n", id);
-			for (i = 0; i < ARRAY_SIZE(evt); ++i)
-				dev_info(smmu->dev, "\t0x%016llx\n",
-					 (unsigned long long)evt[i]);
+			spin_unlock(&q->wq.lock);
+			ret = arm_smmu_handle_evt(smmu, evt);
+			spin_lock(&q->wq.lock);
 
+			if (++num_handled == queue_size) {
+				q->batch++;
+				wake_up_all_locked(&q->wq);
+				num_handled = 0;
+			}
+
+			if (ret) {
+				dev_info(smmu->dev, "event 0x%02x received:\n",
+					 id);
+				for (i = 0; i < ARRAY_SIZE(evt); ++i)
+					dev_info(smmu->dev, "\t0x%016llx\n",
+						 (unsigned long long)evt[i]);
+			}
 		}
 
 		/*
@@ -2288,6 +2470,11 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 	/* Sync our overflow flag, as we believe we're up to speed */
 	llq->cons = Q_OVF(llq->prod) | Q_WRP(llq, llq->cons) |
 		    Q_IDX(llq, llq->cons);
+	queue_sync_cons_out(q);
+
+	wake_up_all_locked(&q->wq);
+	spin_unlock(&q->wq.lock);
+
 	return IRQ_HANDLED;
 }
 
@@ -2351,6 +2538,37 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
 	return IRQ_HANDLED;
 }
 
+/*
+ * arm_smmu_flush_evtq - wait until all events currently in the queue have been
+ *                       consumed.
+ *
+ * Wait until there are no more events for this @pasid in the queue. Either
+ * until the queue becomes empty or, if new events are continually added to
+ * the queue, until the event queue thread has handled a full batch (where
+ * one batch corresponds to the queue size). For that we take the batch
+ * number when entering flush() and wait for the event queue thread to
+ * increment it twice. Note that we don't handle overflows on q->batch. If
+ * one occurs, we just wait for the queue to become empty.
+ */
+static int arm_smmu_flush_evtq(void *cookie, struct device *dev, int pasid)
+{
+	int ret;
+	u64 batch;
+	struct arm_smmu_device *smmu = cookie;
+	struct arm_smmu_queue *q = &smmu->evtq.q;
+
+	spin_lock(&q->wq.lock);
+	if (queue_sync_prod_in(q) == -EOVERFLOW)
+		dev_err(smmu->dev, "evtq overflow detected -- requests lost\n");
+
+	batch = q->batch;
+	ret = wait_event_interruptible_locked(q->wq, queue_empty(&q->llq) ||
+					      q->batch >= batch + 2);
+	spin_unlock(&q->wq.lock);
+
+	return ret;
+}
+
 static int arm_smmu_device_disable(struct arm_smmu_device *smmu);
 
 static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
@@ -2781,6 +2999,8 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 
 	cfg->s1cdmax = master->ssid_bits;
 
+	smmu_domain->stall_enabled = master->stall_enabled;
+
 	ret = arm_smmu_alloc_cd_tables(smmu_domain);
 	if (ret)
 		goto out_free_asid;
@@ -3119,6 +3339,10 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 			smmu_domain->s1_cfg.s1cdmax, master->ssid_bits);
 		ret = -EINVAL;
 		goto out_unlock;
+	} else if (smmu_domain->stall_enabled && !master->stall_enabled) {
+		dev_err(dev, "cannot attach to stall-enabled domain\n");
+		ret = -EINVAL;
+		goto out_unlock;
 	}
 
 	master->domain = smmu_domain;
@@ -3477,6 +3701,10 @@ static int arm_smmu_add_device(struct device *dev)
 		master->ssid_bits = min_t(u8, master->ssid_bits,
 					  CTXDESC_LINEAR_CDMAX);
 
+	if ((smmu->features & ARM_SMMU_FEAT_STALLS && fwspec->can_stall) ||
+	    smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
+		master->stall_enabled = true;
+
 	ret = iommu_device_link(&smmu->iommu, dev);
 	if (ret)
 		goto err_disable_pasid;
@@ -3512,6 +3740,7 @@ static void arm_smmu_remove_device(struct device *dev)
 
 	master = dev_iommu_priv_get(dev);
 	smmu = master->smmu;
+	iopf_queue_remove_device(smmu->evtq.iopf, dev);
 	WARN_ON(iommu_sva_disable(dev));
 	arm_smmu_detach_dev(master);
 	iommu_group_remove_device(dev);
@@ -3635,7 +3864,7 @@ static void arm_smmu_get_resv_regions(struct device *dev,
 
 static bool arm_smmu_iopf_supported(struct arm_smmu_master *master)
 {
-	return false;
+	return master->stall_enabled;
 }
 
 static bool arm_smmu_dev_has_feature(struct device *dev,
@@ -3676,6 +3905,7 @@ static bool arm_smmu_dev_feature_enabled(struct device *dev,
 
 static int arm_smmu_dev_enable_sva(struct device *dev)
 {
+	int ret;
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
 	struct iommu_sva_param param = {
 		.min_pasid = 1,
@@ -3683,7 +3913,21 @@ static int arm_smmu_dev_enable_sva(struct device *dev)
 	};
 
 	param.max_pasid = min(param.max_pasid, (1U << master->ssid_bits) - 1);
-	return iommu_sva_enable(dev, &param);
+
+	ret = iommu_sva_enable(dev, &param);
+	if (ret)
+		return ret;
+
+	if (master->stall_enabled) {
+		ret = iopf_queue_add_device(master->smmu->evtq.iopf, dev);
+		if (ret)
+			goto err_disable_sva;
+	}
+	return 0;
+
+err_disable_sva:
+	iommu_sva_disable(dev);
+	return ret;
 }
 
 static int arm_smmu_dev_enable_feature(struct device *dev,
@@ -3706,11 +3950,14 @@ static int arm_smmu_dev_enable_feature(struct device *dev,
 static int arm_smmu_dev_disable_feature(struct device *dev,
 					enum iommu_dev_features feat)
 {
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+
 	if (!arm_smmu_dev_feature_enabled(dev, feat))
 		return -EINVAL;
 
 	switch (feat) {
 	case IOMMU_DEV_FEAT_SVA:
+		iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
 		return iommu_sva_disable(dev);
 	default:
 		return -EINVAL;
@@ -3742,6 +3989,7 @@ static struct iommu_ops arm_smmu_ops = {
 	.sva_bind		= arm_smmu_sva_bind,
 	.sva_unbind		= iommu_sva_unbind_generic,
 	.sva_get_pasid		= iommu_sva_get_pasid_generic,
+	.page_response		= arm_smmu_page_response,
 	.pgsize_bitmap		= -1UL, /* Restricted during device attach */
 };
 
@@ -3785,6 +4033,10 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
 	q->q_base |= FIELD_PREP(Q_BASE_LOG2SIZE, q->llq.max_n_shift);
 
 	q->llq.prod = q->llq.cons = 0;
+
+	init_waitqueue_head(&q->wq);
+	q->batch = 0;
+
 	return 0;
 }
 
@@ -3838,6 +4090,13 @@ static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
 	if (ret)
 		return ret;
 
+	if (smmu->features & ARM_SMMU_FEAT_STALLS) {
+		smmu->evtq.iopf = iopf_queue_alloc(dev_name(smmu->dev),
+						   arm_smmu_flush_evtq, smmu);
+		if (!smmu->evtq.iopf)
+			return -ENOMEM;
+	}
+
 	/* priq */
 	if (!(smmu->features & ARM_SMMU_FEAT_PRI))
 		return 0;
@@ -4818,6 +5077,7 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
 	iommu_device_unregister(&smmu->iommu);
 	iommu_device_sysfs_remove(&smmu->iommu);
 	arm_smmu_device_disable(smmu);
+	iopf_queue_free(smmu->evtq.iopf);
 
 	return 0;
 }
diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 20738aacac89e..dd70177509543 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -205,9 +205,12 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
 		}
 
 		fwspec = dev_iommu_fwspec_get(dev);
-		if (!err && fwspec)
+		if (!err && fwspec) {
 			of_property_read_u32(master_np, "pasid-num-bits",
 					     &fwspec->num_pasid_bits);
+			fwspec->can_stall = of_property_read_bool(master_np,
+								  "dma-can-stall");
+		}
 	}
 
 	/*
-- 
2.26.0




* [PATCH v5 23/25] PCI/ATS: Add PRI stubs
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (21 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 22/25] iommu/arm-smmu-v3: Add stall support for platform devices Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 18:03   ` Kuppuswamy, Sathyanarayanan
  2020-04-14 17:02 ` [PATCH v5 24/25] PCI/ATS: Export PRI functions Jean-Philippe Brucker
  2020-04-14 17:02 ` [PATCH v5 25/25] iommu/arm-smmu-v3: Add support for PRI Jean-Philippe Brucker
  24 siblings, 1 reply; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker, Bjorn Helgaas

The SMMUv3 driver, which can be built without CONFIG_PCI, will soon gain
support for PRI.  Partially revert commit c6e9aefbf9db ("PCI/ATS: Remove
unused PRI and PASID stubs") to re-introduce the PRI stubs, and avoid
adding more #ifdefs to the SMMU driver.
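
With the stubs in place, call sites in the SMMUv3 driver can stay free
of #ifdefs; on a !CONFIG_PCI_PRI build they simply see the error (a
hypothetical call site, for illustration):

	/* Returns -ENODEV from the stub when PRI is compiled out */
	if (pci_enable_pri(pdev, requests))
		return; /* fall back to operating without PRI */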

Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 include/linux/pci-ats.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
index f75c307f346de..e9e266df9b37c 100644
--- a/include/linux/pci-ats.h
+++ b/include/linux/pci-ats.h
@@ -28,6 +28,14 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
 void pci_disable_pri(struct pci_dev *pdev);
 int pci_reset_pri(struct pci_dev *pdev);
 int pci_prg_resp_pasid_required(struct pci_dev *pdev);
+#else /* CONFIG_PCI_PRI */
+static inline int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
+{ return -ENODEV; }
+static inline void pci_disable_pri(struct pci_dev *pdev) { }
+static inline int pci_reset_pri(struct pci_dev *pdev)
+{ return -ENODEV; }
+static inline int pci_prg_resp_pasid_required(struct pci_dev *pdev)
+{ return 0; }
 #endif /* CONFIG_PCI_PRI */
 
 #ifdef CONFIG_PCI_PASID
-- 
2.26.0




* [PATCH v5 24/25] PCI/ATS: Export PRI functions
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (22 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 23/25] PCI/ATS: Add PRI stubs Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  2020-04-14 18:03   ` Kuppuswamy, Sathyanarayanan
  2020-04-14 17:02 ` [PATCH v5 25/25] iommu/arm-smmu-v3: Add support for PRI Jean-Philippe Brucker
  24 siblings, 1 reply; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker, Bjorn Helgaas

The SMMUv3 driver uses pci_{enable,disable}_pri() and related
functions. Export those functions to allow the driver to be built as a
module.

Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/pci/ats.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index bbfd0d42b8b97..fc8fc6fc8bd55 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -197,6 +197,7 @@ void pci_pri_init(struct pci_dev *pdev)
 	if (status & PCI_PRI_STATUS_PASID)
 		pdev->pasid_required = 1;
 }
+EXPORT_SYMBOL_GPL(pci_pri_init);
 
 /**
  * pci_enable_pri - Enable PRI capability
@@ -243,6 +244,7 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(pci_enable_pri);
 
 /**
  * pci_disable_pri - Disable PRI capability
@@ -322,6 +324,7 @@ int pci_reset_pri(struct pci_dev *pdev)
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(pci_reset_pri);
 
 /**
  * pci_prg_resp_pasid_required - Return PRG Response PASID Required bit
@@ -337,6 +340,7 @@ int pci_prg_resp_pasid_required(struct pci_dev *pdev)
 
 	return pdev->pasid_required;
 }
+EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
 #endif /* CONFIG_PCI_PRI */
 
 #ifdef CONFIG_PCI_PASID
-- 
2.26.0




* [PATCH v5 25/25] iommu/arm-smmu-v3: Add support for PRI
  2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
                   ` (23 preceding siblings ...)
  2020-04-14 17:02 ` [PATCH v5 24/25] PCI/ATS: Export PRI functions Jean-Philippe Brucker
@ 2020-04-14 17:02 ` Jean-Philippe Brucker
  24 siblings, 0 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-14 17:02 UTC (permalink / raw)
  To: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Jean-Philippe Brucker

For PCI devices that support it, enable the PRI capability and handle PRI
Page Requests with the generic fault handler. PRI is enabled on demand,
when the device driver enables the IOMMU_DEV_FEAT_SVA feature.
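
The enable path is expected to look roughly like this (a sketch;
arm_smmu_enable_pri() and the queue depth are assumptions based on the
rest of this patch, not authoritative):

	static void arm_smmu_enable_pri(struct arm_smmu_master *master)
	{
		int ret;
		struct pci_dev *pdev;
		/* Rough bound on the PPRs the device may have in flight */
		u32 requests = 64;

		if (!dev_is_pci(master->dev) || !master->pri_supported)
			return;

		pdev = to_pci_dev(master->dev);

		/* Clear stale PRG state before (re)enabling */
		ret = pci_reset_pri(pdev);
		if (!ret)
			ret = pci_enable_pri(pdev, requests);
		if (ret)
			dev_err(master->dev, "cannot enable PRI: %d\n", ret);
	}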

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 drivers/iommu/arm-smmu-v3.c | 284 +++++++++++++++++++++++++++++-------
 1 file changed, 234 insertions(+), 50 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index a7becf1c5347e..8017700c33c46 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -251,6 +251,7 @@
 #define STRTAB_STE_1_S1COR		GENMASK_ULL(5, 4)
 #define STRTAB_STE_1_S1CSH		GENMASK_ULL(7, 6)
 
+#define STRTAB_STE_1_PPAR		(1UL << 18)
 #define STRTAB_STE_1_S1STALLD		(1UL << 27)
 
 #define STRTAB_STE_1_EATS		GENMASK_ULL(29, 28)
@@ -381,6 +382,9 @@
 #define CMDQ_PRI_0_SID			GENMASK_ULL(63, 32)
 #define CMDQ_PRI_1_GRPID		GENMASK_ULL(8, 0)
 #define CMDQ_PRI_1_RESP			GENMASK_ULL(13, 12)
+#define CMDQ_PRI_1_RESP_FAILURE		0UL
+#define CMDQ_PRI_1_RESP_INVALID		1UL
+#define CMDQ_PRI_1_RESP_SUCCESS		2UL
 
 #define CMDQ_RESUME_0_SID		GENMASK_ULL(63, 32)
 #define CMDQ_RESUME_0_RESP_TERM		0UL
@@ -453,12 +457,6 @@ module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_bypass,
 	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
 
-enum pri_resp {
-	PRI_RESP_DENY = 0,
-	PRI_RESP_FAIL = 1,
-	PRI_RESP_SUCC = 2,
-};
-
 enum arm_smmu_msi_index {
 	EVTQ_MSI_INDEX,
 	GERROR_MSI_INDEX,
@@ -545,7 +543,7 @@ struct arm_smmu_cmdq_ent {
 			u32			sid;
 			u32			ssid;
 			u16			grpid;
-			enum pri_resp		resp;
+			u8			resp;
 		} pri;
 
 		#define CMDQ_OP_RESUME		0x44
@@ -623,6 +621,7 @@ struct arm_smmu_evtq {
 
 struct arm_smmu_priq {
 	struct arm_smmu_queue		q;
+	struct iopf_queue		*iopf;
 };
 
 /* High-level stream table and context descriptor structures */
@@ -756,6 +755,8 @@ struct arm_smmu_master {
 	unsigned int			num_streams;
 	bool				ats_enabled;
 	bool				stall_enabled;
+	bool				pri_supported;
+	bool				prg_resp_needs_ssid;
 	unsigned int			ssid_bits;
 };
 
@@ -1034,14 +1035,6 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		cmd[0] |= FIELD_PREP(CMDQ_PRI_0_SSID, ent->pri.ssid);
 		cmd[0] |= FIELD_PREP(CMDQ_PRI_0_SID, ent->pri.sid);
 		cmd[1] |= FIELD_PREP(CMDQ_PRI_1_GRPID, ent->pri.grpid);
-		switch (ent->pri.resp) {
-		case PRI_RESP_DENY:
-		case PRI_RESP_FAIL:
-		case PRI_RESP_SUCC:
-			break;
-		default:
-			return -EINVAL;
-		}
 		cmd[1] |= FIELD_PREP(CMDQ_PRI_1_RESP, ent->pri.resp);
 		break;
 	case CMDQ_OP_RESUME:
@@ -1621,6 +1614,7 @@ static int arm_smmu_page_response(struct device *dev,
 {
 	struct arm_smmu_cmdq_ent cmd = {0};
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+	bool pasid_valid = resp->flags & IOMMU_PAGE_RESP_PASID_VALID;
 	int sid = master->streams[0].id;
 
 	if (master->stall_enabled) {
@@ -1638,8 +1632,27 @@ static int arm_smmu_page_response(struct device *dev,
 		default:
 			return -EINVAL;
 		}
+	} else if (master->pri_supported) {
+		cmd.opcode		= CMDQ_OP_PRI_RESP;
+		cmd.substream_valid	= pasid_valid &&
+					  master->prg_resp_needs_ssid;
+		cmd.pri.sid		= sid;
+		cmd.pri.ssid		= resp->pasid;
+		cmd.pri.grpid		= resp->grpid;
+		switch (resp->code) {
+		case IOMMU_PAGE_RESP_FAILURE:
+			cmd.pri.resp = CMDQ_PRI_1_RESP_FAILURE;
+			break;
+		case IOMMU_PAGE_RESP_INVALID:
+			cmd.pri.resp = CMDQ_PRI_1_RESP_INVALID;
+			break;
+		case IOMMU_PAGE_RESP_SUCCESS:
+			cmd.pri.resp = CMDQ_PRI_1_RESP_SUCCESS;
+			break;
+		default:
+			return -EINVAL;
+		}
 	} else {
-		/* TODO: insert PRI response here */
 		return -ENODEV;
 	}
 
@@ -2236,6 +2249,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 			 FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
 			 FIELD_PREP(STRTAB_STE_1_STRW, strw));
 
+		if (master->prg_resp_needs_ssid)
+			dst[1] |= STRTAB_STE_1_PPAR;
+
 		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
 		    !master->stall_enabled)
 			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
@@ -2480,61 +2496,110 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 
 static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
 {
-	u32 sid, ssid;
-	u16 grpid;
-	bool ssv, last;
-
-	sid = FIELD_GET(PRIQ_0_SID, evt[0]);
-	ssv = FIELD_GET(PRIQ_0_SSID_V, evt[0]);
-	ssid = ssv ? FIELD_GET(PRIQ_0_SSID, evt[0]) : 0;
-	last = FIELD_GET(PRIQ_0_PRG_LAST, evt[0]);
-	grpid = FIELD_GET(PRIQ_1_PRG_IDX, evt[1]);
-
-	dev_info(smmu->dev, "unexpected PRI request received:\n");
-	dev_info(smmu->dev,
-		 "\tsid 0x%08x.0x%05x: [%u%s] %sprivileged %s%s%s access at iova 0x%016llx\n",
-		 sid, ssid, grpid, last ? "L" : "",
-		 evt[0] & PRIQ_0_PERM_PRIV ? "" : "un",
-		 evt[0] & PRIQ_0_PERM_READ ? "R" : "",
-		 evt[0] & PRIQ_0_PERM_WRITE ? "W" : "",
-		 evt[0] & PRIQ_0_PERM_EXEC ? "X" : "",
-		 evt[1] & PRIQ_1_ADDR_MASK);
-
-	if (last) {
-		struct arm_smmu_cmdq_ent cmd = {
-			.opcode			= CMDQ_OP_PRI_RESP,
-			.substream_valid	= ssv,
-			.pri			= {
-				.sid	= sid,
-				.ssid	= ssid,
-				.grpid	= grpid,
-				.resp	= PRI_RESP_DENY,
-			},
+	u32 sid = FIELD_GET(PRIQ_0_SID, evt[0]);
+
+	bool pasid_valid, last;
+	struct arm_smmu_master *master;
+	struct iommu_fault_event fault_evt = {
+		.fault.type = IOMMU_FAULT_PAGE_REQ,
+		.fault.prm = {
+			.pasid		= FIELD_GET(PRIQ_0_SSID, evt[0]),
+			.grpid		= FIELD_GET(PRIQ_1_PRG_IDX, evt[1]),
+			.addr		= evt[1] & PRIQ_1_ADDR_MASK,
+		},
+	};
+	struct iommu_fault_page_request *pr = &fault_evt.fault.prm;
+
+	pasid_valid = evt[0] & PRIQ_0_SSID_V;
+	last = evt[0] & PRIQ_0_PRG_LAST;
+
+	/* Discard Stop PASID marker, it isn't used */
+	if (!(evt[0] & (PRIQ_0_PERM_READ | PRIQ_0_PERM_WRITE)) && last)
+		return;
+
+	if (last)
+		pr->flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+	if (pasid_valid)
+		pr->flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+	if (evt[0] & PRIQ_0_PERM_READ)
+		pr->perm |= IOMMU_FAULT_PERM_READ;
+	if (evt[0] & PRIQ_0_PERM_WRITE)
+		pr->perm |= IOMMU_FAULT_PERM_WRITE;
+	if (evt[0] & PRIQ_0_PERM_EXEC)
+		pr->perm |= IOMMU_FAULT_PERM_EXEC;
+	if (evt[0] & PRIQ_0_PERM_PRIV)
+		pr->perm |= IOMMU_FAULT_PERM_PRIV;
+
+	master = arm_smmu_find_master(smmu, sid);
+	if (WARN_ON(!master))
+		return;
+
+	if (iommu_report_device_fault(master->dev, &fault_evt)) {
+		/*
+		 * No handler registered, so subsequent faults won't produce
+		 * better results. Try to disable PRI.
+		 */
+		struct iommu_page_response resp = {
+			.flags		= pasid_valid ?
+					  IOMMU_PAGE_RESP_PASID_VALID : 0,
+			.pasid		= pr->pasid,
+			.grpid		= pr->grpid,
+			.code		= IOMMU_PAGE_RESP_FAILURE,
 		};
 
-		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+		dev_warn(master->dev,
+			 "PPR 0x%x:0x%llx 0x%x: nobody cared, disabling PRI\n",
+			 pasid_valid ? pr->pasid : 0, pr->addr, pr->perm);
+		if (last)
+			arm_smmu_page_response(master->dev, NULL, &resp);
 	}
 }
 
 static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
 {
+	int num_handled = 0;
+	bool overflow = false;
 	struct arm_smmu_device *smmu = dev;
 	struct arm_smmu_queue *q = &smmu->priq.q;
 	struct arm_smmu_ll_queue *llq = &q->llq;
+	size_t queue_size = 1 << llq->max_n_shift;
 	u64 evt[PRIQ_ENT_DWORDS];
 
+	spin_lock(&q->wq.lock);
 	do {
-		while (!queue_remove_raw(q, evt))
+		while (!queue_remove_raw(q, evt)) {
+			spin_unlock(&q->wq.lock);
 			arm_smmu_handle_ppr(smmu, evt);
+			spin_lock(&q->wq.lock);
+			if (++num_handled == queue_size) {
+				q->batch++;
+				wake_up_all_locked(&q->wq);
+				num_handled = 0;
+			}
+		}
 
-		if (queue_sync_prod_in(q) == -EOVERFLOW)
+		if (queue_sync_prod_in(q) == -EOVERFLOW) {
 			dev_err(smmu->dev, "PRIQ overflow detected -- requests lost\n");
+			overflow = true;
+		}
 	} while (!queue_empty(llq));
 
 	/* Sync our overflow flag, as we believe we're up to speed */
 	llq->cons = Q_OVF(llq->prod) | Q_WRP(llq, llq->cons) |
 		      Q_IDX(llq, llq->cons);
 	queue_sync_cons_out(q);
+
+	wake_up_all_locked(&q->wq);
+	spin_unlock(&q->wq.lock);
+
+	/*
+	 * On overflow, the SMMU might have discarded the last PPR in a group.
+	 * There is no way to know more about it, so we have to discard all
+	 * partial faults already queued.
+	 */
+	if (overflow)
+		iopf_queue_discard_partial(smmu->priq.iopf);
+
 	return IRQ_HANDLED;
 }
 
@@ -2569,6 +2634,36 @@ static int arm_smmu_flush_evtq(void *cookie, struct device *dev, int pasid)
 	return ret;
 }
 
+/*
+ * arm_smmu_flush_priq - wait until all requests currently in the queue have
+ *                       been consumed.
+ *
+ * See arm_smmu_flush_evtq().
+ */
+static int arm_smmu_flush_priq(void *cookie, struct device *dev, int pasid)
+{
+	int ret;
+	u64 batch;
+	bool overflow = false;
+	struct arm_smmu_device *smmu = cookie;
+	struct arm_smmu_queue *q = &smmu->priq.q;
+
+	spin_lock(&q->wq.lock);
+	if (queue_sync_prod_in(q) == -EOVERFLOW) {
+		dev_err(smmu->dev, "priq overflow detected -- requests lost\n");
+		overflow = true;
+	}
+
+	batch = q->batch;
+	ret = wait_event_interruptible_locked(q->wq, queue_empty(&q->llq) ||
+					      q->batch >= batch + 2);
+	spin_unlock(&q->wq.lock);
+
+	if (overflow)
+		iopf_queue_discard_partial(smmu->priq.iopf);
+	return ret;
+}
+
 static int arm_smmu_device_disable(struct arm_smmu_device *smmu);
 
 static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
@@ -3270,6 +3365,75 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
 	pci_disable_pasid(pdev);
 }
 
+static int arm_smmu_init_pri(struct arm_smmu_master *master)
+{
+	int pos;
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(master->dev))
+		return -EINVAL;
+
+	if (!(master->smmu->features & ARM_SMMU_FEAT_PRI))
+		return 0;
+
+	pdev = to_pci_dev(master->dev);
+	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
+	if (!pos)
+		return 0;
+
+	/* If the device supports PASID and PRI, set STE.PPAR */
+	if (master->ssid_bits)
+		master->prg_resp_needs_ssid = pci_prg_resp_pasid_required(pdev);
+
+	master->pri_supported = true;
+	return 0;
+}
+
+static int arm_smmu_enable_pri(struct arm_smmu_master *master)
+{
+	int ret;
+	struct pci_dev *pdev;
+	/*
+	 * TODO: find a good inflight PPR number. We should divide the PRI queue
+	 * by the number of PRI-capable devices, but it's impossible to know
+	 * about future (probed late or hotplugged) devices. So we're at risk of
+	 * dropping PPRs (and leaking pending requests in the FQ).
+	 */
+	size_t max_inflight_pprs = 16;
+
+	if (!master->pri_supported || !master->ats_enabled)
+		return -ENODEV;
+
+	pdev = to_pci_dev(master->dev);
+
+	ret = pci_reset_pri(pdev);
+	if (ret)
+		return ret;
+
+	ret = pci_enable_pri(pdev, max_inflight_pprs);
+	if (ret) {
+		dev_err(master->dev, "cannot enable PRI: %d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void arm_smmu_disable_pri(struct arm_smmu_master *master)
+{
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(master->dev))
+		return;
+
+	pdev = to_pci_dev(master->dev);
+
+	if (!pdev->pri_enabled)
+		return;
+
+	pci_disable_pri(pdev);
+}
+
 static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 {
 	unsigned long flags;
@@ -3705,6 +3869,8 @@ static int arm_smmu_add_device(struct device *dev)
 	    smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
 		master->stall_enabled = true;
 
+	arm_smmu_init_pri(master);
+
 	ret = iommu_device_link(&smmu->iommu, dev);
 	if (ret)
 		goto err_disable_pasid;
@@ -3741,6 +3907,7 @@ static void arm_smmu_remove_device(struct device *dev)
 	master = dev_iommu_priv_get(dev);
 	smmu = master->smmu;
 	iopf_queue_remove_device(smmu->evtq.iopf, dev);
+	iopf_queue_remove_device(smmu->priq.iopf, dev);
 	WARN_ON(iommu_sva_disable(dev));
 	arm_smmu_detach_dev(master);
 	iommu_group_remove_device(dev);
@@ -3864,7 +4031,7 @@ static void arm_smmu_get_resv_regions(struct device *dev,
 
 static bool arm_smmu_iopf_supported(struct arm_smmu_master *master)
 {
-	return master->stall_enabled;
+	return master->stall_enabled || master->pri_supported;
 }
 
 static bool arm_smmu_dev_has_feature(struct device *dev,
@@ -3922,6 +4089,15 @@ static int arm_smmu_dev_enable_sva(struct device *dev)
 		ret = iopf_queue_add_device(master->smmu->evtq.iopf, dev);
 		if (ret)
 			goto err_disable_sva;
+	} else if (master->pri_supported) {
+		ret = iopf_queue_add_device(master->smmu->priq.iopf, dev);
+		if (ret)
+			goto err_disable_sva;
+
+		if (arm_smmu_enable_pri(master)) {
+			iopf_queue_remove_device(master->smmu->priq.iopf, dev);
+			goto err_disable_sva;
+		}
 	}
 	return 0;
 
@@ -3957,7 +4133,9 @@ static int arm_smmu_dev_disable_feature(struct device *dev,
 
 	switch (feat) {
 	case IOMMU_DEV_FEAT_SVA:
+		arm_smmu_disable_pri(master);
 		iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
+		iopf_queue_remove_device(master->smmu->priq.iopf, dev);
 		return iommu_sva_disable(dev);
 	default:
 		return -EINVAL;
@@ -4101,6 +4279,11 @@ static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
 	if (!(smmu->features & ARM_SMMU_FEAT_PRI))
 		return 0;
 
+	smmu->priq.iopf = iopf_queue_alloc(dev_name(smmu->dev),
+					   arm_smmu_flush_priq, smmu);
+	if (!smmu->priq.iopf)
+		return -ENOMEM;
+
 	return arm_smmu_init_one_queue(smmu, &smmu->priq.q, ARM_SMMU_PRIQ_PROD,
 				       ARM_SMMU_PRIQ_CONS, PRIQ_ENT_DWORDS,
 				       "priq");
@@ -5078,6 +5261,7 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
 	iommu_device_sysfs_remove(&smmu->iommu);
 	arm_smmu_device_disable(smmu);
 	iopf_queue_free(smmu->evtq.iopf);
+	iopf_queue_free(smmu->priq.iopf);
 
 	return 0;
 }
-- 
2.26.0




* Re: [PATCH v5 23/25] PCI/ATS: Add PRI stubs
  2020-04-14 17:02 ` [PATCH v5 23/25] PCI/ATS: Add PRI stubs Jean-Philippe Brucker
@ 2020-04-14 18:03   ` Kuppuswamy, Sathyanarayanan
  0 siblings, 0 replies; 44+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2020-04-14 18:03 UTC (permalink / raw)
  To: Jean-Philippe Brucker, iommu, devicetree, linux-arm-kernel,
	linux-pci, linux-mm
  Cc: joro, catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo, Bjorn Helgaas

Hi,

On 4/14/20 10:02 AM, Jean-Philippe Brucker wrote:
> The SMMUv3 driver, which can be built without CONFIG_PCI, will soon gain
> support for PRI.  Partially revert commit c6e9aefbf9db ("PCI/ATS: Remove
> unused PRI and PASID stubs") to re-introduce the PRI stubs, and avoid
> adding more #ifdefs to the SMMU driver.
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kuppuswamy Sathyanarayanan 
<sathyanarayanan.kuppuswamy@linux.intel.com>
> ---
>   include/linux/pci-ats.h | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index f75c307f346de..e9e266df9b37c 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -28,6 +28,14 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
>   void pci_disable_pri(struct pci_dev *pdev);
>   int pci_reset_pri(struct pci_dev *pdev);
>   int pci_prg_resp_pasid_required(struct pci_dev *pdev);
> +#else /* CONFIG_PCI_PRI */
> +static inline int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
> +{ return -ENODEV; }
> +static inline void pci_disable_pri(struct pci_dev *pdev) { }
> +static inline int pci_reset_pri(struct pci_dev *pdev)
> +{ return -ENODEV; }
> +static inline int pci_prg_resp_pasid_required(struct pci_dev *pdev)
> +{ return 0; }
>   #endif /* CONFIG_PCI_PRI */
>   
>   #ifdef CONFIG_PCI_PASID
> 



* Re: [PATCH v5 24/25] PCI/ATS: Export PRI functions
  2020-04-14 17:02 ` [PATCH v5 24/25] PCI/ATS: Export PRI functions Jean-Philippe Brucker
@ 2020-04-14 18:03   ` Kuppuswamy, Sathyanarayanan
  0 siblings, 0 replies; 44+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2020-04-14 18:03 UTC (permalink / raw)
  To: Jean-Philippe Brucker, iommu, devicetree, linux-arm-kernel,
	linux-pci, linux-mm
  Cc: kevin.tian, catalin.marinas, robin.murphy, jgg, Bjorn Helgaas,
	zhangfei.gao, will, christian.koenig


Hi,
On 4/14/20 10:02 AM, Jean-Philippe Brucker wrote:
> The SMMUv3 driver uses pci_{enable,disable}_pri() and related
> functions. Export those functions to allow the driver to be built as a
> module.
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Kuppuswamy Sathyanarayanan 
<sathyanarayanan.kuppuswamy@linux.intel.com>
> ---
>   drivers/pci/ats.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index bbfd0d42b8b97..fc8fc6fc8bd55 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -197,6 +197,7 @@ void pci_pri_init(struct pci_dev *pdev)
>   	if (status & PCI_PRI_STATUS_PASID)
>   		pdev->pasid_required = 1;
>   }
> +EXPORT_SYMBOL_GPL(pci_pri_init);
>   
>   /**
>    * pci_enable_pri - Enable PRI capability
> @@ -243,6 +244,7 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>   
>   	return 0;
>   }
> +EXPORT_SYMBOL_GPL(pci_enable_pri);
>   
>   /**
>    * pci_disable_pri - Disable PRI capability
> @@ -322,6 +324,7 @@ int pci_reset_pri(struct pci_dev *pdev)
>   
>   	return 0;
>   }
> +EXPORT_SYMBOL_GPL(pci_reset_pri);
>   
>   /**
>    * pci_prg_resp_pasid_required - Return PRG Response PASID Required bit
> @@ -337,6 +340,7 @@ int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>   
>   	return pdev->pasid_required;
>   }
> +EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
>   #endif /* CONFIG_PCI_PRI */
>   
>   #ifdef CONFIG_PCI_PASID
> 



* Re: [PATCH v5 01/25] mm/mmu_notifiers: pass private data down to alloc_notifier()
  2020-04-14 17:02 ` [PATCH v5 01/25] mm/mmu_notifiers: pass private data down to alloc_notifier() Jean-Philippe Brucker
@ 2020-04-14 18:09   ` Jason Gunthorpe
  0 siblings, 0 replies; 44+ messages in thread
From: Jason Gunthorpe @ 2020-04-14 18:09 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm, joro,
	catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	xuzaibo, Andrew Morton, Arnd Bergmann, Christoph Hellwig,
	Dimitri Sivanich, Greg Kroah-Hartman

On Tue, Apr 14, 2020 at 07:02:29PM +0200, Jean-Philippe Brucker wrote:
> The new allocation scheme introduced by commit 2c7933f53f6b
> ("mm/mmu_notifiers: add a get/put scheme for the registration") provides
> a convenient way for users to attach notifier data to an mm. However, it
> would be even better to create this notifier data atomically.
> 
> Since the alloc_notifier() callback only takes an mm argument at the
> moment, some users have to perform the allocation in two times.
> alloc_notifier() initially creates an incomplete structure, which is
> then finalized using more context once mmu_notifier_get() returns. This
> second step requires extra care to order memory accesses against live
> invalidation.
> 
> The IOMMU SVA module, which attaches an mm to multiple devices,
> exemplifies this situation. In essence it does:
> 
> 	mmu_notifier_get()
> 	  alloc_notifier()
> 	     A = kzalloc()
> 	  /* MMU notifier is published */
> 	A->ctx = ctx;				// (1)
> 	device->A = A;
> 	list_add_rcu(device, A->devices);	// (2)
> 
> The invalidate notifier, which may start running before A is fully
> initialized, does the following:
> 
> 	io_mm_invalidate(A)
> 	  list_for_each_entry_rcu(device, A->devices)
> 	    device->invalidate(A->ctx)

This could probably also have been reliably fixed by not having A->ctx
be allocated memory, but inlined into the notifier struct

But I can't think of a down side to not add a params either.

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Regards,
Jason



* Re: [PATCH v5 02/25] iommu/sva: Manage process address spaces
  2020-04-14 17:02 ` [PATCH v5 02/25] iommu/sva: Manage process address spaces Jean-Philippe Brucker
@ 2020-04-16  7:28   ` Christoph Hellwig
  2020-04-16  8:54     ` Jean-Philippe Brucker
  0 siblings, 1 reply; 44+ messages in thread
From: Christoph Hellwig @ 2020-04-16  7:28 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm, joro,
	catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo

> +	rcu_read_lock();
> +	hlist_for_each_entry_rcu(bond, &io_mm->devices, mm_node)
> +		io_mm->ops->invalidate(bond->sva.dev, io_mm->pasid, io_mm->ctx,
> +				       start, end - start);
> +	rcu_read_unlock();
> +}

What is the reason that the devices don't register their own notifiers?
This kind of multiplexing is always rather messy, and you do it for
all the methods.



* Re: [PATCH v5 02/25] iommu/sva: Manage process address spaces
  2020-04-16  7:28   ` Christoph Hellwig
@ 2020-04-16  8:54     ` Jean-Philippe Brucker
  2020-04-16 12:13       ` Christoph Hellwig
  0 siblings, 1 reply; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-16  8:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm, joro,
	catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo

On Thu, Apr 16, 2020 at 12:28:52AM -0700, Christoph Hellwig wrote:
> > +	rcu_read_lock();
> > +	hlist_for_each_entry_rcu(bond, &io_mm->devices, mm_node)
> > +		io_mm->ops->invalidate(bond->sva.dev, io_mm->pasid, io_mm->ctx,
> > +				       start, end - start);
> > +	rcu_read_unlock();
> > +}
> 
> What is the reason that the devices don't register their own notifiers?
> This kinds of multiplexing is always rather messy, and you do it for
> all the methods.

This sends TLB and ATC invalidations through the IOMMU; it doesn't go
through device drivers.

Thanks,
Jean



* Re: [PATCH v5 02/25] iommu/sva: Manage process address spaces
  2020-04-16  8:54     ` Jean-Philippe Brucker
@ 2020-04-16 12:13       ` Christoph Hellwig
  2020-04-20  7:42         ` Jean-Philippe Brucker
  0 siblings, 1 reply; 44+ messages in thread
From: Christoph Hellwig @ 2020-04-16 12:13 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Christoph Hellwig, iommu, devicetree, linux-arm-kernel,
	linux-pci, linux-mm, joro, catalin.marinas, will, robin.murphy,
	kevin.tian, baolu.lu, Jonathan.Cameron, jacob.jun.pan,
	christian.koenig, zhangfei.gao, jgg, xuzaibo

On Thu, Apr 16, 2020 at 10:54:02AM +0200, Jean-Philippe Brucker wrote:
> On Thu, Apr 16, 2020 at 12:28:52AM -0700, Christoph Hellwig wrote:
> > > +	rcu_read_lock();
> > > +	hlist_for_each_entry_rcu(bond, &io_mm->devices, mm_node)
> > > +		io_mm->ops->invalidate(bond->sva.dev, io_mm->pasid, io_mm->ctx,
> > > +				       start, end - start);
> > > +	rcu_read_unlock();
> > > +}
> > 
> > What is the reason that the devices don't register their own notifiers?
> > This kinds of multiplexing is always rather messy, and you do it for
> > all the methods.
> 
> This sends TLB and ATC invalidations through the IOMMU, it doesn't go
> through device drivers

I don't think we mean the same thing, probably because of my rather
imprecise use of the word device.

What I mean is that the mmu_notifier should not be embedded into the
io_mm structure (which, btw, seems to have a way too generic name, just
like all other io_* prefixed names), but instead into the
iommu_bond structure.  That avoids the whole multiplexing layer.



* Re: [PATCH v5 02/25] iommu/sva: Manage process address spaces
  2020-04-16 12:13       ` Christoph Hellwig
@ 2020-04-20  7:42         ` Jean-Philippe Brucker
  2020-04-20  8:10           ` Christoph Hellwig
  2020-04-20 13:57           ` Jason Gunthorpe
  0 siblings, 2 replies; 44+ messages in thread
From: Jean-Philippe Brucker @ 2020-04-20  7:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm, joro,
	catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, christian.koenig, zhangfei.gao,
	jgg, xuzaibo

On Thu, Apr 16, 2020 at 05:13:31AM -0700, Christoph Hellwig wrote:
> On Thu, Apr 16, 2020 at 10:54:02AM +0200, Jean-Philippe Brucker wrote:
> > On Thu, Apr 16, 2020 at 12:28:52AM -0700, Christoph Hellwig wrote:
> > > > +	rcu_read_lock();
> > > > +	hlist_for_each_entry_rcu(bond, &io_mm->devices, mm_node)
> > > > +		io_mm->ops->invalidate(bond->sva.dev, io_mm->pasid, io_mm->ctx,
> > > > +				       start, end - start);
> > > > +	rcu_read_unlock();
> > > > +}
> > > 
> > > What is the reason that the devices don't register their own notifiers?
> > > This kinds of multiplexing is always rather messy, and you do it for
> > > all the methods.
> > 
> > This sends TLB and ATC invalidations through the IOMMU, it doesn't go
> > through device drivers
> 
> I don't think we mean the same thing, probably because of my rather
> imprecise use of the word device.
> 
> What I mean is that the mmu_notifier should not be embedded into the
> io_mm structure (whch btw, seems to have a way to generic name, just
> like all other io_* prefixed names), but instead into the
> iommu_bond structure.  That avoid the whole multiplexing layer.

Right, I can see the appeal. I still like having a single mmu notifier per
mm because it ensures we allocate a single PASID per mm (as required by
x86). I suppose one alternative is to maintain a hashtable of mm->pasid,
to avoid iterating over all bonds during allocation.

Thanks,
Jean



* Re: [PATCH v5 02/25] iommu/sva: Manage process address spaces
  2020-04-20  7:42         ` Jean-Philippe Brucker
@ 2020-04-20  8:10           ` Christoph Hellwig
  2020-04-20 11:44             ` Christian König
  2020-04-20 13:57           ` Jason Gunthorpe
  1 sibling, 1 reply; 44+ messages in thread
From: Christoph Hellwig @ 2020-04-20  8:10 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Christoph Hellwig, iommu, devicetree, linux-arm-kernel,
	linux-pci, linux-mm, joro, catalin.marinas, will, robin.murphy,
	kevin.tian, baolu.lu, Jonathan.Cameron, jacob.jun.pan,
	christian.koenig, zhangfei.gao, jgg, xuzaibo

On Mon, Apr 20, 2020 at 09:42:13AM +0200, Jean-Philippe Brucker wrote:
> Right, I can see the appeal. I still like having a single mmu notifier per
> mm because it ensures we allocate a single PASID per mm (as required by
> x86). I suppose one alternative is to maintain a hashtable of mm->pasid,
> to avoid iterating over all bonds during allocation.

Given that the PASID is a pretty generic and important concept, can
we just add it directly to the mm_struct and allocate it lazily when
we first need it?



* Re: [PATCH v5 02/25] iommu/sva: Manage process address spaces
  2020-04-20  8:10           ` Christoph Hellwig
@ 2020-04-20 11:44             ` Christian König
  2020-04-20 11:55               ` Christoph Hellwig
  0 siblings, 1 reply; 44+ messages in thread
From: Christian König @ 2020-04-20 11:44 UTC (permalink / raw)
  To: Christoph Hellwig, Jean-Philippe Brucker
  Cc: iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm, joro,
	catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, jacob.jun.pan, zhangfei.gao, jgg, xuzaibo

On 20.04.20 at 10:10, Christoph Hellwig wrote:
> On Mon, Apr 20, 2020 at 09:42:13AM +0200, Jean-Philippe Brucker wrote:
>> Right, I can see the appeal. I still like having a single mmu notifier per
>> mm because it ensures we allocate a single PASID per mm (as required by
>> x86). I suppose one alternative is to maintain a hashtable of mm->pasid,
>> to avoid iterating over all bonds during allocation.
> Given that the PASID is a pretty generic and important concept can
> we just add it directly to the mm_struct and allocate it lazily once
> we first need it?

Well, the problem is that the PASID may well be device specific.
E.g. some devices use 16-bit PASIDs, some 15-bit, some others only 12-bit.

So what could (at least in theory) happen is that you need to allocate 
different PASIDs for the same process because different devices need one.

Regards,
Christian.



* Re: [PATCH v5 02/25] iommu/sva: Manage process address spaces
  2020-04-20 11:44             ` Christian König
@ 2020-04-20 11:55               ` Christoph Hellwig
  2020-04-20 12:40                 ` Christian König
  0 siblings, 1 reply; 44+ messages in thread
From: Christoph Hellwig @ 2020-04-20 11:55 UTC (permalink / raw)
  To: Christian König
  Cc: Christoph Hellwig, Jean-Philippe Brucker, iommu, devicetree,
	linux-arm-kernel, linux-pci, linux-mm, joro, catalin.marinas,
	will, robin.murphy, kevin.tian, baolu.lu, Jonathan.Cameron,
	jacob.jun.pan, zhangfei.gao, jgg, xuzaibo

On Mon, Apr 20, 2020 at 01:44:56PM +0200, Christian König wrote:
> On 20.04.20 at 10:10, Christoph Hellwig wrote:
> > On Mon, Apr 20, 2020 at 09:42:13AM +0200, Jean-Philippe Brucker wrote:
> > > Right, I can see the appeal. I still like having a single mmu notifier per
> > > mm because it ensures we allocate a single PASID per mm (as required by
> > > x86). I suppose one alternative is to maintain a hashtable of mm->pasid,
> > > to avoid iterating over all bonds during allocation.
> > Given that the PASID is a pretty generic and important concept can
> > we just add it directly to the mm_struct and allocate it lazily once
> > we first need it?
> 
> Well the problem is that the PASID might as well be device specific. E.g.
> some devices use 16bit PASIDs, some 15bit, some other only 12bit.
> 
> So what could (at least in theory) happen is that you need to allocate
> different PASIDs for the same process because different devices need one.

This directly contradicts the statement from Jean-Philippe above that
x86 requires a single PASID per mm_struct.  If we may need different
PASIDs for different devices and can actually support this, just
allocating one per [device, mm_struct] would make the most sense to me, as
it doesn't couple otherwise disjoint state.



* Re: [PATCH v5 02/25] iommu/sva: Manage process address spaces
  2020-04-20 11:55               ` Christoph Hellwig
@ 2020-04-20 12:40                 ` Christian König
  2020-04-20 15:00                   ` Felix Kuehling
  0 siblings, 1 reply; 44+ messages in thread
From: Christian König @ 2020-04-20 12:40 UTC (permalink / raw)
  To: Christoph Hellwig, Felix Kuehling
  Cc: Jean-Philippe Brucker, iommu, devicetree, linux-arm-kernel,
	linux-pci, linux-mm, joro, catalin.marinas, will, robin.murphy,
	kevin.tian, baolu.lu, Jonathan.Cameron, jacob.jun.pan,
	zhangfei.gao, jgg, xuzaibo

On 20.04.20 at 13:55, Christoph Hellwig wrote:
> On Mon, Apr 20, 2020 at 01:44:56PM +0200, Christian König wrote:
>> On 20.04.20 at 10:10, Christoph Hellwig wrote:
>>> On Mon, Apr 20, 2020 at 09:42:13AM +0200, Jean-Philippe Brucker wrote:
>>>> Right, I can see the appeal. I still like having a single mmu notifier per
>>>> mm because it ensures we allocate a single PASID per mm (as required by
>>>> x86). I suppose one alternative is to maintain a hashtable of mm->pasid,
>>>> to avoid iterating over all bonds during allocation.
>>> Given that the PASID is a pretty generic and important concept can
>>> we just add it directly to the mm_struct and allocate it lazily once
>>> we first need it?
>> Well the problem is that the PASID might as well be device specific. E.g.
>> some devices use 16bit PASIDs, some 15bit, some other only 12bit.
>>
>> So what could (at least in theory) happen is that you need to allocate
>> different PASIDs for the same process because different devices need one.
> This directly contradicts the statement from Jean-Philippe above that
> x86 requires a single PASID per mm_struct.  If we may need different
> PASIDs for different devices and can actually support this just
> allocating one per [device, mm_struct] would make most sense of me, as
> it doesn't couple otherwise disjoint state.

Well I'm not an expert on this topic. Felix can probably tell you a bit 
more about that.

Maybe it is sufficient to keep the allocated PASIDs as small as possible 
and return an appropriate error if a device can't deal with the 
allocated number.

If a device can only deal with 12-bit PASIDs and more than 2^12
contexts try to use it, there isn't much else we can do than return an
error anyway.

Regards,
Christian.



* Re: [PATCH v5 02/25] iommu/sva: Manage process address spaces
  2020-04-20  7:42         ` Jean-Philippe Brucker
  2020-04-20  8:10           ` Christoph Hellwig
@ 2020-04-20 13:57           ` Jason Gunthorpe
  2020-04-20 17:48             ` Jacob Pan
  1 sibling, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2020-04-20 13:57 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Christoph Hellwig, iommu, devicetree, linux-arm-kernel,
	linux-pci, linux-mm, joro, catalin.marinas, will, robin.murphy,
	kevin.tian, baolu.lu, Jonathan.Cameron, jacob.jun.pan,
	christian.koenig, zhangfei.gao, xuzaibo

On Mon, Apr 20, 2020 at 09:42:13AM +0200, Jean-Philippe Brucker wrote:
> On Thu, Apr 16, 2020 at 05:13:31AM -0700, Christoph Hellwig wrote:
> > On Thu, Apr 16, 2020 at 10:54:02AM +0200, Jean-Philippe Brucker wrote:
> > > On Thu, Apr 16, 2020 at 12:28:52AM -0700, Christoph Hellwig wrote:
> > > > > +	rcu_read_lock();
> > > > > +	hlist_for_each_entry_rcu(bond, &io_mm->devices, mm_node)
> > > > > +		io_mm->ops->invalidate(bond->sva.dev, io_mm->pasid, io_mm->ctx,
> > > > > +				       start, end - start);
> > > > > +	rcu_read_unlock();
> > > > > +}
> > > > 
> > > > What is the reason that the devices don't register their own notifiers?
> > > > This kinds of multiplexing is always rather messy, and you do it for
> > > > all the methods.
> > > 
> > > This sends TLB and ATC invalidations through the IOMMU, it doesn't go
> > > through device drivers
> > 
> > I don't think we mean the same thing, probably because of my rather
> > imprecise use of the word device.
> > 
> > What I mean is that the mmu_notifier should not be embedded into the
> > io_mm structure (whch btw, seems to have a way to generic name, just
> > like all other io_* prefixed names), but instead into the
> > iommu_bond structure.  That avoid the whole multiplexing layer.
> 
> Right, I can see the appeal. I still like having a single mmu notifier per
> mm because it ensures we allocate a single PASID per mm (as required by
> x86). I suppose one alternative is to maintain a hashtable of mm->pasid,
> to avoid iterating over all bonds during allocation.

I've been getting rid of hash tables like this. Adding it to the
mm_struct does seem reasonable; I think PASID is a pretty broad
concept now.

Jason



* Re: [PATCH v5 02/25] iommu/sva: Manage process address spaces
  2020-04-20 12:40                 ` Christian König
@ 2020-04-20 15:00                   ` Felix Kuehling
  2020-04-20 17:44                     ` Jacob Pan
  0 siblings, 1 reply; 44+ messages in thread
From: Felix Kuehling @ 2020-04-20 15:00 UTC (permalink / raw)
  To: Christian König, Christoph Hellwig
  Cc: Jean-Philippe Brucker, iommu, devicetree, linux-arm-kernel,
	linux-pci, linux-mm, joro, catalin.marinas, will, robin.murphy,
	kevin.tian, baolu.lu, Jonathan.Cameron, jacob.jun.pan,
	zhangfei.gao, jgg, xuzaibo

On 2020-04-20 at 8:40 a.m., Christian König wrote:
> On 20.04.20 at 13:55, Christoph Hellwig wrote:
>> On Mon, Apr 20, 2020 at 01:44:56PM +0200, Christian König wrote:
>>> On 20.04.20 at 10:10, Christoph Hellwig wrote:
>>>> On Mon, Apr 20, 2020 at 09:42:13AM +0200, Jean-Philippe Brucker wrote:
>>>>> Right, I can see the appeal. I still like having a single mmu
>>>>> notifier per
>>>>> mm because it ensures we allocate a single PASID per mm (as
>>>>> required by
>>>>> x86). I suppose one alternative is to maintain a hashtable of
>>>>> mm->pasid,
>>>>> to avoid iterating over all bonds during allocation.
>>>> Given that the PASID is a pretty generic and important concept can
>>>> we just add it directly to the mm_struct and allocate it lazily once
>>>> we first need it?
>>> Well the problem is that the PASID might as well be device specific.
>>> E.g.
>>> some devices use 16bit PASIDs, some 15bit, some other only 12bit.
>>>
>>> So what could (at least in theory) happen is that you need to allocate
>>> different PASIDs for the same process because different devices need
>>> one.
>> This directly contradicts the statement from Jean-Philippe above that
>> x86 requires a single PASID per mm_struct.  If we may need different
>> PASIDs for different devices and can actually support this just
>> allocating one per [device, mm_struct] would make most sense of me, as
>> it doesn't couple otherwise disjoint state.
>
> Well I'm not an expert on this topic. Felix can probably tell you a
> bit more about that.
>
> Maybe it is sufficient to keep the allocated PASIDs as small as
> possible and return an appropriate error if a device can't deal with
> the allocated number.
>
> If a device can only deal with 12bit PASIDs and more than 2^12 try to
> use it there isn't much else we can do than returning an error anyway.

I'm probably missing some context. But let me try giving a useful reply.

The hardware allows you to have different PASIDs for each device
referring to the same address space. But I think it's OK for software to
choose not to do that. If Linux wants to manage one PASID namespace for
all devices, that's a reasonable choice IMO.

Different devices have different limits for the size of PASID they can
support. For example AMD GPUs support 16-bits but the IOMMU supports
less. So on APUs we use small PASIDs for contexts that want to use
IOMMUv2 to access memory, but bigger PASIDs for contexts that do not.

I choose the word "context" deliberately, because the amdgpu driver uses
PASIDs even when we're not using IOMMUv2, and we're using them to
identify GPU virtual address spaces. There can be more than one per
process. In practice you can have two, one for graphics (not SVM,
doesn't use IOMMUv2) and one for KFD compute (SVM, can use IOMMUv2 on APUs).

Because the IOMMUv2 supports only smaller PASIDs, we want to avoid
exhausting that space with PASID allocations that don't use the IOMMUv2.
So our PASID allocation function has a "size" parameter, and we try to
allocate a PASID as big as possible in order to leave more precious
smaller PASIDs for contexts that need them.

The bottom line is, when you allocate a PASID for a context, you want to
know how small it needs to be for all the devices that want to use it.
If you make it too big, some device will not be able to use it. If you
make it too small, you waste precious PASIDs that could be used for
other contexts that need them.

Regards,
  Felix




* Re: [PATCH v5 02/25] iommu/sva: Manage process address spaces
  2020-04-20 15:00                   ` Felix Kuehling
@ 2020-04-20 17:44                     ` Jacob Pan
  2020-04-20 17:52                       ` Felix Kuehling
  0 siblings, 1 reply; 44+ messages in thread
From: Jacob Pan @ 2020-04-20 17:44 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: Christian König, Christoph Hellwig, Jean-Philippe Brucker,
	iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm, joro,
	catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, zhangfei.gao, jgg, xuzaibo, jacob.jun.pan

On Mon, 20 Apr 2020 11:00:28 -0400
Felix Kuehling <felix.kuehling@amd.com> wrote:

> On 2020-04-20 at 8:40 a.m., Christian König wrote:
> > On 20.04.20 at 13:55, Christoph Hellwig wrote:
> >> On Mon, Apr 20, 2020 at 01:44:56PM +0200, Christian König wrote:
> >>> On 20.04.20 at 10:10, Christoph Hellwig wrote:
> >>>> On Mon, Apr 20, 2020 at 09:42:13AM +0200, Jean-Philippe Brucker
> >>>> wrote:  
> >>>>> Right, I can see the appeal. I still like having a single mmu
> >>>>> notifier per
> >>>>> mm because it ensures we allocate a single PASID per mm (as
> >>>>> required by
> >>>>> x86). I suppose one alternative is to maintain a hashtable of
> >>>>> mm->pasid,
> >>>>> to avoid iterating over all bonds during allocation.  
> >>>> Given that the PASID is a pretty generic and important concept
> >>>> can we just add it directly to the mm_struct and allocate it
> >>>> lazily once we first need it?  
> >>> Well the problem is that the PASID might as well be device
> >>> specific. E.g.
> >>> some devices use 16bit PASIDs, some 15bit, some other only 12bit.
> >>>
> >>> So what could (at least in theory) happen is that you need to
> >>> allocate different PASIDs for the same process because different
> >>> devices need one.  
> >> This directly contradicts the statement from Jean-Philippe above
> >> that x86 requires a single PASID per mm_struct.  If we may need
> >> different PASIDs for different devices and can actually support
> >> this just allocating one per [device, mm_struct] would make most
> >> sense of me, as it doesn't couple otherwise disjoint state.  
> >
> > Well I'm not an expert on this topic. Felix can probably tell you a
> > bit more about that.
> >
> > Maybe it is sufficient to keep the allocated PASIDs as small as
> > possible and return an appropriate error if a device can't deal with
> > the allocated number.
> >
> > If a device can only deal with 12bit PASIDs and more than 2^12 try
> > to use it there isn't much else we can do than returning an error
> > anyway.  
> 
> I'm probably missing some context. But let me try giving a useful
> reply.
> 
> The hardware allows you to have different PASIDs for each device
> referring to the same address space. But I think it's OK for software
> to choose not to do that. If Linux wants to manage one PASID
> namespace for all devices, that's a reasonable choice IMO.
> 
On VT-d, a system-wide PASID namespace is required. Here is a section
of the documentation I am working on.

Namespaces
====================================================
IOASIDs are limited system resources that default to 20 bits in
size. Since each device has its own table, theoretically the namespace
could be per device as well. However, VT-d also supports shared work
queues and ENQCMD[1], where one IOASID can be used to submit work on
multiple devices. This requires IOASIDs to be system-wide on Intel
VT-d platforms. It is also the reason why a guest must use the
emulated virtual command interface to allocate IOASIDs from the host.

On VT-d, the IOASID table is stored per device, while the granularity
of assignment is per IOASID. Even though each guest IOASID must have a
backing host IOASID, a guest IOASID can be different from its host
IOASID. The guest IOASID namespace is controlled by the VMM, which
decides whether an identity mapping of guest to host IOASIDs is
necessary.

1.
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

For the per-mm_struct PASID question raised by Christian, we are
proposing that in the x86 context, along with a lazy free.

https://lkml.org/lkml/2020/3/30/910

> Different devices have different limits for the size of PASID they can
> support. For example AMD GPUs support 16-bits but the IOMMU supports
> less. So on APUs we use small PASIDs for contexts that want to use
> IOMMUv2 to access memory, but bigger PASIDs for contexts that do not.
> 
> I choose the word "context" deliberately, because the amdgpu driver
> uses PASIDs even when we're not using IOMMUv2, and we're using them to
> identify GPU virtual address spaces. There can be more than one per
> process. In practice you can have two, one for graphics (not SVM,
> doesn't use IOMMUv2) and one for KFD compute (SVM, can use IOMMUv2 on
> APUs).
> 
> Because the IOMMUv2 supports only smaller PASIDs, we want to avoid
> exhausting that space with PASID allocations that don't use the
> IOMMUv2. So our PASID allocation function has a "size" parameter, and
> we try to allocated a PASID as big as possible in order to leave more
> precious smaller PASIDs for contexts that need them.
> 
> The bottom line is, when you allocate a PASID for a context, you want
> to know how small it needs to be for all the devices that want to use
> it. If you make it too big, some device will not be able to use it.
> If you make it too small, you waste precious PASIDs that could be
> used for other contexts that need them.
> 
So for AMD, system-wide PASID allocation works with the
restriction/optimization above?

> Regards,
>   Felix
> 

[Jacob Pan]



* Re: [PATCH v5 02/25] iommu/sva: Manage process address spaces
  2020-04-20 13:57           ` Jason Gunthorpe
@ 2020-04-20 17:48             ` Jacob Pan
  2020-04-20 18:14               ` Fenghua Yu
  0 siblings, 1 reply; 44+ messages in thread
From: Jacob Pan @ 2020-04-20 17:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jean-Philippe Brucker, Christoph Hellwig, iommu, devicetree,
	linux-arm-kernel, linux-pci, linux-mm, joro, catalin.marinas,
	will, robin.murphy, kevin.tian, baolu.lu, Jonathan.Cameron,
	christian.koenig, zhangfei.gao, xuzaibo, jacob.jun.pan, Raj,
	Ashok, Yu, Fenghua

On Mon, 20 Apr 2020 10:57:27 -0300
Jason Gunthorpe <jgg@ziepe.ca> wrote:

> On Mon, Apr 20, 2020 at 09:42:13AM +0200, Jean-Philippe Brucker wrote:
> > On Thu, Apr 16, 2020 at 05:13:31AM -0700, Christoph Hellwig wrote:  
> > > On Thu, Apr 16, 2020 at 10:54:02AM +0200, Jean-Philippe Brucker
> > > wrote:  
> > > > On Thu, Apr 16, 2020 at 12:28:52AM -0700, Christoph Hellwig
> > > > wrote:  
> > > > > > +	rcu_read_lock();
> > > > > > +	hlist_for_each_entry_rcu(bond, &io_mm->devices,
> > > > > > mm_node)
> > > > > > +		io_mm->ops->invalidate(bond->sva.dev,
> > > > > > io_mm->pasid, io_mm->ctx,
> > > > > > +				       start, end - start);
> > > > > > +	rcu_read_unlock();
> > > > > > +}  
> > > > > 
> > > > > What is the reason that the devices don't register their own
> > > > > notifiers? This kinds of multiplexing is always rather messy,
> > > > > and you do it for all the methods.  
> > > > 
> > > > This sends TLB and ATC invalidations through the IOMMU, it
> > > > doesn't go through device drivers  
> > > 
> > > I don't think we mean the same thing, probably because of my
> > > rather imprecise use of the word device.
> > > 
> > > What I mean is that the mmu_notifier should not be embedded into
> > > the io_mm structure (whch btw, seems to have a way to generic
> > > name, just like all other io_* prefixed names), but instead into
> > > the iommu_bond structure.  That avoid the whole multiplexing
> > > layer.  
> > 
> > Right, I can see the appeal. I still like having a single mmu
> > notifier per mm because it ensures we allocate a single PASID per
> > mm (as required by x86). I suppose one alternative is to maintain a
> > hashtable of mm->pasid, to avoid iterating over all bonds during
> > allocation.  
> 
> I've been getting rid of hash tables like this.. Adding it to the
> mm_struct does seem reasonable, I think PASID is a pretty broad
> concept now.
> 
Agreed, perhaps Fenghua can consider that in his patchset. It would
help align life cycles as well.
https://lkml.org/lkml/2020/3/30/910

> Jason

[Jacob Pan]



* Re: [PATCH v5 02/25] iommu/sva: Manage process address spaces
  2020-04-20 17:44                     ` Jacob Pan
@ 2020-04-20 17:52                       ` Felix Kuehling
  0 siblings, 0 replies; 44+ messages in thread
From: Felix Kuehling @ 2020-04-20 17:52 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Christian König, Christoph Hellwig, Jean-Philippe Brucker,
	iommu, devicetree, linux-arm-kernel, linux-pci, linux-mm, joro,
	catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, zhangfei.gao, jgg, xuzaibo


On 2020-04-20 at 1:44 p.m., Jacob Pan wrote:
>> The bottom line is, when you allocate a PASID for a context, you want
>> to know how small it needs to be for all the devices that want to use
>> it. If you make it too big, some device will not be able to use it.
>> If you make it too small, you waste precious PASIDs that could be
>> used for other contexts that need them.
>>
> So for AMD, system-wide PASID allocation works with the
> restriction/optimization above?
>
Yes for KFD. On multi-GPU systems we allocate one PASID for the whole
process and use it on all GPUs.

For AMDGPU graphics contexts, we allocate one PASID for each per-GPU
context. But they're allocated from a single global PASID namespace
managed by the AMDGPU driver and shared with KFD. So we're wasting
PASIDs here, but we are compatible with a single system-wide PASID
namespace.

Regards,
  Felix




* Re: [PATCH v5 02/25] iommu/sva: Manage process address spaces
  2020-04-20 17:48             ` Jacob Pan
@ 2020-04-20 18:14               ` Fenghua Yu
  2020-04-21  8:55                 ` Christoph Hellwig
  0 siblings, 1 reply; 44+ messages in thread
From: Fenghua Yu @ 2020-04-20 18:14 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Jason Gunthorpe, Jean-Philippe Brucker, Christoph Hellwig, iommu,
	devicetree, linux-arm-kernel, linux-pci, linux-mm, joro,
	catalin.marinas, will, robin.murphy, kevin.tian, baolu.lu,
	Jonathan.Cameron, christian.koenig, zhangfei.gao, xuzaibo, Raj,
	Ashok

On Mon, Apr 20, 2020 at 10:48:50AM -0700, Jacob Pan wrote:
> On Mon, 20 Apr 2020 10:57:27 -0300
> Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> > On Mon, Apr 20, 2020 at 09:42:13AM +0200, Jean-Philippe Brucker wrote:
> > > On Thu, Apr 16, 2020 at 05:13:31AM -0700, Christoph Hellwig wrote:  
> > > > On Thu, Apr 16, 2020 at 10:54:02AM +0200, Jean-Philippe Brucker
> > > > wrote:  
> > > > > On Thu, Apr 16, 2020 at 12:28:52AM -0700, Christoph Hellwig
> > > > > wrote:  
> > > > > > > +	rcu_read_lock();
> > > > > > > +	hlist_for_each_entry_rcu(bond, &io_mm->devices,
> > > > > > > mm_node)
> > > > > > > +		io_mm->ops->invalidate(bond->sva.dev,
> > > > > > > io_mm->pasid, io_mm->ctx,
> > > > > > > +				       start, end - start);
> > > > > > > +	rcu_read_unlock();
> > > > > > > +}  
> > > > > > 
> > > > > > What is the reason that the devices don't register their own
> > > > > > notifiers? This kinds of multiplexing is always rather messy,
> > > > > > and you do it for all the methods.  
> > > > > 
> > > > > This sends TLB and ATC invalidations through the IOMMU, it
> > > > > doesn't go through device drivers  
> > > > 
> > > > I don't think we mean the same thing, probably because of my
> > > > rather imprecise use of the word device.
> > > > 
> > > > What I mean is that the mmu_notifier should not be embedded into
> > > > the io_mm structure (whch btw, seems to have a way to generic
> > > > name, just like all other io_* prefixed names), but instead into
> > > > the iommu_bond structure.  That avoid the whole multiplexing
> > > > layer.  
> > > 
> > > Right, I can see the appeal. I still like having a single mmu
> > > notifier per mm because it ensures we allocate a single PASID per
> > > mm (as required by x86). I suppose one alternative is to maintain a
> > > hashtable of mm->pasid, to avoid iterating over all bonds during
> > > allocation.  
> > 
> > I've been getting rid of hash tables like this.. Adding it to the
> > mm_struct does seem reasonable, I think PASID is a pretty broad
> > concept now.
> > 
> Agreed, perhaps Fenghua can consider that in his patchset. It would
> help align life cycles as well.
> > https://lkml.org/lkml/2020/3/30/910

It seems we depend on each other: my patch defines pasid in mm_struct.
I can free the PASID in your detach() function.

Thanks.

-Fenghua



* Re: [PATCH v5 02/25] iommu/sva: Manage process address spaces
  2020-04-20 18:14               ` Fenghua Yu
@ 2020-04-21  8:55                 ` Christoph Hellwig
  0 siblings, 0 replies; 44+ messages in thread
From: Christoph Hellwig @ 2020-04-21  8:55 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Jacob Pan, Jason Gunthorpe, Jean-Philippe Brucker,
	Christoph Hellwig, iommu, devicetree, linux-arm-kernel,
	linux-pci, linux-mm, joro, catalin.marinas, will, robin.murphy,
	kevin.tian, baolu.lu, Jonathan.Cameron, christian.koenig,
	zhangfei.gao, xuzaibo, Raj, Ashok

On Mon, Apr 20, 2020 at 11:14:37AM -0700, Fenghua Yu wrote:
> > Agreed, perhaps Fenghua can consider that in his patchset. It would
> > help align life cycles as well.
> > https://lkml.org/lkml/2020/3/30/910>
> 
> Seems we depend on each other: my patch defines pasid in mm_struct.
> I can free PASID in your detach() function.

Looks like this should go into the same series.  I also don't see any
good reason to have the pasid in the x86-specific context vs the common
mm_struct.



Thread overview: 44+ messages
2020-04-14 17:02 [PATCH v5 00/25] iommu: Shared Virtual Addressing and SMMUv3 support Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 01/25] mm/mmu_notifiers: pass private data down to alloc_notifier() Jean-Philippe Brucker
2020-04-14 18:09   ` Jason Gunthorpe
2020-04-14 17:02 ` [PATCH v5 02/25] iommu/sva: Manage process address spaces Jean-Philippe Brucker
2020-04-16  7:28   ` Christoph Hellwig
2020-04-16  8:54     ` Jean-Philippe Brucker
2020-04-16 12:13       ` Christoph Hellwig
2020-04-20  7:42         ` Jean-Philippe Brucker
2020-04-20  8:10           ` Christoph Hellwig
2020-04-20 11:44             ` Christian König
2020-04-20 11:55               ` Christoph Hellwig
2020-04-20 12:40                 ` Christian König
2020-04-20 15:00                   ` Felix Kuehling
2020-04-20 17:44                     ` Jacob Pan
2020-04-20 17:52                       ` Felix Kuehling
2020-04-20 13:57           ` Jason Gunthorpe
2020-04-20 17:48             ` Jacob Pan
2020-04-20 18:14               ` Fenghua Yu
2020-04-21  8:55                 ` Christoph Hellwig
2020-04-14 17:02 ` [PATCH v5 03/25] iommu: Add a page fault handler Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 04/25] iommu/sva: Search mm by PASID Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 05/25] iommu/iopf: Handle mm faults Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 06/25] iommu/sva: Register page fault handler Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 07/25] arm64: mm: Add asid_gen_match() helper Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 08/25] arm64: mm: Pin down ASIDs for sharing mm with devices Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 09/25] iommu/io-pgtable-arm: Move some definitions to a header Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 10/25] iommu/arm-smmu-v3: Manage ASIDs with xarray Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 11/25] arm64: cpufeature: Export symbol read_sanitised_ftr_reg() Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 12/25] iommu/arm-smmu-v3: Share process page tables Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 13/25] iommu/arm-smmu-v3: Seize private ASID Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 14/25] iommu/arm-smmu-v3: Add support for VHE Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 15/25] iommu/arm-smmu-v3: Enable broadcast TLB maintenance Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 16/25] iommu/arm-smmu-v3: Add SVA feature checking Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 17/25] iommu/arm-smmu-v3: Implement mm operations Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 18/25] iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 19/25] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 20/25] iommu/arm-smmu-v3: Maintain a SID->device structure Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 21/25] dt-bindings: document stall property for IOMMU masters Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 22/25] iommu/arm-smmu-v3: Add stall support for platform devices Jean-Philippe Brucker
2020-04-14 17:02 ` [PATCH v5 23/25] PCI/ATS: Add PRI stubs Jean-Philippe Brucker
2020-04-14 18:03   ` Kuppuswamy, Sathyanarayanan
2020-04-14 17:02 ` [PATCH v5 24/25] PCI/ATS: Export PRI functions Jean-Philippe Brucker
2020-04-14 18:03   ` Kuppuswamy, Sathyanarayanan
2020-04-14 17:02 ` [PATCH v5 25/25] iommu/arm-smmu-v3: Add support for PRI Jean-Philippe Brucker
