* [PATCH 00/37] Shared Virtual Addressing for the IOMMU
@ 2018-02-12 18:33 ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

Shared Virtual Addressing (SVA) is the ability to share process address
spaces with devices. It is called "SVM" (Shared Virtual Memory) by
OpenCL and some IOMMU architectures, but since that abbreviation is
already used for AMD virtualisation in Linux (Secure Virtual Machine),
we prefer the less ambiguous "SVA".

Sharing process address spaces with devices allows relying on core kernel
memory management for DMA, removing some complexity from application and
device drivers. After binding to a device, applications can instruct it to
perform DMA on buffers obtained with malloc().

The device, buses and the IOMMU must support the following features:

* Multiple address spaces per device, for example using the PCI PASID
  (Process Address Space ID) extension. The IOMMU driver allocates a
  PASID and the device uses it in DMA transactions.

* I/O Page Faults (IOPF), for example PCI PRI (Page Request Interface) or
  Arm SMMU stall. The core mm handles translation faults from the IOMMU.

* MMU and IOMMU implement compatible page table formats.

This series requires all three features to be supported. I tried to
make it possible to use only a subset of them, but enabling that requires
more work. Upcoming patches will enable private PASID management, which
allows device drivers to use an API similar to classical DMA:
map()/unmap() on PASIDs. In the future, device drivers should also be
able to use SVA without IOPF by pinning all pages, or without PASID by
sharing the single device address space with a process.

Although we don't have any performance measurements at the moment, SVA
will likely be slower than classical DMA since it relies on page faults,
whereas classical DMA pins all pages in memory. SVA mostly aims at
simplifying DMA management, but also improves security by isolating
address spaces in devices.

Intel and AMD IOMMU drivers already offer slightly differing public
functions that bind process address spaces to devices. Because they don't
go through an architecture-agnostic API, only integrated devices could
use them so far.
                                ---

The series adds an SVA API to the IOMMU core, an example implementation
(SMMUv3), and an example user (VFIO). Since the last version, sent as RFCv2
in October [1], I reworked the API and fixed some bugs.

Patches 1-6 introduce the bind API and track address spaces. This
version of the patchset improves documentation, adds device_init()/
shutdown(), and adds per-bond device driver data. The functions available
to device drivers are listed below, followed by a short usage sketch:

	iommu_sva_device_init(dev, features, max_pasid)
	iommu_sva_device_shutdown(dev)
	iommu_register_mm_exit_handler(dev, handler)
	iommu_unregister_mm_exit_handler(dev)
	iommu_sva_bind_device(dev, mm, *pasid, flags, drvdata)
	iommu_sva_unbind_device(dev, pasid)
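
As an illustration, a device driver would use these calls roughly as
follows. This is only a sketch: my_dev_set_pasid() stands in for the
device-specific step of programming the PASID into the hardware, and mm
lifetime handling is omitted.

	static int my_driver_enable_sva(struct device *dev)
	{
		int ret, pasid;
		unsigned long flags = IOMMU_SVA_FEAT_PASID | IOMMU_SVA_FEAT_IOPF;

		/* Enable PASID and IOPF support; 0 keeps the default PASID limit */
		ret = iommu_sva_device_init(dev, flags, 0);
		if (ret)
			return ret;

		/* Bind the current process address space; @pasid is set on success */
		ret = iommu_sva_bind_device(dev, current->mm, &pasid, flags, NULL);
		if (ret) {
			iommu_sva_device_shutdown(dev);
			return ret;
		}

		/* Tell the device to tag its DMA transactions with this PASID */
		my_dev_set_pasid(dev, pasid);
		return 0;
	}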

Patches 7-10 add a generic fault handler. This version reuses the
structures introduced by Jacob Pan's vSVA series [2] (with some changes
to match the most recent comments in that thread).

Patches 11-36 add complete SVA support to the SMMUv3 driver, for both
platform and PCI devices. If you don't care about the SMMU, I advise
looking only at patches 25, 27, 29 and 35, which use the tools introduced
earlier.

In this version, the SMMUv3 context code moved to a separate module,
behind an interface reusable by other IOMMU drivers and easily
extensible for private PASIDs. There are complicated interactions
between private and shared contexts (they share a common ASID space),
so moving it all to a separate file also helps make sense of refs and
locks.

Finally, patch 37 adds an ioctl to VFIO providing SVA to userspace
drivers. Since the last version I have fixed a few bugs.

You can pull the full series, based on v4.16-rc1 plus the fault patches, at:
git://linux-arm.org/linux-jpb.git sva/v1

I tested this code on a software model implementing an SMMUv3 and a
dummy DMA device. Any testing report would be greatly appreciated!

[1] [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3
    https://www.spinics.net/lists/arm-kernel/msg609771.html
[2] [PATCH v3 00/16] IOMMU driver support for SVM virtualization
    https://www.spinics.net/lists/kernel/msg2651481.html

Jean-Philippe Brucker (37):
  iommu: Introduce Shared Virtual Addressing API
  iommu/sva: Bind process address spaces to devices
  iommu/sva: Manage process address spaces
  iommu/sva: Add a mm_exit callback for device drivers
  iommu/sva: Track mm changes with an MMU notifier
  iommu/sva: Search mm by PASID
  iommu: Add a page fault handler
  iommu/fault: Handle mm faults
  iommu/fault: Let handler return a fault response
  iommu/fault: Allow blocking fault handlers
  dt-bindings: document stall and PASID properties for IOMMU masters
  iommu/of: Add stall and pasid properties to iommu_fwspec
  arm64: mm: Pin down ASIDs for sharing mm with devices
  iommu/arm-smmu-v3: Link domains and devices
  iommu/io-pgtable-arm: Factor out ARM LPAE register defines
  iommu: Add generic PASID table library
  iommu/arm-smmu-v3: Move context descriptor code
  iommu/arm-smmu-v3: Add support for Substream IDs
  iommu/arm-smmu-v3: Add second level of context descriptor table
  iommu/arm-smmu-v3: Share process page tables
  iommu/arm-smmu-v3: Seize private ASID
  iommu/arm-smmu-v3: Add support for VHE
  iommu/arm-smmu-v3: Enable broadcast TLB maintenance
  iommu/arm-smmu-v3: Add SVA feature checking
  iommu/arm-smmu-v3: Implement mm operations
  iommu/arm-smmu-v3: Add support for Hardware Translation Table Update
  iommu/arm-smmu-v3: Register fault workqueue
  iommu/arm-smmu-v3: Maintain a SID->device structure
  iommu/arm-smmu-v3: Add stall support for platform devices
  ACPI/IORT: Check ATS capability in root complex nodes
  iommu/arm-smmu-v3: Add support for PCI ATS
  iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops
  iommu/arm-smmu-v3: Disable tagged pointers
  PCI: Make "PRG Response PASID Required" handling common
  iommu/arm-smmu-v3: Add support for PRI
  iommu/arm-smmu-v3: Add support for PCI PASID
  vfio: Add support for Shared Virtual Addressing

 Documentation/devicetree/bindings/iommu/iommu.txt |   24 +
 MAINTAINERS                                       |    3 +-
 arch/arm64/include/asm/mmu.h                      |    1 +
 arch/arm64/include/asm/mmu_context.h              |   11 +-
 arch/arm64/mm/context.c                           |   87 +-
 drivers/acpi/arm64/iort.c                         |   11 +
 drivers/iommu/Kconfig                             |   42 +
 drivers/iommu/Makefile                            |    4 +
 drivers/iommu/amd_iommu.c                         |   19 +-
 drivers/iommu/arm-smmu-v3-context.c               |  728 +++++++++++
 drivers/iommu/arm-smmu-v3.c                       | 1395 ++++++++++++++++++---
 drivers/iommu/io-pgfault.c                        |  384 ++++++
 drivers/iommu/io-pgtable-arm.c                    |   48 +-
 drivers/iommu/io-pgtable-arm.h                    |   67 +
 drivers/iommu/iommu-pasid.c                       |   54 +
 drivers/iommu/iommu-pasid.h                       |  173 +++
 drivers/iommu/iommu-sva.c                         |  795 ++++++++++++
 drivers/iommu/iommu.c                             |  109 +-
 drivers/iommu/of_iommu.c                          |   12 +
 drivers/pci/ats.c                                 |   17 +
 drivers/vfio/vfio_iommu_type1.c                   |  399 ++++++
 include/linux/iommu.h                             |  217 +++-
 include/linux/pci-ats.h                           |    8 +
 include/uapi/linux/pci_regs.h                     |    1 +
 include/uapi/linux/vfio.h                         |   76 ++
 25 files changed, 4381 insertions(+), 304 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-v3-context.c
 create mode 100644 drivers/iommu/io-pgfault.c
 create mode 100644 drivers/iommu/io-pgtable-arm.h
 create mode 100644 drivers/iommu/iommu-pasid.c
 create mode 100644 drivers/iommu/iommu-pasid.h
 create mode 100644 drivers/iommu/iommu-sva.c

-- 
2.15.1


^ permalink raw reply	[flat|nested] 317+ messages in thread

* [PATCH 01/37] iommu: Introduce Shared Virtual Addressing API
  2018-02-12 18:33 ` Jean-Philippe Brucker
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

Shared Virtual Addressing (SVA) provides a way for device drivers to bind
process address spaces to devices. This requires the IOMMU to support the
same page table format as CPUs, and requires the system to support I/O
Page Faults (IOPF) and Process Address Space ID (PASID). When all of these
are available, DMA can access virtual addresses of a process. A PASID is
allocated for each process, and the device driver programs it into the
device in an implementation-specific way.

Add a new API for sharing process page tables with devices. Introduce two
IOMMU operations, sva_device_init() and sva_device_shutdown(), that
prepare the IOMMU driver for SVA, for example by allocating PASID tables
and fault queues. Subsequent patches will implement the bind() and
unbind() operations.
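
For illustration, an IOMMU driver would plug into these new ops roughly
as below. This is only a sketch: the PASID limit and the checks are made
up for the example, and the real SMMUv3 implementation comes later in the
series.

	static int my_iommu_sva_device_init(struct device *dev,
					    unsigned long features,
					    unsigned int *min_pasid,
					    unsigned int *max_pasid)
	{
		unsigned int limit = (1 << 20) - 1;	/* e.g. 20-bit PASIDs */

		if (features & ~(IOMMU_SVA_FEAT_PASID | IOMMU_SVA_FEAT_IOPF))
			return -EINVAL;

		/* Update the limits depending on IOMMU and device capabilities */
		*min_pasid = 1;
		if (!*max_pasid || *max_pasid > limit)
			*max_pasid = limit;

		/* Allocate PASID tables, enable PRI or stall, etc. */
		return 0;
	}

	static void my_iommu_sva_device_shutdown(struct device *dev)
	{
		/* Free PASID tables, disable PRI or stall, etc. */
	}

	static const struct iommu_ops my_iommu_ops = {
		/* .attach_dev, .map, .unmap, ... omitted */
		.sva_device_init	= my_iommu_sva_device_init,
		.sva_device_shutdown	= my_iommu_sva_device_shutdown,
	};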

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/Kconfig     | 10 ++++++
 drivers/iommu/Makefile    |  1 +
 drivers/iommu/iommu-sva.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h     | 32 +++++++++++++++++
 4 files changed, 133 insertions(+)
 create mode 100644 drivers/iommu/iommu-sva.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f3a21343e636..555147a61f7c 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -74,6 +74,16 @@ config IOMMU_DMA
 	select IOMMU_IOVA
 	select NEED_SG_DMA_LENGTH
 
+config IOMMU_SVA
+	bool "Shared Virtual Addressing API for the IOMMU"
+	select IOMMU_API
+	help
+	  Enable process address space management for the IOMMU API. In systems
+	  that support it, device drivers can bind process address spaces to
+	  devices and share their page tables using this API.
+
+	  If unsure, say N here.
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PCI
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 1fb695854809..1dbcc89ebe4c 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
+obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
new file mode 100644
index 000000000000..cab5d723520f
--- /dev/null
+++ b/drivers/iommu/iommu-sva.c
@@ -0,0 +1,90 @@
+/*
+ * Track processes address spaces bound to devices and allocate PASIDs.
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include <linux/iommu.h>
+
+/**
+ * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a device
+ * @dev: the device
+ * @features: bitmask of features that need to be initialized
+ * @max_pasid: max PASID value supported by the device
+ *
+ * Users of the bind()/unbind() API must call this function to initialize all
+ * features required for SVA.
+ *
+ * - If the device should support multiple address spaces (e.g. PCI PASID),
+ *   IOMMU_SVA_FEAT_PASID must be requested.
+ *
+ *   By default the PASID allocated during bind() is limited by the IOMMU
+ *   capacity, and by the device PASID width defined in the PCI capability or in
+ *   the firmware description. Setting @max_pasid to a non-zero value smaller
+ *   than this limit overrides it.
+ *
+ * - If the device should support I/O Page Faults (e.g. PCI PRI),
+ *   IOMMU_SVA_FEAT_IOPF must be requested.
+ *
+ * The device should not be performing any DMA while this function is
+ * running.
+ *
+ * Return 0 if initialization succeeded, or an error.
+ */
+int iommu_sva_device_init(struct device *dev, unsigned long features,
+			  unsigned int max_pasid)
+{
+	int ret;
+	unsigned int min_pasid = 0;
+	struct iommu_param *dev_param = dev->iommu_param;
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	if (!domain || !dev_param || !domain->ops->sva_device_init)
+		return -ENODEV;
+
+	/*
+	 * IOMMU driver updates the limits depending on the IOMMU and device
+	 * capabilities.
+	 */
+	ret = domain->ops->sva_device_init(dev, features, &min_pasid,
+					   &max_pasid);
+	if (ret)
+		return ret;
+
+	/* FIXME: racy. Next version should have a mutex (same as fault handler) */
+	dev_param->sva_features = features;
+	dev_param->min_pasid = min_pasid;
+	dev_param->max_pasid = max_pasid;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_device_init);
+
+/**
+ * iommu_sva_device_shutdown() - Shutdown Shared Virtual Addressing for a device
+ * @dev: the device
+ *
+ * Disable SVA. The device should not be performing any DMA while this function
+ * is running.
+ */
+int iommu_sva_device_shutdown(struct device *dev)
+{
+	struct iommu_param *dev_param = dev->iommu_param;
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	if (!domain)
+		return -ENODEV;
+
+	if (domain->ops->sva_device_shutdown)
+		domain->ops->sva_device_shutdown(dev);
+
+	dev_param->sva_features = 0;
+	dev_param->min_pasid = 0;
+	dev_param->max_pasid = 0;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 66ef406396e9..e9e09eecdece 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -60,6 +60,11 @@ typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
 			struct device *, unsigned long, int, void *);
 typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);
 
+/* Request PASID support */
+#define IOMMU_SVA_FEAT_PASID		(1 << 0)
+/* Request I/O page fault support */
+#define IOMMU_SVA_FEAT_IOPF		(1 << 1)
+
 struct iommu_domain_geometry {
 	dma_addr_t aperture_start; /* First address that can be mapped    */
 	dma_addr_t aperture_end;   /* Last address that can be mapped     */
@@ -197,6 +202,8 @@ struct page_response_msg {
  * @domain_free: free iommu domain
  * @attach_dev: attach device to an iommu domain
  * @detach_dev: detach device from an iommu domain
+ * @sva_device_init: initialize Shared Virtual Addressing for a device
+ * @sva_device_shutdown: shutdown Shared Virtual Addressing for a device
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @map_sg: map a scatter-gather list of physically contiguous memory chunks
@@ -230,6 +237,10 @@ struct iommu_ops {
 
 	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
 	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
+	int (*sva_device_init)(struct device *dev, unsigned long features,
+			       unsigned int *min_pasid,
+			       unsigned int *max_pasid);
+	void (*sva_device_shutdown)(struct device *dev);
 	int (*map)(struct iommu_domain *domain, unsigned long iova,
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
@@ -385,6 +396,9 @@ struct iommu_fault_param {
  */
 struct iommu_param {
 	struct iommu_fault_param *fault_param;
+	unsigned long sva_features;
+	unsigned int min_pasid;
+	unsigned int max_pasid;
 };
 
 int  iommu_device_register(struct iommu_device *iommu);
@@ -878,4 +892,22 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
 
 #endif /* CONFIG_IOMMU_API */
 
+#ifdef CONFIG_IOMMU_SVA
+extern int iommu_sva_device_init(struct device *dev, unsigned long features,
+				 unsigned int max_pasid);
+extern int iommu_sva_device_shutdown(struct device *dev);
+#else /* CONFIG_IOMMU_SVA */
+static inline int iommu_sva_device_init(struct device *dev,
+					unsigned long features,
+					unsigned int max_pasid)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_sva_device_shutdown(struct device *dev)
+{
+	return -ENODEV;
+}
+#endif /* CONFIG_IOMMU_SVA */
+
 #endif /* __LINUX_IOMMU_H */
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 02/37] iommu/sva: Bind process address spaces to devices
  2018-02-12 18:33 ` Jean-Philippe Brucker
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

Add bind() and unbind() operations to the IOMMU API. Device drivers can
use them to share process page tables with their devices. bind_group()
is provided for VFIO's convenience, as it needs to provide a coherent
interface on containers. Other device drivers will most likely want to
use bind_device(), which binds a single device in the group.

Regardless of the IOMMU group or domain a device is in, device drivers
should call bind() for each device that will use the PASID.

This patch only adds skeletons for the device driver API; most of the
implementation is still missing.
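
For instance, a VFIO-like user would bind and unbind a whole group as
below (sketch only, with made-up helper names and no error handling). A
second bind of the same mm returns the same PASID but requires a matching
unbind.

	static int my_vfio_bind_current(struct iommu_group *group, int *pasid)
	{
		unsigned long flags = IOMMU_SVA_FEAT_PASID | IOMMU_SVA_FEAT_IOPF;

		/* Every device in the group gets access to current->mm */
		return iommu_sva_bind_group(group, current->mm, pasid, flags, NULL);
	}

	static void my_vfio_unbind(struct iommu_group *group, int pasid)
	{
		/* Remove the bond for all devices in the group */
		iommu_sva_unbind_group(group, pasid);
	}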

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu-sva.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu.c     |  63 ++++++++++++++++++++++++++++
 include/linux/iommu.h     |  36 ++++++++++++++++
 3 files changed, 204 insertions(+)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index cab5d723520f..593685d891bf 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -9,6 +9,9 @@
 
 #include <linux/iommu.h>
 
+/* TODO: stub for the fault queue. Remove later. */
+#define iommu_fault_queue_flush(...)
+
 /**
  * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a device
  * @dev: the device
@@ -78,6 +81,8 @@ int iommu_sva_device_shutdown(struct device *dev)
 	if (!domain)
 		return -ENODEV;
 
+	__iommu_sva_unbind_dev_all(dev);
+
 	if (domain->ops->sva_device_shutdown)
 		domain->ops->sva_device_shutdown(dev);
 
@@ -88,3 +93,103 @@ int iommu_sva_device_shutdown(struct device *dev)
 	return 0;
 }
 EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
+
+/**
+ * iommu_sva_bind_device() - Bind a process address space to a device
+ * @dev: the device
+ * @mm: the mm to bind, caller must hold a reference to it
+ * @pasid: valid address where the PASID will be stored
+ * @flags: bond properties (IOMMU_SVA_FEAT_*)
+ * @drvdata: private data passed to the mm exit handler
+ *
+ * Create a bond between device and task, allowing the device to access the mm
+ * using the returned PASID. A subsequent bind() for the same device and mm will
+ * reuse the bond (and return the same PASID), but users will have to call
+ * unbind() twice.
+ *
+ * Callers should have taken care of setting up SVA for this device with
+ * iommu_sva_device_init() beforehand. They may also be notified of the bond
+ * disappearing, for example when the last task that uses the mm dies, by
+ * registering a notifier with iommu_register_mm_exit_handler().
+ *
+ * If IOMMU_SVA_FEAT_PASID is requested, a PASID is allocated and returned.
+ * TODO: The alternative, binding the non-PASID context to an mm, isn't
+ * supported at the moment because existing IOMMU domain types initialize the
+ * non-PASID context for iommu_map()/unmap() or bypass. This requires a new
+ * domain type.
+ *
+ * If IOMMU_SVA_FEAT_IOPF is not requested, the caller must pin down all
+ * mappings shared with the device. mlock() isn't sufficient, as it doesn't
+ * prevent minor page faults (e.g. copy-on-write). TODO: !IOPF isn't allowed at
+ * the moment.
+ *
+ * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
+ * is returned.
+ */
+int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
+			  unsigned long flags, void *drvdata)
+{
+	struct iommu_domain *domain;
+	struct iommu_param *dev_param = dev->iommu_param;
+
+	domain = iommu_get_domain_for_dev(dev);
+	if (!domain)
+		return -EINVAL;
+
+	if (!pasid)
+		return -EINVAL;
+
+	if (!dev_param || (flags & ~dev_param->sva_features))
+		return -EINVAL;
+
+	if (flags != (IOMMU_SVA_FEAT_PASID | IOMMU_SVA_FEAT_IOPF))
+		return -EINVAL;
+
+	return -ENOSYS; /* TODO */
+}
+EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
+
+/**
+ * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
+ * @dev: the device
+ * @pasid: the pasid returned by bind()
+ *
+ * Remove bond between device and address space identified by @pasid. Users
+ * should not call unbind() if the corresponding mm exited (as the PASID might
+ * have been reallocated to another process.)
+ *
+ * The device must not be issuing any more transactions for this PASID. All
+ * outstanding page requests for this PASID must have been flushed to the IOMMU.
+ *
+ * Returns 0 on success, or an error value
+ */
+int iommu_sva_unbind_device(struct device *dev, int pasid)
+{
+	struct iommu_domain *domain;
+
+	domain = iommu_get_domain_for_dev(dev);
+	if (WARN_ON(!domain))
+		return -EINVAL;
+
+	/*
+	 * Caller stopped the device from issuing PASIDs, now make sure they are
+	 * out of the fault queue.
+	 */
+	iommu_fault_queue_flush(dev);
+
+	return -ENOSYS; /* TODO */
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
+
+/**
+ * __iommu_sva_unbind_dev_all() - Detach all address spaces from this device
+ *
+ * When detaching @device from a domain, IOMMU drivers should use this helper.
+ */
+void __iommu_sva_unbind_dev_all(struct device *dev)
+{
+	iommu_fault_queue_flush(dev);
+
+	/* TODO */
+}
+EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d4a4edaf2d8c..f977851c522b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1535,6 +1535,69 @@ void iommu_detach_group(struct iommu_domain *domain, struct iommu_group *group)
 }
 EXPORT_SYMBOL_GPL(iommu_detach_group);
 
+/*
+ * iommu_sva_bind_group() - Share address space with all devices in the group.
+ * @group: the iommu group
+ * @mm: the mm to bind
+ * @pasid: valid address where the PASID will be stored
+ * @flags: bond properties (IOMMU_SVA_FEAT_*)
+ * @drvdata: private data passed to the mm exit handler
+ *
+ * Create a bond between group and process, allowing devices in the group to
+ * access the process address space using @pasid.
+ *
+ * Refer to iommu_sva_bind_device() for more details.
+ *
+ * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
+ * is returned.
+ */
+int iommu_sva_bind_group(struct iommu_group *group, struct mm_struct *mm,
+			 int *pasid, unsigned long flags, void *drvdata)
+{
+	struct group_device *device;
+	int ret = -ENODEV;
+
+	if (!group->domain)
+		return -EINVAL;
+
+	mutex_lock(&group->mutex);
+	list_for_each_entry(device, &group->devices, list) {
+		ret = iommu_sva_bind_device(device->dev, mm, pasid, flags,
+					    drvdata);
+		if (ret)
+			break;
+	}
+
+	if (ret) {
+		list_for_each_entry_continue_reverse(device, &group->devices, list)
+			iommu_sva_unbind_device(device->dev, *pasid);
+	}
+	mutex_unlock(&group->mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_bind_group);
+
+/**
+ * iommu_sva_unbind_group() - Remove a bond created with iommu_sva_bind_group()
+ * @group: the group
+ * @pasid: the pasid returned by bind
+ *
+ * Refer to iommu_sva_unbind_device() for more details.
+ */
+int iommu_sva_unbind_group(struct iommu_group *group, int pasid)
+{
+	struct group_device *device;
+
+	mutex_lock(&group->mutex);
+	list_for_each_entry(device, &group->devices, list)
+		iommu_sva_unbind_device(device->dev, pasid);
+	mutex_unlock(&group->mutex);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unbind_group);
+
 phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
 {
 	if (unlikely(domain->ops->iova_to_phys == NULL))
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e9e09eecdece..1fb10d64b9e5 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -576,6 +576,10 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
 void iommu_fwspec_free(struct device *dev);
 int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
 const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
+extern int iommu_sva_bind_group(struct iommu_group *group,
+				struct mm_struct *mm, int *pasid,
+				unsigned long flags, void *drvdata);
+extern int iommu_sva_unbind_group(struct iommu_group *group, int pasid);
 
 #else /* CONFIG_IOMMU_API */
 
@@ -890,12 +894,28 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
 	return NULL;
 }
 
+static inline int iommu_sva_bind_group(struct iommu_group *group,
+				       struct mm_struct *mm, int *pasid,
+				       unsigned long flags, void *drvdata)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_sva_unbind_group(struct iommu_group *group, int pasid)
+{
+	return -ENODEV;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #ifdef CONFIG_IOMMU_SVA
 extern int iommu_sva_device_init(struct device *dev, unsigned long features,
 				 unsigned int max_pasid);
 extern int iommu_sva_device_shutdown(struct device *dev);
+extern int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
+				int *pasid, unsigned long flags, void *drvdata);
+extern int iommu_sva_unbind_device(struct device *dev, int pasid);
+extern void __iommu_sva_unbind_dev_all(struct device *dev);
 #else /* CONFIG_IOMMU_SVA */
 static inline int iommu_sva_device_init(struct device *dev,
 					unsigned long features,
@@ -908,6 +928,22 @@ static inline int iommu_sva_device_shutdown(struct device *dev)
 {
 	return -ENODEV;
 }
+
+static inline int iommu_sva_bind_device(struct device *dev,
+					struct mm_struct *mm, int *pasid,
+					unsigned long flags, void *drvdata)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_sva_unbind_device(struct device *dev, int pasid)
+{
+	return -ENODEV;
+}
+
+static inline void __iommu_sva_unbind_dev_all(struct device *dev)
+{
+}
 #endif /* CONFIG_IOMMU_SVA */
 
 #endif /* __LINUX_IOMMU_H */
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 03/37] iommu/sva: Manage process address spaces
  2018-02-12 18:33 ` Jean-Philippe Brucker
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

Introduce boilerplate code for allocating IOMMU mm structures and binding
them to devices. Four operations are added to IOMMU drivers:

* mm_alloc(): to create an io_mm structure and perform architecture-
  specific operations required to grab the process (for instance on ARM,
  pin down the CPU ASID so that the process doesn't get assigned a new
  ASID on rollover).

  There is a single valid io_mm structure per Linux mm. Future extensions
  may also use io_mm for kernel-managed address spaces, populated with
  map()/unmap() calls instead of bound to process address spaces. This
  patch focuses on "shared" io_mm.

* mm_attach(): attach an mm to a device. The IOMMU driver checks that the
  device is capable of sharing an address space, and writes the PASID
  table entry to install the pgd.

  Some IOMMU drivers will have a single PASID table per domain, for
  convenience. Others can implement it differently, but to help these
  drivers, mm_attach and mm_detach take 'attach_domain' and
  'detach_domain' parameters that tell whether they need to set and clear
  the PASID entry, or only send the required TLB invalidations.

* mm_detach(): detach an mm from a device. The IOMMU driver removes the
  PASID table entry and invalidates the IOTLBs.

* mm_free(): free a structure allocated by mm_alloc(), and let arch
  release the process.

mm_attach and mm_detach operations are serialized with a spinlock. At the
moment it is global, but if we try to optimize it, the core should at
least prevent concurrent attach()/detach() on the same domain (so
multi-level PASID table code can allocate tables lazily). mm_alloc() can
sleep, but mm_free() must not (because we'll have to call it from call_srcu
later on).

At the moment we use an IDR for allocating PASIDs and retrieving contexts.
We also use a single spinlock. These can be refined and optimized later (a
custom allocator will be needed for top-down PASID allocation).

Keeping track of address spaces requires the use of MMU notifiers.
Handling process exit with regard to unbind() is tricky, so it is left for
another patch and we explicitly fail mm_alloc() for the moment.
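
As a rough illustration (not part of this patch), an IOMMU driver could back
these four operations along the lines below. Everything prefixed with my_ or
arch_ is made up for the example; only the callback prototypes and the
attach_domain/detach_domain semantics come from this patch.

#include <linux/iommu.h>
#include <linux/mm.h>
#include <linux/slab.h>

/* Driver-private structure embedding the io_mm handed back to the core */
struct my_io_mm {
	struct io_mm	io_mm;
	unsigned long	asid;	/* CPU ASID pinned for this mm (hypothetical) */
	pgd_t		*pgd;	/* page directory shared with the CPU */
};

#define to_my_io_mm(p)	container_of(p, struct my_io_mm, io_mm)

static struct io_mm *my_mm_alloc(struct iommu_domain *domain,
				 struct mm_struct *mm)
{
	struct my_io_mm *my_mm = kzalloc(sizeof(*my_mm), GFP_KERNEL);

	if (!my_mm)
		return NULL;	/* the core turns this into -ENOMEM */

	/* e.g. pin the CPU ASID so it survives rollover (arch-specific) */
	my_mm->asid = arch_pin_asid(mm);
	my_mm->pgd = mm->pgd;
	return &my_mm->io_mm;
}

static void my_mm_free(struct io_mm *io_mm)
{
	struct my_io_mm *my_mm = to_my_io_mm(io_mm);

	/* must not sleep, may be called from call_srcu later on */
	arch_unpin_asid(my_mm->asid);
	kfree(my_mm);
}

static int my_mm_attach(struct iommu_domain *domain, struct device *dev,
			struct io_mm *io_mm, bool attach_domain)
{
	struct my_io_mm *my_mm = to_my_io_mm(io_mm);

	if (!my_device_supports_sva(dev))
		return -ENODEV;

	/* Single PASID table per domain: install the entry only once */
	if (attach_domain)
		my_write_pasid_entry(domain, io_mm->pasid, my_mm->pgd,
				     my_mm->asid);
	return 0;
}

static void my_mm_detach(struct iommu_domain *domain, struct device *dev,
			 struct io_mm *io_mm, bool detach_domain)
{
	if (detach_domain)
		my_clear_pasid_entry(domain, io_mm->pasid);

	/* Always invalidate IOTLB entries for this device and PASID */
	my_flush_iotlb(domain, dev, io_mm->pasid);
}

With such a layout, the attach_domain/detach_domain parameters let the driver
touch the shared PASID table only once per domain, while still sending the
per-device invalidations on every detach.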

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/iommu-sva.c | 382 +++++++++++++++++++++++++++++++++++++++++++++-
 drivers/iommu/iommu.c     |   2 +
 include/linux/iommu.h     |  25 +++
 3 files changed, 406 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 593685d891bf..f9af9d66b3ed 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -7,11 +7,321 @@
  * SPDX-License-Identifier: GPL-2.0
  */
 
+#include <linux/idr.h>
 #include <linux/iommu.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+/**
+ * DOC: io_mm model
+ *
+ * The io_mm keeps track of process address spaces shared between CPU and IOMMU.
+ * The following example illustrates the relation between structures
+ * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between io_mm and
+ * device. A device can have multiple io_mm and an io_mm may be bound to
+ * multiple devices.
+ *              ___________________________
+ *             |  IOMMU domain A           |
+ *             |  ________________         |
+ *             | |  IOMMU group   |        +------- io_pgtables
+ *             | |                |        |
+ *             | |   dev 00:00.0 ----+------- bond --- io_mm X
+ *             | |________________|   \    |
+ *             |                       '----- bond ---.
+ *             |___________________________|           \
+ *              ___________________________             \
+ *             |  IOMMU domain B           |           io_mm Y
+ *             |  ________________         |           / /
+ *             | |  IOMMU group   |        |          / /
+ *             | |                |        |         / /
+ *             | |   dev 00:01.0 ------------ bond -' /
+ *             | |   dev 00:01.1 ------------ bond --'
+ *             | |________________|        |
+ *             |                           +------- io_pgtables
+ *             |___________________________|
+ *
+ * In this example, device 00:00.0 is in domain A, devices 00:01.* are in domain
+ * B. All devices within the same domain access the same address spaces. Device
+ * 00:00.0 accesses address spaces X and Y, each corresponding to an mm_struct.
+ * Devices 00:01.* only access address space Y. In addition each
+ * IOMMU_DOMAIN_DMA domain has a private address space, io_pgtable, that is
+ * managed with iommu_map()/iommu_unmap(), and isn't shared with the CPU MMU.
+ *
+ * To obtain the above configuration, users would for instance issue the
+ * following calls:
+ *
+ *     iommu_sva_bind_device(dev 00:00.0, mm X, ...) -> PASID 1
+ *     iommu_sva_bind_device(dev 00:00.0, mm Y, ...) -> PASID 2
+ *     iommu_sva_bind_device(dev 00:01.0, mm Y, ...) -> PASID 2
+ *     iommu_sva_bind_device(dev 00:01.1, mm Y, ...) -> PASID 2
+ *
+ * A single Process Address Space ID (PASID) is allocated for each mm. In the
+ * example, devices use PASID 1 to read/write into address space X and PASID 2
+ * to read/write into address space Y.
+ *
+ * Hardware tables describing this configuration in the IOMMU would typically
+ * look like this:
+ *
+ *                                PASID tables
+ *                                 of domain A
+ *                              .->+--------+
+ *                             / 0 |        |-------> io_pgtable
+ *                            /    +--------+
+ *            Device tables  /   1 |        |-------> pgd X
+ *              +--------+  /      +--------+
+ *      00:00.0 |      A |-'     2 |        |--.
+ *              +--------+         +--------+   \
+ *              :        :       3 |        |    \
+ *              +--------+         +--------+     --> pgd Y
+ *      00:01.0 |      B |--.                    /
+ *              +--------+   \                  |
+ *      00:01.1 |      B |----+   PASID tables  |
+ *              +--------+     \   of domain B  |
+ *                              '->+--------+   |
+ *                               0 |        |-- | --> io_pgtable
+ *                                 +--------+   |
+ *                               1 |        |   |
+ *                                 +--------+   |
+ *                               2 |        |---'
+ *                                 +--------+
+ *                               3 |        |
+ *                                 +--------+
+ *
+ * With this model, a single call binds all devices in a given domain to an
+ * address space. Other devices in the domain will get the same bond implicitly.
+ * However, users must issue one bind() for each device, because IOMMUs may
+ * implement SVA differently. Furthermore, mandating one bind() per device
+ * allows the driver to perform sanity-checks on device capabilities.
+ *
+ * On Arm and AMD IOMMUs, entry 0 of the PASID table can be used to hold
+ * non-PASID translations. In this case PASID 0 is reserved and entry 0 points
+ * to the io_pgtable base. On Intel IOMMU, the io_pgtable base would be held in
+ * the device table and PASID 0 would be available to the allocator.
+ */
 
 /* TODO: stub for the fault queue. Remove later. */
 #define iommu_fault_queue_flush(...)
 
+struct iommu_bond {
+	struct io_mm		*io_mm;
+	struct device		*dev;
+	struct iommu_domain	*domain;
+
+	struct list_head	mm_head;
+	struct list_head	dev_head;
+	struct list_head	domain_head;
+
+	void			*drvdata;
+
+	/* Number of bind() calls */
+	refcount_t		refs;
+};
+
+/*
+ * Because we're using an IDR, PASIDs are limited to 31 bits (the sign bit is
+ * used for returning errors). In practice implementations will use at most 20
+ * bits, which is the PCI limit.
+ */
+static DEFINE_IDR(iommu_pasid_idr);
+
+/*
+ * For the moment this is an all-purpose lock. It serializes
+ * access/modifications to bonds, access/modifications to the PASID IDR, and
+ * changes to io_mm refcount as well.
+ */
+static DEFINE_SPINLOCK(iommu_sva_lock);
+
+static struct io_mm *
+io_mm_alloc(struct iommu_domain *domain, struct device *dev,
+	    struct mm_struct *mm)
+{
+	int ret;
+	int pasid;
+	struct io_mm *io_mm;
+	struct iommu_param *dev_param = dev->iommu_param;
+
+	if (!dev_param || !domain->ops->mm_alloc || !domain->ops->mm_free)
+		return ERR_PTR(-ENODEV);
+
+	io_mm = domain->ops->mm_alloc(domain, mm);
+	if (IS_ERR(io_mm))
+		return io_mm;
+	if (!io_mm)
+		return ERR_PTR(-ENOMEM);
+
+	/*
+	 * The mm must not be freed until after the driver frees the io_mm
+	 * (which may involve unpinning the CPU ASID for instance, requiring a
+	 * valid mm struct.)
+	 */
+	mmgrab(mm);
+
+	io_mm->mm		= mm;
+	io_mm->release		= domain->ops->mm_free;
+	INIT_LIST_HEAD(&io_mm->devices);
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&iommu_sva_lock);
+	pasid = idr_alloc_cyclic(&iommu_pasid_idr, io_mm, dev_param->min_pasid,
+				 dev_param->max_pasid + 1, GFP_ATOMIC);
+	io_mm->pasid = pasid;
+	spin_unlock(&iommu_sva_lock);
+	idr_preload_end();
+
+	if (pasid < 0) {
+		ret = pasid;
+		goto err_free_mm;
+	}
+
+	/* TODO: keep track of mm. For the moment, abort. */
+	ret = -ENOSYS;
+	spin_lock(&iommu_sva_lock);
+	idr_remove(&iommu_pasid_idr, io_mm->pasid);
+	spin_unlock(&iommu_sva_lock);
+
+err_free_mm:
+	domain->ops->mm_free(io_mm);
+	mmdrop(mm);
+
+	return ERR_PTR(ret);
+}
+
+static void io_mm_free(struct io_mm *io_mm)
+{
+	struct mm_struct *mm;
+	void (*release)(struct io_mm *);
+
+	release = io_mm->release;
+	mm = io_mm->mm;
+
+	release(io_mm);
+	mmdrop(mm);
+}
+
+static void io_mm_release(struct kref *kref)
+{
+	struct io_mm *io_mm;
+
+	io_mm = container_of(kref, struct io_mm, kref);
+	WARN_ON(!list_empty(&io_mm->devices));
+
+	idr_remove(&iommu_pasid_idr, io_mm->pasid);
+
+	io_mm_free(io_mm);
+}
+
+/*
+ * Returns non-zero if a reference to the io_mm was successfully taken.
+ * Returns zero if the io_mm is being freed and should not be used.
+ */
+static int io_mm_get_locked(struct io_mm *io_mm)
+{
+	if (io_mm)
+		return kref_get_unless_zero(&io_mm->kref);
+
+	return 0;
+}
+
+static void io_mm_put_locked(struct io_mm *io_mm)
+{
+	kref_put(&io_mm->kref, io_mm_release);
+}
+
+static void io_mm_put(struct io_mm *io_mm)
+{
+	spin_lock(&iommu_sva_lock);
+	io_mm_put_locked(io_mm);
+	spin_unlock(&iommu_sva_lock);
+}
+
+static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
+			struct io_mm *io_mm, void *drvdata)
+{
+	int ret;
+	bool attach_domain = true;
+	int pasid = io_mm->pasid;
+	struct iommu_bond *bond, *tmp;
+	struct iommu_param *dev_param = dev->iommu_param;
+
+	if (!dev_param)
+		return -EINVAL;
+
+	if (!domain->ops->mm_attach || !domain->ops->mm_detach)
+		return -ENODEV;
+
+	if (pasid > dev_param->max_pasid || pasid < dev_param->min_pasid)
+		return -ERANGE;
+
+	bond = kzalloc(sizeof(*bond), GFP_KERNEL);
+	if (!bond)
+		return -ENOMEM;
+
+	bond->domain		= domain;
+	bond->io_mm		= io_mm;
+	bond->dev		= dev;
+	bond->drvdata		= drvdata;
+	refcount_set(&bond->refs, 1);
+
+	spin_lock(&iommu_sva_lock);
+	/*
+	 * Check if this io_mm is already bound to the domain, in which case
+	 * the IOMMU driver doesn't have to install the PASID table entry.
+	 */
+	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
+		if (tmp->io_mm == io_mm) {
+			attach_domain = false;
+			break;
+		}
+	}
+
+	ret = domain->ops->mm_attach(domain, dev, io_mm, attach_domain);
+	if (ret) {
+		kfree(bond);
+		spin_unlock(&iommu_sva_lock);
+		return ret;
+	}
+
+	list_add(&bond->mm_head, &io_mm->devices);
+	list_add(&bond->domain_head, &domain->mm_list);
+	list_add(&bond->dev_head, &dev_param->mm_list);
+	spin_unlock(&iommu_sva_lock);
+
+	return 0;
+}
+
+static bool io_mm_detach_locked(struct iommu_bond *bond)
+{
+	struct iommu_bond *tmp;
+	bool detach_domain = true;
+	struct iommu_domain *domain = bond->domain;
+
+	if (!refcount_dec_and_test(&bond->refs))
+		return false;
+
+	list_for_each_entry(tmp, &domain->mm_list, domain_head) {
+		if (tmp->io_mm == bond->io_mm && tmp->dev != bond->dev) {
+			detach_domain = false;
+			break;
+		}
+	}
+
+	domain->ops->mm_detach(domain, bond->dev, bond->io_mm, detach_domain);
+
+	list_del(&bond->mm_head);
+	list_del(&bond->domain_head);
+	list_del(&bond->dev_head);
+	io_mm_put_locked(bond->io_mm);
+
+	kfree(bond);
+
+	return true;
+}
+
+static void io_mm_detach_all_locked(struct iommu_bond *bond)
+{
+	while (!io_mm_detach_locked(bond));
+}
+
 /**
  * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a device
  * @dev: the device
@@ -129,7 +439,10 @@ EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
 int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
 			  unsigned long flags, void *drvdata)
 {
+	int i, ret;
+	struct io_mm *io_mm = NULL;
 	struct iommu_domain *domain;
+	struct iommu_bond *bond = NULL, *tmp;
 	struct iommu_param *dev_param = dev->iommu_param;
 
 	domain = iommu_get_domain_for_dev(dev);
@@ -145,7 +458,42 @@ int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
 	if (flags != (IOMMU_SVA_FEAT_PASID | IOMMU_SVA_FEAT_IOPF))
 		return -EINVAL;
 
-	return -ENOSYS; /* TODO */
+	/* If an io_mm already exists, use it */
+	spin_lock(&iommu_sva_lock);
+	idr_for_each_entry(&iommu_pasid_idr, io_mm, i) {
+		if (io_mm->mm != mm || !io_mm_get_locked(io_mm))
+			continue;
+
+		/* Is it already bound to this device? */
+		list_for_each_entry(tmp, &io_mm->devices, mm_head) {
+			if (tmp->dev != dev)
+				continue;
+
+			bond = tmp;
+			refcount_inc(&bond->refs);
+			io_mm_put_locked(io_mm);
+			break;
+		}
+		break;
+	}
+	spin_unlock(&iommu_sva_lock);
+
+	if (bond)
+		return 0;
+
+	if (!io_mm) {
+		io_mm = io_mm_alloc(domain, dev, mm);
+		if (IS_ERR(io_mm))
+			return PTR_ERR(io_mm);
+	}
+
+	ret = io_mm_attach(domain, dev, io_mm, drvdata);
+	if (ret)
+		io_mm_put(io_mm);
+	else
+		*pasid = io_mm->pasid;
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
 
@@ -165,7 +513,10 @@ EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
  */
 int iommu_sva_unbind_device(struct device *dev, int pasid)
 {
+	int ret = -ESRCH;
+	struct io_mm *io_mm;
 	struct iommu_domain *domain;
+	struct iommu_bond *bond = NULL;
 
 	domain = iommu_get_domain_for_dev(dev);
 	if (WARN_ON(!domain))
@@ -177,7 +528,23 @@ int iommu_sva_unbind_device(struct device *dev, int pasid)
 	 */
 	iommu_fault_queue_flush(dev);
 
-	return -ENOSYS; /* TODO */
+	spin_lock(&iommu_sva_lock);
+	io_mm = idr_find(&iommu_pasid_idr, pasid);
+	if (!io_mm) {
+		spin_unlock(&iommu_sva_lock);
+		return -ESRCH;
+	}
+
+	list_for_each_entry(bond, &io_mm->devices, mm_head) {
+		if (bond->dev == dev) {
+			io_mm_detach_locked(bond);
+			ret = 0;
+			break;
+		}
+	}
+	spin_unlock(&iommu_sva_lock);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
 
@@ -188,8 +555,17 @@ EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
  */
 void __iommu_sva_unbind_dev_all(struct device *dev)
 {
+	struct iommu_bond *bond, *next;
+	struct iommu_param *dev_param = dev->iommu_param;
+
+	if (!dev_param)
+		return;
+
 	iommu_fault_queue_flush(dev);
 
-	/* TODO */
+	spin_lock(&iommu_sva_lock);
+	list_for_each_entry_safe(bond, next, &dev_param->mm_list, dev_head)
+		io_mm_detach_all_locked(bond);
+	spin_unlock(&iommu_sva_lock);
 }
 EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f977851c522b..1d60b32a6744 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -586,6 +586,7 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 		ret = -ENOMEM;
 		goto err_free_name;
 	}
+	INIT_LIST_HEAD(&dev->iommu_param->mm_list);
 
 	kobject_get(group->devices_kobj);
 
@@ -1325,6 +1326,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
 	domain->type = type;
 	/* Assume all sizes by default; the driver may override this later */
 	domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
+	INIT_LIST_HEAD(&domain->mm_list);
 
 	return domain;
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 1fb10d64b9e5..09d85f44142a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -103,6 +103,18 @@ struct iommu_domain {
 	void *handler_token;
 	struct iommu_domain_geometry geometry;
 	void *iova_cookie;
+
+	struct list_head mm_list;
+};
+
+struct io_mm {
+	int			pasid;
+	struct list_head	devices;
+	struct kref		kref;
+	struct mm_struct	*mm;
+
+	/* Release callback for this mm */
+	void (*release)(struct io_mm *io_mm);
 };
 
 enum iommu_cap {
@@ -204,6 +216,11 @@ struct page_response_msg {
  * @detach_dev: detach device from an iommu domain
  * @sva_device_init: initialize Shared Virtual Addressing for a device
  * @sva_device_shutdown: shutdown Shared Virtual Addressing for a device
+ * @mm_alloc: allocate io_mm
+ * @mm_free: free io_mm
+ * @mm_attach: attach io_mm to a device. Install PASID entry if necessary
+ * @mm_detach: detach io_mm from a device. Remove PASID entry and
+ *             flush associated TLB entries.
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @map_sg: map a scatter-gather list of physically contiguous memory chunks
@@ -241,6 +258,13 @@ struct iommu_ops {
 			       unsigned int *min_pasid,
 			       unsigned int *max_pasid);
 	void (*sva_device_shutdown)(struct device *dev);
+	struct io_mm *(*mm_alloc)(struct iommu_domain *domain,
+				  struct mm_struct *mm);
+	void (*mm_free)(struct io_mm *io_mm);
+	int (*mm_attach)(struct iommu_domain *domain, struct device *dev,
+			 struct io_mm *io_mm, bool attach_domain);
+	void (*mm_detach)(struct iommu_domain *domain, struct device *dev,
+			  struct io_mm *io_mm, bool detach_domain);
 	int (*map)(struct iommu_domain *domain, unsigned long iova,
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
@@ -399,6 +423,7 @@ struct iommu_param {
 	unsigned long sva_features;
 	unsigned int min_pasid;
 	unsigned int max_pasid;
+	struct list_head mm_list;
 };
 
 int  iommu_device_register(struct iommu_device *iommu);
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 04/37] iommu/sva: Add a mm_exit callback for device drivers
  2018-02-12 18:33 ` Jean-Philippe Brucker
  (?)
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: joro-zLv9SwRftAIdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	mark.rutland-5wv7dgnIgG8, catalin.marinas-5wv7dgnIgG8,
	will.deacon-5wv7dgnIgG8, lorenzo.pieralisi-5wv7dgnIgG8,
	hanjun.guo-QSEj5FYQhm4dnm+yROfE0A, sudeep.holla-5wv7dgnIgG8,
	rjw-LthD3rsA81gm4RdzfppkhA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	robin.murphy-5wv7dgnIgG8, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	tn-nYOzD4b6Jr9Wk0Htik3J/w, liubo95-hv44wF8Li93QT0dZR+AlfA,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	jonathan.cameron-hv44wF8Li93QT0dZR+AlfA,
	shunyong.yang-PT9Dzx9SjPiXmMXjJBpWqg,
	nwatters-sgV2jX0FEOL9JmXXK+q4OQ, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	jcrouse-sgV2jX0FEOL9JmXXK+q4OQ, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA,
	yi.l.liu-ral2JQCrhuEAvxtiuMwx3w,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	robdclark-Re5JQEeQqe8AvxtiuMwx3w, christian.koenig-5C7GfCeVMHo,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA

When an mm exits, devices that were bound to it must stop performing DMA
on its PASID. Let device drivers register a callback to be notified on mm
exit. Add the callback to the iommu_param structure attached to struct
device.
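
For illustration only (not part of this patch), a device driver using this
callback could look roughly like the sketch below. struct my_ctx,
my_dev_stop_dma() and my_bind_current_mm() are made-up names; the iommu_*
calls and flags are the ones introduced by this series.

#include <linux/iommu.h>
#include <linux/sched.h>

struct my_ctx {
	int	pasid;
	/* ... queues, doorbells, etc. ... */
};

/* Called by the IOMMU core when the bound mm exits */
static int my_mm_exit(struct device *dev, int pasid, void *drvdata)
{
	struct my_ctx *ctx = drvdata;

	/* Stop issuing transactions with this PASID before returning */
	my_dev_stop_dma(dev, ctx);
	return 0;
}

static int my_bind_current_mm(struct device *dev, struct my_ctx *ctx)
{
	int ret;

	ret = iommu_register_mm_exit_handler(dev, my_mm_exit);
	if (ret)
		return ret;

	ret = iommu_sva_bind_device(dev, current->mm, &ctx->pasid,
				    IOMMU_SVA_FEAT_PASID |
				    IOMMU_SVA_FEAT_IOPF, ctx);
	if (ret)
		iommu_unregister_mm_exit_handler(dev);
	return ret;
}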

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/iommu-sva.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h     | 18 ++++++++++++++++
 2 files changed, 72 insertions(+)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index f9af9d66b3ed..90b524c99d3d 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -569,3 +569,57 @@ void __iommu_sva_unbind_dev_all(struct device *dev)
 	spin_unlock(&iommu_sva_lock);
 }
 EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all);
+
+/**
+ * iommu_register_mm_exit_handler() - Set a callback for mm exit
+ * @dev: the device
+ * @handler: exit handler
+ *
+ * Users of the bind/unbind API should call this function to set a
+ * device-specific callback telling them when a mm is exiting.
+ *
+ * After the callback returns, the device must not issue any more transaction
+ * with the PASID given as argument to the handler. In addition the handler gets
+ * an opaque pointer corresponding to the drvdata passed as argument of bind().
+ *
+ * The handler itself should return 0 on success, and an appropriate error code
+ * otherwise.
+ */
+int iommu_register_mm_exit_handler(struct device *dev,
+				   iommu_mm_exit_handler_t handler)
+{
+	struct iommu_param *dev_param = dev->iommu_param;
+
+	if (!dev_param)
+		return -EINVAL;
+
+	/*
+	 * FIXME: racy. Same as iommu_sva_device_init, but here we'll need a
+	 * spinlock to call the mm_exit param from atomic context.
+	 */
+	if (dev_param->mm_exit)
+		return -EBUSY;
+
+	get_device(dev);
+	dev_param->mm_exit = handler;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_register_mm_exit_handler);
+
+/**
+ * iommu_unregister_mm_exit_handler() - Remove mm exit callback
+ */
+int iommu_unregister_mm_exit_handler(struct device *dev)
+{
+	struct iommu_param *dev_param = dev->iommu_param;
+
+	if (!dev_param || !dev_param->mm_exit)
+		return -EINVAL;
+
+	dev_param->mm_exit = NULL;
+	put_device(dev);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_unregister_mm_exit_handler);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 09d85f44142a..1b1a16892ac1 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -65,6 +65,8 @@ typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);
 /* Request I/O page fault support */
 #define IOMMU_SVA_FEAT_IOPF		(1 << 1)
 
+typedef int (*iommu_mm_exit_handler_t)(struct device *dev, int pasid, void *);
+
 struct iommu_domain_geometry {
 	dma_addr_t aperture_start; /* First address that can be mapped    */
 	dma_addr_t aperture_end;   /* Last address that can be mapped     */
@@ -424,6 +426,7 @@ struct iommu_param {
 	unsigned int min_pasid;
 	unsigned int max_pasid;
 	struct list_head mm_list;
+	iommu_mm_exit_handler_t mm_exit;
 };
 
 int  iommu_device_register(struct iommu_device *iommu);
@@ -941,6 +944,10 @@ extern int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
 				int *pasid, unsigned long flags, void *drvdata);
 extern int iommu_sva_unbind_device(struct device *dev, int pasid);
 extern void __iommu_sva_unbind_dev_all(struct device *dev);
+extern int iommu_register_mm_exit_handler(struct device *dev,
+					  iommu_mm_exit_handler_t handler);
+extern int iommu_unregister_mm_exit_handler(struct device *dev);
+
 #else /* CONFIG_IOMMU_SVA */
 static inline int iommu_sva_device_init(struct device *dev,
 					unsigned long features,
@@ -969,6 +976,17 @@ static inline int iommu_sva_unbind_device(struct device *dev, int pasid)
 static inline void __iommu_sva_unbind_dev_all(struct device *dev)
 {
 }
+
+static inline int iommu_register_mm_exit_handler(struct device *dev,
+						 iommu_mm_exit_handler_t handler)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_unregister_mm_exit_handler(struct device *dev)
+{
+	return -ENODEV;
+}
 #endif /* CONFIG_IOMMU_SVA */
 
 #endif /* __LINUX_IOMMU_H */
-- 
2.15.1

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 04/37] iommu/sva: Add a mm_exit callback for device drivers
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

When an mm exits, devices that were bound to it must stop performing DMA
on its PASID. Let device drivers register a callback to be notified on mm
exit. Add the callback to the iommu_param structure attached to struct
device.
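
As an illustration, a bind/unbind user could wire the callback up roughly as
follows. This is only a sketch: the mydev_* structures and helpers are
hypothetical, and only the iommu_* calls come from this series.

	#include <linux/iommu.h>

	/* Called by the IOMMU core when the bound mm exits */
	static int mydev_mm_exit(struct device *dev, int pasid, void *drvdata)
	{
		struct mydev_queue *q = drvdata;	/* passed to bind() below */

		/* Stop issuing DMA tagged with this PASID */
		mydev_stop_queue(q);
		return 0;
	}

	static int mydev_bind_process(struct device *dev, struct mm_struct *mm,
				      struct mydev_queue *q, int *pasid)
	{
		int ret;

		ret = iommu_register_mm_exit_handler(dev, mydev_mm_exit);
		if (ret)
			return ret;

		ret = iommu_sva_bind_device(dev, mm, pasid, 0, q);
		if (ret)
			iommu_unregister_mm_exit_handler(dev);

		return ret;
	}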

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu-sva.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h     | 18 ++++++++++++++++
 2 files changed, 72 insertions(+)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index f9af9d66b3ed..90b524c99d3d 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -569,3 +569,57 @@ void __iommu_sva_unbind_dev_all(struct device *dev)
 	spin_unlock(&iommu_sva_lock);
 }
 EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all);
+
+/**
+ * iommu_register_mm_exit_handler() - Set a callback for mm exit
+ * @dev: the device
+ * @handler: exit handler
+ *
+ * Users of the bind/unbind API should call this function to set a
+ * device-specific callback telling them when a mm is exiting.
+ *
+ * After the callback returns, the device must not issue any more transactions
+ * with the PASID given as argument to the handler. In addition the handler is
+ * passed the opaque drvdata pointer that was given as argument to bind().
+ *
+ * The handler itself should return 0 on success, and an appropriate error code
+ * otherwise.
+ */
+int iommu_register_mm_exit_handler(struct device *dev,
+				   iommu_mm_exit_handler_t handler)
+{
+	struct iommu_param *dev_param = dev->iommu_param;
+
+	if (!dev_param)
+		return -EINVAL;
+
+	/*
+	 * FIXME: racy. Same as iommu_sva_device_init, but here we'll need a
+	 * spinlock to call the mm_exit param from atomic context.
+	 */
+	if (dev_param->mm_exit)
+		return -EBUSY;
+
+	get_device(dev);
+	dev_param->mm_exit = handler;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_register_mm_exit_handler);
+
+/**
+ * iommu_unregister_mm_exit_handler() - Remove mm exit callback
+ */
+int iommu_unregister_mm_exit_handler(struct device *dev)
+{
+	struct iommu_param *dev_param = dev->iommu_param;
+
+	if (!dev_param || !dev_param->mm_exit)
+		return -EINVAL;
+
+	dev_param->mm_exit = NULL;
+	put_device(dev);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_unregister_mm_exit_handler);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 09d85f44142a..1b1a16892ac1 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -65,6 +65,8 @@ typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);
 /* Request I/O page fault support */
 #define IOMMU_SVA_FEAT_IOPF		(1 << 1)
 
+typedef int (*iommu_mm_exit_handler_t)(struct device *dev, int pasid, void *);
+
 struct iommu_domain_geometry {
 	dma_addr_t aperture_start; /* First address that can be mapped    */
 	dma_addr_t aperture_end;   /* Last address that can be mapped     */
@@ -424,6 +426,7 @@ struct iommu_param {
 	unsigned int min_pasid;
 	unsigned int max_pasid;
 	struct list_head mm_list;
+	iommu_mm_exit_handler_t mm_exit;
 };
 
 int  iommu_device_register(struct iommu_device *iommu);
@@ -941,6 +944,10 @@ extern int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
 				int *pasid, unsigned long flags, void *drvdata);
 extern int iommu_sva_unbind_device(struct device *dev, int pasid);
 extern void __iommu_sva_unbind_dev_all(struct device *dev);
+extern int iommu_register_mm_exit_handler(struct device *dev,
+					  iommu_mm_exit_handler_t handler);
+extern int iommu_unregister_mm_exit_handler(struct device *dev);
+
 #else /* CONFIG_IOMMU_SVA */
 static inline int iommu_sva_device_init(struct device *dev,
 					unsigned long features,
@@ -969,6 +976,17 @@ static inline int iommu_sva_unbind_device(struct device *dev, int pasid)
 static inline void __iommu_sva_unbind_dev_all(struct device *dev)
 {
 }
+
+static inline int iommu_register_mm_exit_handler(struct device *dev,
+						 iommu_mm_exit_handler_t handler)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_unregister_mm_exit_handler(struct device *dev)
+{
+	return -ENODEV;
+}
 #endif /* CONFIG_IOMMU_SVA */
 
 #endif /* __LINUX_IOMMU_H */
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 05/37] iommu/sva: Track mm changes with an MMU notifier
  2018-02-12 18:33 ` Jean-Philippe Brucker
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

When creating an io_mm structure, register an MMU notifier that informs
us when the virtual address space changes or disappears.

Add one new operation to the IOMMU driver: mm_invalidate is called when
a range of addresses is unmapped, to let the IOMMU driver send ATC
invalidations.

Adding the notifier complicates io_mm release. In one case device
drivers free the io_mm explicitly by calling unbind (or detaching the
device from its domain). In the other case the process could crash
before unbind, in which case the release notifier has to do all the
work.
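
For illustration, a driver-side implementation of the new op could look
roughly like this. The mydrv_atc_invalidate() helper is hypothetical; only
the callback prototype comes from this patch.

	static void mydrv_mm_invalidate(struct iommu_domain *domain,
					struct device *dev, struct io_mm *io_mm,
					unsigned long iova, size_t size)
	{
		/*
		 * [iova, iova + size) was unmapped in the process page tables.
		 * Remove any translations the endpoint may have cached for
		 * this PASID before the pages are reused.
		 */
		mydrv_atc_invalidate(dev, io_mm->pasid, iova, size);
	}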

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/Kconfig     |   1 +
 drivers/iommu/iommu-sva.c | 161 ++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/iommu.h     |  10 +++
 3 files changed, 165 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 555147a61f7c..146eebe9a4bb 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -77,6 +77,7 @@ config IOMMU_DMA
 config IOMMU_SVA
 	bool "Shared Virtual Addressing API for the IOMMU"
 	select IOMMU_API
+	select MMU_NOTIFIER
 	help
 	  Enable process address space management for the IOMMU API. In systems
 	  that support it, device drivers can bind process address spaces to
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 90b524c99d3d..9108adb54ec7 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -9,7 +9,9 @@
 
 #include <linux/idr.h>
 #include <linux/iommu.h>
+#include <linux/mmu_notifier.h>
 #include <linux/slab.h>
+#include <linux/sched/mm.h>
 #include <linux/spinlock.h>
 
 /**
@@ -131,6 +133,8 @@ static DEFINE_IDR(iommu_pasid_idr);
  */
 static DEFINE_SPINLOCK(iommu_sva_lock);
 
+static struct mmu_notifier_ops iommu_mmu_notifier;
+
 static struct io_mm *
 io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 	    struct mm_struct *mm)
@@ -157,6 +161,7 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 	mmgrab(mm);
 
 	io_mm->mm		= mm;
+	io_mm->notifier.ops	= &iommu_mmu_notifier;
 	io_mm->release		= domain->ops->mm_free;
 	INIT_LIST_HEAD(&io_mm->devices);
 
@@ -173,8 +178,29 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 		goto err_free_mm;
 	}
 
-	/* TODO: keep track of mm. For the moment, abort. */
-	ret = -ENOSYS;
+	ret = mmu_notifier_register(&io_mm->notifier, mm);
+	if (ret)
+		goto err_free_pasid;
+
+	/*
+	 * Now that the MMU notifier is valid, we can allow users to grab this
+	 * io_mm by setting a valid refcount. Before that it was accessible in
+	 * the IDR but invalid.
+	 *
+	 * The following barrier ensures that users, who obtain the io_mm with
+	 * kref_get_unless_zero, don't read uninitialized fields in the
+	 * structure.
+	 */
+	smp_wmb();
+	kref_init(&io_mm->kref);
+
+	return io_mm;
+
+err_free_pasid:
+	/*
+	 * Even if the io_mm is accessible from the IDR at this point, kref is
+	 * 0 so no user could get a reference to it. Free it manually.
+	 */
 	spin_lock(&iommu_sva_lock);
 	idr_remove(&iommu_pasid_idr, io_mm->pasid);
 	spin_unlock(&iommu_sva_lock);
@@ -186,11 +212,13 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 	return ERR_PTR(ret);
 }
 
-static void io_mm_free(struct io_mm *io_mm)
+static void io_mm_free(struct rcu_head *rcu)
 {
+	struct io_mm *io_mm;
 	struct mm_struct *mm;
 	void (*release)(struct io_mm *);
 
+	io_mm = container_of(rcu, struct io_mm, rcu);
 	release = io_mm->release;
 	mm = io_mm->mm;
 
@@ -207,7 +235,22 @@ static void io_mm_release(struct kref *kref)
 
 	idr_remove(&iommu_pasid_idr, io_mm->pasid);
 
-	io_mm_free(io_mm);
+	/*
+	 * If we're being released from mm exit, the notifier callback ->release
+	 * has already been called. Otherwise we don't need ->release, the io_mm
+	 * isn't attached to anything anymore. Hence no_release.
+	 */
+	mmu_notifier_unregister_no_release(&io_mm->notifier, io_mm->mm);
+
+	/*
+	 * We can't free the structure here, because if mm exits during
+	 * unbind(), then ->release might be attempting to grab the io_mm
+	 * concurrently. And in the other case, if ->release is calling
+	 * io_mm_release, then __mmu_notifier_release expects to still have a
+	 * valid mn when returning. So free the structure when it's safe, after
+	 * the RCU grace period elapsed.
+	 */
+	mmu_notifier_call_srcu(&io_mm->rcu, io_mm_free);
 }
 
 /*
@@ -216,8 +259,14 @@ static void io_mm_release(struct kref *kref)
  */
 static int io_mm_get_locked(struct io_mm *io_mm)
 {
-	if (io_mm)
-		return kref_get_unless_zero(&io_mm->kref);
+	if (io_mm && kref_get_unless_zero(&io_mm->kref)) {
+		/*
+		 * kref_get_unless_zero doesn't provide ordering for reads. This
+		 * barrier pairs with the one in io_mm_alloc.
+		 */
+		smp_rmb();
+		return 1;
+	}
 
 	return 0;
 }
@@ -246,7 +295,8 @@ static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
 	if (!dev_param)
 		return -EINVAL;
 
-	if (!domain->ops->mm_attach || !domain->ops->mm_detach)
+	if (!domain->ops->mm_attach || !domain->ops->mm_detach ||
+	    !domain->ops->mm_invalidate)
 		return -ENODEV;
 
 	if (pasid > dev_param->max_pasid || pasid < dev_param->min_pasid)
@@ -322,6 +372,103 @@ static void io_mm_detach_all_locked(struct iommu_bond *bond)
 	while (!io_mm_detach_locked(bond));
 }
 
+static int iommu_signal_mm_exit(struct iommu_bond *bond)
+{
+	struct device *dev = bond->dev;
+	struct io_mm *io_mm = bond->io_mm;
+
+	if (!dev->iommu_param || !dev->iommu_param->mm_exit)
+		return 0;
+
+	return dev->iommu_param->mm_exit(dev, io_mm->pasid, bond->drvdata);
+}
+
+/*
+ * Called when the mm exits. Might race with unbind() or any other function
+ * dropping the last reference to the mm.
+ */
+static void iommu_notifier_release(struct mmu_notifier *mn, struct mm_struct *mm)
+{
+	struct iommu_bond *bond, *next;
+	struct io_mm *io_mm = container_of(mn, struct io_mm, notifier);
+
+	/*
+	 * If the mm is exiting then devices are still bound to the io_mm.
+	 * A few things need to be done before it is safe to release:
+	 *
+	 * - As the mmu notifier doesn't hold any reference to the io_mm when
+	 *   calling ->release(), try to take a reference.
+	 * - Tell the device driver to stop using this PASID.
+	 * - Clear the PASID table and invalidate TLBs.
+	 * - Drop all references to this io_mm by freeing the bonds.
+	 */
+	spin_lock(&iommu_sva_lock);
+	if (!io_mm_get_locked(io_mm)) {
+		/* Someone's already taking care of it. */
+		spin_unlock(&iommu_sva_lock);
+		return;
+	}
+
+	list_for_each_entry_safe(bond, next, &io_mm->devices, mm_head) {
+		if (iommu_signal_mm_exit(bond))
+			dev_WARN(bond->dev, "possible leak of PASID %u",
+				 io_mm->pasid);
+
+		io_mm_detach_all_locked(bond);
+	}
+	spin_unlock(&iommu_sva_lock);
+
+	iommu_fault_queue_flush(NULL);
+
+	/*
+	 * We're now reasonably certain that no more faults are being handled for
+	 * this io_mm, since we just flushed them all out of the fault queue.
+	 * Release the last reference to free the io_mm.
+	 */
+	io_mm_put(io_mm);
+}
+
+static void iommu_notifier_invalidate_range(struct mmu_notifier *mn,
+					    struct mm_struct *mm,
+					    unsigned long start,
+					    unsigned long end)
+{
+	struct iommu_bond *bond;
+	struct io_mm *io_mm = container_of(mn, struct io_mm, notifier);
+
+	spin_lock(&iommu_sva_lock);
+	list_for_each_entry(bond, &io_mm->devices, mm_head) {
+		struct iommu_domain *domain = bond->domain;
+
+		domain->ops->mm_invalidate(domain, bond->dev, io_mm, start,
+					   end - start);
+	}
+	spin_unlock(&iommu_sva_lock);
+}
+
+static int iommu_notifier_clear_flush_young(struct mmu_notifier *mn,
+					    struct mm_struct *mm,
+					    unsigned long start,
+					    unsigned long end)
+{
+	iommu_notifier_invalidate_range(mn, mm, start, end);
+	return 0;
+}
+
+static void iommu_notifier_change_pte(struct mmu_notifier *mn,
+				      struct mm_struct *mm,
+				      unsigned long address, pte_t pte)
+{
+	iommu_notifier_invalidate_range(mn, mm, address, address + PAGE_SIZE);
+}
+
+static struct mmu_notifier_ops iommu_mmu_notifier = {
+	.release		= iommu_notifier_release,
+	.clear_flush_young	= iommu_notifier_clear_flush_young,
+	.change_pte		= iommu_notifier_change_pte,
+	.invalidate_range	= iommu_notifier_invalidate_range,
+};
+
 /**
  * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a device
  * @dev: the device
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 1b1a16892ac1..afec7b1d3301 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -25,6 +25,7 @@
 #include <linux/errno.h>
 #include <linux/err.h>
 #include <linux/of.h>
+#include <linux/mmu_notifier.h>
 
 #define IOMMU_READ	(1 << 0)
 #define IOMMU_WRITE	(1 << 1)
@@ -113,10 +114,15 @@ struct io_mm {
 	int			pasid;
 	struct list_head	devices;
 	struct kref		kref;
+#if defined(CONFIG_MMU_NOTIFIER)
+	struct mmu_notifier	notifier;
+#endif
 	struct mm_struct	*mm;
 
 	/* Release callback for this mm */
 	void (*release)(struct io_mm *io_mm);
+	/* For postponed release */
+	struct rcu_head		rcu;
 };
 
 enum iommu_cap {
@@ -223,6 +229,7 @@ struct page_response_msg {
  * @mm_attach: attach io_mm to a device. Install PASID entry if necessary
  * @mm_detach: detach io_mm from a device. Remove PASID entry and
  *             flush associated TLB entries.
+ * @mm_invalidate: Invalidate a range of mappings for an mm
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @map_sg: map a scatter-gather list of physically contiguous memory chunks
@@ -267,6 +274,9 @@ struct iommu_ops {
 			 struct io_mm *io_mm, bool attach_domain);
 	void (*mm_detach)(struct iommu_domain *domain, struct device *dev,
 			  struct io_mm *io_mm, bool detach_domain);
+	void (*mm_invalidate)(struct iommu_domain *domain, struct device *dev,
+			      struct io_mm *io_mm, unsigned long vaddr,
+			      size_t size);
 	int (*map)(struct iommu_domain *domain, unsigned long iova,
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 06/37] iommu/sva: Search mm by PASID
  2018-02-12 18:33 ` Jean-Philippe Brucker
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

The fault handler will need to find an mm given its PASID. This is the
reason we have an IDR for storing address spaces, so hook it up. A future
optimization could find the io_mm from the struct device passed to the
fault handler, since it's readily accessible.
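
As a rough sketch of the kind of lookup a fault handler can then do (this is
not the handler added later in the series, just an assumed use of
iommu_sva_find()):

	#include <linux/iommu.h>
	#include <linux/mm.h>
	#include <linux/sched/mm.h>

	static int handle_pasid_fault(int pasid, unsigned long addr, bool write)
	{
		struct vm_area_struct *vma;
		struct mm_struct *mm;
		int ret = -ESRCH;
		int fault;

		mm = iommu_sva_find(pasid);
		if (!mm)
			return ret;

		down_read(&mm->mmap_sem);
		vma = find_vma(mm, addr);
		if (vma && vma->vm_start <= addr) {
			fault = handle_mm_fault(vma, addr, FAULT_FLAG_USER |
						(write ? FAULT_FLAG_WRITE : 0));
			ret = (fault & VM_FAULT_ERROR) ? -EFAULT : 0;
		} else {
			ret = -EFAULT;
		}
		up_read(&mm->mmap_sem);
		mmput(mm);

		return ret;
	}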

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu-sva.c | 26 ++++++++++++++++++++++++++
 include/linux/iommu.h     |  6 ++++++
 2 files changed, 32 insertions(+)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 9108adb54ec7..4bc2a8c12465 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -10,6 +10,7 @@
 #include <linux/idr.h>
 #include <linux/iommu.h>
 #include <linux/mmu_notifier.h>
+#include <linux/sched/mm.h>
 #include <linux/slab.h>
 #include <linux/sched/mm.h>
 #include <linux/spinlock.h>
@@ -770,3 +771,28 @@ int iommu_unregister_mm_exit_handler(struct device *dev)
 	return 0;
 }
 EXPORT_SYMBOL_GPL(iommu_unregister_mm_exit_handler);
+
+/*
+ * iommu_sva_find() - Find mm associated to the given PASID
+ *
+ * Returns the mm corresponding to this PASID, or NULL if not found. A reference
+ * to the mm is taken, and must be released with mmput().
+ */
+struct mm_struct *iommu_sva_find(int pasid)
+{
+	struct io_mm *io_mm;
+	struct mm_struct *mm = NULL;
+
+	spin_lock(&iommu_sva_lock);
+	io_mm = idr_find(&iommu_pasid_idr, pasid);
+	if (io_mm && io_mm_get_locked(io_mm)) {
+		if (mmget_not_zero(io_mm->mm))
+			mm = io_mm->mm;
+
+		io_mm_put_locked(io_mm);
+	}
+	spin_unlock(&iommu_sva_lock);
+
+	return mm;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_find);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index afec7b1d3301..226ab4f3ae0e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -958,6 +958,7 @@ extern int iommu_register_mm_exit_handler(struct device *dev,
 					  iommu_mm_exit_handler_t handler);
 extern int iommu_unregister_mm_exit_handler(struct device *dev);
 
+extern struct mm_struct *iommu_sva_find(int pasid);
 #else /* CONFIG_IOMMU_SVA */
 static inline int iommu_sva_device_init(struct device *dev,
 					unsigned long features,
@@ -997,6 +998,11 @@ static inline int iommu_unregister_mm_exit_handler(struct device *dev)
 {
 	return -ENODEV;
 }
+
+static inline struct mm_struct *iommu_sva_find(int pasid)
+{
+	return NULL;
+}
 #endif /* CONFIG_IOMMU_SVA */
 
 #endif /* __LINUX_IOMMU_H */
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 07/37] iommu: Add a page fault handler
  2018-02-12 18:33 ` Jean-Philippe Brucker
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

On some systems, IOMMU translation faults from devices can be handled by the
core mm, for example on systems supporting the PCI PRI extension or the Arm
SMMU stall model. Infrastructure for reporting such recoverable page faults was
recently added to the IOMMU core, for SVA virtualization. Extend
iommu_report_device_fault() to handle host page faults as well.

* IOMMU drivers instantiate a fault workqueue, using
  iommu_fault_queue_register() and iommu_fault_queue_unregister().

* When it receives a fault event, typically in an IRQ handler, the IOMMU
  driver reports the fault using iommu_report_device_fault(), as sketched
  after this list.

* If the device driver registered a handler (e.g. VFIO), pass down the
  fault event. Otherwise submit it to the fault queue, to be handled in a
  thread.

* When the fault corresponds to an io_mm, call the mm fault handler on it
  (in next patch).

* Once the fault is handled, the mm wrapper or the device driver reports
  success or failure with iommu_page_response(). The translation is either
  retried or aborted, depending on the response code.
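
A minimal sketch of the driver side, assuming a low-level queue that the
driver drains in its IRQ handler (the mydrv_* types and helpers are
hypothetical; the iommu_fault_event fields are the ones used by this series):

	#include <linux/interrupt.h>
	#include <linux/iommu.h>

	static irqreturn_t mydrv_prq_irq(int irq, void *cookie)
	{
		struct mydrv_prq_entry prq;
		struct device *dev;

		while (mydrv_dequeue_prq(cookie, &prq, &dev)) {
			struct iommu_fault_event evt = {
				.type			= IOMMU_FAULT_PAGE_REQ,
				.addr			= prq.addr,
				.pasid			= prq.pasid,
				.pasid_valid		= prq.pasid_valid,
				.page_req_group_id	= prq.grpid,
				.last_req		= prq.last,
			};

			/* Invokes the device driver handler, or queues the fault */
			iommu_report_device_fault(dev, &evt);
		}

		return IRQ_HANDLED;
	}

At probe time the driver would also call iommu_fault_queue_register() with a
flush notifier that drains its hardware queue, and unregister it on removal.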

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/Kconfig      |  10 ++
 drivers/iommu/Makefile     |   1 +
 drivers/iommu/io-pgfault.c | 282 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu-sva.c  |   3 -
 drivers/iommu/iommu.c      |  31 ++---
 include/linux/iommu.h      |  34 +++++-
 6 files changed, 339 insertions(+), 22 deletions(-)
 create mode 100644 drivers/iommu/io-pgfault.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 146eebe9a4bb..e751bb9958ba 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -85,6 +85,15 @@ config IOMMU_SVA
 
 	  If unsure, say N here.
 
+config IOMMU_FAULT
+	bool "Fault handler for the IOMMU API"
+	select IOMMU_API
+	help
+	  Enable the generic fault handler for the IOMMU API, which handles
+	  recoverable page faults or injects them into guests.
+
+	  If unsure, say N here.
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PCI
@@ -156,6 +165,7 @@ config INTEL_IOMMU
 	select IOMMU_API
 	select IOMMU_IOVA
 	select DMAR_TABLE
+	select IOMMU_FAULT
 	help
 	  DMA remapping (DMAR) devices support enables independent address
 	  translations for Direct Memory Access (DMA) from devices.
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 1dbcc89ebe4c..f4324e29035e 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
+obj-$(CONFIG_IOMMU_FAULT) += io-pgfault.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
new file mode 100644
index 000000000000..33309ed316d2
--- /dev/null
+++ b/drivers/iommu/io-pgfault.c
@@ -0,0 +1,282 @@
+/*
+ * Handle device page faults
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include <linux/iommu.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+
+static struct workqueue_struct *iommu_fault_queue;
+static DECLARE_RWSEM(iommu_fault_queue_sem);
+static refcount_t iommu_fault_queue_refs = REFCOUNT_INIT(0);
+static BLOCKING_NOTIFIER_HEAD(iommu_fault_queue_flush_notifiers);
+
+/* Used to store incomplete fault groups */
+static LIST_HEAD(iommu_partial_faults);
+static DEFINE_SPINLOCK(iommu_partial_faults_lock);
+
+struct iommu_fault_context {
+	struct device			*dev;
+	struct iommu_fault_event	evt;
+	struct list_head		head;
+};
+
+struct iommu_fault_group {
+	struct iommu_domain		*domain;
+	struct iommu_fault_context	last_fault;
+	struct list_head		faults;
+	struct work_struct		work;
+};
+
+/*
+ * iommu_fault_complete() - Finish handling a fault
+ *
+ * Send a response if necessary and pass on the sanitized status code
+ */
+static int iommu_fault_complete(struct iommu_domain *domain, struct device *dev,
+				struct iommu_fault_event *evt, int status)
+{
+	struct page_response_msg resp = {
+		.addr		= evt->addr,
+		.pasid		= evt->pasid,
+		.pasid_present	= evt->pasid_valid,
+		.page_req_group_id = evt->page_req_group_id,
+		.type		= IOMMU_PAGE_GROUP_RESP,
+		.private_data	= evt->iommu_private,
+	};
+
+	/*
+	 * There is no "handling" an unrecoverable fault, so the only valid
+	 * return values are 0 or an error.
+	 */
+	if (evt->type == IOMMU_FAULT_DMA_UNRECOV)
+		return status > 0 ? 0 : status;
+
+	/* Someone took ownership of the fault and will complete it later */
+	if (status == IOMMU_PAGE_RESP_HANDLED)
+		return 0;
+
+	/*
+	 * There was an internal error with handling the recoverable fault. Try
+	 * to complete the fault if possible.
+	 */
+	if (status < 0)
+		status = IOMMU_PAGE_RESP_INVALID;
+
+	if (WARN_ON(!domain->ops->page_response))
+		/*
+		 * The IOMMU driver shouldn't have submitted recoverable faults
+		 * if it cannot receive a response.
+		 */
+		return -EINVAL;
+
+	resp.resp_code = status;
+	return domain->ops->page_response(domain, dev, &resp);
+}
+
+static int iommu_fault_handle_single(struct iommu_fault_context *fault)
+{
+	/* TODO */
+	return -ENODEV;
+}
+
+static void iommu_fault_handle_group(struct work_struct *work)
+{
+	struct iommu_fault_group *group;
+	struct iommu_fault_context *fault, *next;
+	int status = IOMMU_PAGE_RESP_SUCCESS;
+
+	group = container_of(work, struct iommu_fault_group, work);
+
+	list_for_each_entry_safe(fault, next, &group->faults, head) {
+		struct iommu_fault_event *evt = &fault->evt;
+		/*
+		 * Errors are sticky: don't handle subsequent faults in the
+		 * group if there is an error.
+		 */
+		if (status == IOMMU_PAGE_RESP_SUCCESS)
+			status = iommu_fault_handle_single(fault);
+
+		if (!evt->last_req)
+			kfree(fault);
+	}
+
+	iommu_fault_complete(group->domain, group->last_fault.dev,
+			     &group->last_fault.evt, status);
+	kfree(group);
+}
+
+static int iommu_queue_fault(struct iommu_domain *domain, struct device *dev,
+			     struct iommu_fault_event *evt)
+{
+	struct iommu_fault_group *group;
+	struct iommu_fault_context *fault, *next;
+
+	if (!iommu_fault_queue)
+		return -ENOSYS;
+
+	if (!evt->last_req) {
+		fault = kzalloc(sizeof(*fault), GFP_KERNEL);
+		if (!fault)
+			return -ENOMEM;
+
+		fault->evt = *evt;
+		fault->dev = dev;
+
+		/* Non-last request of a group. Postpone until the last one */
+		spin_lock(&iommu_partial_faults_lock);
+		list_add_tail(&fault->head, &iommu_partial_faults);
+		spin_unlock(&iommu_partial_faults_lock);
+
+		return IOMMU_PAGE_RESP_HANDLED;
+	}
+
+	group = kzalloc(sizeof(*group), GFP_KERNEL);
+	if (!group)
+		return -ENOMEM;
+
+	group->last_fault.evt = *evt;
+	group->last_fault.dev = dev;
+	group->domain = domain;
+	INIT_LIST_HEAD(&group->faults);
+	list_add(&group->last_fault.head, &group->faults);
+	INIT_WORK(&group->work, iommu_fault_handle_group);
+
+	/* See if we have pending faults for this group */
+	spin_lock(&iommu_partial_faults_lock);
+	list_for_each_entry_safe(fault, next, &iommu_partial_faults, head) {
+		if (fault->evt.page_req_group_id == evt->page_req_group_id &&
+		    fault->dev == dev) {
+			list_del(&fault->head);
+			/* Insert *before* the last fault */
+			list_add(&fault->head, &group->faults);
+		}
+	}
+	spin_unlock(&iommu_partial_faults_lock);
+
+	queue_work(iommu_fault_queue, &group->work);
+
+	/* Postpone the fault completion */
+	return IOMMU_PAGE_RESP_HANDLED;
+}
+
+/**
+ * iommu_report_device_fault() - Handle fault in device driver or mm
+ *
+ * If the device driver expressed interest in handling faults, report the fault
+ * through the callback. If the fault is recoverable, try to page in the address.
+ */
+int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
+{
+	int ret = -ENOSYS;
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	if (!domain)
+		return -ENODEV;
+
+	/*
+	 * if upper layers showed interest and installed a fault handler,
+	 * invoke it.
+	 */
+	if (iommu_has_device_fault_handler(dev)) {
+		struct iommu_fault_param *param = dev->iommu_param->fault_param;
+
+		return param->handler(evt, param->data);
+	}
+
+	/* If the handler is blocking, handle fault in the workqueue */
+	if (evt->type == IOMMU_FAULT_PAGE_REQ)
+		ret = iommu_queue_fault(domain, dev, evt);
+
+	return iommu_fault_complete(domain, dev, evt, ret);
+}
+EXPORT_SYMBOL_GPL(iommu_report_device_fault);
+
+/**
+ * iommu_fault_queue_register() - register an IOMMU driver to the fault queue
+ * @flush_notifier: a notifier block that is called before the fault queue is
+ * flushed. The IOMMU driver should commit all faults that are pending in its
+ * low-level queues at the time of the call, into the fault queue. The notifier
+ * takes a device pointer as argument, hinting what endpoint is causing the
+ * flush. When the device is NULL, all faults should be committed.
+ */
+int iommu_fault_queue_register(struct notifier_block *flush_notifier)
+{
+	/*
+	 * The WQ is unordered because the low-level handler enqueues faults by
+	 * group. PRI requests within a group have to be ordered, but once
+	 * that's dealt with, the high-level function can handle groups out of
+	 * order.
+	 */
+	down_write(&iommu_fault_queue_sem);
+	if (!iommu_fault_queue) {
+		iommu_fault_queue = alloc_workqueue("iommu_fault_queue",
+						    WQ_UNBOUND, 0);
+		if (iommu_fault_queue)
+			refcount_set(&iommu_fault_queue_refs, 1);
+	} else {
+		refcount_inc(&iommu_fault_queue_refs);
+	}
+	up_write(&iommu_fault_queue_sem);
+
+	if (!iommu_fault_queue)
+		return -ENOMEM;
+
+	if (flush_notifier)
+		blocking_notifier_chain_register(&iommu_fault_queue_flush_notifiers,
+						 flush_notifier);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_fault_queue_register);
+
+/**
+ * iommu_fault_queue_flush() - Ensure that all queued faults have been
+ * processed.
+ * @dev: the endpoint whose faults need to be flushed. If NULL, flush all
+ *       pending faults.
+ *
+ * Users must call this function when releasing a PASID, to ensure that all
+ * pending faults affecting this PASID have been handled, and won't affect the
+ * address space of a subsequent process that reuses this PASID.
+ */
+void iommu_fault_queue_flush(struct device *dev)
+{
+	blocking_notifier_call_chain(&iommu_fault_queue_flush_notifiers, 0, dev);
+
+	down_read(&iommu_fault_queue_sem);
+	/*
+	 * Don't flush the partial faults list. All PRGs with the PASID are
+	 * complete and have been submitted to the queue.
+	 */
+	if (iommu_fault_queue)
+		flush_workqueue(iommu_fault_queue);
+	up_read(&iommu_fault_queue_sem);
+}
+EXPORT_SYMBOL_GPL(iommu_fault_queue_flush);
+
+/**
+ * iommu_fault_queue_unregister() - Unregister an IOMMU driver from the fault
+ * queue.
+ * @flush_notifier: same parameter as iommu_fault_queue_register
+ */
+void iommu_fault_queue_unregister(struct notifier_block *flush_notifier)
+{
+	down_write(&iommu_fault_queue_sem);
+	if (refcount_dec_and_test(&iommu_fault_queue_refs)) {
+		destroy_workqueue(iommu_fault_queue);
+		iommu_fault_queue = NULL;
+	}
+	up_write(&iommu_fault_queue_sem);
+
+	if (flush_notifier)
+		blocking_notifier_chain_unregister(&iommu_fault_queue_flush_notifiers,
+						   flush_notifier);
+}
+EXPORT_SYMBOL_GPL(iommu_fault_queue_unregister);
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 4bc2a8c12465..d7b231cd7355 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -102,9 +102,6 @@
  * the device table and PASID 0 would be available to the allocator.
  */
 
-/* TODO: stub for the fault queue. Remove later. */
-#define iommu_fault_queue_flush(...)
-
 struct iommu_bond {
 	struct io_mm		*io_mm;
 	struct device		*dev;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 1d60b32a6744..c475893ec7dc 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
* [PATCH 07/37] iommu: Add a page fault handler
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel

Some systems allow devices to handle IOMMU translation faults in the core
mm: for example, systems supporting the PCI PRI extension or the Arm SMMU
stall model. Infrastructure for reporting such recoverable page faults was
recently added to the IOMMU core, for SVA virtualization. Extend
iommu_report_device_fault() to handle host page faults as well.

* IOMMU drivers register with the shared fault workqueue, using
  iommu_fault_queue_register() and iommu_fault_queue_unregister().

* When it receives a fault event, typically in an IRQ handler, the IOMMU
  driver reports the fault using iommu_report_device_fault().

* If the device driver registered a handler (e.g. VFIO), pass down the
  fault event. Otherwise submit it to the fault queue, to be handled in a
  thread.

* When the fault corresponds to an io_mm, call the mm fault handler on it
  (in the next patch).

* Once the fault is handled, the mm wrapper or the device driver reports
  success or failure with iommu_page_response(). The translation is either
  retried or aborted, depending on the response code. A usage sketch follows
  this list.
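
Not part of the patch: a rough sketch of how an IOMMU driver might plug into
these interfaces. The mydrv_* names are made up for illustration and error
handling is omitted; the calls shown are the ones introduced here.

	/*
	 * Flush notifier: commit whatever is pending in the low-level
	 * event queue into the fault queue. @data may be NULL, meaning
	 * "flush everything".
	 */
	static int mydrv_flush_notify(struct notifier_block *nb,
				      unsigned long action, void *data)
	{
		struct device *dev = data;

		mydrv_drain_event_queue(dev);	/* hypothetical helper */
		return NOTIFY_OK;
	}

	static struct notifier_block mydrv_flush_nb = {
		.notifier_call	= mydrv_flush_notify,
	};

	/* At probe time, join the shared fault workqueue */
	ret = iommu_fault_queue_register(&mydrv_flush_nb);

	/* From the event IRQ handler, once a fault event has been decoded */
	ret = iommu_report_device_fault(dev, &evt);

	/* Before freeing a PASID, make sure its faults have been handled */
	iommu_fault_queue_flush(dev);

	/* At remove time */
	iommu_fault_queue_unregister(&mydrv_flush_nb);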

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/Kconfig      |  10 ++
 drivers/iommu/Makefile     |   1 +
 drivers/iommu/io-pgfault.c | 282 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu-sva.c  |   3 -
 drivers/iommu/iommu.c      |  31 ++---
 include/linux/iommu.h      |  34 +++++-
 6 files changed, 339 insertions(+), 22 deletions(-)
 create mode 100644 drivers/iommu/io-pgfault.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 146eebe9a4bb..e751bb9958ba 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -85,6 +85,15 @@ config IOMMU_SVA
 
 	  If unsure, say N here.
 
+config IOMMU_FAULT
+	bool "Fault handler for the IOMMU API"
+	select IOMMU_API
+	help
+	  Enable the generic fault handler for the IOMMU API, which handles
+	  recoverable page faults or injects them into guests.
+
+	  If unsure, say N here.
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PCI
@@ -156,6 +165,7 @@ config INTEL_IOMMU
 	select IOMMU_API
 	select IOMMU_IOVA
 	select DMAR_TABLE
+	select IOMMU_FAULT
 	help
 	  DMA remapping (DMAR) devices support enables independent address
 	  translations for Direct Memory Access (DMA) from devices.
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 1dbcc89ebe4c..f4324e29035e 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
+obj-$(CONFIG_IOMMU_FAULT) += io-pgfault.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
new file mode 100644
index 000000000000..33309ed316d2
--- /dev/null
+++ b/drivers/iommu/io-pgfault.c
@@ -0,0 +1,282 @@
+/*
+ * Handle device page faults
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include <linux/iommu.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+
+static struct workqueue_struct *iommu_fault_queue;
+static DECLARE_RWSEM(iommu_fault_queue_sem);
+static refcount_t iommu_fault_queue_refs = REFCOUNT_INIT(0);
+static BLOCKING_NOTIFIER_HEAD(iommu_fault_queue_flush_notifiers);
+
+/* Used to store incomplete fault groups */
+static LIST_HEAD(iommu_partial_faults);
+static DEFINE_SPINLOCK(iommu_partial_faults_lock);
+
+struct iommu_fault_context {
+	struct device			*dev;
+	struct iommu_fault_event	evt;
+	struct list_head		head;
+};
+
+struct iommu_fault_group {
+	struct iommu_domain		*domain;
+	struct iommu_fault_context	last_fault;
+	struct list_head		faults;
+	struct work_struct		work;
+};
+
+/*
+ * iommu_fault_complete() - Finish handling a fault
+ *
+ * Send a response if necessary and pass on the sanitized status code
+ */
+static int iommu_fault_complete(struct iommu_domain *domain, struct device *dev,
+				struct iommu_fault_event *evt, int status)
+{
+	struct page_response_msg resp = {
+		.addr		= evt->addr,
+		.pasid		= evt->pasid,
+		.pasid_present	= evt->pasid_valid,
+		.page_req_group_id = evt->page_req_group_id,
+		.type		= IOMMU_PAGE_GROUP_RESP,
+		.private_data	= evt->iommu_private,
+	};
+
+	/*
+	 * There is no "handling" an unrecoverable fault, so the only valid
+	 * return values are 0 or an error.
+	 */
+	if (evt->type == IOMMU_FAULT_DMA_UNRECOV)
+		return status > 0 ? 0 : status;
+
+	/* Someone took ownership of the fault and will complete it later */
+	if (status == IOMMU_PAGE_RESP_HANDLED)
+		return 0;
+
+	/*
+	 * There was an internal error with handling the recoverable fault. Try
+	 * to complete the fault if possible.
+	 */
+	if (status < 0)
+		status = IOMMU_PAGE_RESP_INVALID;
+
+	if (WARN_ON(!domain->ops->page_response))
+		/*
+		 * The IOMMU driver shouldn't have submitted recoverable faults
+		 * if it cannot receive a response.
+		 */
+		return -EINVAL;
+
+	resp.resp_code = status;
+	return domain->ops->page_response(domain, dev, &resp);
+}
+
+static int iommu_fault_handle_single(struct iommu_fault_context *fault)
+{
+	/* TODO */
+	return -ENODEV;
+}
+
+static void iommu_fault_handle_group(struct work_struct *work)
+{
+	struct iommu_fault_group *group;
+	struct iommu_fault_context *fault, *next;
+	int status = IOMMU_PAGE_RESP_SUCCESS;
+
+	group = container_of(work, struct iommu_fault_group, work);
+
+	list_for_each_entry_safe(fault, next, &group->faults, head) {
+		struct iommu_fault_event *evt = &fault->evt;
+		/*
+		 * Errors are sticky: don't handle subsequent faults in the
+		 * group if there is an error.
+		 */
+		if (status == IOMMU_PAGE_RESP_SUCCESS)
+			status = iommu_fault_handle_single(fault);
+
+		if (!evt->last_req)
+			kfree(fault);
+	}
+
+	iommu_fault_complete(group->domain, group->last_fault.dev,
+			     &group->last_fault.evt, status);
+	kfree(group);
+}
+
+static int iommu_queue_fault(struct iommu_domain *domain, struct device *dev,
+			     struct iommu_fault_event *evt)
+{
+	struct iommu_fault_group *group;
+	struct iommu_fault_context *fault, *next;
+
+	if (!iommu_fault_queue)
+		return -ENOSYS;
+
+	if (!evt->last_req) {
+		fault = kzalloc(sizeof(*fault), GFP_KERNEL);
+		if (!fault)
+			return -ENOMEM;
+
+		fault->evt = *evt;
+		fault->dev = dev;
+
+		/* Non-last request of a group. Postpone until the last one */
+		spin_lock(&iommu_partial_faults_lock);
+		list_add_tail(&fault->head, &iommu_partial_faults);
+		spin_unlock(&iommu_partial_faults_lock);
+
+		return IOMMU_PAGE_RESP_HANDLED;
+	}
+
+	group = kzalloc(sizeof(*group), GFP_KERNEL);
+	if (!group)
+		return -ENOMEM;
+
+	group->last_fault.evt = *evt;
+	group->last_fault.dev = dev;
+	group->domain = domain;
+	INIT_LIST_HEAD(&group->faults);
+	list_add(&group->last_fault.head, &group->faults);
+	INIT_WORK(&group->work, iommu_fault_handle_group);
+
+	/* See if we have pending faults for this group */
+	spin_lock(&iommu_partial_faults_lock);
+	list_for_each_entry_safe(fault, next, &iommu_partial_faults, head) {
+		if (fault->evt.page_req_group_id == evt->page_req_group_id &&
+		    fault->dev == dev) {
+			list_del(&fault->head);
+			/* Insert *before* the last fault */
+			list_add(&fault->head, &group->faults);
+		}
+	}
+	spin_unlock(&iommu_partial_faults_lock);
+
+	queue_work(iommu_fault_queue, &group->work);
+
+	/* Postpone the fault completion */
+	return IOMMU_PAGE_RESP_HANDLED;
+}
+
+/**
+ * iommu_report_device_fault() - Handle fault in device driver or mm
+ *
+ * If the device driver expressed interest in handling the fault, report it
+ * through the callback. If the fault is recoverable, try to page in the address.
+ */
+int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
+{
+	int ret = -ENOSYS;
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	if (!domain)
+		return -ENODEV;
+
+	/*
+	 * if upper layers showed interest and installed a fault handler,
+	 * invoke it.
+	 */
+	if (iommu_has_device_fault_handler(dev)) {
+		struct iommu_fault_param *param = dev->iommu_param->fault_param;
+
+		return param->handler(evt, param->data);
+	}
+
+	/* If the handler is blocking, handle fault in the workqueue */
+	if (evt->type == IOMMU_FAULT_PAGE_REQ)
+		ret = iommu_queue_fault(domain, dev, evt);
+
+	return iommu_fault_complete(domain, dev, evt, ret);
+}
+EXPORT_SYMBOL_GPL(iommu_report_device_fault);
+
+/**
+ * iommu_fault_queue_register() - register an IOMMU driver to the fault queue
+ * @flush_notifier: a notifier block that is called before the fault queue is
+ * flushed. At the time of the call, the IOMMU driver should commit all faults
+ * pending in its low-level queues into the fault queue. The notifier takes a
+ * device pointer as argument, hinting at which endpoint is causing the flush.
+ * When the device is NULL, all faults should be committed.
+ */
+int iommu_fault_queue_register(struct notifier_block *flush_notifier)
+{
+	/*
+	 * The WQ is unordered because the low-level handler enqueues faults by
+	 * group. PRI requests within a group have to be ordered, but once
+	 * that's dealt with, the high-level function can handle groups out of
+	 * order.
+	 */
+	down_write(&iommu_fault_queue_sem);
+	if (!iommu_fault_queue) {
+		iommu_fault_queue = alloc_workqueue("iommu_fault_queue",
+						    WQ_UNBOUND, 0);
+		if (iommu_fault_queue)
+			refcount_set(&iommu_fault_queue_refs, 1);
+	} else {
+		refcount_inc(&iommu_fault_queue_refs);
+	}
+	up_write(&iommu_fault_queue_sem);
+
+	if (!iommu_fault_queue)
+		return -ENOMEM;
+
+	if (flush_notifier)
+		blocking_notifier_chain_register(&iommu_fault_queue_flush_notifiers,
+						 flush_notifier);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_fault_queue_register);
+
+/**
+ * iommu_fault_queue_flush() - Ensure that all queued faults have been
+ * processed.
+ * @dev: the endpoint whose faults need to be flushed. If NULL, flush all
+ *       pending faults.
+ *
+ * Users must call this function when releasing a PASID, to ensure that all
+ * pending faults affecting this PASID have been handled, and won't affect the
+ * address space of a subsequent process that reuses this PASID.
+ */
+void iommu_fault_queue_flush(struct device *dev)
+{
+	blocking_notifier_call_chain(&iommu_fault_queue_flush_notifiers, 0, dev);
+
+	down_read(&iommu_fault_queue_sem);
+	/*
+	 * Don't flush the partial faults list. All PRGs with the PASID are
+	 * complete and have been submitted to the queue.
+	 */
+	if (iommu_fault_queue)
+		flush_workqueue(iommu_fault_queue);
+	up_read(&iommu_fault_queue_sem);
+}
+EXPORT_SYMBOL_GPL(iommu_fault_queue_flush);
+
+/**
+ * iommu_fault_queue_unregister() - Unregister an IOMMU driver from the fault
+ * queue.
+ * @flush_notifier: same parameter as iommu_fault_queue_register
+ */
+void iommu_fault_queue_unregister(struct notifier_block *flush_notifier)
+{
+	down_write(&iommu_fault_queue_sem);
+	if (refcount_dec_and_test(&iommu_fault_queue_refs)) {
+		destroy_workqueue(iommu_fault_queue);
+		iommu_fault_queue = NULL;
+	}
+	up_write(&iommu_fault_queue_sem);
+
+	if (flush_notifier)
+		blocking_notifier_chain_unregister(&iommu_fault_queue_flush_notifiers,
+						   flush_notifier);
+}
+EXPORT_SYMBOL_GPL(iommu_fault_queue_unregister);
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 4bc2a8c12465..d7b231cd7355 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -102,9 +102,6 @@
  * the device table and PASID 0 would be available to the allocator.
  */
 
-/* TODO: stub for the fault queue. Remove later. */
-#define iommu_fault_queue_flush(...)
-
 struct iommu_bond {
 	struct io_mm		*io_mm;
 	struct device		*dev;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 1d60b32a6744..c475893ec7dc 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -798,6 +798,17 @@ int iommu_group_unregister_notifier(struct iommu_group *group,
 }
 EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
 
+/**
+ * iommu_register_device_fault_handler() - Register a device fault handler
+ * @dev: the device
+ * @handler: the fault handler
+ * @data: private data passed as argument to the callback
+ *
+ * When an IOMMU fault event is received, call this handler with the fault event
+ * and data as argument.
+ *
+ * Return 0 if the fault handler was installed successfully, or an error.
+ */
 int iommu_register_device_fault_handler(struct device *dev,
 					iommu_dev_fault_handler_t handler,
 					void *data)
@@ -825,6 +836,13 @@ int iommu_register_device_fault_handler(struct device *dev,
 }
 EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
 
+/**
+ * iommu_unregister_device_fault_handler() - Unregister the device fault handler
+ * @dev: the device
+ *
+ * Remove the device fault handler installed with
+ * iommu_register_device_fault_handler().
+ */
 int iommu_unregister_device_fault_handler(struct device *dev)
 {
 	struct iommu_param *idata = dev->iommu_param;
@@ -840,19 +858,6 @@ int iommu_unregister_device_fault_handler(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
 
-
-int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
-{
-	/* we only report device fault if there is a handler registered */
-	if (!dev->iommu_param || !dev->iommu_param->fault_param ||
-		!dev->iommu_param->fault_param->handler)
-		return -ENOSYS;
-
-	return dev->iommu_param->fault_param->handler(evt,
-						dev->iommu_param->fault_param->data);
-}
-EXPORT_SYMBOL_GPL(iommu_report_device_fault);
-
 /**
  * iommu_group_id - Return ID for a group
  * @group: the group to ID
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 226ab4f3ae0e..65e56f28e0ce 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -205,6 +205,7 @@ struct page_response_msg {
 	u32 resp_code:4;
 #define IOMMU_PAGE_RESP_SUCCESS	0
 #define IOMMU_PAGE_RESP_INVALID	1
+#define IOMMU_PAGE_RESP_HANDLED	2
 #define IOMMU_PAGE_RESP_FAILURE	0xF
 
 	u32 pasid_present:1;
@@ -534,7 +535,6 @@ extern int iommu_register_device_fault_handler(struct device *dev,
 
 extern int iommu_unregister_device_fault_handler(struct device *dev);
 
-extern int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt);
 extern int iommu_page_response(struct iommu_domain *domain, struct device *dev,
 			       struct page_response_msg *msg);
 
@@ -836,11 +836,6 @@ static inline bool iommu_has_device_fault_handler(struct device *dev)
 	return false;
 }
 
-static inline int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
-{
-	return 0;
-}
-
 static inline int iommu_page_response(struct iommu_domain *domain, struct device *dev,
 				      struct page_response_msg *msg)
 {
@@ -1005,4 +1000,31 @@ static inline struct mm_struct *iommu_sva_find(int pasid)
 }
 #endif /* CONFIG_IOMMU_SVA */
 
+#ifdef CONFIG_IOMMU_FAULT
+extern int iommu_fault_queue_register(struct notifier_block *flush_notifier);
+extern void iommu_fault_queue_flush(struct device *dev);
+extern void iommu_fault_queue_unregister(struct notifier_block *flush_notifier);
+extern int iommu_report_device_fault(struct device *dev,
+				     struct iommu_fault_event *evt);
+#else /* CONFIG_IOMMU_FAULT */
+static inline int iommu_fault_queue_register(struct notifier_block *flush_notifier)
+{
+	return -ENODEV;
+}
+
+static inline void iommu_fault_queue_flush(struct device *dev)
+{
+}
+
+static inline void iommu_fault_queue_unregister(struct notifier_block *flush_notifier)
+{
+}
+
+static inline int iommu_report_device_fault(struct device *dev,
+					    struct iommu_fault_event *evt)
+{
+	return 0;
+}
+#endif /* CONFIG_IOMMU_FAULT */
+
 #endif /* __LINUX_IOMMU_H */
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 08/37] iommu/fault: Handle mm faults
  2018-02-12 18:33 ` Jean-Philippe Brucker
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

When a recoverable page fault is handled by the fault workqueue, find the
associated mm and call handle_mm_fault.
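
In outline, the new handler does the following (condensed from the diff
below; locking details and error paths are trimmed):

	mm = iommu_sva_find(evt->pasid);	/* PASID -> mm_struct */
	down_read(&mm->mmap_sem);
	vma = find_extend_vma(mm, evt->addr);
	/* ... check evt->prot against vma->vm_flags ... */
	ret = handle_mm_fault(vma, evt->addr, fault_flags);
	up_read(&mm->mmap_sem);
	mmput_async(mm);	/* mmput() could deadlock, see the comment below */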

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/io-pgfault.c | 89 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 87 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 33309ed316d2..565ec01a1b5f 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -9,6 +9,7 @@
 
 #include <linux/iommu.h>
 #include <linux/list.h>
+#include <linux/sched/mm.h>
 #include <linux/slab.h>
 #include <linux/workqueue.h>
 
@@ -82,8 +83,92 @@ static int iommu_fault_complete(struct iommu_domain *domain, struct device *dev,
 
 static int iommu_fault_handle_single(struct iommu_fault_context *fault)
 {
-	/* TODO */
-	return -ENODEV;
+	struct mm_struct *mm;
+	struct vm_area_struct *vma;
+	unsigned int access_flags = 0;
+	int ret = IOMMU_PAGE_RESP_INVALID;
+	unsigned int fault_flags = FAULT_FLAG_REMOTE;
+	struct iommu_fault_event *evt = &fault->evt;
+
+	if (!evt->pasid_valid)
+		return ret;
+
+	/*
+	 * Special case: PASID Stop Marker (LRW = 0b100) doesn't expect a
+	 * response. A Stop Marker may be generated when disabling a PASID
+	 * (issuing a PASID stop request) in some PCI devices.
+	 *
+	 * When the mm_exit() callback returns from the device driver, no page
+	 * request is generated for this PASID anymore and outstanding ones have
+	 * been pushed to the IOMMU (as per PCIe 4.0r1.0 - 6.20.1 and 10.4.1.2 -
+	 * Managing PASID TLP Prefix Usage). Some PCI devices will wait for all
+	 * outstanding page requests to come back with a response before
+	 * completing the PASID stop request. Others do not wait for page
+	 * responses, and instead issue this Stop Marker that tells us when the
+	 * PASID can be reallocated.
+	 *
+	 * We ignore the Stop Marker because:
+	 * a. Page requests, which are posted requests, have been flushed to the
+	 *    IOMMU when mm_exit() returns,
+	 * b. We flush all fault queues after mm_exit() returns and before
+	 *    freeing the PASID.
+	 *
+	 * So even though the Stop Marker might be issued by the device *after*
+	 * the stop request completes, outstanding faults will have been dealt
+	 * with by the time we free the PASID.
+	 */
+	if (evt->last_req &&
+	    !(evt->prot & (IOMMU_FAULT_READ | IOMMU_FAULT_WRITE)))
+		return IOMMU_PAGE_RESP_HANDLED;
+
+	mm = iommu_sva_find(evt->pasid);
+	if (!mm)
+		return ret;
+
+	down_read(&mm->mmap_sem);
+
+	vma = find_extend_vma(mm, evt->addr);
+	if (!vma)
+		/* Unmapped area */
+		goto out_put_mm;
+
+	if (evt->prot & IOMMU_FAULT_READ)
+		access_flags |= VM_READ;
+
+	if (evt->prot & IOMMU_FAULT_WRITE) {
+		access_flags |= VM_WRITE;
+		fault_flags |= FAULT_FLAG_WRITE;
+	}
+
+	if (evt->prot & IOMMU_FAULT_EXEC) {
+		access_flags |= VM_EXEC;
+		fault_flags |= FAULT_FLAG_INSTRUCTION;
+	}
+
+	if (!(evt->prot & IOMMU_FAULT_PRIV))
+		fault_flags |= FAULT_FLAG_USER;
+
+	if (access_flags & ~vma->vm_flags)
+		/* Access fault */
+		goto out_put_mm;
+
+	ret = handle_mm_fault(vma, evt->addr, fault_flags);
+	ret = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
+		IOMMU_PAGE_RESP_SUCCESS;
+
+out_put_mm:
+	up_read(&mm->mmap_sem);
+
+	/*
+	 * If the process exits while we're handling the fault on its mm, we
+	 * can't do mmput(). exit_mmap() would release the MMU notifier, calling
+	 * iommu_notifier_release(), which has to flush the fault queue that
+	 * we're executing on... So mmput_async() moves the release of the mm to
+	 * another thread, if we're the last user.
+	 */
+	mmput_async(mm);
+
+	return ret;
 }
 
 static void iommu_fault_handle_group(struct work_struct *work)
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 09/37] iommu/fault: Let handler return a fault response
  2018-02-12 18:33 ` Jean-Philippe Brucker
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

It is more convenient to let fault handlers return the action to perform
on the fault immediately, instead of having to call iommu_page_response()
with a crafted structure; a sketch of such a handler follows the list below.
Update the IOMMU_PAGE_RESP_* values to encompass most needs:

- IOMMU_PAGE_RESP_HANDLED means "I took ownership of the fault and will
  send a response later"

- IOMMU_PAGE_RESP_CONTINUE means "I didn't handle the fault, let the next
  handler in the chain take care of it"

- IOMMU_PAGE_RESP_SUCCESS, IOMMU_PAGE_RESP_INVALID,
  IOMMU_PAGE_RESP_FAILURE are the PCI PRI values, and mean respectively
  "fault fixed, retry the translation", "could not fix the fault, abort
  the translation" and "unexpected fault, disable PRI".
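
As a rough illustration (not part of the patch), a device driver handler
using these values could look like the following, where the mydrv_* names
are hypothetical:

	static int mydrv_fault_handler(struct iommu_fault_event *evt, void *data)
	{
		struct mydrv_device *mydev = data;

		/* Not a recoverable page request: let the next handler decide */
		if (evt->type != IOMMU_FAULT_PAGE_REQ)
			return IOMMU_PAGE_RESP_CONTINUE;

		if (!mydrv_start_async_handling(mydev, evt))
			/* Will call iommu_page_response() once done */
			return IOMMU_PAGE_RESP_HANDLED;

		/* Could not service the fault, abort this access */
		return IOMMU_PAGE_RESP_INVALID;
	}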

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/io-pgfault.c |  8 +++++++-
 drivers/iommu/iommu.c      |  5 ++++-
 include/linux/iommu.h      | 30 ++++++++++++++++++++++++------
 3 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 565ec01a1b5f..484a39710d3f 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -63,6 +63,9 @@ static int iommu_fault_complete(struct iommu_domain *domain, struct device *dev,
 	if (status == IOMMU_PAGE_RESP_HANDLED)
 		return 0;
 
+	if (WARN_ON(status == IOMMU_PAGE_RESP_CONTINUE))
+		return -EINVAL;
+
 	/*
 	 * There was an internal error with handling the recoverable fault. Try
 	 * to complete the fault if possible.
@@ -272,7 +275,10 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
 	if (iommu_has_device_fault_handler(dev)) {
 		struct iommu_fault_param *param = dev->iommu_param->fault_param;
 
-		return param->handler(evt, param->data);
+		ret = param->handler(evt, param->data);
+		if (ret != IOMMU_PAGE_RESP_CONTINUE)
+			return iommu_fault_complete(domain, dev, evt, ret);
+		ret = -ENOSYS;
 	}
 
 	/* If the handler is blocking, handle fault in the workqueue */
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c475893ec7dc..9bec8390694c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -805,7 +805,10 @@ EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
  * @data: private data passed as argument to the callback
  *
  * When an IOMMU fault event is received, call this handler with the fault event
- * and data as argument.
+ * and data as argument. If the fault is recoverable (IOMMU_FAULT_PAGE_REQ), the
+ * handler can either return a status code (IOMMU_PAGE_RESP_*) to complete the
+ * fault, or return IOMMU_PAGE_RESP_HANDLED and complete the fault later by
+ * calling iommu_page_response().
  *
  * Return 0 if the fault handler was installed successfully, or an error.
  */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 65e56f28e0ce..d29991be9401 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -189,6 +189,29 @@ enum page_response_type {
 	IOMMU_PAGE_GROUP_RESP,
 };
 
+/**
+ * enum page_response_code - Return status of fault handlers, telling the IOMMU
+ * driver how to proceed with the fault.
+ *
+ * @IOMMU_PAGE_RESP_HANDLED: Stop processing the fault, and do not send a
+ *	reply to the device.
+ * @IOMMU_PAGE_RESP_CONTINUE: Fault was not handled. Call the next handler,
+ *	or terminate.
+ * @IOMMU_PAGE_RESP_SUCCESS: Fault has been handled and the page tables
+ *	populated, retry the access. This is "Success" in PCI PRI.
+ * @IOMMU_PAGE_RESP_INVALID: Could not handle this fault, don't retry the
+ *	access. This is "Invalid Request" in PCI PRI.
+ * @IOMMU_PAGE_RESP_FAILURE: General error. Drop all subsequent faults from
+ *	this device if possible. This is "Response Failure" in PCI PRI.
+ */
+enum page_response_code {
+	IOMMU_PAGE_RESP_HANDLED = 0,
+	IOMMU_PAGE_RESP_CONTINUE,
+	IOMMU_PAGE_RESP_SUCCESS,
+	IOMMU_PAGE_RESP_INVALID,
+	IOMMU_PAGE_RESP_FAILURE,
+};
+
 /**
  * Generic page response information based on PCI ATS and PASID spec.
  * @addr: servicing page address
@@ -202,12 +225,7 @@ enum page_response_type {
 struct page_response_msg {
 	u64 addr;
 	u32 pasid;
-	u32 resp_code:4;
-#define IOMMU_PAGE_RESP_SUCCESS	0
-#define IOMMU_PAGE_RESP_INVALID	1
-#define IOMMU_PAGE_RESP_HANDLED	2
-#define IOMMU_PAGE_RESP_FAILURE	0xF
-
+	enum page_response_code resp_code;
 	u32 pasid_present:1;
 	u32 page_req_group_id : 9;
 	enum page_response_type type;
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 10/37] iommu/fault: Allow blocking fault handlers
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

Allow device drivers to register their fault handlers at different stages
of the handling path. Since we now have a fault workqueue, it is easy to
call their handlers from blocking context.

The API borrows the "handler" and "thread" terms from the IRQ subsystem,
even though they don't match exactly: some IOMMU drivers may report page
faults from an IRQ thread instead of an IRQ handler. But executing blocking
fault handlers on the workqueue instead of the IRQ thread is still
advantageous, because it unloads the low-level fault queue as quickly as
possible and avoids losing fault events.

A driver can request to be called in both blocking and non-blocking
context, so it can filter faults early and only execute the blocking code
for some of them, as the sketch below illustrates.
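
A sketch of the resulting usage (not part of the patch; the mydrv_* names
are hypothetical):

	/* Atomic callback: called from the IOMMU driver's report path, cannot sleep */
	static int mydrv_fault_atomic(struct iommu_fault_event *evt, void *data)
	{
		/* Faults we can't do anything about: reject immediately */
		if (!mydrv_fault_is_interesting(data, evt))
			return IOMMU_PAGE_RESP_INVALID;

		/* Needs blocking work: pass it on to the @thread callback */
		return IOMMU_PAGE_RESP_CONTINUE;
	}

	/* Blocking callback: runs on the fault workqueue, may sleep */
	static int mydrv_fault_blocking(struct iommu_fault_event *evt, void *data)
	{
		if (mydrv_service_fault(data, evt))
			return IOMMU_PAGE_RESP_INVALID;

		return IOMMU_PAGE_RESP_SUCCESS;
	}

	ret = iommu_register_device_fault_handler(dev, mydrv_fault_atomic,
						  mydrv_fault_blocking, mydev);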

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/io-pgfault.c | 15 +++++++++++++--
 drivers/iommu/iommu.c      | 12 +++++++++++-
 include/linux/iommu.h      | 24 +++++++++++++++++++-----
 3 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 484a39710d3f..c8f1d9bdd825 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -89,10 +89,20 @@ static int iommu_fault_handle_single(struct iommu_fault_context *fault)
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;
 	unsigned int access_flags = 0;
+	struct device *dev = fault->dev;
 	int ret = IOMMU_PAGE_RESP_INVALID;
 	unsigned int fault_flags = FAULT_FLAG_REMOTE;
 	struct iommu_fault_event *evt = &fault->evt;
 
+	if (iommu_has_blocking_device_fault_handler(dev)) {
+		struct iommu_fault_param *param = dev->iommu_param->fault_param;
+
+		ret = param->thread(evt, param->data);
+		if (ret != IOMMU_PAGE_RESP_CONTINUE)
+			return ret;
+		ret = IOMMU_PAGE_RESP_INVALID;
+	}
+
 	if (!evt->pasid_valid)
 		return ret;
 
@@ -272,7 +282,7 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
 	 * if upper layers showed interest and installed a fault handler,
 	 * invoke it.
 	 */
-	if (iommu_has_device_fault_handler(dev)) {
+	if (iommu_has_atomic_device_fault_handler(dev)) {
 		struct iommu_fault_param *param = dev->iommu_param->fault_param;
 
 		ret = param->handler(evt, param->data);
@@ -282,7 +292,8 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
 	}
 
 	/* If the handler is blocking, handle fault in the workqueue */
-	if (evt->type == IOMMU_FAULT_PAGE_REQ)
+	if (evt->type == IOMMU_FAULT_PAGE_REQ ||
+	    iommu_has_blocking_device_fault_handler(dev))
 		ret = iommu_queue_fault(domain, dev, evt);
 
 	return iommu_fault_complete(domain, dev, evt, ret);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9bec8390694c..7f8395b620b1 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -801,7 +801,8 @@ EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
 /**
  * iommu_register_device_fault_handler() - Register a device fault handler
  * @dev: the device
- * @handler: the fault handler
+ * @handler: fault handler that can only be called in atomic context
+ * @thread: fault handler called from the workqueue and can block
  * @data: private data passed as argument to the callback
  *
  * When an IOMMU fault event is received, call this handler with the fault event
@@ -810,14 +811,22 @@ EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
  * fault, or return IOMMU_PAGE_RESP_HANDLED and complete the fault later by
  * calling iommu_page_response().
  *
+ * At least one of @handler and @thread must be non-NULL. Both may be set, in
+ * which case the bottom-half @thread is called from the workqueue iff the
+ * top-half @handler returned IOMMU_PAGE_RESP_CONTINUE.
+ *
  * Return 0 if the fault handler was installed successfully, or an error.
  */
 int iommu_register_device_fault_handler(struct device *dev,
 					iommu_dev_fault_handler_t handler,
+					iommu_dev_fault_handler_t thread,
 					void *data)
 {
 	struct iommu_param *idata = dev->iommu_param;
 
+	if (!handler && !thread)
+		return -EINVAL;
+
 	/*
 	 * Device iommu_param should have been allocated when device is
 	 * added to its iommu_group.
@@ -833,6 +842,7 @@ int iommu_register_device_fault_handler(struct device *dev,
 	if (!idata->fault_param)
 		return -ENOMEM;
 	idata->fault_param->handler = handler;
+	idata->fault_param->thread = thread;
 	idata->fault_param->data = data;
 
 	return 0;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d29991be9401..36fcb579f5ed 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -431,12 +431,13 @@ struct iommu_fault_event {
 
 /**
  * struct iommu_fault_param - per-device IOMMU fault data
- * @dev_fault_handler: Callback function to handle IOMMU faults at device level
- * @data: handler private data
- *
+ * @handler: Atomic callback to handle IOMMU faults at device level
+ * @thread: Blocking callback to handle IOMMU faults at device level
+ * @data: private data for the handler
  */
 struct iommu_fault_param {
 	iommu_dev_fault_handler_t handler;
+	iommu_dev_fault_handler_t thread;
 	void *data;
 };
 
@@ -549,6 +550,7 @@ extern int iommu_group_unregister_notifier(struct iommu_group *group,
 					   struct notifier_block *nb);
 extern int iommu_register_device_fault_handler(struct device *dev,
 					iommu_dev_fault_handler_t handler,
+					iommu_dev_fault_handler_t thread,
 					void *data);
 
 extern int iommu_unregister_device_fault_handler(struct device *dev);
@@ -574,7 +576,13 @@ extern void iommu_domain_window_disable(struct iommu_domain *domain, u32 wnd_nr)
 extern int report_iommu_fault(struct iommu_domain *domain, struct device *dev,
 			      unsigned long iova, int flags);
 
-static inline bool iommu_has_device_fault_handler(struct device *dev)
+static inline bool iommu_has_blocking_device_fault_handler(struct device *dev)
+{
+	return dev->iommu_param && dev->iommu_param->fault_param &&
+		dev->iommu_param->fault_param->thread;
+}
+
+static inline bool iommu_has_atomic_device_fault_handler(struct device *dev)
 {
 	return dev->iommu_param && dev->iommu_param->fault_param &&
 		dev->iommu_param->fault_param->handler;
@@ -839,6 +847,7 @@ static inline int iommu_group_unregister_notifier(struct iommu_group *group,
 
 static inline int iommu_register_device_fault_handler(struct device *dev,
 						iommu_dev_fault_handler_t handler,
+						iommu_dev_fault_handler_t thread,
 						void *data)
 {
 	return 0;
@@ -849,7 +858,12 @@ static inline int iommu_unregister_device_fault_handler(struct device *dev)
 	return 0;
 }
 
-static inline bool iommu_has_device_fault_handler(struct device *dev)
+static inline bool iommu_has_blocking_device_fault_handler(struct device *dev)
+{
+	return false;
+}
+
+static inline bool iommu_has_atomic_device_fault_handler(struct device *dev)
 {
 	return false;
 }
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 10/37] iommu/fault: Allow blocking fault handlers
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel

Allow device driver to register their fault handler at different stages of
the handling path. Since we now have a fault workqueue, it is easy to call
their handler from blocking context.

The API borrows "handler" and "thread" terms from the IRQ subsystem, even
though they don't match exactly: some IOMMU driver may report page faults
from an IRQ thread instead of handler. But executing blocking fault
handlers on the workqueue instead of the IRQ thread is still advantageous,
because it allows to unload the low-level fault queue as fast as possible
and avoid losing fault events.

A driver can request to be called both in blocking and non-blocking
context, so it can filter faults early and only execute the blocking code
for some of them.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/io-pgfault.c | 15 +++++++++++++--
 drivers/iommu/iommu.c      | 12 +++++++++++-
 include/linux/iommu.h      | 24 +++++++++++++++++++-----
 3 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 484a39710d3f..c8f1d9bdd825 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -89,10 +89,20 @@ static int iommu_fault_handle_single(struct iommu_fault_context *fault)
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;
 	unsigned int access_flags = 0;
+	struct device *dev = fault->dev;
 	int ret = IOMMU_PAGE_RESP_INVALID;
 	unsigned int fault_flags = FAULT_FLAG_REMOTE;
 	struct iommu_fault_event *evt = &fault->evt;
 
+	if (iommu_has_blocking_device_fault_handler(dev)) {
+		struct iommu_fault_param *param = dev->iommu_param->fault_param;
+
+		ret = param->thread(evt, param->data);
+		if (ret != IOMMU_PAGE_RESP_CONTINUE)
+			return ret;
+		ret = IOMMU_PAGE_RESP_INVALID;
+	}
+
 	if (!evt->pasid_valid)
 		return ret;
 
@@ -272,7 +282,7 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
 	 * if upper layers showed interest and installed a fault handler,
 	 * invoke it.
 	 */
-	if (iommu_has_device_fault_handler(dev)) {
+	if (iommu_has_atomic_device_fault_handler(dev)) {
 		struct iommu_fault_param *param = dev->iommu_param->fault_param;
 
 		ret = param->handler(evt, param->data);
@@ -282,7 +292,8 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
 	}
 
 	/* If the handler is blocking, handle fault in the workqueue */
-	if (evt->type == IOMMU_FAULT_PAGE_REQ)
+	if (evt->type == IOMMU_FAULT_PAGE_REQ ||
+	    iommu_has_blocking_device_fault_handler(dev))
 		ret = iommu_queue_fault(domain, dev, evt);
 
 	return iommu_fault_complete(domain, dev, evt, ret);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9bec8390694c..7f8395b620b1 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -801,7 +801,8 @@ EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
 /**
  * iommu_register_device_fault_handler() - Register a device fault handler
  * @dev: the device
- * @handler: the fault handler
+ * @handler: fault handler that can only be called in atomic context
+ * @thread: fault handler called from the workqueue and can block
  * @data: private data passed as argument to the callback
  *
  * When an IOMMU fault event is received, call this handler with the fault event
@@ -810,14 +811,22 @@ EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
  * fault, or return IOMMU_PAGE_RESP_HANDLED and complete the fault later by
  * calling iommu_page_response().
  *
+ * At least one of @handler and @thread must be non-NULL. Both may be set, in
+ * which case the top-half @thread is called from the workqueue iff the
+ * bottom-half @handler returned IOMMU_PAGE_RESP_CONTINUE.
+ *
  * Return 0 if the fault handler was installed successfully, or an error.
  */
 int iommu_register_device_fault_handler(struct device *dev,
 					iommu_dev_fault_handler_t handler,
+					iommu_dev_fault_handler_t thread,
 					void *data)
 {
 	struct iommu_param *idata = dev->iommu_param;
 
+	if (!handler && !thread)
+		return -EINVAL;
+
 	/*
 	 * Device iommu_param should have been allocated when device is
 	 * added to its iommu_group.
@@ -833,6 +842,7 @@ int iommu_register_device_fault_handler(struct device *dev,
 	if (!idata->fault_param)
 		return -ENOMEM;
 	idata->fault_param->handler = handler;
+	idata->fault_param->thread = thread;
 	idata->fault_param->data = data;
 
 	return 0;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d29991be9401..36fcb579f5ed 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -431,12 +431,13 @@ struct iommu_fault_event {
 
 /**
  * struct iommu_fault_param - per-device IOMMU fault data
- * @dev_fault_handler: Callback function to handle IOMMU faults at device level
- * @data: handler private data
- *
+ * @handler: Atomic callback to handle IOMMU faults at device level
+ * @thread: Blocking callback to handle IOMMU faults at device level
+ * @data: private data for the handler
  */
 struct iommu_fault_param {
 	iommu_dev_fault_handler_t handler;
+	iommu_dev_fault_handler_t thread;
 	void *data;
 };
 
@@ -549,6 +550,7 @@ extern int iommu_group_unregister_notifier(struct iommu_group *group,
 					   struct notifier_block *nb);
 extern int iommu_register_device_fault_handler(struct device *dev,
 					iommu_dev_fault_handler_t handler,
+					iommu_dev_fault_handler_t thread,
 					void *data);
 
 extern int iommu_unregister_device_fault_handler(struct device *dev);
@@ -574,7 +576,13 @@ extern void iommu_domain_window_disable(struct iommu_domain *domain, u32 wnd_nr)
 extern int report_iommu_fault(struct iommu_domain *domain, struct device *dev,
 			      unsigned long iova, int flags);
 
-static inline bool iommu_has_device_fault_handler(struct device *dev)
+static inline bool iommu_has_blocking_device_fault_handler(struct device *dev)
+{
+	return dev->iommu_param && dev->iommu_param->fault_param &&
+		dev->iommu_param->fault_param->thread;
+}
+
+static inline bool iommu_has_atomic_device_fault_handler(struct device *dev)
 {
 	return dev->iommu_param && dev->iommu_param->fault_param &&
 		dev->iommu_param->fault_param->handler;
@@ -839,6 +847,7 @@ static inline int iommu_group_unregister_notifier(struct iommu_group *group,
 
 static inline int iommu_register_device_fault_handler(struct device *dev,
 						iommu_dev_fault_handler_t handler,
+						iommu_dev_fault_handler_t thread,
 						void *data)
 {
 	return 0;
@@ -849,7 +858,12 @@ static inline int iommu_unregister_device_fault_handler(struct device *dev)
 	return 0;
 }
 
-static inline bool iommu_has_device_fault_handler(struct device *dev)
+static inline bool iommu_has_blocking_device_fault_handler(struct device *dev)
+{
+	return false;
+}
+
+static inline bool iommu_has_atomic_device_fault_handler(struct device *dev)
 {
 	return false;
 }
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread
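
As an aside, here is a minimal sketch of how a device driver might use the two
callbacks added above. It assumes iommu_dev_fault_handler_t takes a struct
iommu_fault_event pointer plus the private data pointer; the mydrv_* names are
hypothetical and not part of the series.

#include <linux/iommu.h>

/* Top half: called in atomic context, must not sleep. */
static int mydrv_iopf_atomic(struct iommu_fault_event *evt, void *data)
{
	/* Only quick checks here; defer the real work to the bottom half. */
	return IOMMU_PAGE_RESP_CONTINUE;
}

/* Bottom half: called from the fault workqueue, may block. */
static int mydrv_iopf_blocking(struct iommu_fault_event *evt, void *data)
{
	/*
	 * Handle the fault here, or return IOMMU_PAGE_RESP_HANDLED and
	 * complete it later with iommu_page_response().
	 */
	return IOMMU_PAGE_RESP_HANDLED;
}

static int mydrv_enable_faults(struct device *dev, void *drvdata)
{
	/* Register both handlers; at least one must be non-NULL. */
	return iommu_register_device_fault_handler(dev, mydrv_iopf_atomic,
						   mydrv_iopf_blocking, drvdata);
}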

* [PATCH 11/37] dt-bindings: document stall and PASID properties for IOMMU masters
  2018-02-12 18:33 ` Jean-Philippe Brucker
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

On ARM systems, some platform devices behind an IOMMU may support stall
and PASID features. Stall is the ability to recover from page faults and
PASID offers multiple process address spaces to the device. Together they
allow the device to do paging. Let the firmware tell us when a device
supports stall and PASID.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 Documentation/devicetree/bindings/iommu/iommu.txt | 24 +++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/iommu.txt b/Documentation/devicetree/bindings/iommu/iommu.txt
index 5a8b4624defc..8066b3852110 100644
--- a/Documentation/devicetree/bindings/iommu/iommu.txt
+++ b/Documentation/devicetree/bindings/iommu/iommu.txt
@@ -86,6 +86,30 @@ have a means to turn off translation. But it is invalid in such cases to
 disable the IOMMU's device tree node in the first place because it would
 prevent any driver from properly setting up the translations.
 
+Optional properties:
+--------------------
+- dma-can-stall: When present, the master can wait for a transaction to
+  complete for an indefinite amount of time. Upon translation fault some
+  IOMMUs, instead of aborting the translation immediately, may first
+  notify the driver and keep the transaction in flight. This allows the OS
+  to inspect the fault and, for example, make physical pages resident
+  before updating the mappings and completing the transaction. Such an
+  IOMMU accepts a limited number of simultaneous stalled transactions before
+  having to either put back-pressure on the master, or abort new faulting
+  transactions.
+
+  Firmware has to opt in to stalling, because most buses and masters don't
+  support it. In particular it isn't compatible with PCI, where
+  transactions have to complete before a time limit. More generally it
+  won't work in systems and masters that haven't been designed for
+  stalling. For example the OS, in order to handle a stalled transaction,
+  may attempt to retrieve pages from secondary storage in a stalled
+  domain, leading to a deadlock.
+
+- pasid-bits: Some masters support multiple address spaces for DMA, by
+  tagging DMA transactions with an address space identifier. The property
+  defaults to 0, which means that the device only has one address space.
+
 
 Notes:
 ======
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread
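
For illustration only, a hypothetical platform device using both of the
properties described in this binding could be declared as follows; the node,
compatible string and values are invented, not taken from the binding
document:

	accel@ff100000 {
		compatible = "vendor,paging-accelerator";	/* made-up */
		reg = <0xff100000 0x1000>;
		iommus = <&smmu 0x20>;
		dma-can-stall;
		pasid-bits = <8>;	/* up to 256 address spaces */
	};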

* [PATCH 12/37] iommu/of: Add stall and pasid properties to iommu_fwspec
  2018-02-12 18:33 ` Jean-Philippe Brucker
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

Add stall and pasid properties to iommu_fwspec, and fill them when
dma-can-stall and pasid-bits properties are present in the device tree.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/of_iommu.c | 12 ++++++++++++
 include/linux/iommu.h    |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 5c36a8b7656a..98158fc061ca 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -204,6 +204,18 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
 			if (err)
 				break;
 		}
+
+		fwspec = dev->iommu_fwspec;
+		if (!err && fwspec) {
+			const __be32 *prop;
+
+			if (of_get_property(master_np, "dma-can-stall", NULL))
+				fwspec->can_stall = true;
+
+			prop = of_get_property(master_np, "pasid-bits", NULL);
+			if (prop)
+				fwspec->num_pasid_bits = be32_to_cpu(*prop);
+		}
 	}
 
 	/*
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 36fcb579f5ed..37c3b9d087ce 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -632,6 +632,8 @@ struct iommu_fwspec {
 	struct fwnode_handle	*iommu_fwnode;
 	void			*iommu_priv;
 	unsigned int		num_ids;
+	unsigned int		num_pasid_bits;
+	bool			can_stall;
 	u32			ids[1];
 };
 
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread
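
A rough sketch (not part of this patch) of how an IOMMU or device driver could
consume the new fwspec fields once of_iommu_configure() has filled them; the
mydrv_ name and the prints are placeholders:

#include <linux/device.h>
#include <linux/iommu.h>

static void mydrv_report_fwspec(struct device *dev)
{
	struct iommu_fwspec *fwspec = dev->iommu_fwspec;

	if (!fwspec)
		return;

	/* Set when the "dma-can-stall" property is present. */
	if (fwspec->can_stall)
		dev_info(dev, "stalling supported\n");

	/* 0 PASID bits means the device has a single address space. */
	if (fwspec->num_pasid_bits)
		dev_info(dev, "%u PASID bits (%lu address spaces)\n",
			 fwspec->num_pasid_bits,
			 1UL << fwspec->num_pasid_bits);
}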

* [PATCH 13/37] arm64: mm: Pin down ASIDs for sharing mm with devices
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

To enable address space sharing with the IOMMU, introduce mm_context_get()
and mm_context_put(), which pin down a context and ensure that it will keep
its ASID after a rollover.

Pinning is necessary because a device constantly needs a valid ASID,
unlike tasks that only require one when running. Without pinning, we would
need to notify the IOMMU when we're about to use a new ASID for a task,
and it would get complicated when a new task is assigned a shared ASID.
Consider the following scenario with no ASID pinned:

1. Task t1 is running on CPUx with shared ASID (gen=1, asid=1)
2. Task t2 is scheduled on CPUx, gets ASID (1, 2)
3. Task tn is scheduled on CPUy, a rollover occurs, tn gets ASID (2, 1)
   We would now have to immediately generate a new ASID for t1, notify
   the IOMMU, and finally enable task tn. We are holding the lock during
   all that time, since we can't afford having another CPU trigger a
   rollover. The IOMMU issues invalidation commands that can take tens of
   milliseconds.

It gets needlessly complicated. All we wanted to do was schedule task tn,
which has no business with the IOMMU. By letting the IOMMU pin tasks when
needed, we avoid stalling the slow path, and let the pinning fail when
we're out of shareable ASIDs.

After a rollover, the allocator expects at least one ASID to be available
in addition to the reserved ones (one per CPU). So (NR_ASIDS - NR_CPUS -
1) is the maximum number of ASIDs that can be shared with the IOMMU.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---

I started adding these changes to the TLA+ specification of the ASID
allocator, but I'm having trouble finding a good configuration. So far I
haven't been able to complete a full check in reasonable time (4 days
and counting for the current version). More details on this patch:

http://jpbrucker.net/cgit.cgi/kernel-tla/commit/?id=4d4fd17429a516e1bf2495b2dc7d036daab2dab9

---
 arch/arm64/include/asm/mmu.h         |  1 +
 arch/arm64/include/asm/mmu_context.h | 11 ++++-
 arch/arm64/mm/context.c              | 87 ++++++++++++++++++++++++++++++++++--
 3 files changed, 94 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index a050d4f3615d..aaddd0a289ee 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -25,6 +25,7 @@
 
 typedef struct {
 	atomic64_t	id;
+	unsigned long	pinned;
 	void		*vdso;
 	unsigned long	flags;
 } mm_context_t;
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 8d3331985d2e..fb8c26bc6fda 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -168,7 +168,13 @@ static inline void cpu_replace_ttbr1(pgd_t *pgd)
 #define destroy_context(mm)		do { } while(0)
 void check_and_switch_context(struct mm_struct *mm, unsigned int cpu);
 
-#define init_new_context(tsk,mm)	({ atomic64_set(&(mm)->context.id, 0); 0; })
+static inline int
+init_new_context(struct task_struct *tsk, struct mm_struct *mm)
+{
+	atomic64_set(&mm->context.id, 0);
+	mm->context.pinned = 0;
+	return 0;
+}
 
 #ifdef CONFIG_ARM64_SW_TTBR0_PAN
 static inline void update_saved_ttbr0(struct task_struct *tsk,
@@ -241,6 +247,9 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 void verify_cpu_asid_bits(void);
 void post_ttbr_update_workaround(void);
 
+unsigned long mm_context_get(struct mm_struct *mm);
+void mm_context_put(struct mm_struct *mm);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* !__ASM_MMU_CONTEXT_H */
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 301417ae2ba8..a2152687c423 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -37,6 +37,10 @@ static DEFINE_PER_CPU(atomic64_t, active_asids);
 static DEFINE_PER_CPU(u64, reserved_asids);
 static cpumask_t tlb_flush_pending;
 
+static unsigned long max_pinned_asids;
+static unsigned long nr_pinned_asids;
+static unsigned long *pinned_asid_map;
+
 #define ASID_MASK		(~GENMASK(asid_bits - 1, 0))
 #define ASID_FIRST_VERSION	(1UL << asid_bits)
 
@@ -88,13 +92,16 @@ void verify_cpu_asid_bits(void)
 	}
 }
 
+#define asid_gen_match(asid) \
+	(!(((asid) ^ atomic64_read(&asid_generation)) >> asid_bits))
+
 static void flush_context(unsigned int cpu)
 {
 	int i;
 	u64 asid;
 
 	/* Update the list of reserved ASIDs and the ASID bitmap. */
-	bitmap_clear(asid_map, 0, NUM_USER_ASIDS);
+	bitmap_copy(asid_map, pinned_asid_map, NUM_USER_ASIDS);
 
 	for_each_possible_cpu(i) {
 		asid = atomic64_xchg_relaxed(&per_cpu(active_asids, i), 0);
@@ -151,6 +158,9 @@ static u64 new_context(struct mm_struct *mm, unsigned int cpu)
 	if (asid != 0) {
 		u64 newasid = generation | (asid & ~ASID_MASK);
 
+		if (mm->context.pinned)
+			return newasid;
+
 		/*
 		 * If our current ASID was active during a rollover, we
 		 * can continue to use it and this was just a false alarm.
@@ -213,8 +223,7 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
 	 *   because atomic RmWs are totally ordered for a given location.
 	 */
 	old_active_asid = atomic64_read(&per_cpu(active_asids, cpu));
-	if (old_active_asid &&
-	    !((asid ^ atomic64_read(&asid_generation)) >> asid_bits) &&
+	if (old_active_asid && asid_gen_match(asid) &&
 	    atomic64_cmpxchg_relaxed(&per_cpu(active_asids, cpu),
 				     old_active_asid, asid))
 		goto switch_mm_fastpath;
@@ -222,7 +231,7 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
 	raw_spin_lock_irqsave(&cpu_asid_lock, flags);
 	/* Check that our ASID belongs to the current generation. */
 	asid = atomic64_read(&mm->context.id);
-	if ((asid ^ atomic64_read(&asid_generation)) >> asid_bits) {
+	if (!asid_gen_match(asid)) {
 		asid = new_context(mm, cpu);
 		atomic64_set(&mm->context.id, asid);
 	}
@@ -245,6 +254,63 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
 		cpu_switch_mm(mm->pgd, mm);
 }
 
+unsigned long mm_context_get(struct mm_struct *mm)
+{
+	unsigned long flags;
+	u64 asid;
+
+	raw_spin_lock_irqsave(&cpu_asid_lock, flags);
+
+	asid = atomic64_read(&mm->context.id);
+
+	if (mm->context.pinned) {
+		mm->context.pinned++;
+		asid &= ~ASID_MASK;
+		goto out_unlock;
+	}
+
+	if (nr_pinned_asids >= max_pinned_asids) {
+		asid = 0;
+		goto out_unlock;
+	}
+
+	if (!asid_gen_match(asid)) {
+		/*
+		 * We went through one or more rollovers since that ASID was
+		 * used. Ensure that it is still valid, or generate a new one.
+		 * The cpu argument isn't used by new_context.
+		 */
+		asid = new_context(mm, 0);
+		atomic64_set(&mm->context.id, asid);
+	}
+
+	asid &= ~ASID_MASK;
+
+	nr_pinned_asids++;
+	__set_bit(asid2idx(asid), pinned_asid_map);
+	mm->context.pinned++;
+
+out_unlock:
+	raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
+
+	return asid;
+}
+
+void mm_context_put(struct mm_struct *mm)
+{
+	unsigned long flags;
+	u64 asid = atomic64_read(&mm->context.id) & ~ASID_MASK;
+
+	raw_spin_lock_irqsave(&cpu_asid_lock, flags);
+
+	if (--mm->context.pinned == 0) {
+		__clear_bit(asid2idx(asid), pinned_asid_map);
+		nr_pinned_asids--;
+	}
+
+	raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
+}
+
 /* Errata workaround post TTBRx_EL1 update. */
 asmlinkage void post_ttbr_update_workaround(void)
 {
@@ -269,6 +335,19 @@ static int asids_init(void)
 		panic("Failed to allocate bitmap for %lu ASIDs\n",
 		      NUM_USER_ASIDS);
 
+	pinned_asid_map = kzalloc(BITS_TO_LONGS(NUM_USER_ASIDS)
+				  * sizeof(*pinned_asid_map), GFP_KERNEL);
+	if (!pinned_asid_map)
+		panic("Failed to allocate pinned bitmap\n");
+
+	/*
+	 * We assume that an ASID is always available after a rollover. This
+	 * means that even if all CPUs have a reserved ASID, there still is at
+	 * least one slot available in the asid map.
+	 */
+	max_pinned_asids = NUM_USER_ASIDS - num_possible_cpus() - 2;
+	nr_pinned_asids = 0;
+
 	pr_info("ASID allocator initialised with %lu entries\n", NUM_USER_ASIDS);
 	return 0;
 }
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 317+ messages in thread
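
For illustration (not from this patch), an SVA bind on arm64 could pin the
ASID along these lines; struct mydrv_bond and the surrounding error handling
are assumptions:

#include <linux/mm_types.h>
#include <asm/mmu_context.h>

struct mydrv_bond {			/* hypothetical per-bind state */
	struct mm_struct	*mm;
	unsigned long		asid;
};

static int mydrv_share_mm(struct mydrv_bond *bond, struct mm_struct *mm)
{
	/* Pin the ASID so it survives rollovers while the device uses it. */
	bond->asid = mm_context_get(mm);
	if (!bond->asid)
		return -ENOSPC;		/* out of shareable ASIDs */

	bond->mm = mm;
	/* bond->asid would now be written into the IOMMU context descriptor. */
	return 0;
}

static void mydrv_unshare_mm(struct mydrv_bond *bond)
{
	/* Drop the pin once the device has stopped using this ASID. */
	mm_context_put(bond->mm);
}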

* [PATCH 14/37] iommu/arm-smmu-v3: Link domains and devices
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

When removing a mapping from a domain, we need to send an invalidation to
all devices that might have stored it in their Address Translation Cache
(ATC). In addition when updating the context descriptor of a live domain,
we'll need to send invalidations for all devices attached to it.

Maintain a list of devices in each domain, protected by a spinlock. It is
updated every time we attach or detach devices to and from domains.

It needs to be a spinlock because we'll invalidate ATC entries from
within hardirq-safe contexts, but it may be possible to relax the read
side with RCU later.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 3f2f1fc68b52..fb2507ffcdaf 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -652,6 +652,11 @@ struct arm_smmu_device {
 struct arm_smmu_master_data {
 	struct arm_smmu_device		*smmu;
 	struct arm_smmu_strtab_ent	ste;
+
+	struct arm_smmu_domain		*domain;
+	struct list_head		list; /* domain->devices */
+
+	struct device			*dev;
 };
 
 /* SMMU private data for an IOMMU domain */
@@ -675,6 +680,9 @@ struct arm_smmu_domain {
 	};
 
 	struct iommu_domain		domain;
+
+	struct list_head		devices;
+	spinlock_t			devices_lock;
 };
 
 struct arm_smmu_option_prop {
@@ -1540,6 +1548,9 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 	}
 
 	mutex_init(&smmu_domain->init_mutex);
+	INIT_LIST_HEAD(&smmu_domain->devices);
+	spin_lock_init(&smmu_domain->devices_lock);
+
 	return &smmu_domain->domain;
 }
 
@@ -1754,7 +1765,17 @@ static void arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec)
 
 static void arm_smmu_detach_dev(struct device *dev)
 {
+	unsigned long flags;
 	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
+	struct arm_smmu_domain *smmu_domain = master->domain;
+
+	if (smmu_domain) {
+		spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+		list_del(&master->list);
+		spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+
+		master->domain = NULL;
+	}
 
 	master->ste.assigned = false;
 	arm_smmu_install_ste_for_dev(dev->iommu_fwspec);
@@ -1763,6 +1784,7 @@ static void arm_smmu_detach_dev(struct device *dev)
 static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
 	int ret = 0;
+	unsigned long flags;
 	struct arm_smmu_device *smmu;
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_master_data *master;
@@ -1798,6 +1820,11 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	}
 
 	ste->assigned = true;
+	master->domain = smmu_domain;
+
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_add(&master->list, &smmu_domain->devices);
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_BYPASS) {
 		ste->s1_cfg = NULL;
@@ -1916,6 +1943,7 @@ static int arm_smmu_add_device(struct device *dev)
 			return -ENOMEM;
 
 		master->smmu = smmu;
+		master->dev = dev;
 		fwspec->iommu_priv = master;
 	}
 
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread
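
To show why the list and spinlock exist, here is a sketch of how a later patch
might walk domain->devices to invalidate the ATC of every attached device;
arm_smmu_atc_inv_master() is a placeholder for whatever ends up issuing the
invalidation commands:

static void arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain)
{
	unsigned long flags;
	struct arm_smmu_master_data *master;

	/* Spinlock, since this can be reached from hardirq-safe contexts. */
	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
	list_for_each_entry(master, &smmu_domain->devices, list)
		arm_smmu_atc_inv_master(master);	/* placeholder */
	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
}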

* [PATCH 15/37] iommu/io-pgtable-arm: Factor out ARM LPAE register defines
  2018-02-12 18:33 ` Jean-Philippe Brucker
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

For SVA, we'll need to extract CPU page table information and mirror it in
the substream setup. Move relevant defines to a common header.

Fix TCR_SZ_MASK while we're at it.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 MAINTAINERS                    |  1 +
 drivers/iommu/io-pgtable-arm.c | 48 +-----------------------------
 drivers/iommu/io-pgtable-arm.h | 67 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+), 47 deletions(-)
 create mode 100644 drivers/iommu/io-pgtable-arm.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bdc260e36b7..9cb8ced8322a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1106,6 +1106,7 @@ S:	Maintained
 F:	drivers/iommu/arm-smmu.c
 F:	drivers/iommu/arm-smmu-v3.c
 F:	drivers/iommu/io-pgtable-arm.c
+F:	drivers/iommu/io-pgtable-arm.h
 F:	drivers/iommu/io-pgtable-arm-v7s.c
 
 ARM SUB-ARCHITECTURES
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 51e5c43caed1..fff0b6ba0a69 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -31,6 +31,7 @@
 #include <asm/barrier.h>
 
 #include "io-pgtable.h"
+#include "io-pgtable-arm.h"
 
 #define ARM_LPAE_MAX_ADDR_BITS		48
 #define ARM_LPAE_S2_MAX_CONCAT_PAGES	16
@@ -118,53 +119,6 @@
 #define ARM_LPAE_PTE_MEMATTR_DEV	(((arm_lpae_iopte)0x1) << 2)
 
 /* Register bits */
-#define ARM_32_LPAE_TCR_EAE		(1 << 31)
-#define ARM_64_LPAE_S2_TCR_RES1		(1 << 31)
-
-#define ARM_LPAE_TCR_EPD1		(1 << 23)
-
-#define ARM_LPAE_TCR_TG0_4K		(0 << 14)
-#define ARM_LPAE_TCR_TG0_64K		(1 << 14)
-#define ARM_LPAE_TCR_TG0_16K		(2 << 14)
-
-#define ARM_LPAE_TCR_SH0_SHIFT		12
-#define ARM_LPAE_TCR_SH0_MASK		0x3
-#define ARM_LPAE_TCR_SH_NS		0
-#define ARM_LPAE_TCR_SH_OS		2
-#define ARM_LPAE_TCR_SH_IS		3
-
-#define ARM_LPAE_TCR_ORGN0_SHIFT	10
-#define ARM_LPAE_TCR_IRGN0_SHIFT	8
-#define ARM_LPAE_TCR_RGN_MASK		0x3
-#define ARM_LPAE_TCR_RGN_NC		0
-#define ARM_LPAE_TCR_RGN_WBWA		1
-#define ARM_LPAE_TCR_RGN_WT		2
-#define ARM_LPAE_TCR_RGN_WB		3
-
-#define ARM_LPAE_TCR_SL0_SHIFT		6
-#define ARM_LPAE_TCR_SL0_MASK		0x3
-
-#define ARM_LPAE_TCR_T0SZ_SHIFT		0
-#define ARM_LPAE_TCR_SZ_MASK		0xf
-
-#define ARM_LPAE_TCR_PS_SHIFT		16
-#define ARM_LPAE_TCR_PS_MASK		0x7
-
-#define ARM_LPAE_TCR_IPS_SHIFT		32
-#define ARM_LPAE_TCR_IPS_MASK		0x7
-
-#define ARM_LPAE_TCR_PS_32_BIT		0x0ULL
-#define ARM_LPAE_TCR_PS_36_BIT		0x1ULL
-#define ARM_LPAE_TCR_PS_40_BIT		0x2ULL
-#define ARM_LPAE_TCR_PS_42_BIT		0x3ULL
-#define ARM_LPAE_TCR_PS_44_BIT		0x4ULL
-#define ARM_LPAE_TCR_PS_48_BIT		0x5ULL
-
-#define ARM_LPAE_MAIR_ATTR_SHIFT(n)	((n) << 3)
-#define ARM_LPAE_MAIR_ATTR_MASK		0xff
-#define ARM_LPAE_MAIR_ATTR_DEVICE	0x04
-#define ARM_LPAE_MAIR_ATTR_NC		0x44
-#define ARM_LPAE_MAIR_ATTR_WBRWA	0xff
 #define ARM_LPAE_MAIR_ATTR_IDX_NC	0
 #define ARM_LPAE_MAIR_ATTR_IDX_CACHE	1
 #define ARM_LPAE_MAIR_ATTR_IDX_DEV	2
diff --git a/drivers/iommu/io-pgtable-arm.h b/drivers/iommu/io-pgtable-arm.h
new file mode 100644
index 000000000000..cb31314971ac
--- /dev/null
+++ b/drivers/iommu/io-pgtable-arm.h
@@ -0,0 +1,67 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (C) 2017 ARM Limited
+ */
+#ifndef __IO_PGTABLE_ARM_H
+#define __IO_PGTABLE_ARM_H
+
+#define ARM_32_LPAE_TCR_EAE		(1 << 31)
+#define ARM_64_LPAE_S2_TCR_RES1		(1 << 31)
+
+#define ARM_LPAE_TCR_EPD1		(1 << 23)
+
+#define ARM_LPAE_TCR_TG0_4K		(0 << 14)
+#define ARM_LPAE_TCR_TG0_64K		(1 << 14)
+#define ARM_LPAE_TCR_TG0_16K		(2 << 14)
+
+#define ARM_LPAE_TCR_SH0_SHIFT		12
+#define ARM_LPAE_TCR_SH0_MASK		0x3
+#define ARM_LPAE_TCR_SH_NS		0
+#define ARM_LPAE_TCR_SH_OS		2
+#define ARM_LPAE_TCR_SH_IS		3
+
+#define ARM_LPAE_TCR_ORGN0_SHIFT	10
+#define ARM_LPAE_TCR_IRGN0_SHIFT	8
+#define ARM_LPAE_TCR_RGN_MASK		0x3
+#define ARM_LPAE_TCR_RGN_NC		0
+#define ARM_LPAE_TCR_RGN_WBWA		1
+#define ARM_LPAE_TCR_RGN_WT		2
+#define ARM_LPAE_TCR_RGN_WB		3
+
+#define ARM_LPAE_TCR_SL0_SHIFT		6
+#define ARM_LPAE_TCR_SL0_MASK		0x3
+
+#define ARM_LPAE_TCR_T0SZ_SHIFT		0
+#define ARM_LPAE_TCR_SZ_MASK		0x3f
+
+#define ARM_LPAE_TCR_PS_SHIFT		16
+#define ARM_LPAE_TCR_PS_MASK		0x7
+
+#define ARM_LPAE_TCR_IPS_SHIFT		32
+#define ARM_LPAE_TCR_IPS_MASK		0x7
+
+#define ARM_LPAE_TCR_PS_32_BIT		0x0ULL
+#define ARM_LPAE_TCR_PS_36_BIT		0x1ULL
+#define ARM_LPAE_TCR_PS_40_BIT		0x2ULL
+#define ARM_LPAE_TCR_PS_42_BIT		0x3ULL
+#define ARM_LPAE_TCR_PS_44_BIT		0x4ULL
+#define ARM_LPAE_TCR_PS_48_BIT		0x5ULL
+
+#define ARM_LPAE_MAIR_ATTR_SHIFT(n)	((n) << 3)
+#define ARM_LPAE_MAIR_ATTR_MASK		0xff
+#define ARM_LPAE_MAIR_ATTR_DEVICE	0x04
+#define ARM_LPAE_MAIR_ATTR_NC		0x44
+#define ARM_LPAE_MAIR_ATTR_WBRWA	0xff
+
+#endif /* __IO_PGTABLE_ARM_H */
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 317+ messages in thread
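
As a rough illustration of why these defines become shared, a stage-1 TCR
value for a 48-bit VA, 4K-granule configuration could be assembled from them
as below. This is only a sketch, not the actual SMMUv3 context descriptor
code; note that a T0SZ of 16 is exactly the kind of value the widened
ARM_LPAE_TCR_SZ_MASK (0x3f instead of 0xf) can now hold.

#include <linux/types.h>
#include "io-pgtable-arm.h"

static u64 mydrv_build_s1_tcr(void)		/* hypothetical helper */
{
	u64 tcr = 0;

	tcr |= (64ULL - 48) << ARM_LPAE_TCR_T0SZ_SHIFT;		/* T0SZ = 16 */
	tcr |= ARM_LPAE_TCR_TG0_4K;
	tcr |= ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT;
	tcr |= ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT;
	tcr |= ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT;
	tcr |= ARM_LPAE_TCR_PS_48_BIT << ARM_LPAE_TCR_IPS_SHIFT;

	return tcr;
}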

+#define ARM_LPAE_MAIR_ATTR_NC		0x44
+#define ARM_LPAE_MAIR_ATTR_WBRWA	0xff
+
+#endif /* __IO_PGTABLE_ARM_H */
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 16/37] iommu: Add generic PASID table library
  2018-02-12 18:33 ` Jean-Philippe Brucker
  (?)
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: joro-zLv9SwRftAIdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	mark.rutland-5wv7dgnIgG8, catalin.marinas-5wv7dgnIgG8,
	will.deacon-5wv7dgnIgG8, lorenzo.pieralisi-5wv7dgnIgG8,
	hanjun.guo-QSEj5FYQhm4dnm+yROfE0A, sudeep.holla-5wv7dgnIgG8,
	rjw-LthD3rsA81gm4RdzfppkhA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	robin.murphy-5wv7dgnIgG8, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	tn-nYOzD4b6Jr9Wk0Htik3J/w, liubo95-hv44wF8Li93QT0dZR+AlfA,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	jonathan.cameron-hv44wF8Li93QT0dZR+AlfA,
	shunyong.yang-PT9Dzx9SjPiXmMXjJBpWqg,
	nwatters-sgV2jX0FEOL9JmXXK+q4OQ, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	jcrouse-sgV2jX0FEOL9JmXXK+q4OQ, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA,
	yi.l.liu-ral2JQCrhuEAvxtiuMwx3w,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	robdclark-Re5JQEeQqe8AvxtiuMwx3w, christian.koenig-5C7GfCeVMHo,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA

Add a small API within the IOMMU subsystem to handle different formats of
PASID tables. It uses the same principle as io-pgtable:

* The IOMMU driver registers a PASID table with some invalidation
  callbacks.
* The pasid-table lib allocates a set of tables of the right format, and
  returns an iommu_pasid_table_ops structure.
* The IOMMU driver allocates entries and writes them using the provided
  ops.
* The pasid-table lib calls the IOMMU driver back for invalidation when
  necessary.
* When finished, the IOMMU driver unregisters the ops, which also frees
  the tables.

An example user will be Arm SMMU in a subsequent patch.
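
As a sketch of the calling convention only (the PASID_TABLE_ARM_SMMU_V3
enumerator comes from a later patch in this series, and the variables below
are made up for illustration), an IOMMU driver would do roughly:

	struct iommu_pasid_table_ops *ops;
	struct iommu_pasid_entry *entry;
	struct iommu_pasid_table_cfg cfg = {
		.iommu_dev	= smmu->dev,	/* device doing the table walks */
		.order		= ssid_bits,	/* number of PASID bits */
		.sync		= &my_sync_ops,	/* driver invalidation callbacks */
	};

	ops = iommu_alloc_pasid_ops(PASID_TABLE_ARM_SMMU_V3, &cfg, cookie);
	if (!ops)
		return -ENOMEM;

	/* Allocate an entry bound to an io-pgtable and install it */
	entry = ops->alloc_priv_entry(ops, ARM_64_LPAE_S1, &pgtbl_cfg);
	if (IS_ERR(entry))
		goto err_free_ops;

	ret = ops->set_entry(ops, pasid, entry);

	/* ... and later, tear everything down */
	ops->clear_entry(ops, pasid, entry);
	ops->free_entry(ops, entry);
	iommu_free_pasid_ops(ops);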

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/Kconfig       |   8 +++
 drivers/iommu/Makefile      |   1 +
 drivers/iommu/iommu-pasid.c |  53 +++++++++++++++++
 drivers/iommu/iommu-pasid.h | 142 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 204 insertions(+)
 create mode 100644 drivers/iommu/iommu-pasid.c
 create mode 100644 drivers/iommu/iommu-pasid.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e751bb9958ba..8add90ba9b75 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -60,6 +60,14 @@ config IOMMU_IO_PGTABLE_ARMV7S_SELFTEST
 
 endmenu
 
+menu "Generic PASID table support"
+
+# Selected by the actual PASID table implementations
+config IOMMU_PASID_TABLE
+	bool
+
+endmenu
+
 config IOMMU_IOVA
 	tristate
 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index f4324e29035e..338e59c93131 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_IOMMU_FAULT) += io-pgfault.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
+obj-$(CONFIG_IOMMU_PASID_TABLE) += iommu-pasid.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
diff --git a/drivers/iommu/iommu-pasid.c b/drivers/iommu/iommu-pasid.c
new file mode 100644
index 000000000000..6b21d369d514
--- /dev/null
+++ b/drivers/iommu/iommu-pasid.c
@@ -0,0 +1,53 @@
+/*
+ * PASID table management for the IOMMU
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ * Author: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include <linux/kernel.h>
+
+#include "iommu-pasid.h"
+
+static const struct iommu_pasid_init_fns *
+pasid_table_init_fns[PASID_TABLE_NUM_FMTS] = {
+};
+
+struct iommu_pasid_table_ops *
+iommu_alloc_pasid_ops(enum iommu_pasid_table_fmt fmt,
+		      struct iommu_pasid_table_cfg *cfg, void *cookie)
+{
+	struct iommu_pasid_table *table;
+	const struct iommu_pasid_init_fns *fns;
+
+	if (fmt >= PASID_TABLE_NUM_FMTS)
+		return NULL;
+
+	fns = pasid_table_init_fns[fmt];
+	if (!fns)
+		return NULL;
+
+	table = fns->alloc(cfg, cookie);
+	if (!table)
+		return NULL;
+
+	table->fmt = fmt;
+	table->cookie = cookie;
+	table->cfg = *cfg;
+
+	return &table->ops;
+}
+
+void iommu_free_pasid_ops(struct iommu_pasid_table_ops *ops)
+{
+	struct iommu_pasid_table *table;
+
+	if (!ops)
+		return;
+
+	table = container_of(ops, struct iommu_pasid_table, ops);
+	iommu_pasid_flush_all(table);
+	pasid_table_init_fns[table->fmt]->free(table);
+}
diff --git a/drivers/iommu/iommu-pasid.h b/drivers/iommu/iommu-pasid.h
new file mode 100644
index 000000000000..40a27d35c1e0
--- /dev/null
+++ b/drivers/iommu/iommu-pasid.h
@@ -0,0 +1,142 @@
+/*
+ * PASID table management for the IOMMU
+ *
+ * Copyright (C) 2017 ARM Ltd.
+ * Author: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+#ifndef __IOMMU_PASID_H
+#define __IOMMU_PASID_H
+
+#include <linux/types.h>
+#include "io-pgtable.h"
+
+struct mm_struct;
+
+enum iommu_pasid_table_fmt {
+	PASID_TABLE_NUM_FMTS,
+};
+
+/**
+ * iommu_pasid_entry - Entry of a PASID table
+ *
+ * @tag:	architecture-specific data needed to uniquely identify the
+ *		entry. Most notably used for TLB invalidation
+ */
+struct iommu_pasid_entry {
+	u64		tag;
+};
+
+/**
+ * iommu_pasid_table_ops - Operations on a PASID table
+ *
+ * @alloc_shared_entry:	allocate an entry for sharing an mm (SVA)
+ *			Returns the pointer to a new entry or an error
+ * @alloc_priv_entry:	allocate an entry for map/unmap operations
+ *			Returns the pointer to a new entry or an error
+ * @free_entry:		free an entry obtained with alloc_entry
+ * @set_entry:		write PASID table entry
+ * @clear_entry:	clear PASID table entry
+ */
+struct iommu_pasid_table_ops {
+	struct iommu_pasid_entry *
+	(*alloc_shared_entry)(struct iommu_pasid_table_ops *ops,
+			      struct mm_struct *mm);
+	struct iommu_pasid_entry *
+	(*alloc_priv_entry)(struct iommu_pasid_table_ops *ops,
+			    enum io_pgtable_fmt fmt,
+			    struct io_pgtable_cfg *cfg);
+	void (*free_entry)(struct iommu_pasid_table_ops *ops,
+			   struct iommu_pasid_entry *entry);
+	int (*set_entry)(struct iommu_pasid_table_ops *ops, int pasid,
+			 struct iommu_pasid_entry *entry);
+	void (*clear_entry)(struct iommu_pasid_table_ops *ops, int pasid,
+			    struct iommu_pasid_entry *entry);
+};
+
+/**
+ * iommu_pasid_sync_ops - Callbacks into the IOMMU driver
+ *
+ * @cfg_flush:		flush cached configuration for one entry. For a
+ *			multi-level PASID table, 'leaf' tells whether to only
+ *			flush cached leaf entries or intermediate levels as
+ *			well.
+ * @cfg_flush_all:	flush cached configuration for all entries of the PASID
+ *			table
+ * @tlb_flush:		flush TLB entries for one entry
+ */
+struct iommu_pasid_sync_ops {
+	void (*cfg_flush)(void *cookie, int pasid, bool leaf);
+	void (*cfg_flush_all)(void *cookie);
+	void (*tlb_flush)(void *cookie, int pasid,
+			  struct iommu_pasid_entry *entry);
+};
+
+/**
+ * struct iommu_pasid_table_cfg - Configuration data for a set of PASID tables.
+ *
+ * @iommu_dev:	device performing the DMA table walks
+ * @order:	number of PASID bits, set by IOMMU driver
+ * @sync:	invalidation callbacks for this set of tables.
+ *
+ * @base:	DMA address of the allocated table, set by the allocator.
+ */
+struct iommu_pasid_table_cfg {
+	struct device			*iommu_dev;
+	size_t				order;
+	const struct iommu_pasid_sync_ops *sync;
+
+	dma_addr_t			base;
+};
+
+struct iommu_pasid_table_ops *
+iommu_alloc_pasid_ops(enum iommu_pasid_table_fmt fmt,
+		      struct iommu_pasid_table_cfg *cfg,
+		      void *cookie);
+void iommu_free_pasid_ops(struct iommu_pasid_table_ops *ops);
+
+/**
+ * struct iommu_pasid_table - describes a set of PASID tables
+ *
+ * @fmt:	The PASID table format.
+ * @cookie:	An opaque token provided by the IOMMU driver and passed back to
+ *		any callback routine.
+ * @cfg:	A copy of the PASID table configuration.
+ * @ops:	The PASID table operations in use for this set of page tables.
+ */
+struct iommu_pasid_table {
+	enum iommu_pasid_table_fmt	fmt;
+	void				*cookie;
+	struct iommu_pasid_table_cfg	cfg;
+	struct iommu_pasid_table_ops	ops;
+};
+
+#define iommu_pasid_table_ops_to_table(ops) \
+	container_of((ops), struct iommu_pasid_table, ops)
+
+struct iommu_pasid_init_fns {
+	struct iommu_pasid_table *(*alloc)(struct iommu_pasid_table_cfg *cfg,
+					   void *cookie);
+	void (*free)(struct iommu_pasid_table *table);
+};
+
+static inline void iommu_pasid_flush_all(struct iommu_pasid_table *table)
+{
+	table->cfg.sync->cfg_flush_all(table->cookie);
+}
+
+static inline void iommu_pasid_flush(struct iommu_pasid_table *table,
+					 int pasid, bool leaf)
+{
+	table->cfg.sync->cfg_flush(table->cookie, pasid, leaf);
+}
+
+static inline void iommu_pasid_flush_tlbs(struct iommu_pasid_table *table,
+					  int pasid,
+					  struct iommu_pasid_entry *entry)
+{
+	table->cfg.sync->tlb_flush(table->cookie, pasid, entry);
+}
+
+#endif /* __IOMMU_PASID_H */
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 16/37] iommu: Add generic PASID table library
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

Add a small API within the IOMMU subsystem to handle different formats of
PASID tables. It uses the same principle as io-pgtable:

* The IOMMU driver registers a PASID table with some invalidation
  callbacks.
* The pasid-table lib allocates a set of tables of the right format, and
  returns an iommu_pasid_table_ops structure.
* The IOMMU driver allocates entries and writes them using the provided
  ops.
* The pasid-table lib calls the IOMMU driver back for invalidation when
  necessary.
* When finished, the IOMMU driver unregisters the ops, which also frees
  the tables.

An example user will be Arm SMMU in a subsequent patch.
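
On the implementation side, a table-format backend embeds the generic
struct and registers alloc/free functions. A minimal skeleton with made-up
"my_" names (the real SMMUv3 backend arrives in the next patch):

	struct my_tables {
		struct iommu_pasid_table	pasid;	/* must be embedded */
		void				*hw_table;
	};

	static struct iommu_pasid_table *
	my_alloc_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
	{
		struct my_tables *tbl = kzalloc(sizeof(*tbl), GFP_KERNEL);

		if (!tbl)
			return NULL;
		/*
		 * Allocate the hardware table here, write its DMA address to
		 * cfg->base and fill tbl->pasid.ops with alloc_shared_entry,
		 * alloc_priv_entry, free_entry, set_entry and clear_entry.
		 */
		return &tbl->pasid;
	}

	static void my_free_tables(struct iommu_pasid_table *pasid_table)
	{
		kfree(container_of(pasid_table, struct my_tables, pasid));
	}

	struct iommu_pasid_init_fns my_pasid_init_fns = {
		.alloc	= my_alloc_tables,
		.free	= my_free_tables,
	};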

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/Kconfig       |   8 +++
 drivers/iommu/Makefile      |   1 +
 drivers/iommu/iommu-pasid.c |  53 +++++++++++++++++
 drivers/iommu/iommu-pasid.h | 142 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 204 insertions(+)
 create mode 100644 drivers/iommu/iommu-pasid.c
 create mode 100644 drivers/iommu/iommu-pasid.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e751bb9958ba..8add90ba9b75 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -60,6 +60,14 @@ config IOMMU_IO_PGTABLE_ARMV7S_SELFTEST
 
 endmenu
 
+menu "Generic PASID table support"
+
+# Selected by the actual PASID table implementations
+config IOMMU_PASID_TABLE
+	bool
+
+endmenu
+
 config IOMMU_IOVA
 	tristate
 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index f4324e29035e..338e59c93131 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_IOMMU_FAULT) += io-pgfault.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
+obj-$(CONFIG_IOMMU_PASID_TABLE) += iommu-pasid.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
diff --git a/drivers/iommu/iommu-pasid.c b/drivers/iommu/iommu-pasid.c
new file mode 100644
index 000000000000..6b21d369d514
--- /dev/null
+++ b/drivers/iommu/iommu-pasid.c
@@ -0,0 +1,53 @@
+/*
+ * PASID table management for the IOMMU
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include <linux/kernel.h>
+
+#include "iommu-pasid.h"
+
+static const struct iommu_pasid_init_fns *
+pasid_table_init_fns[PASID_TABLE_NUM_FMTS] = {
+};
+
+struct iommu_pasid_table_ops *
+iommu_alloc_pasid_ops(enum iommu_pasid_table_fmt fmt,
+		      struct iommu_pasid_table_cfg *cfg, void *cookie)
+{
+	struct iommu_pasid_table *table;
+	const struct iommu_pasid_init_fns *fns;
+
+	if (fmt >= PASID_TABLE_NUM_FMTS)
+		return NULL;
+
+	fns = pasid_table_init_fns[fmt];
+	if (!fns)
+		return NULL;
+
+	table = fns->alloc(cfg, cookie);
+	if (!table)
+		return NULL;
+
+	table->fmt = fmt;
+	table->cookie = cookie;
+	table->cfg = *cfg;
+
+	return &table->ops;
+}
+
+void iommu_free_pasid_ops(struct iommu_pasid_table_ops *ops)
+{
+	struct iommu_pasid_table *table;
+
+	if (!ops)
+		return;
+
+	table = container_of(ops, struct iommu_pasid_table, ops);
+	iommu_pasid_flush_all(table);
+	pasid_table_init_fns[table->fmt]->free(table);
+}
diff --git a/drivers/iommu/iommu-pasid.h b/drivers/iommu/iommu-pasid.h
new file mode 100644
index 000000000000..40a27d35c1e0
--- /dev/null
+++ b/drivers/iommu/iommu-pasid.h
@@ -0,0 +1,142 @@
+/*
+ * PASID table management for the IOMMU
+ *
+ * Copyright (C) 2017 ARM Ltd.
+ * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+#ifndef __IOMMU_PASID_H
+#define __IOMMU_PASID_H
+
+#include <linux/types.h>
+#include "io-pgtable.h"
+
+struct mm_struct;
+
+enum iommu_pasid_table_fmt {
+	PASID_TABLE_NUM_FMTS,
+};
+
+/**
+ * iommu_pasid_entry - Entry of a PASID table
+ *
+ * @tag:	architecture-specific data needed to uniquely identify the
+ *		entry. Most notably used for TLB invalidation
+ */
+struct iommu_pasid_entry {
+	u64		tag;
+};
+
+/**
+ * iommu_pasid_table_ops - Operations on a PASID table
+ *
+ * @alloc_shared_entry:	allocate an entry for sharing an mm (SVA)
+ *			Returns the pointer to a new entry or an error
+ * @alloc_priv_entry:	allocate an entry for map/unmap operations
+ *			Returns the pointer to a new entry or an error
+ * @free_entry:		free an entry obtained with alloc_entry
+ * @set_entry:		write PASID table entry
+ * @clear_entry:	clear PASID table entry
+ */
+struct iommu_pasid_table_ops {
+	struct iommu_pasid_entry *
+	(*alloc_shared_entry)(struct iommu_pasid_table_ops *ops,
+			      struct mm_struct *mm);
+	struct iommu_pasid_entry *
+	(*alloc_priv_entry)(struct iommu_pasid_table_ops *ops,
+			    enum io_pgtable_fmt fmt,
+			    struct io_pgtable_cfg *cfg);
+	void (*free_entry)(struct iommu_pasid_table_ops *ops,
+			   struct iommu_pasid_entry *entry);
+	int (*set_entry)(struct iommu_pasid_table_ops *ops, int pasid,
+			 struct iommu_pasid_entry *entry);
+	void (*clear_entry)(struct iommu_pasid_table_ops *ops, int pasid,
+			    struct iommu_pasid_entry *entry);
+};
+
+/**
+ * iommu_pasid_sync_ops - Callbacks into the IOMMU driver
+ *
+ * @cfg_flush:		flush cached configuration for one entry. For a
+ *			multi-level PASID table, 'leaf' tells whether to only
+ *			flush cached leaf entries or intermediate levels as
+ *			well.
+ * @cfg_flush_all:	flush cached configuration for all entries of the PASID
+ *			table
+ * @tlb_flush:		flush TLB entries for one entry
+ */
+struct iommu_pasid_sync_ops {
+	void (*cfg_flush)(void *cookie, int pasid, bool leaf);
+	void (*cfg_flush_all)(void *cookie);
+	void (*tlb_flush)(void *cookie, int pasid,
+			  struct iommu_pasid_entry *entry);
+};
+
+/**
+ * struct iommu_pasid_table_cfg - Configuration data for a set of PASID tables.
+ *
+ * @iommu_dev:	device performing the DMA table walks
+ * @order:	number of PASID bits, set by IOMMU driver
+ * @sync:	invalidation callbacks for this set of tables.
+ *
+ * @base:	DMA address of the allocated table, set by the allocator.
+ */
+struct iommu_pasid_table_cfg {
+	struct device			*iommu_dev;
+	size_t				order;
+	const struct iommu_pasid_sync_ops *sync;
+
+	dma_addr_t			base;
+};
+
+struct iommu_pasid_table_ops *
+iommu_alloc_pasid_ops(enum iommu_pasid_table_fmt fmt,
+		      struct iommu_pasid_table_cfg *cfg,
+		      void *cookie);
+void iommu_free_pasid_ops(struct iommu_pasid_table_ops *ops);
+
+/**
+ * struct iommu_pasid_table - describes a set of PASID tables
+ *
+ * @fmt:	The PASID table format.
+ * @cookie:	An opaque token provided by the IOMMU driver and passed back to
+ *		any callback routine.
+ * @cfg:	A copy of the PASID table configuration.
+ * @ops:	The PASID table operations in use for this set of page tables.
+ */
+struct iommu_pasid_table {
+	enum iommu_pasid_table_fmt	fmt;
+	void				*cookie;
+	struct iommu_pasid_table_cfg	cfg;
+	struct iommu_pasid_table_ops	ops;
+};
+
+#define iommu_pasid_table_ops_to_table(ops) \
+	container_of((ops), struct iommu_pasid_table, ops)
+
+struct iommu_pasid_init_fns {
+	struct iommu_pasid_table *(*alloc)(struct iommu_pasid_table_cfg *cfg,
+					   void *cookie);
+	void (*free)(struct iommu_pasid_table *table);
+};
+
+static inline void iommu_pasid_flush_all(struct iommu_pasid_table *table)
+{
+	table->cfg.sync->cfg_flush_all(table->cookie);
+}
+
+static inline void iommu_pasid_flush(struct iommu_pasid_table *table,
+					 int pasid, bool leaf)
+{
+	table->cfg.sync->cfg_flush(table->cookie, pasid, leaf);
+}
+
+static inline void iommu_pasid_flush_tlbs(struct iommu_pasid_table *table,
+					  int pasid,
+					  struct iommu_pasid_entry *entry)
+{
+	table->cfg.sync->tlb_flush(table->cookie, pasid, entry);
+}
+
+#endif /* __IOMMU_PASID_H */
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 16/37] iommu: Add generic PASID table library
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel

Add a small API within the IOMMU subsystem to handle different formats of
PASID tables. It uses the same principle as io-pgtable:

* The IOMMU driver registers a PASID table with some invalidation
  callbacks.
* The pasid-table lib allocates a set of tables of the right format, and
  returns an iommu_pasid_table_ops structure.
* The IOMMU driver allocates entries and writes them using the provided
  ops.
* The pasid-table lib calls the IOMMU driver back for invalidation when
  necessary.
* When finished, the IOMMU driver unregisters the ops, which also frees
  the tables.

An example user will be Arm SMMU in a subsequent patch.
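
The invalidation callbacks that the IOMMU driver passes in look roughly
like this (names are illustrative; the SMMUv3 versions are added in the
next patch):

	static void my_cfg_flush(void *cookie, int pasid, bool leaf)
	{
		/* invalidate the cached config for one PASID table entry */
	}

	static void my_cfg_flush_all(void *cookie)
	{
		/* invalidate the cached config for the whole table */
	}

	static void my_tlb_flush(void *cookie, int pasid,
				 struct iommu_pasid_entry *entry)
	{
		/* invalidate TLB entries tagged with entry->tag */
	}

	static const struct iommu_pasid_sync_ops my_sync_ops = {
		.cfg_flush	= my_cfg_flush,
		.cfg_flush_all	= my_cfg_flush_all,
		.tlb_flush	= my_tlb_flush,
	};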

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/Kconfig       |   8 +++
 drivers/iommu/Makefile      |   1 +
 drivers/iommu/iommu-pasid.c |  53 +++++++++++++++++
 drivers/iommu/iommu-pasid.h | 142 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 204 insertions(+)
 create mode 100644 drivers/iommu/iommu-pasid.c
 create mode 100644 drivers/iommu/iommu-pasid.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e751bb9958ba..8add90ba9b75 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -60,6 +60,14 @@ config IOMMU_IO_PGTABLE_ARMV7S_SELFTEST
 
 endmenu
 
+menu "Generic PASID table support"
+
+# Selected by the actual PASID table implementations
+config IOMMU_PASID_TABLE
+	bool
+
+endmenu
+
 config IOMMU_IOVA
 	tristate
 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index f4324e29035e..338e59c93131 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_IOMMU_FAULT) += io-pgfault.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
+obj-$(CONFIG_IOMMU_PASID_TABLE) += iommu-pasid.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
diff --git a/drivers/iommu/iommu-pasid.c b/drivers/iommu/iommu-pasid.c
new file mode 100644
index 000000000000..6b21d369d514
--- /dev/null
+++ b/drivers/iommu/iommu-pasid.c
@@ -0,0 +1,53 @@
+/*
+ * PASID table management for the IOMMU
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include <linux/kernel.h>
+
+#include "iommu-pasid.h"
+
+static const struct iommu_pasid_init_fns *
+pasid_table_init_fns[PASID_TABLE_NUM_FMTS] = {
+};
+
+struct iommu_pasid_table_ops *
+iommu_alloc_pasid_ops(enum iommu_pasid_table_fmt fmt,
+		      struct iommu_pasid_table_cfg *cfg, void *cookie)
+{
+	struct iommu_pasid_table *table;
+	const struct iommu_pasid_init_fns *fns;
+
+	if (fmt >= PASID_TABLE_NUM_FMTS)
+		return NULL;
+
+	fns = pasid_table_init_fns[fmt];
+	if (!fns)
+		return NULL;
+
+	table = fns->alloc(cfg, cookie);
+	if (!table)
+		return NULL;
+
+	table->fmt = fmt;
+	table->cookie = cookie;
+	table->cfg = *cfg;
+
+	return &table->ops;
+}
+
+void iommu_free_pasid_ops(struct iommu_pasid_table_ops *ops)
+{
+	struct iommu_pasid_table *table;
+
+	if (!ops)
+		return;
+
+	table = container_of(ops, struct iommu_pasid_table, ops);
+	iommu_pasid_flush_all(table);
+	pasid_table_init_fns[table->fmt]->free(table);
+}
diff --git a/drivers/iommu/iommu-pasid.h b/drivers/iommu/iommu-pasid.h
new file mode 100644
index 000000000000..40a27d35c1e0
--- /dev/null
+++ b/drivers/iommu/iommu-pasid.h
@@ -0,0 +1,142 @@
+/*
+ * PASID table management for the IOMMU
+ *
+ * Copyright (C) 2017 ARM Ltd.
+ * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+#ifndef __IOMMU_PASID_H
+#define __IOMMU_PASID_H
+
+#include <linux/types.h>
+#include "io-pgtable.h"
+
+struct mm_struct;
+
+enum iommu_pasid_table_fmt {
+	PASID_TABLE_NUM_FMTS,
+};
+
+/**
+ * iommu_pasid_entry - Entry of a PASID table
+ *
+ * @tag:	architecture-specific data needed to uniquely identify the
+ *		entry. Most notably used for TLB invalidation
+ */
+struct iommu_pasid_entry {
+	u64		tag;
+};
+
+/**
+ * iommu_pasid_table_ops - Operations on a PASID table
+ *
+ * @alloc_shared_entry:	allocate an entry for sharing an mm (SVA)
+ *			Returns the pointer to a new entry or an error
+ * @alloc_priv_entry:	allocate an entry for map/unmap operations
+ *			Returns the pointer to a new entry or an error
+ * @free_entry:		free an entry obtained with alloc_entry
+ * @set_entry:		write PASID table entry
+ * @clear_entry:	clear PASID table entry
+ */
+struct iommu_pasid_table_ops {
+	struct iommu_pasid_entry *
+	(*alloc_shared_entry)(struct iommu_pasid_table_ops *ops,
+			      struct mm_struct *mm);
+	struct iommu_pasid_entry *
+	(*alloc_priv_entry)(struct iommu_pasid_table_ops *ops,
+			    enum io_pgtable_fmt fmt,
+			    struct io_pgtable_cfg *cfg);
+	void (*free_entry)(struct iommu_pasid_table_ops *ops,
+			   struct iommu_pasid_entry *entry);
+	int (*set_entry)(struct iommu_pasid_table_ops *ops, int pasid,
+			 struct iommu_pasid_entry *entry);
+	void (*clear_entry)(struct iommu_pasid_table_ops *ops, int pasid,
+			    struct iommu_pasid_entry *entry);
+};
+
+/**
+ * iommu_pasid_sync_ops - Callbacks into the IOMMU driver
+ *
+ * @cfg_flush:		flush cached configuration for one entry. For a
+ *			multi-level PASID table, 'leaf' tells whether to only
+ *			flush cached leaf entries or intermediate levels as
+ *			well.
+ * @cfg_flush_all:	flush cached configuration for all entries of the PASID
+ *			table
+ * @tlb_flush:		flush TLB entries for one entry
+ */
+struct iommu_pasid_sync_ops {
+	void (*cfg_flush)(void *cookie, int pasid, bool leaf);
+	void (*cfg_flush_all)(void *cookie);
+	void (*tlb_flush)(void *cookie, int pasid,
+			  struct iommu_pasid_entry *entry);
+};
+
+/**
+ * struct iommu_pasid_table_cfg - Configuration data for a set of PASID tables.
+ *
+ * @iommu_dev:	device performing the DMA table walks
+ * @order:	number of PASID bits, set by IOMMU driver
+ * @sync:	invalidation callbacks for this set of tables.
+ *
+ * @base:	DMA address of the allocated table, set by the allocator.
+ */
+struct iommu_pasid_table_cfg {
+	struct device			*iommu_dev;
+	size_t				order;
+	const struct iommu_pasid_sync_ops *sync;
+
+	dma_addr_t			base;
+};
+
+struct iommu_pasid_table_ops *
+iommu_alloc_pasid_ops(enum iommu_pasid_table_fmt fmt,
+		      struct iommu_pasid_table_cfg *cfg,
+		      void *cookie);
+void iommu_free_pasid_ops(struct iommu_pasid_table_ops *ops);
+
+/**
+ * struct iommu_pasid_table - describes a set of PASID tables
+ *
+ * @fmt:	The PASID table format.
+ * @cookie:	An opaque token provided by the IOMMU driver and passed back to
+ *		any callback routine.
+ * @cfg:	A copy of the PASID table configuration.
+ * @ops:	The PASID table operations in use for this set of page tables.
+ */
+struct iommu_pasid_table {
+	enum iommu_pasid_table_fmt	fmt;
+	void				*cookie;
+	struct iommu_pasid_table_cfg	cfg;
+	struct iommu_pasid_table_ops	ops;
+};
+
+#define iommu_pasid_table_ops_to_table(ops) \
+	container_of((ops), struct iommu_pasid_table, ops)
+
+struct iommu_pasid_init_fns {
+	struct iommu_pasid_table *(*alloc)(struct iommu_pasid_table_cfg *cfg,
+					   void *cookie);
+	void (*free)(struct iommu_pasid_table *table);
+};
+
+static inline void iommu_pasid_flush_all(struct iommu_pasid_table *table)
+{
+	table->cfg.sync->cfg_flush_all(table->cookie);
+}
+
+static inline void iommu_pasid_flush(struct iommu_pasid_table *table,
+					 int pasid, bool leaf)
+{
+	table->cfg.sync->cfg_flush(table->cookie, pasid, leaf);
+}
+
+static inline void iommu_pasid_flush_tlbs(struct iommu_pasid_table *table,
+					  int pasid,
+					  struct iommu_pasid_entry *entry)
+{
+	table->cfg.sync->tlb_flush(table->cookie, pasid, entry);
+}
+
+#endif /* __IOMMU_PASID_H */
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 17/37] iommu/arm-smmu-v3: Move context descriptor code
  2018-02-12 18:33 ` Jean-Philippe Brucker
  (?)
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: joro-zLv9SwRftAIdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	mark.rutland-5wv7dgnIgG8, catalin.marinas-5wv7dgnIgG8,
	will.deacon-5wv7dgnIgG8, lorenzo.pieralisi-5wv7dgnIgG8,
	hanjun.guo-QSEj5FYQhm4dnm+yROfE0A, sudeep.holla-5wv7dgnIgG8,
	rjw-LthD3rsA81gm4RdzfppkhA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	robin.murphy-5wv7dgnIgG8, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	tn-nYOzD4b6Jr9Wk0Htik3J/w, liubo95-hv44wF8Li93QT0dZR+AlfA,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	jonathan.cameron-hv44wF8Li93QT0dZR+AlfA,
	shunyong.yang-PT9Dzx9SjPiXmMXjJBpWqg,
	nwatters-sgV2jX0FEOL9JmXXK+q4OQ, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	jcrouse-sgV2jX0FEOL9JmXXK+q4OQ, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA,
	yi.l.liu-ral2JQCrhuEAvxtiuMwx3w,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	robdclark-Re5JQEeQqe8AvxtiuMwx3w, christian.koenig-5C7GfCeVMHo,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA

In order to add support for substream ID, move the context descriptor code
into a separate library. At the moment it only manages context descriptor
0, which is used for non-PASID translations.

One important behavior change is the ASID allocator, which is now global
instead of per-SMMU. If we end up needing per-SMMU ASIDs after all, it
would be relatively simple to move back to a per-device allocator instead
of a global one. Sharing ASIDs will require an IDR, so implement the
ASID allocator with an IDA instead of porting the bitmap, to ease the
transition.
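
Condensed, the allocator change amounts to the following (illustration
only, lifted from the diff below):

	/* before: per-SMMU bitmap allocator */
	asid = arm_smmu_bitmap_alloc(smmu->asid_map, smmu->asid_bits);
	/* ... */
	arm_smmu_bitmap_free(smmu->asid_map, asid);

	/* after: a single IDA shared by every SMMU instance */
	static DEFINE_IDA(asid_ida);

	asid = ida_simple_get(&asid_ida, 0, 1 << asid_bits, GFP_KERNEL);
	/* ... */
	ida_simple_remove(&asid_ida, asid);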

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 MAINTAINERS                         |   2 +-
 drivers/iommu/Kconfig               |  11 ++
 drivers/iommu/Makefile              |   1 +
 drivers/iommu/arm-smmu-v3-context.c | 289 ++++++++++++++++++++++++++++++++++++
 drivers/iommu/arm-smmu-v3.c         | 265 +++++++++++++++------------------
 drivers/iommu/iommu-pasid.c         |   1 +
 drivers/iommu/iommu-pasid.h         |  27 ++++
 7 files changed, 451 insertions(+), 145 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-v3-context.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 9cb8ced8322a..93507bfe03a6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1104,7 +1104,7 @@ R:	Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
 L:	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org (moderated for non-subscribers)
 S:	Maintained
 F:	drivers/iommu/arm-smmu.c
-F:	drivers/iommu/arm-smmu-v3.c
+F:	drivers/iommu/arm-smmu-v3*
 F:	drivers/iommu/io-pgtable-arm.c
 F:	drivers/iommu/io-pgtable-arm.h
 F:	drivers/iommu/io-pgtable-arm-v7s.c
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 8add90ba9b75..4b272925ee78 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -66,6 +66,16 @@ menu "Generic PASID table support"
 config IOMMU_PASID_TABLE
 	bool
 
+config ARM_SMMU_V3_CONTEXT
+	bool "ARM SMMU v3 Context Descriptor tables"
+	select IOMMU_PASID_TABLE
+	depends on ARM64
+	help
+	Enable support for ARM SMMU v3 Context Descriptor tables, used for DMA
+	and PASID support.
+
+	If unsure, say N here.
+
 endmenu
 
 config IOMMU_IOVA
@@ -344,6 +354,7 @@ config ARM_SMMU_V3
 	depends on ARM64
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
+	select ARM_SMMU_V3_CONTEXT
 	select GENERIC_MSI_IRQ_DOMAIN
 	help
 	  Support for implementations of the ARM System MMU architecture
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 338e59c93131..22758960ed02 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -9,6 +9,7 @@ obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
 obj-$(CONFIG_IOMMU_PASID_TABLE) += iommu-pasid.o
+obj-$(CONFIG_ARM_SMMU_V3_CONTEXT) += arm-smmu-v3-context.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
new file mode 100644
index 000000000000..e910cb356f45
--- /dev/null
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -0,0 +1,289 @@
+/*
+ * Context descriptor table driver for SMMUv3
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include <linux/device.h>
+#include <linux/dma-mapping.h>
+#include <linux/idr.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+
+#include "iommu-pasid.h"
+
+#define CTXDESC_CD_DWORDS		8
+#define CTXDESC_CD_0_TCR_T0SZ_SHIFT	0
+#define ARM64_TCR_T0SZ_SHIFT		0
+#define ARM64_TCR_T0SZ_MASK		0x1fUL
+#define CTXDESC_CD_0_TCR_TG0_SHIFT	6
+#define ARM64_TCR_TG0_SHIFT		14
+#define ARM64_TCR_TG0_MASK		0x3UL
+#define CTXDESC_CD_0_TCR_IRGN0_SHIFT	8
+#define ARM64_TCR_IRGN0_SHIFT		8
+#define ARM64_TCR_IRGN0_MASK		0x3UL
+#define CTXDESC_CD_0_TCR_ORGN0_SHIFT	10
+#define ARM64_TCR_ORGN0_SHIFT		10
+#define ARM64_TCR_ORGN0_MASK		0x3UL
+#define CTXDESC_CD_0_TCR_SH0_SHIFT	12
+#define ARM64_TCR_SH0_SHIFT		12
+#define ARM64_TCR_SH0_MASK		0x3UL
+#define CTXDESC_CD_0_TCR_EPD0_SHIFT	14
+#define ARM64_TCR_EPD0_SHIFT		7
+#define ARM64_TCR_EPD0_MASK		0x1UL
+#define CTXDESC_CD_0_TCR_EPD1_SHIFT	30
+#define ARM64_TCR_EPD1_SHIFT		23
+#define ARM64_TCR_EPD1_MASK		0x1UL
+
+#define CTXDESC_CD_0_ENDI		(1UL << 15)
+#define CTXDESC_CD_0_V			(1UL << 31)
+
+#define CTXDESC_CD_0_TCR_IPS_SHIFT	32
+#define ARM64_TCR_IPS_SHIFT		32
+#define ARM64_TCR_IPS_MASK		0x7UL
+#define CTXDESC_CD_0_TCR_TBI0_SHIFT	38
+#define ARM64_TCR_TBI0_SHIFT		37
+#define ARM64_TCR_TBI0_MASK		0x1UL
+
+#define CTXDESC_CD_0_AA64		(1UL << 41)
+#define CTXDESC_CD_0_S			(1UL << 44)
+#define CTXDESC_CD_0_R			(1UL << 45)
+#define CTXDESC_CD_0_A			(1UL << 46)
+#define CTXDESC_CD_0_ASET_SHIFT		47
+#define CTXDESC_CD_0_ASET_SHARED	(0UL << CTXDESC_CD_0_ASET_SHIFT)
+#define CTXDESC_CD_0_ASET_PRIVATE	(1UL << CTXDESC_CD_0_ASET_SHIFT)
+#define CTXDESC_CD_0_ASID_SHIFT		48
+#define CTXDESC_CD_0_ASID_MASK		0xffffUL
+
+#define CTXDESC_CD_1_TTB0_SHIFT		4
+#define CTXDESC_CD_1_TTB0_MASK		0xfffffffffffUL
+
+#define CTXDESC_CD_3_MAIR_SHIFT		0
+
+/* Convert between AArch64 (CPU) TCR format and SMMU CD format */
+#define ARM_SMMU_TCR2CD(tcr, fld)					\
+	(((tcr) >> ARM64_TCR_##fld##_SHIFT & ARM64_TCR_##fld##_MASK)	\
+	 << CTXDESC_CD_0_TCR_##fld##_SHIFT)
+
+
+struct arm_smmu_cd {
+	struct iommu_pasid_entry	entry;
+
+	u64				ttbr;
+	u64				tcr;
+	u64				mair;
+};
+
+#define pasid_entry_to_cd(entry) \
+	container_of((entry), struct arm_smmu_cd, entry)
+
+struct arm_smmu_cd_tables {
+	struct iommu_pasid_table	pasid;
+
+	void				*ptr;
+	dma_addr_t			ptr_dma;
+};
+
+#define pasid_to_cd_tables(pasid_table) \
+	container_of((pasid_table), struct arm_smmu_cd_tables, pasid)
+
+#define pasid_ops_to_tables(ops) \
+	pasid_to_cd_tables(iommu_pasid_table_ops_to_table(ops))
+
+static DEFINE_IDA(asid_ida);
+
+static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
+{
+	u64 val = 0;
+
+	/* Repack the TCR. Just care about TTBR0 for now */
+	val |= ARM_SMMU_TCR2CD(tcr, T0SZ);
+	val |= ARM_SMMU_TCR2CD(tcr, TG0);
+	val |= ARM_SMMU_TCR2CD(tcr, IRGN0);
+	val |= ARM_SMMU_TCR2CD(tcr, ORGN0);
+	val |= ARM_SMMU_TCR2CD(tcr, SH0);
+	val |= ARM_SMMU_TCR2CD(tcr, EPD0);
+	val |= ARM_SMMU_TCR2CD(tcr, EPD1);
+	val |= ARM_SMMU_TCR2CD(tcr, IPS);
+	val |= ARM_SMMU_TCR2CD(tcr, TBI0);
+
+	return val;
+}
+
+static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
+				    struct arm_smmu_cd *cd)
+{
+	u64 val;
+	__u64 *cdptr = tbl->ptr;
+	struct arm_smmu_context_cfg *cfg = &tbl->pasid.cfg.arm_smmu;
+
+	if (!cd || WARN_ON(ssid))
+		return -EINVAL;
+
+	/*
+	 * We don't need to issue any invalidation here, as we'll invalidate
+	 * the STE when installing the new entry anyway.
+	 */
+	val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
+#ifdef __BIG_ENDIAN
+	      CTXDESC_CD_0_ENDI |
+#endif
+	      CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET_PRIVATE |
+	      CTXDESC_CD_0_AA64 | cd->entry.tag << CTXDESC_CD_0_ASID_SHIFT |
+	      CTXDESC_CD_0_V;
+
+	if (cfg->stall)
+		val |= CTXDESC_CD_0_S;
+
+	cdptr[0] = cpu_to_le64(val);
+
+	val = cd->ttbr & CTXDESC_CD_1_TTB0_MASK << CTXDESC_CD_1_TTB0_SHIFT;
+	cdptr[1] = cpu_to_le64(val);
+
+	cdptr[3] = cpu_to_le64(cd->mair << CTXDESC_CD_3_MAIR_SHIFT);
+
+	return 0;
+}
+
+static struct iommu_pasid_entry *
+arm_smmu_alloc_shared_cd(struct iommu_pasid_table_ops *ops, struct mm_struct *mm)
+{
+	return ERR_PTR(-ENODEV);
+}
+
+static struct iommu_pasid_entry *
+arm_smmu_alloc_priv_cd(struct iommu_pasid_table_ops *ops,
+		       enum io_pgtable_fmt fmt,
+		       struct io_pgtable_cfg *cfg)
+{
+	int ret;
+	int asid;
+	struct arm_smmu_cd *cd;
+	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
+	struct arm_smmu_context_cfg *ctx_cfg = &tbl->pasid.cfg.arm_smmu;
+
+	cd = kzalloc(sizeof(*cd), GFP_KERNEL);
+	if (!cd)
+		return ERR_PTR(-ENOMEM);
+
+	asid = ida_simple_get(&asid_ida, 0, 1 << ctx_cfg->asid_bits,
+			      GFP_KERNEL);
+	if (asid < 0) {
+		kfree(cd);
+		return ERR_PTR(asid);
+	}
+
+	cd->entry.tag = asid;
+
+	switch (fmt) {
+	case ARM_64_LPAE_S1:
+		cd->ttbr	= cfg->arm_lpae_s1_cfg.ttbr[0];
+		cd->tcr		= cfg->arm_lpae_s1_cfg.tcr;
+		cd->mair	= cfg->arm_lpae_s1_cfg.mair[0];
+		break;
+	default:
+		pr_err("Unsupported pgtable format 0x%x\n", fmt);
+		ret = -EINVAL;
+		goto err_free_asid;
+	}
+
+	return &cd->entry;
+
+err_free_asid:
+	ida_simple_remove(&asid_ida, asid);
+
+	kfree(cd);
+
+	return ERR_PTR(ret);
+}
+
+static void arm_smmu_free_cd(struct iommu_pasid_table_ops *ops,
+			     struct iommu_pasid_entry *entry)
+{
+	struct arm_smmu_cd *cd = pasid_entry_to_cd(entry);
+
+	ida_simple_remove(&asid_ida, (u16)entry->tag);
+	kfree(cd);
+}
+
+static int arm_smmu_set_cd(struct iommu_pasid_table_ops *ops, int pasid,
+			   struct iommu_pasid_entry *entry)
+{
+	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
+	struct arm_smmu_cd *cd = pasid_entry_to_cd(entry);
+
+	if (WARN_ON(pasid > (1 << tbl->pasid.cfg.order)))
+		return -EINVAL;
+
+	return arm_smmu_write_ctx_desc(tbl, pasid, cd);
+}
+
+static void arm_smmu_clear_cd(struct iommu_pasid_table_ops *ops, int pasid,
+			      struct iommu_pasid_entry *entry)
+{
+	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
+
+	if (WARN_ON(pasid > (1 << tbl->pasid.cfg.order)))
+		return;
+
+	arm_smmu_write_ctx_desc(tbl, pasid, NULL);
+}
+
+static struct iommu_pasid_table *
+arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
+{
+	struct arm_smmu_cd_tables *tbl;
+	struct device *dev = cfg->iommu_dev;
+
+	if (cfg->order) {
+		/* TODO: support SSID */
+		return NULL;
+	}
+
+	tbl = devm_kzalloc(dev, sizeof(*tbl), GFP_KERNEL);
+	if (!tbl)
+		return NULL;
+
+	tbl->ptr = dmam_alloc_coherent(dev, CTXDESC_CD_DWORDS << 3,
+				       &tbl->ptr_dma, GFP_KERNEL | __GFP_ZERO);
+	if (!tbl->ptr) {
+		dev_warn(dev, "failed to allocate context descriptor\n");
+		goto err_free_tbl;
+	}
+
+	tbl->pasid.ops = (struct iommu_pasid_table_ops) {
+		.alloc_priv_entry	= arm_smmu_alloc_priv_cd,
+		.alloc_shared_entry	= arm_smmu_alloc_shared_cd,
+		.free_entry		= arm_smmu_free_cd,
+		.set_entry		= arm_smmu_set_cd,
+		.clear_entry		= arm_smmu_clear_cd,
+	};
+
+	cfg->base		= tbl->ptr_dma;
+	cfg->arm_smmu.s1fmt	= ARM_SMMU_S1FMT_LINEAR;
+
+	return &tbl->pasid;
+
+err_free_tbl:
+	devm_kfree(dev, tbl);
+
+	return NULL;
+}
+
+static void arm_smmu_free_cd_tables(struct iommu_pasid_table *pasid_table)
+{
+	struct iommu_pasid_table_cfg *cfg = &pasid_table->cfg;
+	struct device *dev = cfg->iommu_dev;
+	struct arm_smmu_cd_tables *tbl = pasid_to_cd_tables(pasid_table);
+
+	dmam_free_coherent(dev, CTXDESC_CD_DWORDS << 3,
+			   tbl->ptr, tbl->ptr_dma);
+	devm_kfree(dev, tbl);
+}
+
+struct iommu_pasid_init_fns arm_smmu_v3_pasid_init_fns = {
+	.alloc	= arm_smmu_alloc_cd_tables,
+	.free	= arm_smmu_free_cd_tables,
+};
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index fb2507ffcdaf..b6d8c90fafb3 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -40,6 +40,7 @@
 #include <linux/amba/bus.h>
 
 #include "io-pgtable.h"
+#include "iommu-pasid.h"
 
 /* MMIO registers */
 #define ARM_SMMU_IDR0			0x0
@@ -281,60 +282,6 @@
 #define STRTAB_STE_3_S2TTB_SHIFT	4
 #define STRTAB_STE_3_S2TTB_MASK		0xfffffffffffUL
 
-/* Context descriptor (stage-1 only) */
-#define CTXDESC_CD_DWORDS		8
-#define CTXDESC_CD_0_TCR_T0SZ_SHIFT	0
-#define ARM64_TCR_T0SZ_SHIFT		0
-#define ARM64_TCR_T0SZ_MASK		0x1fUL
-#define CTXDESC_CD_0_TCR_TG0_SHIFT	6
-#define ARM64_TCR_TG0_SHIFT		14
-#define ARM64_TCR_TG0_MASK		0x3UL
-#define CTXDESC_CD_0_TCR_IRGN0_SHIFT	8
-#define ARM64_TCR_IRGN0_SHIFT		8
-#define ARM64_TCR_IRGN0_MASK		0x3UL
-#define CTXDESC_CD_0_TCR_ORGN0_SHIFT	10
-#define ARM64_TCR_ORGN0_SHIFT		10
-#define ARM64_TCR_ORGN0_MASK		0x3UL
-#define CTXDESC_CD_0_TCR_SH0_SHIFT	12
-#define ARM64_TCR_SH0_SHIFT		12
-#define ARM64_TCR_SH0_MASK		0x3UL
-#define CTXDESC_CD_0_TCR_EPD0_SHIFT	14
-#define ARM64_TCR_EPD0_SHIFT		7
-#define ARM64_TCR_EPD0_MASK		0x1UL
-#define CTXDESC_CD_0_TCR_EPD1_SHIFT	30
-#define ARM64_TCR_EPD1_SHIFT		23
-#define ARM64_TCR_EPD1_MASK		0x1UL
-
-#define CTXDESC_CD_0_ENDI		(1UL << 15)
-#define CTXDESC_CD_0_V			(1UL << 31)
-
-#define CTXDESC_CD_0_TCR_IPS_SHIFT	32
-#define ARM64_TCR_IPS_SHIFT		32
-#define ARM64_TCR_IPS_MASK		0x7UL
-#define CTXDESC_CD_0_TCR_TBI0_SHIFT	38
-#define ARM64_TCR_TBI0_SHIFT		37
-#define ARM64_TCR_TBI0_MASK		0x1UL
-
-#define CTXDESC_CD_0_AA64		(1UL << 41)
-#define CTXDESC_CD_0_S			(1UL << 44)
-#define CTXDESC_CD_0_R			(1UL << 45)
-#define CTXDESC_CD_0_A			(1UL << 46)
-#define CTXDESC_CD_0_ASET_SHIFT		47
-#define CTXDESC_CD_0_ASET_SHARED	(0UL << CTXDESC_CD_0_ASET_SHIFT)
-#define CTXDESC_CD_0_ASET_PRIVATE	(1UL << CTXDESC_CD_0_ASET_SHIFT)
-#define CTXDESC_CD_0_ASID_SHIFT		48
-#define CTXDESC_CD_0_ASID_MASK		0xffffUL
-
-#define CTXDESC_CD_1_TTB0_SHIFT		4
-#define CTXDESC_CD_1_TTB0_MASK		0xfffffffffffUL
-
-#define CTXDESC_CD_3_MAIR_SHIFT		0
-
-/* Convert between AArch64 (CPU) TCR format and SMMU CD format */
-#define ARM_SMMU_TCR2CD(tcr, fld)					\
-	(((tcr) >> ARM64_TCR_##fld##_SHIFT & ARM64_TCR_##fld##_MASK)	\
-	 << CTXDESC_CD_0_TCR_##fld##_SHIFT)
-
 /* Command queue */
 #define CMDQ_ENT_DWORDS			2
 #define CMDQ_MAX_SZ_SHIFT		8
@@ -353,6 +300,8 @@
 #define CMDQ_PREFETCH_1_SIZE_SHIFT	0
 #define CMDQ_PREFETCH_1_ADDR_MASK	~0xfffUL
 
+#define CMDQ_CFGI_0_SSID_SHIFT		12
+#define CMDQ_CFGI_0_SSID_MASK		0xfffffUL
 #define CMDQ_CFGI_0_SID_SHIFT		32
 #define CMDQ_CFGI_0_SID_MASK		0xffffffffUL
 #define CMDQ_CFGI_1_LEAF		(1UL << 0)
@@ -476,8 +425,11 @@ struct arm_smmu_cmdq_ent {
 
 		#define CMDQ_OP_CFGI_STE	0x3
 		#define CMDQ_OP_CFGI_ALL	0x4
+		#define CMDQ_OP_CFGI_CD		0x5
+		#define CMDQ_OP_CFGI_CD_ALL	0x6
 		struct {
 			u32			sid;
+			u32			ssid;
 			union {
 				bool		leaf;
 				u8		span;
@@ -552,15 +504,9 @@ struct arm_smmu_strtab_l1_desc {
 };
 
 struct arm_smmu_s1_cfg {
-	__le64				*cdptr;
-	dma_addr_t			cdptr_dma;
-
-	struct arm_smmu_ctx_desc {
-		u16	asid;
-		u64	ttbr;
-		u64	tcr;
-		u64	mair;
-	}				cd;
+	struct iommu_pasid_table_cfg	tables;
+	struct iommu_pasid_table_ops	*ops;
+	struct iommu_pasid_entry	*cd0; /* Default context */
 };
 
 struct arm_smmu_s2_cfg {
@@ -629,9 +575,7 @@ struct arm_smmu_device {
 	unsigned long			oas; /* PA */
 	unsigned long			pgsize_bitmap;
 
-#define ARM_SMMU_MAX_ASIDS		(1 << 16)
 	unsigned int			asid_bits;
-	DECLARE_BITMAP(asid_map, ARM_SMMU_MAX_ASIDS);
 
 #define ARM_SMMU_MAX_VMIDS		(1 << 16)
 	unsigned int			vmid_bits;
@@ -855,10 +799,16 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		cmd[1] |= ent->prefetch.size << CMDQ_PREFETCH_1_SIZE_SHIFT;
 		cmd[1] |= ent->prefetch.addr & CMDQ_PREFETCH_1_ADDR_MASK;
 		break;
+	case CMDQ_OP_CFGI_CD:
+		cmd[0] |= ent->cfgi.ssid << CMDQ_CFGI_0_SSID_SHIFT;
+		/* Fallthrough */
 	case CMDQ_OP_CFGI_STE:
 		cmd[0] |= (u64)ent->cfgi.sid << CMDQ_CFGI_0_SID_SHIFT;
 		cmd[1] |= ent->cfgi.leaf ? CMDQ_CFGI_1_LEAF : 0;
 		break;
+	case CMDQ_OP_CFGI_CD_ALL:
+		cmd[0] |= (u64)ent->cfgi.sid << CMDQ_CFGI_0_SID_SHIFT;
+		break;
 	case CMDQ_OP_CFGI_ALL:
 		/* Cover the entire SID range */
 		cmd[1] |= CMDQ_CFGI_1_RANGE_MASK << CMDQ_CFGI_1_RANGE_SHIFT;
@@ -1059,54 +1009,6 @@ static void arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
 		dev_err_ratelimited(smmu->dev, "CMD_SYNC timeout\n");
 }
 
-/* Context descriptor manipulation functions */
-static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
-{
-	u64 val = 0;
-
-	/* Repack the TCR. Just care about TTBR0 for now */
-	val |= ARM_SMMU_TCR2CD(tcr, T0SZ);
-	val |= ARM_SMMU_TCR2CD(tcr, TG0);
-	val |= ARM_SMMU_TCR2CD(tcr, IRGN0);
-	val |= ARM_SMMU_TCR2CD(tcr, ORGN0);
-	val |= ARM_SMMU_TCR2CD(tcr, SH0);
-	val |= ARM_SMMU_TCR2CD(tcr, EPD0);
-	val |= ARM_SMMU_TCR2CD(tcr, EPD1);
-	val |= ARM_SMMU_TCR2CD(tcr, IPS);
-	val |= ARM_SMMU_TCR2CD(tcr, TBI0);
-
-	return val;
-}
-
-static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
-				    struct arm_smmu_s1_cfg *cfg)
-{
-	u64 val;
-
-	/*
-	 * We don't need to issue any invalidation here, as we'll invalidate
-	 * the STE when installing the new entry anyway.
-	 */
-	val = arm_smmu_cpu_tcr_to_cd(cfg->cd.tcr) |
-#ifdef __BIG_ENDIAN
-	      CTXDESC_CD_0_ENDI |
-#endif
-	      CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET_PRIVATE |
-	      CTXDESC_CD_0_AA64 | (u64)cfg->cd.asid << CTXDESC_CD_0_ASID_SHIFT |
-	      CTXDESC_CD_0_V;
-
-	/* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
-	if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
-		val |= CTXDESC_CD_0_S;
-
-	cfg->cdptr[0] = cpu_to_le64(val);
-
-	val = cfg->cd.ttbr & CTXDESC_CD_1_TTB0_MASK << CTXDESC_CD_1_TTB0_SHIFT;
-	cfg->cdptr[1] = cpu_to_le64(val);
-
-	cfg->cdptr[3] = cpu_to_le64(cfg->cd.mair << CTXDESC_CD_3_MAIR_SHIFT);
-}
-
 /* Stream table manipulation functions */
 static void
 arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
@@ -1222,7 +1124,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
 			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 
-		val |= (ste->s1_cfg->cdptr_dma & STRTAB_STE_0_S1CTXPTR_MASK
+		val |= (ste->s1_cfg->tables.base & STRTAB_STE_0_S1CTXPTR_MASK
 		        << STRTAB_STE_0_S1CTXPTR_SHIFT) |
 			STRTAB_STE_0_CFG_S1_TRANS;
 	}
@@ -1466,8 +1368,10 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 	struct arm_smmu_cmdq_ent cmd;
 
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
+		if (unlikely(!smmu_domain->s1_cfg.cd0))
+			return;
 		cmd.opcode	= CMDQ_OP_TLBI_NH_ASID;
-		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
+		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd0->tag;
 		cmd.tlbi.vmid	= 0;
 	} else {
 		cmd.opcode	= CMDQ_OP_TLBI_S12_VMALL;
@@ -1491,8 +1395,10 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 	};
 
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
+		if (unlikely(!smmu_domain->s1_cfg.cd0))
+			return;
 		cmd.opcode	= CMDQ_OP_TLBI_NH_VA;
-		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
+		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd0->tag;
 	} else {
 		cmd.opcode	= CMDQ_OP_TLBI_S2_IPA;
 		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
@@ -1510,6 +1416,71 @@ static const struct iommu_gather_ops arm_smmu_gather_ops = {
 	.tlb_sync	= arm_smmu_tlb_sync,
 };
 
+/* PASID TABLE API */
+static void __arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
+			       struct arm_smmu_cmdq_ent *cmd)
+{
+	size_t i;
+	unsigned long flags;
+	struct arm_smmu_master_data *master;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_for_each_entry(master, &smmu_domain->devices, list) {
+		struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+
+		for (i = 0; i < fwspec->num_ids; i++) {
+			cmd->cfgi.sid = fwspec->ids[i];
+			arm_smmu_cmdq_issue_cmd(smmu, cmd);
+		}
+	}
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+
+	__arm_smmu_tlb_sync(smmu);
+}
+
+static void arm_smmu_sync_cd(void *cookie, int ssid, bool leaf)
+{
+	struct arm_smmu_cmdq_ent cmd = {
+		.opcode	= CMDQ_OP_CFGI_CD_ALL,
+		.cfgi	= {
+			.ssid	= ssid,
+			.leaf	= leaf,
+		},
+	};
+
+	__arm_smmu_sync_cd(cookie, &cmd);
+}
+
+static void arm_smmu_sync_cd_all(void *cookie)
+{
+	struct arm_smmu_cmdq_ent cmd = {
+		.opcode	= CMDQ_OP_CFGI_CD_ALL,
+	};
+
+	__arm_smmu_sync_cd(cookie, &cmd);
+}
+
+static void arm_smmu_tlb_inv_ssid(void *cookie, int ssid,
+				  struct iommu_pasid_entry *entry)
+{
+	struct arm_smmu_domain *smmu_domain = cookie;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct arm_smmu_cmdq_ent cmd = {
+		.opcode		= CMDQ_OP_TLBI_NH_ASID,
+		.tlbi.asid	= entry->tag,
+	};
+
+	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+	__arm_smmu_tlb_sync(smmu);
+}
+
+static struct iommu_pasid_sync_ops arm_smmu_ctx_sync = {
+	.cfg_flush	= arm_smmu_sync_cd,
+	.cfg_flush_all	= arm_smmu_sync_cd_all,
+	.tlb_flush	= arm_smmu_tlb_inv_ssid,
+};
+
 /* IOMMU API */
 static bool arm_smmu_capable(enum iommu_cap cap)
 {
@@ -1582,15 +1553,11 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 
 	/* Free the CD and ASID, if we allocated them */
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
-		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
-
-		if (cfg->cdptr) {
-			dmam_free_coherent(smmu_domain->smmu->dev,
-					   CTXDESC_CD_DWORDS << 3,
-					   cfg->cdptr,
-					   cfg->cdptr_dma);
+		struct iommu_pasid_table_ops *ops = smmu_domain->s1_cfg.ops;
 
-			arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
+		if (ops) {
+			ops->free_entry(ops, smmu_domain->s1_cfg.cd0);
+			iommu_free_pasid_ops(ops);
 		}
 	} else {
 		struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
@@ -1605,31 +1572,42 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 				       struct io_pgtable_cfg *pgtbl_cfg)
 {
 	int ret;
-	int asid;
-	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct iommu_pasid_entry *entry;
+	struct iommu_pasid_table_ops *ops;
 	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct iommu_pasid_table_cfg pasid_cfg = {
+		.iommu_dev		= smmu->dev,
+		.sync			= &arm_smmu_ctx_sync,
+		.arm_smmu = {
+			.stall		= !!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE),
+			.asid_bits	= smmu->asid_bits,
+		},
+	};
 
-	asid = arm_smmu_bitmap_alloc(smmu->asid_map, smmu->asid_bits);
-	if (asid < 0)
-		return asid;
+	ops = iommu_alloc_pasid_ops(PASID_TABLE_ARM_SMMU_V3, &pasid_cfg,
+				    smmu_domain);
+	if (!ops)
+		return -ENOMEM;
 
-	cfg->cdptr = dmam_alloc_coherent(smmu->dev, CTXDESC_CD_DWORDS << 3,
-					 &cfg->cdptr_dma,
-					 GFP_KERNEL | __GFP_ZERO);
-	if (!cfg->cdptr) {
-		dev_warn(smmu->dev, "failed to allocate context descriptor\n");
-		ret = -ENOMEM;
-		goto out_free_asid;
+	/* Create default entry */
+	entry = ops->alloc_priv_entry(ops, ARM_64_LPAE_S1, pgtbl_cfg);
+	if (IS_ERR(entry)) {
+		iommu_free_pasid_ops(ops);
+		return PTR_ERR(entry);
 	}
 
-	cfg->cd.asid	= (u16)asid;
-	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
-	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
-	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
-	return 0;
+	ret = ops->set_entry(ops, 0, entry);
+	if (ret) {
+		ops->free_entry(ops, entry);
+		iommu_free_pasid_ops(ops);
+		return ret;
+	}
+
+	cfg->tables	= pasid_cfg;
+	cfg->ops	= ops;
+	cfg->cd0	= entry;
 
-out_free_asid:
-	arm_smmu_bitmap_free(smmu->asid_map, asid);
 	return ret;
 }
 
@@ -1832,7 +1810,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		ste->s1_cfg = &smmu_domain->s1_cfg;
 		ste->s2_cfg = NULL;
-		arm_smmu_write_ctx_desc(smmu, ste->s1_cfg);
 	} else {
 		ste->s1_cfg = NULL;
 		ste->s2_cfg = &smmu_domain->s2_cfg;
diff --git a/drivers/iommu/iommu-pasid.c b/drivers/iommu/iommu-pasid.c
index 6b21d369d514..239b91e18543 100644
--- a/drivers/iommu/iommu-pasid.c
+++ b/drivers/iommu/iommu-pasid.c
@@ -13,6 +13,7 @@
 
 static const struct iommu_pasid_init_fns *
 pasid_table_init_fns[PASID_TABLE_NUM_FMTS] = {
+	[PASID_TABLE_ARM_SMMU_V3] = &arm_smmu_v3_pasid_init_fns,
 };
 
 struct iommu_pasid_table_ops *
diff --git a/drivers/iommu/iommu-pasid.h b/drivers/iommu/iommu-pasid.h
index 40a27d35c1e0..77e449a1655b 100644
--- a/drivers/iommu/iommu-pasid.h
+++ b/drivers/iommu/iommu-pasid.h
@@ -15,6 +15,7 @@
 struct mm_struct;
 
 enum iommu_pasid_table_fmt {
+	PASID_TABLE_ARM_SMMU_V3,
 	PASID_TABLE_NUM_FMTS,
 };
 
@@ -73,6 +74,25 @@ struct iommu_pasid_sync_ops {
 			  struct iommu_pasid_entry *entry);
 };
 
+/**
+ * arm_smmu_context_cfg - PASID table configuration for ARM SMMU v3
+ *
+ * SMMU properties:
+ * @stall:	devices attached to the domain are allowed to stall.
+ * @asid_bits:	number of ASID bits supported by the SMMU
+ *
+ * @s1fmt:	PASID table format, chosen by the allocator.
+ */
+struct arm_smmu_context_cfg {
+	u8				stall:1;
+	u8				asid_bits;
+
+#define ARM_SMMU_S1FMT_LINEAR		0x0
+#define ARM_SMMU_S1FMT_4K_L2		0x1
+#define ARM_SMMU_S1FMT_64K_L2		0x2
+	u8				s1fmt;
+};
+
 /**
  * struct iommu_pasid_table_cfg - Configuration data for a set of PASID tables.
  *
@@ -88,6 +108,11 @@ struct iommu_pasid_table_cfg {
 	const struct iommu_pasid_sync_ops *sync;
 
 	dma_addr_t			base;
+
+	/* Low-level data specific to the IOMMU */
+	union {
+		struct arm_smmu_context_cfg arm_smmu;
+	};
 };
 
 struct iommu_pasid_table_ops *
@@ -139,4 +164,6 @@ static inline void iommu_pasid_flush_tlbs(struct iommu_pasid_table *table,
 	table->cfg.sync->tlb_flush(table->cookie, pasid, entry);
 }
 
+extern struct iommu_pasid_init_fns arm_smmu_v3_pasid_init_fns;
+
 #endif /* __IOMMU_PASID_H */
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 17/37] iommu/arm-smmu-v3: Move context descriptor code
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

In order to add support for substream ID, move the context descriptor code
into a separate library. At the moment it only manages context descriptor
0, which is used for non-PASID translations.

One important behavior change is the ASID allocator, which is now global
instead of per-SMMU. If we end up needing per-SMMU ASIDs after all, it
would be relatively simple to move back to a per-device allocator.
Sharing ASIDs will require an IDR, so implement the ASID allocator with
an IDA instead of porting the bitmap, to ease the transition.
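
As a rough illustration of the allocator change, here is a minimal sketch
of a global IDA-based ASID allocator. The example_* names are illustrative
only and not part of the patch; the real allocation lives in the new
arm-smmu-v3-context.c below:

	#include <linux/idr.h>

	static DEFINE_IDA(example_asid_ida);

	/*
	 * Allocate the lowest free ASID in [0, 1 << asid_bits), or return
	 * a negative errno on failure.
	 */
	static int example_asid_alloc(unsigned int asid_bits)
	{
		return ida_simple_get(&example_asid_ida, 0, 1 << asid_bits,
				      GFP_KERNEL);
	}

	static void example_asid_free(int asid)
	{
		ida_simple_remove(&example_asid_ida, asid);
	}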

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 MAINTAINERS                         |   2 +-
 drivers/iommu/Kconfig               |  11 ++
 drivers/iommu/Makefile              |   1 +
 drivers/iommu/arm-smmu-v3-context.c | 289 ++++++++++++++++++++++++++++++++++++
 drivers/iommu/arm-smmu-v3.c         | 265 +++++++++++++++------------------
 drivers/iommu/iommu-pasid.c         |   1 +
 drivers/iommu/iommu-pasid.h         |  27 ++++
 7 files changed, 451 insertions(+), 145 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-v3-context.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 9cb8ced8322a..93507bfe03a6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1104,7 +1104,7 @@ R:	Robin Murphy <robin.murphy@arm.com>
 L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 S:	Maintained
 F:	drivers/iommu/arm-smmu.c
-F:	drivers/iommu/arm-smmu-v3.c
+F:	drivers/iommu/arm-smmu-v3*
 F:	drivers/iommu/io-pgtable-arm.c
 F:	drivers/iommu/io-pgtable-arm.h
 F:	drivers/iommu/io-pgtable-arm-v7s.c
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 8add90ba9b75..4b272925ee78 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -66,6 +66,16 @@ menu "Generic PASID table support"
 config IOMMU_PASID_TABLE
 	bool
 
+config ARM_SMMU_V3_CONTEXT
+	bool "ARM SMMU v3 Context Descriptor tables"
+	select IOMMU_PASID_TABLE
+	depends on ARM64
+	help
+	Enable support for ARM SMMU v3 Context Descriptor tables, used for DMA
+	and PASID support.
+
+	If unsure, say N here.
+
 endmenu
 
 config IOMMU_IOVA
@@ -344,6 +354,7 @@ config ARM_SMMU_V3
 	depends on ARM64
 	select IOMMU_API
 	select IOMMU_IO_PGTABLE_LPAE
+	select ARM_SMMU_V3_CONTEXT
 	select GENERIC_MSI_IRQ_DOMAIN
 	help
 	  Support for implementations of the ARM System MMU architecture
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 338e59c93131..22758960ed02 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -9,6 +9,7 @@ obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
 obj-$(CONFIG_IOMMU_PASID_TABLE) += iommu-pasid.o
+obj-$(CONFIG_ARM_SMMU_V3_CONTEXT) += arm-smmu-v3-context.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
new file mode 100644
index 000000000000..e910cb356f45
--- /dev/null
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -0,0 +1,289 @@
+/*
+ * Context descriptor table driver for SMMUv3
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include <linux/device.h>
+#include <linux/dma-mapping.h>
+#include <linux/idr.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+
+#include "iommu-pasid.h"
+
+#define CTXDESC_CD_DWORDS		8
+#define CTXDESC_CD_0_TCR_T0SZ_SHIFT	0
+#define ARM64_TCR_T0SZ_SHIFT		0
+#define ARM64_TCR_T0SZ_MASK		0x1fUL
+#define CTXDESC_CD_0_TCR_TG0_SHIFT	6
+#define ARM64_TCR_TG0_SHIFT		14
+#define ARM64_TCR_TG0_MASK		0x3UL
+#define CTXDESC_CD_0_TCR_IRGN0_SHIFT	8
+#define ARM64_TCR_IRGN0_SHIFT		8
+#define ARM64_TCR_IRGN0_MASK		0x3UL
+#define CTXDESC_CD_0_TCR_ORGN0_SHIFT	10
+#define ARM64_TCR_ORGN0_SHIFT		10
+#define ARM64_TCR_ORGN0_MASK		0x3UL
+#define CTXDESC_CD_0_TCR_SH0_SHIFT	12
+#define ARM64_TCR_SH0_SHIFT		12
+#define ARM64_TCR_SH0_MASK		0x3UL
+#define CTXDESC_CD_0_TCR_EPD0_SHIFT	14
+#define ARM64_TCR_EPD0_SHIFT		7
+#define ARM64_TCR_EPD0_MASK		0x1UL
+#define CTXDESC_CD_0_TCR_EPD1_SHIFT	30
+#define ARM64_TCR_EPD1_SHIFT		23
+#define ARM64_TCR_EPD1_MASK		0x1UL
+
+#define CTXDESC_CD_0_ENDI		(1UL << 15)
+#define CTXDESC_CD_0_V			(1UL << 31)
+
+#define CTXDESC_CD_0_TCR_IPS_SHIFT	32
+#define ARM64_TCR_IPS_SHIFT		32
+#define ARM64_TCR_IPS_MASK		0x7UL
+#define CTXDESC_CD_0_TCR_TBI0_SHIFT	38
+#define ARM64_TCR_TBI0_SHIFT		37
+#define ARM64_TCR_TBI0_MASK		0x1UL
+
+#define CTXDESC_CD_0_AA64		(1UL << 41)
+#define CTXDESC_CD_0_S			(1UL << 44)
+#define CTXDESC_CD_0_R			(1UL << 45)
+#define CTXDESC_CD_0_A			(1UL << 46)
+#define CTXDESC_CD_0_ASET_SHIFT		47
+#define CTXDESC_CD_0_ASET_SHARED	(0UL << CTXDESC_CD_0_ASET_SHIFT)
+#define CTXDESC_CD_0_ASET_PRIVATE	(1UL << CTXDESC_CD_0_ASET_SHIFT)
+#define CTXDESC_CD_0_ASID_SHIFT		48
+#define CTXDESC_CD_0_ASID_MASK		0xffffUL
+
+#define CTXDESC_CD_1_TTB0_SHIFT		4
+#define CTXDESC_CD_1_TTB0_MASK		0xfffffffffffUL
+
+#define CTXDESC_CD_3_MAIR_SHIFT		0
+
+/* Convert between AArch64 (CPU) TCR format and SMMU CD format */
+#define ARM_SMMU_TCR2CD(tcr, fld)					\
+	(((tcr) >> ARM64_TCR_##fld##_SHIFT & ARM64_TCR_##fld##_MASK)	\
+	 << CTXDESC_CD_0_TCR_##fld##_SHIFT)
+
+
+struct arm_smmu_cd {
+	struct iommu_pasid_entry	entry;
+
+	u64				ttbr;
+	u64				tcr;
+	u64				mair;
+};
+
+#define pasid_entry_to_cd(entry) \
+	container_of((entry), struct arm_smmu_cd, entry)
+
+struct arm_smmu_cd_tables {
+	struct iommu_pasid_table	pasid;
+
+	void				*ptr;
+	dma_addr_t			ptr_dma;
+};
+
+#define pasid_to_cd_tables(pasid_table) \
+	container_of((pasid_table), struct arm_smmu_cd_tables, pasid)
+
+#define pasid_ops_to_tables(ops) \
+	pasid_to_cd_tables(iommu_pasid_table_ops_to_table(ops))
+
+static DEFINE_IDA(asid_ida);
+
+static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
+{
+	u64 val = 0;
+
+	/* Repack the TCR. Just care about TTBR0 for now */
+	val |= ARM_SMMU_TCR2CD(tcr, T0SZ);
+	val |= ARM_SMMU_TCR2CD(tcr, TG0);
+	val |= ARM_SMMU_TCR2CD(tcr, IRGN0);
+	val |= ARM_SMMU_TCR2CD(tcr, ORGN0);
+	val |= ARM_SMMU_TCR2CD(tcr, SH0);
+	val |= ARM_SMMU_TCR2CD(tcr, EPD0);
+	val |= ARM_SMMU_TCR2CD(tcr, EPD1);
+	val |= ARM_SMMU_TCR2CD(tcr, IPS);
+	val |= ARM_SMMU_TCR2CD(tcr, TBI0);
+
+	return val;
+}
+
+static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
+				    struct arm_smmu_cd *cd)
+{
+	u64 val;
+	__le64 *cdptr = tbl->ptr;
+	struct arm_smmu_context_cfg *cfg = &tbl->pasid.cfg.arm_smmu;
+
+	if (!cd || WARN_ON(ssid))
+		return -EINVAL;
+
+	/*
+	 * We don't need to issue any invalidation here, as we'll invalidate
+	 * the STE when installing the new entry anyway.
+	 */
+	val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
+#ifdef __BIG_ENDIAN
+	      CTXDESC_CD_0_ENDI |
+#endif
+	      CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET_PRIVATE |
+	      CTXDESC_CD_0_AA64 | cd->entry.tag << CTXDESC_CD_0_ASID_SHIFT |
+	      CTXDESC_CD_0_V;
+
+	if (cfg->stall)
+		val |= CTXDESC_CD_0_S;
+
+	cdptr[0] = cpu_to_le64(val);
+
+	val = cd->ttbr & CTXDESC_CD_1_TTB0_MASK << CTXDESC_CD_1_TTB0_SHIFT;
+	cdptr[1] = cpu_to_le64(val);
+
+	cdptr[3] = cpu_to_le64(cd->mair << CTXDESC_CD_3_MAIR_SHIFT);
+
+	return 0;
+}
+
+static struct iommu_pasid_entry *
+arm_smmu_alloc_shared_cd(struct iommu_pasid_table_ops *ops, struct mm_struct *mm)
+{
+	return ERR_PTR(-ENODEV);
+}
+
+static struct iommu_pasid_entry *
+arm_smmu_alloc_priv_cd(struct iommu_pasid_table_ops *ops,
+		       enum io_pgtable_fmt fmt,
+		       struct io_pgtable_cfg *cfg)
+{
+	int ret;
+	int asid;
+	struct arm_smmu_cd *cd;
+	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
+	struct arm_smmu_context_cfg *ctx_cfg = &tbl->pasid.cfg.arm_smmu;
+
+	cd = kzalloc(sizeof(*cd), GFP_KERNEL);
+	if (!cd)
+		return ERR_PTR(-ENOMEM);
+
+	asid = ida_simple_get(&asid_ida, 0, 1 << ctx_cfg->asid_bits,
+			      GFP_KERNEL);
+	if (asid < 0) {
+		kfree(cd);
+		return ERR_PTR(asid);
+	}
+
+	cd->entry.tag = asid;
+
+	switch (fmt) {
+	case ARM_64_LPAE_S1:
+		cd->ttbr	= cfg->arm_lpae_s1_cfg.ttbr[0];
+		cd->tcr		= cfg->arm_lpae_s1_cfg.tcr;
+		cd->mair	= cfg->arm_lpae_s1_cfg.mair[0];
+		break;
+	default:
+		pr_err("Unsupported pgtable format 0x%x\n", fmt);
+		ret = -EINVAL;
+		goto err_free_asid;
+	}
+
+	return &cd->entry;
+
+err_free_asid:
+	ida_simple_remove(&asid_ida, asid);
+
+	kfree(cd);
+
+	return ERR_PTR(ret);
+}
+
+static void arm_smmu_free_cd(struct iommu_pasid_table_ops *ops,
+			     struct iommu_pasid_entry *entry)
+{
+	struct arm_smmu_cd *cd = pasid_entry_to_cd(entry);
+
+	ida_simple_remove(&asid_ida, (u16)entry->tag);
+	kfree(cd);
+}
+
+static int arm_smmu_set_cd(struct iommu_pasid_table_ops *ops, int pasid,
+			   struct iommu_pasid_entry *entry)
+{
+	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
+	struct arm_smmu_cd *cd = pasid_entry_to_cd(entry);
+
+	if (WARN_ON(pasid > (1 << tbl->pasid.cfg.order)))
+		return -EINVAL;
+
+	return arm_smmu_write_ctx_desc(tbl, pasid, cd);
+}
+
+static void arm_smmu_clear_cd(struct iommu_pasid_table_ops *ops, int pasid,
+			      struct iommu_pasid_entry *entry)
+{
+	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
+
+	if (WARN_ON(pasid > (1 << tbl->pasid.cfg.order)))
+		return;
+
+	arm_smmu_write_ctx_desc(tbl, pasid, NULL);
+}
+
+static struct iommu_pasid_table *
+arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
+{
+	struct arm_smmu_cd_tables *tbl;
+	struct device *dev = cfg->iommu_dev;
+
+	if (cfg->order) {
+		/* TODO: support SSID */
+		return NULL;
+	}
+
+	tbl = devm_kzalloc(dev, sizeof(*tbl), GFP_KERNEL);
+	if (!tbl)
+		return NULL;
+
+	tbl->ptr = dmam_alloc_coherent(dev, CTXDESC_CD_DWORDS << 3,
+				       &tbl->ptr_dma, GFP_KERNEL | __GFP_ZERO);
+	if (!tbl->ptr) {
+		dev_warn(dev, "failed to allocate context descriptor\n");
+		goto err_free_tbl;
+	}
+
+	tbl->pasid.ops = (struct iommu_pasid_table_ops) {
+		.alloc_priv_entry	= arm_smmu_alloc_priv_cd,
+		.alloc_shared_entry	= arm_smmu_alloc_shared_cd,
+		.free_entry		= arm_smmu_free_cd,
+		.set_entry		= arm_smmu_set_cd,
+		.clear_entry		= arm_smmu_clear_cd,
+	};
+
+	cfg->base		= tbl->ptr_dma;
+	cfg->arm_smmu.s1fmt	= ARM_SMMU_S1FMT_LINEAR;
+
+	return &tbl->pasid;
+
+err_free_tbl:
+	devm_kfree(dev, tbl);
+
+	return NULL;
+}
+
+static void arm_smmu_free_cd_tables(struct iommu_pasid_table *pasid_table)
+{
+	struct iommu_pasid_table_cfg *cfg = &pasid_table->cfg;
+	struct device *dev = cfg->iommu_dev;
+	struct arm_smmu_cd_tables *tbl = pasid_to_cd_tables(pasid_table);
+
+	dmam_free_coherent(dev, CTXDESC_CD_DWORDS << 3,
+			   tbl->ptr, tbl->ptr_dma);
+	devm_kfree(dev, tbl);
+}
+
+struct iommu_pasid_init_fns arm_smmu_v3_pasid_init_fns = {
+	.alloc	= arm_smmu_alloc_cd_tables,
+	.free	= arm_smmu_free_cd_tables,
+};
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index fb2507ffcdaf..b6d8c90fafb3 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -40,6 +40,7 @@
 #include <linux/amba/bus.h>
 
 #include "io-pgtable.h"
+#include "iommu-pasid.h"
 
 /* MMIO registers */
 #define ARM_SMMU_IDR0			0x0
@@ -281,60 +282,6 @@
 #define STRTAB_STE_3_S2TTB_SHIFT	4
 #define STRTAB_STE_3_S2TTB_MASK		0xfffffffffffUL
 
-/* Context descriptor (stage-1 only) */
-#define CTXDESC_CD_DWORDS		8
-#define CTXDESC_CD_0_TCR_T0SZ_SHIFT	0
-#define ARM64_TCR_T0SZ_SHIFT		0
-#define ARM64_TCR_T0SZ_MASK		0x1fUL
-#define CTXDESC_CD_0_TCR_TG0_SHIFT	6
-#define ARM64_TCR_TG0_SHIFT		14
-#define ARM64_TCR_TG0_MASK		0x3UL
-#define CTXDESC_CD_0_TCR_IRGN0_SHIFT	8
-#define ARM64_TCR_IRGN0_SHIFT		8
-#define ARM64_TCR_IRGN0_MASK		0x3UL
-#define CTXDESC_CD_0_TCR_ORGN0_SHIFT	10
-#define ARM64_TCR_ORGN0_SHIFT		10
-#define ARM64_TCR_ORGN0_MASK		0x3UL
-#define CTXDESC_CD_0_TCR_SH0_SHIFT	12
-#define ARM64_TCR_SH0_SHIFT		12
-#define ARM64_TCR_SH0_MASK		0x3UL
-#define CTXDESC_CD_0_TCR_EPD0_SHIFT	14
-#define ARM64_TCR_EPD0_SHIFT		7
-#define ARM64_TCR_EPD0_MASK		0x1UL
-#define CTXDESC_CD_0_TCR_EPD1_SHIFT	30
-#define ARM64_TCR_EPD1_SHIFT		23
-#define ARM64_TCR_EPD1_MASK		0x1UL
-
-#define CTXDESC_CD_0_ENDI		(1UL << 15)
-#define CTXDESC_CD_0_V			(1UL << 31)
-
-#define CTXDESC_CD_0_TCR_IPS_SHIFT	32
-#define ARM64_TCR_IPS_SHIFT		32
-#define ARM64_TCR_IPS_MASK		0x7UL
-#define CTXDESC_CD_0_TCR_TBI0_SHIFT	38
-#define ARM64_TCR_TBI0_SHIFT		37
-#define ARM64_TCR_TBI0_MASK		0x1UL
-
-#define CTXDESC_CD_0_AA64		(1UL << 41)
-#define CTXDESC_CD_0_S			(1UL << 44)
-#define CTXDESC_CD_0_R			(1UL << 45)
-#define CTXDESC_CD_0_A			(1UL << 46)
-#define CTXDESC_CD_0_ASET_SHIFT		47
-#define CTXDESC_CD_0_ASET_SHARED	(0UL << CTXDESC_CD_0_ASET_SHIFT)
-#define CTXDESC_CD_0_ASET_PRIVATE	(1UL << CTXDESC_CD_0_ASET_SHIFT)
-#define CTXDESC_CD_0_ASID_SHIFT		48
-#define CTXDESC_CD_0_ASID_MASK		0xffffUL
-
-#define CTXDESC_CD_1_TTB0_SHIFT		4
-#define CTXDESC_CD_1_TTB0_MASK		0xfffffffffffUL
-
-#define CTXDESC_CD_3_MAIR_SHIFT		0
-
-/* Convert between AArch64 (CPU) TCR format and SMMU CD format */
-#define ARM_SMMU_TCR2CD(tcr, fld)					\
-	(((tcr) >> ARM64_TCR_##fld##_SHIFT & ARM64_TCR_##fld##_MASK)	\
-	 << CTXDESC_CD_0_TCR_##fld##_SHIFT)
-
 /* Command queue */
 #define CMDQ_ENT_DWORDS			2
 #define CMDQ_MAX_SZ_SHIFT		8
@@ -353,6 +300,8 @@
 #define CMDQ_PREFETCH_1_SIZE_SHIFT	0
 #define CMDQ_PREFETCH_1_ADDR_MASK	~0xfffUL
 
+#define CMDQ_CFGI_0_SSID_SHIFT		12
+#define CMDQ_CFGI_0_SSID_MASK		0xfffffUL
 #define CMDQ_CFGI_0_SID_SHIFT		32
 #define CMDQ_CFGI_0_SID_MASK		0xffffffffUL
 #define CMDQ_CFGI_1_LEAF		(1UL << 0)
@@ -476,8 +425,11 @@ struct arm_smmu_cmdq_ent {
 
 		#define CMDQ_OP_CFGI_STE	0x3
 		#define CMDQ_OP_CFGI_ALL	0x4
+		#define CMDQ_OP_CFGI_CD		0x5
+		#define CMDQ_OP_CFGI_CD_ALL	0x6
 		struct {
 			u32			sid;
+			u32			ssid;
 			union {
 				bool		leaf;
 				u8		span;
@@ -552,15 +504,9 @@ struct arm_smmu_strtab_l1_desc {
 };
 
 struct arm_smmu_s1_cfg {
-	__le64				*cdptr;
-	dma_addr_t			cdptr_dma;
-
-	struct arm_smmu_ctx_desc {
-		u16	asid;
-		u64	ttbr;
-		u64	tcr;
-		u64	mair;
-	}				cd;
+	struct iommu_pasid_table_cfg	tables;
+	struct iommu_pasid_table_ops	*ops;
+	struct iommu_pasid_entry	*cd0; /* Default context */
 };
 
 struct arm_smmu_s2_cfg {
@@ -629,9 +575,7 @@ struct arm_smmu_device {
 	unsigned long			oas; /* PA */
 	unsigned long			pgsize_bitmap;
 
-#define ARM_SMMU_MAX_ASIDS		(1 << 16)
 	unsigned int			asid_bits;
-	DECLARE_BITMAP(asid_map, ARM_SMMU_MAX_ASIDS);
 
 #define ARM_SMMU_MAX_VMIDS		(1 << 16)
 	unsigned int			vmid_bits;
@@ -855,10 +799,16 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		cmd[1] |= ent->prefetch.size << CMDQ_PREFETCH_1_SIZE_SHIFT;
 		cmd[1] |= ent->prefetch.addr & CMDQ_PREFETCH_1_ADDR_MASK;
 		break;
+	case CMDQ_OP_CFGI_CD:
+		cmd[0] |= ent->cfgi.ssid << CMDQ_CFGI_0_SSID_SHIFT;
+		/* Fallthrough */
 	case CMDQ_OP_CFGI_STE:
 		cmd[0] |= (u64)ent->cfgi.sid << CMDQ_CFGI_0_SID_SHIFT;
 		cmd[1] |= ent->cfgi.leaf ? CMDQ_CFGI_1_LEAF : 0;
 		break;
+	case CMDQ_OP_CFGI_CD_ALL:
+		cmd[0] |= (u64)ent->cfgi.sid << CMDQ_CFGI_0_SID_SHIFT;
+		break;
 	case CMDQ_OP_CFGI_ALL:
 		/* Cover the entire SID range */
 		cmd[1] |= CMDQ_CFGI_1_RANGE_MASK << CMDQ_CFGI_1_RANGE_SHIFT;
@@ -1059,54 +1009,6 @@ static void arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
 		dev_err_ratelimited(smmu->dev, "CMD_SYNC timeout\n");
 }
 
-/* Context descriptor manipulation functions */
-static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
-{
-	u64 val = 0;
-
-	/* Repack the TCR. Just care about TTBR0 for now */
-	val |= ARM_SMMU_TCR2CD(tcr, T0SZ);
-	val |= ARM_SMMU_TCR2CD(tcr, TG0);
-	val |= ARM_SMMU_TCR2CD(tcr, IRGN0);
-	val |= ARM_SMMU_TCR2CD(tcr, ORGN0);
-	val |= ARM_SMMU_TCR2CD(tcr, SH0);
-	val |= ARM_SMMU_TCR2CD(tcr, EPD0);
-	val |= ARM_SMMU_TCR2CD(tcr, EPD1);
-	val |= ARM_SMMU_TCR2CD(tcr, IPS);
-	val |= ARM_SMMU_TCR2CD(tcr, TBI0);
-
-	return val;
-}
-
-static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
-				    struct arm_smmu_s1_cfg *cfg)
-{
-	u64 val;
-
-	/*
-	 * We don't need to issue any invalidation here, as we'll invalidate
-	 * the STE when installing the new entry anyway.
-	 */
-	val = arm_smmu_cpu_tcr_to_cd(cfg->cd.tcr) |
-#ifdef __BIG_ENDIAN
-	      CTXDESC_CD_0_ENDI |
-#endif
-	      CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET_PRIVATE |
-	      CTXDESC_CD_0_AA64 | (u64)cfg->cd.asid << CTXDESC_CD_0_ASID_SHIFT |
-	      CTXDESC_CD_0_V;
-
-	/* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
-	if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
-		val |= CTXDESC_CD_0_S;
-
-	cfg->cdptr[0] = cpu_to_le64(val);
-
-	val = cfg->cd.ttbr & CTXDESC_CD_1_TTB0_MASK << CTXDESC_CD_1_TTB0_SHIFT;
-	cfg->cdptr[1] = cpu_to_le64(val);
-
-	cfg->cdptr[3] = cpu_to_le64(cfg->cd.mair << CTXDESC_CD_3_MAIR_SHIFT);
-}
-
 /* Stream table manipulation functions */
 static void
 arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
@@ -1222,7 +1124,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
 			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 
-		val |= (ste->s1_cfg->cdptr_dma & STRTAB_STE_0_S1CTXPTR_MASK
+		val |= (ste->s1_cfg->tables.base & STRTAB_STE_0_S1CTXPTR_MASK
 		        << STRTAB_STE_0_S1CTXPTR_SHIFT) |
 			STRTAB_STE_0_CFG_S1_TRANS;
 	}
@@ -1466,8 +1368,10 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 	struct arm_smmu_cmdq_ent cmd;
 
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
+		if (unlikely(!smmu_domain->s1_cfg.cd0))
+			return;
 		cmd.opcode	= CMDQ_OP_TLBI_NH_ASID;
-		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
+		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd0->tag;
 		cmd.tlbi.vmid	= 0;
 	} else {
 		cmd.opcode	= CMDQ_OP_TLBI_S12_VMALL;
@@ -1491,8 +1395,10 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 	};
 
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
+		if (unlikely(!smmu_domain->s1_cfg.cd0))
+			return;
 		cmd.opcode	= CMDQ_OP_TLBI_NH_VA;
-		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
+		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd0->tag;
 	} else {
 		cmd.opcode	= CMDQ_OP_TLBI_S2_IPA;
 		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
@@ -1510,6 +1416,71 @@ static const struct iommu_gather_ops arm_smmu_gather_ops = {
 	.tlb_sync	= arm_smmu_tlb_sync,
 };
 
+/* PASID TABLE API */
+static void __arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
+			       struct arm_smmu_cmdq_ent *cmd)
+{
+	size_t i;
+	unsigned long flags;
+	struct arm_smmu_master_data *master;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_for_each_entry(master, &smmu_domain->devices, list) {
+		struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+
+		for (i = 0; i < fwspec->num_ids; i++) {
+			cmd->cfgi.sid = fwspec->ids[i];
+			arm_smmu_cmdq_issue_cmd(smmu, cmd);
+		}
+	}
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+
+	__arm_smmu_tlb_sync(smmu);
+}
+
+static void arm_smmu_sync_cd(void *cookie, int ssid, bool leaf)
+{
+	struct arm_smmu_cmdq_ent cmd = {
+		.opcode	= CMDQ_OP_CFGI_CD,
+		.cfgi	= {
+			.ssid	= ssid,
+			.leaf	= leaf,
+		},
+	};
+
+	__arm_smmu_sync_cd(cookie, &cmd);
+}
+
+static void arm_smmu_sync_cd_all(void *cookie)
+{
+	struct arm_smmu_cmdq_ent cmd = {
+		.opcode	= CMDQ_OP_CFGI_CD_ALL,
+	};
+
+	__arm_smmu_sync_cd(cookie, &cmd);
+}
+
+static void arm_smmu_tlb_inv_ssid(void *cookie, int ssid,
+				  struct iommu_pasid_entry *entry)
+{
+	struct arm_smmu_domain *smmu_domain = cookie;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct arm_smmu_cmdq_ent cmd = {
+		.opcode		= CMDQ_OP_TLBI_NH_ASID,
+		.tlbi.asid	= entry->tag,
+	};
+
+	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+	__arm_smmu_tlb_sync(smmu);
+}
+
+static struct iommu_pasid_sync_ops arm_smmu_ctx_sync = {
+	.cfg_flush	= arm_smmu_sync_cd,
+	.cfg_flush_all	= arm_smmu_sync_cd_all,
+	.tlb_flush	= arm_smmu_tlb_inv_ssid,
+};
+
 /* IOMMU API */
 static bool arm_smmu_capable(enum iommu_cap cap)
 {
@@ -1582,15 +1553,11 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 
 	/* Free the CD and ASID, if we allocated them */
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
-		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
-
-		if (cfg->cdptr) {
-			dmam_free_coherent(smmu_domain->smmu->dev,
-					   CTXDESC_CD_DWORDS << 3,
-					   cfg->cdptr,
-					   cfg->cdptr_dma);
+		struct iommu_pasid_table_ops *ops = smmu_domain->s1_cfg.ops;
 
-			arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
+		if (ops) {
+			ops->free_entry(ops, smmu_domain->s1_cfg.cd0);
+			iommu_free_pasid_ops(ops);
 		}
 	} else {
 		struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
@@ -1605,31 +1572,42 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 				       struct io_pgtable_cfg *pgtbl_cfg)
 {
 	int ret;
-	int asid;
-	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct iommu_pasid_entry *entry;
+	struct iommu_pasid_table_ops *ops;
 	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct iommu_pasid_table_cfg pasid_cfg = {
+		.iommu_dev		= smmu->dev,
+		.sync			= &arm_smmu_ctx_sync,
+		.arm_smmu = {
+			.stall		= !!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE),
+			.asid_bits	= smmu->asid_bits,
+		},
+	};
 
-	asid = arm_smmu_bitmap_alloc(smmu->asid_map, smmu->asid_bits);
-	if (asid < 0)
-		return asid;
+	ops = iommu_alloc_pasid_ops(PASID_TABLE_ARM_SMMU_V3, &pasid_cfg,
+				    smmu_domain);
+	if (!ops)
+		return -ENOMEM;
 
-	cfg->cdptr = dmam_alloc_coherent(smmu->dev, CTXDESC_CD_DWORDS << 3,
-					 &cfg->cdptr_dma,
-					 GFP_KERNEL | __GFP_ZERO);
-	if (!cfg->cdptr) {
-		dev_warn(smmu->dev, "failed to allocate context descriptor\n");
-		ret = -ENOMEM;
-		goto out_free_asid;
+	/* Create default entry */
+	entry = ops->alloc_priv_entry(ops, ARM_64_LPAE_S1, pgtbl_cfg);
+	if (IS_ERR(entry)) {
+		iommu_free_pasid_ops(ops);
+		return PTR_ERR(entry);
 	}
 
-	cfg->cd.asid	= (u16)asid;
-	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
-	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
-	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
-	return 0;
+	ret = ops->set_entry(ops, 0, entry);
+	if (ret) {
+		ops->free_entry(ops, entry);
+		iommu_free_pasid_ops(ops);
+		return ret;
+	}
+
+	cfg->tables	= pasid_cfg;
+	cfg->ops	= ops;
+	cfg->cd0	= entry;
 
-out_free_asid:
-	arm_smmu_bitmap_free(smmu->asid_map, asid);
 	return ret;
 }
 
@@ -1832,7 +1810,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		ste->s1_cfg = &smmu_domain->s1_cfg;
 		ste->s2_cfg = NULL;
-		arm_smmu_write_ctx_desc(smmu, ste->s1_cfg);
 	} else {
 		ste->s1_cfg = NULL;
 		ste->s2_cfg = &smmu_domain->s2_cfg;
diff --git a/drivers/iommu/iommu-pasid.c b/drivers/iommu/iommu-pasid.c
index 6b21d369d514..239b91e18543 100644
--- a/drivers/iommu/iommu-pasid.c
+++ b/drivers/iommu/iommu-pasid.c
@@ -13,6 +13,7 @@
 
 static const struct iommu_pasid_init_fns *
 pasid_table_init_fns[PASID_TABLE_NUM_FMTS] = {
+	[PASID_TABLE_ARM_SMMU_V3] = &arm_smmu_v3_pasid_init_fns,
 };
 
 struct iommu_pasid_table_ops *
diff --git a/drivers/iommu/iommu-pasid.h b/drivers/iommu/iommu-pasid.h
index 40a27d35c1e0..77e449a1655b 100644
--- a/drivers/iommu/iommu-pasid.h
+++ b/drivers/iommu/iommu-pasid.h
@@ -15,6 +15,7 @@
 struct mm_struct;
 
 enum iommu_pasid_table_fmt {
+	PASID_TABLE_ARM_SMMU_V3,
 	PASID_TABLE_NUM_FMTS,
 };
 
@@ -73,6 +74,25 @@ struct iommu_pasid_sync_ops {
 			  struct iommu_pasid_entry *entry);
 };
 
+/**
+ * arm_smmu_context_cfg - PASID table configuration for ARM SMMU v3
+ *
+ * SMMU properties:
+ * @stall:	devices attached to the domain are allowed to stall.
+ * @asid_bits:	number of ASID bits supported by the SMMU
+ *
+ * @s1fmt:	PASID table format, chosen by the allocator.
+ */
+struct arm_smmu_context_cfg {
+	u8				stall:1;
+	u8				asid_bits;
+
+#define ARM_SMMU_S1FMT_LINEAR		0x0
+#define ARM_SMMU_S1FMT_4K_L2		0x1
+#define ARM_SMMU_S1FMT_64K_L2		0x2
+	u8				s1fmt;
+};
+
 /**
  * struct iommu_pasid_table_cfg - Configuration data for a set of PASID tables.
  *
@@ -88,6 +108,11 @@ struct iommu_pasid_table_cfg {
 	const struct iommu_pasid_sync_ops *sync;
 
 	dma_addr_t			base;
+
+	/* Low-level data specific to the IOMMU */
+	union {
+		struct arm_smmu_context_cfg arm_smmu;
+	};
 };
 
 struct iommu_pasid_table_ops *
@@ -139,4 +164,6 @@ static inline void iommu_pasid_flush_tlbs(struct iommu_pasid_table *table,
 	table->cfg.sync->tlb_flush(table->cookie, pasid, entry);
 }
 
+extern struct iommu_pasid_init_fns arm_smmu_v3_pasid_init_fns;
+
 #endif /* __IOMMU_PASID_H */
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 317+ messages in thread


* [PATCH 18/37] iommu/arm-smmu-v3: Add support for Substream IDs
  2018-02-12 18:33 ` Jean-Philippe Brucker
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

At the moment, the SMMUv3 driver offers only one stage-1 or stage-2
address space to each device. SMMUv3 allows multiple address spaces to
be associated with each device. In addition to the Stream ID (SID),
which identifies a device, we can now have Substream IDs (SSIDs)
identifying an address space. In PCIe lingo, the SID is called Requester
ID (RID) and the SSID is called Process Address Space ID (PASID).

Prepare the driver for SSID support by adding context descriptor tables
behind STEs (previously a single static context descriptor). A complete
stage-1 walk is now performed by the SMMU as follows:

      Stream tables          Ctx. tables          Page tables
        +--------+   ,------->+-------+   ,------->+-------+
        :        :   |        :       :   |        :       :
        +--------+   |        +-------+   |        +-------+
   SID->|  STE   |---'  SSID->|  CD   |---'  IOVA->|  PTE  |--> IPA
        +--------+            +-------+            +-------+
        :        :            :       :            :       :
        +--------+            +-------+            +-------+

We only implement one level of context descriptor table for now, but as
with stream and page tables, an SSID can be split to target multiple
levels of tables.
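
For reference, a minimal sketch of how an SSID selects a context
descriptor in the single-level (linear) format used here, and of how a
hypothetical two-level split could divide the SSID. The example_* names
are illustrative only, not the driver's API:

	#include <linux/types.h>

	/* One context descriptor is 8 64-bit words (64 bytes) */
	#define EXAMPLE_CD_DWORDS	8

	/* Linear format: the SSID directly indexes an array of CDs */
	static __le64 *example_get_cd_ptr(__le64 *table_base, u32 ssid)
	{
		return table_base + ssid * EXAMPLE_CD_DWORDS;
	}

	/*
	 * Hypothetical two-level split (not implemented by this patch):
	 * upper SSID bits index a first-level table, lower bits index a
	 * leaf table of CDs.
	 */
	#define EXAMPLE_SPLIT		6	/* 64 CDs per leaf table */

	static void example_split_ssid(u32 ssid, u32 *l1_idx, u32 *l2_idx)
	{
		*l1_idx = ssid >> EXAMPLE_SPLIT;
		*l2_idx = ssid & ((1 << EXAMPLE_SPLIT) - 1);
	}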

In all stream table entries, we set S1DSS=SSID0 mode, so that translations
without an SSID use context descriptor 0.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3-context.c | 132 ++++++++++++++++++++++++++----------
 drivers/iommu/arm-smmu-v3.c         |  33 +++++++--
 2 files changed, 126 insertions(+), 39 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
index e910cb356f45..3b0bb9475dea 100644
--- a/drivers/iommu/arm-smmu-v3-context.c
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -79,11 +79,14 @@ struct arm_smmu_cd {
 #define pasid_entry_to_cd(entry) \
 	container_of((entry), struct arm_smmu_cd, entry)
 
+struct arm_smmu_cd_table {
+	__le64				*ptr;
+	dma_addr_t			ptr_dma;
+};
+
 struct arm_smmu_cd_tables {
 	struct iommu_pasid_table	pasid;
-
-	void				*ptr;
-	dma_addr_t			ptr_dma;
+	struct arm_smmu_cd_table	table;
 };
 
 #define pasid_to_cd_tables(pasid_table) \
@@ -94,6 +97,36 @@ struct arm_smmu_cd_tables {
 
 static DEFINE_IDA(asid_ida);
 
+static int arm_smmu_alloc_cd_leaf_table(struct device *dev,
+					struct arm_smmu_cd_table *desc,
+					size_t num_entries)
+{
+	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
+
+	desc->ptr = dmam_alloc_coherent(dev, size, &desc->ptr_dma,
+					GFP_ATOMIC | __GFP_ZERO);
+	if (!desc->ptr) {
+		dev_warn(dev, "failed to allocate context descriptor table\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void arm_smmu_free_cd_leaf_table(struct device *dev,
+					struct arm_smmu_cd_table *desc,
+					size_t num_entries)
+{
+	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
+
+	dmam_free_coherent(dev, size, desc->ptr, desc->ptr_dma);
+}
+
+static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_cd_tables *tbl, u32 ssid)
+{
+	return tbl->table.ptr + ssid * CTXDESC_CD_DWORDS;
+}
+
 static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
 {
 	u64 val = 0;
@@ -116,33 +149,72 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
 				    struct arm_smmu_cd *cd)
 {
 	u64 val;
-	__u64 *cdptr = tbl->ptr;
+	bool cd_live;
 	struct arm_smmu_context_cfg *cfg = &tbl->pasid.cfg.arm_smmu;
-
-	if (!cd || WARN_ON(ssid))
-		return -EINVAL;
+	__le64 *cdptr = arm_smmu_get_cd_ptr(tbl, ssid);
 
 	/*
-	 * We don't need to issue any invalidation here, as we'll invalidate
-	 * the STE when installing the new entry anyway.
+	 * This function handles the following cases:
+	 *
+	 * (1) Install primary CD, for normal DMA traffic (SSID = 0).
+	 * (2) Install a secondary CD, for SID+SSID traffic, followed by an
+	 *     invalidation.
+	 * (3) Update ASID of primary CD. This is allowed by atomically writing
+	 *     the first 64 bits of the CD, followed by invalidation of the old
+	 *     entry and mappings.
+	 * (4) Remove a secondary CD and invalidate it.
 	 */
-	val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
+
+	if (!cdptr)
+		return -ENOMEM;
+
+	val = le64_to_cpu(cdptr[0]);
+	cd_live = !!(val & CTXDESC_CD_0_V);
+
+	if (!cd) { /* (4) */
+		cdptr[0] = 0;
+	} else if (cd_live) { /* (3) */
+		val &= ~(CTXDESC_CD_0_ASID_MASK << CTXDESC_CD_0_ASID_SHIFT);
+		val |= (cd->entry.tag & CTXDESC_CD_0_ASID_MASK)
+			<< CTXDESC_CD_0_ASID_SHIFT;
+
+		cdptr[0] = cpu_to_le64(val);
+		/*
+		 * Until CD+TLB invalidation, both ASIDs may be used for tagging
+		 * this substream's traffic
+		 */
+	} else { /* (1) and (2) */
+		cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK
+				       << CTXDESC_CD_1_TTB0_SHIFT);
+		cdptr[2] = 0;
+		cdptr[3] = cpu_to_le64(cd->mair << CTXDESC_CD_3_MAIR_SHIFT);
+
+		/*
+		 * STE is live, and the SMMU might fetch this CD at any
+		 * time. Ensure it observes the rest of the CD before we
+		 * enable it.
+		 */
+		iommu_pasid_flush(&tbl->pasid, ssid, true);
+
+
+		val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
 #ifdef __BIG_ENDIAN
-	      CTXDESC_CD_0_ENDI |
+		      CTXDESC_CD_0_ENDI |
 #endif
-	      CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET_PRIVATE |
-	      CTXDESC_CD_0_AA64 | cd->entry.tag << CTXDESC_CD_0_ASID_SHIFT |
-	      CTXDESC_CD_0_V;
+		      CTXDESC_CD_0_R | CTXDESC_CD_0_A |
+		      CTXDESC_CD_0_ASET_PRIVATE |
+		      CTXDESC_CD_0_AA64 |
+		      (cd->entry.tag & CTXDESC_CD_0_ASID_MASK)
+		      << CTXDESC_CD_0_ASID_SHIFT |
+		      CTXDESC_CD_0_V;
 
-	if (cfg->stall)
-		val |= CTXDESC_CD_0_S;
+		if (cfg->stall)
+			val |= CTXDESC_CD_0_S;
 
-	cdptr[0] = cpu_to_le64(val);
-
-	val = cd->ttbr & CTXDESC_CD_1_TTB0_MASK << CTXDESC_CD_1_TTB0_SHIFT;
-	cdptr[1] = cpu_to_le64(val);
+		cdptr[0] = cpu_to_le64(val);
+	}
 
-	cdptr[3] = cpu_to_le64(cd->mair << CTXDESC_CD_3_MAIR_SHIFT);
+	iommu_pasid_flush(&tbl->pasid, ssid, true);
 
 	return 0;
 }
@@ -234,24 +306,17 @@ static void arm_smmu_clear_cd(struct iommu_pasid_table_ops *ops, int pasid,
 static struct iommu_pasid_table *
 arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
 {
+	int ret;
 	struct arm_smmu_cd_tables *tbl;
 	struct device *dev = cfg->iommu_dev;
 
-	if (cfg->order) {
-		/* TODO: support SSID */
-		return NULL;
-	}
-
 	tbl = devm_kzalloc(dev, sizeof(*tbl), GFP_KERNEL);
 	if (!tbl)
 		return NULL;
 
-	tbl->ptr = dmam_alloc_coherent(dev, CTXDESC_CD_DWORDS << 3,
-				       &tbl->ptr_dma, GFP_KERNEL | __GFP_ZERO);
-	if (!tbl->ptr) {
-		dev_warn(dev, "failed to allocate context descriptor\n");
+	ret = arm_smmu_alloc_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
+	if (ret)
 		goto err_free_tbl;
-	}
 
 	tbl->pasid.ops = (struct iommu_pasid_table_ops) {
 		.alloc_priv_entry	= arm_smmu_alloc_priv_cd,
@@ -261,7 +326,7 @@ arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
 		.clear_entry		= arm_smmu_clear_cd,
 	};
 
-	cfg->base		= tbl->ptr_dma;
+	cfg->base		= tbl->table.ptr_dma;
 	cfg->arm_smmu.s1fmt	= ARM_SMMU_S1FMT_LINEAR;
 
 	return &tbl->pasid;
@@ -278,8 +343,7 @@ static void arm_smmu_free_cd_tables(struct iommu_pasid_table *pasid_table)
 	struct device *dev = cfg->iommu_dev;
 	struct arm_smmu_cd_tables *tbl = pasid_to_cd_tables(pasid_table);
 
-	dmam_free_coherent(dev, CTXDESC_CD_DWORDS << 3,
-			   tbl->ptr, tbl->ptr_dma);
+	arm_smmu_free_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
 	devm_kfree(dev, tbl);
 }
 
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index b6d8c90fafb3..a307c6885dc0 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -239,12 +239,19 @@
 #define STRTAB_STE_0_CFG_S2_TRANS	(6UL << STRTAB_STE_0_CFG_SHIFT)
 
 #define STRTAB_STE_0_S1FMT_SHIFT	4
-#define STRTAB_STE_0_S1FMT_LINEAR	(0UL << STRTAB_STE_0_S1FMT_SHIFT)
+#define STRTAB_STE_0_S1FMT_MASK		0x3UL
 #define STRTAB_STE_0_S1CTXPTR_SHIFT	6
 #define STRTAB_STE_0_S1CTXPTR_MASK	0x3ffffffffffUL
 #define STRTAB_STE_0_S1CDMAX_SHIFT	59
 #define STRTAB_STE_0_S1CDMAX_MASK	0x1fUL
 
+#define STRTAB_STE_1_S1DSS_SHIFT	0
+#define STRTAB_STE_1_S1DSS_MASK		0x3UL
+#define STRTAB_STE_1_S1DSS_TERMINATE	(0x0 << STRTAB_STE_1_S1DSS_SHIFT)
+#define STRTAB_STE_1_S1DSS_BYPASS	(0x1 << STRTAB_STE_1_S1DSS_SHIFT)
+#define STRTAB_STE_1_S1DSS_SSID0	(0x2 << STRTAB_STE_1_S1DSS_SHIFT)
+
+
 #define STRTAB_STE_1_S1C_CACHE_NC	0UL
 #define STRTAB_STE_1_S1C_CACHE_WBRA	1UL
 #define STRTAB_STE_1_S1C_CACHE_WT	2UL
@@ -601,6 +608,8 @@ struct arm_smmu_master_data {
 	struct list_head		list; /* domain->devices */
 
 	struct device			*dev;
+
+	size_t				ssid_bits;
 };
 
 /* SMMU private data for an IOMMU domain */
@@ -1108,8 +1117,11 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 	}
 
 	if (ste->s1_cfg) {
+		struct iommu_pasid_table_cfg *cfg = &ste->s1_cfg->tables;
+
 		BUG_ON(ste_live);
 		dst[1] = cpu_to_le64(
+			 STRTAB_STE_1_S1DSS_SSID0 |
 			 STRTAB_STE_1_S1C_CACHE_WBRA
 			 << STRTAB_STE_1_S1CIR_SHIFT |
 			 STRTAB_STE_1_S1C_CACHE_WBRA
@@ -1124,8 +1136,12 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
 			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 
-		val |= (ste->s1_cfg->tables.base & STRTAB_STE_0_S1CTXPTR_MASK
+		val |= (cfg->base & STRTAB_STE_0_S1CTXPTR_MASK
 		        << STRTAB_STE_0_S1CTXPTR_SHIFT) |
+			(u64)(cfg->order & STRTAB_STE_0_S1CDMAX_MASK)
+			<< STRTAB_STE_0_S1CDMAX_SHIFT |
+			(cfg->arm_smmu.s1fmt & STRTAB_STE_0_S1FMT_MASK)
+			<< STRTAB_STE_0_S1FMT_SHIFT |
 			STRTAB_STE_0_CFG_S1_TRANS;
 	}
 
@@ -1569,6 +1585,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 }
 
 static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
+				       struct arm_smmu_master_data *master,
 				       struct io_pgtable_cfg *pgtbl_cfg)
 {
 	int ret;
@@ -1578,6 +1595,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct iommu_pasid_table_cfg pasid_cfg = {
 		.iommu_dev		= smmu->dev,
+		.order			= master->ssid_bits,
 		.sync			= &arm_smmu_ctx_sync,
 		.arm_smmu = {
 			.stall		= !!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE),
@@ -1612,6 +1630,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 }
 
 static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
+				       struct arm_smmu_master_data *master,
 				       struct io_pgtable_cfg *pgtbl_cfg)
 {
 	int vmid;
@@ -1628,7 +1647,8 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
 	return 0;
 }
 
-static int arm_smmu_domain_finalise(struct iommu_domain *domain)
+static int arm_smmu_domain_finalise(struct iommu_domain *domain,
+				    struct arm_smmu_master_data *master)
 {
 	int ret;
 	unsigned long ias, oas;
@@ -1636,6 +1656,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
 	struct io_pgtable_cfg pgtbl_cfg;
 	struct io_pgtable_ops *pgtbl_ops;
 	int (*finalise_stage_fn)(struct arm_smmu_domain *,
+				 struct arm_smmu_master_data *,
 				 struct io_pgtable_cfg *);
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
@@ -1688,7 +1709,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
 	domain->geometry.aperture_end = (1UL << ias) - 1;
 	domain->geometry.force_aperture = true;
 
-	ret = finalise_stage_fn(smmu_domain, &pgtbl_cfg);
+	ret = finalise_stage_fn(smmu_domain, master, &pgtbl_cfg);
 	if (ret < 0) {
 		free_io_pgtable_ops(pgtbl_ops);
 		return ret;
@@ -1783,7 +1804,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 
 	if (!smmu_domain->smmu) {
 		smmu_domain->smmu = smmu;
-		ret = arm_smmu_domain_finalise(domain);
+		ret = arm_smmu_domain_finalise(domain, master);
 		if (ret) {
 			smmu_domain->smmu = NULL;
 			goto out_unlock;
@@ -1939,6 +1960,8 @@ static int arm_smmu_add_device(struct device *dev)
 		}
 	}
 
+	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
+
 	group = iommu_group_get_for_dev(dev);
 	if (!IS_ERR(group)) {
 		iommu_group_put(group);
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 18/37] iommu/arm-smmu-v3: Add support for Substream IDs
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel

At the moment, the SMMUv3 driver offers only one stage-1 or stage-2
address space to each device. SMMUv3 allows multiple address spaces to be
associated with a device. In addition to the Stream ID (SID), which
identifies a device, we can now have Substream IDs (SSID) identifying an
address space. In PCIe lingo, the SID is called Requester ID (RID) and
the SSID is called Process Address-Space ID (PASID).

Prepare the driver for SSID support by adding context descriptor tables
referenced from STEs (previously each STE pointed to a single static
context descriptor). The SMMU now performs a complete stage-1 walk like
this:

      Stream tables          Ctx. tables          Page tables
        +--------+   ,------->+-------+   ,------->+-------+
        :        :   |        :       :   |        :       :
        +--------+   |        +-------+   |        +-------+
   SID->|  STE   |---'  SSID->|  CD   |---'  IOVA->|  PTE  |--> IPA
        +--------+            +-------+            +-------+
        :        :            :       :            :       :
        +--------+            +-------+            +-------+

We only implement one level of context descriptor tables for now, but as
with stream and page tables, an SSID can be split to index multiple
levels of tables.

In all stream table entries, we set the S1DSS=SSID0 mode, so that
translations without an SSID use context descriptor 0.

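As a rough illustration of the walk above, here is a self-contained toy
model of a linear CD lookup. The structures and names below are invented
for the example (only the 8-dword STE/CD sizes match the driver's
macros); it is a sketch, not the driver's code:

	#include <stdint.h>
	#include <stddef.h>

	#define STRTAB_STE_DWORDS	8	/* 64-byte stream table entry */
	#define CTXDESC_CD_DWORDS	8	/* 64-byte context descriptor */

	struct toy_smmu {
		uint64_t *strtab;		/* stream table, indexed by SID */
		uint64_t *cd_tables[256];	/* one linear CD table per stream */
	};

	/*
	 * Return the context descriptor for (sid, ssid). Transactions without
	 * an SSID use entry 0, mirroring the S1DSS=SSID0 setting in the STE.
	 */
	static uint64_t *toy_get_cd(struct toy_smmu *smmu, uint32_t sid,
				    uint32_t ssid)
	{
		uint64_t *ste = &smmu->strtab[sid * STRTAB_STE_DWORDS];

		if (!(ste[0] & 1))		/* STE.V clear: not configured */
			return NULL;

		return &smmu->cd_tables[sid][ssid * CTXDESC_CD_DWORDS];
	}
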
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3-context.c | 132 ++++++++++++++++++++++++++----------
 drivers/iommu/arm-smmu-v3.c         |  33 +++++++--
 2 files changed, 126 insertions(+), 39 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
index e910cb356f45..3b0bb9475dea 100644
--- a/drivers/iommu/arm-smmu-v3-context.c
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -79,11 +79,14 @@ struct arm_smmu_cd {
 #define pasid_entry_to_cd(entry) \
 	container_of((entry), struct arm_smmu_cd, entry)
 
+struct arm_smmu_cd_table {
+	__le64				*ptr;
+	dma_addr_t			ptr_dma;
+};
+
 struct arm_smmu_cd_tables {
 	struct iommu_pasid_table	pasid;
-
-	void				*ptr;
-	dma_addr_t			ptr_dma;
+	struct arm_smmu_cd_table	table;
 };
 
 #define pasid_to_cd_tables(pasid_table) \
@@ -94,6 +97,36 @@ struct arm_smmu_cd_tables {
 
 static DEFINE_IDA(asid_ida);
 
+static int arm_smmu_alloc_cd_leaf_table(struct device *dev,
+					struct arm_smmu_cd_table *desc,
+					size_t num_entries)
+{
+	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
+
+	desc->ptr = dmam_alloc_coherent(dev, size, &desc->ptr_dma,
+					GFP_ATOMIC | __GFP_ZERO);
+	if (!desc->ptr) {
+		dev_warn(dev, "failed to allocate context descriptor table\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void arm_smmu_free_cd_leaf_table(struct device *dev,
+					struct arm_smmu_cd_table *desc,
+					size_t num_entries)
+{
+	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
+
+	dmam_free_coherent(dev, size, desc->ptr, desc->ptr_dma);
+}
+
+static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_cd_tables *tbl, u32 ssid)
+{
+	return tbl->table.ptr + ssid * CTXDESC_CD_DWORDS;
+}
+
 static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
 {
 	u64 val = 0;
@@ -116,33 +149,72 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
 				    struct arm_smmu_cd *cd)
 {
 	u64 val;
-	__u64 *cdptr = tbl->ptr;
+	bool cd_live;
 	struct arm_smmu_context_cfg *cfg = &tbl->pasid.cfg.arm_smmu;
-
-	if (!cd || WARN_ON(ssid))
-		return -EINVAL;
+	__le64 *cdptr = arm_smmu_get_cd_ptr(tbl, ssid);
 
 	/*
-	 * We don't need to issue any invalidation here, as we'll invalidate
-	 * the STE when installing the new entry anyway.
+	 * This function handles the following cases:
+	 *
+	 * (1) Install primary CD, for normal DMA traffic (SSID = 0).
+	 * (2) Install a secondary CD, for SID+SSID traffic, followed by an
+	 *     invalidation.
+	 * (3) Update ASID of primary CD. This is allowed by atomically writing
+	 *     the first 64 bits of the CD, followed by invalidation of the old
+	 *     entry and mappings.
+	 * (4) Remove a secondary CD and invalidate it.
 	 */
-	val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
+
+	if (!cdptr)
+		return -ENOMEM;
+
+	val = le64_to_cpu(cdptr[0]);
+	cd_live = !!(val & CTXDESC_CD_0_V);
+
+	if (!cd) { /* (4) */
+		cdptr[0] = 0;
+	} else if (cd_live) { /* (3) */
+		val &= ~(CTXDESC_CD_0_ASID_MASK << CTXDESC_CD_0_ASID_SHIFT);
+		val |= (cd->entry.tag & CTXDESC_CD_0_ASID_MASK)
+			<< CTXDESC_CD_0_ASID_SHIFT;
+
+		cdptr[0] = cpu_to_le64(val);
+		/*
+		 * Until CD+TLB invalidation, both ASIDs may be used for tagging
+		 * this substream's traffic
+		 */
+	} else { /* (1) and (2) */
+		cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK
+				       << CTXDESC_CD_1_TTB0_SHIFT);
+		cdptr[2] = 0;
+		cdptr[3] = cpu_to_le64(cd->mair << CTXDESC_CD_3_MAIR_SHIFT);
+
+		/*
+		 * STE is live, and the SMMU might fetch this CD at any
+		 * time. Ensure it observes the rest of the CD before we
+		 * enable it.
+		 */
+		iommu_pasid_flush(&tbl->pasid, ssid, true);
+
+
+		val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
 #ifdef __BIG_ENDIAN
-	      CTXDESC_CD_0_ENDI |
+		      CTXDESC_CD_0_ENDI |
 #endif
-	      CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET_PRIVATE |
-	      CTXDESC_CD_0_AA64 | cd->entry.tag << CTXDESC_CD_0_ASID_SHIFT |
-	      CTXDESC_CD_0_V;
+		      CTXDESC_CD_0_R | CTXDESC_CD_0_A |
+		      CTXDESC_CD_0_ASET_PRIVATE |
+		      CTXDESC_CD_0_AA64 |
+		      (cd->entry.tag & CTXDESC_CD_0_ASID_MASK)
+		      << CTXDESC_CD_0_ASID_SHIFT |
+		      CTXDESC_CD_0_V;
 
-	if (cfg->stall)
-		val |= CTXDESC_CD_0_S;
+		if (cfg->stall)
+			val |= CTXDESC_CD_0_S;
 
-	cdptr[0] = cpu_to_le64(val);
-
-	val = cd->ttbr & CTXDESC_CD_1_TTB0_MASK << CTXDESC_CD_1_TTB0_SHIFT;
-	cdptr[1] = cpu_to_le64(val);
+		cdptr[0] = cpu_to_le64(val);
+	}
 
-	cdptr[3] = cpu_to_le64(cd->mair << CTXDESC_CD_3_MAIR_SHIFT);
+	iommu_pasid_flush(&tbl->pasid, ssid, true);
 
 	return 0;
 }
@@ -234,24 +306,17 @@ static void arm_smmu_clear_cd(struct iommu_pasid_table_ops *ops, int pasid,
 static struct iommu_pasid_table *
 arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
 {
+	int ret;
 	struct arm_smmu_cd_tables *tbl;
 	struct device *dev = cfg->iommu_dev;
 
-	if (cfg->order) {
-		/* TODO: support SSID */
-		return NULL;
-	}
-
 	tbl = devm_kzalloc(dev, sizeof(*tbl), GFP_KERNEL);
 	if (!tbl)
 		return NULL;
 
-	tbl->ptr = dmam_alloc_coherent(dev, CTXDESC_CD_DWORDS << 3,
-				       &tbl->ptr_dma, GFP_KERNEL | __GFP_ZERO);
-	if (!tbl->ptr) {
-		dev_warn(dev, "failed to allocate context descriptor\n");
+	ret = arm_smmu_alloc_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
+	if (ret)
 		goto err_free_tbl;
-	}
 
 	tbl->pasid.ops = (struct iommu_pasid_table_ops) {
 		.alloc_priv_entry	= arm_smmu_alloc_priv_cd,
@@ -261,7 +326,7 @@ arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
 		.clear_entry		= arm_smmu_clear_cd,
 	};
 
-	cfg->base		= tbl->ptr_dma;
+	cfg->base		= tbl->table.ptr_dma;
 	cfg->arm_smmu.s1fmt	= ARM_SMMU_S1FMT_LINEAR;
 
 	return &tbl->pasid;
@@ -278,8 +343,7 @@ static void arm_smmu_free_cd_tables(struct iommu_pasid_table *pasid_table)
 	struct device *dev = cfg->iommu_dev;
 	struct arm_smmu_cd_tables *tbl = pasid_to_cd_tables(pasid_table);
 
-	dmam_free_coherent(dev, CTXDESC_CD_DWORDS << 3,
-			   tbl->ptr, tbl->ptr_dma);
+	arm_smmu_free_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
 	devm_kfree(dev, tbl);
 }
 
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index b6d8c90fafb3..a307c6885dc0 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -239,12 +239,19 @@
 #define STRTAB_STE_0_CFG_S2_TRANS	(6UL << STRTAB_STE_0_CFG_SHIFT)
 
 #define STRTAB_STE_0_S1FMT_SHIFT	4
-#define STRTAB_STE_0_S1FMT_LINEAR	(0UL << STRTAB_STE_0_S1FMT_SHIFT)
+#define STRTAB_STE_0_S1FMT_MASK		0x3UL
 #define STRTAB_STE_0_S1CTXPTR_SHIFT	6
 #define STRTAB_STE_0_S1CTXPTR_MASK	0x3ffffffffffUL
 #define STRTAB_STE_0_S1CDMAX_SHIFT	59
 #define STRTAB_STE_0_S1CDMAX_MASK	0x1fUL
 
+#define STRTAB_STE_1_S1DSS_SHIFT	0
+#define STRTAB_STE_1_S1DSS_MASK		0x3UL
+#define STRTAB_STE_1_S1DSS_TERMINATE	(0x0 << STRTAB_STE_1_S1DSS_SHIFT)
+#define STRTAB_STE_1_S1DSS_BYPASS	(0x1 << STRTAB_STE_1_S1DSS_SHIFT)
+#define STRTAB_STE_1_S1DSS_SSID0	(0x2 << STRTAB_STE_1_S1DSS_SHIFT)
+
+
 #define STRTAB_STE_1_S1C_CACHE_NC	0UL
 #define STRTAB_STE_1_S1C_CACHE_WBRA	1UL
 #define STRTAB_STE_1_S1C_CACHE_WT	2UL
@@ -601,6 +608,8 @@ struct arm_smmu_master_data {
 	struct list_head		list; /* domain->devices */
 
 	struct device			*dev;
+
+	size_t				ssid_bits;
 };
 
 /* SMMU private data for an IOMMU domain */
@@ -1108,8 +1117,11 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 	}
 
 	if (ste->s1_cfg) {
+		struct iommu_pasid_table_cfg *cfg = &ste->s1_cfg->tables;
+
 		BUG_ON(ste_live);
 		dst[1] = cpu_to_le64(
+			 STRTAB_STE_1_S1DSS_SSID0 |
 			 STRTAB_STE_1_S1C_CACHE_WBRA
 			 << STRTAB_STE_1_S1CIR_SHIFT |
 			 STRTAB_STE_1_S1C_CACHE_WBRA
@@ -1124,8 +1136,12 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
 			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 
-		val |= (ste->s1_cfg->tables.base & STRTAB_STE_0_S1CTXPTR_MASK
+		val |= (cfg->base & STRTAB_STE_0_S1CTXPTR_MASK
 		        << STRTAB_STE_0_S1CTXPTR_SHIFT) |
+			(u64)(cfg->order & STRTAB_STE_0_S1CDMAX_MASK)
+			<< STRTAB_STE_0_S1CDMAX_SHIFT |
+			(cfg->arm_smmu.s1fmt & STRTAB_STE_0_S1FMT_MASK)
+			<< STRTAB_STE_0_S1FMT_SHIFT |
 			STRTAB_STE_0_CFG_S1_TRANS;
 	}
 
@@ -1569,6 +1585,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 }
 
 static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
+				       struct arm_smmu_master_data *master,
 				       struct io_pgtable_cfg *pgtbl_cfg)
 {
 	int ret;
@@ -1578,6 +1595,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct iommu_pasid_table_cfg pasid_cfg = {
 		.iommu_dev		= smmu->dev,
+		.order			= master->ssid_bits,
 		.sync			= &arm_smmu_ctx_sync,
 		.arm_smmu = {
 			.stall		= !!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE),
@@ -1612,6 +1630,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 }
 
 static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
+				       struct arm_smmu_master_data *master,
 				       struct io_pgtable_cfg *pgtbl_cfg)
 {
 	int vmid;
@@ -1628,7 +1647,8 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
 	return 0;
 }
 
-static int arm_smmu_domain_finalise(struct iommu_domain *domain)
+static int arm_smmu_domain_finalise(struct iommu_domain *domain,
+				    struct arm_smmu_master_data *master)
 {
 	int ret;
 	unsigned long ias, oas;
@@ -1636,6 +1656,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
 	struct io_pgtable_cfg pgtbl_cfg;
 	struct io_pgtable_ops *pgtbl_ops;
 	int (*finalise_stage_fn)(struct arm_smmu_domain *,
+				 struct arm_smmu_master_data *,
 				 struct io_pgtable_cfg *);
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
@@ -1688,7 +1709,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
 	domain->geometry.aperture_end = (1UL << ias) - 1;
 	domain->geometry.force_aperture = true;
 
-	ret = finalise_stage_fn(smmu_domain, &pgtbl_cfg);
+	ret = finalise_stage_fn(smmu_domain, master, &pgtbl_cfg);
 	if (ret < 0) {
 		free_io_pgtable_ops(pgtbl_ops);
 		return ret;
@@ -1783,7 +1804,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 
 	if (!smmu_domain->smmu) {
 		smmu_domain->smmu = smmu;
-		ret = arm_smmu_domain_finalise(domain);
+		ret = arm_smmu_domain_finalise(domain, master);
 		if (ret) {
 			smmu_domain->smmu = NULL;
 			goto out_unlock;
@@ -1939,6 +1960,8 @@ static int arm_smmu_add_device(struct device *dev)
 		}
 	}
 
+	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
+
 	group = iommu_group_get_for_dev(dev);
 	if (!IS_ERR(group)) {
 		iommu_group_put(group);
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 19/37] iommu/arm-smmu-v3: Add second level of context descriptor table
  2018-02-12 18:33 ` Jean-Philippe Brucker
  (?)
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: joro-zLv9SwRftAIdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	mark.rutland-5wv7dgnIgG8, catalin.marinas-5wv7dgnIgG8,
	will.deacon-5wv7dgnIgG8, lorenzo.pieralisi-5wv7dgnIgG8,
	hanjun.guo-QSEj5FYQhm4dnm+yROfE0A, sudeep.holla-5wv7dgnIgG8,
	rjw-LthD3rsA81gm4RdzfppkhA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	robin.murphy-5wv7dgnIgG8, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	tn-nYOzD4b6Jr9Wk0Htik3J/w, liubo95-hv44wF8Li93QT0dZR+AlfA,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	jonathan.cameron-hv44wF8Li93QT0dZR+AlfA,
	shunyong.yang-PT9Dzx9SjPiXmMXjJBpWqg,
	nwatters-sgV2jX0FEOL9JmXXK+q4OQ, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	jcrouse-sgV2jX0FEOL9JmXXK+q4OQ, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA,
	yi.l.liu-ral2JQCrhuEAvxtiuMwx3w,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	robdclark-Re5JQEeQqe8AvxtiuMwx3w, christian.koenig-5C7GfCeVMHo,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA

The SMMU can support up to 20 bits of SSID. Add a second level of
context descriptor tables to accommodate this. Devices that support more
than 1024 SSIDs now have a table of 1024 L1 entries (8kB), each pointing
to a table of 1024 context descriptors (64kB), allocated on demand.

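The index split and table sizes can be checked with a small standalone
sketch (the constants mirror the patch; the program itself is just an
illustration):

	#include <stdint.h>
	#include <stdio.h>

	#define CTXDESC_SPLIT		10	/* SSID[9:0] indexes the leaf */
	#define CTXDESC_NUM_L2_ENTRIES	(1u << CTXDESC_SPLIT)
	#define CTXDESC_L1_DESC_BYTES	8u	/* one dword per L1 descriptor */
	#define CTXDESC_CD_BYTES	64u	/* eight dwords per CD */

	int main(void)
	{
		unsigned int ssid_bits = 20;	/* maximum supported by SMMUv3 */
		size_t num_ctx = (size_t)1 << ssid_bits;
		size_t l1_entries = num_ctx / CTXDESC_NUM_L2_ENTRIES;
		uint32_t ssid = 0x12345;	/* arbitrary example SSID */

		printf("L1 table: %zu entries, %zu bytes\n",
		       l1_entries, l1_entries * CTXDESC_L1_DESC_BYTES);
		printf("leaf table: %u CDs, %u bytes\n", CTXDESC_NUM_L2_ENTRIES,
		       CTXDESC_NUM_L2_ENTRIES * CTXDESC_CD_BYTES);
		printf("SSID 0x%x -> L1 index %u, L2 index %u\n", ssid,
		       ssid >> CTXDESC_SPLIT,
		       ssid & (CTXDESC_NUM_L2_ENTRIES - 1));
		return 0;
	}

With 20 SSID bits this prints a 1024-entry (8kB) L1 table and 64kB leaf
tables, matching the sizes above.
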
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu-v3-context.c | 137 ++++++++++++++++++++++++++++++++++--
 1 file changed, 130 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
index 3b0bb9475dea..aaffc2071966 100644
--- a/drivers/iommu/arm-smmu-v3-context.c
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -14,6 +14,19 @@
 
 #include "iommu-pasid.h"
 
+/*
+ * Linear: when less than 1024 SSIDs are supported
+ * 2lvl: at most 1024 L1 entries,
+ *	 1024 lazy entries per table.
+ */
+#define CTXDESC_SPLIT			10
+#define CTXDESC_NUM_L2_ENTRIES		(1 << CTXDESC_SPLIT)
+
+#define CTXDESC_L1_DESC_DWORD		1
+#define CTXDESC_L1_DESC_VALID		1
+#define CTXDESC_L1_DESC_L2PTR_SHIFT	12
+#define CTXDESC_L1_DESC_L2PTR_MASK	0xfffffffffUL
+
 #define CTXDESC_CD_DWORDS		8
 #define CTXDESC_CD_0_TCR_T0SZ_SHIFT	0
 #define ARM64_TCR_T0SZ_SHIFT		0
@@ -86,7 +99,17 @@ struct arm_smmu_cd_table {
 
 struct arm_smmu_cd_tables {
 	struct iommu_pasid_table	pasid;
-	struct arm_smmu_cd_table	table;
+	bool				linear;
+	union {
+		struct arm_smmu_cd_table table;
+		struct {
+			__le64		*ptr;
+			dma_addr_t	ptr_dma;
+			size_t		num_entries;
+
+			struct arm_smmu_cd_table *tables;
+		} l1;
+	};
 };
 
 #define pasid_to_cd_tables(pasid_table) \
@@ -122,9 +145,44 @@ static void arm_smmu_free_cd_leaf_table(struct device *dev,
 	dmam_free_coherent(dev, size, desc->ptr, desc->ptr_dma);
 }
 
+static void arm_smmu_write_cd_l1_desc(__le64 *dst,
+				      struct arm_smmu_cd_table *desc)
+{
+	u64 val = (desc->ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK <<
+		   CTXDESC_L1_DESC_L2PTR_SHIFT) | CTXDESC_L1_DESC_VALID;
+
+	*dst = cpu_to_le64(val);
+}
+
 static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_cd_tables *tbl, u32 ssid)
 {
-	return tbl->table.ptr + ssid * CTXDESC_CD_DWORDS;
+	unsigned long idx;
+	struct arm_smmu_cd_table *l1_desc;
+	struct iommu_pasid_table_cfg *cfg = &tbl->pasid.cfg;
+
+	if (tbl->linear)
+		return tbl->table.ptr + ssid * CTXDESC_CD_DWORDS;
+
+	idx = ssid >> CTXDESC_SPLIT;
+	if (idx >= tbl->l1.num_entries)
+		return NULL;
+
+	l1_desc = &tbl->l1.tables[idx];
+	if (!l1_desc->ptr) {
+		__le64 *l1ptr = tbl->l1.ptr + idx * CTXDESC_L1_DESC_DWORD;
+
+		if (arm_smmu_alloc_cd_leaf_table(cfg->iommu_dev, l1_desc,
+						 CTXDESC_NUM_L2_ENTRIES))
+			return NULL;
+
+		arm_smmu_write_cd_l1_desc(l1ptr, l1_desc);
+		/* An invalid L1 entry is allowed to be cached */
+		iommu_pasid_flush(&tbl->pasid, idx << CTXDESC_SPLIT, false);
+	}
+
+	idx = ssid & (CTXDESC_NUM_L2_ENTRIES - 1);
+
+	return l1_desc->ptr + idx * CTXDESC_CD_DWORDS;
 }
 
 static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
@@ -307,16 +365,51 @@ static struct iommu_pasid_table *
 arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
 {
 	int ret;
+	size_t size = 0;
 	struct arm_smmu_cd_tables *tbl;
 	struct device *dev = cfg->iommu_dev;
+	struct arm_smmu_cd_table *leaf_table;
+	size_t num_contexts, num_leaf_entries;
 
 	tbl = devm_kzalloc(dev, sizeof(*tbl), GFP_KERNEL);
 	if (!tbl)
 		return NULL;
 
-	ret = arm_smmu_alloc_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
+	num_contexts = 1 << cfg->order;
+	if (num_contexts <= CTXDESC_NUM_L2_ENTRIES) {
+		/* Fits in a single table */
+		tbl->linear = true;
+		num_leaf_entries = num_contexts;
+		leaf_table = &tbl->table;
+	} else {
+		/*
+		 * SSID[S1CDmax-1:10] indexes 1st-level table, SSID[9:0] indexes
+		 * 2nd-level
+		 */
+		tbl->l1.num_entries = num_contexts / CTXDESC_NUM_L2_ENTRIES;
+
+		tbl->l1.tables = devm_kzalloc(dev,
+					      sizeof(struct arm_smmu_cd_table) *
+					      tbl->l1.num_entries, GFP_KERNEL);
+		if (!tbl->l1.tables)
+			goto err_free_tbl;
+
+		size = tbl->l1.num_entries * (CTXDESC_L1_DESC_DWORD << 3);
+		tbl->l1.ptr = dmam_alloc_coherent(dev, size, &tbl->l1.ptr_dma,
+						  GFP_KERNEL | __GFP_ZERO);
+		if (!tbl->l1.ptr) {
+			dev_warn(dev, "failed to allocate L1 context table\n");
+			devm_kfree(dev, tbl->l1.tables);
+			goto err_free_tbl;
+		}
+
+		num_leaf_entries = CTXDESC_NUM_L2_ENTRIES;
+		leaf_table = tbl->l1.tables;
+	}
+
+	ret = arm_smmu_alloc_cd_leaf_table(dev, leaf_table, num_leaf_entries);
 	if (ret)
-		goto err_free_tbl;
+		goto err_free_l1;
 
 	tbl->pasid.ops = (struct iommu_pasid_table_ops) {
 		.alloc_priv_entry	= arm_smmu_alloc_priv_cd,
@@ -326,11 +419,22 @@ arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
 		.clear_entry		= arm_smmu_clear_cd,
 	};
 
-	cfg->base		= tbl->table.ptr_dma;
-	cfg->arm_smmu.s1fmt	= ARM_SMMU_S1FMT_LINEAR;
+	if (tbl->linear) {
+		cfg->base		= leaf_table->ptr_dma;
+		cfg->arm_smmu.s1fmt	= ARM_SMMU_S1FMT_LINEAR;
+	} else {
+		cfg->base		= tbl->l1.ptr_dma;
+		cfg->arm_smmu.s1fmt	= ARM_SMMU_S1FMT_64K_L2;
+		arm_smmu_write_cd_l1_desc(tbl->l1.ptr, leaf_table);
+	}
 
 	return &tbl->pasid;
 
+err_free_l1:
+	if (!tbl->linear) {
+		dmam_free_coherent(dev, size, tbl->l1.ptr, tbl->l1.ptr_dma);
+		devm_kfree(dev, tbl->l1.tables);
+	}
 err_free_tbl:
 	devm_kfree(dev, tbl);
 
@@ -343,7 +447,26 @@ static void arm_smmu_free_cd_tables(struct iommu_pasid_table *pasid_table)
 	struct device *dev = cfg->iommu_dev;
 	struct arm_smmu_cd_tables *tbl = pasid_to_cd_tables(pasid_table);
 
-	arm_smmu_free_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
+	if (tbl->linear) {
+		arm_smmu_free_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
+	} else {
+		size_t i, size;
+
+		for (i = 0; i < tbl->l1.num_entries; i++) {
+			struct arm_smmu_cd_table *table = &tbl->l1.tables[i];
+
+			if (!table->ptr)
+				continue;
+
+			arm_smmu_free_cd_leaf_table(dev, table,
+						    CTXDESC_NUM_L2_ENTRIES);
+		}
+
+		size = tbl->l1.num_entries * (CTXDESC_L1_DESC_DWORD << 3);
+		dmam_free_coherent(dev, size, tbl->l1.ptr, tbl->l1.ptr_dma);
+		devm_kfree(dev, tbl->l1.tables);
+	}
+
 	devm_kfree(dev, tbl);
 }
 
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 19/37] iommu/arm-smmu-v3: Add second level of context descriptor table
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

The SMMU can support up to 20 bits of SSID. Add a second level of
context descriptor tables to accommodate this. Devices that support more
than 1024 SSIDs now have a table of 1024 L1 entries (8kB), each pointing
to a table of 1024 context descriptors (64kB), allocated on demand.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3-context.c | 137 ++++++++++++++++++++++++++++++++++--
 1 file changed, 130 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
index 3b0bb9475dea..aaffc2071966 100644
--- a/drivers/iommu/arm-smmu-v3-context.c
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -14,6 +14,19 @@
 
 #include "iommu-pasid.h"
 
+/*
+ * Linear: when less than 1024 SSIDs are supported
+ * 2lvl: at most 1024 L1 entries,
+ *	 1024 lazy entries per table.
+ */
+#define CTXDESC_SPLIT			10
+#define CTXDESC_NUM_L2_ENTRIES		(1 << CTXDESC_SPLIT)
+
+#define CTXDESC_L1_DESC_DWORD		1
+#define CTXDESC_L1_DESC_VALID		1
+#define CTXDESC_L1_DESC_L2PTR_SHIFT	12
+#define CTXDESC_L1_DESC_L2PTR_MASK	0xfffffffffUL
+
 #define CTXDESC_CD_DWORDS		8
 #define CTXDESC_CD_0_TCR_T0SZ_SHIFT	0
 #define ARM64_TCR_T0SZ_SHIFT		0
@@ -86,7 +99,17 @@ struct arm_smmu_cd_table {
 
 struct arm_smmu_cd_tables {
 	struct iommu_pasid_table	pasid;
-	struct arm_smmu_cd_table	table;
+	bool				linear;
+	union {
+		struct arm_smmu_cd_table table;
+		struct {
+			__le64		*ptr;
+			dma_addr_t	ptr_dma;
+			size_t		num_entries;
+
+			struct arm_smmu_cd_table *tables;
+		} l1;
+	};
 };
 
 #define pasid_to_cd_tables(pasid_table) \
@@ -122,9 +145,44 @@ static void arm_smmu_free_cd_leaf_table(struct device *dev,
 	dmam_free_coherent(dev, size, desc->ptr, desc->ptr_dma);
 }
 
+static void arm_smmu_write_cd_l1_desc(__le64 *dst,
+				      struct arm_smmu_cd_table *desc)
+{
+	u64 val = (desc->ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK <<
+		   CTXDESC_L1_DESC_L2PTR_SHIFT) | CTXDESC_L1_DESC_VALID;
+
+	*dst = cpu_to_le64(val);
+}
+
 static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_cd_tables *tbl, u32 ssid)
 {
-	return tbl->table.ptr + ssid * CTXDESC_CD_DWORDS;
+	unsigned long idx;
+	struct arm_smmu_cd_table *l1_desc;
+	struct iommu_pasid_table_cfg *cfg = &tbl->pasid.cfg;
+
+	if (tbl->linear)
+		return tbl->table.ptr + ssid * CTXDESC_CD_DWORDS;
+
+	idx = ssid >> CTXDESC_SPLIT;
+	if (idx >= tbl->l1.num_entries)
+		return NULL;
+
+	l1_desc = &tbl->l1.tables[idx];
+	if (!l1_desc->ptr) {
+		__le64 *l1ptr = tbl->l1.ptr + idx * CTXDESC_L1_DESC_DWORD;
+
+		if (arm_smmu_alloc_cd_leaf_table(cfg->iommu_dev, l1_desc,
+						 CTXDESC_NUM_L2_ENTRIES))
+			return NULL;
+
+		arm_smmu_write_cd_l1_desc(l1ptr, l1_desc);
+		/* An invalid L1 entry is allowed to be cached */
+		iommu_pasid_flush(&tbl->pasid, idx << CTXDESC_SPLIT, false);
+	}
+
+	idx = ssid & (CTXDESC_NUM_L2_ENTRIES - 1);
+
+	return l1_desc->ptr + idx * CTXDESC_CD_DWORDS;
 }
 
 static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
@@ -307,16 +365,51 @@ static struct iommu_pasid_table *
 arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
 {
 	int ret;
+	size_t size = 0;
 	struct arm_smmu_cd_tables *tbl;
 	struct device *dev = cfg->iommu_dev;
+	struct arm_smmu_cd_table *leaf_table;
+	size_t num_contexts, num_leaf_entries;
 
 	tbl = devm_kzalloc(dev, sizeof(*tbl), GFP_KERNEL);
 	if (!tbl)
 		return NULL;
 
-	ret = arm_smmu_alloc_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
+	num_contexts = 1 << cfg->order;
+	if (num_contexts <= CTXDESC_NUM_L2_ENTRIES) {
+		/* Fits in a single table */
+		tbl->linear = true;
+		num_leaf_entries = num_contexts;
+		leaf_table = &tbl->table;
+	} else {
+		/*
+		 * SSID[S1CDmax-1:10] indexes 1st-level table, SSID[9:0] indexes
+		 * 2nd-level
+		 */
+		tbl->l1.num_entries = num_contexts / CTXDESC_NUM_L2_ENTRIES;
+
+		tbl->l1.tables = devm_kzalloc(dev,
+					      sizeof(struct arm_smmu_cd_table) *
+					      tbl->l1.num_entries, GFP_KERNEL);
+		if (!tbl->l1.tables)
+			goto err_free_tbl;
+
+		size = tbl->l1.num_entries * (CTXDESC_L1_DESC_DWORD << 3);
+		tbl->l1.ptr = dmam_alloc_coherent(dev, size, &tbl->l1.ptr_dma,
+						  GFP_KERNEL | __GFP_ZERO);
+		if (!tbl->l1.ptr) {
+			dev_warn(dev, "failed to allocate L1 context table\n");
+			devm_kfree(dev, tbl->l1.tables);
+			goto err_free_tbl;
+		}
+
+		num_leaf_entries = CTXDESC_NUM_L2_ENTRIES;
+		leaf_table = tbl->l1.tables;
+	}
+
+	ret = arm_smmu_alloc_cd_leaf_table(dev, leaf_table, num_leaf_entries);
 	if (ret)
-		goto err_free_tbl;
+		goto err_free_l1;
 
 	tbl->pasid.ops = (struct iommu_pasid_table_ops) {
 		.alloc_priv_entry	= arm_smmu_alloc_priv_cd,
@@ -326,11 +419,22 @@ arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
 		.clear_entry		= arm_smmu_clear_cd,
 	};
 
-	cfg->base		= tbl->table.ptr_dma;
-	cfg->arm_smmu.s1fmt	= ARM_SMMU_S1FMT_LINEAR;
+	if (tbl->linear) {
+		cfg->base		= leaf_table->ptr_dma;
+		cfg->arm_smmu.s1fmt	= ARM_SMMU_S1FMT_LINEAR;
+	} else {
+		cfg->base		= tbl->l1.ptr_dma;
+		cfg->arm_smmu.s1fmt	= ARM_SMMU_S1FMT_64K_L2;
+		arm_smmu_write_cd_l1_desc(tbl->l1.ptr, leaf_table);
+	}
 
 	return &tbl->pasid;
 
+err_free_l1:
+	if (!tbl->linear) {
+		dmam_free_coherent(dev, size, tbl->l1.ptr, tbl->l1.ptr_dma);
+		devm_kfree(dev, tbl->l1.tables);
+	}
 err_free_tbl:
 	devm_kfree(dev, tbl);
 
@@ -343,7 +447,26 @@ static void arm_smmu_free_cd_tables(struct iommu_pasid_table *pasid_table)
 	struct device *dev = cfg->iommu_dev;
 	struct arm_smmu_cd_tables *tbl = pasid_to_cd_tables(pasid_table);
 
-	arm_smmu_free_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
+	if (tbl->linear) {
+		arm_smmu_free_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
+	} else {
+		size_t i, size;
+
+		for (i = 0; i < tbl->l1.num_entries; i++) {
+			struct arm_smmu_cd_table *table = &tbl->l1.tables[i];
+
+			if (!table->ptr)
+				continue;
+
+			arm_smmu_free_cd_leaf_table(dev, table,
+						    CTXDESC_NUM_L2_ENTRIES);
+		}
+
+		size = tbl->l1.num_entries * (CTXDESC_L1_DESC_DWORD << 3);
+		dmam_free_coherent(dev, size, tbl->l1.ptr, tbl->l1.ptr_dma);
+		devm_kfree(dev, tbl->l1.tables);
+	}
+
 	devm_kfree(dev, tbl);
 }
 
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 19/37] iommu/arm-smmu-v3: Add second level of context descriptor table
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel

The SMMU can support up to 20 bits of SSID. Add a second level of
context descriptor tables to accommodate this. Devices that support more
than 1024 SSIDs now have a table of 1024 L1 entries (8kB), each pointing
to a table of 1024 context descriptors (64kB), allocated on demand.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3-context.c | 137 ++++++++++++++++++++++++++++++++++--
 1 file changed, 130 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
index 3b0bb9475dea..aaffc2071966 100644
--- a/drivers/iommu/arm-smmu-v3-context.c
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -14,6 +14,19 @@
 
 #include "iommu-pasid.h"
 
+/*
+ * Linear: when less than 1024 SSIDs are supported
+ * 2lvl: at most 1024 L1 entries,
+ *	 1024 lazy entries per table.
+ */
+#define CTXDESC_SPLIT			10
+#define CTXDESC_NUM_L2_ENTRIES		(1 << CTXDESC_SPLIT)
+
+#define CTXDESC_L1_DESC_DWORD		1
+#define CTXDESC_L1_DESC_VALID		1
+#define CTXDESC_L1_DESC_L2PTR_SHIFT	12
+#define CTXDESC_L1_DESC_L2PTR_MASK	0xfffffffffUL
+
 #define CTXDESC_CD_DWORDS		8
 #define CTXDESC_CD_0_TCR_T0SZ_SHIFT	0
 #define ARM64_TCR_T0SZ_SHIFT		0
@@ -86,7 +99,17 @@ struct arm_smmu_cd_table {
 
 struct arm_smmu_cd_tables {
 	struct iommu_pasid_table	pasid;
-	struct arm_smmu_cd_table	table;
+	bool				linear;
+	union {
+		struct arm_smmu_cd_table table;
+		struct {
+			__le64		*ptr;
+			dma_addr_t	ptr_dma;
+			size_t		num_entries;
+
+			struct arm_smmu_cd_table *tables;
+		} l1;
+	};
 };
 
 #define pasid_to_cd_tables(pasid_table) \
@@ -122,9 +145,44 @@ static void arm_smmu_free_cd_leaf_table(struct device *dev,
 	dmam_free_coherent(dev, size, desc->ptr, desc->ptr_dma);
 }
 
+static void arm_smmu_write_cd_l1_desc(__le64 *dst,
+				      struct arm_smmu_cd_table *desc)
+{
+	u64 val = (desc->ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK <<
+		   CTXDESC_L1_DESC_L2PTR_SHIFT) | CTXDESC_L1_DESC_VALID;
+
+	*dst = cpu_to_le64(val);
+}
+
 static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_cd_tables *tbl, u32 ssid)
 {
-	return tbl->table.ptr + ssid * CTXDESC_CD_DWORDS;
+	unsigned long idx;
+	struct arm_smmu_cd_table *l1_desc;
+	struct iommu_pasid_table_cfg *cfg = &tbl->pasid.cfg;
+
+	if (tbl->linear)
+		return tbl->table.ptr + ssid * CTXDESC_CD_DWORDS;
+
+	idx = ssid >> CTXDESC_SPLIT;
+	if (idx >= tbl->l1.num_entries)
+		return NULL;
+
+	l1_desc = &tbl->l1.tables[idx];
+	if (!l1_desc->ptr) {
+		__le64 *l1ptr = tbl->l1.ptr + idx * CTXDESC_L1_DESC_DWORD;
+
+		if (arm_smmu_alloc_cd_leaf_table(cfg->iommu_dev, l1_desc,
+						 CTXDESC_NUM_L2_ENTRIES))
+			return NULL;
+
+		arm_smmu_write_cd_l1_desc(l1ptr, l1_desc);
+		/* An invalid L1 entry is allowed to be cached */
+		iommu_pasid_flush(&tbl->pasid, idx << CTXDESC_SPLIT, false);
+	}
+
+	idx = ssid & (CTXDESC_NUM_L2_ENTRIES - 1);
+
+	return l1_desc->ptr + idx * CTXDESC_CD_DWORDS;
 }
 
 static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
@@ -307,16 +365,51 @@ static struct iommu_pasid_table *
 arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
 {
 	int ret;
+	size_t size = 0;
 	struct arm_smmu_cd_tables *tbl;
 	struct device *dev = cfg->iommu_dev;
+	struct arm_smmu_cd_table *leaf_table;
+	size_t num_contexts, num_leaf_entries;
 
 	tbl = devm_kzalloc(dev, sizeof(*tbl), GFP_KERNEL);
 	if (!tbl)
 		return NULL;
 
-	ret = arm_smmu_alloc_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
+	num_contexts = 1 << cfg->order;
+	if (num_contexts <= CTXDESC_NUM_L2_ENTRIES) {
+		/* Fits in a single table */
+		tbl->linear = true;
+		num_leaf_entries = num_contexts;
+		leaf_table = &tbl->table;
+	} else {
+		/*
+		 * SSID[S1CDmax-1:10] indexes 1st-level table, SSID[9:0] indexes
+		 * 2nd-level
+		 */
+		tbl->l1.num_entries = num_contexts / CTXDESC_NUM_L2_ENTRIES;
+
+		tbl->l1.tables = devm_kzalloc(dev,
+					      sizeof(struct arm_smmu_cd_table) *
+					      tbl->l1.num_entries, GFP_KERNEL);
+		if (!tbl->l1.tables)
+			goto err_free_tbl;
+
+		size = tbl->l1.num_entries * (CTXDESC_L1_DESC_DWORD << 3);
+		tbl->l1.ptr = dmam_alloc_coherent(dev, size, &tbl->l1.ptr_dma,
+						  GFP_KERNEL | __GFP_ZERO);
+		if (!tbl->l1.ptr) {
+			dev_warn(dev, "failed to allocate L1 context table\n");
+			devm_kfree(dev, tbl->l1.tables);
+			goto err_free_tbl;
+		}
+
+		num_leaf_entries = CTXDESC_NUM_L2_ENTRIES;
+		leaf_table = tbl->l1.tables;
+	}
+
+	ret = arm_smmu_alloc_cd_leaf_table(dev, leaf_table, num_leaf_entries);
 	if (ret)
-		goto err_free_tbl;
+		goto err_free_l1;
 
 	tbl->pasid.ops = (struct iommu_pasid_table_ops) {
 		.alloc_priv_entry	= arm_smmu_alloc_priv_cd,
@@ -326,11 +419,22 @@ arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
 		.clear_entry		= arm_smmu_clear_cd,
 	};
 
-	cfg->base		= tbl->table.ptr_dma;
-	cfg->arm_smmu.s1fmt	= ARM_SMMU_S1FMT_LINEAR;
+	if (tbl->linear) {
+		cfg->base		= leaf_table->ptr_dma;
+		cfg->arm_smmu.s1fmt	= ARM_SMMU_S1FMT_LINEAR;
+	} else {
+		cfg->base		= tbl->l1.ptr_dma;
+		cfg->arm_smmu.s1fmt	= ARM_SMMU_S1FMT_64K_L2;
+		arm_smmu_write_cd_l1_desc(tbl->l1.ptr, leaf_table);
+	}
 
 	return &tbl->pasid;
 
+err_free_l1:
+	if (!tbl->linear) {
+		dmam_free_coherent(dev, size, tbl->l1.ptr, tbl->l1.ptr_dma);
+		devm_kfree(dev, tbl->l1.tables);
+	}
 err_free_tbl:
 	devm_kfree(dev, tbl);
 
@@ -343,7 +447,26 @@ static void arm_smmu_free_cd_tables(struct iommu_pasid_table *pasid_table)
 	struct device *dev = cfg->iommu_dev;
 	struct arm_smmu_cd_tables *tbl = pasid_to_cd_tables(pasid_table);
 
-	arm_smmu_free_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
+	if (tbl->linear) {
+		arm_smmu_free_cd_leaf_table(dev, &tbl->table, 1 << cfg->order);
+	} else {
+		size_t i, size;
+
+		for (i = 0; i < tbl->l1.num_entries; i++) {
+			struct arm_smmu_cd_table *table = &tbl->l1.tables[i];
+
+			if (!table->ptr)
+				continue;
+
+			arm_smmu_free_cd_leaf_table(dev, table,
+						    CTXDESC_NUM_L2_ENTRIES);
+		}
+
+		size = tbl->l1.num_entries * (CTXDESC_L1_DESC_DWORD << 3);
+		dmam_free_coherent(dev, size, tbl->l1.ptr, tbl->l1.ptr_dma);
+		devm_kfree(dev, tbl->l1.tables);
+	}
+
 	devm_kfree(dev, tbl);
 }
 
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 20/37] iommu/arm-smmu-v3: Share process page tables
  2018-02-12 18:33 ` Jean-Philippe Brucker
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

With Shared Virtual Addressing (SVA), we need to mirror CPU TTBR, TCR,
MAIR and ASIDs in SMMU contexts. Each SMMU has a single ASID space split
into two sets, shared and private. Shared ASIDs correspond to those
obtained from the arch ASID allocator, and private ASIDs are used for
"classic" map/unmap DMA.

Replace the ASID IDA with an IDR, which lets us keep information about
each context. Initialize shared contexts with information obtained from
the mm.

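For reference, a condensed sketch of the new pattern (the toy_* helpers
below are illustrative only; the real code is in arm_smmu_alloc_priv_cd()
and arm_smmu_alloc_shared_cd()):

	static DEFINE_SPINLOCK(asid_lock);
	static DEFINE_IDR(asid_idr);	/* ASID -> struct arm_smmu_cd */

	/* Allocate a private ASID; storing the CD pointer is what the plain
	 * IDA could not do for us. */
	static int toy_alloc_private_asid(struct arm_smmu_cd *cd,
					  unsigned int asid_bits)
	{
		int asid;

		idr_preload(GFP_KERNEL);
		spin_lock(&asid_lock);
		asid = idr_alloc_cyclic(&asid_idr, cd, 0, 1 << asid_bits,
					GFP_ATOMIC);
		spin_unlock(&asid_lock);
		idr_preload_end();

		return asid;
	}

	/* At bind time, find out who currently owns an arch-allocated ASID */
	static struct arm_smmu_cd *toy_find_asid_owner(u16 asid)
	{
		struct arm_smmu_cd *cd = idr_find(&asid_idr, asid);

		if (!cd)
			return NULL;		/* unused by the SMMU */
		return cd->mm ? cd : ERR_PTR(-EEXIST);	/* shared vs private */
	}
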
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3-context.c | 181 ++++++++++++++++++++++++++++++++++--
 1 file changed, 171 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
index aaffc2071966..b7c90384ff56 100644
--- a/drivers/iommu/arm-smmu-v3-context.c
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -10,9 +10,11 @@
 #include <linux/dma-mapping.h>
 #include <linux/idr.h>
 #include <linux/kernel.h>
+#include <linux/mmu_context.h>
 #include <linux/slab.h>
 
 #include "iommu-pasid.h"
+#include "io-pgtable-arm.h"
 
 /*
  * Linear: when less than 1024 SSIDs are supported
@@ -87,6 +89,9 @@ struct arm_smmu_cd {
 	u64				ttbr;
 	u64				tcr;
 	u64				mair;
+
+	refcount_t			refs;
+	struct mm_struct		*mm;
 };
 
 #define pasid_entry_to_cd(entry) \
@@ -118,7 +123,8 @@ struct arm_smmu_cd_tables {
 #define pasid_ops_to_tables(ops) \
 	pasid_to_cd_tables(iommu_pasid_table_ops_to_table(ops))
 
-static DEFINE_IDA(asid_ida);
+static DEFINE_SPINLOCK(asid_lock);
+static DEFINE_IDR(asid_idr);
 
 static int arm_smmu_alloc_cd_leaf_table(struct device *dev,
 					struct arm_smmu_cd_table *desc,
@@ -260,7 +266,8 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
 		      CTXDESC_CD_0_ENDI |
 #endif
 		      CTXDESC_CD_0_R | CTXDESC_CD_0_A |
-		      CTXDESC_CD_0_ASET_PRIVATE |
+		      (cd->mm ? CTXDESC_CD_0_ASET_SHARED :
+		       CTXDESC_CD_0_ASET_PRIVATE) |
 		      CTXDESC_CD_0_AA64 |
 		      (cd->entry.tag & CTXDESC_CD_0_ASID_MASK)
 		      << CTXDESC_CD_0_ASID_SHIFT |
@@ -277,10 +284,145 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
 	return 0;
 }
 
+static bool arm_smmu_free_asid(struct arm_smmu_cd *cd)
+{
+	bool free;
+	struct arm_smmu_cd *old_cd;
+
+	spin_lock(&asid_lock);
+	free = refcount_dec_and_test(&cd->refs);
+	if (free) {
+		old_cd = idr_remove(&asid_idr, (u16)cd->entry.tag);
+		WARN_ON(old_cd != cd);
+	}
+	spin_unlock(&asid_lock);
+
+	return free;
+}
+
+static struct arm_smmu_cd *arm_smmu_alloc_cd(struct arm_smmu_cd_tables *tbl)
+{
+	struct arm_smmu_cd *cd;
+
+	cd = kzalloc(sizeof(*cd), GFP_KERNEL);
+	if (!cd)
+		return NULL;
+
+	refcount_set(&cd->refs, 1);
+
+	return cd;
+}
+
+static struct arm_smmu_cd *arm_smmu_share_asid(u16 asid)
+{
+	struct arm_smmu_cd *cd;
+
+	cd = idr_find(&asid_idr, asid);
+	if (!cd)
+		return NULL;
+
+	if (cd->mm) {
+		/*
+		 * It's pretty common to find a stale CD when doing unbind-bind,
+		 * given that the release happens after a RCU grace period.
+		 * Simply reuse it.
+		 */
+		refcount_inc(&cd->refs);
+		return cd;
+	}
+
+	/*
+	 * Ouch, ASID is already in use for a private cd.
+	 * TODO: seize it, for the common good.
+	 */
+	return ERR_PTR(-EEXIST);
+}
+
 static struct iommu_pasid_entry *
 arm_smmu_alloc_shared_cd(struct iommu_pasid_table_ops *ops, struct mm_struct *mm)
 {
-	return ERR_PTR(-ENODEV);
+	u16 asid;
+	u64 tcr, par, reg;
+	int ret = -ENOMEM;
+	struct arm_smmu_cd *cd;
+	struct arm_smmu_cd *old_cd = NULL;
+	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
+
+	asid = mm_context_get(mm);
+	if (!asid)
+		return ERR_PTR(-ESRCH);
+
+	cd = arm_smmu_alloc_cd(tbl);
+	if (!cd)
+		goto err_put_context;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&asid_lock);
+	old_cd = arm_smmu_share_asid(asid);
+	if (!old_cd)
+		ret = idr_alloc(&asid_idr, cd, asid, asid + 1, GFP_ATOMIC);
+	spin_unlock(&asid_lock);
+	idr_preload_end();
+
+	if (!IS_ERR_OR_NULL(old_cd)) {
+		if (WARN_ON(old_cd->mm != mm)) {
+			ret = -EINVAL;
+			goto err_free_cd;
+		}
+		kfree(cd);
+		mm_context_put(mm);
+		return &old_cd->entry;
+	} else if (old_cd) {
+		ret = PTR_ERR(old_cd);
+		goto err_free_cd;
+	}
+
+	tcr = TCR_T0SZ(VA_BITS) | TCR_IRGN0_WBWA | TCR_ORGN0_WBWA |
+		TCR_SH0_INNER | ARM_LPAE_TCR_EPD1;
+
+	switch (PAGE_SIZE) {
+	case SZ_4K:
+		tcr |= TCR_TG0_4K;
+		break;
+	case SZ_16K:
+		tcr |= TCR_TG0_16K;
+		break;
+	case SZ_64K:
+		tcr |= TCR_TG0_64K;
+		break;
+	default:
+		WARN_ON(1);
+		ret = -EINVAL;
+		goto err_free_asid;
+	}
+
+	reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
+	par = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_PARANGE_SHIFT);
+	tcr |= par << ARM_LPAE_TCR_IPS_SHIFT;
+
+	cd->ttbr	= virt_to_phys(mm->pgd);
+	cd->tcr		= tcr;
+	/*
+	 * MAIR value is pretty much constant and global, so we can just get it
+	 * from the current CPU register
+	 */
+	cd->mair	= read_sysreg(mair_el1);
+
+	cd->mm		= mm;
+	cd->entry.tag	= asid;
+
+	return &cd->entry;
+
+err_free_asid:
+	arm_smmu_free_asid(cd);
+
+err_free_cd:
+	kfree(cd);
+
+err_put_context:
+	mm_context_put(mm);
+
+	return ERR_PTR(ret);
 }
 
 static struct iommu_pasid_entry *
@@ -294,19 +436,23 @@ arm_smmu_alloc_priv_cd(struct iommu_pasid_table_ops *ops,
 	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
 	struct arm_smmu_context_cfg *ctx_cfg = &tbl->pasid.cfg.arm_smmu;
 
-	cd = kzalloc(sizeof(*cd), GFP_KERNEL);
+	cd = arm_smmu_alloc_cd(tbl);
 	if (!cd)
 		return ERR_PTR(-ENOMEM);
 
-	asid = ida_simple_get(&asid_ida, 0, 1 << ctx_cfg->asid_bits,
-			      GFP_KERNEL);
+	idr_preload(GFP_KERNEL);
+	spin_lock(&asid_lock);
+	asid = idr_alloc_cyclic(&asid_idr, cd, 0, 1 << ctx_cfg->asid_bits,
+				GFP_ATOMIC);
+	cd->entry.tag = asid;
+	spin_unlock(&asid_lock);
+	idr_preload_end();
+
 	if (asid < 0) {
 		kfree(cd);
 		return ERR_PTR(asid);
 	}
 
-	cd->entry.tag = asid;
-
 	switch (fmt) {
 	case ARM_64_LPAE_S1:
 		cd->ttbr	= cfg->arm_lpae_s1_cfg.ttbr[0];
@@ -322,7 +468,7 @@ arm_smmu_alloc_priv_cd(struct iommu_pasid_table_ops *ops,
 	return &cd->entry;
 
 err_free_asid:
-	ida_simple_remove(&asid_ida, asid);
+	arm_smmu_free_asid(cd);
 
 	kfree(cd);
 
@@ -334,7 +480,14 @@ static void arm_smmu_free_cd(struct iommu_pasid_table_ops *ops,
 {
 	struct arm_smmu_cd *cd = pasid_entry_to_cd(entry);
 
-	ida_simple_remove(&asid_ida, (u16)entry->tag);
+	if (!arm_smmu_free_asid(cd))
+		return;
+
+	if (cd->mm) {
+		/* Unpin ASID */
+		mm_context_put(cd->mm);
+	}
+
 	kfree(cd);
 }
 
@@ -359,6 +512,14 @@ static void arm_smmu_clear_cd(struct iommu_pasid_table_ops *ops, int pasid,
 		return;
 
 	arm_smmu_write_ctx_desc(tbl, pasid, NULL);
+
+	/*
+	 * The ASID allocator won't broadcast the final TLB invalidations for
+	 * this ASID, so we need to do it manually. For private contexts,
+	 * freeing io-pgtable ops performs the invalidation.
+	 */
+	if (cd->mm)
+		iommu_pasid_flush_tlbs(&tbl->pasid, cd->pasid, entry);
 }
 
 static struct iommu_pasid_table *
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 20/37] iommu/arm-smmu-v3: Share process page tables
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel

With Shared Virtual Addressing (SVA), we need to mirror CPU TTBR, TCR,
MAIR and ASIDs in SMMU contexts. Each SMMU has a single ASID space split
into two sets, shared and private. Shared ASIDs correspond to those
obtained from the arch ASID allocator, and private ASIDs are used for
"classic" map/unmap DMA.

Replace the ASID IDA with an IDR, which lets us keep information about
each context. Initialize shared contexts with information obtained from
the mm.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3-context.c | 181 ++++++++++++++++++++++++++++++++++--
 1 file changed, 171 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
index aaffc2071966..b7c90384ff56 100644
--- a/drivers/iommu/arm-smmu-v3-context.c
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -10,9 +10,11 @@
 #include <linux/dma-mapping.h>
 #include <linux/idr.h>
 #include <linux/kernel.h>
+#include <linux/mmu_context.h>
 #include <linux/slab.h>
 
 #include "iommu-pasid.h"
+#include "io-pgtable-arm.h"
 
 /*
  * Linear: when less than 1024 SSIDs are supported
@@ -87,6 +89,9 @@ struct arm_smmu_cd {
 	u64				ttbr;
 	u64				tcr;
 	u64				mair;
+
+	refcount_t			refs;
+	struct mm_struct		*mm;
 };
 
 #define pasid_entry_to_cd(entry) \
@@ -118,7 +123,8 @@ struct arm_smmu_cd_tables {
 #define pasid_ops_to_tables(ops) \
 	pasid_to_cd_tables(iommu_pasid_table_ops_to_table(ops))
 
-static DEFINE_IDA(asid_ida);
+static DEFINE_SPINLOCK(asid_lock);
+static DEFINE_IDR(asid_idr);
 
 static int arm_smmu_alloc_cd_leaf_table(struct device *dev,
 					struct arm_smmu_cd_table *desc,
@@ -260,7 +266,8 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
 		      CTXDESC_CD_0_ENDI |
 #endif
 		      CTXDESC_CD_0_R | CTXDESC_CD_0_A |
-		      CTXDESC_CD_0_ASET_PRIVATE |
+		      (cd->mm ? CTXDESC_CD_0_ASET_SHARED :
+		       CTXDESC_CD_0_ASET_PRIVATE) |
 		      CTXDESC_CD_0_AA64 |
 		      (cd->entry.tag & CTXDESC_CD_0_ASID_MASK)
 		      << CTXDESC_CD_0_ASID_SHIFT |
@@ -277,10 +284,145 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
 	return 0;
 }
 
+static bool arm_smmu_free_asid(struct arm_smmu_cd *cd)
+{
+	bool free;
+	struct arm_smmu_cd *old_cd;
+
+	spin_lock(&asid_lock);
+	free = refcount_dec_and_test(&cd->refs);
+	if (free) {
+		old_cd = idr_remove(&asid_idr, (u16)cd->entry.tag);
+		WARN_ON(old_cd != cd);
+	}
+	spin_unlock(&asid_lock);
+
+	return free;
+}
+
+static struct arm_smmu_cd *arm_smmu_alloc_cd(struct arm_smmu_cd_tables *tbl)
+{
+	struct arm_smmu_cd *cd;
+
+	cd = kzalloc(sizeof(*cd), GFP_KERNEL);
+	if (!cd)
+		return NULL;
+
+	refcount_set(&cd->refs, 1);
+
+	return cd;
+}
+
+static struct arm_smmu_cd *arm_smmu_share_asid(u16 asid)
+{
+	struct arm_smmu_cd *cd;
+
+	cd = idr_find(&asid_idr, asid);
+	if (!cd)
+		return NULL;
+
+	if (cd->mm) {
+		/*
+		 * It's pretty common to find a stale CD when doing unbind-bind,
+		 * given that the release happens after a RCU grace period.
+		 * Simply reuse it.
+		 */
+		refcount_inc(&cd->refs);
+		return cd;
+	}
+
+	/*
+	 * Ouch, ASID is already in use for a private cd.
+	 * TODO: seize it, for the common good.
+	 */
+	return ERR_PTR(-EEXIST);
+}
+
 static struct iommu_pasid_entry *
 arm_smmu_alloc_shared_cd(struct iommu_pasid_table_ops *ops, struct mm_struct *mm)
 {
-	return ERR_PTR(-ENODEV);
+	u16 asid;
+	u64 tcr, par, reg;
+	int ret = -ENOMEM;
+	struct arm_smmu_cd *cd;
+	struct arm_smmu_cd *old_cd = NULL;
+	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
+
+	asid = mm_context_get(mm);
+	if (!asid)
+		return ERR_PTR(-ESRCH);
+
+	cd = arm_smmu_alloc_cd(tbl);
+	if (!cd)
+		goto err_put_context;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&asid_lock);
+	old_cd = arm_smmu_share_asid(asid);
+	if (!old_cd)
+		ret = idr_alloc(&asid_idr, cd, asid, asid + 1, GFP_ATOMIC);
+	spin_unlock(&asid_lock);
+	idr_preload_end();
+
+	if (!IS_ERR_OR_NULL(old_cd)) {
+		if (WARN_ON(old_cd->mm != mm)) {
+			ret = -EINVAL;
+			goto err_free_cd;
+		}
+		kfree(cd);
+		mm_context_put(mm);
+		return &old_cd->entry;
+	} else if (old_cd) {
+		ret = PTR_ERR(old_cd);
+		goto err_free_cd;
+	}
+
+	tcr = TCR_T0SZ(VA_BITS) | TCR_IRGN0_WBWA | TCR_ORGN0_WBWA |
+		TCR_SH0_INNER | ARM_LPAE_TCR_EPD1;
+
+	switch (PAGE_SIZE) {
+	case SZ_4K:
+		tcr |= TCR_TG0_4K;
+		break;
+	case SZ_16K:
+		tcr |= TCR_TG0_16K;
+		break;
+	case SZ_64K:
+		tcr |= TCR_TG0_64K;
+		break;
+	default:
+		WARN_ON(1);
+		ret = -EINVAL;
+		goto err_free_asid;
+	}
+
+	reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
+	par = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_PARANGE_SHIFT);
+	tcr |= par << ARM_LPAE_TCR_IPS_SHIFT;
+
+	cd->ttbr	= virt_to_phys(mm->pgd);
+	cd->tcr		= tcr;
+	/*
+	 * MAIR value is pretty much constant and global, so we can just get it
+	 * from the current CPU register
+	 */
+	cd->mair	= read_sysreg(mair_el1);
+
+	cd->mm		= mm;
+	cd->entry.tag	= asid;
+
+	return &cd->entry;
+
+err_free_asid:
+	arm_smmu_free_asid(cd);
+
+err_free_cd:
+	kfree(cd);
+
+err_put_context:
+	mm_context_put(mm);
+
+	return ERR_PTR(ret);
 }
 
 static struct iommu_pasid_entry *
@@ -294,19 +436,23 @@ arm_smmu_alloc_priv_cd(struct iommu_pasid_table_ops *ops,
 	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
 	struct arm_smmu_context_cfg *ctx_cfg = &tbl->pasid.cfg.arm_smmu;
 
-	cd = kzalloc(sizeof(*cd), GFP_KERNEL);
+	cd = arm_smmu_alloc_cd(tbl);
 	if (!cd)
 		return ERR_PTR(-ENOMEM);
 
-	asid = ida_simple_get(&asid_ida, 0, 1 << ctx_cfg->asid_bits,
-			      GFP_KERNEL);
+	idr_preload(GFP_KERNEL);
+	spin_lock(&asid_lock);
+	asid = idr_alloc_cyclic(&asid_idr, cd, 0, 1 << ctx_cfg->asid_bits,
+				GFP_ATOMIC);
+	cd->entry.tag = asid;
+	spin_unlock(&asid_lock);
+	idr_preload_end();
+
 	if (asid < 0) {
 		kfree(cd);
 		return ERR_PTR(asid);
 	}
 
-	cd->entry.tag = asid;
-
 	switch (fmt) {
 	case ARM_64_LPAE_S1:
 		cd->ttbr	= cfg->arm_lpae_s1_cfg.ttbr[0];
@@ -322,7 +468,7 @@ arm_smmu_alloc_priv_cd(struct iommu_pasid_table_ops *ops,
 	return &cd->entry;
 
 err_free_asid:
-	ida_simple_remove(&asid_ida, asid);
+	arm_smmu_free_asid(cd);
 
 	kfree(cd);
 
@@ -334,7 +480,14 @@ static void arm_smmu_free_cd(struct iommu_pasid_table_ops *ops,
 {
 	struct arm_smmu_cd *cd = pasid_entry_to_cd(entry);
 
-	ida_simple_remove(&asid_ida, (u16)entry->tag);
+	if (!arm_smmu_free_asid(cd))
+		return;
+
+	if (cd->mm) {
+		/* Unpin ASID */
+		mm_context_put(cd->mm);
+	}
+
 	kfree(cd);
 }
 
@@ -359,6 +512,14 @@ static void arm_smmu_clear_cd(struct iommu_pasid_table_ops *ops, int pasid,
 		return;
 
 	arm_smmu_write_ctx_desc(tbl, pasid, NULL);
+
+	/*
+	 * The ASID allocator won't broadcast the final TLB invalidations for
+	 * this ASID, so we need to do it manually. For private contexts,
+	 * freeing io-pgtable ops performs the invalidation.
+	 */
+	if (cd->mm)
+		iommu_pasid_flush_tlbs(&tbl->pasid, cd->pasid, entry);
 }
 
 static struct iommu_pasid_table *
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 21/37] iommu/arm-smmu-v3: Seize private ASID
  2018-02-12 18:33 ` Jean-Philippe Brucker
  (?)
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: mark.rutland-5wv7dgnIgG8,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	catalin.marinas-5wv7dgnIgG8, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	will.deacon-5wv7dgnIgG8, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	shunyong.yang-PT9Dzx9SjPiXmMXjJBpWqg,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, rjw-LthD3rsA81gm4RdzfppkhA,
	sudeep.holla-5wv7dgnIgG8, christian.koenig-5C7GfCeVMHo

The SMMU has a single ASID space, the union of shared and private ASID
sets. This means that the PASID lib competes with the arch allocator for
ASIDs. Shared ASIDs are those of Linux processes, allocated by the arch,
and take part in broadcast TLB maintenance. Private ASIDs are allocated
by the SMMU driver and used for "classic" map/unmap DMA. They require
explicit TLB invalidations.

When we pin down an mm_context and get an ASID that is already in use by
the SMMU, it belongs to a private context. We used to simply abort the
bind, but this is unfair to users, who would find themselves unable to
bind a few seemingly random processes. We can go one step further by
allocating a new private ASID for the context currently using it, and
making the old ASID shared.

Introduce a new lock to prevent races when rewriting context descriptors.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu-v3-context.c | 88 ++++++++++++++++++++++++++++++++++---
 1 file changed, 82 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
index b7c90384ff56..5b8c5875e0d9 100644
--- a/drivers/iommu/arm-smmu-v3-context.c
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -82,6 +82,7 @@
 	(((tcr) >> ARM64_TCR_##fld##_SHIFT & ARM64_TCR_##fld##_MASK)	\
 	 << CTXDESC_CD_0_TCR_##fld##_SHIFT)
 
+#define ARM_SMMU_NO_PASID		(-1)
 
 struct arm_smmu_cd {
 	struct iommu_pasid_entry	entry;
@@ -90,8 +91,14 @@ struct arm_smmu_cd {
 	u64				tcr;
 	u64				mair;
 
+	int				pasid;
+
+	/* 'refs' tracks alloc/free */
 	refcount_t			refs;
+	/* 'users' tracks attach/detach, and is only used for sanity checking */
+	unsigned int			users;
 	struct mm_struct		*mm;
+	struct arm_smmu_cd_tables	*tbl;
 };
 
 #define pasid_entry_to_cd(entry) \
@@ -123,6 +130,7 @@ struct arm_smmu_cd_tables {
 #define pasid_ops_to_tables(ops) \
 	pasid_to_cd_tables(iommu_pasid_table_ops_to_table(ops))
 
+static DEFINE_SPINLOCK(contexts_lock);
 static DEFINE_SPINLOCK(asid_lock);
 static DEFINE_IDR(asid_idr);
 
@@ -209,8 +217,8 @@ static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
 	return val;
 }
 
-static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
-				    struct arm_smmu_cd *cd)
+static int __arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
+				     struct arm_smmu_cd *cd)
 {
 	u64 val;
 	bool cd_live;
@@ -284,6 +292,18 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
 	return 0;
 }
 
+static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
+				   struct arm_smmu_cd *cd)
+{
+	int ret;
+
+	spin_lock(&contexts_lock);
+	ret = __arm_smmu_write_ctx_desc(tbl, ssid, cd);
+	spin_unlock(&contexts_lock);
+
+	return ret;
+}
+
 static bool arm_smmu_free_asid(struct arm_smmu_cd *cd)
 {
 	bool free;
@@ -308,14 +328,25 @@ static struct arm_smmu_cd *arm_smmu_alloc_cd(struct arm_smmu_cd_tables *tbl)
 	if (!cd)
 		return NULL;
 
+	cd->pasid	= ARM_SMMU_NO_PASID;
+	cd->tbl		= tbl;
 	refcount_set(&cd->refs, 1);
 
 	return cd;
 }
 
+/*
+ * Try to reserve this ASID in the SMMU. If it is in use, try to steal it from
+ * the private entry. Careful here, we may be modifying the context tables of
+ * another SMMU!
+ */
 static struct arm_smmu_cd *arm_smmu_share_asid(u16 asid)
 {
+	int ret;
 	struct arm_smmu_cd *cd;
+	struct arm_smmu_cd_tables *tbl;
+	struct arm_smmu_context_cfg *cfg;
+	struct iommu_pasid_entry old_entry;
 
 	cd = idr_find(&asid_idr, asid);
 	if (!cd)
@@ -325,17 +356,47 @@ static struct arm_smmu_cd *arm_smmu_share_asid(u16 asid)
 		/*
 		 * It's pretty common to find a stale CD when doing unbind-bind,
 		 * given that the release happens after a RCU grace period.
-		 * Simply reuse it.
+		 * Simply reuse it, but check that it isn't active, because it's
+		 * going to be assigned a different PASID.
 		 */
+		if (WARN_ON(cd->users))
+			return ERR_PTR(-EINVAL);
+
 		refcount_inc(&cd->refs);
 		return cd;
 	}
 
+	tbl = cd->tbl;
+	cfg = &tbl->pasid.cfg.arm_smmu;
+
+	ret = idr_alloc_cyclic(&asid_idr, cd, 0, 1 << cfg->asid_bits,
+			       GFP_ATOMIC);
+	if (ret < 0)
+		return ERR_PTR(-ENOSPC);
+
+	/* Save the previous ASID */
+	old_entry = cd->entry;
+
+	/*
+	 * Race with unmap; TLB invalidations will start targeting the new ASID,
+	 * which isn't assigned yet. We'll do an invalidate-all on the old ASID
+	 * later, so it doesn't matter.
+	 */
+	cd->entry.tag = ret;
+
 	/*
-	 * Ouch, ASID is already in use for a private cd.
-	 * TODO: seize it, for the common good.
+	 * Update ASID and invalidate CD in all associated masters. There will
+	 * be some overlap between use of both ASIDs, until we invalidate the
+	 * TLB.
 	 */
-	return ERR_PTR(-EEXIST);
+	arm_smmu_write_ctx_desc(tbl, cd->pasid, cd);
+
+	/* Invalidate TLB entries previously associated with that context */
+	iommu_pasid_flush_tlbs(&tbl->pasid, cd->pasid, &old_entry);
+
+	idr_remove(&asid_idr, asid);
+
+	return NULL;
 }
 
 static struct iommu_pasid_entry *
@@ -500,6 +561,15 @@ static int arm_smmu_set_cd(struct iommu_pasid_table_ops *ops, int pasid,
 	if (WARN_ON(pasid > (1 << tbl->pasid.cfg.order)))
 		return -EINVAL;
 
+	if (WARN_ON(cd->pasid != ARM_SMMU_NO_PASID && cd->pasid != pasid))
+		return -EEXIST;
+
+	/*
+	 * There is a single cd structure for each address space, multiple
+	 * devices may use the same in different tables.
+	 */
+	cd->users++;
+	cd->pasid = pasid;
 	return arm_smmu_write_ctx_desc(tbl, pasid, cd);
 }
 
@@ -507,10 +577,16 @@ static void arm_smmu_clear_cd(struct iommu_pasid_table_ops *ops, int pasid,
 			      struct iommu_pasid_entry *entry)
 {
 	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
+	struct arm_smmu_cd *cd = pasid_entry_to_cd(entry);
 
 	if (WARN_ON(pasid > (1 << tbl->pasid.cfg.order)))
 		return;
 
+	WARN_ON(cd->pasid != pasid);
+
+	if (!(--cd->users))
+		cd->pasid = ARM_SMMU_NO_PASID;
+
 	arm_smmu_write_ctx_desc(tbl, pasid, NULL);
 
 	/*
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 21/37] iommu/arm-smmu-v3: Seize private ASID
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

The SMMU has a single ASID space, the union of shared and private ASID
sets. This means that the PASID lib competes with the arch allocator for
ASIDs. Shared ASIDs are those of Linux processes, allocated by the arch,
and participate in broadcast TLB maintenance. Private ASIDs are allocated
by the SMMU driver and used for "classic" map/unmap DMA. They require
explicit TLB invalidations.

When we pin down an mm_context and get an ASID that is already in use by
the SMMU, it belongs to a private context. We used to simply abort the
bind, but this is unfair to users, who would find themselves unable to bind
a few seemingly random processes. We can go one step further, by allocating
a new private ASID for the context currently using it and making the old
ASID shared.

Introduce a new lock to prevent races when rewriting context descriptors.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3-context.c | 88 ++++++++++++++++++++++++++++++++++---
 1 file changed, 82 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
index b7c90384ff56..5b8c5875e0d9 100644
--- a/drivers/iommu/arm-smmu-v3-context.c
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -82,6 +82,7 @@
 	(((tcr) >> ARM64_TCR_##fld##_SHIFT & ARM64_TCR_##fld##_MASK)	\
 	 << CTXDESC_CD_0_TCR_##fld##_SHIFT)
 
+#define ARM_SMMU_NO_PASID		(-1)
 
 struct arm_smmu_cd {
 	struct iommu_pasid_entry	entry;
@@ -90,8 +91,14 @@ struct arm_smmu_cd {
 	u64				tcr;
 	u64				mair;
 
+	int				pasid;
+
+	/* 'refs' tracks alloc/free */
 	refcount_t			refs;
+	/* 'users' tracks attach/detach, and is only used for sanity checking */
+	unsigned int			users;
 	struct mm_struct		*mm;
+	struct arm_smmu_cd_tables	*tbl;
 };
 
 #define pasid_entry_to_cd(entry) \
@@ -123,6 +130,7 @@ struct arm_smmu_cd_tables {
 #define pasid_ops_to_tables(ops) \
 	pasid_to_cd_tables(iommu_pasid_table_ops_to_table(ops))
 
+static DEFINE_SPINLOCK(contexts_lock);
 static DEFINE_SPINLOCK(asid_lock);
 static DEFINE_IDR(asid_idr);
 
@@ -209,8 +217,8 @@ static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
 	return val;
 }
 
-static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
-				    struct arm_smmu_cd *cd)
+static int __arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
+				     struct arm_smmu_cd *cd)
 {
 	u64 val;
 	bool cd_live;
@@ -284,6 +292,18 @@ static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
 	return 0;
 }
 
+static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
+				   struct arm_smmu_cd *cd)
+{
+	int ret;
+
+	spin_lock(&contexts_lock);
+	ret = __arm_smmu_write_ctx_desc(tbl, ssid, cd);
+	spin_unlock(&contexts_lock);
+
+	return ret;
+}
+
 static bool arm_smmu_free_asid(struct arm_smmu_cd *cd)
 {
 	bool free;
@@ -308,14 +328,25 @@ static struct arm_smmu_cd *arm_smmu_alloc_cd(struct arm_smmu_cd_tables *tbl)
 	if (!cd)
 		return NULL;
 
+	cd->pasid	= ARM_SMMU_NO_PASID;
+	cd->tbl		= tbl;
 	refcount_set(&cd->refs, 1);
 
 	return cd;
 }
 
+/*
+ * Try to reserve this ASID in the SMMU. If it is in use, try to steal it from
+ * the private entry. Careful here, we may be modifying the context tables of
+ * another SMMU!
+ */
 static struct arm_smmu_cd *arm_smmu_share_asid(u16 asid)
 {
+	int ret;
 	struct arm_smmu_cd *cd;
+	struct arm_smmu_cd_tables *tbl;
+	struct arm_smmu_context_cfg *cfg;
+	struct iommu_pasid_entry old_entry;
 
 	cd = idr_find(&asid_idr, asid);
 	if (!cd)
@@ -325,17 +356,47 @@ static struct arm_smmu_cd *arm_smmu_share_asid(u16 asid)
 		/*
 		 * It's pretty common to find a stale CD when doing unbind-bind,
 		 * given that the release happens after a RCU grace period.
-		 * Simply reuse it.
+		 * Simply reuse it, but check that it isn't active, because it's
+		 * going to be assigned a different PASID.
 		 */
+		if (WARN_ON(cd->users))
+			return ERR_PTR(-EINVAL);
+
 		refcount_inc(&cd->refs);
 		return cd;
 	}
 
+	tbl = cd->tbl;
+	cfg = &tbl->pasid.cfg.arm_smmu;
+
+	ret = idr_alloc_cyclic(&asid_idr, cd, 0, 1 << cfg->asid_bits,
+			       GFP_ATOMIC);
+	if (ret < 0)
+		return ERR_PTR(-ENOSPC);
+
+	/* Save the previous ASID */
+	old_entry = cd->entry;
+
+	/*
+	 * Race with unmap; TLB invalidations will start targeting the new ASID,
+	 * which isn't assigned yet. We'll do an invalidate-all on the old ASID
+	 * later, so it doesn't matter.
+	 */
+	cd->entry.tag = ret;
+
 	/*
-	 * Ouch, ASID is already in use for a private cd.
-	 * TODO: seize it, for the common good.
+	 * Update ASID and invalidate CD in all associated masters. There will
+	 * be some overlap between use of both ASIDs, until we invalidate the
+	 * TLB.
 	 */
-	return ERR_PTR(-EEXIST);
+	arm_smmu_write_ctx_desc(tbl, cd->pasid, cd);
+
+	/* Invalidate TLB entries previously associated with that context */
+	iommu_pasid_flush_tlbs(&tbl->pasid, cd->pasid, &old_entry);
+
+	idr_remove(&asid_idr, asid);
+
+	return NULL;
 }
 
 static struct iommu_pasid_entry *
@@ -500,6 +561,15 @@ static int arm_smmu_set_cd(struct iommu_pasid_table_ops *ops, int pasid,
 	if (WARN_ON(pasid > (1 << tbl->pasid.cfg.order)))
 		return -EINVAL;
 
+	if (WARN_ON(cd->pasid != ARM_SMMU_NO_PASID && cd->pasid != pasid))
+		return -EEXIST;
+
+	/*
+	 * There is a single cd structure for each address space, multiple
+	 * devices may use the same in different tables.
+	 */
+	cd->users++;
+	cd->pasid = pasid;
 	return arm_smmu_write_ctx_desc(tbl, pasid, cd);
 }
 
@@ -507,10 +577,16 @@ static void arm_smmu_clear_cd(struct iommu_pasid_table_ops *ops, int pasid,
 			      struct iommu_pasid_entry *entry)
 {
 	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
+	struct arm_smmu_cd *cd = pasid_entry_to_cd(entry);
 
 	if (WARN_ON(pasid > (1 << tbl->pasid.cfg.order)))
 		return;
 
+	WARN_ON(cd->pasid != pasid);
+
+	if (!(--cd->users))
+		cd->pasid = ARM_SMMU_NO_PASID;
+
 	arm_smmu_write_ctx_desc(tbl, pasid, NULL);
 
 	/*
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 22/37] iommu/arm-smmu-v3: Add support for VHE
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

The ARMv8.1 extensions added Virtualization Host Extensions (VHE), which
allow a host kernel to run at EL2. When using normal DMA, device and CPU
address spaces are dissociated and do not need to implement the same
capabilities, so VHE hasn't been used in the SMMU until now.

With shared address spaces however, ASIDs are shared between MMU and SMMU,
and broadcast TLB invalidations issued by a CPU are taken into account by
the SMMU. TLB entries on both sides need to have identical exception level
in order to be cleared with a single invalidation.

When the CPU is using VHE, enable VHE in the SMMU for all STEs. Normal DMA
mappings will need to use TLBI_EL2 commands instead of TLBI_NH, but
shouldn't be otherwise affected by this change.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index a307c6885dc0..1ca40ef51c47 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -22,6 +22,7 @@
 
 #include <linux/acpi.h>
 #include <linux/acpi_iort.h>
+#include <linux/cpufeature.h>
 #include <linux/delay.h>
 #include <linux/dma-iommu.h>
 #include <linux/err.h>
@@ -446,6 +447,8 @@ struct arm_smmu_cmdq_ent {
 		#define CMDQ_OP_TLBI_NH_ASID	0x11
 		#define CMDQ_OP_TLBI_NH_VA	0x12
 		#define CMDQ_OP_TLBI_EL2_ALL	0x20
+		#define CMDQ_OP_TLBI_EL2_ASID	0x21
+		#define CMDQ_OP_TLBI_EL2_VA	0x22
 		#define CMDQ_OP_TLBI_S12_VMALL	0x28
 		#define CMDQ_OP_TLBI_S2_IPA	0x2a
 		#define CMDQ_OP_TLBI_NSNH_ALL	0x30
@@ -564,6 +567,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_STALLS		(1 << 11)
 #define ARM_SMMU_FEAT_HYP		(1 << 12)
 #define ARM_SMMU_FEAT_STALL_FORCE	(1 << 13)
+#define ARM_SMMU_FEAT_E2H		(1 << 14)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -823,6 +827,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		cmd[1] |= CMDQ_CFGI_1_RANGE_MASK << CMDQ_CFGI_1_RANGE_SHIFT;
 		break;
 	case CMDQ_OP_TLBI_NH_VA:
+	case CMDQ_OP_TLBI_EL2_VA:
 		cmd[0] |= (u64)ent->tlbi.asid << CMDQ_TLBI_0_ASID_SHIFT;
 		cmd[1] |= ent->tlbi.leaf ? CMDQ_TLBI_1_LEAF : 0;
 		cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK;
@@ -838,6 +843,9 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 	case CMDQ_OP_TLBI_S12_VMALL:
 		cmd[0] |= (u64)ent->tlbi.vmid << CMDQ_TLBI_0_VMID_SHIFT;
 		break;
+	case CMDQ_OP_TLBI_EL2_ASID:
+		cmd[0] |= (u64)ent->tlbi.asid << CMDQ_TLBI_0_ASID_SHIFT;
+		break;
 	case CMDQ_OP_PRI_RESP:
 		cmd[0] |= ent->substream_valid ? CMDQ_0_SSV : 0;
 		cmd[0] |= ent->pri.ssid << CMDQ_PRI_0_SSID_SHIFT;
@@ -1130,7 +1138,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 #ifdef CONFIG_PCI_ATS
 			 STRTAB_STE_1_EATS_TRANS << STRTAB_STE_1_EATS_SHIFT |
 #endif
-			 STRTAB_STE_1_STRW_NSEL1 << STRTAB_STE_1_STRW_SHIFT);
+			 (smmu->features & ARM_SMMU_FEAT_E2H ?
+			  STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1) <<
+			 STRTAB_STE_1_STRW_SHIFT);
 
 		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
 		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
@@ -1386,7 +1396,8 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		if (unlikely(!smmu_domain->s1_cfg.cd0))
 			return;
-		cmd.opcode	= CMDQ_OP_TLBI_NH_ASID;
+		cmd.opcode	= smmu->features & ARM_SMMU_FEAT_E2H ?
+				  CMDQ_OP_TLBI_EL2_ASID : CMDQ_OP_TLBI_NH_ASID;
 		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd0->tag;
 		cmd.tlbi.vmid	= 0;
 	} else {
@@ -1413,7 +1424,8 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		if (unlikely(!smmu_domain->s1_cfg.cd0))
 			return;
-		cmd.opcode	= CMDQ_OP_TLBI_NH_VA;
+		cmd.opcode	= smmu->features & ARM_SMMU_FEAT_E2H ?
+				  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
 		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd0->tag;
 	} else {
 		cmd.opcode	= CMDQ_OP_TLBI_S2_IPA;
@@ -1483,7 +1495,8 @@ static void arm_smmu_tlb_inv_ssid(void *cookie, int ssid,
 	struct arm_smmu_domain *smmu_domain = cookie;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_cmdq_ent cmd = {
-		.opcode		= CMDQ_OP_TLBI_NH_ASID,
+		.opcode		= smmu->features & ARM_SMMU_FEAT_E2H ?
+				  CMDQ_OP_TLBI_EL2_ASID : CMDQ_OP_TLBI_NH_ASID,
 		.tlbi.asid	= entry->tag,
 	};
 
@@ -2500,7 +2513,11 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR1);
 
 	/* CR2 (random crap) */
-	reg = CR2_PTM | CR2_RECINVSID | CR2_E2H;
+	reg = CR2_PTM | CR2_RECINVSID;
+
+	if (smmu->features & ARM_SMMU_FEAT_E2H)
+		reg |= CR2_E2H;
+
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR2);
 
 	/* Stream table */
@@ -2648,8 +2665,11 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 	if (reg & IDR0_MSI)
 		smmu->features |= ARM_SMMU_FEAT_MSI;
 
-	if (reg & IDR0_HYP)
+	if (reg & IDR0_HYP) {
 		smmu->features |= ARM_SMMU_FEAT_HYP;
+		if (cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN))
+			smmu->features |= ARM_SMMU_FEAT_E2H;
+	}
 
 	/*
 	 * The coherency feature as set by FW is used in preference to the ID
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 317+ messages in thread
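
The exception-level rule described above boils down to picking the EL2
flavour of each by-ASID or by-VA invalidation once the SMMU runs with E2H
set, so that the command matches the exception level of the TLB entries it
must remove. A minimal sketch of that selection, assuming a hypothetical
helper with access to the feature bits (the helper name is made up; the
opcodes and ARM_SMMU_FEAT_E2H are the ones added by this patch):

/* Illustrative only: select the by-ASID TLBI opcode for this SMMU */
static u8 example_tlbi_asid_opcode(u32 features)
{
	return (features & ARM_SMMU_FEAT_E2H) ?
	       CMDQ_OP_TLBI_EL2_ASID : CMDQ_OP_TLBI_NH_ASID;
}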

* [PATCH 23/37] iommu/arm-smmu-v3: Enable broadcast TLB maintenance
  2018-02-12 18:33 ` Jean-Philippe Brucker
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

The SMMUv3 can handle invalidations targeted at TLB entries with shared
ASIDs. If the implementation supports broadcast TLB maintenance, enable it
and keep track of it in a feature bit. The SMMU will then be affected by
inner-shareable TLB invalidations from other agents.

A major side effect of this change is that stage-2 translation contexts
are now affected by all invalidations by VMID. VMIDs are all shared and,
since the stage-2 page tables are not shared between CPU and SMMU, the only
ways to prevent over-invalidation are to either disable BTM or allocate
different VMIDs. This patch does not address the problem.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 1ca40ef51c47..98690589156b 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -65,6 +65,7 @@
 #define IDR0_ASID16			(1 << 12)
 #define IDR0_ATS			(1 << 10)
 #define IDR0_HYP			(1 << 9)
+#define IDR0_BTM			(1 << 5)
 #define IDR0_COHACC			(1 << 4)
 #define IDR0_TTF_SHIFT			2
 #define IDR0_TTF_MASK			0x3
@@ -568,6 +569,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_HYP		(1 << 12)
 #define ARM_SMMU_FEAT_STALL_FORCE	(1 << 13)
 #define ARM_SMMU_FEAT_E2H		(1 << 14)
+#define ARM_SMMU_FEAT_BTM		(1 << 15)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -2513,11 +2515,14 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR1);
 
 	/* CR2 (random crap) */
-	reg = CR2_PTM | CR2_RECINVSID;
+	reg = CR2_RECINVSID;
 
 	if (smmu->features & ARM_SMMU_FEAT_E2H)
 		reg |= CR2_E2H;
 
+	if (!(smmu->features & ARM_SMMU_FEAT_BTM))
+		reg |= CR2_PTM;
+
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR2);
 
 	/* Stream table */
@@ -2618,6 +2623,7 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
 	u32 reg;
 	bool coherent = smmu->features & ARM_SMMU_FEAT_COHERENCY;
+	bool vhe = cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN);
 
 	/* IDR0 */
 	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR0);
@@ -2667,10 +2673,19 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 
 	if (reg & IDR0_HYP) {
 		smmu->features |= ARM_SMMU_FEAT_HYP;
-		if (cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN))
+		if (vhe)
 			smmu->features |= ARM_SMMU_FEAT_E2H;
 	}
 
+	/*
+	 * If the CPU is using VHE, but the SMMU doesn't support it, the SMMU
+	 * will create TLB entries for NH-EL1 world and will miss the
+	 * broadcasted TLB invalidations that target EL2-E2H world. Don't enable
+	 * BTM in that case.
+	 */
+	if (reg & IDR0_BTM && (!vhe || reg & IDR0_HYP))
+		smmu->features |= ARM_SMMU_FEAT_BTM;
+
 	/*
 	 * The coherency feature as set by FW is used in preference to the ID
 	 * register, but warn on mismatch.
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 317+ messages in thread
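
The gating logic can be restated as a small predicate: broadcast TLB
maintenance is only useful if the SMMU implements it and, when the CPU runs
with VHE, if the SMMU also supports HYP so its TLB entries live in the same
EL2-E2H world as the CPU's broadcast invalidations. A sketch under those
assumptions (the helper name is made up; IDR0_BTM and IDR0_HYP are the
register fields used by this patch):

/* Illustrative only: decide whether BTM can be left enabled */
static bool example_can_use_btm(u32 idr0, bool cpu_uses_vhe)
{
	if (!(idr0 & IDR0_BTM))
		return false;

	/* A VHE CPU broadcasts EL2-E2H invalidations; the SMMU must match */
	return !cpu_uses_vhe || (idr0 & IDR0_HYP);
}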

* [PATCH 24/37] iommu/arm-smmu-v3: Add SVA feature checking
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

Aggregate all sanity checks for sharing CPU page tables with the SMMU
under a single ARM_SMMU_FEAT_SVA bit. For PCIe SVA, users also need to
check FEAT_ATS and FEAT_PRI. For platform SVA, they will most likely have
to check FEAT_STALLS.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 60 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 98690589156b..79bc5b5cceed 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -570,6 +570,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_STALL_FORCE	(1 << 13)
 #define ARM_SMMU_FEAT_E2H		(1 << 14)
 #define ARM_SMMU_FEAT_BTM		(1 << 15)
+#define ARM_SMMU_FEAT_SVA		(1 << 16)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -2619,6 +2620,62 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	return 0;
 }
 
+static bool arm_smmu_supports_sva(struct arm_smmu_device *smmu)
+{
+	unsigned long reg, fld;
+	unsigned long oas;
+	unsigned long asid_bits;
+
+	u32 feat_mask = ARM_SMMU_FEAT_BTM | ARM_SMMU_FEAT_COHERENCY;
+
+	if ((smmu->features & feat_mask) != feat_mask)
+		return false;
+
+	if (!(smmu->pgsize_bitmap & PAGE_SIZE))
+		return false;
+
+	/*
+	 * Get the smallest PA size of all CPUs (sanitized by cpufeature). We're
+	 * not even pretending to support AArch32 here.
+	 */
+	reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
+	fld = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_PARANGE_SHIFT);
+	switch (fld) {
+	case 0x0:
+		oas = 32;
+		break;
+	case 0x1:
+		oas = 36;
+		break;
+	case 0x2:
+		oas = 40;
+		break;
+	case 0x3:
+		oas = 42;
+		break;
+	case 0x4:
+		oas = 44;
+		break;
+	case 0x5:
+		oas = 48;
+		break;
+	default:
+		return false;
+	}
+
+	/* abort if MMU outputs addresses greater than what we support. */
+	if (smmu->oas < oas)
+		return false;
+
+	/* We can support bigger ASIDs than the CPU, but not smaller */
+	fld = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_ASID_SHIFT);
+	asid_bits = fld ? 16 : 8;
+	if (smmu->asid_bits < asid_bits)
+		return false;
+
+	return true;
+}
+
 static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
 	u32 reg;
@@ -2813,6 +2870,9 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 
 	smmu->ias = max(smmu->ias, smmu->oas);
 
+	if (arm_smmu_supports_sva(smmu))
+		smmu->features |= ARM_SMMU_FEAT_SVA;
+
 	dev_info(smmu->dev, "ias %lu-bit, oas %lu-bit (features 0x%08x)\n",
 		 smmu->ias, smmu->oas, smmu->features);
 	return 0;
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread
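
ARM_SMMU_FEAT_SVA only captures the SMMU-side requirements; as noted above,
callers still have to check ATS/PRI or stall support depending on the
device type. A rough sketch of that caller-side test, assuming the caller
can see the feature bits (example_device_can_use_sva() is made up; the
ARM_SMMU_FEAT_* flags are the driver's existing ones):

/* Illustrative only: can this device be bound to a process address space? */
static bool example_device_can_use_sva(u32 features, bool is_pci)
{
	/* Page table sharing itself must be supported */
	if (!(features & ARM_SMMU_FEAT_SVA))
		return false;

	/* PCIe endpoints handle faults with ATS + PRI */
	if (is_pci)
		return (features & ARM_SMMU_FEAT_ATS) &&
		       (features & ARM_SMMU_FEAT_PRI);

	/* Platform devices rely on stalling faulted transactions */
	return !!(features & ARM_SMMU_FEAT_STALLS);
}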

* [PATCH 25/37] iommu/arm-smmu-v3: Implement mm operations
  2018-02-12 18:33 ` Jean-Philippe Brucker
  (?)
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: mark.rutland, ilias.apalodimas, catalin.marinas, xuzaibo,
	will.deacon, okaya, ashok.raj, bharatku, rfranz, lenb, robh+dt,
	bhelgaas, shunyong.yang, dwmw2, rjw, sudeep.holla, christian.koenig

Hook mm operations to support PASID and page table sharing with the
SMMUv3:

* mm_alloc allocates a context descriptor.
* mm_free releases the context descriptor.
* mm_attach checks device capabilities and writes the context descriptor.
* mm_detach clears the context descriptor and sends required
  invalidations.
* mm_invalidate sends required invalidations.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/Kconfig       |   1 +
 drivers/iommu/arm-smmu-v3.c | 131 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 132 insertions(+)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 4b272925ee78..d434f7085dc2 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -353,6 +353,7 @@ config ARM_SMMU_V3
 	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
 	depends on ARM64
 	select IOMMU_API
+	select IOMMU_SVA
 	select IOMMU_IO_PGTABLE_LPAE
 	select ARM_SMMU_V3_CONTEXT
 	select GENERIC_MSI_IRQ_DOMAIN
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 79bc5b5cceed..1cdeea7e22cb 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -29,6 +29,7 @@
 #include <linux/interrupt.h>
 #include <linux/iommu.h>
 #include <linux/iopoll.h>
+#include <linux/mmu_context.h>
 #include <linux/module.h>
 #include <linux/msi.h>
 #include <linux/of.h>
@@ -37,6 +38,7 @@
 #include <linux/of_platform.h>
 #include <linux/pci.h>
 #include <linux/platform_device.h>
+#include <linux/sched/mm.h>
 
 #include <linux/amba/bus.h>
 
@@ -617,6 +619,7 @@ struct arm_smmu_master_data {
 	struct device			*dev;
 
 	size_t				ssid_bits;
+	bool				can_fault;
 };
 
 /* SMMU private data for an IOMMU domain */
@@ -645,6 +648,13 @@ struct arm_smmu_domain {
 	spinlock_t			devices_lock;
 };
 
+struct arm_smmu_mm {
+	struct io_mm			io_mm;
+	struct iommu_pasid_entry	*cd;
+	/* Only for release ! */
+	struct iommu_pasid_table_ops	*ops;
+};
+
 struct arm_smmu_option_prop {
 	u32 opt;
 	const char *prop;
@@ -671,6 +681,11 @@ static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
 	return container_of(dom, struct arm_smmu_domain, domain);
 }
 
+static struct arm_smmu_mm *to_smmu_mm(struct io_mm *io_mm)
+{
+	return container_of(io_mm, struct arm_smmu_mm, io_mm);
+}
+
 static void parse_driver_options(struct arm_smmu_device *smmu)
 {
 	int i = 0;
@@ -1785,6 +1800,8 @@ static void arm_smmu_detach_dev(struct device *dev)
 	struct arm_smmu_domain *smmu_domain = master->domain;
 
 	if (smmu_domain) {
+		__iommu_sva_unbind_dev_all(dev);
+
 		spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 		list_del(&master->list);
 		spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
@@ -1902,6 +1919,113 @@ arm_smmu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
 	return ops->iova_to_phys(ops, iova);
 }
 
+static int arm_smmu_sva_init(struct device *dev, unsigned long features,
+			     unsigned int *min_pasid, unsigned int *max_pasid)
+{
+	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
+
+	if (features & IOMMU_SVA_FEAT_IOPF && !master->can_fault)
+		return -EINVAL;
+
+	if (features & IOMMU_SVA_FEAT_PASID && !master->ssid_bits)
+		return -EINVAL;
+
+	if (!*max_pasid)
+		*max_pasid = 0xfffffU;
+
+	/* SSID support in the SMMU requires at least one SSID bit */
+	*min_pasid = max(*min_pasid, 1U);
+	*max_pasid = min(*max_pasid, (1U << master->ssid_bits) - 1);
+
+	return 0;
+}
+
+static void arm_smmu_sva_shutdown(struct device *dev)
+{
+}
+
+static struct io_mm *arm_smmu_mm_alloc(struct iommu_domain *domain,
+				       struct mm_struct *mm)
+{
+	struct arm_smmu_mm *smmu_mm;
+	struct iommu_pasid_entry *cd;
+	struct iommu_pasid_table_ops *ops;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+		return NULL;
+
+	smmu_mm = kzalloc(sizeof(*smmu_mm), GFP_KERNEL);
+	if (!smmu_mm)
+		return NULL;
+
+	smmu_mm->ops = ops = smmu_domain->s1_cfg.ops;
+	cd = ops->alloc_shared_entry(ops, mm);
+	if (IS_ERR(cd)) {
+		kfree(smmu_mm);
+		return ERR_CAST(cd);
+	}
+
+	smmu_mm->cd = cd;
+	return &smmu_mm->io_mm;
+}
+
+static void arm_smmu_mm_free(struct io_mm *io_mm)
+{
+	struct arm_smmu_mm *smmu_mm = to_smmu_mm(io_mm);
+
+	smmu_mm->ops->free_entry(smmu_mm->ops, smmu_mm->cd);
+	kfree(smmu_mm);
+}
+
+static int arm_smmu_mm_attach(struct iommu_domain *domain, struct device *dev,
+			      struct io_mm *io_mm, bool attach_domain)
+{
+	struct arm_smmu_mm *smmu_mm = to_smmu_mm(io_mm);
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct iommu_pasid_table_ops *ops = smmu_domain->s1_cfg.ops;
+	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+		return -EINVAL;
+
+	if (!(master->smmu->features & ARM_SMMU_FEAT_SVA))
+		return -ENODEV;
+
+	/* TODO: io_mm->no_need_for_pri_ill_pin_everything */
+	if (!master->can_fault)
+		return -ENODEV;
+
+	if (!attach_domain)
+		return 0;
+
+	return ops->set_entry(ops, io_mm->pasid, smmu_mm->cd);
+}
+
+static void arm_smmu_mm_detach(struct iommu_domain *domain, struct device *dev,
+			       struct io_mm *io_mm, bool detach_domain)
+{
+	struct arm_smmu_mm *smmu_mm = to_smmu_mm(io_mm);
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct iommu_pasid_table_ops *ops = smmu_domain->s1_cfg.ops;
+
+	if (detach_domain)
+		ops->clear_entry(ops, io_mm->pasid, smmu_mm->cd);
+
+	/* TODO: Invalidate ATC. */
+	/* TODO: Invalidate all mappings if last and not DVM. */
+}
+
+static void arm_smmu_mm_invalidate(struct iommu_domain *domain,
+				   struct device *dev, struct io_mm *io_mm,
+				   unsigned long iova, size_t size)
+{
+	/*
+	 * TODO: Invalidate ATC.
+	 * TODO: Invalidate mapping if not DVM
+	 */
+}
+
 static struct platform_driver arm_smmu_driver;
 
 static int arm_smmu_match_node(struct device *dev, void *data)
@@ -2108,6 +2232,13 @@ static struct iommu_ops arm_smmu_ops = {
 	.domain_alloc		= arm_smmu_domain_alloc,
 	.domain_free		= arm_smmu_domain_free,
 	.attach_dev		= arm_smmu_attach_dev,
+	.sva_device_init	= arm_smmu_sva_init,
+	.sva_device_shutdown	= arm_smmu_sva_shutdown,
+	.mm_alloc		= arm_smmu_mm_alloc,
+	.mm_free		= arm_smmu_mm_free,
+	.mm_attach		= arm_smmu_mm_attach,
+	.mm_detach		= arm_smmu_mm_detach,
+	.mm_invalidate		= arm_smmu_mm_invalidate,
 	.map			= arm_smmu_map,
 	.unmap			= arm_smmu_unmap,
 	.map_sg			= default_iommu_map_sg,
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 25/37] iommu/arm-smmu-v3: Implement mm operations
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

Hook mm operations to support PASID and page table sharing with the
SMMUv3:

* mm_alloc allocates a context descriptor.
* mm_free releases the context descriptor.
* mm_attach checks device capabilities and writes the context descriptor.
* mm_detach clears the context descriptor and sends required
  invalidations.
* mm_invalidate sends required invalidations.
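
For orientation, here is a rough sketch (illustration only, not part of the
patch) of the order in which the SVA core is expected to call these
operations when binding a process to a device. The helper name is made up
and error handling is elided:

	/* Hypothetical caller, illustrating the assumed call order. */
	static int example_sva_bind(struct iommu_domain *domain,
				    struct device *dev, struct mm_struct *mm)
	{
		struct io_mm *io_mm;

		/* mm_alloc: build a context descriptor for this mm */
		io_mm = domain->ops->mm_alloc(domain, mm);
		if (IS_ERR_OR_NULL(io_mm))
			return io_mm ? PTR_ERR(io_mm) : -ENOMEM;

		/*
		 * mm_attach: check device capabilities and, when this is the
		 * first bind for the domain (attach_domain == true), install
		 * the context descriptor at io_mm->pasid.
		 */
		return domain->ops->mm_attach(domain, dev, io_mm, true);
	}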

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/Kconfig       |   1 +
 drivers/iommu/arm-smmu-v3.c | 131 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 132 insertions(+)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 4b272925ee78..d434f7085dc2 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -353,6 +353,7 @@ config ARM_SMMU_V3
 	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
 	depends on ARM64
 	select IOMMU_API
+	select IOMMU_SVA
 	select IOMMU_IO_PGTABLE_LPAE
 	select ARM_SMMU_V3_CONTEXT
 	select GENERIC_MSI_IRQ_DOMAIN
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 79bc5b5cceed..1cdeea7e22cb 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -29,6 +29,7 @@
 #include <linux/interrupt.h>
 #include <linux/iommu.h>
 #include <linux/iopoll.h>
+#include <linux/mmu_context.h>
 #include <linux/module.h>
 #include <linux/msi.h>
 #include <linux/of.h>
@@ -37,6 +38,7 @@
 #include <linux/of_platform.h>
 #include <linux/pci.h>
 #include <linux/platform_device.h>
+#include <linux/sched/mm.h>
 
 #include <linux/amba/bus.h>
 
@@ -617,6 +619,7 @@ struct arm_smmu_master_data {
 	struct device			*dev;
 
 	size_t				ssid_bits;
+	bool				can_fault;
 };
 
 /* SMMU private data for an IOMMU domain */
@@ -645,6 +648,13 @@ struct arm_smmu_domain {
 	spinlock_t			devices_lock;
 };
 
+struct arm_smmu_mm {
+	struct io_mm			io_mm;
+	struct iommu_pasid_entry	*cd;
+	/* Only for release ! */
+	struct iommu_pasid_table_ops	*ops;
+};
+
 struct arm_smmu_option_prop {
 	u32 opt;
 	const char *prop;
@@ -671,6 +681,11 @@ static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
 	return container_of(dom, struct arm_smmu_domain, domain);
 }
 
+static struct arm_smmu_mm *to_smmu_mm(struct io_mm *io_mm)
+{
+	return container_of(io_mm, struct arm_smmu_mm, io_mm);
+}
+
 static void parse_driver_options(struct arm_smmu_device *smmu)
 {
 	int i = 0;
@@ -1785,6 +1800,8 @@ static void arm_smmu_detach_dev(struct device *dev)
 	struct arm_smmu_domain *smmu_domain = master->domain;
 
 	if (smmu_domain) {
+		__iommu_sva_unbind_dev_all(dev);
+
 		spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 		list_del(&master->list);
 		spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
@@ -1902,6 +1919,113 @@ arm_smmu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
 	return ops->iova_to_phys(ops, iova);
 }
 
+static int arm_smmu_sva_init(struct device *dev, unsigned long features,
+			     unsigned int *min_pasid, unsigned int *max_pasid)
+{
+	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
+
+	if (features & IOMMU_SVA_FEAT_IOPF && !master->can_fault)
+		return -EINVAL;
+
+	if (features & IOMMU_SVA_FEAT_PASID && !master->ssid_bits)
+		return -EINVAL;
+
+	if (!*max_pasid)
+		*max_pasid = 0xfffffU;
+
+	/* SSID support in the SMMU requires at least one SSID bit */
+	*min_pasid = max(*min_pasid, 1U);
+	*max_pasid = min(*max_pasid, (1U << master->ssid_bits) - 1);
+
+	return 0;
+}
+
+static void arm_smmu_sva_shutdown(struct device *dev)
+{
+}
+
+static struct io_mm *arm_smmu_mm_alloc(struct iommu_domain *domain,
+				       struct mm_struct *mm)
+{
+	struct arm_smmu_mm *smmu_mm;
+	struct iommu_pasid_entry *cd;
+	struct iommu_pasid_table_ops *ops;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+		return NULL;
+
+	smmu_mm = kzalloc(sizeof(*smmu_mm), GFP_KERNEL);
+	if (!smmu_mm)
+		return NULL;
+
+	smmu_mm->ops = ops = smmu_domain->s1_cfg.ops;
+	cd = ops->alloc_shared_entry(ops, mm);
+	if (IS_ERR(cd)) {
+		kfree(smmu_mm);
+		return ERR_CAST(cd);
+	}
+
+	smmu_mm->cd = cd;
+	return &smmu_mm->io_mm;
+}
+
+static void arm_smmu_mm_free(struct io_mm *io_mm)
+{
+	struct arm_smmu_mm *smmu_mm = to_smmu_mm(io_mm);
+
+	smmu_mm->ops->free_entry(smmu_mm->ops, smmu_mm->cd);
+	kfree(smmu_mm);
+}
+
+static int arm_smmu_mm_attach(struct iommu_domain *domain, struct device *dev,
+			      struct io_mm *io_mm, bool attach_domain)
+{
+	struct arm_smmu_mm *smmu_mm = to_smmu_mm(io_mm);
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct iommu_pasid_table_ops *ops = smmu_domain->s1_cfg.ops;
+	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+		return -EINVAL;
+
+	if (!(master->smmu->features & ARM_SMMU_FEAT_SVA))
+		return -ENODEV;
+
+	/* TODO: io_mm->no_need_for_pri_ill_pin_everything */
+	if (!master->can_fault)
+		return -ENODEV;
+
+	if (!attach_domain)
+		return 0;
+
+	return ops->set_entry(ops, io_mm->pasid, smmu_mm->cd);
+}
+
+static void arm_smmu_mm_detach(struct iommu_domain *domain, struct device *dev,
+			       struct io_mm *io_mm, bool detach_domain)
+{
+	struct arm_smmu_mm *smmu_mm = to_smmu_mm(io_mm);
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct iommu_pasid_table_ops *ops = smmu_domain->s1_cfg.ops;
+
+	if (detach_domain)
+		ops->clear_entry(ops, io_mm->pasid, smmu_mm->cd);
+
+	/* TODO: Invalidate ATC. */
+	/* TODO: Invalidate all mappings if last and not DVM. */
+}
+
+static void arm_smmu_mm_invalidate(struct iommu_domain *domain,
+				   struct device *dev, struct io_mm *io_mm,
+				   unsigned long iova, size_t size)
+{
+	/*
+	 * TODO: Invalidate ATC.
+	 * TODO: Invalidate mapping if not DVM
+	 */
+}
+
 static struct platform_driver arm_smmu_driver;
 
 static int arm_smmu_match_node(struct device *dev, void *data)
@@ -2108,6 +2232,13 @@ static struct iommu_ops arm_smmu_ops = {
 	.domain_alloc		= arm_smmu_domain_alloc,
 	.domain_free		= arm_smmu_domain_free,
 	.attach_dev		= arm_smmu_attach_dev,
+	.sva_device_init	= arm_smmu_sva_init,
+	.sva_device_shutdown	= arm_smmu_sva_shutdown,
+	.mm_alloc		= arm_smmu_mm_alloc,
+	.mm_free		= arm_smmu_mm_free,
+	.mm_attach		= arm_smmu_mm_attach,
+	.mm_detach		= arm_smmu_mm_detach,
+	.mm_invalidate		= arm_smmu_mm_invalidate,
 	.map			= arm_smmu_map,
 	.unmap			= arm_smmu_unmap,
 	.map_sg			= default_iommu_map_sg,
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 26/37] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update
  2018-02-12 18:33 ` Jean-Philippe Brucker
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

If the SMMU supports it and the kernel was built with HTTU support, enable
hardware update of access and dirty flags. This is essential for shared
page tables, to reduce the number of access faults on the fault queue.

We can still enable HTTU even if the CPUs don't support it, because the
kernel always checks the hardware dirty bit and updates the PTE flags
atomically.
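
To make the new defines concrete, here is a hand-written equivalent
(illustration only, not part of the patch) of the HA/HD handling that
arm_smmu_cpu_tcr_to_cd() performs via ARM_SMMU_TCR2CD, ignoring the
cfg->hw_access/hw_dirty gating applied by the real code. TCR.HA (bit 39)
and TCR.HD (bit 40) end up as bits 43 and 42 of CD word 0:

	/* Illustration only, using the defines added below. */
	static u64 example_httu_tcr_to_cd(u64 tcr)
	{
		u64 val = 0;

		if ((tcr >> ARM64_TCR_HA_SHIFT) & ARM64_TCR_HA_MASK)
			val |= CTXDESC_CD_0_HA;		/* CD[43] */
		if ((tcr >> ARM64_TCR_HD_SHIFT) & ARM64_TCR_HD_MASK)
			val |= CTXDESC_CD_0_HD;		/* CD[42] */

		return val;
	}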

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3-context.c | 20 ++++++++++++++++++--
 drivers/iommu/arm-smmu-v3.c         | 12 ++++++++++++
 drivers/iommu/iommu-pasid.h         |  4 ++++
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
index 5b8c5875e0d9..eaeba1bec2e9 100644
--- a/drivers/iommu/arm-smmu-v3-context.c
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -62,7 +62,16 @@
 #define ARM64_TCR_TBI0_SHIFT		37
 #define ARM64_TCR_TBI0_MASK		0x1UL
 
+#define ARM64_TCR_HA_SHIFT		39
+#define ARM64_TCR_HA_MASK		0x1UL
+#define ARM64_TCR_HD_SHIFT		40
+#define ARM64_TCR_HD_MASK		0x1UL
+
 #define CTXDESC_CD_0_AA64		(1UL << 41)
+#define CTXDESC_CD_0_TCR_HD_SHIFT	42
+#define CTXDESC_CD_0_TCR_HA_SHIFT	43
+#define CTXDESC_CD_0_HD			(1UL << CTXDESC_CD_0_TCR_HD_SHIFT)
+#define CTXDESC_CD_0_HA			(1UL << CTXDESC_CD_0_TCR_HA_SHIFT)
 #define CTXDESC_CD_0_S			(1UL << 44)
 #define CTXDESC_CD_0_R			(1UL << 45)
 #define CTXDESC_CD_0_A			(1UL << 46)
@@ -199,7 +208,7 @@ static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_cd_tables *tbl, u32 ssid)
 	return l1_desc->ptr + idx * CTXDESC_CD_DWORDS;
 }
 
-static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
+static u64 arm_smmu_cpu_tcr_to_cd(struct arm_smmu_context_cfg *cfg, u64 tcr)
 {
 	u64 val = 0;
 
@@ -214,6 +223,12 @@ static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
 	val |= ARM_SMMU_TCR2CD(tcr, IPS);
 	val |= ARM_SMMU_TCR2CD(tcr, TBI0);
 
+	if (cfg->hw_access)
+		val |= ARM_SMMU_TCR2CD(tcr, HA);
+
+	if (cfg->hw_dirty)
+		val |= ARM_SMMU_TCR2CD(tcr, HD);
+
 	return val;
 }
 
@@ -269,7 +284,7 @@ static int __arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
 		iommu_pasid_flush(&tbl->pasid, ssid, true);
 
 
-		val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
+		val = arm_smmu_cpu_tcr_to_cd(cfg, cd->tcr) |
 #ifdef __BIG_ENDIAN
 		      CTXDESC_CD_0_ENDI |
 #endif
@@ -460,6 +475,7 @@ arm_smmu_alloc_shared_cd(struct iommu_pasid_table_ops *ops, struct mm_struct *mm
 	reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
 	par = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_PARANGE_SHIFT);
 	tcr |= par << ARM_LPAE_TCR_IPS_SHIFT;
+	tcr |= TCR_HA | TCR_HD;
 
 	cd->ttbr	= virt_to_phys(mm->pgd);
 	cd->tcr		= tcr;
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 1cdeea7e22cb..8528704627b5 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -67,6 +67,8 @@
 #define IDR0_ASID16			(1 << 12)
 #define IDR0_ATS			(1 << 10)
 #define IDR0_HYP			(1 << 9)
+#define IDR0_HD				(1 << 7)
+#define IDR0_HA				(1 << 6)
 #define IDR0_BTM			(1 << 5)
 #define IDR0_COHACC			(1 << 4)
 #define IDR0_TTF_SHIFT			2
@@ -573,6 +575,8 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_E2H		(1 << 14)
 #define ARM_SMMU_FEAT_BTM		(1 << 15)
 #define ARM_SMMU_FEAT_SVA		(1 << 16)
+#define ARM_SMMU_FEAT_HA		(1 << 17)
+#define ARM_SMMU_FEAT_HD		(1 << 18)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -1631,6 +1635,8 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 		.arm_smmu = {
 			.stall		= !!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE),
 			.asid_bits	= smmu->asid_bits,
+			.hw_access	= !!(smmu->features & ARM_SMMU_FEAT_HA),
+			.hw_dirty	= !!(smmu->features & ARM_SMMU_FEAT_HD),
 		},
 	};
 
@@ -2865,6 +2871,12 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 			smmu->features |= ARM_SMMU_FEAT_E2H;
 	}
 
+	if (reg & (IDR0_HA | IDR0_HD)) {
+		smmu->features |= ARM_SMMU_FEAT_HA;
+		if (reg & IDR0_HD)
+			smmu->features |= ARM_SMMU_FEAT_HD;
+	}
+
 	/*
 	 * If the CPU is using VHE, but the SMMU doesn't support it, the SMMU
 	 * will create TLB entries for NH-EL1 world and will miss the
diff --git a/drivers/iommu/iommu-pasid.h b/drivers/iommu/iommu-pasid.h
index 77e449a1655b..46fd44e7f4f1 100644
--- a/drivers/iommu/iommu-pasid.h
+++ b/drivers/iommu/iommu-pasid.h
@@ -79,6 +79,8 @@ struct iommu_pasid_sync_ops {
  *
  * SMMU properties:
  * @stall:	devices attached to the domain are allowed to stall.
+ * @hw_dirty:	hardware may update dirty flag
+ * @hw_access:	hardware may update access flag
  * @asid_bits:	number of ASID bits supported by the SMMU
  *
  * @s1fmt:	PASID table format, chosen by the allocator.
@@ -86,6 +88,8 @@ struct iommu_pasid_sync_ops {
 struct arm_smmu_context_cfg {
 	u8				stall:1;
 	u8				asid_bits;
+	u8				hw_dirty:1;
+	u8				hw_access:1;
 
 #define ARM_SMMU_S1FMT_LINEAR		0x0
 #define ARM_SMMU_S1FMT_4K_L2		0x1
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 27/37] iommu/arm-smmu-v3: Register fault workqueue
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

When using PRI or Stall, the PRI or event handler enqueues faults into the
core fault queue. Register the SMMU's flush notifier with it, based on the
SMMU features.

When the core stops using a PASID, it notifies the SMMU to flush all
instances of this PASID from the PRI queue. Add a way to flush the PRI and
event queues. The PRI and event threads now take a spinlock while processing
the queue, and the flush handler takes this lock to inspect the queue state.
We avoid livelock, where the SMMU adds faults to the queue faster than we can
consume them, by incrementing a 'batch' number on every cycle, so the flush
handler only has to wait for one complete cycle (two batch increments).
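
To spell out why two increments are enough, here is a small userspace toy
model (illustration only, not kernel code) of the counting argument used by
arm_smmu_flush_queue(): wherever the handler is in its cycle when 'batch' is
sampled, two increments later at least a full queue's worth of entries has
been consumed since the sample, and the queue can never hold more than that:

	/* Toy model of the two-increment bound; QUEUE_SIZE is arbitrary. */
	#include <assert.h>
	#include <stdio.h>

	#define QUEUE_SIZE 8

	int main(void)
	{
		unsigned long batch = 0, consumed = 0;
		unsigned long num_handled = 3;	/* handler already mid-cycle */
		unsigned long batch0 = batch;	/* sampled by the flusher */

		while (batch < batch0 + 2) {	/* the flush wait condition */
			consumed++;		/* handler removes one entry */
			if (++num_handled == QUEUE_SIZE) {
				num_handled = 0;
				batch++;	/* one full cycle done */
			}
		}

		assert(consumed >= QUEUE_SIZE);
		printf("%lu entries consumed across two increments\n", consumed);
		return 0;
	}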

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/Kconfig       |   1 +
 drivers/iommu/arm-smmu-v3.c | 103 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 103 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index d434f7085dc2..d79c68754bb9 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -354,6 +354,7 @@ config ARM_SMMU_V3
 	depends on ARM64
 	select IOMMU_API
 	select IOMMU_SVA
+	select IOMMU_FAULT
 	select IOMMU_IO_PGTABLE_LPAE
 	select ARM_SMMU_V3_CONTEXT
 	select GENERIC_MSI_IRQ_DOMAIN
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 8528704627b5..c5b3a43becaf 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -494,6 +494,10 @@ struct arm_smmu_queue {
 
 	u32 __iomem			*prod_reg;
 	u32 __iomem			*cons_reg;
+
+	/* Event and PRI */
+	u64				batch;
+	wait_queue_head_t		wq;
 };
 
 struct arm_smmu_cmdq {
@@ -610,6 +614,9 @@ struct arm_smmu_device {
 
 	/* IOMMU core code handle */
 	struct iommu_device		iommu;
+
+	/* Notifier for the fault queue */
+	struct notifier_block		faultq_nb;
 };
 
 /* SMMU private data for each master */
@@ -1247,14 +1254,23 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
 	int i;
+	int num_handled = 0;
 	struct arm_smmu_device *smmu = dev;
 	struct arm_smmu_queue *q = &smmu->evtq.q;
+	size_t queue_size = 1 << q->max_n_shift;
 	u64 evt[EVTQ_ENT_DWORDS];
 
+	spin_lock(&q->wq.lock);
 	do {
 		while (!queue_remove_raw(q, evt)) {
 			u8 id = evt[0] >> EVTQ_0_ID_SHIFT & EVTQ_0_ID_MASK;
 
+			if (++num_handled == queue_size) {
+				q->batch++;
+				wake_up_locked(&q->wq);
+				num_handled = 0;
+			}
+
 			dev_info(smmu->dev, "event 0x%02x received:\n", id);
 			for (i = 0; i < ARRAY_SIZE(evt); ++i)
 				dev_info(smmu->dev, "\t0x%016llx\n",
@@ -1272,6 +1288,11 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 
 	/* Sync our overflow flag, as we believe we're up to speed */
 	q->cons = Q_OVF(q, q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
+
+	q->batch++;
+	wake_up_locked(&q->wq);
+	spin_unlock(&q->wq.lock);
+
 	return IRQ_HANDLED;
 }
 
@@ -1315,13 +1336,24 @@ static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
 
 static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
 {
+	int num_handled = 0;
 	struct arm_smmu_device *smmu = dev;
 	struct arm_smmu_queue *q = &smmu->priq.q;
+	size_t queue_size = 1 << q->max_n_shift;
 	u64 evt[PRIQ_ENT_DWORDS];
 
+	spin_lock(&q->wq.lock);
 	do {
-		while (!queue_remove_raw(q, evt))
+		while (!queue_remove_raw(q, evt)) {
+			spin_unlock(&q->wq.lock);
 			arm_smmu_handle_ppr(smmu, evt);
+			spin_lock(&q->wq.lock);
+			if (++num_handled == queue_size) {
+				q->batch++;
+				wake_up_locked(&q->wq);
+				num_handled = 0;
+			}
+		}
 
 		if (queue_sync_prod(q) == -EOVERFLOW)
 			dev_err(smmu->dev, "PRIQ overflow detected -- requests lost\n");
@@ -1329,9 +1361,65 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
 
 	/* Sync our overflow flag, as we believe we're up to speed */
 	q->cons = Q_OVF(q, q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
+
+	q->batch++;
+	wake_up_locked(&q->wq);
+	spin_unlock(&q->wq.lock);
+
 	return IRQ_HANDLED;
 }
 
+/*
+ * arm_smmu_flush_queue - wait until all events/PPRs currently in the queue have
+ * been consumed.
+ *
+ * Wait until the queue thread has finished a batch, or until the queue is empty.
+ * Note that we don't handle overflows on q->batch. If it occurs, just wait for
+ * the queue to be empty.
+ */
+static int arm_smmu_flush_queue(struct arm_smmu_device *smmu,
+				struct arm_smmu_queue *q, const char *name)
+{
+	int ret;
+	u64 batch;
+
+	spin_lock(&q->wq.lock);
+	if (queue_sync_prod(q) == -EOVERFLOW)
+		dev_err(smmu->dev, "%s overflow detected -- requests lost\n", name);
+
+	batch = q->batch;
+	ret = wait_event_interruptible_locked(q->wq, queue_empty(q) ||
+					      q->batch >= batch + 2);
+	spin_unlock(&q->wq.lock);
+
+	return ret;
+}
+
+static int arm_smmu_flush_queues(struct notifier_block *nb,
+				 unsigned long action, void *data)
+{
+	struct arm_smmu_device *smmu = container_of(nb, struct arm_smmu_device,
+						    faultq_nb);
+	struct device *dev = data;
+	struct arm_smmu_master_data *master = NULL;
+
+	if (dev)
+		master = dev->iommu_fwspec->iommu_priv;
+
+	if (master) {
+		/* TODO: add support for PRI and Stall */
+		return 0;
+	}
+
+	/* No target device, flush all queues. */
+	if (smmu->features & ARM_SMMU_FEAT_STALLS)
+		arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
+	if (smmu->features & ARM_SMMU_FEAT_PRI)
+		arm_smmu_flush_queue(smmu, &smmu->priq.q, "priq");
+
+	return 0;
+}
+
 static int arm_smmu_device_disable(struct arm_smmu_device *smmu);
 
 static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
@@ -2288,6 +2376,10 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
 		     << Q_BASE_LOG2SIZE_SHIFT;
 
 	q->prod = q->cons = 0;
+
+	init_waitqueue_head(&q->wq);
+	q->batch = 0;
+
 	return 0;
 }
 
@@ -3168,6 +3260,13 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	if (ret)
 		return ret;
 
+	if (smmu->features & (ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_PRI)) {
+		smmu->faultq_nb.notifier_call = arm_smmu_flush_queues;
+		ret = iommu_fault_queue_register(&smmu->faultq_nb);
+		if (ret)
+			return ret;
+	}
+
 	/* And we're up. Go go go! */
 	ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
 				     "smmu3.%pa", &ioaddr);
@@ -3210,6 +3309,8 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
 {
 	struct arm_smmu_device *smmu = platform_get_drvdata(pdev);
 
+	iommu_fault_queue_unregister(&smmu->faultq_nb);
+
 	arm_smmu_device_disable(smmu);
 
 	return 0;
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 27/37] iommu/arm-smmu-v3: Register fault workqueue
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel

When using PRI or Stall, the PRI or event handler enqueues faults into the
core fault queue. Register it based on the SMMU features.

When the core stops using a PASID, it notifies the SMMU to flush all
instances of this PASID from the PRI queue. Add a way to flush the PRI and
event queue. PRI and event thread now take a spinlock while processing the
queue. The flush handler takes this lock to inspect the queue state.
We avoid livelock, where the SMMU adds fault to the queue faster than we
can consume them, by incrementing a 'batch' number on every cycle so the
flush handler only has to wait a complete cycle (two batch increments.)

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/Kconfig       |   1 +
 drivers/iommu/arm-smmu-v3.c | 103 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 103 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index d434f7085dc2..d79c68754bb9 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -354,6 +354,7 @@ config ARM_SMMU_V3
 	depends on ARM64
 	select IOMMU_API
 	select IOMMU_SVA
+	select IOMMU_FAULT
 	select IOMMU_IO_PGTABLE_LPAE
 	select ARM_SMMU_V3_CONTEXT
 	select GENERIC_MSI_IRQ_DOMAIN
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 8528704627b5..c5b3a43becaf 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -494,6 +494,10 @@ struct arm_smmu_queue {
 
 	u32 __iomem			*prod_reg;
 	u32 __iomem			*cons_reg;
+
+	/* Event and PRI */
+	u64				batch;
+	wait_queue_head_t		wq;
 };
 
 struct arm_smmu_cmdq {
@@ -610,6 +614,9 @@ struct arm_smmu_device {
 
 	/* IOMMU core code handle */
 	struct iommu_device		iommu;
+
+	/* Notifier for the fault queue */
+	struct notifier_block		faultq_nb;
 };
 
 /* SMMU private data for each master */
@@ -1247,14 +1254,23 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
 	int i;
+	int num_handled = 0;
 	struct arm_smmu_device *smmu = dev;
 	struct arm_smmu_queue *q = &smmu->evtq.q;
+	size_t queue_size = 1 << q->max_n_shift;
 	u64 evt[EVTQ_ENT_DWORDS];
 
+	spin_lock(&q->wq.lock);
 	do {
 		while (!queue_remove_raw(q, evt)) {
 			u8 id = evt[0] >> EVTQ_0_ID_SHIFT & EVTQ_0_ID_MASK;
 
+			if (++num_handled == queue_size) {
+				q->batch++;
+				wake_up_locked(&q->wq);
+				num_handled = 0;
+			}
+
 			dev_info(smmu->dev, "event 0x%02x received:\n", id);
 			for (i = 0; i < ARRAY_SIZE(evt); ++i)
 				dev_info(smmu->dev, "\t0x%016llx\n",
@@ -1272,6 +1288,11 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 
 	/* Sync our overflow flag, as we believe we're up to speed */
 	q->cons = Q_OVF(q, q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
+
+	q->batch++;
+	wake_up_locked(&q->wq);
+	spin_unlock(&q->wq.lock);
+
 	return IRQ_HANDLED;
 }
 
@@ -1315,13 +1336,24 @@ static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
 
 static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
 {
+	int num_handled = 0;
 	struct arm_smmu_device *smmu = dev;
 	struct arm_smmu_queue *q = &smmu->priq.q;
+	size_t queue_size = 1 << q->max_n_shift;
 	u64 evt[PRIQ_ENT_DWORDS];
 
+	spin_lock(&q->wq.lock);
 	do {
-		while (!queue_remove_raw(q, evt))
+		while (!queue_remove_raw(q, evt)) {
+			spin_unlock(&q->wq.lock);
 			arm_smmu_handle_ppr(smmu, evt);
+			spin_lock(&q->wq.lock);
+			if (++num_handled == queue_size) {
+				q->batch++;
+				wake_up_locked(&q->wq);
+				num_handled = 0;
+			}
+		}
 
 		if (queue_sync_prod(q) == -EOVERFLOW)
 			dev_err(smmu->dev, "PRIQ overflow detected -- requests lost\n");
@@ -1329,9 +1361,65 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
 
 	/* Sync our overflow flag, as we believe we're up to speed */
 	q->cons = Q_OVF(q, q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
+
+	q->batch++;
+	wake_up_locked(&q->wq);
+	spin_unlock(&q->wq.lock);
+
 	return IRQ_HANDLED;
 }
 
+/*
+ * arm_smmu_flush_queue - wait until all events/PPRs currently in the queue have
+ * been consumed.
+ *
+ * Wait until the queue thread finished a batch, or until the queue is empty.
+ * Note that we don't handle overflows on q->batch. If it occurs, just wait for
+ * the queue to be empty.
+ */
+static int arm_smmu_flush_queue(struct arm_smmu_device *smmu,
+				struct arm_smmu_queue *q, const char *name)
+{
+	int ret;
+	u64 batch;
+
+	spin_lock(&q->wq.lock);
+	if (queue_sync_prod(q) == -EOVERFLOW)
+		dev_err(smmu->dev, "%s overflow detected -- requests lost\n", name);
+
+	batch = q->batch;
+	ret = wait_event_interruptible_locked(q->wq, queue_empty(q) ||
+					      q->batch >= batch + 2);
+	spin_unlock(&q->wq.lock);
+
+	return ret;
+}
+
+static int arm_smmu_flush_queues(struct notifier_block *nb,
+				 unsigned long action, void *data)
+{
+	struct arm_smmu_device *smmu = container_of(nb, struct arm_smmu_device,
+						    faultq_nb);
+	struct device *dev = data;
+	struct arm_smmu_master_data *master = NULL;
+
+	if (dev)
+		master = dev->iommu_fwspec->iommu_priv;
+
+	if (master) {
+		/* TODO: add support for PRI and Stall */
+		return 0;
+	}
+
+	/* No target device, flush all queues. */
+	if (smmu->features & ARM_SMMU_FEAT_STALLS)
+		arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
+	if (smmu->features & ARM_SMMU_FEAT_PRI)
+		arm_smmu_flush_queue(smmu, &smmu->priq.q, "priq");
+
+	return 0;
+}
+
 static int arm_smmu_device_disable(struct arm_smmu_device *smmu);
 
 static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
@@ -2288,6 +2376,10 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
 		     << Q_BASE_LOG2SIZE_SHIFT;
 
 	q->prod = q->cons = 0;
+
+	init_waitqueue_head(&q->wq);
+	q->batch = 0;
+
 	return 0;
 }
 
@@ -3168,6 +3260,13 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	if (ret)
 		return ret;
 
+	if (smmu->features & (ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_PRI)) {
+		smmu->faultq_nb.notifier_call = arm_smmu_flush_queues;
+		ret = iommu_fault_queue_register(&smmu->faultq_nb);
+		if (ret)
+			return ret;
+	}
+
 	/* And we're up. Go go go! */
 	ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
 				     "smmu3.%pa", &ioaddr);
@@ -3210,6 +3309,8 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
 {
 	struct arm_smmu_device *smmu = platform_get_drvdata(pdev);
 
+	iommu_fault_queue_unregister(&smmu->faultq_nb);
+
 	arm_smmu_device_disable(smmu);
 
 	return 0;
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 28/37] iommu/arm-smmu-v3: Maintain a SID->device structure
  2018-02-12 18:33 ` Jean-Philippe Brucker
  (?)
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: joro-zLv9SwRftAIdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	mark.rutland-5wv7dgnIgG8, catalin.marinas-5wv7dgnIgG8,
	will.deacon-5wv7dgnIgG8, lorenzo.pieralisi-5wv7dgnIgG8,
	hanjun.guo-QSEj5FYQhm4dnm+yROfE0A, sudeep.holla-5wv7dgnIgG8,
	rjw-LthD3rsA81gm4RdzfppkhA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	robin.murphy-5wv7dgnIgG8, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	alex.williamson-H+wXaHxf7aLQT0dZR+AlfA,
	tn-nYOzD4b6Jr9Wk0Htik3J/w, liubo95-hv44wF8Li93QT0dZR+AlfA,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	jonathan.cameron-hv44wF8Li93QT0dZR+AlfA,
	shunyong.yang-PT9Dzx9SjPiXmMXjJBpWqg,
	nwatters-sgV2jX0FEOL9JmXXK+q4OQ, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	jcrouse-sgV2jX0FEOL9JmXXK+q4OQ, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA,
	yi.l.liu-ral2JQCrhuEAvxtiuMwx3w,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	robdclark-Re5JQEeQqe8AvxtiuMwx3w, christian.koenig-5C7GfCeVMHo,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA

When handling faults from the event or PRI queue, we need to find the
struct device associated to a SID. Add a rb_tree to keep track of SIDs.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu-v3.c | 105 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 105 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index c5b3a43becaf..2430b2140f8d 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -615,10 +615,19 @@ struct arm_smmu_device {
 	/* IOMMU core code handle */
 	struct iommu_device		iommu;
 
+	struct rb_root			streams;
+	struct mutex			streams_mutex;
+
 	/* Notifier for the fault queue */
 	struct notifier_block		faultq_nb;
 };
 
+struct arm_smmu_stream {
+	u32				id;
+	struct arm_smmu_master_data	*master;
+	struct rb_node			node;
+};
+
 /* SMMU private data for each master */
 struct arm_smmu_master_data {
 	struct arm_smmu_device		*smmu;
@@ -626,6 +635,7 @@ struct arm_smmu_master_data {
 
 	struct arm_smmu_domain		*domain;
 	struct list_head		list; /* domain->devices */
+	struct arm_smmu_stream		*streams;
 
 	struct device			*dev;
 
@@ -1250,6 +1260,31 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
 	return 0;
 }
 
+static struct arm_smmu_master_data *
+arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
+{
+	struct rb_node *node;
+	struct arm_smmu_stream *stream;
+	struct arm_smmu_master_data *master = NULL;
+
+	mutex_lock(&smmu->streams_mutex);
+	node = smmu->streams.rb_node;
+	while (node) {
+		stream = rb_entry(node, struct arm_smmu_stream, node);
+		if (stream->id < sid) {
+			node = node->rb_right;
+		} else if (stream->id > sid) {
+			node = node->rb_left;
+		} else {
+			master = stream->master;
+			break;
+		}
+	}
+	mutex_unlock(&smmu->streams_mutex);
+
+	return master;
+}
+
 /* IRQ and event handlers */
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
@@ -2146,6 +2181,71 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
 	return sid < limit;
 }
 
+static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
+				  struct arm_smmu_master_data *master)
+{
+	int i;
+	int ret = 0;
+	struct arm_smmu_stream *new_stream, *cur_stream;
+	struct rb_node **new_node, *parent_node = NULL;
+	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+
+	master->streams = kcalloc(fwspec->num_ids,
+				  sizeof(struct arm_smmu_stream), GFP_KERNEL);
+	if (!master->streams)
+		return -ENOMEM;
+
+	mutex_lock(&smmu->streams_mutex);
+	for (i = 0; i < fwspec->num_ids && !ret; i++) {
+		new_stream = &master->streams[i];
+		new_stream->id = fwspec->ids[i];
+		new_stream->master = master;
+
+		new_node = &(smmu->streams.rb_node);
+		while (*new_node) {
+			cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
+					      node);
+			parent_node = *new_node;
+			if (cur_stream->id > new_stream->id) {
+				new_node = &((*new_node)->rb_left);
+			} else if (cur_stream->id < new_stream->id) {
+				new_node = &((*new_node)->rb_right);
+			} else {
+				dev_warn(master->dev,
+					 "stream %u already in tree\n",
+					 cur_stream->id);
+				ret = -EINVAL;
+				break;
+			}
+		}
+
+		if (!ret) {
+			rb_link_node(&new_stream->node, parent_node, new_node);
+			rb_insert_color(&new_stream->node, &smmu->streams);
+		}
+	}
+	mutex_unlock(&smmu->streams_mutex);
+
+	return ret;
+}
+
+static void arm_smmu_remove_master(struct arm_smmu_device *smmu,
+				   struct arm_smmu_master_data *master)
+{
+	int i;
+	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+
+	if (!master->streams)
+		return;
+
+	mutex_lock(&smmu->streams_mutex);
+	for (i = 0; i < fwspec->num_ids; i++)
+		rb_erase(&master->streams[i].node, &smmu->streams);
+	mutex_unlock(&smmu->streams_mutex);
+
+	kfree(master->streams);
+}
+
 static struct iommu_ops arm_smmu_ops;
 
 static int arm_smmu_add_device(struct device *dev)
@@ -2198,6 +2298,7 @@ static int arm_smmu_add_device(struct device *dev)
 
 	group = iommu_group_get_for_dev(dev);
 	if (!IS_ERR(group)) {
+		arm_smmu_insert_master(smmu, master);
 		iommu_group_put(group);
 		iommu_device_link(&smmu->iommu, dev);
 	}
@@ -2218,6 +2319,7 @@ static void arm_smmu_remove_device(struct device *dev)
 	smmu = master->smmu;
 	if (master && master->ste.assigned)
 		arm_smmu_detach_dev(dev);
+	arm_smmu_remove_master(smmu, master);
 	iommu_group_remove_device(dev);
 	iommu_device_unlink(&smmu->iommu, dev);
 	kfree(master);
@@ -2527,6 +2629,9 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
 	int ret;
 
 	atomic_set(&smmu->sync_nr, 0);
+	mutex_init(&smmu->streams_mutex);
+	smmu->streams = RB_ROOT;
+
 	ret = arm_smmu_init_queues(smmu);
 	if (ret)
 		return ret;
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 29/37] iommu/arm-smmu-v3: Add stall support for platform devices
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

The SMMU provides a Stall model for handling page faults in platform
devices. It is similar to PCI PRI, but doesn't require devices to have
their own translation cache. Instead, faulting transactions are parked and
the OS is given a chance to fix the page tables and retry the transaction.

Enable stall for devices that support it (opt-in by firmware). When an
event corresponds to a translation error, call the IOMMU fault handler. If
the fault is recoverable, it will call us back to terminate or continue
the stall.

Note that this patch tweaks struct iommu_fault_event and struct
page_response_msg to extend the fault ID field: Stall uses 16-bit IDs,
whereas PCI PRI only uses 9 bits.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 175 +++++++++++++++++++++++++++++++++++++++++++-
 include/linux/iommu.h       |   4 +-
 2 files changed, 173 insertions(+), 6 deletions(-)
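
Not part of the patch, just an illustration of the resume path described
above: the sketch below packs a "retry" response for a stalled transaction
into the two CMD_RESUME words, using the CMDQ_RESUME_* layout added by this
patch. The StreamID and stall tag values are made up, and the opcode is
assumed to sit in bits [7:0] of the first word, as for the driver's other
commands.

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint64_t cmd[2] = { 0x44, 0 };	/* CMDQ_OP_RESUME in cmd[0] bits [7:0] */
		uint32_t sid  = 0x10;		/* StreamID of the stalled master (example) */
		uint16_t stag = 0x7;		/* 16-bit stall tag from the event record */

		cmd[0] |= (uint64_t)sid << 32;	/* CMDQ_RESUME_0_SID_SHIFT */
		cmd[0] |= 1ULL << 12;		/* CMDQ_RESUME_0_ACTION_RETRY */
		cmd[1] |= stag;			/* CMDQ_RESUME_1_STAG_SHIFT == 0 */

		printf("CMD_RESUME: 0x%016llx 0x%016llx\n",
		       (unsigned long long)cmd[0], (unsigned long long)cmd[1]);
		return 0;
	}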

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 2430b2140f8d..8b9f5dd06be0 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -338,6 +338,15 @@
 #define CMDQ_PRI_1_RESP_FAIL		(1UL << CMDQ_PRI_1_RESP_SHIFT)
 #define CMDQ_PRI_1_RESP_SUCC		(2UL << CMDQ_PRI_1_RESP_SHIFT)
 
+#define CMDQ_RESUME_0_SID_SHIFT		32
+#define CMDQ_RESUME_0_SID_MASK		0xffffffffUL
+#define CMDQ_RESUME_0_ACTION_SHIFT	12
+#define CMDQ_RESUME_0_ACTION_TERM	(0UL << CMDQ_RESUME_0_ACTION_SHIFT)
+#define CMDQ_RESUME_0_ACTION_RETRY	(1UL << CMDQ_RESUME_0_ACTION_SHIFT)
+#define CMDQ_RESUME_0_ACTION_ABORT	(2UL << CMDQ_RESUME_0_ACTION_SHIFT)
+#define CMDQ_RESUME_1_STAG_SHIFT	0
+#define CMDQ_RESUME_1_STAG_MASK		0xffffUL
+
 #define CMDQ_SYNC_0_CS_SHIFT		12
 #define CMDQ_SYNC_0_CS_NONE		(0UL << CMDQ_SYNC_0_CS_SHIFT)
 #define CMDQ_SYNC_0_CS_IRQ		(1UL << CMDQ_SYNC_0_CS_SHIFT)
@@ -358,6 +367,31 @@
 #define EVTQ_0_ID_SHIFT			0
 #define EVTQ_0_ID_MASK			0xffUL
 
+#define EVT_ID_TRANSLATION_FAULT	0x10
+#define EVT_ID_ADDR_SIZE_FAULT		0x11
+#define EVT_ID_ACCESS_FAULT		0x12
+#define EVT_ID_PERMISSION_FAULT		0x13
+
+#define EVTQ_0_SSV			(1UL << 11)
+#define EVTQ_0_SSID_SHIFT		12
+#define EVTQ_0_SSID_MASK		0xfffffUL
+#define EVTQ_0_SID_SHIFT		32
+#define EVTQ_0_SID_MASK			0xffffffffUL
+#define EVTQ_1_STAG_SHIFT		0
+#define EVTQ_1_STAG_MASK		0xffffUL
+#define EVTQ_1_STALL			(1UL << 31)
+#define EVTQ_1_PRIV			(1UL << 33)
+#define EVTQ_1_EXEC			(1UL << 34)
+#define EVTQ_1_READ			(1UL << 35)
+#define EVTQ_1_S2			(1UL << 39)
+#define EVTQ_1_CLASS_SHIFT		40
+#define EVTQ_1_CLASS_MASK		0x3UL
+#define EVTQ_1_TT_READ			(1UL << 44)
+#define EVTQ_2_ADDR_SHIFT		0
+#define EVTQ_2_ADDR_MASK		0xffffffffffffffffUL
+#define EVTQ_3_IPA_SHIFT		12
+#define EVTQ_3_IPA_MASK			0xffffffffffUL
+
 /* PRI queue */
 #define PRIQ_ENT_DWORDS			2
 #define PRIQ_MAX_SZ_SHIFT		8
@@ -472,6 +506,13 @@ struct arm_smmu_cmdq_ent {
 			enum pri_resp		resp;
 		} pri;
 
+		#define CMDQ_OP_RESUME		0x44
+		struct {
+			u32			sid;
+			u16			stag;
+			enum page_response_code	resp;
+		} resume;
+
 		#define CMDQ_OP_CMD_SYNC	0x46
 		struct {
 			u32			msidata;
@@ -545,6 +586,8 @@ struct arm_smmu_strtab_ent {
 	bool				assigned;
 	struct arm_smmu_s1_cfg		*s1_cfg;
 	struct arm_smmu_s2_cfg		*s2_cfg;
+
+	bool				can_stall;
 };
 
 struct arm_smmu_strtab_cfg {
@@ -904,6 +947,21 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 			return -EINVAL;
 		}
 		break;
+	case CMDQ_OP_RESUME:
+		cmd[0] |= (u64)ent->resume.sid << CMDQ_RESUME_0_SID_SHIFT;
+		cmd[1] |= ent->resume.stag << CMDQ_RESUME_1_STAG_SHIFT;
+		switch (ent->resume.resp) {
+		case IOMMU_PAGE_RESP_INVALID:
+		case IOMMU_PAGE_RESP_FAILURE:
+			cmd[0] |= CMDQ_RESUME_0_ACTION_ABORT;
+			break;
+		case IOMMU_PAGE_RESP_SUCCESS:
+			cmd[0] |= CMDQ_RESUME_0_ACTION_RETRY;
+			break;
+		default:
+			return -EINVAL;
+		}
+		break;
 	case CMDQ_OP_CMD_SYNC:
 		if (ent->sync.msiaddr)
 			cmd[0] |= CMDQ_SYNC_0_CS_IRQ;
@@ -1065,6 +1123,35 @@ static void arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
 		dev_err_ratelimited(smmu->dev, "CMD_SYNC timeout\n");
 }
 
+static int arm_smmu_page_response(struct iommu_domain *domain,
+				  struct device *dev,
+				  struct page_response_msg *resp)
+{
+	int sid = dev->iommu_fwspec->ids[0];
+	struct arm_smmu_cmdq_ent cmd = {0};
+	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
+
+	if (master->ste.can_stall) {
+		cmd.opcode		= CMDQ_OP_RESUME;
+		cmd.resume.sid		= sid;
+		cmd.resume.stag		= resp->page_req_group_id;
+		cmd.resume.resp		= resp->resp_code;
+	} else {
+		/* TODO: put PRI response here */
+		return -EINVAL;
+	}
+
+	arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
+	/*
+	 * Don't send a SYNC, it doesn't do anything for RESUME or PRI_RESP.
+	 * RESUME consumption guarantees that the stalled transaction will be
+	 * terminated... at some point in the future. PRI_RESP is fire and
+	 * forget.
+	 */
+
+	return 0;
+}
+
 /* Stream table manipulation functions */
 static void
 arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
@@ -1182,7 +1269,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 			 STRTAB_STE_1_STRW_SHIFT);
 
 		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
-		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
+		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE) &&
+		   !ste->can_stall)
 			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 
 		val |= (cfg->base & STRTAB_STE_0_S1CTXPTR_MASK
@@ -1285,10 +1373,73 @@ arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
 	return master;
 }
 
+static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
+{
+	struct arm_smmu_master_data *master;
+	u8 type = evt[0] >> EVTQ_0_ID_SHIFT & EVTQ_0_ID_MASK;
+	u32 sid = evt[0] >> EVTQ_0_SID_SHIFT & EVTQ_0_SID_MASK;
+
+	struct iommu_fault_event fault = {
+		.page_req_group_id = evt[1] >> EVTQ_1_STAG_SHIFT & EVTQ_1_STAG_MASK,
+		.addr		= evt[2] >> EVTQ_2_ADDR_SHIFT & EVTQ_2_ADDR_MASK,
+		.last_req	= true,
+	};
+
+	switch (type) {
+	case EVT_ID_TRANSLATION_FAULT:
+	case EVT_ID_ADDR_SIZE_FAULT:
+	case EVT_ID_ACCESS_FAULT:
+		fault.reason = IOMMU_FAULT_REASON_PTE_FETCH;
+		break;
+	case EVT_ID_PERMISSION_FAULT:
+		fault.reason = IOMMU_FAULT_REASON_PERMISSION;
+		break;
+	default:
+		/* TODO: report other unrecoverable faults. */
+		return -EFAULT;
+	}
+
+	/* Stage-2 is always pinned at the moment */
+	if (evt[1] & EVTQ_1_S2)
+		return -EFAULT;
+
+	master = arm_smmu_find_master(smmu, sid);
+	if (!master)
+		return -EINVAL;
+
+	/*
+	 * The domain is valid until the fault returns, because detach() flushes
+	 * the fault queue.
+	 */
+	if (evt[1] & EVTQ_1_STALL)
+		fault.type = IOMMU_FAULT_PAGE_REQ;
+	else
+		fault.type = IOMMU_FAULT_DMA_UNRECOV;
+
+	if (evt[1] & EVTQ_1_READ)
+		fault.prot |= IOMMU_FAULT_READ;
+	else
+		fault.prot |= IOMMU_FAULT_WRITE;
+
+	if (evt[1] & EVTQ_1_EXEC)
+		fault.prot |= IOMMU_FAULT_EXEC;
+
+	if (evt[1] & EVTQ_1_PRIV)
+		fault.prot |= IOMMU_FAULT_PRIV;
+
+	if (evt[0] & EVTQ_0_SSV) {
+		fault.pasid_valid = true;
+		fault.pasid = evt[0] >> EVTQ_0_SSID_SHIFT & EVTQ_0_SSID_MASK;
+	}
+
+	/* Report to device driver or populate the page tables */
+	return iommu_report_device_fault(master->dev, &fault);
+}
+
 /* IRQ and event handlers */
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
-	int i;
+	int i, ret;
 	int num_handled = 0;
 	struct arm_smmu_device *smmu = dev;
 	struct arm_smmu_queue *q = &smmu->evtq.q;
@@ -1300,12 +1451,19 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 		while (!queue_remove_raw(q, evt)) {
 			u8 id = evt[0] >> EVTQ_0_ID_SHIFT & EVTQ_0_ID_MASK;
 
+			spin_unlock(&q->wq.lock);
+			ret = arm_smmu_handle_evt(smmu, evt);
+			spin_lock(&q->wq.lock);
+
 			if (++num_handled == queue_size) {
 				q->batch++;
 				wake_up_locked(&q->wq);
 				num_handled = 0;
 			}
 
+			if (!ret)
+				continue;
+
 			dev_info(smmu->dev, "event 0x%02x received:\n", id);
 			for (i = 0; i < ARRAY_SIZE(evt); ++i)
 				dev_info(smmu->dev, "\t0x%016llx\n",
@@ -1442,7 +1600,9 @@ static int arm_smmu_flush_queues(struct notifier_block *nb,
 		master = dev->iommu_fwspec->iommu_priv;
 
 	if (master) {
-		/* TODO: add support for PRI and Stall */
+		if (master->ste.can_stall)
+			arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
+		/* TODO: add support for PRI */
 		return 0;
 	}
 
@@ -1756,7 +1916,8 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 		.order			= master->ssid_bits,
 		.sync			= &arm_smmu_ctx_sync,
 		.arm_smmu = {
-			.stall		= !!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE),
+			.stall		= !!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE) ||
+					  master->ste.can_stall,
 			.asid_bits	= smmu->asid_bits,
 			.hw_access	= !!(smmu->features & ARM_SMMU_FEAT_HA),
 			.hw_dirty	= !!(smmu->features & ARM_SMMU_FEAT_HD),
@@ -2296,6 +2457,11 @@ static int arm_smmu_add_device(struct device *dev)
 
 	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
 
+	if (fwspec->can_stall && smmu->features & ARM_SMMU_FEAT_STALLS) {
+		master->can_fault = true;
+		master->ste.can_stall = true;
+	}
+
 	group = iommu_group_get_for_dev(dev);
 	if (!IS_ERR(group)) {
 		arm_smmu_insert_master(smmu, master);
@@ -2435,6 +2601,7 @@ static struct iommu_ops arm_smmu_ops = {
 	.mm_attach		= arm_smmu_mm_attach,
 	.mm_detach		= arm_smmu_mm_detach,
 	.mm_invalidate		= arm_smmu_mm_invalidate,
+	.page_response		= arm_smmu_page_response,
 	.map			= arm_smmu_map,
 	.unmap			= arm_smmu_unmap,
 	.map_sg			= default_iommu_map_sg,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 37c3b9d087ce..f5c2f4be2b42 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -227,7 +227,7 @@ struct page_response_msg {
 	u32 pasid;
 	enum page_response_code resp_code;
 	u32 pasid_present:1;
-	u32 page_req_group_id : 9;
+	u32 page_req_group_id;
 	enum page_response_type type;
 	u32 private_data;
 };
@@ -421,7 +421,7 @@ struct iommu_fault_event {
 	enum iommu_fault_reason reason;
 	u64 addr;
 	u32 pasid;
-	u32 page_req_group_id : 9;
+	u32 page_req_group_id;
 	u32 last_req : 1;
 	u32 pasid_valid : 1;
 	u32 prot;
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 30/37] ACPI/IORT: Check ATS capability in root complex nodes
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

The root complex node in IORT has a bit telling whether it supports ATS
or not. Store this bit in the IOMMU fwspec when setting up a device, so
that it can be accessed later by an IOMMU driver.

Use the negative version (NO_ATS) at the moment because it's not clear
if or how the bit needs to be integrated into other firmware
descriptions. The SMMU has a feature bit telling whether it supports
ATS, which might be sufficient in most systems for deciding whether or
not we should enable the ATS capability in endpoints.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/acpi/arm64/iort.c | 11 +++++++++++
 include/linux/iommu.h     |  4 ++++
 2 files changed, 15 insertions(+)
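
Not part of the patch: a minimal kernel-context sketch of how a later
consumer (for example the SMMUv3 ATS enablement) might test the new flag
before trying to enable ATS on an endpoint. The helper name is made up for
illustration.

	/* Sketch: firmware told us the root complex cannot handle ATS */
	static bool rc_allows_ats(struct device *dev)
	{
		struct iommu_fwspec *fwspec = dev->iommu_fwspec;

		return fwspec && !(fwspec->flags & IOMMU_FWSPEC_PCI_NO_ATS);
	}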

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 95255ecfae7c..db374062ec9d 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -911,6 +911,14 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size)
 	dev_dbg(dev, "dma_pfn_offset(%#08llx)\n", offset);
 }
 
+static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node)
+{
+	struct acpi_iort_root_complex *pci_rc;
+
+	pci_rc = (struct acpi_iort_root_complex *)node->node_data;
+	return pci_rc->ats_attribute & ACPI_IORT_ATS_SUPPORTED;
+}
+
 /**
  * iort_iommu_configure - Set-up IOMMU configuration for a device.
  *
@@ -946,6 +954,9 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev)
 		info.node = node;
 		err = pci_for_each_dma_alias(to_pci_dev(dev),
 					     iort_pci_iommu_init, &info);
+
+		if (!err && !iort_pci_rc_supports_ats(node))
+			dev->iommu_fwspec->flags |= IOMMU_FWSPEC_PCI_NO_ATS;
 	} else {
 		int i = 0;
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f5c2f4be2b42..641aaf0f1b81 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -631,12 +631,16 @@ struct iommu_fwspec {
 	const struct iommu_ops	*ops;
 	struct fwnode_handle	*iommu_fwnode;
 	void			*iommu_priv;
+	u32			flags;
 	unsigned int		num_ids;
 	unsigned int		num_pasid_bits;
 	bool			can_stall;
 	u32			ids[1];
 };
 
+/* Firmware disabled ATS in the root complex */
+#define IOMMU_FWSPEC_PCI_NO_ATS			(1 << 0)
+
 int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
 		      const struct iommu_ops *ops);
 void iommu_fwspec_free(struct device *dev);
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 31/37] iommu/arm-smmu-v3: Add support for PCI ATS
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: mark.rutland, ilias.apalodimas, catalin.marinas, xuzaibo,
	will.deacon, okaya, ashok.raj, bharatku, rfranz, lenb, robh+dt,
	bhelgaas, shunyong.yang, dwmw2, rjw, sudeep.holla, christian.koenig

PCIe devices can implement their own TLB, called the Address Translation
Cache (ATC). Enable Address Translation Services (ATS) for devices that
support it, and send them invalidation requests whenever we invalidate
the IOTLBs.

  Range calculation
  -----------------

The invalidation packet itself is a bit awkward: the range must be
naturally aligned, which means that the start address is a multiple of
the range size. In addition, the size must be a power-of-two number of
4k pages. We have a few options to enforce this constraint:

(1) Find the smallest naturally aligned region that covers the requested
    range. This is simple to compute and only takes one ATC_INV, but it
    will spill onto lots of neighbouring ATC entries.

(2) Align the start address to the region size (rounded up to a power of
    two), and send a second invalidation for the next range of the same
    size. Still not great, but reduces spilling.

(3) Cover the range exactly with the smallest number of naturally aligned
    regions. This would be interesting to implement but, as with (2), it
    requires multiple ATC_INV.

As I suspect ATC invalidation packets will be a very scarce resource, I'll
go with option (1) for now, and only send one big invalidation. We can
move to (2), which is both easier to read and more gentle with the ATC,
once we've observed on real systems that we can send multiple smaller
Invalidation Requests for roughly the same price as a single big one.

Note that with io-pgtable, the unmap function is called for each page, so
this doesn't matter. The problem shows up when sharing page tables with
the MMU.
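
To make this concrete, here is a rough sketch of the option (1)
computation (illustration only, not part of the patch; it assumes the
kernel's fls_long() from <linux/bitops.h> and 4kB pages, like the code
below):

	/*
	 * Return log2 of the smallest naturally aligned span, in 4kB
	 * pages, covering [*start_page; end_page], and align *start_page
	 * down to that span.
	 */
	static unsigned long atc_inv_span(unsigned long *start_page,
					  unsigned long end_page)
	{
		unsigned long log2_span = fls_long(*start_page ^ end_page);
		unsigned long span_mask = (1UL << log2_span) - 1;

		*start_page &= ~span_mask;
		return log2_span;
	}

	/*
	 * Pages [7; 10]: 7 ^ 10 = 0b1101, log2_span = 4, so the command
	 * covers the 16 pages starting at page 0.
	 */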

  Timeout
  -------

ATC invalidation is allowed to take up to 90 seconds, according to the
PCIe spec, so it is possible to hit the SMMU command queue timeout during
normal operations.

Some SMMU implementations will raise a CERROR_ATC_INV_SYNC when a CMD_SYNC
fails because of an ATC invalidation. Some will just abort the CMD_SYNC.
Others might let CMD_SYNC complete and have an asynchronous IMPDEF
mechanism to record the error. When we receive a CERROR_ATC_INV_SYNC, we
could retry sending all ATC_INV since last successful CMD_SYNC. When a
CMD_SYNC fails without CERROR_ATC_INV_SYNC, we could retry sending *all*
commands since last successful CMD_SYNC.

We cannot afford to wait 90 seconds in iommu_unmap, let alone MMU
notifiers. So we'd have to introduce a more clever system if this timeout
becomes a problem, like keeping hold of mappings and invalidating in the
background. Implementing safe delayed invalidations is a very complex
problem and deserves a series of its own. We'll assess whether more work
is needed to properly handle ATC invalidation timeouts once this code runs
on real hardware.

  Misc
  ----

I didn't put ATC and TLB invalidations in the same functions for three
reasons:

* TLB invalidation by range is batched and committed with a single sync.
  Batching ATC invalidation is inconvenient, since endpoints limit the
  number of inflight invalidations. We'd have to count the number of
  invalidations queued and send a sync periodically. In addition, I
  suspect we always need a sync between TLB and ATC invalidation for the
  same page.

* Doing ATC invalidation outside tlb_inv_range also allows us to send
  fewer requests, since TLB invalidations are done per page or block,
  while ATC invalidations target IOVA ranges.

* TLB invalidation by context is performed when freeing the domain, at
  which point there isn't any device attached anymore.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 236 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 226 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 8b9f5dd06be0..76513135310f 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -37,6 +37,7 @@
 #include <linux/of_iommu.h>
 #include <linux/of_platform.h>
 #include <linux/pci.h>
+#include <linux/pci-ats.h>
 #include <linux/platform_device.h>
 #include <linux/sched/mm.h>
 
@@ -109,6 +110,7 @@
 #define IDR5_OAS_48_BIT			(5 << IDR5_OAS_SHIFT)
 
 #define ARM_SMMU_CR0			0x20
+#define CR0_ATSCHK			(1 << 4)
 #define CR0_CMDQEN			(1 << 3)
 #define CR0_EVTQEN			(1 << 2)
 #define CR0_PRIQEN			(1 << 1)
@@ -304,6 +306,7 @@
 #define CMDQ_ERR_CERROR_NONE_IDX	0
 #define CMDQ_ERR_CERROR_ILL_IDX		1
 #define CMDQ_ERR_CERROR_ABT_IDX		2
+#define CMDQ_ERR_CERROR_ATC_INV_IDX	3
 
 #define CMDQ_0_OP_SHIFT			0
 #define CMDQ_0_OP_MASK			0xffUL
@@ -327,6 +330,15 @@
 #define CMDQ_TLBI_1_VA_MASK		~0xfffUL
 #define CMDQ_TLBI_1_IPA_MASK		0xfffffffff000UL
 
+#define CMDQ_ATC_0_SSID_SHIFT		12
+#define CMDQ_ATC_0_SSID_MASK		0xfffffUL
+#define CMDQ_ATC_0_SID_SHIFT		32
+#define CMDQ_ATC_0_SID_MASK		0xffffffffUL
+#define CMDQ_ATC_0_GLOBAL		(1UL << 9)
+#define CMDQ_ATC_1_SIZE_SHIFT		0
+#define CMDQ_ATC_1_SIZE_MASK		0x3fUL
+#define CMDQ_ATC_1_ADDR_MASK		~0xfffUL
+
 #define CMDQ_PRI_0_SSID_SHIFT		12
 #define CMDQ_PRI_0_SSID_MASK		0xfffffUL
 #define CMDQ_PRI_0_SID_SHIFT		32
@@ -425,6 +437,11 @@ module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_bypass,
 	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
 
+static bool disable_ats_check;
+module_param_named(disable_ats_check, disable_ats_check, bool, S_IRUGO);
+MODULE_PARM_DESC(disable_ats_check,
+	"By default, the SMMU checks whether each incoming transaction marked as translated is allowed by the stream configuration. This option disables the check.");
+
 enum pri_resp {
 	PRI_RESP_DENY,
 	PRI_RESP_FAIL,
@@ -498,6 +515,16 @@ struct arm_smmu_cmdq_ent {
 			u64			addr;
 		} tlbi;
 
+		#define CMDQ_OP_ATC_INV		0x40
+		#define ATC_INV_SIZE_ALL	52
+		struct {
+			u32			sid;
+			u32			ssid;
+			u64			addr;
+			u8			size;
+			bool			global;
+		} atc;
+
 		#define CMDQ_OP_PRI_RESP	0x41
 		struct {
 			u32			sid;
@@ -928,6 +955,14 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 	case CMDQ_OP_TLBI_EL2_ASID:
 		cmd[0] |= (u64)ent->tlbi.asid << CMDQ_TLBI_0_ASID_SHIFT;
 		break;
+	case CMDQ_OP_ATC_INV:
+		cmd[0] |= ent->substream_valid ? CMDQ_0_SSV : 0;
+		cmd[0] |= ent->atc.global ? CMDQ_ATC_0_GLOBAL : 0;
+		cmd[0] |= ent->atc.ssid << CMDQ_ATC_0_SSID_SHIFT;
+		cmd[0] |= (u64)ent->atc.sid << CMDQ_ATC_0_SID_SHIFT;
+		cmd[1] |= ent->atc.size << CMDQ_ATC_1_SIZE_SHIFT;
+		cmd[1] |= ent->atc.addr & CMDQ_ATC_1_ADDR_MASK;
+		break;
 	case CMDQ_OP_PRI_RESP:
 		cmd[0] |= ent->substream_valid ? CMDQ_0_SSV : 0;
 		cmd[0] |= ent->pri.ssid << CMDQ_PRI_0_SSID_SHIFT;
@@ -984,6 +1019,7 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
 		[CMDQ_ERR_CERROR_NONE_IDX]	= "No error",
 		[CMDQ_ERR_CERROR_ILL_IDX]	= "Illegal command",
 		[CMDQ_ERR_CERROR_ABT_IDX]	= "Abort on command fetch",
+		[CMDQ_ERR_CERROR_ATC_INV_IDX]	= "ATC invalidate timeout",
 	};
 
 	int i;
@@ -1003,6 +1039,14 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
 		dev_err(smmu->dev, "retrying command fetch\n");
 	case CMDQ_ERR_CERROR_NONE_IDX:
 		return;
+	case CMDQ_ERR_CERROR_ATC_INV_IDX:
+		/*
+		 * ATC Invalidation Completion timeout. CONS is still pointing
+		 * at the CMD_SYNC. Attempt to complete other pending commands
+		 * by repeating the CMD_SYNC, though we might well end up back
+		 * here since the ATC invalidation may still be pending.
+		 */
+		return;
 	case CMDQ_ERR_CERROR_ILL_IDX:
 		/* Fallthrough */
 	default:
@@ -1261,9 +1305,6 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 			 STRTAB_STE_1_S1C_CACHE_WBRA
 			 << STRTAB_STE_1_S1COR_SHIFT |
 			 STRTAB_STE_1_S1C_SH_ISH << STRTAB_STE_1_S1CSH_SHIFT |
-#ifdef CONFIG_PCI_ATS
-			 STRTAB_STE_1_EATS_TRANS << STRTAB_STE_1_EATS_SHIFT |
-#endif
 			 (smmu->features & ARM_SMMU_FEAT_E2H ?
 			  STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1) <<
 			 STRTAB_STE_1_STRW_SHIFT);
@@ -1300,6 +1341,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 		val |= STRTAB_STE_0_CFG_S2_TRANS;
 	}
 
+	if (IS_ENABLED(CONFIG_PCI_ATS))
+		dst[1] |= cpu_to_le64(STRTAB_STE_1_EATS_TRANS
+				      << STRTAB_STE_1_EATS_SHIFT);
+
 	arm_smmu_sync_ste_for_sid(smmu, sid);
 	dst[0] = cpu_to_le64(val);
 	arm_smmu_sync_ste_for_sid(smmu, sid);
@@ -1680,6 +1725,104 @@ static irqreturn_t arm_smmu_combined_irq_handler(int irq, void *dev)
 	return IRQ_WAKE_THREAD;
 }
 
+/* ATS invalidation */
+static bool arm_smmu_master_has_ats(struct arm_smmu_master_data *master)
+{
+	return dev_is_pci(master->dev) && to_pci_dev(master->dev)->ats_enabled;
+}
+
+static void
+arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size,
+			struct arm_smmu_cmdq_ent *cmd)
+{
+	size_t log2_span;
+	size_t span_mask;
+	/* ATC invalidates are always on 4096 bytes pages */
+	size_t inval_grain_shift = 12;
+	unsigned long page_start, page_end;
+
+	*cmd = (struct arm_smmu_cmdq_ent) {
+		.opcode			= CMDQ_OP_ATC_INV,
+		.substream_valid	= !!ssid,
+		.atc.ssid		= ssid,
+	};
+
+	if (!size) {
+		cmd->atc.size = ATC_INV_SIZE_ALL;
+		return;
+	}
+
+	page_start	= iova >> inval_grain_shift;
+	page_end	= (iova + size - 1) >> inval_grain_shift;
+
+	/*
+	 * Find the smallest power of two that covers the range. Most
+	 * significant differing bit between start and end address indicates the
+	 * required span, ie. fls(start ^ end). For example:
+	 *
+	 * We want to invalidate pages [8; 11]. This is already the ideal range:
+	 *		x = 0b1000 ^ 0b1011 = 0b11
+	 *		span = 1 << fls(x) = 4
+	 *
+	 * To invalidate pages [7; 10], we need to invalidate [0; 15]:
+	 *		x = 0b0111 ^ 0b1010 = 0b1101
+	 *		span = 1 << fls(x) = 16
+	 */
+	log2_span	= fls_long(page_start ^ page_end);
+	span_mask	= (1ULL << log2_span) - 1;
+
+	page_start	&= ~span_mask;
+
+	cmd->atc.addr	= page_start << inval_grain_shift;
+	cmd->atc.size	= log2_span;
+}
+
+static int arm_smmu_atc_inv_master(struct arm_smmu_master_data *master,
+				   struct arm_smmu_cmdq_ent *cmd)
+{
+	int i;
+	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+
+	if (!arm_smmu_master_has_ats(master))
+		return 0;
+
+	for (i = 0; i < fwspec->num_ids; i++) {
+		cmd->atc.sid = fwspec->ids[i];
+		arm_smmu_cmdq_issue_cmd(master->smmu, cmd);
+	}
+
+	arm_smmu_cmdq_issue_sync(master->smmu);
+
+	return 0;
+}
+
+static int arm_smmu_atc_inv_master_all(struct arm_smmu_master_data *master,
+				       int ssid)
+{
+	struct arm_smmu_cmdq_ent cmd;
+
+	arm_smmu_atc_inv_to_cmd(ssid, 0, 0, &cmd);
+	return arm_smmu_atc_inv_master(master, &cmd);
+}
+
+static size_t
+arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
+			unsigned long iova, size_t size)
+{
+	unsigned long flags;
+	struct arm_smmu_cmdq_ent cmd;
+	struct arm_smmu_master_data *master;
+
+	arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd);
+
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_for_each_entry(master, &smmu_domain->devices, list)
+		arm_smmu_atc_inv_master(master, &cmd);
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+
+	return size;
+}
+
 /* IO_PGTABLE API */
 static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
 {
@@ -2092,6 +2235,8 @@ static void arm_smmu_detach_dev(struct device *dev)
 	if (smmu_domain) {
 		__iommu_sva_unbind_dev_all(dev);
 
+		arm_smmu_atc_inv_master_all(master, 0);
+
 		spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 		list_del(&master->list);
 		spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
@@ -2179,12 +2324,19 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
 static size_t
 arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 {
-	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+	int ret;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
 
 	if (!ops)
 		return 0;
 
-	return ops->unmap(ops, iova, size);
+	ret = ops->unmap(ops, iova, size);
+
+	if (ret && smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS)
+		ret = arm_smmu_atc_inv_domain(smmu_domain, 0, iova, size);
+
+	return ret;
 }
 
 static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
@@ -2342,6 +2494,48 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
 	return sid < limit;
 }
 
+static int arm_smmu_enable_ats(struct arm_smmu_master_data *master)
+{
+	int ret;
+	size_t stu;
+	struct pci_dev *pdev;
+	struct arm_smmu_device *smmu = master->smmu;
+	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+
+	if (!(smmu->features & ARM_SMMU_FEAT_ATS) || !dev_is_pci(master->dev) ||
+	    (fwspec->flags & IOMMU_FWSPEC_PCI_NO_ATS))
+		return -ENOSYS;
+
+	pdev = to_pci_dev(master->dev);
+
+	/* Smallest Translation Unit: log2 of the smallest supported granule */
+	stu = __ffs(smmu->pgsize_bitmap);
+
+	ret = pci_enable_ats(pdev, stu);
+	if (ret)
+		return ret;
+
+	dev_dbg(&pdev->dev, "enabled ATS (STU=%zu, QDEP=%d)\n", stu,
+		pci_ats_queue_depth(pdev));
+
+	return 0;
+}
+
+static void arm_smmu_disable_ats(struct arm_smmu_master_data *master)
+{
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(master->dev))
+		return;
+
+	pdev = to_pci_dev(master->dev);
+
+	if (!pdev->ats_enabled)
+		return;
+
+	pci_disable_ats(pdev);
+}
+
 static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
 				  struct arm_smmu_master_data *master)
 {
@@ -2462,14 +2656,24 @@ static int arm_smmu_add_device(struct device *dev)
 		master->ste.can_stall = true;
 	}
 
+	arm_smmu_enable_ats(master);
+
 	group = iommu_group_get_for_dev(dev);
-	if (!IS_ERR(group)) {
-		arm_smmu_insert_master(smmu, master);
-		iommu_group_put(group);
-		iommu_device_link(&smmu->iommu, dev);
+	if (IS_ERR(group)) {
+		ret = PTR_ERR(group);
+		goto err_disable_ats;
 	}
 
-	return PTR_ERR_OR_ZERO(group);
+	iommu_group_put(group);
+	arm_smmu_insert_master(smmu, master);
+	iommu_device_link(&smmu->iommu, dev);
+
+	return 0;
+
+err_disable_ats:
+	arm_smmu_disable_ats(master);
+
+	return ret;
 }
 
 static void arm_smmu_remove_device(struct device *dev)
@@ -2486,6 +2690,8 @@ static void arm_smmu_remove_device(struct device *dev)
 	if (master && master->ste.assigned)
 		arm_smmu_detach_dev(dev);
 	arm_smmu_remove_master(smmu, master);
+	arm_smmu_disable_ats(master);
+
 	iommu_group_remove_device(dev);
 	iommu_device_unlink(&smmu->iommu, dev);
 	kfree(master);
@@ -3094,6 +3300,16 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 		}
 	}
 
+	if (smmu->features & ARM_SMMU_FEAT_ATS && !disable_ats_check) {
+		enables |= CR0_ATSCHK;
+		ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
+					      ARM_SMMU_CR0ACK);
+		if (ret) {
+			dev_err(smmu->dev, "failed to enable ATS check\n");
+			return ret;
+		}
+	}
+
 	ret = arm_smmu_setup_irqs(smmu);
 	if (ret) {
 		dev_err(smmu->dev, "failed to setup irqs\n");
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 32/37] iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops
  2018-02-12 18:33 ` Jean-Philippe Brucker
  (?)
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

The core calls us when an mm is modified. Perform the required ATC
invalidations.
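
As a rough sketch (illustration only, not part of the patch, using the
helpers added below with the signatures from this diff), the two cases
boil down to:

	/* mm detached from the domain, or exiting: drop every ATC entry
	 * tagged with this PASID on the master. */
	arm_smmu_atc_inv_master_all(master, io_mm->pasid);

	/* A range of the mm was invalidated: only flush [iova, iova + size)
	 * from the ATC for this PASID. */
	arm_smmu_atc_inv_master_range(master, io_mm->pasid, iova, size);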

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 76513135310f..8d09615fab35 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1805,6 +1805,15 @@ static int arm_smmu_atc_inv_master_all(struct arm_smmu_master_data *master,
 	return arm_smmu_atc_inv_master(master, &cmd);
 }
 
+static int arm_smmu_atc_inv_master_range(struct arm_smmu_master_data *master,
+					 int ssid, unsigned long iova, size_t size)
+{
+	struct arm_smmu_cmdq_ent cmd;
+
+	arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd);
+	return arm_smmu_atc_inv_master(master, &cmd);
+}
+
 static size_t
 arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
 			unsigned long iova, size_t size)
@@ -2450,11 +2459,12 @@ static void arm_smmu_mm_detach(struct iommu_domain *domain, struct device *dev,
 	struct arm_smmu_mm *smmu_mm = to_smmu_mm(io_mm);
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct iommu_pasid_table_ops *ops = smmu_domain->s1_cfg.ops;
+	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
 
 	if (detach_domain)
 		ops->clear_entry(ops, io_mm->pasid, smmu_mm->cd);
 
-	/* TODO: Invalidate ATC. */
+	arm_smmu_atc_inv_master_all(master, io_mm->pasid);
 	/* TODO: Invalidate all mappings if last and not DVM. */
 }
 
@@ -2462,8 +2472,10 @@ static void arm_smmu_mm_invalidate(struct iommu_domain *domain,
 				   struct device *dev, struct io_mm *io_mm,
 				   unsigned long iova, size_t size)
 {
+	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
+
+	arm_smmu_atc_inv_master_range(master, io_mm->pasid, iova, size);
 	/*
-	 * TODO: Invalidate ATC.
 	 * TODO: Invalidate mapping if not DVM
 	 */
 }
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 33/37] iommu/arm-smmu-v3: Disable tagged pointers
  2018-02-12 18:33 ` Jean-Philippe Brucker
  (?)
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: mark.rutland, ilias.apalodimas, catalin.marinas, xuzaibo,
	will.deacon, okaya, ashok.raj, bharatku, rfranz, lenb, robh+dt,
	bhelgaas, shunyong.yang, dwmw2, rjw, sudeep.holla, christian.koenig

The ARM architecture has a "Top Byte Ignore" (TBI) option that makes the
MMU mask out bits [63:56] of an address, allowing a userspace application
to store data in its pointers. This option is incompatible with PCI ATS.

If TBI is enabled in the SMMU and userspace triggers DMA transactions on
tagged pointers, the endpoint might create ATC entries for addresses that
include a tag. Software would then have to send ATC invalidation packets
for each of the 255 possible aliases of an address, or just wipe the
whole address space. This is not a viable option, so disable TBI.

The impact of this change is unclear, since there are very few users of
tagged pointers, much less SVA. But the requirement introduced by this
patch doesn't seem excessive: a userspace application using both tagged
pointers and SVA should now sanitize addresses (clear the tag) before
using them for device DMA.
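
As an illustration only (this helper is hypothetical and not part of the
patch), such an application could untag a buffer address before handing
it to the device:

	#include <stdint.h>

	/* Clear the tag in bits [63:56] before programming device DMA. */
	static inline void *untag_for_dma(void *tagged)
	{
		return (void *)((uintptr_t)tagged & ~(0xffULL << 56));
	}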

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3-context.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
index eaeba1bec2e9..0479cae5249c 100644
--- a/drivers/iommu/arm-smmu-v3-context.c
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -221,7 +221,6 @@ static u64 arm_smmu_cpu_tcr_to_cd(struct arm_smmu_context_cfg *cfg, u64 tcr)
 	val |= ARM_SMMU_TCR2CD(tcr, EPD0);
 	val |= ARM_SMMU_TCR2CD(tcr, EPD1);
 	val |= ARM_SMMU_TCR2CD(tcr, IPS);
-	val |= ARM_SMMU_TCR2CD(tcr, TBI0);
 
 	if (cfg->hw_access)
 		val |= ARM_SMMU_TCR2CD(tcr, HA);
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 33/37] iommu/arm-smmu-v3: Disable tagged pointers
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

The ARM architecture has a "Top Byte Ignore" (TBI) option that makes the
MMU mask out bits [63:56] of an address, allowing a userspace application
to store data in its pointers. This option is incompatible with PCI ATS.

If TBI is enabled in the SMMU and userspace triggers DMA transactions on
tagged pointers, the endpoint might create ATC entries for addresses that
include a tag. Software would then have to send ATC invalidation packets
for each of the 255 possible aliases of an address, or just wipe the whole
address space. This is not a viable option, so disable TBI.

The impact of this change is unclear, since there are very few users of
tagged pointers, much less SVA. But the requirement introduced by this
patch doesn't seem excessive: a userspace application using both tagged
pointers and SVA should now sanitize addresses (clear the tag) before
using them for device DMA.
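
As a minimal illustration of that last point (not part of this patch), a
userspace helper that sanitizes a tagged pointer before device DMA could
look like the sketch below; the helper name is made up, and only the
[63:56] tag location comes from the architecture:

#include <stdint.h>

/* Illustrative only: clear the TBI tag in bits [63:56] of a user pointer */
static inline void *untag_pointer(void *tagged)
{
	uintptr_t addr = (uintptr_t)tagged;

	return (void *)(addr & ~(0xffULL << 56));
}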

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3-context.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
index eaeba1bec2e9..0479cae5249c 100644
--- a/drivers/iommu/arm-smmu-v3-context.c
+++ b/drivers/iommu/arm-smmu-v3-context.c
@@ -221,7 +221,6 @@ static u64 arm_smmu_cpu_tcr_to_cd(struct arm_smmu_context_cfg *cfg, u64 tcr)
 	val |= ARM_SMMU_TCR2CD(tcr, EPD0);
 	val |= ARM_SMMU_TCR2CD(tcr, EPD1);
 	val |= ARM_SMMU_TCR2CD(tcr, IPS);
-	val |= ARM_SMMU_TCR2CD(tcr, TBI0);
 
 	if (cfg->hw_access)
 		val |= ARM_SMMU_TCR2CD(tcr, HA);
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 34/37] PCI: Make "PRG Response PASID Required" handling common
  2018-02-12 18:33 ` Jean-Philippe Brucker
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

The PASID ECN to the PCIe spec added a bit in the PRI status register that
allows a Function to declare whether a PRG Response should contain the
PASID prefix or not.

Move the helper that accesses it from amd_iommu into the PCI subsystem,
renaming it to be consistent with the current PCI Express specification
(PRPR - PRG Response PASID Required).
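
As a usage sketch, mirroring how the SMMUv3 driver consumes this helper
later in the series: a caller records the value when enabling PRI, and only
fills the PASID field of a PRG Response when the endpoint requires it.
Field names below are illustrative, not the actual driver code:

	/* At PRI enable time */
	master->prg_resp_needs_pasid = pci_prg_resp_requires_prefix(pdev);

	/* When building a PRG Response for this endpoint */
	resp_has_pasid = fault->pasid_valid && master->prg_resp_needs_pasid;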

Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/amd_iommu.c     | 19 +------------------
 drivers/pci/ats.c             | 17 +++++++++++++++++
 include/linux/pci-ats.h       |  8 ++++++++
 include/uapi/linux/pci_regs.h |  1 +
 4 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 74788fdeb773..4bf606747295 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2049,23 +2049,6 @@ static int pdev_iommuv2_enable(struct pci_dev *pdev)
 	return ret;
 }
 
-/* FIXME: Move this to PCI code */
-#define PCI_PRI_TLP_OFF		(1 << 15)
-
-static bool pci_pri_tlp_required(struct pci_dev *pdev)
-{
-	u16 status;
-	int pos;
-
-	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
-	if (!pos)
-		return false;
-
-	pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
-
-	return (status & PCI_PRI_TLP_OFF) ? true : false;
-}
-
 /*
  * If a device is not yet associated with a domain, this function
  * assigns it visible for the hardware
@@ -2094,7 +2077,7 @@ static int attach_device(struct device *dev,
 
 			dev_data->ats.enabled = true;
 			dev_data->ats.qdep    = pci_ats_queue_depth(pdev);
-			dev_data->pri_tlp     = pci_pri_tlp_required(pdev);
+			dev_data->pri_tlp     = pci_prg_resp_requires_prefix(pdev);
 		}
 	} else if (amd_iommu_iotlb_sup &&
 		   pci_enable_ats(pdev, PAGE_SHIFT) == 0) {
diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 6ad80a1fd5a7..52bac62a0e40 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -390,3 +390,20 @@ int pci_max_pasids(struct pci_dev *pdev)
 }
 EXPORT_SYMBOL_GPL(pci_max_pasids);
 #endif /* CONFIG_PCI_PASID */
+
+#if defined(CONFIG_PCI_PASID) && defined(CONFIG_PCI_PRI)
+bool pci_prg_resp_requires_prefix(struct pci_dev *pdev)
+{
+	u16 status;
+	int pos;
+
+	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
+	if (!pos)
+		return false;
+
+	pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
+
+	return !!(status & PCI_PRI_STATUS_PRPR);
+}
+EXPORT_SYMBOL_GPL(pci_prg_resp_requires_prefix);
+#endif /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */
diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
index 7c4b8e27268c..1825ca2c9bf4 100644
--- a/include/linux/pci-ats.h
+++ b/include/linux/pci-ats.h
@@ -68,5 +68,13 @@ static inline int pci_max_pasids(struct pci_dev *pdev)
 
 #endif /* CONFIG_PCI_PASID */
 
+#if defined(CONFIG_PCI_PASID) && defined(CONFIG_PCI_PRI)
+bool pci_prg_resp_requires_prefix(struct pci_dev *pdev);
+#else
+static inline bool pci_prg_resp_requires_prefix(struct pci_dev *pdev)
+{
+	return false;
+}
+#endif /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */
 
 #endif /* LINUX_PCI_ATS_H*/
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 0c79eac5e9b8..c8020391cfa4 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -868,6 +868,7 @@
 #define  PCI_PRI_STATUS_RF	0x001	/* Response Failure */
 #define  PCI_PRI_STATUS_UPRGI	0x002	/* Unexpected PRG index */
 #define  PCI_PRI_STATUS_STOPPED	0x100	/* PRI Stopped */
+#define  PCI_PRI_STATUS_PRPR	0x8000	/* PRG Response requires PASID prefix */
 #define PCI_PRI_MAX_REQ		0x08	/* PRI max reqs supported */
 #define PCI_PRI_ALLOC_REQ	0x0c	/* PRI max reqs allowed */
 #define PCI_EXT_CAP_PRI_SIZEOF	16
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 35/37] iommu/arm-smmu-v3: Add support for PRI
@ 2018-02-12 18:33     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

For PCI devices that support it, enable the PRI capability and handle
PRI Page Requests with the generic fault handler.
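
The resulting page-request path, summarized from the hunks below:

	PRIQ entry -> arm_smmu_handle_ppr() -> iommu_report_device_fault()
	           -> generic fault handler services the fault
	           -> arm_smmu_page_response() -> CMD_PRI_RESP carrying
	              IOMMU_PAGE_RESP_{SUCCESS,INVALID,FAILURE}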

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 174 ++++++++++++++++++++++++++++++--------------
 1 file changed, 119 insertions(+), 55 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 8d09615fab35..ace2f995b0c0 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -271,6 +271,7 @@
 #define STRTAB_STE_1_S1COR_SHIFT	4
 #define STRTAB_STE_1_S1CSH_SHIFT	6
 
+#define STRTAB_STE_1_PPAR		(1UL << 18)
 #define STRTAB_STE_1_S1STALLD		(1UL << 27)
 
 #define STRTAB_STE_1_EATS_ABT		0UL
@@ -346,9 +347,9 @@
 #define CMDQ_PRI_1_GRPID_SHIFT		0
 #define CMDQ_PRI_1_GRPID_MASK		0x1ffUL
 #define CMDQ_PRI_1_RESP_SHIFT		12
-#define CMDQ_PRI_1_RESP_DENY		(0UL << CMDQ_PRI_1_RESP_SHIFT)
-#define CMDQ_PRI_1_RESP_FAIL		(1UL << CMDQ_PRI_1_RESP_SHIFT)
-#define CMDQ_PRI_1_RESP_SUCC		(2UL << CMDQ_PRI_1_RESP_SHIFT)
+#define CMDQ_PRI_1_RESP_FAILURE		(0UL << CMDQ_PRI_1_RESP_SHIFT)
+#define CMDQ_PRI_1_RESP_INVALID		(1UL << CMDQ_PRI_1_RESP_SHIFT)
+#define CMDQ_PRI_1_RESP_SUCCESS		(2UL << CMDQ_PRI_1_RESP_SHIFT)
 
 #define CMDQ_RESUME_0_SID_SHIFT		32
 #define CMDQ_RESUME_0_SID_MASK		0xffffffffUL
@@ -442,12 +443,6 @@ module_param_named(disable_ats_check, disable_ats_check, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_ats_check,
 	"By default, the SMMU checks whether each incoming transaction marked as translated is allowed by the stream configuration. This option disables the check.");
 
-enum pri_resp {
-	PRI_RESP_DENY,
-	PRI_RESP_FAIL,
-	PRI_RESP_SUCC,
-};
-
 enum arm_smmu_msi_index {
 	EVTQ_MSI_INDEX,
 	GERROR_MSI_INDEX,
@@ -530,7 +525,7 @@ struct arm_smmu_cmdq_ent {
 			u32			sid;
 			u32			ssid;
 			u16			grpid;
-			enum pri_resp		resp;
+			enum page_response_code	resp;
 		} pri;
 
 		#define CMDQ_OP_RESUME		0x44
@@ -615,6 +610,7 @@ struct arm_smmu_strtab_ent {
 	struct arm_smmu_s2_cfg		*s2_cfg;
 
 	bool				can_stall;
+	bool				prg_resp_needs_ssid;
 };
 
 struct arm_smmu_strtab_cfg {
@@ -969,14 +965,14 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		cmd[0] |= (u64)ent->pri.sid << CMDQ_PRI_0_SID_SHIFT;
 		cmd[1] |= ent->pri.grpid << CMDQ_PRI_1_GRPID_SHIFT;
 		switch (ent->pri.resp) {
-		case PRI_RESP_DENY:
-			cmd[1] |= CMDQ_PRI_1_RESP_DENY;
+		case IOMMU_PAGE_RESP_FAILURE:
+			cmd[1] |= CMDQ_PRI_1_RESP_FAILURE;
 			break;
-		case PRI_RESP_FAIL:
-			cmd[1] |= CMDQ_PRI_1_RESP_FAIL;
+		case IOMMU_PAGE_RESP_INVALID:
+			cmd[1] |= CMDQ_PRI_1_RESP_INVALID;
 			break;
-		case PRI_RESP_SUCC:
-			cmd[1] |= CMDQ_PRI_1_RESP_SUCC;
+		case IOMMU_PAGE_RESP_SUCCESS:
+			cmd[1] |= CMDQ_PRI_1_RESP_SUCCESS;
 			break;
 		default:
 			return -EINVAL;
@@ -1180,9 +1176,16 @@ static int arm_smmu_page_response(struct iommu_domain *domain,
 		cmd.resume.sid		= sid;
 		cmd.resume.stag		= resp->page_req_group_id;
 		cmd.resume.resp		= resp->resp_code;
+	} else if (master->can_fault) {
+		cmd.opcode		= CMDQ_OP_PRI_RESP;
+		cmd.substream_valid	= resp->pasid_present &&
+					  master->ste.prg_resp_needs_ssid;
+		cmd.pri.sid		= sid;
+		cmd.pri.ssid		= resp->pasid;
+		cmd.pri.grpid		= resp->page_req_group_id;
+		cmd.pri.resp		= resp->resp_code;
 	} else {
-		/* TODO: put PRI response here */
-		return -EINVAL;
+		return -ENODEV;
 	}
 
 	arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
@@ -1309,6 +1312,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 			  STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1) <<
 			 STRTAB_STE_1_STRW_SHIFT);
 
+		if (ste->prg_resp_needs_ssid)
+			dst[1] |= STRTAB_STE_1_PPAR;
+
 		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
 		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE) &&
 		   !ste->can_stall)
@@ -1536,40 +1542,32 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 
 static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
 {
-	u32 sid, ssid;
-	u16 grpid;
-	bool ssv, last;
-
-	sid = evt[0] >> PRIQ_0_SID_SHIFT & PRIQ_0_SID_MASK;
-	ssv = evt[0] & PRIQ_0_SSID_V;
-	ssid = ssv ? evt[0] >> PRIQ_0_SSID_SHIFT & PRIQ_0_SSID_MASK : 0;
-	last = evt[0] & PRIQ_0_PRG_LAST;
-	grpid = evt[1] >> PRIQ_1_PRG_IDX_SHIFT & PRIQ_1_PRG_IDX_MASK;
-
-	dev_info(smmu->dev, "unexpected PRI request received:\n");
-	dev_info(smmu->dev,
-		 "\tsid 0x%08x.0x%05x: [%u%s] %sprivileged %s%s%s access at iova 0x%016llx\n",
-		 sid, ssid, grpid, last ? "L" : "",
-		 evt[0] & PRIQ_0_PERM_PRIV ? "" : "un",
-		 evt[0] & PRIQ_0_PERM_READ ? "R" : "",
-		 evt[0] & PRIQ_0_PERM_WRITE ? "W" : "",
-		 evt[0] & PRIQ_0_PERM_EXEC ? "X" : "",
-		 evt[1] & PRIQ_1_ADDR_MASK << PRIQ_1_ADDR_SHIFT);
-
-	if (last) {
-		struct arm_smmu_cmdq_ent cmd = {
-			.opcode			= CMDQ_OP_PRI_RESP,
-			.substream_valid	= ssv,
-			.pri			= {
-				.sid	= sid,
-				.ssid	= ssid,
-				.grpid	= grpid,
-				.resp	= PRI_RESP_DENY,
-			},
-		};
+	u32 sid = evt[0] >> PRIQ_0_SID_SHIFT & PRIQ_0_SID_MASK;
 
-		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
-	}
+	struct arm_smmu_master_data *master;
+	struct iommu_fault_event fault = {
+		.type		= IOMMU_FAULT_PAGE_REQ,
+		.last_req	= !!(evt[0] & PRIQ_0_PRG_LAST),
+		.pasid_valid	= !!(evt[0] & PRIQ_0_SSID_V),
+		.pasid		= evt[0] >> PRIQ_0_SSID_SHIFT & PRIQ_0_SSID_MASK,
+		.page_req_group_id = evt[1] >> PRIQ_1_PRG_IDX_SHIFT & PRIQ_1_PRG_IDX_MASK,
+		.addr		= evt[1] & PRIQ_1_ADDR_MASK << PRIQ_1_ADDR_SHIFT,
+	};
+
+	if (evt[0] & PRIQ_0_PERM_READ)
+		fault.prot |= IOMMU_FAULT_READ;
+	if (evt[0] & PRIQ_0_PERM_WRITE)
+		fault.prot |= IOMMU_FAULT_WRITE;
+	if (evt[0] & PRIQ_0_PERM_EXEC)
+		fault.prot |= IOMMU_FAULT_EXEC;
+	if (evt[0] & PRIQ_0_PERM_PRIV)
+		fault.prot |= IOMMU_FAULT_PRIV;
+
+	master = arm_smmu_find_master(smmu, sid);
+	if (WARN_ON(!master))
+		return;
+
+	iommu_report_device_fault(master->dev, &fault);
 }
 
 static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
@@ -1594,6 +1592,11 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
 		}
 
 		if (queue_sync_prod(q) == -EOVERFLOW)
+			/*
+			 * TODO: flush pending faults, since the SMMU might have
+			 * auto-responded to the Last request of a pending
+			 * group
+			 */
 			dev_err(smmu->dev, "PRIQ overflow detected -- requests lost\n");
 	} while (!queue_empty(q));
 
@@ -1647,7 +1650,8 @@ static int arm_smmu_flush_queues(struct notifier_block *nb,
 	if (master) {
 		if (master->ste.can_stall)
 			arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
-		/* TODO: add support for PRI */
+		else if (master->can_fault)
+			arm_smmu_flush_queue(smmu, &smmu->priq.q, "priq");
 		return 0;
 	}
 
@@ -2533,6 +2537,46 @@ static int arm_smmu_enable_ats(struct arm_smmu_master_data *master)
 	return 0;
 }
 
+static int arm_smmu_enable_pri(struct arm_smmu_master_data *master)
+{
+	int ret, pos;
+	struct pci_dev *pdev;
+	/*
+	 * TODO: find a good inflight PPR number. We should divide the PRI queue
+	 * by the number of PRI-capable devices, but it's impossible to know
+	 * about current and future (hotplugged) devices. So we're at risk of
+	 * dropping PPRs (and leaking pending requests in the FQ).
+	 */
+	size_t max_inflight_pprs = 16;
+	struct arm_smmu_device *smmu = master->smmu;
+
+	if (!(smmu->features & ARM_SMMU_FEAT_PRI) || !dev_is_pci(master->dev))
+		return -ENOSYS;
+
+	pdev = to_pci_dev(master->dev);
+
+	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
+	if (!pos)
+		return -ENOSYS;
+
+	ret = pci_reset_pri(pdev);
+	if (ret)
+		return ret;
+
+	ret = pci_enable_pri(pdev, max_inflight_pprs);
+	if (ret) {
+		dev_err(master->dev, "cannot enable PRI: %d\n", ret);
+		return ret;
+	}
+
+	master->can_fault = true;
+	master->ste.prg_resp_needs_ssid = pci_prg_resp_requires_prefix(pdev);
+
+	dev_dbg(master->dev, "enabled PRI");
+
+	return 0;
+}
+
 static void arm_smmu_disable_ats(struct arm_smmu_master_data *master)
 {
 	struct pci_dev *pdev;
@@ -2548,6 +2592,22 @@ static void arm_smmu_disable_ats(struct arm_smmu_master_data *master)
 	pci_disable_ats(pdev);
 }
 
+static void arm_smmu_disable_pri(struct arm_smmu_master_data *master)
+{
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(master->dev))
+		return;
+
+	pdev = to_pci_dev(master->dev);
+
+	if (!pdev->pri_enabled)
+		return;
+
+	pci_disable_pri(pdev);
+	master->can_fault = false;
+}
+
 static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
 				  struct arm_smmu_master_data *master)
 {
@@ -2668,12 +2728,13 @@ static int arm_smmu_add_device(struct device *dev)
 		master->ste.can_stall = true;
 	}
 
-	arm_smmu_enable_ats(master);
+	if (!arm_smmu_enable_ats(master))
+		arm_smmu_enable_pri(master);
 
 	group = iommu_group_get_for_dev(dev);
 	if (IS_ERR(group)) {
 		ret = PTR_ERR(group);
-		goto err_disable_ats;
+		goto err_disable_pri;
 	}
 
 	iommu_group_put(group);
@@ -2682,7 +2743,8 @@ static int arm_smmu_add_device(struct device *dev)
 
 	return 0;
 
-err_disable_ats:
+err_disable_pri:
+	arm_smmu_disable_pri(master);
 	arm_smmu_disable_ats(master);
 
 	return ret;
@@ -2702,6 +2764,8 @@ static void arm_smmu_remove_device(struct device *dev)
 	if (master && master->ste.assigned)
 		arm_smmu_detach_dev(dev);
 	arm_smmu_remove_master(smmu, master);
+
+	arm_smmu_disable_pri(master);
 	arm_smmu_disable_ats(master);
 
 	iommu_group_remove_device(dev);
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 36/37] iommu/arm-smmu-v3: Add support for PCI PASID
  2018-02-12 18:33 ` Jean-Philippe Brucker
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

Enable PASID for PCI devices that support it.
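
For clarity, the bring-up and teardown ordering that results from the hunks
below (PRI is only enabled when ATS was enabled successfully):

	add_device:    enable PASID -> enable ATS -> enable PRI
	remove_device: disable PRI -> disable ATS -> disable PASID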

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 54 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index ace2f995b0c0..26935a9a5a97 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2608,6 +2608,52 @@ static void arm_smmu_disable_pri(struct arm_smmu_master_data *master)
 	master->can_fault = false;
 }
 
+static int arm_smmu_enable_pasid(struct arm_smmu_master_data *master)
+{
+	int ret;
+	int features;
+	u8 pasid_bits;
+	int num_pasids;
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(master->dev))
+		return -ENOSYS;
+
+	pdev = to_pci_dev(master->dev);
+
+	features = pci_pasid_features(pdev);
+	if (features < 0)
+		return -ENOSYS;
+
+	num_pasids = pci_max_pasids(pdev);
+	if (num_pasids <= 0)
+		return -ENOSYS;
+
+	pasid_bits = min_t(u8, ilog2(num_pasids), master->smmu->ssid_bits);
+
+	dev_dbg(&pdev->dev, "device supports %#x PASID bits [%s%s]\n", pasid_bits,
+		(features & PCI_PASID_CAP_EXEC) ? "x" : "",
+		(features & PCI_PASID_CAP_PRIV) ? "p" : "");
+
+	ret = pci_enable_pasid(pdev, features);
+	return ret ? ret : pasid_bits;
+}
+
+static void arm_smmu_disable_pasid(struct arm_smmu_master_data *master)
+{
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(master->dev))
+		return;
+
+	pdev = to_pci_dev(master->dev);
+
+	if (!pdev->pasid_enabled)
+		return;
+
+	pci_disable_pasid(pdev);
+}
+
 static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
 				  struct arm_smmu_master_data *master)
 {
@@ -2728,6 +2774,11 @@ static int arm_smmu_add_device(struct device *dev)
 		master->ste.can_stall = true;
 	}
 
+	/* PASID must be enabled before ATS */
+	ret = arm_smmu_enable_pasid(master);
+	if (ret > 0)
+		master->ssid_bits = ret;
+
 	if (!arm_smmu_enable_ats(master))
 		arm_smmu_enable_pri(master);
 
@@ -2746,6 +2797,7 @@ static int arm_smmu_add_device(struct device *dev)
 err_disable_pri:
 	arm_smmu_disable_pri(master);
 	arm_smmu_disable_ats(master);
+	arm_smmu_disable_pasid(master);
 
 	return ret;
 }
@@ -2766,7 +2818,9 @@ static void arm_smmu_remove_device(struct device *dev)
 	arm_smmu_remove_master(smmu, master);
 
 	arm_smmu_disable_pri(master);
+	/* PASID must be disabled after ATS */
 	arm_smmu_disable_ats(master);
+	arm_smmu_disable_pasid(master);
 
 	iommu_group_remove_device(dev);
 	iommu_device_unlink(&smmu->iommu, dev);
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [PATCH 37/37] vfio: Add support for Shared Virtual Addressing
  2018-02-12 18:33 ` Jean-Philippe Brucker
@ 2018-02-12 18:33   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-12 18:33 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

Add two new ioctls for VFIO containers. VFIO_IOMMU_BIND_PROCESS creates a
bond between a container and a process address space, identified by a
device-specific ID named PASID. This allows the device to target DMA
transactions at the process's virtual addresses without the need to map
and unmap buffers explicitly in the IOMMU. The process page tables are
shared with the IOMMU, and mechanisms such as PCI ATS/PRI are used to
handle faults. VFIO_IOMMU_UNBIND_PROCESS removes a bond created with
VFIO_IOMMU_BIND_PROCESS.
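
A rough userspace sketch of the bind path follows. The ioctl request name
and the exact uapi layout are assumptions based on this commit message and
on the type1 code below (the process parameters follow the bind header, as
in vfio_iommu_type1_bind_process()); container setup is omitted:

#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Illustrative only: bind the calling process to an already-configured
 * VFIO container and return the PASID allocated by the kernel, or -1.
 */
static int bind_current_process(int container_fd)
{
	char buf[sizeof(struct vfio_iommu_type1_bind) +
		 sizeof(struct vfio_iommu_type1_bind_process)] = {};
	struct vfio_iommu_type1_bind *hdr = (void *)buf;
	struct vfio_iommu_type1_bind_process *proc = (void *)(hdr + 1);

	hdr->argsz = sizeof(buf);
	/* proc->flags left at 0: bind the caller, not VFIO_IOMMU_BIND_PID */

	if (ioctl(container_fd, VFIO_IOMMU_BIND_PROCESS, hdr))
		return -1;

	/* The device can now DMA to this process's buffers using this PASID */
	return proc->pasid;
}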

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/vfio/vfio_iommu_type1.c | 399 ++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h       |  76 ++++++++
 2 files changed, 475 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index e30e29ae4819..cac066f0026b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -30,6 +30,7 @@
 #include <linux/iommu.h>
 #include <linux/module.h>
 #include <linux/mm.h>
+#include <linux/ptrace.h>
 #include <linux/rbtree.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/mm.h>
@@ -60,6 +61,7 @@ MODULE_PARM_DESC(disable_hugepages,
 
 struct vfio_iommu {
 	struct list_head	domain_list;
+	struct list_head	mm_list;
 	struct vfio_domain	*external_domain; /* domain for external user */
 	struct mutex		lock;
 	struct rb_root		dma_list;
@@ -90,6 +92,15 @@ struct vfio_dma {
 struct vfio_group {
 	struct iommu_group	*iommu_group;
 	struct list_head	next;
+	bool			sva_enabled;
+};
+
+struct vfio_mm {
+#define VFIO_PASID_INVALID	(-1)
+	spinlock_t		lock;
+	int			pasid;
+	struct mm_struct	*mm;
+	struct list_head	next;
 };
 
 /*
@@ -1117,6 +1128,157 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 	return 0;
 }
 
+static int vfio_iommu_mm_exit(struct device *dev, int pasid, void *data)
+{
+	struct vfio_mm *vfio_mm = data;
+
+	/*
+	 * The mm_exit callback cannot block, so we can't take the iommu mutex
+	 * and remove this vfio_mm from the list. Hopefully the SVA code will
+	 * relax its locking requirement in the future.
+	 *
+	 * We mostly care about attach_group, which will attempt to replay all
+	 * binds in this container. Ensure that it doesn't touch this defunct mm
+	 * struct, by clearing the pointer. The structure will be freed when the
+	 * group is removed from the container.
+	 */
+	spin_lock(&vfio_mm->lock);
+	vfio_mm->mm = NULL;
+	spin_unlock(&vfio_mm->lock);
+
+	return 0;
+}
+
+static int vfio_iommu_sva_init(struct device *dev, void *data)
+{
+
+	int ret;
+
+	ret = iommu_sva_device_init(dev, IOMMU_SVA_FEAT_PASID |
+				    IOMMU_SVA_FEAT_IOPF, 0);
+	if (ret)
+		return ret;
+
+	return iommu_register_mm_exit_handler(dev, vfio_iommu_mm_exit);
+}
+
+static int vfio_iommu_sva_shutdown(struct device *dev, void *data)
+{
+	iommu_sva_device_shutdown(dev);
+	iommu_unregister_mm_exit_handler(dev);
+
+	return 0;
+}
+
+static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
+				 struct vfio_group *group,
+				 struct vfio_mm *vfio_mm)
+{
+	int ret;
+	int pasid;
+
+	if (!group->sva_enabled) {
+		ret = iommu_group_for_each_dev(group->iommu_group, NULL,
+					       vfio_iommu_sva_init);
+		if (ret)
+			return ret;
+
+		group->sva_enabled = true;
+	}
+
+	ret = iommu_sva_bind_group(group->iommu_group, vfio_mm->mm, &pasid,
+				   IOMMU_SVA_FEAT_PASID | IOMMU_SVA_FEAT_IOPF,
+				   vfio_mm);
+	if (ret)
+		return ret;
+
+	if (WARN_ON(vfio_mm->pasid != VFIO_PASID_INVALID && pasid !=
+		    vfio_mm->pasid))
+		return -EFAULT;
+
+	vfio_mm->pasid = pasid;
+
+	return 0;
+}
+
+static void vfio_iommu_unbind_group(struct vfio_group *group,
+				    struct vfio_mm *vfio_mm)
+{
+	iommu_sva_unbind_group(group->iommu_group, vfio_mm->pasid);
+}
+
+static void vfio_iommu_unbind(struct vfio_iommu *iommu,
+			      struct vfio_mm *vfio_mm)
+{
+	struct vfio_group *group;
+	struct vfio_domain *domain;
+
+	list_for_each_entry(domain, &iommu->domain_list, next)
+		list_for_each_entry(group, &domain->group_list, next)
+			vfio_iommu_unbind_group(group, vfio_mm);
+}
+
+static bool vfio_mm_get(struct vfio_mm *vfio_mm)
+{
+	bool ret;
+
+	spin_lock(&vfio_mm->lock);
+	ret = vfio_mm->mm && mmget_not_zero(vfio_mm->mm);
+	spin_unlock(&vfio_mm->lock);
+
+	return ret;
+}
+
+static void vfio_mm_put(struct vfio_mm *vfio_mm)
+{
+	mmput(vfio_mm->mm);
+}
+
+static int vfio_iommu_replay_bind(struct vfio_iommu *iommu, struct vfio_group *group)
+{
+	int ret = 0;
+	struct vfio_mm *vfio_mm;
+
+	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
+		/*
+		 * Ensure mm doesn't exit while we're binding it to the new
+		 * group.
+		 */
+		if (!vfio_mm_get(vfio_mm))
+			continue;
+		ret = vfio_iommu_bind_group(iommu, group, vfio_mm);
+		vfio_mm_put(vfio_mm);
+
+		if (ret)
+			goto out_unbind;
+	}
+
+	return 0;
+
+out_unbind:
+	list_for_each_entry_continue_reverse(vfio_mm, &iommu->mm_list, next) {
+		if (!vfio_mm_get(vfio_mm))
+			continue;
+		iommu_sva_unbind_group(group->iommu_group, vfio_mm->pasid);
+		vfio_mm_put(vfio_mm);
+	}
+
+	return ret;
+}
+
+static void vfio_iommu_free_all_mm(struct vfio_iommu *iommu)
+{
+	struct vfio_mm *vfio_mm, *tmp;
+
+	/*
+	 * No need for unbind() here. Since all groups are detached from this
+	 * iommu, bonds have been removed.
+	 */
+	list_for_each_entry_safe(vfio_mm, tmp, &iommu->mm_list, next)
+		kfree(vfio_mm);
+	INIT_LIST_HEAD(&iommu->mm_list);
+}
+
 /*
  * We change our unmap behavior slightly depending on whether the IOMMU
  * supports fine-grained superpages.  IOMMUs like AMD-Vi will use a superpage
@@ -1301,6 +1463,15 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 		    d->prot == domain->prot) {
 			iommu_detach_group(domain->domain, iommu_group);
 			if (!iommu_attach_group(d->domain, iommu_group)) {
+				if (vfio_iommu_replay_bind(iommu, group)) {
+					iommu_detach_group(d->domain, iommu_group);
+					ret = iommu_attach_group(domain->domain,
+								 iommu_group);
+					if (ret)
+						goto out_domain;
+					continue;
+				}
+
 				list_add(&group->next, &d->group_list);
 				iommu_domain_free(domain->domain);
 				kfree(domain);
@@ -1321,6 +1492,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	if (ret)
 		goto out_detach;
 
+	ret = vfio_iommu_replay_bind(iommu, group);
+	if (ret)
+		goto out_detach;
+
 	if (resv_msi) {
 		ret = iommu_get_msi_cookie(domain->domain, resv_msi_base);
 		if (ret)
@@ -1426,6 +1601,11 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 			continue;
 
 		iommu_detach_group(domain->domain, iommu_group);
+		if (group->sva_enabled) {
+			iommu_group_for_each_dev(iommu_group, NULL,
+						 vfio_iommu_sva_shutdown);
+			group->sva_enabled = false;
+		}
 		list_del(&group->next);
 		kfree(group);
 		/*
@@ -1441,6 +1621,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 					vfio_iommu_unmap_unpin_all(iommu);
 				else
 					vfio_iommu_unmap_unpin_reaccount(iommu);
+				vfio_iommu_free_all_mm(iommu);
 			}
 			iommu_domain_free(domain->domain);
 			list_del(&domain->next);
@@ -1475,6 +1656,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 	}
 
 	INIT_LIST_HEAD(&iommu->domain_list);
+	INIT_LIST_HEAD(&iommu->mm_list);
 	iommu->dma_list = RB_ROOT;
 	mutex_init(&iommu->lock);
 	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
@@ -1509,6 +1691,7 @@ static void vfio_iommu_type1_release(void *iommu_data)
 		kfree(iommu->external_domain);
 	}
 
+	vfio_iommu_free_all_mm(iommu);
 	vfio_iommu_unmap_unpin_all(iommu);
 
 	list_for_each_entry_safe(domain, domain_tmp,
@@ -1537,6 +1720,184 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
 	return ret;
 }
 
+static struct mm_struct *vfio_iommu_get_mm_by_vpid(pid_t vpid)
+{
+	struct mm_struct *mm;
+	struct task_struct *task;
+
+	rcu_read_lock();
+	task = find_task_by_vpid(vpid);
+	if (task)
+		get_task_struct(task);
+	rcu_read_unlock();
+	if (!task)
+		return ERR_PTR(-ESRCH);
+
+	/* Ensure that current has RW access on the mm */
+	mm = mm_access(task, PTRACE_MODE_ATTACH_REALCREDS);
+	put_task_struct(task);
+
+	if (!mm)
+		return ERR_PTR(-ESRCH);
+
+	return mm;
+}
+
+static long vfio_iommu_type1_bind_process(struct vfio_iommu *iommu,
+					  void __user *arg,
+					  struct vfio_iommu_type1_bind *bind)
+{
+	struct vfio_iommu_type1_bind_process params;
+	struct vfio_domain *domain;
+	struct vfio_group *group;
+	struct vfio_mm *vfio_mm;
+	struct mm_struct *mm;
+	unsigned long minsz;
+	int ret = 0;
+
+	minsz = sizeof(*bind) + sizeof(params);
+	if (bind->argsz < minsz)
+		return -EINVAL;
+
+	arg += sizeof(*bind);
+	if (copy_from_user(&params, arg, sizeof(params)))
+		return -EFAULT;
+
+	if (params.flags & ~VFIO_IOMMU_BIND_PID)
+		return -EINVAL;
+
+	if (params.flags & VFIO_IOMMU_BIND_PID) {
+		mm = vfio_iommu_get_mm_by_vpid(params.pid);
+		if (IS_ERR(mm))
+			return PTR_ERR(mm);
+	} else {
+		mm = get_task_mm(current);
+		if (!mm)
+			return -EINVAL;
+	}
+
+	mutex_lock(&iommu->lock);
+	if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
+		ret = -EINVAL;
+		goto out_put_mm;
+	}
+
+	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
+		if (vfio_mm->mm != mm)
+			continue;
+
+		params.pasid = vfio_mm->pasid;
+
+		ret = copy_to_user(arg, &params, sizeof(params)) ? -EFAULT : 0;
+		goto out_put_mm;
+	}
+
+	vfio_mm = kzalloc(sizeof(*vfio_mm), GFP_KERNEL);
+	if (!vfio_mm) {
+		ret = -ENOMEM;
+		goto out_put_mm;
+	}
+
+	vfio_mm->mm = mm;
+	vfio_mm->pasid = VFIO_PASID_INVALID;
+	spin_lock_init(&vfio_mm->lock);
+
+	list_for_each_entry(domain, &iommu->domain_list, next) {
+		list_for_each_entry(group, &domain->group_list, next) {
+			ret = vfio_iommu_bind_group(iommu, group, vfio_mm);
+			if (ret)
+				break;
+		}
+		if (ret)
+			break;
+	}
+
+	if (ret) {
+		/* Undo all binds that already succeeded */
+		list_for_each_entry_continue_reverse(group, &domain->group_list,
+						     next)
+			vfio_iommu_unbind_group(group, vfio_mm);
+		list_for_each_entry_continue_reverse(domain, &iommu->domain_list,
+						     next)
+			list_for_each_entry(group, &domain->group_list, next)
+				vfio_iommu_unbind_group(group, vfio_mm);
+		kfree(vfio_mm);
+	} else {
+		list_add(&vfio_mm->next, &iommu->mm_list);
+
+		params.pasid = vfio_mm->pasid;
+		ret = copy_to_user(arg, &params, sizeof(params)) ? -EFAULT : 0;
+		if (ret) {
+			vfio_iommu_unbind(iommu, vfio_mm);
+			kfree(vfio_mm);
+		}
+	}
+
+out_put_mm:
+	mutex_unlock(&iommu->lock);
+	mmput(mm);
+
+	return ret;
+}
+
+static long vfio_iommu_type1_unbind_process(struct vfio_iommu *iommu,
+					    void __user *arg,
+					    struct vfio_iommu_type1_bind *bind)
+{
+	int ret = -EINVAL;
+	unsigned long minsz;
+	struct mm_struct *mm;
+	struct vfio_mm *vfio_mm;
+	struct vfio_iommu_type1_bind_process params;
+
+	minsz = sizeof(*bind) + sizeof(params);
+	if (bind->argsz < minsz)
+		return -EINVAL;
+
+	arg += sizeof(*bind);
+	if (copy_from_user(&params, arg, sizeof(params)))
+		return -EFAULT;
+
+	if (params.flags & ~VFIO_IOMMU_BIND_PID)
+		return -EINVAL;
+
+	/*
+	 * We can't simply unbind a foreign process by PASID, because the
+	 * process might have died and the PASID might have been reallocated to
+	 * another process. Instead we need to fetch that process mm by PID
+	 * again to make sure we remove the right vfio_mm. In addition, holding
+	 * the mm guarantees that mm_users isn't dropped while we unbind and the
+	 * exit_mm handler doesn't fire. While not strictly necessary, not
+	 * having to care about that race simplifies everyone's life.
+	 */
+	if (params.flags & VFIO_IOMMU_BIND_PID) {
+		mm = vfio_iommu_get_mm_by_vpid(params.pid);
+		if (IS_ERR(mm))
+			return PTR_ERR(mm);
+	} else {
+		mm = get_task_mm(current);
+		if (!mm)
+			return -EINVAL;
+	}
+
+	ret = -ESRCH;
+	mutex_lock(&iommu->lock);
+	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
+		if (vfio_mm->mm != mm)
+			continue;
+
+		vfio_iommu_unbind(iommu, vfio_mm);
+		list_del(&vfio_mm->next);
+		kfree(vfio_mm);
+		ret = 0;
+		break;
+	}
+	mutex_unlock(&iommu->lock);
+	mmput(mm);
+
+	return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -1607,6 +1968,44 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
 		return copy_to_user((void __user *)arg, &unmap, minsz) ?
 			-EFAULT : 0;
+
+	} else if (cmd == VFIO_IOMMU_BIND) {
+		struct vfio_iommu_type1_bind bind;
+
+		minsz = offsetofend(struct vfio_iommu_type1_bind, mode);
+
+		if (copy_from_user(&bind, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (bind.argsz < minsz)
+			return -EINVAL;
+
+		switch (bind.mode) {
+		case VFIO_IOMMU_BIND_PROCESS:
+			return vfio_iommu_type1_bind_process(iommu, (void __user *)arg,
+							     &bind);
+		default:
+			return -EINVAL;
+		}
+
+	} else if (cmd == VFIO_IOMMU_UNBIND) {
+		struct vfio_iommu_type1_bind bind;
+
+		minsz = offsetofend(struct vfio_iommu_type1_bind, mode);
+
+		if (copy_from_user(&bind, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (bind.argsz < minsz)
+			return -EINVAL;
+
+		switch (bind.mode) {
+		case VFIO_IOMMU_BIND_PROCESS:
+			return vfio_iommu_type1_unbind_process(iommu, (void __user *)arg,
+							       &bind);
+		default:
+			return -EINVAL;
+		}
 	}
 
 	return -ENOTTY;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index c74372163ed2..e1b9b8c58916 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -638,6 +638,82 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/*
+ * VFIO_IOMMU_BIND_PROCESS
+ *
+ * Allocate a PASID for a process address space, and use it to attach this
+ * process to all devices in the container. Devices can then tag their DMA
+ * traffic with the returned @pasid to perform transactions on the associated
+ * virtual address space. Mapping and unmapping buffers is performed by standard
+ * functions such as mmap and malloc.
+ *
+ * If the flag VFIO_IOMMU_BIND_PID is set, @pid contains the pid of a foreign
+ * process to bind. Otherwise the current task is bound. Given that the caller
+ * owns the device, setting this flag grants the caller read and write
+ * permissions on the entire address space of the foreign process described by
+ * @pid. Therefore,
+ * permission to perform the bind operation on a foreign process is governed by
+ * the ptrace access mode PTRACE_MODE_ATTACH_REALCREDS check. See man ptrace(2)
+ * for more information.
+ *
+ * On success, VFIO writes a Process Address Space ID (PASID) into @pasid. This
+ * ID is unique to a process and can be used on all devices in the container.
+ *
+ * On fork, the child inherits the device fd and can use the bonds set up by
+ * its parent. Consequently, the child has R/W access to the address spaces
+ * bound by its parent. After an execv, the device fd is closed and the child
+ * no longer has access to the address space.
+ *
+ * To remove a bond between process and container, the VFIO_IOMMU_UNBIND ioctl
+ * is issued with the same parameters. If a pid was specified in
+ * VFIO_IOMMU_BIND, it should also be present for VFIO_IOMMU_UNBIND. Otherwise
+ * the current task is unbound from the container.
+ */
+struct vfio_iommu_type1_bind_process {
+	__u32	flags;
+#define VFIO_IOMMU_BIND_PID		(1 << 0)
+	__u32	pasid;
+	__s32	pid;
+};
+
+/*
+ * The only mode supported at the moment is VFIO_IOMMU_BIND_PROCESS, which takes
+ * vfio_iommu_type1_bind_process in data.
+ */
+struct vfio_iommu_type1_bind {
+	__u32	argsz;
+	__u32	mode;
+#define VFIO_IOMMU_BIND_PROCESS		(1 << 0)
+	__u8	data[];
+};
+
+/*
+ * VFIO_IOMMU_BIND - _IOWR(VFIO_TYPE, VFIO_BASE + 22, struct vfio_iommu_type1_bind)
+ *
+ * Manage address spaces of devices in this container. Initially a TYPE1
+ * container can only have one address space, managed with
+ * VFIO_IOMMU_MAP/UNMAP_DMA.
+ *
+ * An IOMMU of type VFIO_TYPE1_NESTING_IOMMU can be managed by both MAP/UNMAP
+ * and BIND ioctls at the same time. MAP/UNMAP acts on the stage-2 (host) page
+ * tables, and BIND manages the stage-1 (guest) page tables. Other types of
+ * IOMMU may allow MAP/UNMAP and BIND to coexist, where MAP/UNMAP controls
+ * non-PASID traffic and BIND controls PASID traffic. But this depends on the
+ * underlying IOMMU architecture and isn't guaranteed.
+ *
+ * Availability of this feature depends on the device, its bus, the underlying
+ * IOMMU and the CPU architecture.
+ *
+ * returns: 0 on success, -errno on failure.
+ */
+#define VFIO_IOMMU_BIND		_IO(VFIO_TYPE, VFIO_BASE + 22)
+
+/*
+ * VFIO_IOMMU_UNBIND - _IOWR(VFIO_TYPE, VFIO_BASE + 23, struct vfio_iommu_type1_bind)
+ *
+ * Undo what was done by the corresponding VFIO_IOMMU_BIND ioctl.
+ */
+#define VFIO_IOMMU_UNBIND	_IO(VFIO_TYPE, VFIO_BASE + 23)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
2.15.1
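
For illustration, here is a minimal, untested userspace sketch of the bind flow
documented in the uapi comments above. It is not part of the patch: it assumes
a type1 container that has already been set up (groups attached, VFIO_SET_IOMMU
done), a uapi header carrying the definitions added here, and a hypothetical
helper named bind_current_task().

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Hypothetical helper: bind the calling task to all devices in the container */
static int bind_current_task(int container_fd, __u32 *pasid)
{
	struct vfio_iommu_type1_bind *bind;
	struct vfio_iommu_type1_bind_process *process;
	size_t argsz = sizeof(*bind) + sizeof(*process);
	int ret;

	bind = calloc(1, argsz);
	if (!bind)
		return -1;

	bind->argsz = argsz;
	bind->mode = VFIO_IOMMU_BIND_PROCESS;
	process = (struct vfio_iommu_type1_bind_process *)bind->data;
	process->flags = 0;	/* no VFIO_IOMMU_BIND_PID: bind the current task */

	ret = ioctl(container_fd, VFIO_IOMMU_BIND, bind);
	if (!ret)
		/* PASID that the device will use to tag its DMA transactions */
		*pasid = process->pasid;

	/* Undo later with the same parameters:
	 * ioctl(container_fd, VFIO_IOMMU_UNBIND, bind);
	 */
	free(bind);
	return ret;
}

Binding a foreign process instead would set VFIO_IOMMU_BIND_PID and fill
process->pid, subject to the ptrace access check described above.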


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* Re: [PATCH 29/37] iommu/arm-smmu-v3: Add stall support for platform devices
  2018-02-12 18:33     ` Jean-Philippe Brucker
@ 2018-02-13  1:46       ` Xu Zaibo
  -1 siblings, 0 replies; 317+ messages in thread
From: Xu Zaibo @ 2018-02-13  1:46 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, ilias.apalodimas, jonathan.cameron,
	shunyong.yang, nwatters, okaya, jcrouse, rfranz, dwmw2,
	jacob.jun.pan, yi.l.liu, ashok.raj, robdclark, christian.koenig,
	bharatku, liguozhu

Hi,

On 2018/2/13 2:33, Jean-Philippe Brucker wrote:
> The SMMU provides a Stall model for handling page faults in platform
> devices. It is similar to PCI PRI, but doesn't require devices to have
> their own translation cache. Instead, faulting transactions are parked and
> the OS is given a chance to fix the page tables and retry the transaction.
>
> Enable stall for devices that support it (opt-in by firmware). When an
> event corresponds to a translation error, call the IOMMU fault handler. If
> the fault is recoverable, it will call us back to terminate or continue
> the stall.
>
> Note that this patch tweaks the iommu_fault_event and page_response_msg to
> extend the fault id field. Stall uses 16 bits of IDs whereas PCI PRI only
> uses 9.
Can PCIe devices without an ATC use this Stall model?

Thanks.

Xu Zaibo
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>   drivers/iommu/arm-smmu-v3.c | 175 +++++++++++++++++++++++++++++++++++++++++++-
>   include/linux/iommu.h       |   4 +-
>   2 files changed, 173 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 2430b2140f8d..8b9f5dd06be0 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -338,6 +338,15 @@
>   #define CMDQ_PRI_1_RESP_FAIL		(1UL << CMDQ_PRI_1_RESP_SHIFT)
>   #define CMDQ_PRI_1_RESP_SUCC		(2UL << CMDQ_PRI_1_RESP_SHIFT)
>   
> +#define CMDQ_RESUME_0_SID_SHIFT		32
> +#define CMDQ_RESUME_0_SID_MASK		0xffffffffUL
> +#define CMDQ_RESUME_0_ACTION_SHIFT	12
> +#define CMDQ_RESUME_0_ACTION_TERM	(0UL << CMDQ_RESUME_0_ACTION_SHIFT)
> +#define CMDQ_RESUME_0_ACTION_RETRY	(1UL << CMDQ_RESUME_0_ACTION_SHIFT)
> +#define CMDQ_RESUME_0_ACTION_ABORT	(2UL << CMDQ_RESUME_0_ACTION_SHIFT)
> +#define CMDQ_RESUME_1_STAG_SHIFT	0
> +#define CMDQ_RESUME_1_STAG_MASK		0xffffUL
> +
>   #define CMDQ_SYNC_0_CS_SHIFT		12
>   #define CMDQ_SYNC_0_CS_NONE		(0UL << CMDQ_SYNC_0_CS_SHIFT)
>   #define CMDQ_SYNC_0_CS_IRQ		(1UL << CMDQ_SYNC_0_CS_SHIFT)
> @@ -358,6 +367,31 @@
>   #define EVTQ_0_ID_SHIFT			0
>   #define EVTQ_0_ID_MASK			0xffUL
>   
> +#define EVT_ID_TRANSLATION_FAULT	0x10
> +#define EVT_ID_ADDR_SIZE_FAULT		0x11
> +#define EVT_ID_ACCESS_FAULT		0x12
> +#define EVT_ID_PERMISSION_FAULT		0x13
> +
> +#define EVTQ_0_SSV			(1UL << 11)
> +#define EVTQ_0_SSID_SHIFT		12
> +#define EVTQ_0_SSID_MASK		0xfffffUL
> +#define EVTQ_0_SID_SHIFT		32
> +#define EVTQ_0_SID_MASK			0xffffffffUL
> +#define EVTQ_1_STAG_SHIFT		0
> +#define EVTQ_1_STAG_MASK		0xffffUL
> +#define EVTQ_1_STALL			(1UL << 31)
> +#define EVTQ_1_PRIV			(1UL << 33)
> +#define EVTQ_1_EXEC			(1UL << 34)
> +#define EVTQ_1_READ			(1UL << 35)
> +#define EVTQ_1_S2			(1UL << 39)
> +#define EVTQ_1_CLASS_SHIFT		40
> +#define EVTQ_1_CLASS_MASK		0x3UL
> +#define EVTQ_1_TT_READ			(1UL << 44)
> +#define EVTQ_2_ADDR_SHIFT		0
> +#define EVTQ_2_ADDR_MASK		0xffffffffffffffffUL
> +#define EVTQ_3_IPA_SHIFT		12
> +#define EVTQ_3_IPA_MASK			0xffffffffffUL
> +
>   /* PRI queue */
>   #define PRIQ_ENT_DWORDS			2
>   #define PRIQ_MAX_SZ_SHIFT		8
> @@ -472,6 +506,13 @@ struct arm_smmu_cmdq_ent {
>   			enum pri_resp		resp;
>   		} pri;
>   
> +		#define CMDQ_OP_RESUME		0x44
> +		struct {
> +			u32			sid;
> +			u16			stag;
> +			enum page_response_code	resp;
> +		} resume;
> +
>   		#define CMDQ_OP_CMD_SYNC	0x46
>   		struct {
>   			u32			msidata;
> @@ -545,6 +586,8 @@ struct arm_smmu_strtab_ent {
>   	bool				assigned;
>   	struct arm_smmu_s1_cfg		*s1_cfg;
>   	struct arm_smmu_s2_cfg		*s2_cfg;
> +
> +	bool				can_stall;
>   };
>   
>   struct arm_smmu_strtab_cfg {
> @@ -904,6 +947,21 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>   			return -EINVAL;
>   		}
>   		break;
> +	case CMDQ_OP_RESUME:
> +		cmd[0] |= (u64)ent->resume.sid << CMDQ_RESUME_0_SID_SHIFT;
> +		cmd[1] |= ent->resume.stag << CMDQ_RESUME_1_STAG_SHIFT;
> +		switch (ent->resume.resp) {
> +		case IOMMU_PAGE_RESP_INVALID:
> +		case IOMMU_PAGE_RESP_FAILURE:
> +			cmd[0] |= CMDQ_RESUME_0_ACTION_ABORT;
> +			break;
> +		case IOMMU_PAGE_RESP_SUCCESS:
> +			cmd[0] |= CMDQ_RESUME_0_ACTION_RETRY;
> +			break;
> +		default:
> +			return -EINVAL;
> +		}
> +		break;
>   	case CMDQ_OP_CMD_SYNC:
>   		if (ent->sync.msiaddr)
>   			cmd[0] |= CMDQ_SYNC_0_CS_IRQ;
> @@ -1065,6 +1123,35 @@ static void arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
>   		dev_err_ratelimited(smmu->dev, "CMD_SYNC timeout\n");
>   }
>   
> +static int arm_smmu_page_response(struct iommu_domain *domain,
> +				  struct device *dev,
> +				  struct page_response_msg *resp)
> +{
> +	int sid = dev->iommu_fwspec->ids[0];
> +	struct arm_smmu_cmdq_ent cmd = {0};
> +	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
> +
> +	if (master->ste.can_stall) {
> +		cmd.opcode		= CMDQ_OP_RESUME;
> +		cmd.resume.sid		= sid;
> +		cmd.resume.stag		= resp->page_req_group_id;
> +		cmd.resume.resp		= resp->resp_code;
> +	} else {
> +		/* TODO: put PRI response here */
> +		return -EINVAL;
> +	}
> +
> +	arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
> +	/*
> +	 * Don't send a SYNC, it doesn't do anything for RESUME or PRI_RESP.
> +	 * RESUME consumption guarantees that the stalled transaction will be
> +	 * terminated... at some point in the future. PRI_RESP is fire and
> +	 * forget.
> +	 */
> +
> +	return 0;
> +}
> +
>   /* Stream table manipulation functions */
>   static void
>   arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
> @@ -1182,7 +1269,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
>   			 STRTAB_STE_1_STRW_SHIFT);
>   
>   		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
> -		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
> +		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE) &&
> +		   !ste->can_stall)
>   			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
>   
>   		val |= (cfg->base & STRTAB_STE_0_S1CTXPTR_MASK
> @@ -1285,10 +1373,73 @@ arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
>   	return master;
>   }
>   
> +static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
> +{
> +	struct arm_smmu_master_data *master;
> +	u8 type = evt[0] >> EVTQ_0_ID_SHIFT & EVTQ_0_ID_MASK;
> +	u32 sid = evt[0] >> EVTQ_0_SID_SHIFT & EVTQ_0_SID_MASK;
> +
> +	struct iommu_fault_event fault = {
> +		.page_req_group_id = evt[1] >> EVTQ_1_STAG_SHIFT & EVTQ_1_STAG_MASK,
> +		.addr		= evt[2] >> EVTQ_2_ADDR_SHIFT & EVTQ_2_ADDR_MASK,
> +		.last_req	= true,
> +	};
> +
> +	switch (type) {
> +	case EVT_ID_TRANSLATION_FAULT:
> +	case EVT_ID_ADDR_SIZE_FAULT:
> +	case EVT_ID_ACCESS_FAULT:
> +		fault.reason = IOMMU_FAULT_REASON_PTE_FETCH;
> +		break;
> +	case EVT_ID_PERMISSION_FAULT:
> +		fault.reason = IOMMU_FAULT_REASON_PERMISSION;
> +		break;
> +	default:
> +		/* TODO: report other unrecoverable faults. */
> +		return -EFAULT;
> +	}
> +
> +	/* Stage-2 is always pinned at the moment */
> +	if (evt[1] & EVTQ_1_S2)
> +		return -EFAULT;
> +
> +	master = arm_smmu_find_master(smmu, sid);
> +	if (!master)
> +		return -EINVAL;
> +
> +	/*
> +	 * The domain is valid until the fault returns, because detach() flushes
> +	 * the fault queue.
> +	 */
> +	if (evt[1] & EVTQ_1_STALL)
> +		fault.type = IOMMU_FAULT_PAGE_REQ;
> +	else
> +		fault.type = IOMMU_FAULT_DMA_UNRECOV;
> +
> +	if (evt[1] & EVTQ_1_READ)
> +		fault.prot |= IOMMU_FAULT_READ;
> +	else
> +		fault.prot |= IOMMU_FAULT_WRITE;
> +
> +	if (evt[1] & EVTQ_1_EXEC)
> +		fault.prot |= IOMMU_FAULT_EXEC;
> +
> +	if (evt[1] & EVTQ_1_PRIV)
> +		fault.prot |= IOMMU_FAULT_PRIV;
> +
> +	if (evt[0] & EVTQ_0_SSV) {
> +		fault.pasid_valid = true;
> +		fault.pasid = evt[0] >> EVTQ_0_SSID_SHIFT & EVTQ_0_SSID_MASK;
> +	}
> +
> +	/* Report to device driver or populate the page tables */
> +	return iommu_report_device_fault(master->dev, &fault);
> +}
> +
>   /* IRQ and event handlers */
>   static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>   {
> -	int i;
> +	int i, ret;
>   	int num_handled = 0;
>   	struct arm_smmu_device *smmu = dev;
>   	struct arm_smmu_queue *q = &smmu->evtq.q;
> @@ -1300,12 +1451,19 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>   		while (!queue_remove_raw(q, evt)) {
>   			u8 id = evt[0] >> EVTQ_0_ID_SHIFT & EVTQ_0_ID_MASK;
>   
> +			spin_unlock(&q->wq.lock);
> +			ret = arm_smmu_handle_evt(smmu, evt);
> +			spin_lock(&q->wq.lock);
> +
>   			if (++num_handled == queue_size) {
>   				q->batch++;
>   				wake_up_locked(&q->wq);
>   				num_handled = 0;
>   			}
>   
> +			if (!ret)
> +				continue;
> +
>   			dev_info(smmu->dev, "event 0x%02x received:\n", id);
>   			for (i = 0; i < ARRAY_SIZE(evt); ++i)
>   				dev_info(smmu->dev, "\t0x%016llx\n",
> @@ -1442,7 +1600,9 @@ static int arm_smmu_flush_queues(struct notifier_block *nb,
>   		master = dev->iommu_fwspec->iommu_priv;
>   
>   	if (master) {
> -		/* TODO: add support for PRI and Stall */
> +		if (master->ste.can_stall)
> +			arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
> +		/* TODO: add support for PRI */
>   		return 0;
>   	}
>   
> @@ -1756,7 +1916,8 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>   		.order			= master->ssid_bits,
>   		.sync			= &arm_smmu_ctx_sync,
>   		.arm_smmu = {
> -			.stall		= !!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE),
> +			.stall		= !!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE) ||
> +					  master->ste.can_stall,
>   			.asid_bits	= smmu->asid_bits,
>   			.hw_access	= !!(smmu->features & ARM_SMMU_FEAT_HA),
>   			.hw_dirty	= !!(smmu->features & ARM_SMMU_FEAT_HD),
> @@ -2296,6 +2457,11 @@ static int arm_smmu_add_device(struct device *dev)
>   
>   	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
>   
> +	if (fwspec->can_stall && smmu->features & ARM_SMMU_FEAT_STALLS) {
> +		master->can_fault = true;
> +		master->ste.can_stall = true;
> +	}
> +
>   	group = iommu_group_get_for_dev(dev);
>   	if (!IS_ERR(group)) {
>   		arm_smmu_insert_master(smmu, master);
> @@ -2435,6 +2601,7 @@ static struct iommu_ops arm_smmu_ops = {
>   	.mm_attach		= arm_smmu_mm_attach,
>   	.mm_detach		= arm_smmu_mm_detach,
>   	.mm_invalidate		= arm_smmu_mm_invalidate,
> +	.page_response		= arm_smmu_page_response,
>   	.map			= arm_smmu_map,
>   	.unmap			= arm_smmu_unmap,
>   	.map_sg			= default_iommu_map_sg,
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 37c3b9d087ce..f5c2f4be2b42 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -227,7 +227,7 @@ struct page_response_msg {
>   	u32 pasid;
>   	enum page_response_code resp_code;
>   	u32 pasid_present:1;
> -	u32 page_req_group_id : 9;
> +	u32 page_req_group_id;
>   	enum page_response_type type;
>   	u32 private_data;
>   };
> @@ -421,7 +421,7 @@ struct iommu_fault_event {
>   	enum iommu_fault_reason reason;
>   	u64 addr;
>   	u32 pasid;
> -	u32 page_req_group_id : 9;
> +	u32 page_req_group_id;
>   	u32 last_req : 1;
>   	u32 pasid_valid : 1;
>   	u32 prot;
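
As a concrete illustration of the recoverable path described in the commit
message ("if the fault is recoverable, it will call us back to terminate or
continue the stall"), here is a rough, untested sketch of how a consumer of the
reported fault could complete a stalled transaction. It fills the
page_response_msg fields used by arm_smmu_page_response() above; the
example_finish_stall() helper is hypothetical, and a real user would presumably
go through whatever core helper the fault-handling patches of this series
provide rather than calling the page_response op directly.

#include <linux/iommu.h>

static void example_finish_stall(struct iommu_domain *domain,
				 struct device *dev,
				 struct iommu_fault_event *evt,
				 bool handled)
{
	struct page_response_msg resp = {
		.pasid			= evt->pasid,
		.pasid_present		= evt->pasid_valid,
		/* For stall this is the 16-bit STAG, echoed back in CMD_RESUME */
		.page_req_group_id	= evt->page_req_group_id,
		.resp_code		= handled ? IOMMU_PAGE_RESP_SUCCESS
						  : IOMMU_PAGE_RESP_INVALID,
	};

	if (domain->ops->page_response)
		domain->ops->page_response(domain, dev, &resp);
}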



^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 29/37] iommu/arm-smmu-v3: Add stall support for platform devices
@ 2018-02-13  1:46       ` Xu Zaibo
  0 siblings, 0 replies; 317+ messages in thread
From: Xu Zaibo @ 2018-02-13  1:46 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, ilias.apalodimas, jonathan.cameron,
	shunyong.yang, nwatters, okaya, jcrouse, rfranz, dwmw2,
	jacob.jun.pan, yi.l.liu, ashok.raj, robdclark, christian.koenig,
	bharatku, liguozhu

Hi,

On 2018/2/13 2:33, Jean-Philippe Brucker wrote:
> The SMMU provides a Stall model for handling page faults in platform
> devices. It is similar to PCI PRI, but doesn't require devices to have
> their own translation cache. Instead, faulting transactions are parked and
> the OS is given a chance to fix the page tables and retry the transaction.
>
> Enable stall for devices that support it (opt-in by firmware). When an
> event corresponds to a translation error, call the IOMMU fault handler. If
> the fault is recoverable, it will call us back to terminate or continue
> the stall.
>
> Note that this patch tweaks the iommu_fault_event and page_response_msg to
> extend the fault id field. Stall uses 16 bits of IDs whereas PCI PRI only
> uses 9.
Can PCIe devices without an ATC use this Stall model?

Thanks.

Xu Zaibo
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>   drivers/iommu/arm-smmu-v3.c | 175 +++++++++++++++++++++++++++++++++++++++++++-
>   include/linux/iommu.h       |   4 +-
>   2 files changed, 173 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 2430b2140f8d..8b9f5dd06be0 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -338,6 +338,15 @@
>   #define CMDQ_PRI_1_RESP_FAIL		(1UL << CMDQ_PRI_1_RESP_SHIFT)
>   #define CMDQ_PRI_1_RESP_SUCC		(2UL << CMDQ_PRI_1_RESP_SHIFT)
>   
> +#define CMDQ_RESUME_0_SID_SHIFT		32
> +#define CMDQ_RESUME_0_SID_MASK		0xffffffffUL
> +#define CMDQ_RESUME_0_ACTION_SHIFT	12
> +#define CMDQ_RESUME_0_ACTION_TERM	(0UL << CMDQ_RESUME_0_ACTION_SHIFT)
> +#define CMDQ_RESUME_0_ACTION_RETRY	(1UL << CMDQ_RESUME_0_ACTION_SHIFT)
> +#define CMDQ_RESUME_0_ACTION_ABORT	(2UL << CMDQ_RESUME_0_ACTION_SHIFT)
> +#define CMDQ_RESUME_1_STAG_SHIFT	0
> +#define CMDQ_RESUME_1_STAG_MASK		0xffffUL
> +
>   #define CMDQ_SYNC_0_CS_SHIFT		12
>   #define CMDQ_SYNC_0_CS_NONE		(0UL << CMDQ_SYNC_0_CS_SHIFT)
>   #define CMDQ_SYNC_0_CS_IRQ		(1UL << CMDQ_SYNC_0_CS_SHIFT)
> @@ -358,6 +367,31 @@
>   #define EVTQ_0_ID_SHIFT			0
>   #define EVTQ_0_ID_MASK			0xffUL
>   
> +#define EVT_ID_TRANSLATION_FAULT	0x10
> +#define EVT_ID_ADDR_SIZE_FAULT		0x11
> +#define EVT_ID_ACCESS_FAULT		0x12
> +#define EVT_ID_PERMISSION_FAULT		0x13
> +
> +#define EVTQ_0_SSV			(1UL << 11)
> +#define EVTQ_0_SSID_SHIFT		12
> +#define EVTQ_0_SSID_MASK		0xfffffUL
> +#define EVTQ_0_SID_SHIFT		32
> +#define EVTQ_0_SID_MASK			0xffffffffUL
> +#define EVTQ_1_STAG_SHIFT		0
> +#define EVTQ_1_STAG_MASK		0xffffUL
> +#define EVTQ_1_STALL			(1UL << 31)
> +#define EVTQ_1_PRIV			(1UL << 33)
> +#define EVTQ_1_EXEC			(1UL << 34)
> +#define EVTQ_1_READ			(1UL << 35)
> +#define EVTQ_1_S2			(1UL << 39)
> +#define EVTQ_1_CLASS_SHIFT		40
> +#define EVTQ_1_CLASS_MASK		0x3UL
> +#define EVTQ_1_TT_READ			(1UL << 44)
> +#define EVTQ_2_ADDR_SHIFT		0
> +#define EVTQ_2_ADDR_MASK		0xffffffffffffffffUL
> +#define EVTQ_3_IPA_SHIFT		12
> +#define EVTQ_3_IPA_MASK			0xffffffffffUL
> +
>   /* PRI queue */
>   #define PRIQ_ENT_DWORDS			2
>   #define PRIQ_MAX_SZ_SHIFT		8
> @@ -472,6 +506,13 @@ struct arm_smmu_cmdq_ent {
>   			enum pri_resp		resp;
>   		} pri;
>   
> +		#define CMDQ_OP_RESUME		0x44
> +		struct {
> +			u32			sid;
> +			u16			stag;
> +			enum page_response_code	resp;
> +		} resume;
> +
>   		#define CMDQ_OP_CMD_SYNC	0x46
>   		struct {
>   			u32			msidata;
> @@ -545,6 +586,8 @@ struct arm_smmu_strtab_ent {
>   	bool				assigned;
>   	struct arm_smmu_s1_cfg		*s1_cfg;
>   	struct arm_smmu_s2_cfg		*s2_cfg;
> +
> +	bool				can_stall;
>   };
>   
>   struct arm_smmu_strtab_cfg {
> @@ -904,6 +947,21 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>   			return -EINVAL;
>   		}
>   		break;
> +	case CMDQ_OP_RESUME:
> +		cmd[0] |= (u64)ent->resume.sid << CMDQ_RESUME_0_SID_SHIFT;
> +		cmd[1] |= ent->resume.stag << CMDQ_RESUME_1_STAG_SHIFT;
> +		switch (ent->resume.resp) {
> +		case IOMMU_PAGE_RESP_INVALID:
> +		case IOMMU_PAGE_RESP_FAILURE:
> +			cmd[0] |= CMDQ_RESUME_0_ACTION_ABORT;
> +			break;
> +		case IOMMU_PAGE_RESP_SUCCESS:
> +			cmd[0] |= CMDQ_RESUME_0_ACTION_RETRY;
> +			break;
> +		default:
> +			return -EINVAL;
> +		}
> +		break;
>   	case CMDQ_OP_CMD_SYNC:
>   		if (ent->sync.msiaddr)
>   			cmd[0] |= CMDQ_SYNC_0_CS_IRQ;
> @@ -1065,6 +1123,35 @@ static void arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
>   		dev_err_ratelimited(smmu->dev, "CMD_SYNC timeout\n");
>   }
>   
> +static int arm_smmu_page_response(struct iommu_domain *domain,
> +				  struct device *dev,
> +				  struct page_response_msg *resp)
> +{
> +	int sid = dev->iommu_fwspec->ids[0];
> +	struct arm_smmu_cmdq_ent cmd = {0};
> +	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
> +
> +	if (master->ste.can_stall) {
> +		cmd.opcode		= CMDQ_OP_RESUME;
> +		cmd.resume.sid		= sid;
> +		cmd.resume.stag		= resp->page_req_group_id;
> +		cmd.resume.resp		= resp->resp_code;
> +	} else {
> +		/* TODO: put PRI response here */
> +		return -EINVAL;
> +	}
> +
> +	arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
> +	/*
> +	 * Don't send a SYNC, it doesn't do anything for RESUME or PRI_RESP.
> +	 * RESUME consumption guarantees that the stalled transaction will be
> +	 * terminated... at some point in the future. PRI_RESP is fire and
> +	 * forget.
> +	 */
> +
> +	return 0;
> +}
> +
>   /* Stream table manipulation functions */
>   static void
>   arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
> @@ -1182,7 +1269,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
>   			 STRTAB_STE_1_STRW_SHIFT);
>   
>   		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
> -		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
> +		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE) &&
> +		   !ste->can_stall)
>   			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
>   
>   		val |= (cfg->base & STRTAB_STE_0_S1CTXPTR_MASK
> @@ -1285,10 +1373,73 @@ arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
>   	return master;
>   }
>   
> +static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
> +{
> +	struct arm_smmu_master_data *master;
> +	u8 type = evt[0] >> EVTQ_0_ID_SHIFT & EVTQ_0_ID_MASK;
> +	u32 sid = evt[0] >> EVTQ_0_SID_SHIFT & EVTQ_0_SID_MASK;
> +
> +	struct iommu_fault_event fault = {
> +		.page_req_group_id = evt[1] >> EVTQ_1_STAG_SHIFT & EVTQ_1_STAG_MASK,
> +		.addr		= evt[2] >> EVTQ_2_ADDR_SHIFT & EVTQ_2_ADDR_MASK,
> +		.last_req	= true,
> +	};
> +
> +	switch (type) {
> +	case EVT_ID_TRANSLATION_FAULT:
> +	case EVT_ID_ADDR_SIZE_FAULT:
> +	case EVT_ID_ACCESS_FAULT:
> +		fault.reason = IOMMU_FAULT_REASON_PTE_FETCH;
> +		break;
> +	case EVT_ID_PERMISSION_FAULT:
> +		fault.reason = IOMMU_FAULT_REASON_PERMISSION;
> +		break;
> +	default:
> +		/* TODO: report other unrecoverable faults. */
> +		return -EFAULT;
> +	}
> +
> +	/* Stage-2 is always pinned at the moment */
> +	if (evt[1] & EVTQ_1_S2)
> +		return -EFAULT;
> +
> +	master = arm_smmu_find_master(smmu, sid);
> +	if (!master)
> +		return -EINVAL;
> +
> +	/*
> +	 * The domain is valid until the fault returns, because detach() flushes
> +	 * the fault queue.
> +	 */
> +	if (evt[1] & EVTQ_1_STALL)
> +		fault.type = IOMMU_FAULT_PAGE_REQ;
> +	else
> +		fault.type = IOMMU_FAULT_DMA_UNRECOV;
> +
> +	if (evt[1] & EVTQ_1_READ)
> +		fault.prot |= IOMMU_FAULT_READ;
> +	else
> +		fault.prot |= IOMMU_FAULT_WRITE;
> +
> +	if (evt[1] & EVTQ_1_EXEC)
> +		fault.prot |= IOMMU_FAULT_EXEC;
> +
> +	if (evt[1] & EVTQ_1_PRIV)
> +		fault.prot |= IOMMU_FAULT_PRIV;
> +
> +	if (evt[0] & EVTQ_0_SSV) {
> +		fault.pasid_valid = true;
> +		fault.pasid = evt[0] >> EVTQ_0_SSID_SHIFT & EVTQ_0_SSID_MASK;
> +	}
> +
> +	/* Report to device driver or populate the page tables */
> +	return iommu_report_device_fault(master->dev, &fault);
> +}
> +
>   /* IRQ and event handlers */
>   static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>   {
> -	int i;
> +	int i, ret;
>   	int num_handled = 0;
>   	struct arm_smmu_device *smmu = dev;
>   	struct arm_smmu_queue *q = &smmu->evtq.q;
> @@ -1300,12 +1451,19 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>   		while (!queue_remove_raw(q, evt)) {
>   			u8 id = evt[0] >> EVTQ_0_ID_SHIFT & EVTQ_0_ID_MASK;
>   
> +			spin_unlock(&q->wq.lock);
> +			ret = arm_smmu_handle_evt(smmu, evt);
> +			spin_lock(&q->wq.lock);
> +
>   			if (++num_handled == queue_size) {
>   				q->batch++;
>   				wake_up_locked(&q->wq);
>   				num_handled = 0;
>   			}
>   
> +			if (!ret)
> +				continue;
> +
>   			dev_info(smmu->dev, "event 0x%02x received:\n", id);
>   			for (i = 0; i < ARRAY_SIZE(evt); ++i)
>   				dev_info(smmu->dev, "\t0x%016llx\n",
> @@ -1442,7 +1600,9 @@ static int arm_smmu_flush_queues(struct notifier_block *nb,
>   		master = dev->iommu_fwspec->iommu_priv;
>   
>   	if (master) {
> -		/* TODO: add support for PRI and Stall */
> +		if (master->ste.can_stall)
> +			arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
> +		/* TODO: add support for PRI */
>   		return 0;
>   	}
>   
> @@ -1756,7 +1916,8 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>   		.order			= master->ssid_bits,
>   		.sync			= &arm_smmu_ctx_sync,
>   		.arm_smmu = {
> -			.stall		= !!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE),
> +			.stall		= !!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE) ||
> +					  master->ste.can_stall,
>   			.asid_bits	= smmu->asid_bits,
>   			.hw_access	= !!(smmu->features & ARM_SMMU_FEAT_HA),
>   			.hw_dirty	= !!(smmu->features & ARM_SMMU_FEAT_HD),
> @@ -2296,6 +2457,11 @@ static int arm_smmu_add_device(struct device *dev)
>   
>   	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
>   
> +	if (fwspec->can_stall && smmu->features & ARM_SMMU_FEAT_STALLS) {
> +		master->can_fault = true;
> +		master->ste.can_stall = true;
> +	}
> +
>   	group = iommu_group_get_for_dev(dev);
>   	if (!IS_ERR(group)) {
>   		arm_smmu_insert_master(smmu, master);
> @@ -2435,6 +2601,7 @@ static struct iommu_ops arm_smmu_ops = {
>   	.mm_attach		= arm_smmu_mm_attach,
>   	.mm_detach		= arm_smmu_mm_detach,
>   	.mm_invalidate		= arm_smmu_mm_invalidate,
> +	.page_response		= arm_smmu_page_response,
>   	.map			= arm_smmu_map,
>   	.unmap			= arm_smmu_unmap,
>   	.map_sg			= default_iommu_map_sg,
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 37c3b9d087ce..f5c2f4be2b42 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -227,7 +227,7 @@ struct page_response_msg {
>   	u32 pasid;
>   	enum page_response_code resp_code;
>   	u32 pasid_present:1;
> -	u32 page_req_group_id : 9;
> +	u32 page_req_group_id;
>   	enum page_response_type type;
>   	u32 private_data;
>   };
> @@ -421,7 +421,7 @@ struct iommu_fault_event {
>   	enum iommu_fault_reason reason;
>   	u64 addr;
>   	u32 pasid;
> -	u32 page_req_group_id : 9;
> +	u32 page_req_group_id;
>   	u32 last_req : 1;
>   	u32 pasid_valid : 1;
>   	u32 prot;



^ permalink raw reply	[flat|nested] 317+ messages in thread

* [PATCH 29/37] iommu/arm-smmu-v3: Add stall support for platform devices
@ 2018-02-13  1:46       ` Xu Zaibo
  0 siblings, 0 replies; 317+ messages in thread
From: Xu Zaibo @ 2018-02-13  1:46 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 2018/2/13 2:33, Jean-Philippe Brucker wrote:
> The SMMU provides a Stall model for handling page faults in platform
> devices. It is similar to PCI PRI, but doesn't require devices to have
> their own translation cache. Instead, faulting transactions are parked and
> the OS is given a chance to fix the page tables and retry the transaction.
>
> Enable stall for devices that support it (opt-in by firmware). When an
> event corresponds to a translation error, call the IOMMU fault handler. If
> the fault is recoverable, it will call us back to terminate or continue
> the stall.
>
> Note that this patch tweaks the iommu_fault_event and page_response_msg to
> extend the fault id field. Stall uses 16 bits of IDs whereas PCI PRI only
> uses 9.
Can PCIe devices without an ATC use this Stall model?

Thanks.

Xu Zaibo
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>   drivers/iommu/arm-smmu-v3.c | 175 +++++++++++++++++++++++++++++++++++++++++++-
>   include/linux/iommu.h       |   4 +-
>   2 files changed, 173 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 2430b2140f8d..8b9f5dd06be0 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -338,6 +338,15 @@
>   #define CMDQ_PRI_1_RESP_FAIL		(1UL << CMDQ_PRI_1_RESP_SHIFT)
>   #define CMDQ_PRI_1_RESP_SUCC		(2UL << CMDQ_PRI_1_RESP_SHIFT)
>   
> +#define CMDQ_RESUME_0_SID_SHIFT		32
> +#define CMDQ_RESUME_0_SID_MASK		0xffffffffUL
> +#define CMDQ_RESUME_0_ACTION_SHIFT	12
> +#define CMDQ_RESUME_0_ACTION_TERM	(0UL << CMDQ_RESUME_0_ACTION_SHIFT)
> +#define CMDQ_RESUME_0_ACTION_RETRY	(1UL << CMDQ_RESUME_0_ACTION_SHIFT)
> +#define CMDQ_RESUME_0_ACTION_ABORT	(2UL << CMDQ_RESUME_0_ACTION_SHIFT)
> +#define CMDQ_RESUME_1_STAG_SHIFT	0
> +#define CMDQ_RESUME_1_STAG_MASK		0xffffUL
> +
>   #define CMDQ_SYNC_0_CS_SHIFT		12
>   #define CMDQ_SYNC_0_CS_NONE		(0UL << CMDQ_SYNC_0_CS_SHIFT)
>   #define CMDQ_SYNC_0_CS_IRQ		(1UL << CMDQ_SYNC_0_CS_SHIFT)
> @@ -358,6 +367,31 @@
>   #define EVTQ_0_ID_SHIFT			0
>   #define EVTQ_0_ID_MASK			0xffUL
>   
> +#define EVT_ID_TRANSLATION_FAULT	0x10
> +#define EVT_ID_ADDR_SIZE_FAULT		0x11
> +#define EVT_ID_ACCESS_FAULT		0x12
> +#define EVT_ID_PERMISSION_FAULT		0x13
> +
> +#define EVTQ_0_SSV			(1UL << 11)
> +#define EVTQ_0_SSID_SHIFT		12
> +#define EVTQ_0_SSID_MASK		0xfffffUL
> +#define EVTQ_0_SID_SHIFT		32
> +#define EVTQ_0_SID_MASK			0xffffffffUL
> +#define EVTQ_1_STAG_SHIFT		0
> +#define EVTQ_1_STAG_MASK		0xffffUL
> +#define EVTQ_1_STALL			(1UL << 31)
> +#define EVTQ_1_PRIV			(1UL << 33)
> +#define EVTQ_1_EXEC			(1UL << 34)
> +#define EVTQ_1_READ			(1UL << 35)
> +#define EVTQ_1_S2			(1UL << 39)
> +#define EVTQ_1_CLASS_SHIFT		40
> +#define EVTQ_1_CLASS_MASK		0x3UL
> +#define EVTQ_1_TT_READ			(1UL << 44)
> +#define EVTQ_2_ADDR_SHIFT		0
> +#define EVTQ_2_ADDR_MASK		0xffffffffffffffffUL
> +#define EVTQ_3_IPA_SHIFT		12
> +#define EVTQ_3_IPA_MASK			0xffffffffffUL
> +
>   /* PRI queue */
>   #define PRIQ_ENT_DWORDS			2
>   #define PRIQ_MAX_SZ_SHIFT		8
> @@ -472,6 +506,13 @@ struct arm_smmu_cmdq_ent {
>   			enum pri_resp		resp;
>   		} pri;
>   
> +		#define CMDQ_OP_RESUME		0x44
> +		struct {
> +			u32			sid;
> +			u16			stag;
> +			enum page_response_code	resp;
> +		} resume;
> +
>   		#define CMDQ_OP_CMD_SYNC	0x46
>   		struct {
>   			u32			msidata;
> @@ -545,6 +586,8 @@ struct arm_smmu_strtab_ent {
>   	bool				assigned;
>   	struct arm_smmu_s1_cfg		*s1_cfg;
>   	struct arm_smmu_s2_cfg		*s2_cfg;
> +
> +	bool				can_stall;
>   };
>   
>   struct arm_smmu_strtab_cfg {
> @@ -904,6 +947,21 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>   			return -EINVAL;
>   		}
>   		break;
> +	case CMDQ_OP_RESUME:
> +		cmd[0] |= (u64)ent->resume.sid << CMDQ_RESUME_0_SID_SHIFT;
> +		cmd[1] |= ent->resume.stag << CMDQ_RESUME_1_STAG_SHIFT;
> +		switch (ent->resume.resp) {
> +		case IOMMU_PAGE_RESP_INVALID:
> +		case IOMMU_PAGE_RESP_FAILURE:
> +			cmd[0] |= CMDQ_RESUME_0_ACTION_ABORT;
> +			break;
> +		case IOMMU_PAGE_RESP_SUCCESS:
> +			cmd[0] |= CMDQ_RESUME_0_ACTION_RETRY;
> +			break;
> +		default:
> +			return -EINVAL;
> +		}
> +		break;
>   	case CMDQ_OP_CMD_SYNC:
>   		if (ent->sync.msiaddr)
>   			cmd[0] |= CMDQ_SYNC_0_CS_IRQ;
> @@ -1065,6 +1123,35 @@ static void arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
>   		dev_err_ratelimited(smmu->dev, "CMD_SYNC timeout\n");
>   }
>   
> +static int arm_smmu_page_response(struct iommu_domain *domain,
> +				  struct device *dev,
> +				  struct page_response_msg *resp)
> +{
> +	int sid = dev->iommu_fwspec->ids[0];
> +	struct arm_smmu_cmdq_ent cmd = {0};
> +	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
> +
> +	if (master->ste.can_stall) {
> +		cmd.opcode		= CMDQ_OP_RESUME;
> +		cmd.resume.sid		= sid;
> +		cmd.resume.stag		= resp->page_req_group_id;
> +		cmd.resume.resp		= resp->resp_code;
> +	} else {
> +		/* TODO: put PRI response here */
> +		return -EINVAL;
> +	}
> +
> +	arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
> +	/*
> +	 * Don't send a SYNC, it doesn't do anything for RESUME or PRI_RESP.
> +	 * RESUME consumption guarantees that the stalled transaction will be
> +	 * terminated... at some point in the future. PRI_RESP is fire and
> +	 * forget.
> +	 */
> +
> +	return 0;
> +}
> +
>   /* Stream table manipulation functions */
>   static void
>   arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
> @@ -1182,7 +1269,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
>   			 STRTAB_STE_1_STRW_SHIFT);
>   
>   		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
> -		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
> +		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE) &&
> +		   !ste->can_stall)
>   			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
>   
>   		val |= (cfg->base & STRTAB_STE_0_S1CTXPTR_MASK
> @@ -1285,10 +1373,73 @@ arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
>   	return master;
>   }
>   
> +static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
> +{
> +	struct arm_smmu_master_data *master;
> +	u8 type = evt[0] >> EVTQ_0_ID_SHIFT & EVTQ_0_ID_MASK;
> +	u32 sid = evt[0] >> EVTQ_0_SID_SHIFT & EVTQ_0_SID_MASK;
> +
> +	struct iommu_fault_event fault = {
> +		.page_req_group_id = evt[1] >> EVTQ_1_STAG_SHIFT & EVTQ_1_STAG_MASK,
> +		.addr		= evt[2] >> EVTQ_2_ADDR_SHIFT & EVTQ_2_ADDR_MASK,
> +		.last_req	= true,
> +	};
> +
> +	switch (type) {
> +	case EVT_ID_TRANSLATION_FAULT:
> +	case EVT_ID_ADDR_SIZE_FAULT:
> +	case EVT_ID_ACCESS_FAULT:
> +		fault.reason = IOMMU_FAULT_REASON_PTE_FETCH;
> +		break;
> +	case EVT_ID_PERMISSION_FAULT:
> +		fault.reason = IOMMU_FAULT_REASON_PERMISSION;
> +		break;
> +	default:
> +		/* TODO: report other unrecoverable faults. */
> +		return -EFAULT;
> +	}
> +
> +	/* Stage-2 is always pinned at the moment */
> +	if (evt[1] & EVTQ_1_S2)
> +		return -EFAULT;
> +
> +	master = arm_smmu_find_master(smmu, sid);
> +	if (!master)
> +		return -EINVAL;
> +
> +	/*
> +	 * The domain is valid until the fault returns, because detach() flushes
> +	 * the fault queue.
> +	 */
> +	if (evt[1] & EVTQ_1_STALL)
> +		fault.type = IOMMU_FAULT_PAGE_REQ;
> +	else
> +		fault.type = IOMMU_FAULT_DMA_UNRECOV;
> +
> +	if (evt[1] & EVTQ_1_READ)
> +		fault.prot |= IOMMU_FAULT_READ;
> +	else
> +		fault.prot |= IOMMU_FAULT_WRITE;
> +
> +	if (evt[1] & EVTQ_1_EXEC)
> +		fault.prot |= IOMMU_FAULT_EXEC;
> +
> +	if (evt[1] & EVTQ_1_PRIV)
> +		fault.prot |= IOMMU_FAULT_PRIV;
> +
> +	if (evt[0] & EVTQ_0_SSV) {
> +		fault.pasid_valid = true;
> +		fault.pasid = evt[0] >> EVTQ_0_SSID_SHIFT & EVTQ_0_SSID_MASK;
> +	}
> +
> +	/* Report to device driver or populate the page tables */
> +	return iommu_report_device_fault(master->dev, &fault);
> +}
> +
>   /* IRQ and event handlers */
>   static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>   {
> -	int i;
> +	int i, ret;
>   	int num_handled = 0;
>   	struct arm_smmu_device *smmu = dev;
>   	struct arm_smmu_queue *q = &smmu->evtq.q;
> @@ -1300,12 +1451,19 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>   		while (!queue_remove_raw(q, evt)) {
>   			u8 id = evt[0] >> EVTQ_0_ID_SHIFT & EVTQ_0_ID_MASK;
>   
> +			spin_unlock(&q->wq.lock);
> +			ret = arm_smmu_handle_evt(smmu, evt);
> +			spin_lock(&q->wq.lock);
> +
>   			if (++num_handled == queue_size) {
>   				q->batch++;
>   				wake_up_locked(&q->wq);
>   				num_handled = 0;
>   			}
>   
> +			if (!ret)
> +				continue;
> +
>   			dev_info(smmu->dev, "event 0x%02x received:\n", id);
>   			for (i = 0; i < ARRAY_SIZE(evt); ++i)
>   				dev_info(smmu->dev, "\t0x%016llx\n",
> @@ -1442,7 +1600,9 @@ static int arm_smmu_flush_queues(struct notifier_block *nb,
>   		master = dev->iommu_fwspec->iommu_priv;
>   
>   	if (master) {
> -		/* TODO: add support for PRI and Stall */
> +		if (master->ste.can_stall)
> +			arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
> +		/* TODO: add support for PRI */
>   		return 0;
>   	}
>   
> @@ -1756,7 +1916,8 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>   		.order			= master->ssid_bits,
>   		.sync			= &arm_smmu_ctx_sync,
>   		.arm_smmu = {
> -			.stall		= !!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE),
> +			.stall		= !!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE) ||
> +					  master->ste.can_stall,
>   			.asid_bits	= smmu->asid_bits,
>   			.hw_access	= !!(smmu->features & ARM_SMMU_FEAT_HA),
>   			.hw_dirty	= !!(smmu->features & ARM_SMMU_FEAT_HD),
> @@ -2296,6 +2457,11 @@ static int arm_smmu_add_device(struct device *dev)
>   
>   	master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
>   
> +	if (fwspec->can_stall && smmu->features & ARM_SMMU_FEAT_STALLS) {
> +		master->can_fault = true;
> +		master->ste.can_stall = true;
> +	}
> +
>   	group = iommu_group_get_for_dev(dev);
>   	if (!IS_ERR(group)) {
>   		arm_smmu_insert_master(smmu, master);
> @@ -2435,6 +2601,7 @@ static struct iommu_ops arm_smmu_ops = {
>   	.mm_attach		= arm_smmu_mm_attach,
>   	.mm_detach		= arm_smmu_mm_detach,
>   	.mm_invalidate		= arm_smmu_mm_invalidate,
> +	.page_response		= arm_smmu_page_response,
>   	.map			= arm_smmu_map,
>   	.unmap			= arm_smmu_unmap,
>   	.map_sg			= default_iommu_map_sg,
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 37c3b9d087ce..f5c2f4be2b42 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -227,7 +227,7 @@ struct page_response_msg {
>   	u32 pasid;
>   	enum page_response_code resp_code;
>   	u32 pasid_present:1;
> -	u32 page_req_group_id : 9;
> +	u32 page_req_group_id;
>   	enum page_response_type type;
>   	u32 private_data;
>   };
> @@ -421,7 +421,7 @@ struct iommu_fault_event {
>   	enum iommu_fault_reason reason;
>   	u64 addr;
>   	u32 pasid;
> -	u32 page_req_group_id : 9;
> +	u32 page_req_group_id;
>   	u32 last_req : 1;
>   	u32 pasid_valid : 1;
>   	u32 prot;

^ permalink raw reply	[flat|nested] 317+ messages in thread

* RE: [PATCH 01/37] iommu: Introduce Shared Virtual Addressing API
  2018-02-12 18:33   ` Jean-Philippe Brucker
  (?)
@ 2018-02-13  7:31       ` Tian, Kevin
  -1 siblings, 0 replies; 317+ messages in thread
From: Tian, Kevin @ 2018-02-13  7:31 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: mark.rutland-5wv7dgnIgG8, bharatku-gjFFaj9aHVfQT0dZR+AlfA, Raj,
	Ashok, shunyong.yang-PT9Dzx9SjPiXmMXjJBpWqg,
	rjw-LthD3rsA81gm4RdzfppkhA, catalin.marinas-5wv7dgnIgG8,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, sudeep.holla-5wv7dgnIgG8,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	christian.koenig-5C7GfCeVMHo, lenb-DgEjT+Ai2ygdnm+yROfE0A

> From: Jean-Philippe Brucker
> Sent: Tuesday, February 13, 2018 2:33 AM
> 
> Shared Virtual Addressing (SVA) provides a way for device drivers to bind
> process address spaces to devices. This requires the IOMMU to support the
> same page table format as CPUs, and requires the system to support I/O

"same" is a bit restrictive. "compatible" is better as you used in coverletter. :-)

> Page Faults (IOPF) and Process Address Space ID (PASID). When all of these
> are available, DMA can access virtual addresses of a process. A PASID is
> allocated for each process, and the device driver programs it into the
> device in an implementation-specific way.
> 
> Add a new API for sharing process page tables with devices. Introduce two
> IOMMU operations, sva_device_init() and sva_device_shutdown(), that
> prepare the IOMMU driver for SVA. For example allocate PASID tables and
> fault queues. Subsequent patches will implement the bind() and unbind()
> operations.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> ---
>  drivers/iommu/Kconfig     | 10 ++++++
>  drivers/iommu/Makefile    |  1 +
>  drivers/iommu/iommu-sva.c | 90
> +++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/iommu.h     | 32 +++++++++++++++++
>  4 files changed, 133 insertions(+)
>  create mode 100644 drivers/iommu/iommu-sva.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index f3a21343e636..555147a61f7c 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -74,6 +74,16 @@ config IOMMU_DMA
>  	select IOMMU_IOVA
>  	select NEED_SG_DMA_LENGTH
> 
> +config IOMMU_SVA
> +	bool "Shared Virtual Addressing API for the IOMMU"
> +	select IOMMU_API
> +	help
> +	  Enable process address space management for the IOMMU API. In
> systems
> +	  that support it, device drivers can bind process address spaces to
> +	  devices and share their page tables using this API.

"their page table" is a bit confusing here.

> +
> +	  If unsure, say N here.
> +
>  config FSL_PAMU
>  	bool "Freescale IOMMU support"
>  	depends on PCI
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 1fb695854809..1dbcc89ebe4c 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -3,6 +3,7 @@ obj-$(CONFIG_IOMMU_API) += iommu.o
>  obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
>  obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
> +obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> new file mode 100644
> index 000000000000..cab5d723520f
> --- /dev/null
> +++ b/drivers/iommu/iommu-sva.c
> @@ -0,0 +1,90 @@
> +/*
> + * Track process address spaces bound to devices and allocate PASIDs.
> + *
> + * Copyright (C) 2018 ARM Ltd.
> + * Author: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> + *
> + * SPDX-License-Identifier: GPL-2.0
> + */
> +
> +#include <linux/iommu.h>
> +
> +/**
> + * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a
> device
> + * @dev: the device
> + * @features: bitmask of features that need to be initialized
> + * @max_pasid: max PASID value supported by the device
> + *
> + * Users of the bind()/unbind() API must call this function to initialize all
> + * features required for SVA.
> + *
> + * - If the device should support multiple address spaces (e.g. PCI PASID),
> + *   IOMMU_SVA_FEAT_PASID must be requested.

I think it is assumed by default when using this API, based on the
definition of SVA. Can you elaborate on the situation where this flag
would be cleared?
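
For example, is the intent to allow a call like the one below, where a
driver requests only IOPF and shares the single non-PASID context?
(hypothetical usage, just to illustrate the question)

	ret = iommu_sva_device_init(dev, IOMMU_SVA_FEAT_IOPF, 0);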

> + *
> + *   By default the PASID allocated during bind() is limited by the IOMMU
> + *   capacity, and by the device PASID width defined in the PCI capability or
> in
> + *   the firmware description. Setting @max_pasid to a non-zero value
> smaller
> + *   than this limit overrides it.
> + *
> + * - If the device should support I/O Page Faults (e.g. PCI PRI),
> + *   IOMMU_SVA_FEAT_IOPF must be requested.
> + *
> + * The device should not be be performing any DMA while this function is

remove double "be"

> + * running.

"otherwise the behavior is undefined"

> + *
> + * Return 0 if initialization succeeded, or an error.
> + */
> +int iommu_sva_device_init(struct device *dev, unsigned long features,
> +			  unsigned int max_pasid)
> +{
> +	int ret;
> +	unsigned int min_pasid = 0;
> +	struct iommu_param *dev_param = dev->iommu_param;
> +	struct iommu_domain *domain =
> iommu_get_domain_for_dev(dev);
> +
> +	if (!domain || !dev_param || !domain->ops->sva_device_init)
> +		return -ENODEV;
> +
> +	/*
> +	 * IOMMU driver updates the limits depending on the IOMMU and
> device
> +	 * capabilities.
> +	 */
> +	ret = domain->ops->sva_device_init(dev, features, &min_pasid,
> +					   &max_pasid);
> +	if (ret)
> +		return ret;
> +
> +	/* FIXME: racy. Next version should have a mutex (same as fault
> handler) */
> +	dev_param->sva_features = features;
> +	dev_param->min_pasid = min_pasid;
> +	dev_param->max_pasid = max_pasid;

what's the point of min_pasid here?

> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_device_init);
> +
> +/**
> + * iommu_sva_device_shutdown() - Shutdown Shared Virtual Addressing
> for a device
> + * @dev: the device
> + *
> + * Disable SVA. The device should not be performing any DMA while this
> function
> + * is running.
> + */
> +int iommu_sva_device_shutdown(struct device *dev)
> +{
> +	struct iommu_param *dev_param = dev->iommu_param;
> +	struct iommu_domain *domain =
> iommu_get_domain_for_dev(dev);
> +
> +	if (!domain)
> +		return -ENODEV;
> +
> +	if (domain->ops->sva_device_shutdown)
> +		domain->ops->sva_device_shutdown(dev);
> +
> +	dev_param->sva_features = 0;
> +	dev_param->min_pasid = 0;
> +	dev_param->max_pasid = 0;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 66ef406396e9..e9e09eecdece 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -60,6 +60,11 @@ typedef int (*iommu_fault_handler_t)(struct
> iommu_domain *,
>  			struct device *, unsigned long, int, void *);
>  typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *,
> void *);
> 
> +/* Request PASID support */
> +#define IOMMU_SVA_FEAT_PASID		(1 << 0)
> +/* Request I/O page fault support */
> +#define IOMMU_SVA_FEAT_IOPF		(1 << 1)
> +
>  struct iommu_domain_geometry {
>  	dma_addr_t aperture_start; /* First address that can be mapped
> */
>  	dma_addr_t aperture_end;   /* Last address that can be mapped
> */
> @@ -197,6 +202,8 @@ struct page_response_msg {
>   * @domain_free: free iommu domain
>   * @attach_dev: attach device to an iommu domain
>   * @detach_dev: detach device from an iommu domain
> + * @sva_device_init: initialize Shared Virtual Addressing for a device
> + * @sva_device_shutdown: shutdown Shared Virtual Addressing for a
> device
>   * @map: map a physically contiguous memory region to an iommu
> domain
>   * @unmap: unmap a physically contiguous memory region from an
> iommu domain
>   * @map_sg: map a scatter-gather list of physically contiguous memory
> chunks
> @@ -230,6 +237,10 @@ struct iommu_ops {
> 
>  	int (*attach_dev)(struct iommu_domain *domain, struct device
> *dev);
>  	void (*detach_dev)(struct iommu_domain *domain, struct device
> *dev);
> +	int (*sva_device_init)(struct device *dev, unsigned long features,
> +			       unsigned int *min_pasid,
> +			       unsigned int *max_pasid);
> +	void (*sva_device_shutdown)(struct device *dev);
>  	int (*map)(struct iommu_domain *domain, unsigned long iova,
>  		   phys_addr_t paddr, size_t size, int prot);
>  	size_t (*unmap)(struct iommu_domain *domain, unsigned long
> iova,
> @@ -385,6 +396,9 @@ struct iommu_fault_param {
>   */
>  struct iommu_param {
>  	struct iommu_fault_param *fault_param;
> +	unsigned long sva_features;
> +	unsigned int min_pasid;
> +	unsigned int max_pasid;
>  };
> 
>  int  iommu_device_register(struct iommu_device *iommu);
> @@ -878,4 +892,22 @@ const struct iommu_ops
> *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
> 
>  #endif /* CONFIG_IOMMU_API */
> 
> +#ifdef CONFIG_IOMMU_SVA
> +extern int iommu_sva_device_init(struct device *dev, unsigned long
> features,
> +				 unsigned int max_pasid);
> +extern int iommu_sva_device_shutdown(struct device *dev);
> +#else /* CONFIG_IOMMU_SVA */
> +static inline int iommu_sva_device_init(struct device *dev,
> +					unsigned long features,
> +					unsigned int max_pasid)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int iommu_sva_device_shutdown(struct device *dev)
> +{
> +	return -ENODEV;
> +}
> +#endif /* CONFIG_IOMMU_SVA */
> +
>  #endif /* __LINUX_IOMMU_H */
> --
> 2.15.1
> 
> _______________________________________________
> iommu mailing list
> iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 317+ messages in thread

* RE: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
  2018-02-12 18:33   ` Jean-Philippe Brucker
  (?)
@ 2018-02-13  7:54     ` Tian, Kevin
  -1 siblings, 0 replies; 317+ messages in thread
From: Tian, Kevin @ 2018-02-13  7:54 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1

> From: Jean-Philippe Brucker
> Sent: Tuesday, February 13, 2018 2:33 AM
> 
> Add bind() and unbind() operations to the IOMMU API. Device drivers can
> use them to share process page tables with their devices. bind_group()
> is provided for VFIO's convenience, as it needs to provide a coherent
> interface on containers. Other device drivers will most likely want to
> use bind_device(), which binds a single device in the group.

I saw that your bind_group() implementation tries to bind the address
space for all devices within a group, which IMO is problematic. Based
on the PCIe spec, packet routing on the bus doesn't take PASID into
consideration. Since devices within the same group cannot be isolated
based on requester ID, i.e. traffic is not guaranteed to go through the
IOMMU, enabling SVA on multiple devices could cause undesired p2p.

If my understanding of the PCIe spec is correct, we should probably
fail bind_group()/bind_device() when there are multiple devices within
the given group. If there is only one device, bind_group() is
essentially a wrapper around bind_device().
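
Something along these lines is what I have in mind -- an untested
sketch only, reusing the group_device list from your patch (the helper
name here is made up):

/*
 * Sketch: refuse SVA bind when the group contains more than one
 * device, since bus routing doesn't consider PASID and p2p traffic
 * may bypass the IOMMU.
 */
static bool iommu_sva_group_is_singleton(struct iommu_group *group)
{
	int count = 0;
	struct group_device *device;

	/* caller holds group->mutex, as iommu_sva_bind_group() does */
	list_for_each_entry(device, &group->devices, list)
		count++;

	return count == 1;
}

and then at the top of iommu_sva_bind_group(), once group->mutex is
held:

	if (!iommu_sva_group_is_singleton(group)) {
		mutex_unlock(&group->mutex);
		return -EINVAL;
	}

With that, binding a multi-device group fails up front instead of
creating bonds that can't actually be isolated.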

> 
> Regardless of the IOMMU group or domain a device is in, device drivers
> should call bind() for each device that will use the PASID.
> 
> This patch only adds skeletons for the device driver API, most of the
> implementation is still missing.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  drivers/iommu/iommu-sva.c | 105
> ++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/iommu/iommu.c     |  63 ++++++++++++++++++++++++++++
>  include/linux/iommu.h     |  36 ++++++++++++++++
>  3 files changed, 204 insertions(+)
> 
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index cab5d723520f..593685d891bf 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -9,6 +9,9 @@
> 
>  #include <linux/iommu.h>
> 
> +/* TODO: stub for the fault queue. Remove later. */
> +#define iommu_fault_queue_flush(...)
> +
>  /**
>   * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a
> device
>   * @dev: the device
> @@ -78,6 +81,8 @@ int iommu_sva_device_shutdown(struct device *dev)
>  	if (!domain)
>  		return -ENODEV;
> 
> +	__iommu_sva_unbind_dev_all(dev);
> +
>  	if (domain->ops->sva_device_shutdown)
>  		domain->ops->sva_device_shutdown(dev);
> 
> @@ -88,3 +93,103 @@ int iommu_sva_device_shutdown(struct device
> *dev)
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
> +
> +/**
> + * iommu_sva_bind_device() - Bind a process address space to a device
> + * @dev: the device
> + * @mm: the mm to bind, caller must hold a reference to it
> + * @pasid: valid address where the PASID will be stored
> + * @flags: bond properties (IOMMU_SVA_FEAT_*)
> + * @drvdata: private data passed to the mm exit handler
> + *
> + * Create a bond between device and task, allowing the device to access
> the mm
> + * using the returned PASID. A subsequent bind() for the same device and
> mm will
> + * reuse the bond (and return the same PASID), but users will have to call
> + * unbind() twice.

what's the point of requiring unbind twice?
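
To spell out what I'm asking -- with the documented semantics a driver
ends up with a refcount-style sequence (hypothetical calls, same
device and mm):

	iommu_sva_bind_device(dev, mm, &pasid, flags, drvdata); /* bond created */
	iommu_sva_bind_device(dev, mm, &pasid, flags, drvdata); /* same PASID */
	...
	iommu_sva_unbind_device(dev, pasid); /* bond still live */
	iommu_sva_unbind_device(dev, pasid); /* bond destroyed */

i.e. the second unbind() is what actually tears the bond down.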

> + *
> + * Callers should have taken care of setting up SVA for this device with
> + * iommu_sva_device_init() beforehand. They may also be notified of the
> bond
> + * disappearing, for example when the last task that uses the mm dies, by
> + * registering a notifier with iommu_register_mm_exit_handler().
> + *
> + * If IOMMU_SVA_FEAT_PASID is requested, a PASID is allocated and
> returned.
> + * TODO: The alternative, binding the non-PASID context to an mm, isn't
> + * supported at the moment because existing IOMMU domain types
> initialize the
> + * non-PASID context for iommu_map()/unmap() or bypass. This requires
> a new
> + * domain type.
> + *
> + * If IOMMU_SVA_FEAT_IOPF is not requested, the caller must pin down
> all
> + * mappings shared with the device. mlock() isn't sufficient, as it doesn't
> + * prevent minor page faults (e.g. copy-on-write). TODO: !IOPF isn't
> allowed at
> + * the moment.
> + *
> + * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an
> error
> + * is returned.
> + */
> +int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
> int *pasid,
> +			  unsigned long flags, void *drvdata)
> +{
> +	struct iommu_domain *domain;
> +	struct iommu_param *dev_param = dev->iommu_param;
> +
> +	domain = iommu_get_domain_for_dev(dev);
> +	if (!domain)
> +		return -EINVAL;
> +
> +	if (!pasid)
> +		return -EINVAL;
> +
> +	if (!dev_param || (flags & ~dev_param->sva_features))
> +		return -EINVAL;
> +
> +	if (flags != (IOMMU_SVA_FEAT_PASID | IOMMU_SVA_FEAT_IOPF))
> +		return -EINVAL;
> +
> +	return -ENOSYS; /* TODO */
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
> +
> +/**
> + * iommu_sva_unbind_device() - Remove a bond created with
> iommu_sva_bind_device
> + * @dev: the device
> + * @pasid: the pasid returned by bind()
> + *
> + * Remove bond between device and address space identified by @pasid.
> Users
> + * should not call unbind() if the corresponding mm exited (as the PASID
> might
> + * have been reallocated to another process.)
> + *
> + * The device must not be issuing any more transaction for this PASID. All
> + * outstanding page requests for this PASID must have been flushed to the
> IOMMU.
> + *
> + * Returns 0 on success, or an error value
> + */
> +int iommu_sva_unbind_device(struct device *dev, int pasid)
> +{
> +	struct iommu_domain *domain;
> +
> +	domain = iommu_get_domain_for_dev(dev);
> +	if (WARN_ON(!domain))
> +		return -EINVAL;
> +
> +	/*
> +	 * Caller stopped the device from issuing PASIDs, now make sure
> they are
> +	 * out of the fault queue.
> +	 */
> +	iommu_fault_queue_flush(dev);
> +
> +	return -ENOSYS; /* TODO */
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
> +
> +/**
> + * __iommu_sva_unbind_dev_all() - Detach all address spaces from this
> device
> + *
> + * When detaching @device from a domain, IOMMU drivers should use
> this helper.
> + */
> +void __iommu_sva_unbind_dev_all(struct device *dev)
> +{
> +	iommu_fault_queue_flush(dev);
> +
> +	/* TODO */
> +}
> +EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all);
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index d4a4edaf2d8c..f977851c522b 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1535,6 +1535,69 @@ void iommu_detach_group(struct
> iommu_domain *domain, struct iommu_group *group)
>  }
>  EXPORT_SYMBOL_GPL(iommu_detach_group);
> 
> +/*
> + * iommu_sva_bind_group() - Share address space with all devices in the
> group.
> + * @group: the iommu group
> + * @mm: the mm to bind
> + * @pasid: valid address where the PASID will be stored
> + * @flags: bond properties (IOMMU_PROCESS_BIND_*)
> + * @drvdata: private data passed to the mm exit handler
> + *
> + * Create a bond between group and process, allowing devices in the
> group to
> + * access the process address space using @pasid.
> + *
> + * Refer to iommu_sva_bind_device() for more details.
> + *
> + * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an
> error
> + * is returned.
> + */
> +int iommu_sva_bind_group(struct iommu_group *group, struct
> mm_struct *mm,
> +			 int *pasid, unsigned long flags, void *drvdata)
> +{
> +	struct group_device *device;
> +	int ret = -ENODEV;
> +
> +	if (!group->domain)
> +		return -EINVAL;
> +
> +	mutex_lock(&group->mutex);
> +	list_for_each_entry(device, &group->devices, list) {
> +		ret = iommu_sva_bind_device(device->dev, mm, pasid,
> flags,
> +					    drvdata);
> +		if (ret)
> +			break;
> +	}
> +
> +	if (ret) {
> +		list_for_each_entry_continue_reverse(device, &group-
> >devices, list)
> +			iommu_sva_unbind_device(device->dev, *pasid);
> +	}
> +	mutex_unlock(&group->mutex);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_bind_group);
> +
> +/**
> + * iommu_sva_unbind_group() - Remove a bond created with
> iommu_sva_bind_group()
> + * @group: the group
> + * @pasid: the pasid returned by bind
> + *
> + * Refer to iommu_sva_unbind_device() for more details.
> + */
> +int iommu_sva_unbind_group(struct iommu_group *group, int pasid)
> +{
> +	struct group_device *device;
> +
> +	mutex_lock(&group->mutex);
> +	list_for_each_entry(device, &group->devices, list)
> +		iommu_sva_unbind_device(device->dev, pasid);
> +	mutex_unlock(&group->mutex);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_unbind_group);
> +
>  phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
> dma_addr_t iova)
>  {
>  	if (unlikely(domain->ops->iova_to_phys == NULL))
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index e9e09eecdece..1fb10d64b9e5 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -576,6 +576,10 @@ int iommu_fwspec_init(struct device *dev, struct
> fwnode_handle *iommu_fwnode,
>  void iommu_fwspec_free(struct device *dev);
>  int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
>  const struct iommu_ops *iommu_ops_from_fwnode(struct
> fwnode_handle *fwnode);
> +extern int iommu_sva_bind_group(struct iommu_group *group,
> +				struct mm_struct *mm, int *pasid,
> +				unsigned long flags, void *drvdata);
> +extern int iommu_sva_unbind_group(struct iommu_group *group, int
> pasid);
> 
>  #else /* CONFIG_IOMMU_API */
> 
> @@ -890,12 +894,28 @@ const struct iommu_ops
> *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
>  	return NULL;
>  }
> 
> +static inline int iommu_sva_bind_group(struct iommu_group *group,
> +				       struct mm_struct *mm, int *pasid,
> +				       unsigned long flags, void *drvdata)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int iommu_sva_unbind_group(struct iommu_group *group,
> int pasid)
> +{
> +	return -ENODEV;
> +}
> +
>  #endif /* CONFIG_IOMMU_API */
> 
>  #ifdef CONFIG_IOMMU_SVA
>  extern int iommu_sva_device_init(struct device *dev, unsigned long
> features,
>  				 unsigned int max_pasid);
>  extern int iommu_sva_device_shutdown(struct device *dev);
> +extern int iommu_sva_bind_device(struct device *dev, struct mm_struct
> *mm,
> +				int *pasid, unsigned long flags, void
> *drvdata);
> +extern int iommu_sva_unbind_device(struct device *dev, int pasid);
> +extern void __iommu_sva_unbind_dev_all(struct device *dev);
>  #else /* CONFIG_IOMMU_SVA */
>  static inline int iommu_sva_device_init(struct device *dev,
>  					unsigned long features,
> @@ -908,6 +928,22 @@ static inline int
> iommu_sva_device_shutdown(struct device *dev)
>  {
>  	return -ENODEV;
>  }
> +
> +static inline int iommu_sva_bind_device(struct device *dev,
> +					struct mm_struct *mm, int *pasid,
> +					unsigned long flags, void *drvdata)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int iommu_sva_unbind_device(struct device *dev, int pasid)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline void __iommu_sva_unbind_dev_all(struct device *dev)
> +{
> +}
>  #endif /* CONFIG_IOMMU_SVA */
> 
>  #endif /* __LINUX_IOMMU_H */
> --
> 2.15.1

^ permalink raw reply	[flat|nested] 317+ messages in thread

* RE: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
@ 2018-02-13  7:54     ` Tian, Kevin
  0 siblings, 0 replies; 317+ messages in thread
From: Tian, Kevin @ 2018-02-13  7:54 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu, kvm
  Cc: mark.rutland, xieyisheng1, ilias.apalodimas, catalin.marinas,
	xuzaibo, jonathan.cameron, will.deacon, okaya, Liu, Yi L,
	lorenzo.pieralisi, Raj, Ashok, tn, joro, bharatku, rfranz, lenb,
	jacob.jun.pan, alex.williamson, robh+dt, thunder.leizhen,
	bhelgaas, shunyong.yang, dwmw2, liubo95, rjw, jcrouse, robdclark,
	hanjun.guo, sudeep.holla, robin.murphy, christian.koenig,
	nwatters

> From: Jean-Philippe Brucker
> Sent: Tuesday, February 13, 2018 2:33 AM
> 
> Add bind() and unbind() operations to the IOMMU API. Device drivers can
> use them to share process page tables with their devices. bind_group()
> is provided for VFIO's convenience, as it needs to provide a coherent
> interface on containers. Other device drivers will most likely want to
> use bind_device(), which binds a single device in the group.

I saw that your bind_group implementation tries to bind the address space
for all devices within a group, which IMO is problematic. Based on the PCIe
spec, packet routing on the bus doesn't take PASID into consideration.
Since devices within the same group cannot be isolated based on requester ID,
i.e. traffic is not guaranteed to reach the IOMMU, enabling SVA on multiple
devices could cause undesired p2p.

If my understanding of the PCIe spec is correct, we should probably fail
calls to bind_group()/bind_device() when there are multiple devices within
the given group. If there is only one device, bind_group is essentially a
wrapper around bind_device.
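
A minimal sketch of that check (not code from this series; the helper and its
name are invented for illustration, the structures follow the patch quoted
above):

        static int sva_group_device_count(struct iommu_group *group)
        {
                struct group_device *device;
                int count = 0;

                list_for_each_entry(device, &group->devices, list)
                        count++;

                return count;
        }

        int iommu_sva_bind_group(struct iommu_group *group, struct mm_struct *mm,
                                 int *pasid, unsigned long flags, void *drvdata)
        {
                struct group_device *device;
                int ret = -ENODEV;

                if (!group->domain)
                        return -EINVAL;

                mutex_lock(&group->mutex);

                /*
                 * PCIe routing ignores PASID, so devices that are not isolated
                 * by requester ID must not have SVA enabled: refuse groups
                 * containing more than one device.
                 */
                if (sva_group_device_count(group) > 1) {
                        mutex_unlock(&group->mutex);
                        return -EINVAL;
                }

                /* A single device: bind_group() degenerates into bind_device() */
                list_for_each_entry(device, &group->devices, list)
                        ret = iommu_sva_bind_device(device->dev, mm, pasid,
                                                    flags, drvdata);

                mutex_unlock(&group->mutex);

                return ret;
        }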

> 
> Regardless of the IOMMU group or domain a device is in, device drivers
> should call bind() for each device that will use the PASID.
> 
> This patch only adds skeletons for the device driver API, most of the
> implementation is still missing.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  drivers/iommu/iommu-sva.c | 105
> ++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/iommu/iommu.c     |  63 ++++++++++++++++++++++++++++
>  include/linux/iommu.h     |  36 ++++++++++++++++
>  3 files changed, 204 insertions(+)
> 
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index cab5d723520f..593685d891bf 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -9,6 +9,9 @@
> 
>  #include <linux/iommu.h>
> 
> +/* TODO: stub for the fault queue. Remove later. */
> +#define iommu_fault_queue_flush(...)
> +
>  /**
>   * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a
> device
>   * @dev: the device
> @@ -78,6 +81,8 @@ int iommu_sva_device_shutdown(struct device *dev)
>  	if (!domain)
>  		return -ENODEV;
> 
> +	__iommu_sva_unbind_dev_all(dev);
> +
>  	if (domain->ops->sva_device_shutdown)
>  		domain->ops->sva_device_shutdown(dev);
> 
> @@ -88,3 +93,103 @@ int iommu_sva_device_shutdown(struct device
> *dev)
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
> +
> +/**
> + * iommu_sva_bind_device() - Bind a process address space to a device
> + * @dev: the device
> + * @mm: the mm to bind, caller must hold a reference to it
> + * @pasid: valid address where the PASID will be stored
> + * @flags: bond properties (IOMMU_SVA_FEAT_*)
> + * @drvdata: private data passed to the mm exit handler
> + *
> + * Create a bond between device and task, allowing the device to access
> the mm
> + * using the returned PASID. A subsequent bind() for the same device and
> mm will
> + * reuse the bond (and return the same PASID), but users will have to call
> + * unbind() twice.

What's the point of requiring unbind() twice?
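
(As an illustration of the refcounting implied by the comment above, a rough
sketch with invented names, not code from the series:)

        /* One bond per (device, mm) pair, shared by repeated bind() calls */
        struct iommu_bond_sketch {
                struct device           *dev;
                struct mm_struct        *mm;
                int                     pasid;
                refcount_t              refs;   /* taken by each successful bind() */
                struct list_head        list;
        };

        /*
         * bind():   look up an existing bond for (dev, mm); if found, take a
         *           reference and return its PASID, otherwise allocate both.
         * unbind(): drop a reference; only when it reaches zero are the PASID
         *           and the bond freed, hence two bind() calls require two
         *           unbind() calls.
         */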

> + *
> + * Callers should have taken care of setting up SVA for this device with
> + * iommu_sva_device_init() beforehand. They may also be notified of the
> bond
> + * disappearing, for example when the last task that uses the mm dies, by
> + * registering a notifier with iommu_register_mm_exit_handler().
> + *
> + * If IOMMU_SVA_FEAT_PASID is requested, a PASID is allocated and
> returned.
> + * TODO: The alternative, binding the non-PASID context to an mm, isn't
> + * supported at the moment because existing IOMMU domain types
> initialize the
> + * non-PASID context for iommu_map()/unmap() or bypass. This requires
> a new
> + * domain type.
> + *
> + * If IOMMU_SVA_FEAT_IOPF is not requested, the caller must pin down
> all
> + * mappings shared with the device. mlock() isn't sufficient, as it doesn't
> + * prevent minor page faults (e.g. copy-on-write). TODO: !IOPF isn't
> allowed at
> + * the moment.
> + *
> + * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an
> error
> + * is returned.
> + */
> +int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
> int *pasid,
> +			  unsigned long flags, void *drvdata)
> +{
> +	struct iommu_domain *domain;
> +	struct iommu_param *dev_param = dev->iommu_param;
> +
> +	domain = iommu_get_domain_for_dev(dev);
> +	if (!domain)
> +		return -EINVAL;
> +
> +	if (!pasid)
> +		return -EINVAL;
> +
> +	if (!dev_param || (flags & ~dev_param->sva_features))
> +		return -EINVAL;
> +
> +	if (flags != (IOMMU_SVA_FEAT_PASID | IOMMU_SVA_FEAT_IOPF))
> +		return -EINVAL;
> +
> +	return -ENOSYS; /* TODO */
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
> +
> +/**
> + * iommu_sva_unbind_device() - Remove a bond created with
> iommu_sva_bind_device
> + * @dev: the device
> + * @pasid: the pasid returned by bind()
> + *
> + * Remove bond between device and address space identified by @pasid.
> Users
> + * should not call unbind() if the corresponding mm exited (as the PASID
> might
> + * have been reallocated to another process.)
> + *
> + * The device must not be issuing any more transaction for this PASID. All
> + * outstanding page requests for this PASID must have been flushed to the
> IOMMU.
> + *
> + * Returns 0 on success, or an error value
> + */
> +int iommu_sva_unbind_device(struct device *dev, int pasid)
> +{
> +	struct iommu_domain *domain;
> +
> +	domain = iommu_get_domain_for_dev(dev);
> +	if (WARN_ON(!domain))
> +		return -EINVAL;
> +
> +	/*
> +	 * Caller stopped the device from issuing PASIDs, now make sure
> they are
> +	 * out of the fault queue.
> +	 */
> +	iommu_fault_queue_flush(dev);
> +
> +	return -ENOSYS; /* TODO */
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
> +
> +/**
> + * __iommu_sva_unbind_dev_all() - Detach all address spaces from this
> device
> + *
> + * When detaching @device from a domain, IOMMU drivers should use
> this helper.
> + */
> +void __iommu_sva_unbind_dev_all(struct device *dev)
> +{
> +	iommu_fault_queue_flush(dev);
> +
> +	/* TODO */
> +}
> +EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all);
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index d4a4edaf2d8c..f977851c522b 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1535,6 +1535,69 @@ void iommu_detach_group(struct
> iommu_domain *domain, struct iommu_group *group)
>  }
>  EXPORT_SYMBOL_GPL(iommu_detach_group);
> 
> +/*
> + * iommu_sva_bind_group() - Share address space with all devices in the
> group.
> + * @group: the iommu group
> + * @mm: the mm to bind
> + * @pasid: valid address where the PASID will be stored
> + * @flags: bond properties (IOMMU_PROCESS_BIND_*)
> + * @drvdata: private data passed to the mm exit handler
> + *
> + * Create a bond between group and process, allowing devices in the
> group to
> + * access the process address space using @pasid.
> + *
> + * Refer to iommu_sva_bind_device() for more details.
> + *
> + * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an
> error
> + * is returned.
> + */
> +int iommu_sva_bind_group(struct iommu_group *group, struct
> mm_struct *mm,
> +			 int *pasid, unsigned long flags, void *drvdata)
> +{
> +	struct group_device *device;
> +	int ret = -ENODEV;
> +
> +	if (!group->domain)
> +		return -EINVAL;
> +
> +	mutex_lock(&group->mutex);
> +	list_for_each_entry(device, &group->devices, list) {
> +		ret = iommu_sva_bind_device(device->dev, mm, pasid,
> flags,
> +					    drvdata);
> +		if (ret)
> +			break;
> +	}
> +
> +	if (ret) {
> +		list_for_each_entry_continue_reverse(device, &group-
> >devices, list)
> +			iommu_sva_unbind_device(device->dev, *pasid);
> +	}
> +	mutex_unlock(&group->mutex);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_bind_group);
> +
> +/**
> + * iommu_sva_unbind_group() - Remove a bond created with
> iommu_sva_bind_group()
> + * @group: the group
> + * @pasid: the pasid returned by bind
> + *
> + * Refer to iommu_sva_unbind_device() for more details.
> + */
> +int iommu_sva_unbind_group(struct iommu_group *group, int pasid)
> +{
> +	struct group_device *device;
> +
> +	mutex_lock(&group->mutex);
> +	list_for_each_entry(device, &group->devices, list)
> +		iommu_sva_unbind_device(device->dev, pasid);
> +	mutex_unlock(&group->mutex);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_unbind_group);
> +
>  phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
> dma_addr_t iova)
>  {
>  	if (unlikely(domain->ops->iova_to_phys == NULL))
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index e9e09eecdece..1fb10d64b9e5 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -576,6 +576,10 @@ int iommu_fwspec_init(struct device *dev, struct
> fwnode_handle *iommu_fwnode,
>  void iommu_fwspec_free(struct device *dev);
>  int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
>  const struct iommu_ops *iommu_ops_from_fwnode(struct
> fwnode_handle *fwnode);
> +extern int iommu_sva_bind_group(struct iommu_group *group,
> +				struct mm_struct *mm, int *pasid,
> +				unsigned long flags, void *drvdata);
> +extern int iommu_sva_unbind_group(struct iommu_group *group, int
> pasid);
> 
>  #else /* CONFIG_IOMMU_API */
> 
> @@ -890,12 +894,28 @@ const struct iommu_ops
> *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
>  	return NULL;
>  }
> 
> +static inline int iommu_sva_bind_group(struct iommu_group *group,
> +				       struct mm_struct *mm, int *pasid,
> +				       unsigned long flags, void *drvdata)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int iommu_sva_unbind_group(struct iommu_group *group,
> int pasid)
> +{
> +	return -ENODEV;
> +}
> +
>  #endif /* CONFIG_IOMMU_API */
> 
>  #ifdef CONFIG_IOMMU_SVA
>  extern int iommu_sva_device_init(struct device *dev, unsigned long
> features,
>  				 unsigned int max_pasid);
>  extern int iommu_sva_device_shutdown(struct device *dev);
> +extern int iommu_sva_bind_device(struct device *dev, struct mm_struct
> *mm,
> +				int *pasid, unsigned long flags, void
> *drvdata);
> +extern int iommu_sva_unbind_device(struct device *dev, int pasid);
> +extern void __iommu_sva_unbind_dev_all(struct device *dev);
>  #else /* CONFIG_IOMMU_SVA */
>  static inline int iommu_sva_device_init(struct device *dev,
>  					unsigned long features,
> @@ -908,6 +928,22 @@ static inline int
> iommu_sva_device_shutdown(struct device *dev)
>  {
>  	return -ENODEV;
>  }
> +
> +static inline int iommu_sva_bind_device(struct device *dev,
> +					struct mm_struct *mm, int *pasid,
> +					unsigned long flags, void *drvdata)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int iommu_sva_unbind_device(struct device *dev, int pasid)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline void __iommu_sva_unbind_dev_all(struct device *dev)
> +{
> +}
>  #endif /* CONFIG_IOMMU_SVA */
> 
>  #endif /* __LINUX_IOMMU_H */
> --
> 2.15.1



^ permalink raw reply	[flat|nested] 317+ messages in thread

* RE: [PATCH 04/37] iommu/sva: Add a mm_exit callback for device drivers
@ 2018-02-13  8:11         ` Tian, Kevin
  0 siblings, 0 replies; 317+ messages in thread
From: Tian, Kevin @ 2018-02-13  8:11 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu, kvm
  Cc: mark.rutland, xieyisheng1, ilias.apalodimas, catalin.marinas,
	xuzaibo, jonathan.cameron, will.deacon, okaya, Liu, Yi L,
	lorenzo.pieralisi, Raj, Ashok, tn, joro, bharatku, rfranz, lenb,
	jacob.jun.pan, alex.williamson, robh+dt, thunder.leizhen,
	bhelgaas, shunyong.yang, dwmw2, liubo95, rjw, jcrouse, robdclark,
	hanjun.guo, sudeep.holla, robin.murphy, christian.koenig,
	nwatters

> From: Jean-Philippe Brucker
> Sent: Tuesday, February 13, 2018 2:33 AM
> 
> When an mm exits, devices that were bound to it must stop performing
> DMA
> on its PASID. Let device drivers register a callback to be notified on mm
> exit. Add the callback to the iommu_param structure attached to struct
> device.

What about registering the callback in sva_device_init()?
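
(For illustration, a sketch of how a device driver might consume this API as
currently proposed: the iommu_* calls come from the series, everything with a
my_drv_ prefix is invented.)

        #include <linux/iommu.h>

        struct my_drv_ctx {
                int pasid;
        };

        /* Placeholder: quiesce the device's use of @pasid (device-specific) */
        static void my_drv_stop_pasid(struct my_drv_ctx *ctx, int pasid) { }

        static int my_drv_mm_exit(struct device *dev, int pasid, void *drvdata)
        {
                struct my_drv_ctx *ctx = drvdata;

                /* The device must not issue any more transactions for @pasid */
                my_drv_stop_pasid(ctx, pasid);

                return 0;
        }

        static int my_drv_bind(struct device *dev, struct mm_struct *mm,
                               struct my_drv_ctx *ctx)
        {
                unsigned long features = IOMMU_SVA_FEAT_PASID | IOMMU_SVA_FEAT_IOPF;
                int ret;

                ret = iommu_sva_device_init(dev, features, 0);
                if (ret)
                        return ret;

                ret = iommu_register_mm_exit_handler(dev, my_drv_mm_exit);
                if (ret)
                        goto err_shutdown;

                ret = iommu_sva_bind_device(dev, mm, &ctx->pasid, features, ctx);
                if (ret)
                        goto err_unregister;

                return 0;

        err_unregister:
                iommu_unregister_mm_exit_handler(dev);
        err_shutdown:
                iommu_sva_device_shutdown(dev);
                return ret;
        }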

> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  drivers/iommu/iommu-sva.c | 54
> +++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/iommu.h     | 18 ++++++++++++++++
>  2 files changed, 72 insertions(+)
> 
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index f9af9d66b3ed..90b524c99d3d 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -569,3 +569,57 @@ void __iommu_sva_unbind_dev_all(struct device
> *dev)
>  	spin_unlock(&iommu_sva_lock);
>  }
>  EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all);
> +
> +/**
> + * iommu_register_mm_exit_handler() - Set a callback for mm exit
> + * @dev: the device
> + * @handler: exit handler
> + *
> + * Users of the bind/unbind API should call this function to set a
> + * device-specific callback telling them when a mm is exiting.
> + *
> + * After the callback returns, the device must not issue any more
> transaction
> + * with the PASID given as argument to the handler. In addition the
> handler gets
> + * an opaque pointer corresponding to the drvdata passed as argument of
> bind().
> + *
> + * The handler itself should return 0 on success, and an appropriate error
> code
> + * otherwise.
> + */
> +int iommu_register_mm_exit_handler(struct device *dev,
> +				   iommu_mm_exit_handler_t handler)
> +{
> +	struct iommu_param *dev_param = dev->iommu_param;
> +
> +	if (!dev_param)
> +		return -EINVAL;
> +
> +	/*
> +	 * FIXME: racy. Same as iommu_sva_device_init, but here we'll
> need a
> +	 * spinlock to call the mm_exit param from atomic context.
> +	 */
> +	if (dev_param->mm_exit)
> +		return -EBUSY;
> +
> +	get_device(dev);
> +	dev_param->mm_exit = handler;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iommu_register_mm_exit_handler);
> +
> +/**
> + * iommu_unregister_mm_exit_handler() - Remove mm exit callback
> + */
> +int iommu_unregister_mm_exit_handler(struct device *dev)
> +{
> +	struct iommu_param *dev_param = dev->iommu_param;
> +
> +	if (!dev_param || !dev_param->mm_exit)
> +		return -EINVAL;
> +
> +	dev_param->mm_exit = NULL;
> +	put_device(dev);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iommu_unregister_mm_exit_handler);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 09d85f44142a..1b1a16892ac1 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -65,6 +65,8 @@ typedef int (*iommu_dev_fault_handler_t)(struct
> iommu_fault_event *, void *);
>  /* Request I/O page fault support */
>  #define IOMMU_SVA_FEAT_IOPF		(1 << 1)
> 
> +typedef int (*iommu_mm_exit_handler_t)(struct device *dev, int pasid,
> void *);
> +
>  struct iommu_domain_geometry {
>  	dma_addr_t aperture_start; /* First address that can be mapped
> */
>  	dma_addr_t aperture_end;   /* Last address that can be mapped
> */
> @@ -424,6 +426,7 @@ struct iommu_param {
>  	unsigned int min_pasid;
>  	unsigned int max_pasid;
>  	struct list_head mm_list;
> +	iommu_mm_exit_handler_t mm_exit;
>  };
> 
>  int  iommu_device_register(struct iommu_device *iommu);
> @@ -941,6 +944,10 @@ extern int iommu_sva_bind_device(struct device
> *dev, struct mm_struct *mm,
>  				int *pasid, unsigned long flags, void
> *drvdata);
>  extern int iommu_sva_unbind_device(struct device *dev, int pasid);
>  extern void __iommu_sva_unbind_dev_all(struct device *dev);
> +extern int iommu_register_mm_exit_handler(struct device *dev,
> +					  iommu_mm_exit_handler_t
> handler);
> +extern int iommu_unregister_mm_exit_handler(struct device *dev);
> +
>  #else /* CONFIG_IOMMU_SVA */
>  static inline int iommu_sva_device_init(struct device *dev,
>  					unsigned long features,
> @@ -969,6 +976,17 @@ static inline int iommu_sva_unbind_device(struct
> device *dev, int pasid)
>  static inline void __iommu_sva_unbind_dev_all(struct device *dev)
>  {
>  }
> +
> +static inline int iommu_register_mm_exit_handler(struct device *dev,
> +						 iommu_mm_exit_handler_t
> handler)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int iommu_unregister_mm_exit_handler(struct device *dev)
> +{
> +	return -ENODEV;
> +}
>  #endif /* CONFIG_IOMMU_SVA */
> 
>  #endif /* __LINUX_IOMMU_H */
> --
> 2.15.1



^ permalink raw reply	[flat|nested] 317+ messages in thread


* Re: [PATCH 01/37] iommu: Introduce Shared Virtual Addressing API
@ 2018-02-13 12:40           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-13 12:40 UTC (permalink / raw)
  To: Tian, Kevin, linux-arm-kernel, linux-pci, linux-acpi, devicetree,
	iommu, kvm
  Cc: Mark Rutland, bharatku, Raj, Ashok, shunyong.yang, rjw,
	Catalin Marinas, xuzaibo, ilias.apalodimas, Will Deacon,
	Joerg Roedel, okaya, bhelgaas, robh+dt, Sudeep Holla, rfranz,
	dwmw2, christian.koenig, lenb

Hi Kevin,

Thanks for taking a look!

On 13/02/18 07:31, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker
>> Sent: Tuesday, February 13, 2018 2:33 AM
>>
>> Shared Virtual Addressing (SVA) provides a way for device drivers to bind
>> process address spaces to devices. This requires the IOMMU to support the
>> same page table format as CPUs, and requires the system to support I/O
> 
> "same" is a bit restrictive. "compatible" is better as you used in coverletter. :-)

Indeed

[..]
>> +config IOMMU_SVA
>> +	bool "Shared Virtual Addressing API for the IOMMU"
>> +	select IOMMU_API
>> +	help
>> +	  Enable process address space management for the IOMMU API. In
>> systems
>> +	  that support it, device drivers can bind process address spaces to
>> +	  devices and share their page tables using this API.
> 
> "their page table" is a bit confusing here.

Maybe this is sufficient:
"In systems that support it, drivers can share process address spaces with
their devices using this API."

[...]
>> +
>> +/**
>> + * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a
>> device
>> + * @dev: the device
>> + * @features: bitmask of features that need to be initialized
>> + * @max_pasid: max PASID value supported by the device
>> + *
>> + * Users of the bind()/unbind() API must call this function to initialize all
>> + * features required for SVA.
>> + *
>> + * - If the device should support multiple address spaces (e.g. PCI PASID),
>> + *   IOMMU_SVA_FEAT_PASID must be requested.
> 
> I think it is by default assumed when using this API, based on definition of
> SVA. Can you elaborate the situation where this flag can be cleared?

When passing a device to userspace, you could also share its non-pasid
address space with the process. It requires a new domain type so is left
as a TODO in patch 2/37. I did get requests for this feature, though I
think it was mostly for prototyping. I guess I could remove the flag, and
reintroduce it as IOMMU_SVA_FEAT_NO_PASID later on.

>> + *
>> + *   By default the PASID allocated during bind() is limited by the IOMMU
>> + *   capacity, and by the device PASID width defined in the PCI capability or
>> in
>> + *   the firmware description. Setting @max_pasid to a non-zero value
>> smaller
>> + *   than this limit overrides it.
>> + *
>> + * - If the device should support I/O Page Faults (e.g. PCI PRI),
>> + *   IOMMU_SVA_FEAT_IOPF must be requested.
>> + *
>> + * The device should not be be performing any DMA while this function is
> 
> remove double "be"
> 
>> + * running.
> 
> "otherwise the behavior is undefined"

ok

[...]
>> +	ret = domain->ops->sva_device_init(dev, features, &min_pasid,
>> +					   &max_pasid);
>> +	if (ret)
>> +		return ret;
>> +
>> +	/* FIXME: racy. Next version should have a mutex (same as fault
>> handler) */
>> +	dev_param->sva_features = features;
>> +	dev_param->min_pasid = min_pasid;
>> +	dev_param->max_pasid = max_pasid;
> 
> what's the point of min_pasid here?

Arm SMMUv3 uses entry 0 of the PASID table for the default (non-pasid)
context, so it needs to set min_pasid to 1. AMD IOMMU recently added a
similar feature (GIoSup), if I understood correctly.
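
(To make that concrete, a rough sketch of an IOMMU driver's sva_device_init()
op reserving PASID 0; MY_IOMMU_MAX_PASID is an invented limit and this is not
the SMMUv3 patch itself:)

        #define MY_IOMMU_MAX_PASID      0xfffff /* invented 20-bit limit */

        static int my_iommu_sva_device_init(struct device *dev,
                                            unsigned long features,
                                            unsigned int *min_pasid,
                                            unsigned int *max_pasid)
        {
                /*
                 * PASID table entry 0 holds the default (non-PASID) context,
                 * so SVA bonds are allocated from PASID 1 upwards.
                 */
                *min_pasid = 1;

                /* Clamp to what this IOMMU implementation supports */
                *max_pasid = min_t(unsigned int, *max_pasid, MY_IOMMU_MAX_PASID);

                return 0;
        }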

Thanks,
Jean


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
  2018-02-13  7:54     ` Tian, Kevin
  (?)
  (?)
@ 2018-02-13 12:57       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-13 12:57 UTC (permalink / raw)
  To: Tian, Kevin, linux-arm-kernel, linux-pci, linux-acpi, devicetree,
	iommu, kvm
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1

On 13/02/18 07:54, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker
>> Sent: Tuesday, February 13, 2018 2:33 AM
>>
>> Add bind() and unbind() operations to the IOMMU API. Device drivers can
>> use them to share process page tables with their devices. bind_group()
>> is provided for VFIO's convenience, as it needs to provide a coherent
>> interface on containers. Other device drivers will most likely want to
>> use bind_device(), which binds a single device in the group.
> 
> I saw your bind_group implementation tries to bind the address space
> for all devices within a group, which IMO has some problem. Based on PCIe
> spec, packet routing on the bus doesn't take PASID into consideration. 
> since devices within same group cannot be isolated based on requestor-ID
> i.e. traffic not guaranteed going to IOMMU, enabling SVA on multiple devices
> could cause undesired p2p.
But so does enabling "classic" DMA... If two devices are not protected by
ACS for example, they are put in the same IOMMU group, and one device
might be able to snoop the other's DMA. VFIO allows userspace to create a
container for them and use MAP/UNMAP, but makes it explicit to the user
that for DMA, these devices are not isolated and must be considered as a
single device (you can't pass them to different VMs or put them in
different containers). So I tried to keep the same idea as MAP/UNMAP for
SVA, performing BIND/UNBIND operations on the VFIO container instead of
the device.

I kept the analogy simple though, because I don't think there will be many
SVA-capable systems that require IOMMU groups. They will likely implement
proper device isolation. Unlike iommu_attach_device(), bind_device()
doesn't call bind_group(), because keeping bonds consistent in groups is
complicated, not worth implementing (drivers can explicitly bind() all
devices that need it) and probably wouldn't ever be used. I also can't
test it. But maybe we could implement the following for now:

* bind_device() fails if the device's group has more than one device,
otherwise calls __bind_device(). This prevents device drivers that are
oblivious to IOMMU groups from opening a backdoor.

* bind_group() calls __bind_device() for all devices in group. This way
users that are aware of IOMMU groups can still use them safely. Note that
at the moment bind_group() fails as soon as it finds a device that doesn't
support SVA. Having all devices support SVA in a given group is
unrealistic and this behavior ought to be improved.

* hotplugging a device into a group still succeeds even if the group
already has mm bonds. Same happens for classic DMA, a hotplugged device
will have access to all mappings already present in the domain.

> If my understanding of PCIe spec is correct, probably we should fail 
> calling bind_group()/bind_device() when there are multiple devices within 
> the given group. If only one device then bind_group is essentially a wrapper
> to bind_device.>>
>> Regardless of the IOMMU group or domain a device is in, device drivers
>> should call bind() for each device that will use the PASID.
>>
[...]
>> +/**
>> + * iommu_sva_bind_device() - Bind a process address space to a device
>> + * @dev: the device
>> + * @mm: the mm to bind, caller must hold a reference to it
>> + * @pasid: valid address where the PASID will be stored
>> + * @flags: bond properties (IOMMU_SVA_FEAT_*)
>> + * @drvdata: private data passed to the mm exit handler
>> + *
>> + * Create a bond between device and task, allowing the device to access
>> the mm
>> + * using the returned PASID. A subsequent bind() for the same device and
>> mm will
>> + * reuse the bond (and return the same PASID), but users will have to call
>> + * unbind() twice.
> 
> what's the point of requiring unbind twice?

Mmh, that was necessary when we kept bond information as domain<->mm, but
since it's now device<->mm, we can probably remove the bond refcount. I
consider that a bind() between a given device and mm will always be issued
by the same driver.
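
For reference, the behaviour documented above means a driver currently sees
something like this (sketch; the pasid type is assumed):

	int pasid, pasid2;

	iommu_sva_bind_device(dev, mm, &pasid, 0, NULL);
	iommu_sva_bind_device(dev, mm, &pasid2, 0, NULL);	/* pasid2 == pasid */

	iommu_sva_unbind_device(dev, pasid);	/* bond still alive */
	iommu_sva_unbind_device(dev, pasid);	/* bond actually destroyed */

With the refcount removed, a single bind()/unbind() pair would be expected
instead.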

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 04/37] iommu/sva: Add a mm_exit callback for device drivers
  2018-02-13  8:11         ` Tian, Kevin
@ 2018-02-13 12:57           ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-13 12:57 UTC (permalink / raw)
  To: Tian, Kevin, linux-arm-kernel, linux-pci, linux-acpi, devicetree,
	iommu, kvm
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1

On 13/02/18 08:11, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker
>> Sent: Tuesday, February 13, 2018 2:33 AM
>>
>> When an mm exits, devices that were bound to it must stop performing
>> DMA
>> on its PASID. Let device drivers register a callback to be notified on mm
>> exit. Add the callback to the iommu_param structure attached to struct
>> device.
> 
> what about registering the callback in sva_device_init? 

I don't have a preference. This way it looks like
iommu_register_device_fault_handler, but adding the callback to
sva_device_init makes sense too.
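
From a device driver's point of view the two options would look roughly like
this (a sketch; the mm_exit prototype and the extra sva_device_init()
parameter are assumptions, the helper names are taken from this series):

	static int my_mm_exit(struct device *dev, int pasid, void *drvdata)
	{
		/* Stop issuing DMA tagged with @pasid */
		return 0;
	}

	/* Option 1: separate registration, like the fault handler */
	iommu_sva_device_init(dev, IOMMU_SVA_FEAT_PASID, max_pasid);
	iommu_register_mm_exit_handler(dev, my_mm_exit);

	/* Option 2: register the callback at initialization time */
	iommu_sva_device_init(dev, IOMMU_SVA_FEAT_PASID, max_pasid, my_mm_exit);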

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 29/37] iommu/arm-smmu-v3: Add stall support for platform devices
  2018-02-13  1:46       ` Xu Zaibo
@ 2018-02-13 12:58           ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-13 12:58 UTC (permalink / raw)
  To: Xu Zaibo, linux-arm-kernel, linux-pci, linux-acpi, devicetree,
	iommu, kvm
  Cc: Mark Rutland, ilias.apalodimas, Catalin Marinas, Will Deacon,
	okaya, liguozhu, ashok.raj, bharatku, rfranz, lenb, robh+dt,
	bhelgaas, shunyong.yang, dwmw2, rjw, Sudeep Holla,
	christian.koenig

Hi,

On 13/02/18 01:46, Xu Zaibo wrote:
> Hi,
> 
> On 2018/2/13 2:33, Jean-Philippe Brucker wrote:
>> The SMMU provides a Stall model for handling page faults in platform
>> devices. It is similar to PCI PRI, but doesn't require devices to have
>> their own translation cache. Instead, faulting transactions are parked and
>> the OS is given a chance to fix the page tables and retry the transaction.
>>
>> Enable stall for devices that support it (opt-in by firmware). When an
>> event corresponds to a translation error, call the IOMMU fault handler. If
>> the fault is recoverable, it will call us back to terminate or continue
>> the stall.
>>
>> Note that this patch tweaks the iommu_fault_event and page_response_msg to
>> extend the fault id field. Stall uses 16 bits of IDs whereas PCI PRI only
>> uses 9.
> For PCIe devices without ATC,  can they use this Stall model?

Unfortunately no, Stall is incompatible with PCI. Timing constraints in
PCI prevent stalling transactions in the IOMMU.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* RE: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
  2018-02-13 12:57       ` Jean-Philippe Brucker
@ 2018-02-13 23:34         ` Tian, Kevin
  -1 siblings, 0 replies; 317+ messages in thread
From: Tian, Kevin @ 2018-02-13 23:34 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu, kvm
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo

> From: Jean-Philippe Brucker
> Sent: Tuesday, February 13, 2018 8:57 PM
> 
> On 13/02/18 07:54, Tian, Kevin wrote:
> >> From: Jean-Philippe Brucker
> >> Sent: Tuesday, February 13, 2018 2:33 AM
> >>
> >> Add bind() and unbind() operations to the IOMMU API. Device drivers
> can
> >> use them to share process page tables with their devices. bind_group()
> >> is provided for VFIO's convenience, as it needs to provide a coherent
> >> interface on containers. Other device drivers will most likely want to
> >> use bind_device(), which binds a single device in the group.
> >
> > I saw your bind_group implementation tries to bind the address space
> > for all devices within a group, which IMO has some problem. Based on
> PCIe
> > spec, packet routing on the bus doesn't take PASID into consideration.
> > since devices within same group cannot be isolated based on requestor-
> ID
> > i.e. traffic not guaranteed going to IOMMU, enabling SVA on multiple
> devices
> > could cause undesired p2p.
> But so does enabling "classic" DMA... If two devices are not protected by
> ACS for example, they are put in the same IOMMU group, and one device
> might be able to snoop the other's DMA. VFIO allows userspace to create a
> container for them and use MAP/UNMAP, but makes it explicit to the user
> that for DMA, these devices are not isolated and must be considered as a
> single device (you can't pass them to different VMs or put them in
> different containers). So I tried to keep the same idea as MAP/UNMAP for
> SVA, performing BIND/UNBIND operations on the VFIO container instead of
> the device.

There is a small difference. For classic DMA we can reserve the PCI BARs
when allocating IOVA, so multiple devices in the same group can still
work correctly with the same translation, provided isolation between
them is not a concern. For SVA, however, the addresses are CPU virtual
addresses managed by the kernel mm, so it is difficult to introduce a
similar reservation. A VA could then fall into another device's BAR in
the same group and cause undesired p2p traffic. In that regard, SVA is
actually functionally broken.
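
For comparison, the classic DMA case roughly does the following when setting
up a domain (simplified from the PCI window reservation in dma-iommu; iovad is
the domain's IOVA allocator, pdev is any PCI device behind the bridge, and
resource offsets are ignored here):

	struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
	struct resource_entry *window;

	resource_list_for_each_entry(window, &bridge->windows) {
		if (resource_type(window->res) != IORESOURCE_MEM)
			continue;
		/* Carve the bridge window out of the allocatable IOVA space */
		reserve_iova(iovad, iova_pfn(iovad, window->res->start),
			     iova_pfn(iovad, window->res->end));
	}

With SVA there is no allocator to carve addresses out of: the process mm hands
out the VAs, so nothing prevents one from aliasing a neighbouring device's BAR
inside a non-isolated group.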

> 
> I kept the analogy simple though, because I don't think there will be many
> SVA-capable systems that require IOMMU groups. They will likely

I agree that multiple SVA-capable devices in the same IOMMU group is not
a typical configuration, especially since SVA is usually found on new
devices. Given the above limitation, I think we could just explicitly
refuse to enable SVA in that case. :-)

> implement
> proper device isolation. Unlike iommu_attach_device(), bind_device()
> doesn't call bind_group(), because keeping bonds consistent in groups is
> complicated, not worth implementing (drivers can explicitly bind() all
> devices that need it) and probably wouldn't ever be used. I also can't
> test it. But maybe we could implement the following for now:
> 
> * bind_device() fails if the device's group has more than one device,
> otherwise calls __bind_device(). This prevents device drivers that are
> oblivious to IOMMU groups from opening a backdoor.
> 
> * bind_group() calls __bind_device() for all devices in group. This way
> users that are aware of IOMMU groups can still use them safely. Note that
> at the moment bind_group() fails as soon as it finds a device that doesn't
> support SVA. Having all devices support SVA in a given group is
> unrealistic and this behavior ought to be improved.
> 
> * hotplugging a device into a group still succeeds even if the group
> already has mm bonds. Same happens for classic DMA, a hotplugged
> device
> will have access to all mappings already present in the domain.
> 
> > If my understanding of PCIe spec is correct, probably we should fail
> > calling bind_group()/bind_device() when there are multiple devices within
> > the given group. If only one device then bind_group is essentially a
> wrapper
> > to bind_device.>>
> >> Regardless of the IOMMU group or domain a device is in, device drivers
> >> should call bind() for each device that will use the PASID.
> >>
> [...]
> >> +/**
> >> + * iommu_sva_bind_device() - Bind a process address space to a device
> >> + * @dev: the device
> >> + * @mm: the mm to bind, caller must hold a reference to it
> >> + * @pasid: valid address where the PASID will be stored
> >> + * @flags: bond properties (IOMMU_SVA_FEAT_*)
> >> + * @drvdata: private data passed to the mm exit handler
> >> + *
> >> + * Create a bond between device and task, allowing the device to access
> >> the mm
> >> + * using the returned PASID. A subsequent bind() for the same device
> and
> >> mm will
> >> + * reuse the bond (and return the same PASID), but users will have to
> call
> >> + * unbind() twice.
> >
> > what's the point of requiring unbind twice?
> 
> Mmh, that was necessary when we kept bond information as domain<-
> >mm, but
> since it's now device<->mm, we can probably remove the bond refcount. I
> consider that a bind() between a given device and mm will always be issued
> by the same driver.
> 
> Thanks,
> Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* RE: [PATCH 01/37] iommu: Introduce Shared Virtual Addressing API
  2018-02-13 12:40           ` Jean-Philippe Brucker
@ 2018-02-13 23:43             ` Tian, Kevin
  -1 siblings, 0 replies; 317+ messages in thread
From: Tian, Kevin @ 2018-02-13 23:43 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu, kvm
  Cc: Mark Rutland, ilias.apalodimas, Catalin Marinas, xuzaibo,
	Will Deacon, okaya, Raj, Ashok, bharatku, rfranz, lenb, robh+dt,
	bhelgaas, shunyong.yang, dwmw2, rjw, Sudeep Holla,
	christian.koenig, Joerg Roedel

> From: Jean-Philippe Brucker
> Sent: Tuesday, February 13, 2018 8:40 PM
> 
> 
> [...]
> >> +
> >> +/**
> >> + * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a
> >> device
> >> + * @dev: the device
> >> + * @features: bitmask of features that need to be initialized
> >> + * @max_pasid: max PASID value supported by the device
> >> + *
> >> + * Users of the bind()/unbind() API must call this function to initialize all
> >> + * features required for SVA.
> >> + *
> >> + * - If the device should support multiple address spaces (e.g. PCI
> PASID),
> >> + *   IOMMU_SVA_FEAT_PASID must be requested.
> >
> > I think it is by default assumed when using this API, based on definition of
> > SVA. Can you elaborate the situation where this flag can be cleared?
> 
> When passing a device to userspace, you could also share its non-pasid
> address space with the process. It requires a new domain type so is left
> as a TODO in patch 2/37. I did get requests for this feature, though I
> think it was mostly for prototyping. I guess I could remove the flag, and
> reintroduce it as IOMMU_SVA_FEAT_NO_PASID later on.

Sorry, I still didn't get the definition of the non-PASID address space.
Did you mean the GPA/IOVA address space, with no_pasid implying that
some default PASID is actually associated with it?

> 
> [...]
> >> +	ret = domain->ops->sva_device_init(dev, features, &min_pasid,
> >> +					   &max_pasid);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	/* FIXME: racy. Next version should have a mutex (same as fault
> >> handler) */
> >> +	dev_param->sva_features = features;
> >> +	dev_param->min_pasid = min_pasid;
> >> +	dev_param->max_pasid = max_pasid;
> >
> > what's the point of min_pasid here?
> 
> Arm SMMUv3 uses entry 0 of the PASID table for the default (non-pasid)
> context, so it needs to set min_pasid to 1. AMD IOMMU recently added a
> similar feature (GIoSup), if I understood correctly.
> 

Just for that purpose maybe we should define a reserved_pasid instead;
otherwise there is some waste whenever an implementation allows a
non-zero minimum.
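
Either way the constraint ends up in the PASID allocator. A minimal sketch of
how the core could consume it (the IDA-based allocator is an assumption, not
the patch):

	/* PASID 0 stays reserved when the IOMMU driver sets min_pasid = 1 */
	pasid = ida_simple_get(&iommu_pasid_ida, dev_param->min_pasid,
			       dev_param->max_pasid + 1, GFP_KERNEL);
	if (pasid < 0)
		return pasid;

A single reserved_pasid value would indeed avoid wasting the whole range below
a non-zero minimum.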

Thanks
Kevin
 

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 07/37] iommu: Add a page fault handler
  2018-02-12 18:33   ` Jean-Philippe Brucker
@ 2018-02-14  7:18     ` Jacob Pan
  -1 siblings, 0 replies; 317+ messages in thread
From: Jacob Pan @ 2018-02-14  7:18 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm,
	joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, yi.l.liu

On Mon, 12 Feb 2018 18:33:22 +0000
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> Some systems allow devices to handle IOMMU translation faults in the
> core mm. For example systems supporting the PCI PRI extension or Arm
> SMMU stall model. Infrastructure for reporting such recoverable page
> faults was recently added to the IOMMU core, for SVA virtualization.
> Extend iommu_report_device_fault() to handle host page faults as well.
> 
> * IOMMU drivers instantiate a fault workqueue, using
>   iommu_fault_queue_init() and iommu_fault_queue_destroy().
> 
> * When it receives a fault event, supposedly in an IRQ handler, the
>   IOMMU driver reports the fault using iommu_report_device_fault()
> 
> * If the device driver registered a handler (e.g. VFIO), pass down the
>   fault event. Otherwise submit it to the fault queue, to be handled
>   in a thread.
> 
> * When the fault corresponds to an io_mm, call the mm fault handler
>   on it (in next patch).
> 
> * Once the fault is handled, the mm wrapper or the device driver
>   reports success or failure with iommu_page_response(). The
>   translation is either retried or aborted, depending on the response
>   code.
> 
Hi Jean,
Seems like a good approach to consolidate page fault handling. I will try
to test the intel-svm code with this flow. More comments inline.
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  drivers/iommu/Kconfig      |  10 ++
>  drivers/iommu/Makefile     |   1 +
>  drivers/iommu/io-pgfault.c | 282 +++++++++++++++++++++++++++++++++++++++++++++
>  drivers/iommu/iommu-sva.c  |   3 -
>  drivers/iommu/iommu.c      |  31 ++---
>  include/linux/iommu.h      |  34 +++++-
>  6 files changed, 339 insertions(+), 22 deletions(-)
>  create mode 100644 drivers/iommu/io-pgfault.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 146eebe9a4bb..e751bb9958ba 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -85,6 +85,15 @@ config IOMMU_SVA
>  
>  	  If unsure, say N here.
>  
> +config IOMMU_FAULT
> +	bool "Fault handler for the IOMMU API"
> +	select IOMMU_API
> +	help
> +	  Enable the generic fault handler for the IOMMU API, that handles
> +	  recoverable page faults or inject them into guests.
> +
> +	  If unsure, say N here.
> +
>  config FSL_PAMU
>  	bool "Freescale IOMMU support"
>  	depends on PCI
> @@ -156,6 +165,7 @@ config INTEL_IOMMU
>  	select IOMMU_API
>  	select IOMMU_IOVA
>  	select DMAR_TABLE
> +	select IOMMU_FAULT
>  	help
>  	  DMA remapping (DMAR) devices support enables independent address
>  	  translations for Direct Memory Access (DMA) from devices.
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 1dbcc89ebe4c..f4324e29035e 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
>  obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
>  obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
> +obj-$(CONFIG_IOMMU_FAULT) += io-pgfault.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
> new file mode 100644
> index 000000000000..33309ed316d2
> --- /dev/null
> +++ b/drivers/iommu/io-pgfault.c
> @@ -0,0 +1,282 @@
> +/*
> + * Handle device page faults
> + *
> + * Copyright (C) 2018 ARM Ltd.
> + * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0
> + */
> +
> +#include <linux/iommu.h>
> +#include <linux/list.h>
> +#include <linux/slab.h>
> +#include <linux/workqueue.h>
> +
> +static struct workqueue_struct *iommu_fault_queue;
> +static DECLARE_RWSEM(iommu_fault_queue_sem);
> +static refcount_t iommu_fault_queue_refs = REFCOUNT_INIT(0);
> +static BLOCKING_NOTIFIER_HEAD(iommu_fault_queue_flush_notifiers);
> +
> +/* Used to store incomplete fault groups */
> +static LIST_HEAD(iommu_partial_faults);
> +static DEFINE_SPINLOCK(iommu_partial_faults_lock);
> +
Should the partial fault list be per IOMMU?
> +struct iommu_fault_context {
> +	struct device			*dev;
> +	struct iommu_fault_event	evt;
> +	struct list_head		head;
> +};
> +
> +struct iommu_fault_group {
> +	struct iommu_domain		*domain;
> +	struct iommu_fault_context	last_fault;
> +	struct list_head		faults;
> +	struct work_struct		work;
> +};
> +
> +/*
> + * iommu_fault_complete() - Finish handling a fault
> + *
> + * Send a response if necessary and pass on the sanitized status code
> + */
> +static int iommu_fault_complete(struct iommu_domain *domain, struct device *dev,
> +				struct iommu_fault_event *evt, int status)
> +{
> +	struct page_response_msg resp = {
> +		.addr		= evt->addr,
> +		.pasid		= evt->pasid,
> +		.pasid_present	= evt->pasid_valid,
> +		.page_req_group_id = evt->page_req_group_id,
> +		.type		= IOMMU_PAGE_GROUP_RESP,
> +		.private_data	= evt->iommu_private,
> +	};
> +
> +	/*
> +	 * There is no "handling" an unrecoverable fault, so the only valid
> +	 * return values are 0 or an error.
> +	 */
> +	if (evt->type == IOMMU_FAULT_DMA_UNRECOV)
> +		return status > 0 ? 0 : status;
> +
> +	/* Someone took ownership of the fault and will complete it later */
> +	if (status == IOMMU_PAGE_RESP_HANDLED)
> +		return 0;
> +
> +	/*
> +	 * There was an internal error with handling the recoverable fault. Try
> +	 * to complete the fault if possible.
> +	 */
> +	if (status < 0)
> +		status = IOMMU_PAGE_RESP_INVALID;
> +
> +	if (WARN_ON(!domain->ops->page_response))
> +		/*
> +		 * The IOMMU driver shouldn't have submitted recoverable faults
> +		 * if it cannot receive a response.
> +		 */
> +		return -EINVAL;
> +
> +	resp.resp_code = status;
> +	return domain->ops->page_response(domain, dev, &resp);
> +}
> +
> +static int iommu_fault_handle_single(struct iommu_fault_context *fault)
> +{
> +	/* TODO */
> +	return -ENODEV;
> +}
> +
> +static void iommu_fault_handle_group(struct work_struct *work)
> +{
> +	struct iommu_fault_group *group;
> +	struct iommu_fault_context *fault, *next;
> +	int status = IOMMU_PAGE_RESP_SUCCESS;
> +
> +	group = container_of(work, struct iommu_fault_group, work);
> +
> +	list_for_each_entry_safe(fault, next, &group->faults, head) {
> +		struct iommu_fault_event *evt = &fault->evt;
> +		/*
> +		 * Errors are sticky: don't handle subsequent faults in the
> +		 * group if there is an error.
> +		 */
> +		if (status == IOMMU_PAGE_RESP_SUCCESS)
> +			status = iommu_fault_handle_single(fault);
> +
> +		if (!evt->last_req)
> +			kfree(fault);
> +	}
> +
> +	iommu_fault_complete(group->domain, group->last_fault.dev,
> +			     &group->last_fault.evt, status);
> +	kfree(group);
> +}
> +
> +static int iommu_queue_fault(struct iommu_domain *domain, struct device *dev,
> +			     struct iommu_fault_event *evt)
> +{
> +	struct iommu_fault_group *group;
> +	struct iommu_fault_context *fault, *next;
> +
> +	if (!iommu_fault_queue)
> +		return -ENOSYS;
> +
> +	if (!evt->last_req) {
> +		fault = kzalloc(sizeof(*fault), GFP_KERNEL);
> +		if (!fault)
> +			return -ENOMEM;
> +
> +		fault->evt = *evt;
> +		fault->dev = dev;
> +
> +		/* Non-last request of a group. Postpone until the
> last one */
> +		spin_lock(&iommu_partial_faults_lock);
> +		list_add_tail(&fault->head, &iommu_partial_faults);
> +		spin_unlock(&iommu_partial_faults_lock);
> +
> +		return IOMMU_PAGE_RESP_HANDLED;
> +	}
> +
> +	group = kzalloc(sizeof(*group), GFP_KERNEL);
> +	if (!group)
> +		return -ENOMEM;
> +
> +	group->last_fault.evt = *evt;
> +	group->last_fault.dev = dev;
> +	group->domain = domain;
> +	INIT_LIST_HEAD(&group->faults);
> +	list_add(&group->last_fault.head, &group->faults);
> +	INIT_WORK(&group->work, iommu_fault_handle_group);
> +
> +	/* See if we have pending faults for this group */
> +	spin_lock(&iommu_partial_faults_lock);
> +	list_for_each_entry_safe(fault, next, &iommu_partial_faults, head) {
> +		if (fault->evt.page_req_group_id == evt->page_req_group_id &&
> +		    fault->dev == dev) {
> +			list_del(&fault->head);
> +			/* Insert *before* the last fault */
> +			list_add(&fault->head, &group->faults);
> +		}
> +	}
> +	spin_unlock(&iommu_partial_faults_lock);
> +
> +	queue_work(iommu_fault_queue, &group->work);
> +
> +	/* Postpone the fault completion */
> +	return IOMMU_PAGE_RESP_HANDLED;
> +}
> +
> +/**
> + * iommu_report_device_fault() - Handle fault in device driver or mm
> + *
> + * If the device driver expressed interest in handling fault, report
> it through
> + * the callback. If the fault is recoverable, try to page in the
> address.
> + */
> +int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
> +{
> +	int ret = -ENOSYS;
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +
> +	if (!domain)
> +		return -ENODEV;
> +
> +	/*
> +	 * if upper layers showed interest and installed a fault
> handler,
> +	 * invoke it.
> +	 */
> +	if (iommu_has_device_fault_handler(dev)) {
I think Alex pointed out that this is racy, so adding a mutex to
iommu_fault_param and acquiring it here would help. Do we really need an
atomic handler?
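To illustrate what I mean (rough sketch, not tested; the "lock" field is
hypothetical and would also have to be taken by the register/unregister
paths):

	static int iommu_call_fault_handler(struct device *dev,
					    struct iommu_fault_event *evt)
	{
		int ret = -ENOSYS;
		struct iommu_fault_param *param;

		/* sketch: assumes a "struct mutex lock" is added to struct iommu_param */
		mutex_lock(&dev->iommu_param->lock);
		param = dev->iommu_param->fault_param;
		if (param && param->handler)
			ret = param->handler(evt, param->data);
		mutex_unlock(&dev->iommu_param->lock);

		return ret;
	}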
> +		struct iommu_fault_param *param = dev->iommu_param->fault_param;
> +
> +		return param->handler(evt, param->data);
Even if the upper layer (VFIO) registered a handler to propagate PRQs to a
guest so that it can fault in the pages, we may still need to keep track of
the page requests that need a page response later, i.e. the last page in a
group or a stream request in VT-d. This will allow us to sanitize the page
responses that come back from the guest/VFIO.
In my next round, I am adding a per-device list under iommu_fault_param
for pending page requests. This will also address the situation where the
guest fails to send a response. We can enforce a time or credit limit on
pending requests based on this list.
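Roughly along these lines (just a sketch of the idea, struct and field
names not final):

	/* extends the existing iommu_fault_param with a pending list */
	struct iommu_fault_param {
		iommu_dev_fault_handler_t	handler;
		void				*data;
		struct mutex			lock;
		/* page requests reported to the handler, awaiting a response */
		struct list_head		pending_req;
	};

	/* one entry per outstanding page request that expects a response */
	struct iommu_pending_page_req {
		struct list_head		head;
		struct iommu_fault_event	evt;
		unsigned long			expires; /* time limit on guest responses */
	};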

> +	}
> +
> +	/* If the handler is blocking, handle fault in the workqueue
> */
> +	if (evt->type == IOMMU_FAULT_PAGE_REQ)
> +		ret = iommu_queue_fault(domain, dev, evt);
> +
> +	return iommu_fault_complete(domain, dev, evt, ret);
> +}
> +EXPORT_SYMBOL_GPL(iommu_report_device_fault);
> +
> +/**
> + * iommu_fault_queue_register() - register an IOMMU driver to the
> fault queue
> + * @flush_notifier: a notifier block that is called before the fault
> queue is
> + * flushed. The IOMMU driver should commit all faults that are
> pending in its
> + * low-level queues at the time of the call, into the fault queue.
> The notifier
> + * takes a device pointer as argument, hinting what endpoint is
> causing the
> + * flush. When the device is NULL, all faults should be committed.
> + */
> +int iommu_fault_queue_register(struct notifier_block *flush_notifier)
> +{
> +	/*
> +	 * The WQ is unordered because the low-level handler
> enqueues faults by
> +	 * group. PRI requests within a group have to be ordered,
> but once
> +	 * that's dealt with, the high-level function can handle
> groups out of
> +	 * order.
> +	 */
> +	down_write(&iommu_fault_queue_sem);
> +	if (!iommu_fault_queue) {
> +		iommu_fault_queue = alloc_workqueue("iommu_fault_queue",
> +						    WQ_UNBOUND, 0);
> +		if (iommu_fault_queue)
> +			refcount_set(&iommu_fault_queue_refs, 1);
> +	} else {
> +		refcount_inc(&iommu_fault_queue_refs);
> +	}
> +	up_write(&iommu_fault_queue_sem);
> +
> +	if (!iommu_fault_queue)
> +		return -ENOMEM;
> +
> +	if (flush_notifier)
> +		blocking_notifier_chain_register(&iommu_fault_queue_flush_notifiers,
> +						 flush_notifier);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iommu_fault_queue_register);
> +
> +/**
> + * iommu_fault_queue_flush() - Ensure that all queued faults have
> been
> + * processed.
> + * @dev: the endpoint whose faults need to be flushed. If NULL,
> flush all
> + *       pending faults.
> + *
> + * Users must call this function when releasing a PASID, to ensure
> that all
> + * pending faults affecting this PASID have been handled, and won't
> affect the
> + * address space of a subsequent process that reuses this PASID.
> + */
> +void iommu_fault_queue_flush(struct device *dev)
> +{
> +	blocking_notifier_call_chain(&iommu_fault_queue_flush_notifiers, 0, dev);
> +
> +	down_read(&iommu_fault_queue_sem);
> +	/*
> +	 * Don't flush the partial faults list. All PRGs with the
> PASID are
> +	 * complete and have been submitted to the queue.
> +	 */
> +	if (iommu_fault_queue)
> +		flush_workqueue(iommu_fault_queue);
> +	up_read(&iommu_fault_queue_sem);
> +}
> +EXPORT_SYMBOL_GPL(iommu_fault_queue_flush);
> +
> +/**
> + * iommu_fault_queue_unregister() - Unregister an IOMMU driver from
> the fault
> + * queue.
> + * @flush_notifier: same parameter as iommu_fault_queue_register
> + */
> +void iommu_fault_queue_unregister(struct notifier_block *flush_notifier)
> +{
> +	down_write(&iommu_fault_queue_sem);
> +	if (refcount_dec_and_test(&iommu_fault_queue_refs)) {
> +		destroy_workqueue(iommu_fault_queue);
> +		iommu_fault_queue = NULL;
> +	}
> +	up_write(&iommu_fault_queue_sem);
> +
> +	if (flush_notifier)
> +		blocking_notifier_chain_unregister(&iommu_fault_queue_flush_notifiers,
> +						   flush_notifier);
> +}
> +EXPORT_SYMBOL_GPL(iommu_fault_queue_unregister);
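If I read the API right, the expected usage from an IOMMU driver is roughly
the following (my own sketch; the my_iommu_* names are made up):

	static int my_iommu_flush_notifier(struct notifier_block *nb,
					   unsigned long action, void *data)
	{
		struct device *dev = data;	/* NULL means "all devices" */

		/*
		 * Drain the hardware PRI/event queue: report anything still
		 * pending with iommu_report_device_fault() so it lands in the
		 * fault workqueue before it gets flushed.
		 */
		my_iommu_drain_prq(dev);
		return NOTIFY_OK;
	}

	static struct notifier_block my_iommu_flush_nb = {
		.notifier_call = my_iommu_flush_notifier,
	};

	/* probe:  iommu_fault_queue_register(&my_iommu_flush_nb);            */
	/* unbind: iommu_fault_queue_flush(dev); before freeing the PASID     */
	/* remove: iommu_fault_queue_unregister(&my_iommu_flush_nb);          */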
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 4bc2a8c12465..d7b231cd7355 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -102,9 +102,6 @@
>   * the device table and PASID 0 would be available to the allocator.
>   */
>  
> -/* TODO: stub for the fault queue. Remove later. */
> -#define iommu_fault_queue_flush(...)
> -
>  struct iommu_bond {
>  	struct io_mm		*io_mm;
>  	struct device		*dev;
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 1d60b32a6744..c475893ec7dc 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -798,6 +798,17 @@ int iommu_group_unregister_notifier(struct iommu_group *group,
>  }
>  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
>  
> +/**
> + * iommu_register_device_fault_handler() - Register a device fault
> handler
> + * @dev: the device
> + * @handler: the fault handler
> + * @data: private data passed as argument to the callback
> + *
> + * When an IOMMU fault event is received, call this handler with the
> fault event
> + * and data as argument.
> + *
> + * Return 0 if the fault handler was installed successfully, or an
> error.
> + */
>  int iommu_register_device_fault_handler(struct device *dev,
>  					iommu_dev_fault_handler_t handler, void *data)
> @@ -825,6 +836,13 @@ int iommu_register_device_fault_handler(struct device *dev,
>  }
>  EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
>  
> +/**
> + * iommu_unregister_device_fault_handler() - Unregister the device
> fault handler
> + * @dev: the device
> + *
> + * Remove the device fault handler installed with
> + * iommu_register_device_fault_handler().
> + */
>  int iommu_unregister_device_fault_handler(struct device *dev)
>  {
>  	struct iommu_param *idata = dev->iommu_param;
> @@ -840,19 +858,6 @@ int iommu_unregister_device_fault_handler(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
>  
> -
> -int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
> -{
> -	/* we only report device fault if there is a handler registered */
> -	if (!dev->iommu_param || !dev->iommu_param->fault_param ||
> -		!dev->iommu_param->fault_param->handler)
> -		return -ENOSYS;
> -
> -	return dev->iommu_param->fault_param->handler(evt,
> -				dev->iommu_param->fault_param->data);
> -}
> -EXPORT_SYMBOL_GPL(iommu_report_device_fault);
> -
>  /**
>   * iommu_group_id - Return ID for a group
>   * @group: the group to ID
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 226ab4f3ae0e..65e56f28e0ce 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -205,6 +205,7 @@ struct page_response_msg {
>  	u32 resp_code:4;
>  #define IOMMU_PAGE_RESP_SUCCESS	0
>  #define IOMMU_PAGE_RESP_INVALID	1
> +#define IOMMU_PAGE_RESP_HANDLED	2
>  #define IOMMU_PAGE_RESP_FAILURE	0xF
>  
>  	u32 pasid_present:1;
> @@ -534,7 +535,6 @@ extern int iommu_register_device_fault_handler(struct device *dev,
>  extern int iommu_unregister_device_fault_handler(struct device *dev);
>  
> -extern int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt);
>  extern int iommu_page_response(struct iommu_domain *domain, struct device *dev, struct page_response_msg *msg);
> @@ -836,11 +836,6 @@ static inline bool iommu_has_device_fault_handler(struct device *dev)
>  	return false;
>  }
>  
> -static inline int iommu_report_device_fault(struct device *dev,
> -					    struct iommu_fault_event *evt)
> -{
> -	return 0;
> -}
> -
>  static inline int iommu_page_response(struct iommu_domain *domain,
> struct device *dev, struct page_response_msg *msg)
>  {
> @@ -1005,4 +1000,31 @@ static inline struct mm_struct *iommu_sva_find(int pasid)
>  }
>  #endif /* CONFIG_IOMMU_SVA */
>  
> +#ifdef CONFIG_IOMMU_FAULT
> +extern int iommu_fault_queue_register(struct notifier_block *flush_notifier);
> +extern void iommu_fault_queue_flush(struct device *dev);
> +extern void iommu_fault_queue_unregister(struct notifier_block *flush_notifier);
> +extern int iommu_report_device_fault(struct device *dev,
> +				     struct iommu_fault_event *evt);
> +#else /* CONFIG_IOMMU_FAULT */
> +static inline int iommu_fault_queue_register(struct notifier_block *flush_notifier)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline void iommu_fault_queue_flush(struct device *dev)
> +{
> +}
> +
> +static inline void iommu_fault_queue_unregister(struct notifier_block *flush_notifier)
> +{
> +}
> +
> +static inline int iommu_report_device_fault(struct device *dev,
> +					    struct iommu_fault_event *evt)
> +{
> +	return 0;
> +}
> +#endif /* CONFIG_IOMMU_FAULT */
> +
>  #endif /* __LINUX_IOMMU_H */
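For the VFIO/guest case mentioned above, my understanding is that a handler
taking ownership of a recoverable fault would look roughly like this (sketch
only; struct my_drv and the my_drv_* helpers are made up):

	struct my_drv {
		struct device		*dev;
		struct iommu_domain	*domain;
		/* ... */
	};

	static int my_drv_fault_handler(struct iommu_fault_event *evt, void *data)
	{
		struct my_drv *drv = data;

		/* stash the event, it will be completed asynchronously */
		my_drv_queue_fault(drv, evt);
		return IOMMU_PAGE_RESP_HANDLED;
	}

	/* later, once the request has been resolved (or rejected) */
	static void my_drv_complete_fault(struct my_drv *drv,
					  struct iommu_fault_event *evt, bool ok)
	{
		struct page_response_msg msg = {
			.addr			= evt->addr,
			.pasid			= evt->pasid,
			.pasid_present		= evt->pasid_valid,
			.page_req_group_id	= evt->page_req_group_id,
			.type			= IOMMU_PAGE_GROUP_RESP,
			.private_data		= evt->iommu_private,
			.resp_code		= ok ? IOMMU_PAGE_RESP_SUCCESS
						     : IOMMU_PAGE_RESP_INVALID,
		};

		iommu_page_response(drv->domain, drv->dev, &msg);
	}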

[Jacob Pan]

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 08/37] iommu/fault: Handle mm faults
  2018-02-12 18:33   ` Jean-Philippe Brucker
  (?)
@ 2018-02-14 18:46       ` Jacob Pan
  -1 siblings, 0 replies; 317+ messages in thread
From: Jacob Pan @ 2018-02-14 18:46 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: mark.rutland-5wv7dgnIgG8,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, catalin.marinas-5wv7dgnIgG8,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	shunyong.yang-PT9Dzx9SjPiXmMXjJBpWqg,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, rjw-LthD3rsA81gm4RdzfppkhA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	sudeep.holla-5wv7dgnIgG8, christian.koenig-5C7GfCeVMHo

On Mon, 12 Feb 2018 18:33:23 +0000
Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:

> When a recoverable page fault is handled by the fault workqueue, find
> the associated mm and call handle_mm_fault.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> ---
>  drivers/iommu/io-pgfault.c | 89
> ++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 87
> insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
> index 33309ed316d2..565ec01a1b5f 100644
> --- a/drivers/iommu/io-pgfault.c
> +++ b/drivers/iommu/io-pgfault.c
> @@ -9,6 +9,7 @@
>  
>  #include <linux/iommu.h>
>  #include <linux/list.h>
> +#include <linux/sched/mm.h>
>  #include <linux/slab.h>
>  #include <linux/workqueue.h>
>  
> @@ -82,8 +83,92 @@ static int iommu_fault_complete(struct
> iommu_domain *domain, struct device *dev, 
>  static int iommu_fault_handle_single(struct iommu_fault_context
> *fault) {
> -	/* TODO */
> -	return -ENODEV;
> +	struct mm_struct *mm;
> +	struct vm_area_struct *vma;
> +	unsigned int access_flags = 0;
unsigned long to match vm_flags?
> +	int ret = IOMMU_PAGE_RESP_INVALID;
> +	unsigned int fault_flags = FAULT_FLAG_REMOTE;
> +	struct iommu_fault_event *evt = &fault->evt;
> +
> +	if (!evt->pasid_valid)
> +		return ret;
I guess for not we don't handle PRQ without PASID, right?
> +
> +	/*
> +	 * Special case: PASID Stop Marker (LRW = 0b100) doesn't
> expect a
> +	 * response. A Stop Marker may be generated when disabling a
> PASID
> +	 * (issuing a PASID stop request) in some PCI devices.
> +	 *
> +	 * When the mm_exit() callback returns from the device
> driver, no page
> +	 * request is generated for this PASID anymore and
> outstanding ones have
> +	 * been pushed to the IOMMU (as per PCIe 4.0r1.0 - 6.20.1
> and 10.4.1.2 -
> +	 * Managing PASID TLP Prefix Usage). Some PCI devices will
> wait for all
> +	 * outstanding page requests to come back with a response
> before
> +	 * completing the PASID stop request. Others do not wait for
> page
> +	 * responses, and instead issue this Stop Marker that tells
> us when the
> +	 * PASID can be reallocated.
> +	 *
> +	 * We ignore the Stop Marker because:
> +	 * a. Page requests, which are posted requests, have been
> flushed to the
> +	 *    IOMMU when mm_exit() returns,
> +	 * b. We flush all fault queues after mm_exit() returns and
> before
> +	 *    freeing the PASID.
> +	 *
> +	 * So even though the Stop Marker might be issued by the
> device *after*
> +	 * the stop request completes, outstanding faults will have
> been dealt
> +	 * with by the time we free the PASID.
> +	 */
> +	if (evt->last_req &&
> +	    !(evt->prot & (IOMMU_FAULT_READ | IOMMU_FAULT_WRITE)))
> +		return IOMMU_PAGE_RESP_HANDLED;
> +
If we don't expect a page response, shouldn't it be filtered by the
IOMMU vendor driver in the first place? i.e. in the vendor IOMMU driver
PRQ handler, it will sanitize the request anyway, for anything that
does not need response, it will not call iommu_report_device_fault().
> +	mm = iommu_sva_find(evt->pasid);
> +	if (!mm)
> +		return ret;
> +
> +	down_read(&mm->mmap_sem);
> +
> +	vma = find_extend_vma(mm, evt->addr);
> +	if (!vma)
> +		/* Unmapped area */
> +		goto out_put_mm;
> +
> +	if (evt->prot & IOMMU_FAULT_READ)
> +		access_flags |= VM_READ;
> +
> +	if (evt->prot & IOMMU_FAULT_WRITE) {
> +		access_flags |= VM_WRITE;
> +		fault_flags |= FAULT_FLAG_WRITE;
> +	}
> +
> +	if (evt->prot & IOMMU_FAULT_EXEC) {
> +		access_flags |= VM_EXEC;
> +		fault_flags |= FAULT_FLAG_INSTRUCTION;
> +	}
> +
> +	if (!(evt->prot & IOMMU_FAULT_PRIV))
> +		fault_flags |= FAULT_FLAG_USER;
> +
> +	if (access_flags & ~vma->vm_flags)
> +		/* Access fault */
> +		goto out_put_mm;
> +
> +	ret = handle_mm_fault(vma, evt->addr, fault_flags);
> +	ret = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
> +		IOMMU_PAGE_RESP_SUCCESS;
> +
> +out_put_mm:
> +	up_read(&mm->mmap_sem);
> +
> +	/*
> +	 * If the process exits while we're handling the fault on
> its mm, we
> +	 * can't do mmput(). exit_mmap() would release the MMU
> notifier, calling
> +	 * iommu_notifier_release(), which has to flush the fault
> queue that
> +	 * we're executing on... So mmput_async() moves the release
> of the mm to
> +	 * another thread, if we're the last user.
> +	 */
> +	mmput_async(mm);
> +
> +	return ret;
>  }
>  
>  static void iommu_fault_handle_group(struct work_struct *work)

[Jacob Pan]

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 08/37] iommu/fault: Handle mm faults
@ 2018-02-14 18:46       ` Jacob Pan
  0 siblings, 0 replies; 317+ messages in thread
From: Jacob Pan @ 2018-02-14 18:46 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm,
	joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, yi.l.liu, ashok.raj, robdclark, christian.koenig,
	bharatku, jacob.jun.pan

On Mon, 12 Feb 2018 18:33:23 +0000
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> When a recoverable page fault is handled by the fault workqueue, find
> the associated mm and call handle_mm_fault.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  drivers/iommu/io-pgfault.c | 89
> ++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 87
> insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
> index 33309ed316d2..565ec01a1b5f 100644
> --- a/drivers/iommu/io-pgfault.c
> +++ b/drivers/iommu/io-pgfault.c
> @@ -9,6 +9,7 @@
>  
>  #include <linux/iommu.h>
>  #include <linux/list.h>
> +#include <linux/sched/mm.h>
>  #include <linux/slab.h>
>  #include <linux/workqueue.h>
>  
> @@ -82,8 +83,92 @@ static int iommu_fault_complete(struct
> iommu_domain *domain, struct device *dev, 
>  static int iommu_fault_handle_single(struct iommu_fault_context
> *fault) {
> -	/* TODO */
> -	return -ENODEV;
> +	struct mm_struct *mm;
> +	struct vm_area_struct *vma;
> +	unsigned int access_flags = 0;
unsigned long to match vm_flags?
> +	int ret = IOMMU_PAGE_RESP_INVALID;
> +	unsigned int fault_flags = FAULT_FLAG_REMOTE;
> +	struct iommu_fault_event *evt = &fault->evt;
> +
> +	if (!evt->pasid_valid)
> +		return ret;
I guess for not we don't handle PRQ without PASID, right?
> +
> +	/*
> +	 * Special case: PASID Stop Marker (LRW = 0b100) doesn't
> expect a
> +	 * response. A Stop Marker may be generated when disabling a
> PASID
> +	 * (issuing a PASID stop request) in some PCI devices.
> +	 *
> +	 * When the mm_exit() callback returns from the device
> driver, no page
> +	 * request is generated for this PASID anymore and
> outstanding ones have
> +	 * been pushed to the IOMMU (as per PCIe 4.0r1.0 - 6.20.1
> and 10.4.1.2 -
> +	 * Managing PASID TLP Prefix Usage). Some PCI devices will
> wait for all
> +	 * outstanding page requests to come back with a response
> before
> +	 * completing the PASID stop request. Others do not wait for
> page
> +	 * responses, and instead issue this Stop Marker that tells
> us when the
> +	 * PASID can be reallocated.
> +	 *
> +	 * We ignore the Stop Marker because:
> +	 * a. Page requests, which are posted requests, have been
> flushed to the
> +	 *    IOMMU when mm_exit() returns,
> +	 * b. We flush all fault queues after mm_exit() returns and
> before
> +	 *    freeing the PASID.
> +	 *
> +	 * So even though the Stop Marker might be issued by the
> device *after*
> +	 * the stop request completes, outstanding faults will have
> been dealt
> +	 * with by the time we free the PASID.
> +	 */
> +	if (evt->last_req &&
> +	    !(evt->prot & (IOMMU_FAULT_READ | IOMMU_FAULT_WRITE)))
> +		return IOMMU_PAGE_RESP_HANDLED;
> +
If we don't expect a page response, shouldn't it be filtered by the
IOMMU vendor driver in the first place? i.e. in the vendor IOMMU driver
PRQ handler, it will sanitize the request anyway, for anything that
does not need response, it will not call iommu_report_device_fault().
> +	mm = iommu_sva_find(evt->pasid);
> +	if (!mm)
> +		return ret;
> +
> +	down_read(&mm->mmap_sem);
> +
> +	vma = find_extend_vma(mm, evt->addr);
> +	if (!vma)
> +		/* Unmapped area */
> +		goto out_put_mm;
> +
> +	if (evt->prot & IOMMU_FAULT_READ)
> +		access_flags |= VM_READ;
> +
> +	if (evt->prot & IOMMU_FAULT_WRITE) {
> +		access_flags |= VM_WRITE;
> +		fault_flags |= FAULT_FLAG_WRITE;
> +	}
> +
> +	if (evt->prot & IOMMU_FAULT_EXEC) {
> +		access_flags |= VM_EXEC;
> +		fault_flags |= FAULT_FLAG_INSTRUCTION;
> +	}
> +
> +	if (!(evt->prot & IOMMU_FAULT_PRIV))
> +		fault_flags |= FAULT_FLAG_USER;
> +
> +	if (access_flags & ~vma->vm_flags)
> +		/* Access fault */
> +		goto out_put_mm;
> +
> +	ret = handle_mm_fault(vma, evt->addr, fault_flags);
> +	ret = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
> +		IOMMU_PAGE_RESP_SUCCESS;
> +
> +out_put_mm:
> +	up_read(&mm->mmap_sem);
> +
> +	/*
> +	 * If the process exits while we're handling the fault on its mm, we
> +	 * can't do mmput(). exit_mmap() would release the MMU notifier, calling
> +	 * iommu_notifier_release(), which has to flush the fault queue that
> +	 * we're executing on... So mmput_async() moves the release of the mm to
> +	 * another thread, if we're the last user.
> +	 */
> +	mmput_async(mm);
> +
> +	return ret;
>  }
>  
>  static void iommu_fault_handle_group(struct work_struct *work)

[Jacob Pan]

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 01/37] iommu: Introduce Shared Virtual Addressing API
  2018-02-12 18:33   ` Jean-Philippe Brucker
  (?)
@ 2018-02-15  9:59     ` Joerg Roedel
  -1 siblings, 0 replies; 317+ messages in thread
From: Joerg Roedel @ 2018-02-15  9:59 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm,
	robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu

On Mon, Feb 12, 2018 at 06:33:16PM +0000, Jean-Philippe Brucker wrote:
  
> +config IOMMU_SVA
> +	bool "Shared Virtual Addressing API for the IOMMU"
> +	select IOMMU_API
> +	help
> +	  Enable process address space management for the IOMMU API. In systems
> +	  that support it, device drivers can bind process address spaces to
> +	  devices and share their page tables using this API.
> +
> +	  If unsure, say N here.

I think this should be an option selected by the IOMMU driver and not be
actively selectable by the user.
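
Something along these lines, as a sketch (FOO_IOMMU is just a placeholder
for whichever IOMMU driver implements SVA):

	config IOMMU_SVA
		bool
		select IOMMU_API

	config FOO_IOMMU
		bool "FOO IOMMU support"
		select IOMMU_SVA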

> +/**
> + * iommu_sva_device_shutdown() - Shutdown Shared Virtual Addressing for a device
> + * @dev: the device
> + *
> + * Disable SVA. The device should not be performing any DMA while this function
> + * is running.

Is this a good idea? How about devices that get hot-unplugged while
processes still use them and there is DMA going back and forth? This
function can be the point to shut down all ongoing stuff first and then
shut down the device.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
  2018-02-13 12:57       ` Jean-Philippe Brucker
  (?)
  (?)
@ 2018-02-15 10:21           ` joro
  -1 siblings, 0 replies; 317+ messages in thread
From: joro-zLv9SwRftAIdnm+yROfE0A @ 2018-02-15 10:21 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Mark Rutland, ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	mykyta.iziumtsev-QSEj5FYQhm4dnm+yROfE0A,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, Raj, Ashok,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, Catalin Marinas,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org

On Tue, Feb 13, 2018 at 12:57:23PM +0000, Jean-Philippe Brucker wrote:
> * bind_device() fails if the device's group has more than one device,
> otherwise calls __bind_device(). This prevents device drivers that are
> oblivious to IOMMU groups from opening a backdoor.
> 
> * bind_group() calls __bind_device() for all devices in group. This way
> users that are aware of IOMMU groups can still use them safely. Note that
> at the moment bind_group() fails as soon as it finds a device that doesn't
> support SVA. Having all devices support SVA in a given group is
> unrealistic and this behavior ought to be improved.

Yeah, so the problem on PCI is that all functions of a multi-function
device are put into one group. For AMD-GPUs this means that the GPU
(SVA-capable) will end up in the same group as the on-GPU sound
device (not SVA-capable).

Before this causes us big headaches, I suggest providing only the
bind_device() function. This should be fine because for SVA we don't
need all types of isolation that iommu_groups provide.

IOMMU-groups provide two types of isolation:

	1) They group devices together which the IOMMU can't distinguish
	   from each other, like PCI devices behind a PCIe bridge.

	2) Devices that can't be isolated from each other are also put
	   into the same group. This is the case for multi-function PCIe
	   devices as well as all PCIe devices behind a non-ACS bridge.
	   But all these devices can still be distinguished by the
	   IOMMU.

These two types of protection are needed to safely assign devices to
guests, but for bare-metal SVA all we need is type 1) isolation, and
not even that if we can assume that all SVA-capable devices have an
exclusive device-id (or stream-id).



	Joerg

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
  2018-02-15 10:21           ` joro
  (?)
  (?)
@ 2018-02-15 12:29               ` Christian König
  -1 siblings, 0 replies; 317+ messages in thread
From: Christian König @ 2018-02-15 12:29 UTC (permalink / raw)
  To: joro-zLv9SwRftAIdnm+yROfE0A, Jean-Philippe Brucker
  Cc: Mark Rutland, ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	mykyta.iziumtsev-QSEj5FYQhm4dnm+yROfE0A,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, Raj, Ashok,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, Catalin Marinas,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org

Am 15.02.2018 um 11:21 schrieb joro@8bytes.org:
> On Tue, Feb 13, 2018 at 12:57:23PM +0000, Jean-Philippe Brucker wrote:
>> * bind_device() fails if the device's group has more than one device,
>> otherwise calls __bind_device(). This prevents device drivers that are
>> oblivious to IOMMU groups from opening a backdoor.
>>
>> * bind_group() calls __bind_device() for all devices in group. This way
>> users that are aware of IOMMU groups can still use them safely. Note that
>> at the moment bind_group() fails as soon as it finds a device that doesn't
>> support SVA. Having all devices support SVA in a given group is
>> unrealistic and this behavior ought to be improved.
> Yeah, so the problem on PCI is that all functions of a multi-function
> device are put into one group. For AMD-GPUs this means that the GPU
> (SVA-capable) will end up in the same group as the on-GPU sound
> device (not SVA-capable).

Yeah, but SVA only applies to rather new AMD-GPUs, which in turn can
only do PCIe, and there the problem doesn't seem to exist any more.

E.g. the audio device on my Vega10 gets a separate group despite being 
behind several bridges:
> 0b:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 [Radeon Vega Frontier Edition]
> 0b:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf8
...
> [    6.362665] iommu: Adding device 0000:0b:00.0 to group 14
> [    6.368468] iommu: Using direct mapping for device 0000:0b:00.0
> [    6.380040] iommu: Adding device 0000:0b:00.1 to group 15

Regards,
Christian.

>
> Before this causes us big headaches, I suggest providing only the
> bind_device() function. This should be fine because for SVA we don't
> need all types of isolation that iommu_groups provide.
>
> IOMMU-groups provide two types of isolation:
>
> 	1) They group devices together which the IOMMU can't distinguish
> 	   from each other, like PCI devices behind a PCIe bridge.
>
> 	2) Devices that can't be isolated from each other are also put
> 	   into the same group. This is the case for multi-function PCIe
> 	   devices as well as all PCIe devices behind a non-ACS bridge.
> 	   But all these devices can still be distinguished by the
> 	   IOMMU.
>
> These two types of protection are needed to safely assign devices to
> guests, but for bare-metal SVA all we need is type 1) isolation, and
> not even that if we can assume that all SVA-capable devices have an
> exclusive device-id (or stream-id).
>
>
>
> 	Joerg
>

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
  2018-02-13 23:34         ` Tian, Kevin
  (?)
@ 2018-02-15 12:40             ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-15 12:40 UTC (permalink / raw)
  To: Tian, Kevin, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Mark Rutland, ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	mykyta.iziumtsev-QSEj5FYQhm4dnm+yROfE0A, Catalin Marinas,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, Raj, Ashok,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	rjw-LthD3rsA81gm4RdzfppkhA, Sudeep Holla,
	christian.koenig-5C7GfCeVMHo

On 13/02/18 23:34, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker
>> Sent: Tuesday, February 13, 2018 8:57 PM
>>
>> On 13/02/18 07:54, Tian, Kevin wrote:
>>>> From: Jean-Philippe Brucker
>>>> Sent: Tuesday, February 13, 2018 2:33 AM
>>>>
>>>> Add bind() and unbind() operations to the IOMMU API. Device drivers
>> can
>>>> use them to share process page tables with their devices. bind_group()
>>>> is provided for VFIO's convenience, as it needs to provide a coherent
>>>> interface on containers. Other device drivers will most likely want to
>>>> use bind_device(), which binds a single device in the group.
>>>
>>> I saw your bind_group implementation tries to bind the address space
>>> for all devices within a group, which IMO has some problem. Based on
>> PCIe
>>> spec, packet routing on the bus doesn't take PASID into consideration.
>>> since devices within same group cannot be isolated based on requestor-
>> ID
>>> i.e. traffic not guaranteed going to IOMMU, enabling SVA on multiple
>> devices
>>> could cause undesired p2p.
>> But so does enabling "classic" DMA... If two devices are not protected by
>> ACS for example, they are put in the same IOMMU group, and one device
>> might be able to snoop the other's DMA. VFIO allows userspace to create a
>> container for them and use MAP/UNMAP, but makes it explicit to the user
>> that for DMA, these devices are not isolated and must be considered as a
>> single device (you can't pass them to different VMs or put them in
>> different containers). So I tried to keep the same idea as MAP/UNMAP for
>> SVA, performing BIND/UNBIND operations on the VFIO container instead of
>> the device.
> 
> there is a small difference. for classic DMA we can reserve PCI BARs 
> when allocating IOVA, thus multiple devices in the same group can 
> still work correctly applied with same translation, if isolation is not
> cared in between. However for SVA it's CPU virtual addresses 
> managed by kernel mm thus difficult to introduce similar address 
> reservation. Then it's possible for a VA falling into other device's 
> BAR in the same group and cause undesired p2p traffic. In such 
> regard, SVA is actually functionally-broken.

I think the problem exists even if there is a single device in the group.
If for example, malloc() returns a VA that corresponds to a PCI host
bridge in IOVA space, performing DMA on that buffer won't reach the IOMMU
and will cause undesirable side-effects.

My series doesn't address the problem, but I believe we should carve
reserved regions out of the process address space during bind(), for
example by creating a PROT_NONE vma preventing userspace from obtaining
that VA.
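
Roughly like this during bind() (just a sketch reusing the existing resv
region API; the function name is made up, and a real version would have to
fail cleanly when part of the range is already mapped):

	#include <linux/device.h>
	#include <linux/err.h>
	#include <linux/iommu.h>
	#include <linux/mm.h>
	#include <linux/mman.h>

	/*
	 * Punch PROT_NONE holes over the device's reserved regions in current->mm,
	 * so the allocator can never hand out a VA that aliases e.g. an MSI
	 * doorbell. Note that MAP_FIXED silently unmaps whatever is there; a real
	 * implementation has to bail out instead if the range is already in use.
	 */
	static int iommu_sva_carve_resv_regions(struct device *dev)
	{
		struct iommu_resv_region *region;
		LIST_HEAD(resv_regions);
		unsigned long addr;
		int ret = 0;

		iommu_get_resv_regions(dev, &resv_regions);
		list_for_each_entry(region, &resv_regions, list) {
			addr = vm_mmap(NULL, region->start, region->length,
				       PROT_NONE,
				       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, 0);
			if (IS_ERR_VALUE(addr)) {
				ret = (int)addr;
				break;
			}
		}
		iommu_put_resv_regions(dev, &resv_regions);
		return ret;
	}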

If you solve this problem, you also solve it for multiple devices in a
group, because the IOMMU core provides the resv API on groups... That's
until you hotplug a device into a live group (currently WARN in VFIO),
with different resv regions.

>> I kept the analogy simple though, because I don't think there will be many
>> SVA-capable systems that require IOMMU groups. They will likely
> 
> I agree that multiple SVA-capable devices in same IOMMU group is not
> a typical configuration, especially it's usually observed on new devices.
> Then based on above limitation, I think we could just explicitly avoid
> enabling SVA in such case. :-)

I'd certainly like that :)

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 01/37] iommu: Introduce Shared Virtual Addressing API
  2018-02-13 23:43             ` Tian, Kevin
  (?)
@ 2018-02-15 12:42                 ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-15 12:42 UTC (permalink / raw)
  To: Tian, Kevin, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Mark Rutland, bharatku-gjFFaj9aHVfQT0dZR+AlfA, Raj, Ashok,
	rjw-LthD3rsA81gm4RdzfppkhA, Catalin Marinas,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, Sudeep Holla,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	christian.koenig-5C7GfCeVMHo, lenb-DgEjT+Ai2ygdnm+yROfE0A

On 13/02/18 23:43, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker
>> Sent: Tuesday, February 13, 2018 8:40 PM
>>
>>
>> [...]
>>>> +
>>>> +/**
>>>> + * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a
>>>> device
>>>> + * @dev: the device
>>>> + * @features: bitmask of features that need to be initialized
>>>> + * @max_pasid: max PASID value supported by the device
>>>> + *
>>>> + * Users of the bind()/unbind() API must call this function to initialize all
>>>> + * features required for SVA.
>>>> + *
>>>> + * - If the device should support multiple address spaces (e.g. PCI
>> PASID),
>>>> + *   IOMMU_SVA_FEAT_PASID must be requested.
>>>
>>> I think it is by default assumed when using this API, based on definition of
>>> SVA. Can you elaborate the situation where this flag can be cleared?
>>
>> When passing a device to userspace, you could also share its non-pasid
>> address space with the process. It requires a new domain type so is left
>> as a TODO in patch 2/37. I did get requests for this feature, though I
>> think it was mostly for prototyping. I guess I could remove the flag, and
>> reintroduce it as IOMMU_SVA_FEAT_NO_PASID later on.
> 
> sorry I still didn't get the definition of non-pasid address space. 
> Did you mean the GPA/IOVA address space and no_pasid implies
> actually some default PASID associated?

Yes I mean merging the process address space and IOVA space. There are no
PASIDs involved if the device or the IOMMU doesn't support it. Instead of
private DMA page tables you program the mm pgd into the IOMMU. A VFIO
userspace driver, instead of sending MAP/UNMAP ioctl, could simply issue a
BIND.

Technically nothing prevents it, but now the resv problem discussed on
patch 2/37 stands out. For example on x86 you'd probably need to carve the
IOAPIC MSI range out of the process address space. On Arm you'd need to
create a write-only mapping for MSIs (IOMMU translates it to the IRQ chip
address, but thankfully accessing the doorbell from CPU side doesn't
trigger an MSI.)

>> [...]
>>>> +	ret = domain->ops->sva_device_init(dev, features, &min_pasid,
>>>> +					   &max_pasid);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	/* FIXME: racy. Next version should have a mutex (same as fault
>>>> handler) */
>>>> +	dev_param->sva_features = features;
>>>> +	dev_param->min_pasid = min_pasid;
>>>> +	dev_param->max_pasid = max_pasid;
>>>
>>> what's the point of min_pasid here?
>>
>> Arm SMMUv3 uses entry 0 of the PASID table for the default (non-pasid)
>> context, so it needs to set min_pasid to 1. AMD IOMMU recently added a
>> similar feature (GIoSup), if I understood correctly.
>>
> 
> just for such purpose maybe we should just define a reserved_pasid
> otherwise there will be some waste if an implementation allows it
> non-zero.

What's wasted? It's slightly simpler to use min_pasid because we just pass
that limit to idr_alloc(). With a reserved_pasid we'll have to call
idr_alloc(reserved_pasid) once, for the same result.
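
For example (sketch; the idr and variable names are illustrative rather
than exactly those used in the series):

	pasid = idr_alloc(&iommu_pasid_idr, io_mm, dev_param->min_pasid,
			  dev_param->max_pasid + 1, GFP_KERNEL);

idr_alloc() allocates in [start, end), hence the max_pasid + 1.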

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 01/37] iommu: Introduce Shared Virtual Addressing API
  2018-02-15  9:59     ` Joerg Roedel
  (?)
@ 2018-02-15 12:43         ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-15 12:43 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Mark Rutland, ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, Catalin Marinas,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, rjw

On 15/02/18 09:59, Joerg Roedel wrote:
> On Mon, Feb 12, 2018 at 06:33:16PM +0000, Jean-Philippe Brucker wrote:
>   
>> +config IOMMU_SVA
>> +	bool "Shared Virtual Addressing API for the IOMMU"
>> +	select IOMMU_API
>> +	help
>> +	  Enable process address space management for the IOMMU API. In systems
>> +	  that support it, device drivers can bind process address spaces to
>> +	  devices and share their page tables using this API.
>> +
>> +	  If unsure, say N here.
> 
> I think this should be an option selected by IOMMU driver and not be
> activly selectable by the user.

Ok

>> +/**
>> + * iommu_sva_device_shutdown() - Shutdown Shared Virtual Addressing for a device
>> + * @dev: the device
>> + *
>> + * Disable SVA. The device should not be performing any DMA while this function
>> + * is running.
> 
> Is this a good idea? How about devices that get hot-unplugged while
> processes still use them and there is DMA going back and forth? This
> function can be the point to shut down all ongoing stuff first and the
> shutdown the device.

To be honest I don't know how hot-unplug works. But sva_device_shutdown()
may be called, for instance, by the device driver before the device
disappears, so it has to know how to stop DMA before calling it. The IOMMU
driver can't really do anything more.

For hot-unplug I guess that device_driver::remove() is called first,
allowing it to stop all DMA and call sva_device_shutdown().

Then the IOMMU gets a BUS_NOTIFY_REMOVED_DEVICE notification and calls
iommu_ops::remove_device(), allowing to clean up SVA structure if the
device driver didn't call unbind_device() and sva_device_shutdown(). But
at that point we don't have a way to cooperate with the driver to stop DMA
anymore.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 01/37] iommu: Introduce Shared Virtual Addressing API
@ 2018-02-15 12:43         ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-15 12:43 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Mark Rutland, xieyisheng1, ilias.apalodimas, kvm, linux-pci,
	xuzaibo, jonathan.cameron, Will Deacon, okaya, yi.l.liu,
	Lorenzo Pieralisi, ashok.raj, tn, robdclark, bharatku,
	linux-acpi, Catalin Marinas, rfranz, lenb, devicetree,
	jacob.jun.pan, alex.williamson, robh+dt, thunder.leizhen,
	bhelgaas, linux-arm-kernel, shunyong.yang, dwmw2, liubo95, rjw,
	jcrouse, iommu, hanjun.guo, Sudeep Holla, Robin Murphy,
	christian.koenig, nwatters

On 15/02/18 09:59, Joerg Roedel wrote:
> On Mon, Feb 12, 2018 at 06:33:16PM +0000, Jean-Philippe Brucker wrote:
>   
>> +config IOMMU_SVA
>> +	bool "Shared Virtual Addressing API for the IOMMU"
>> +	select IOMMU_API
>> +	help
>> +	  Enable process address space management for the IOMMU API. In systems
>> +	  that support it, device drivers can bind process address spaces to
>> +	  devices and share their page tables using this API.
>> +
>> +	  If unsure, say N here.
> 
> I think this should be an option selected by the IOMMU driver and not be
> actively selectable by the user.

Ok

>> +/**
>> + * iommu_sva_device_shutdown() - Shutdown Shared Virtual Addressing for a device
>> + * @dev: the device
>> + *
>> + * Disable SVA. The device should not be performing any DMA while this function
>> + * is running.
> 
> Is this a good idea? How about devices that get hot-unplugged while
> processes still use them and there is DMA going back and forth? This
> function can be the point to shut down all ongoing stuff first and then
> shut down the device.

To be honest I don't know how hot-unplug works. But sva_device_shutdown()
may be called, for instance, by the device driver before the device
disappears, so the driver has to stop DMA before calling it. The IOMMU
driver can't really do anything more.

For hot-unplug I guess that device_driver::remove() is called first,
allowing it to stop all DMA and call sva_device_shutdown().

Then the IOMMU gets a BUS_NOTIFY_REMOVED_DEVICE notification and calls
iommu_ops::remove_device(), allowing it to clean up the SVA structures if
the device driver didn't call unbind_device() and sva_device_shutdown(). But
at that point we don't have a way to cooperate with the driver to stop DMA
anymore.
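
Roughly, the order I'd expect in a driver's remove() path, as a sketch (the
foo_* helpers are made up):

static void foo_remove(struct pci_dev *pdev)
{
	struct foo_device *foo = pci_get_drvdata(pdev);

	/* 1. Quiesce the device: no new DMA, wait for outstanding DMA */
	foo_stop_dma(foo);

	/* 2. Remove the bonds created with iommu_sva_bind_device() */
	foo_unbind_all(foo);

	/* 3. Only then tear SVA down for the device */
	iommu_unregister_mm_exit_handler(&pdev->dev);
	iommu_sva_device_shutdown(&pdev->dev);
}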

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
@ 2018-02-15 12:46               ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-15 12:46 UTC (permalink / raw)
  To: joro
  Cc: Mark Rutland, xieyisheng1, ilias.apalodimas, mykyta.iziumtsev,
	kvm, linux-pci, xuzaibo, jonathan.cameron, Will Deacon, okaya,
	Liu, Yi L, Lorenzo Pieralisi, Raj, Ashok, tn, robdclark,
	bharatku, linux-acpi, Catalin Marinas, rfranz, lenb, devicetree,
	Tian, Kevin, jacob.jun.pan, alex.williamson, robh+dt,
	thunder.leizhen, bhelgaas, linux-arm-kernel, shunyong.yang,
	dwmw2, liubo95, rjw, jcrouse, iommu, hanjun.guo, Sudeep Holla,
	Robin Murphy, christian.koenig, nwatters

On 15/02/18 10:21, joro@8bytes.org wrote:
> On Tue, Feb 13, 2018 at 12:57:23PM +0000, Jean-Philippe Brucker wrote:
>> * bind_device() fails if the device's group has more than one device,
>> otherwise calls __bind_device(). This prevents device drivers that are
>> oblivious to IOMMU groups from opening a backdoor.
>>
>> * bind_group() calls __bind_device() for all devices in group. This way
>> users that are aware of IOMMU groups can still use them safely. Note that
>> at the moment bind_group() fails as soon as it finds a device that doesn't
>> support SVA. Having all devices support SVA in a given group is
>> unrealistic and this behavior ought to be improved.
> 
> Yeah, so the problem on PCI is that all functions of a multi-function
> device are put into one group. For AMD-GPUs this means that the GPU
> (SVA-capable) will end up in the same group as the on-GPU sound
> device (not SVA-capable).

As I understood it, ACS also isolates functions within a device; for example
the two PFs of my ixgbe card are in different groups. Strangely, all VFs go
into the same group; I haven't investigated why yet.

> Before this causes us big headaches I suggest to only provide the
> bind_device() function.

Ok. I added bind_group() to make it easier for VFIO - so if one bind()
fails in the group, iommu.c can roll back and remove the bonds already
created. If we mandate a single device in the group for SVA, then VFIO can
use iommu_group_for_each_dev() and ensure that the callback was only
called once.
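
For example, a sketch of what VFIO could do (the function names are made up):

static int vfio_dev_count(struct device *dev, void *data)
{
	int *count = data;

	(*count)++;
	return 0;
}

/* SVA is only allowed when the group contains exactly one device */
static bool vfio_group_is_singleton(struct iommu_group *group)
{
	int count = 0;

	iommu_group_for_each_dev(group, &count, vfio_dev_count);
	return count == 1;
}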

> This should be fine because for SVA we don't
> need all types of isolation that iommu_groups provide.
> 
> IOMMU-groups provide two types of isolation:
> 
> 	1) They group devices together which the IOMMU can't distinguish
> 	   from each other, like PCI devices behind a PCIe bridge.
> 
> 	2) Devices that can't be isolated from each other are also put
> 	   into the same group. This is the case for multi-function PCIe
> 	   devices as well as all PCIe devices behind a non-ACS bridge.
> 	   But all these devices can still be distinguished by the
> 	   IOMMU.

But transactions don't necessarily reach the IOMMU if devices are not
isolated by ACS. So even if you disable all translation in the IOMMU for
one device in the group, it may still have a view of address spaces shared
with another device in that group.

> These two types of protection are needed to safely assign devices to
> guests, but for bare-metal SVA all we need is type 1) isolation, and
> not even that if we can assume that all SVA-capable devices have an
> exclusive device-id (or stream-id).

I'm not as optimistic that we won't need IOMMU groups with SVA devices for
2) (hardware bugs, integration issues, etc.). I'd be more comfortable if we
added a sanity-check as suggested by Kevin, to ensure that SVA is
disallowed if multiple devices are in the group.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 07/37] iommu: Add a page fault handler
@ 2018-02-15 13:49       ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-15 13:49 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Mark Rutland, xieyisheng1, ilias.apalodimas, kvm, linux-pci,
	xuzaibo, jonathan.cameron, Will Deacon, okaya, yi.l.liu,
	Lorenzo Pieralisi, ashok.raj, tn, joro, robdclark, bharatku,
	linux-acpi, Catalin Marinas, rfranz, lenb, devicetree,
	alex.williamson, robh+dt, thunder.leizhen, bhelgaas,
	linux-arm-kernel, shunyong.yang, dwmw2, liubo95, rjw, jcrouse,
	iommu, hanjun.guo, Sudeep Holla, Robin Murphy, christian.koenig,
	nwatters

On 14/02/18 07:18, Jacob Pan wrote:
[...]
>> +/* Used to store incomplete fault groups */
>> +static LIST_HEAD(iommu_partial_faults);
>> +static DEFINE_SPINLOCK(iommu_partial_faults_lock);
>> +
> should the partial fault list be per IOMMU?

That would be good, but I don't see an easy way to retrieve the iommu
instance in report_device_fault(). Maybe the driver should pass it to
report_device_fault(), and we can then store partial faults in struct
iommu_device.
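
Something like this, as a sketch (neither the extra parameter nor a per-IOMMU
list exist in the current patch):

/*
 * Hypothetical variant: the IOMMU driver passes its iommu_device, and
 * incomplete fault groups are stored there instead of in a global list.
 */
int iommu_report_device_fault(struct iommu_device *iommu, struct device *dev,
			      struct iommu_fault_event *evt);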

[...]
>> +/**
>> + * iommu_report_device_fault() - Handle fault in device driver or mm
>> + *
>> + * If the device driver expressed interest in handling fault, report
>> it through
>> + * the callback. If the fault is recoverable, try to page in the
>> address.
>> + */
>> +int iommu_report_device_fault(struct device *dev, struct
>> iommu_fault_event *evt) +{
>> +	int ret = -ENOSYS;
>> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>> +
>> +	if (!domain)
>> +		return -ENODEV;
>> +
>> +	/*
>> +	 * if upper layers showed interest and installed a fault
>> handler,
>> +	 * invoke it.
>> +	 */
>> +	if (iommu_has_device_fault_handler(dev)) {
> I think Alex pointed out this is racy, so adding a mutex to the
> iommu_fault_param and acquiring it would help. Do we really need an
> atomic handler?

Yes, I think a few IOMMU drivers will call this function from IRQ context,
so a spinlock might be better for protecting iommu_fault_param.
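
Only the handler-invocation part, as a sketch (param->lock doesn't exist in
the current patch, it's the lock being discussed here):

int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
{
	int ret = -ENOSYS;
	unsigned long flags;
	struct iommu_fault_param *param = dev->iommu_param->fault_param;

	if (!param)
		return ret;

	/* May be called from IRQ context, so take a spinlock, not a mutex */
	spin_lock_irqsave(&param->lock, flags);
	if (param->handler)
		ret = param->handler(evt, param->data);
	spin_unlock_irqrestore(&param->lock, flags);

	return ret;
}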

>> +		struct iommu_fault_param *param =
>> dev->iommu_param->fault_param; +
>> +		return param->handler(evt, param->data);
> Even if the upper layer (VFIO) registered a handler to propagate PRQs to a
> guest to fault in the pages, we may still need to keep track of the page
> requests that need a page response later, i.e. the last page in a group or
> a stream request in VT-d. This will allow us to sanitize the page responses
> coming back from the guest/VFIO.
> In my next round, I am adding a per-device list under iommu_fault_param
> for pending page requests. This will also address the situation where the
> guest fails to send a response. We can enforce a time or credit limit on
> pending requests based on this list.

Sounds good

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 08/37] iommu/fault: Handle mm faults
@ 2018-02-15 13:51         ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-15 13:51 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Mark Rutland, xieyisheng1, ilias.apalodimas, kvm, linux-pci,
	xuzaibo, jonathan.cameron, Will Deacon, okaya, yi.l.liu,
	Lorenzo Pieralisi, ashok.raj, tn, joro, robdclark, bharatku,
	linux-acpi, Catalin Marinas, rfranz, lenb, devicetree,
	alex.williamson, robh+dt, thunder.leizhen, bhelgaas,
	linux-arm-kernel, shunyong.yang, dwmw2, liubo95, rjw, jcrouse,
	iommu, hanjun.guo, Sudeep Holla, Robin Murphy, christian.koenig,
	nwatters

On 14/02/18 18:46, Jacob Pan wrote:
> On Mon, 12 Feb 2018 18:33:23 +0000
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
[...]
>> +	if (!evt->pasid_valid)
>> +		return ret;
> I guess for now we don't handle PRQ without PASID, right?

No. I'm not sure how to implement it, though there have been some requests
(see the discussion on 1/37).

>> +	/*
>> +	 * Special case: PASID Stop Marker (LRW = 0b100) doesn't
>> expect a
>> +	 * response. A Stop Marker may be generated when disabling a
>> PASID
>> +	 * (issuing a PASID stop request) in some PCI devices.
>> +	 *
>> +	 * When the mm_exit() callback returns from the device
>> driver, no page
>> +	 * request is generated for this PASID anymore and
>> outstanding ones have
>> +	 * been pushed to the IOMMU (as per PCIe 4.0r1.0 - 6.20.1
>> and 10.4.1.2 -
>> +	 * Managing PASID TLP Prefix Usage). Some PCI devices will
>> wait for all
>> +	 * outstanding page requests to come back with a response
>> before
>> +	 * completing the PASID stop request. Others do not wait for
>> page
>> +	 * responses, and instead issue this Stop Marker that tells
>> us when the
>> +	 * PASID can be reallocated.
>> +	 *
>> +	 * We ignore the Stop Marker because:
>> +	 * a. Page requests, which are posted requests, have been
>> flushed to the
>> +	 *    IOMMU when mm_exit() returns,
>> +	 * b. We flush all fault queues after mm_exit() returns and
>> before
>> +	 *    freeing the PASID.
>> +	 *
>> +	 * So even though the Stop Marker might be issued by the
>> device *after*
>> +	 * the stop request completes, outstanding faults will have
>> been dealt
>> +	 * with by the time we free the PASID.
>> +	 */
>> +	if (evt->last_req &&
>> +	    !(evt->prot & (IOMMU_FAULT_READ | IOMMU_FAULT_WRITE)))
>> +		return IOMMU_PAGE_RESP_HANDLED;
>> +
> If we don't expect a page response, shouldn't it be filtered by the
> IOMMU vendor driver in the first place? I.e. the vendor IOMMU driver's
> PRQ handler will sanitize the request anyway; for anything that does
> not need a response, it will not call iommu_report_device_fault().

Right, we're not doing anything with the stop marker anyway. This encoding
is also specific to PCI PRI, and maybe in future architectures, LRW =
0b100 will mean something else and will require a response. So filtering
it in the IOMMU driver makes more sense.
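
For example, with the fields used above, the check in the IOMMU driver's PRI
handler could look like this (the helper name is made up):

/*
 * A Stop Marker is a 'last' Page Request with neither read nor write
 * requested (PRI encoding LRW = 0b100). It doesn't expect a response,
 * so drop it here instead of reporting it.
 */
static bool prq_is_stop_marker(struct iommu_fault_event *evt)
{
	return evt->last_req &&
	       !(evt->prot & (IOMMU_FAULT_READ | IOMMU_FAULT_WRITE));
}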

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 37/37] vfio: Add support for Shared Virtual Addressing
  2018-02-12 18:33   ` Jean-Philippe Brucker
  (?)
@ 2018-02-16 19:33       ` Alex Williamson
  -1 siblings, 0 replies; 317+ messages in thread
From: Alex Williamson @ 2018-02-16 19:33 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, joro-zLv9SwRftAIdnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, mark.rutland-5wv7dgnIgG8,
	catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	lorenzo.pieralisi-5wv7dgnIgG8, hanjun.guo-QSEj5FYQhm4dnm+yROfE0A,
	sudeep.holla-5wv7dgnIgG8, rjw-LthD3rsA81gm4RdzfppkhA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, robin.murphy-5wv7dgnIgG8,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA, tn-nYOzD4b6Jr9Wk0Htik3J/w,
	liubo95-hv44wF8Li93QT0dZR+AlfA,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	xieyisheng1-hv44wF8Li93QT0dZR+AlfA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	jonathan.cameron-hv44wF8Li93QT0dZR+AlfA,
	shunyong.yang-PT9Dzx9SjPiXmMXjJBpWqg,
	nwatters-sgV2jX0FEOL9JmXXK+q4OQ, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	jcrouse-sgV2jX0FEOL9JmXXK+q4OQ, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA, yi.l.liu

On Mon, 12 Feb 2018 18:33:52 +0000
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> Add two new ioctl for VFIO containers. VFIO_IOMMU_BIND_PROCESS creates a
> bond between a container and a process address space, identified by a
> device-specific ID named PASID. This allows the device to target DMA
> transactions at the process virtual addresses without a need for mapping
> and unmapping buffers explicitly in the IOMMU. The process page tables are
> shared with the IOMMU, and mechanisms such as PCI ATS/PRI are used to
> handle faults. VFIO_IOMMU_UNBIND_PROCESS removes a bond created with
> VFIO_IOMMU_BIND_PROCESS.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 399 ++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h       |  76 ++++++++
>  2 files changed, 475 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index e30e29ae4819..cac066f0026b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -30,6 +30,7 @@
>  #include <linux/iommu.h>
>  #include <linux/module.h>
>  #include <linux/mm.h>
> +#include <linux/ptrace.h>
>  #include <linux/rbtree.h>
>  #include <linux/sched/signal.h>
>  #include <linux/sched/mm.h>
> @@ -60,6 +61,7 @@ MODULE_PARM_DESC(disable_hugepages,
>  
>  struct vfio_iommu {
>  	struct list_head	domain_list;
> +	struct list_head	mm_list;
>  	struct vfio_domain	*external_domain; /* domain for external user */
>  	struct mutex		lock;
>  	struct rb_root		dma_list;
> @@ -90,6 +92,15 @@ struct vfio_dma {
>  struct vfio_group {
>  	struct iommu_group	*iommu_group;
>  	struct list_head	next;
> +	bool			sva_enabled;
> +};
> +
> +struct vfio_mm {
> +#define VFIO_PASID_INVALID	(-1)
> +	spinlock_t		lock;
> +	int			pasid;
> +	struct mm_struct	*mm;
> +	struct list_head	next;
>  };
>  
>  /*
> @@ -1117,6 +1128,157 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
>  	return 0;
>  }
>  
> +static int vfio_iommu_mm_exit(struct device *dev, int pasid, void *data)
> +{
> +	struct vfio_mm *vfio_mm = data;
> +
> +	/*
> +	 * The mm_exit callback cannot block, so we can't take the iommu mutex
> +	 * and remove this vfio_mm from the list. Hopefully the SVA code will
> +	 * relax its locking requirement in the future.
> +	 *
> +	 * We mostly care about attach_group, which will attempt to replay all
> +	 * binds in this container. Ensure that it doesn't touch this defunct mm
> +	 * struct, by clearing the pointer. The structure will be freed when the
> +	 * group is removed from the container.
> +	 */
> +	spin_lock(&vfio_mm->lock);
> +	vfio_mm->mm = NULL;
> +	spin_unlock(&vfio_mm->lock);
> +
> +	return 0;
> +}
> +
> +static int vfio_iommu_sva_init(struct device *dev, void *data)
> +{
> +
> +	int ret;
> +
> +	ret = iommu_sva_device_init(dev, IOMMU_SVA_FEAT_PASID |
> +				    IOMMU_SVA_FEAT_IOPF, 0);
> +	if (ret)
> +		return ret;
> +
> +	return iommu_register_mm_exit_handler(dev, vfio_iommu_mm_exit);
> +}
> +
> +static int vfio_iommu_sva_shutdown(struct device *dev, void *data)
> +{
> +	iommu_sva_device_shutdown(dev);
> +	iommu_unregister_mm_exit_handler(dev);

Typically the order would be the reverse of the setup; is it correct this
way?
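
I.e. something like this (a sketch, not what the patch currently does):

static int vfio_iommu_sva_shutdown(struct device *dev, void *data)
{
	/* Undo vfio_iommu_sva_init() in reverse order */
	iommu_unregister_mm_exit_handler(dev);
	iommu_sva_device_shutdown(dev);

	return 0;
}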

> +
> +	return 0;
> +}
> +
> +static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
> +				 struct vfio_group *group,
> +				 struct vfio_mm *vfio_mm)
> +{
> +	int ret;
> +	int pasid;
> +
> +	if (!group->sva_enabled) {
> +		ret = iommu_group_for_each_dev(group->iommu_group, NULL,
> +					       vfio_iommu_sva_init);
> +		if (ret)
> +			return ret;

Seems we're in an unknown state here; do we need to undo any that
succeeded?
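
One possible shape, for illustration (a sketch; the helper name is made up,
and it assumes vfio_iommu_sva_shutdown() is harmless on a device that was
never initialized):

static int vfio_iommu_sva_enable(struct vfio_group *group)
{
	int ret;

	ret = iommu_group_for_each_dev(group->iommu_group, NULL,
				       vfio_iommu_sva_init);
	if (ret) {
		/* Roll back the devices initialized before the failure */
		iommu_group_for_each_dev(group->iommu_group, NULL,
					 vfio_iommu_sva_shutdown);
		return ret;
	}

	group->sva_enabled = true;
	return 0;
}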

> +
> +		group->sva_enabled = true;
> +	}
> +
> +	ret = iommu_sva_bind_group(group->iommu_group, vfio_mm->mm, &pasid,
> +				   IOMMU_SVA_FEAT_PASID | IOMMU_SVA_FEAT_IOPF,
> +				   vfio_mm);
> +	if (ret)
> +		return ret;
> +
> +	if (WARN_ON(vfio_mm->pasid != VFIO_PASID_INVALID && pasid !=
> +		    vfio_mm->pasid))
> +		return -EFAULT;
> +
> +	vfio_mm->pasid = pasid;
> +
> +	return 0;
> +}
> +
> +static void vfio_iommu_unbind_group(struct vfio_group *group,
> +				    struct vfio_mm *vfio_mm)
> +{
> +	iommu_sva_unbind_group(group->iommu_group, vfio_mm->pasid);
> +}
> +
> +static void vfio_iommu_unbind(struct vfio_iommu *iommu,
> +			      struct vfio_mm *vfio_mm)
> +{
> +	struct vfio_group *group;
> +	struct vfio_domain *domain;
> +
> +	list_for_each_entry(domain, &iommu->domain_list, next)
> +		list_for_each_entry(group, &domain->group_list, next)
> +			vfio_iommu_unbind_group(group, vfio_mm);
> +}
> +
> +static bool vfio_mm_get(struct vfio_mm *vfio_mm)
> +{
> +	bool ret;
> +
> +	spin_lock(&vfio_mm->lock);
> +	ret = vfio_mm->mm && mmget_not_zero(vfio_mm->mm);
> +	spin_unlock(&vfio_mm->lock);
> +
> +	return ret;
> +}
> +
> +static void vfio_mm_put(struct vfio_mm *vfio_mm)
> +{
> +	mmput(vfio_mm->mm);
> +}
> +
> +static int vfio_iommu_replay_bind(struct vfio_iommu *iommu, struct vfio_group *group)
> +{
> +	int ret = 0;
> +	struct vfio_mm *vfio_mm;
> +
> +	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
> +		/*
> +		 * Ensure mm doesn't exit while we're binding it to the new
> +		 * group.
> +		 */
> +		if (!vfio_mm_get(vfio_mm))
> +			continue;
> +		ret = vfio_iommu_bind_group(iommu, group, vfio_mm);
> +		vfio_mm_put(vfio_mm);
> +
> +		if (ret)
> +			goto out_unbind;
> +	}
> +
> +	return 0;
> +
> +out_unbind:
> +	list_for_each_entry_continue_reverse(vfio_mm, &iommu->mm_list, next) {
> +		if (!vfio_mm_get(vfio_mm))
> +			continue;
> +		iommu_sva_unbind_group(group->iommu_group, vfio_mm->pasid);
> +		vfio_mm_put(vfio_mm);
> +	}
> +
> +	return ret;
> +}
> +
> +static void vfio_iommu_free_all_mm(struct vfio_iommu *iommu)
> +{
> +	struct vfio_mm *vfio_mm, *tmp;
> +
> +	/*
> +	 * No need for unbind() here. Since all groups are detached from this
> +	 * iommu, bonds have been removed.
> +	 */
> +	list_for_each_entry_safe(vfio_mm, tmp, &iommu->mm_list, next)
> +		kfree(vfio_mm);
> +	INIT_LIST_HEAD(&iommu->mm_list);
> +}
> +
>  /*
>   * We change our unmap behavior slightly depending on whether the IOMMU
>   * supports fine-grained superpages.  IOMMUs like AMD-Vi will use a superpage
> @@ -1301,6 +1463,15 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  		    d->prot == domain->prot) {
>  			iommu_detach_group(domain->domain, iommu_group);
>  			if (!iommu_attach_group(d->domain, iommu_group)) {
> +				if (vfio_iommu_replay_bind(iommu, group)) {
> +					iommu_detach_group(d->domain, iommu_group);
> +					ret = iommu_attach_group(domain->domain,
> +								 iommu_group);
> +					if (ret)
> +						goto out_domain;
> +					continue;
> +				}
> +
>  				list_add(&group->next, &d->group_list);
>  				iommu_domain_free(domain->domain);
>  				kfree(domain);
> @@ -1321,6 +1492,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  	if (ret)
>  		goto out_detach;
>  
> +	ret = vfio_iommu_replay_bind(iommu, group);
> +	if (ret)
> +		goto out_detach;
> +
>  	if (resv_msi) {
>  		ret = iommu_get_msi_cookie(domain->domain, resv_msi_base);
>  		if (ret)
> @@ -1426,6 +1601,11 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
>  			continue;
>  
>  		iommu_detach_group(domain->domain, iommu_group);
> +		if (group->sva_enabled) {
> +			iommu_group_for_each_dev(iommu_group, NULL,
> +						 vfio_iommu_sva_shutdown);
> +			group->sva_enabled = false;
> +		}
>  		list_del(&group->next);
>  		kfree(group);
>  		/*
> @@ -1441,6 +1621,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
>  					vfio_iommu_unmap_unpin_all(iommu);
>  				else
>  					vfio_iommu_unmap_unpin_reaccount(iommu);
> +				vfio_iommu_free_all_mm(iommu);
>  			}
>  			iommu_domain_free(domain->domain);
>  			list_del(&domain->next);
> @@ -1475,6 +1656,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
>  	}
>  
>  	INIT_LIST_HEAD(&iommu->domain_list);
> +	INIT_LIST_HEAD(&iommu->mm_list);
>  	iommu->dma_list = RB_ROOT;
>  	mutex_init(&iommu->lock);
>  	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
> @@ -1509,6 +1691,7 @@ static void vfio_iommu_type1_release(void *iommu_data)
>  		kfree(iommu->external_domain);
>  	}
>  
> +	vfio_iommu_free_all_mm(iommu);
>  	vfio_iommu_unmap_unpin_all(iommu);
>  
>  	list_for_each_entry_safe(domain, domain_tmp,
> @@ -1537,6 +1720,184 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>  	return ret;
>  }
>  
> +static struct mm_struct *vfio_iommu_get_mm_by_vpid(pid_t vpid)
> +{
> +	struct mm_struct *mm;
> +	struct task_struct *task;
> +
> +	rcu_read_lock();
> +	task = find_task_by_vpid(vpid);
> +	if (task)
> +		get_task_struct(task);
> +	rcu_read_unlock();
> +	if (!task)
> +		return ERR_PTR(-ESRCH);
> +
> +	/* Ensure that current has RW access on the mm */
> +	mm = mm_access(task, PTRACE_MODE_ATTACH_REALCREDS);
> +	put_task_struct(task);
> +
> +	if (!mm)
> +		return ERR_PTR(-ESRCH);
> +
> +	return mm;
> +}
> +
> +static long vfio_iommu_type1_bind_process(struct vfio_iommu *iommu,
> +					  void __user *arg,
> +					  struct vfio_iommu_type1_bind *bind)
> +{
> +	struct vfio_iommu_type1_bind_process params;
> +	struct vfio_domain *domain;
> +	struct vfio_group *group;
> +	struct vfio_mm *vfio_mm;
> +	struct mm_struct *mm;
> +	unsigned long minsz;
> +	int ret = 0;
> +
> +	minsz = sizeof(*bind) + sizeof(params);
> +	if (bind->argsz < minsz)
> +		return -EINVAL;
> +
> +	arg += sizeof(*bind);
> +	if (copy_from_user(&params, arg, sizeof(params)))
> +		return -EFAULT;
> +
> +	if (params.flags & ~VFIO_IOMMU_BIND_PID)
> +		return -EINVAL;
> +
> +	if (params.flags & VFIO_IOMMU_BIND_PID) {
> +		mm = vfio_iommu_get_mm_by_vpid(params.pid);
> +		if (IS_ERR(mm))
> +			return PTR_ERR(mm);
> +	} else {
> +		mm = get_task_mm(current);
> +		if (!mm)
> +			return -EINVAL;
> +	}
> +
> +	mutex_lock(&iommu->lock);
> +	if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
> +		ret = -EINVAL;
> +		goto out_put_mm;
> +	}
> +
> +	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
> +		if (vfio_mm->mm != mm)
> +			continue;
> +
> +		params.pasid = vfio_mm->pasid;
> +
> +		ret = copy_to_user(arg, &params, sizeof(params)) ? -EFAULT : 0;
> +		goto out_put_mm;
> +	}
> +
> +	vfio_mm = kzalloc(sizeof(*vfio_mm), GFP_KERNEL);
> +	if (!vfio_mm) {
> +		ret = -ENOMEM;
> +		goto out_put_mm;
> +	}
> +
> +	vfio_mm->mm = mm;
> +	vfio_mm->pasid = VFIO_PASID_INVALID;
> +	spin_lock_init(&vfio_mm->lock);
> +
> +	list_for_each_entry(domain, &iommu->domain_list, next) {
> +		list_for_each_entry(group, &domain->group_list, next) {
> +			ret = vfio_iommu_bind_group(iommu, group, vfio_mm);
> +			if (ret)
> +				break;
> +		}
> +		if (ret)
> +			break;
> +	}
> +
> +	if (ret) {
> +		/* Undo all binds that already succeeded */
> +		list_for_each_entry_continue_reverse(group, &domain->group_list,
> +						     next)
> +			vfio_iommu_unbind_group(group, vfio_mm);
> +		list_for_each_entry_continue_reverse(domain, &iommu->domain_list,
> +						     next)
> +			list_for_each_entry(group, &domain->group_list, next)
> +				vfio_iommu_unbind_group(group, vfio_mm);
> +		kfree(vfio_mm);
> +	} else {
> +		list_add(&vfio_mm->next, &iommu->mm_list);
> +
> +		params.pasid = vfio_mm->pasid;
> +		ret = copy_to_user(arg, &params, sizeof(params)) ? -EFAULT : 0;
> +		if (ret) {
> +			vfio_iommu_unbind(iommu, vfio_mm);
> +			kfree(vfio_mm);
> +		}
> +	}
> +
> +out_put_mm:
> +	mutex_unlock(&iommu->lock);
> +	mmput(mm);
> +
> +	return ret;
> +}
> +
> +static long vfio_iommu_type1_unbind_process(struct vfio_iommu *iommu,
> +					    void __user *arg,
> +					    struct vfio_iommu_type1_bind *bind)
> +{
> +	int ret = -EINVAL;
> +	unsigned long minsz;
> +	struct mm_struct *mm;
> +	struct vfio_mm *vfio_mm;
> +	struct vfio_iommu_type1_bind_process params;
> +
> +	minsz = sizeof(*bind) + sizeof(params);
> +	if (bind->argsz < minsz)
> +		return -EINVAL;
> +
> +	arg += sizeof(*bind);
> +	if (copy_from_user(&params, arg, sizeof(params)))
> +		return -EFAULT;
> +
> +	if (params.flags & ~VFIO_IOMMU_BIND_PID)
> +		return -EINVAL;
> +
> +	/*
> +	 * We can't simply unbind a foreign process by PASID, because the
> +	 * process might have died and the PASID might have been reallocated to
> +	 * another process. Instead we need to fetch that process mm by PID
> +	 * again to make sure we remove the right vfio_mm. In addition, holding
> +	 * the mm guarantees that mm_users isn't dropped while we unbind and the
> +	 * exit_mm handler doesn't fire. While not strictly necessary, not
> +	 * having to care about that race simplifies everyone's life.
> +	 */
> +	if (params.flags & VFIO_IOMMU_BIND_PID) {
> +		mm = vfio_iommu_get_mm_by_vpid(params.pid);
> +		if (IS_ERR(mm))
> +			return PTR_ERR(mm);

I don't understand how this works for a process that has exited. The
mm_exit function gets called to clear vfio_mm.mm, so the above may or may
not work (it could be a new ptrace'able process with the same pid), but it
won't match the mm below. So is the vfio_mm that mm_exit zapped stuck in
this list forever, until the container is destroyed?

> +	} else {
> +		mm = get_task_mm(current);
> +		if (!mm)
> +			return -EINVAL;
> +	}
> +
> +	ret = -ESRCH;
> +	mutex_lock(&iommu->lock);
> +	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
> +		if (vfio_mm->mm != mm)
> +			continue;
> +
> +		vfio_iommu_unbind(iommu, vfio_mm);
> +		list_del(&vfio_mm->next);
> +		kfree(vfio_mm);
> +		ret = 0;
> +		break;
> +	}
> +	mutex_unlock(&iommu->lock);
> +	mmput(mm);
> +
> +	return ret;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  				   unsigned int cmd, unsigned long arg)
>  {
> @@ -1607,6 +1968,44 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>  		return copy_to_user((void __user *)arg, &unmap, minsz) ?
>  			-EFAULT : 0;
> +
> +	} else if (cmd == VFIO_IOMMU_BIND) {
> +		struct vfio_iommu_type1_bind bind;
> +
> +		minsz = offsetofend(struct vfio_iommu_type1_bind, mode);
> +
> +		if (copy_from_user(&bind, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (bind.argsz < minsz)
> +			return -EINVAL;
> +
> +		switch (bind.mode) {
> +		case VFIO_IOMMU_BIND_PROCESS:
> +			return vfio_iommu_type1_bind_process(iommu, (void *)arg,
> +							     &bind);
> +		default:
> +			return -EINVAL;
> +		}
> +
> +	} else if (cmd == VFIO_IOMMU_UNBIND) {
> +		struct vfio_iommu_type1_bind bind;
> +
> +		minsz = offsetofend(struct vfio_iommu_type1_bind, mode);
> +
> +		if (copy_from_user(&bind, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (bind.argsz < minsz)
> +			return -EINVAL;
> +
> +		switch (bind.mode) {
> +		case VFIO_IOMMU_BIND_PROCESS:
> +			return vfio_iommu_type1_unbind_process(iommu, (void *)arg,
> +							       &bind);
> +		default:
> +			return -EINVAL;
> +		}
>  	}
>  
>  	return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index c74372163ed2..e1b9b8c58916 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -638,6 +638,82 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
>  
> +/*
> + * VFIO_IOMMU_BIND_PROCESS
> + *
> + * Allocate a PASID for a process address space, and use it to attach this
> + * process to all devices in the container. Devices can then tag their DMA
> + * traffic with the returned @pasid to perform transactions on the associated
> + * virtual address space. Mapping and unmapping buffers is performed by standard
> + * functions such as mmap and malloc.
> + *
> + * If flag is VFIO_IOMMU_BIND_PID, @pid contains the pid of a foreign process to
> + * bind. Otherwise the current task is bound. Given that the caller owns the
> + * device, setting this flag grants the caller read and write permissions on the
> + * entire address space of foreign process described by @pid. Therefore,
> + * permission to perform the bind operation on a foreign process is governed by
> + * the ptrace access mode PTRACE_MODE_ATTACH_REALCREDS check. See man ptrace(2)
> + * for more information.
> + *
> + * On success, VFIO writes a Process Address Space ID (PASID) into @pasid. This
> + * ID is unique to a process and can be used on all devices in the container.
> + *
> + * On fork, the child inherits the device fd and can use the bonds setup by its
> + * parent. Consequently, the child has R/W access on the address spaces bound by
> + * its parent. After an execv, the device fd is closed and the child doesn't
> + * have access to the address space anymore.
> + *
> + * To remove a bond between process and container, VFIO_IOMMU_UNBIND ioctl is
> + * issued with the same parameters. If a pid was specified in VFIO_IOMMU_BIND,
> + * it should also be present for VFIO_IOMMU_UNBIND. Otherwise unbind the current
> + * task from the container.
> + */
> +struct vfio_iommu_type1_bind_process {
> +	__u32	flags;
> +#define VFIO_IOMMU_BIND_PID		(1 << 0)
> +	__u32	pasid;
> +	__s32	pid;
> +};
> +
> +/*
> + * Only mode supported at the moment is VFIO_IOMMU_BIND_PROCESS, which takes
> + * vfio_iommu_type1_bind_process in data.
> + */
> +struct vfio_iommu_type1_bind {
> +	__u32	argsz;
> +	__u32	mode;

s/mode/flags/

> +#define VFIO_IOMMU_BIND_PROCESS		(1 << 0)
> +	__u8	data[];
> +};

I'm not convinced having a separate vfio_iommu_type1_bind_process
struct is necessary.  It seems like we always expect to return a pasid,
only the pid is optional, but that could be handled by a single
structure with a flag bit to indicate a pid bind is requested.
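
For instance, folding the pasid/pid fields directly into
vfio_iommu_type1_bind, roughly (illustrative sketch only, not something
this series defines):

/* Sketch: one flat structure, no separate data[] payload. @pid is only
 * consumed when VFIO_IOMMU_BIND_PID is set; @pasid is written back by
 * the kernel on success. */
struct vfio_iommu_type1_bind {
        __u32   argsz;
        __u32   flags;
#define VFIO_IOMMU_BIND_PID     (1 << 0)
        __u32   pasid;  /* out */
        __s32   pid;    /* in */
};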

> +
> +/*
> + * VFIO_IOMMU_BIND - _IOWR(VFIO_TYPE, VFIO_BASE + 22, struct vfio_iommu_bind)

vfio_iommu_type1_bind

> + *
> + * Manage address spaces of devices in this container. Initially a TYPE1
> + * container can only have one address space, managed with
> + * VFIO_IOMMU_MAP/UNMAP_DMA.
> + *
> + * An IOMMU of type VFIO_TYPE1_NESTING_IOMMU can be managed by both MAP/UNMAP
> + * and BIND ioctls at the same time. MAP/UNMAP acts on the stage-2 (host) page
> + * tables, and BIND manages the stage-1 (guest) page tables. Other types of
> + * IOMMU may allow MAP/UNMAP and BIND to coexist, where MAP/UNMAP controls
> + * non-PASID traffic and BIND controls PASID traffic. But this depends on the
> + * underlying IOMMU architecture and isn't guaranteed.
> + *
> + * Availability of this feature depends on the device, its bus, the underlying
> + * IOMMU and the CPU architecture.
> + *
> + * returns: 0 on success, -errno on failure.
> + */
> +#define VFIO_IOMMU_BIND		_IO(VFIO_TYPE, VFIO_BASE + 22)
> +
> +/*
> + * VFIO_IOMMU_UNBIND - _IOWR(VFIO_TYPE, VFIO_BASE + 23, struct vfio_iommu_bind)

vfio_iommu_type1_bind

> + *
> + * Undo what was done by the corresponding VFIO_IOMMU_BIND ioctl.
> + */
> +#define VFIO_IOMMU_UNBIND	_IO(VFIO_TYPE, VFIO_BASE + 23)
> +
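
For reference, the expected calling convention from userspace would be
roughly the following (sketch only, assuming the uAPI above is applied;
the vfio_bind_current_process() helper is made up for illustration):

/* Userspace sketch, not part of the patch: bind the calling process to
 * a container and retrieve the PASID allocated by the kernel. */
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int vfio_bind_current_process(int container_fd, __u32 *pasid)
{
        int ret = 0;
        struct vfio_iommu_type1_bind *bind;
        struct vfio_iommu_type1_bind_process *proc;
        size_t argsz = sizeof(*bind) + sizeof(*proc);

        bind = calloc(1, argsz);
        if (!bind)
                return -ENOMEM;

        bind->argsz = argsz;
        bind->mode = VFIO_IOMMU_BIND_PROCESS;
        proc = (struct vfio_iommu_type1_bind_process *)bind->data;
        proc->flags = 0;        /* bind the current task, not a foreign pid */

        if (ioctl(container_fd, VFIO_IOMMU_BIND, bind))
                ret = -errno;
        else
                *pasid = proc->pasid;   /* written back by the kernel */

        free(bind);
        return ret;
}
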
>  /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>  
>  /*

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 11/37] dt-bindings: document stall and PASID properties for IOMMU masters
@ 2018-02-19  2:51       ` Rob Herring
  0 siblings, 0 replies; 317+ messages in thread
From: Rob Herring @ 2018-02-19  2:51 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: mark.rutland, xieyisheng1, ilias.apalodimas, kvm, linux-pci,
	xuzaibo, jonathan.cameron, will.deacon, okaya, lorenzo.pieralisi,
	ashok.raj, tn, joro, robdclark, bharatku, linux-acpi,
	catalin.marinas, rfranz, lenb, devicetree, jacob.jun.pan,
	alex.williamson, yi.l.liu, thunder.leizhen, bhelgaas,
	linux-arm-kernel, shunyong.yang, dwmw2, liubo95, rjw, jcrouse,
	iommu, hanjun.guo, sudeep.holla, robin.murphy, christian.koenig,
	nwatters

On Mon, Feb 12, 2018 at 06:33:26PM +0000, Jean-Philippe Brucker wrote:
> On ARM systems, some platform devices behind an IOMMU may support stall
> and PASID features. Stall is the ability to recover from page faults and
> PASID offers multiple process address spaces to the device. Together they
> allow to do paging with a device. Let the firmware tell us when a device
> supports stall and PASID.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  Documentation/devicetree/bindings/iommu/iommu.txt | 24 +++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/iommu/iommu.txt b/Documentation/devicetree/bindings/iommu/iommu.txt
> index 5a8b4624defc..8066b3852110 100644
> --- a/Documentation/devicetree/bindings/iommu/iommu.txt
> +++ b/Documentation/devicetree/bindings/iommu/iommu.txt
> @@ -86,6 +86,30 @@ have a means to turn off translation. But it is invalid in such cases to
>  disable the IOMMU's device tree node in the first place because it would
>  prevent any driver from properly setting up the translations.
>  
> +Optional properties:
> +--------------------
> +- dma-can-stall: When present, the master can wait for a transaction to
> +  complete for an indefinite amount of time. Upon translation fault some
> +  IOMMUs, instead of aborting the translation immediately, may first
> +  notify the driver and keep the transaction in flight. This allows the OS
> +  to inspect the fault and, for example, make physical pages resident
> +  before updating the mappings and completing the transaction. Such IOMMU
> +  accepts a limited number of simultaneous stalled transactions before
> +  having to either put back-pressure on the master, or abort new faulting
> +  transactions.
> +
> +  Firmware has to opt-in stalling, because most buses and masters don't
> +  support it. In particular it isn't compatible with PCI, where
> +  transactions have to complete before a time limit. More generally it
> +  won't work in systems and masters that haven't been designed for
> +  stalling. For example the OS, in order to handle a stalled transaction,
> +  may attempt to retrieve pages from secondary storage in a stalled
> +  domain, leading to a deadlock.
> +
> +- pasid-bits: Some masters support multiple address spaces for DMA, by
> +  tagging DMA transactions with an address space identifier. By default,
> +  this is 0, which means that the device only has one address space.

So 3 would mean 8 address spaces?

Maybe pasid-num-bits would be a bit clearer. Either way,

Reviewed-by: Rob Herring <robh@kernel.org>

Rob

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 11/37] dt-bindings: document stall and PASID properties for IOMMU masters
@ 2018-02-20 11:28         ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-20 11:28 UTC (permalink / raw)
  To: Rob Herring
  Cc: Mark Rutland, xieyisheng1, ilias.apalodimas, kvm, linux-pci,
	xuzaibo, jonathan.cameron, Will Deacon, okaya, Lorenzo Pieralisi,
	ashok.raj, tn, joro, robdclark, bharatku, linux-acpi,
	Catalin Marinas, rfranz, lenb, devicetree, jacob.jun.pan,
	alex.williamson, yi.l.liu, thunder.leizhen, bhelgaas,
	linux-arm-kernel, shunyong.yang, dwmw2, liubo95, rjw, jcrouse,
	iommu, hanjun.guo, Sudeep Holla, Robin Murphy, christian.koenig,
	nwatters

On 19/02/18 02:51, Rob Herring wrote:
[...]
>> +- pasid-bits: Some masters support multiple address spaces for DMA, by
>> +  tagging DMA transactions with an address space identifier. By default,
>> +  this is 0, which means that the device only has one address space.
> 
> So 3 would mean 8 address spaces?

Yes

> Maybe pasid-num-bits would be a bit clearer. Either way,

Indeed, I'll change it

> Reviewed-by: Rob Herring <robh@kernel.org>

Thanks!
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 09/37] iommu/fault: Let handler return a fault response
@ 2018-02-20 23:19       ` Jacob Pan
  0 siblings, 0 replies; 317+ messages in thread
From: Jacob Pan @ 2018-02-20 23:19 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm,
	joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, okaya, jcrouse,
	rfranz, dwmw2, yi.l.liu, ashok.raj, robdclark, christian.koenig,
	bharatku, jacob.jun.pan

On Mon, 12 Feb 2018 18:33:24 +0000
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

>  
> +/**
> + * enum page_response_code - Return status of fault handlers,
> telling the IOMMU
> + * driver how to proceed with the fault.
> + *
> + * @IOMMU_FAULT_STATUS_HANDLED: Stop processing the fault, and do
> not send a
> + *	reply to the device.
> + * @IOMMU_FAULT_STATUS_CONTINUE: Fault was not handled. Call the
> next handler,
> + *	or terminate.
> + * @IOMMU_FAULT_STATUS_SUCCESS: Fault has been handled and the page
> tables
> + *	populated, retry the access. This is "Success" in PCI PRI.
> + * @IOMMU_FAULT_STATUS_FAILURE: General error. Drop all subsequent
> faults from
> + *	this device if possible. This is "Response Failure" in PCI
> PRI.
> + * @IOMMU_FAULT_STATUS_INVALID: Could not handle this fault, don't
> retry the
> + *	access. This is "Invalid Request" in PCI PRI.
> + */
> +enum page_response_code {
> +	IOMMU_PAGE_RESP_HANDLED = 0,
> +	IOMMU_PAGE_RESP_CONTINUE,
> +	IOMMU_PAGE_RESP_SUCCESS,
> +	IOMMU_PAGE_RESP_INVALID,
> +	IOMMU_PAGE_RESP_FAILURE,
> +};
It seems to me two things are mixed here:
1. driver handler response status (HANDLED, CONTINUE)
2. PCI standard page response codes (the rest)
Can we leave them separate? Then we don't have to convert this enum
to/from the PCI ATS page response codes.
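i.e. something like this (a sketch; the names are illustrative, and the
values below simply match the PRI defines this patch removes):

/* Internal dispatch status returned by fault handlers */
enum iommu_fault_status {
	IOMMU_FAULT_STATUS_HANDLED,	/* stop, a response was/will be sent */
	IOMMU_FAULT_STATUS_CONTINUE,	/* try the next handler */
};

/* Page response codes, mapping directly onto PCI PRI encodings */
enum page_response_code {
	IOMMU_PAGE_RESP_SUCCESS	= 0x0,	/* "Success" */
	IOMMU_PAGE_RESP_INVALID	= 0x1,	/* "Invalid Request" */
	IOMMU_PAGE_RESP_FAILURE	= 0xf,	/* "Response Failure" */
};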

> +
>  /**
>   * Generic page response information based on PCI ATS and PASID spec.
>   * @addr: servicing page address
> @@ -202,12 +225,7 @@ enum page_response_type {
>  struct page_response_msg {
>  	u64 addr;
>  	u32 pasid;
> -	u32 resp_code:4;
> -#define IOMMU_PAGE_RESP_SUCCESS	0
> -#define IOMMU_PAGE_RESP_INVALID	1
> -#define IOMMU_PAGE_RESP_HANDLED	2
> -#define IOMMU_PAGE_RESP_FAILURE	0xF
> -
[Jacob Pan]

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 09/37] iommu/fault: Let handler return a fault response
@ 2018-02-21 10:28         ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-21 10:28 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Mark Rutland, xieyisheng1, ilias.apalodimas, kvm, linux-pci,
	xuzaibo, jonathan.cameron, Will Deacon, okaya, yi.l.liu,
	Lorenzo Pieralisi, ashok.raj, tn, joro, robdclark, bharatku,
	linux-acpi, Catalin Marinas, rfranz, lenb, devicetree,
	alex.williamson, robh+dt, thunder.leizhen, bhelgaas,
	linux-arm-kernel, shunyong.yang, dwmw2, liubo95, rjw, jcrouse,
	iommu, hanjun.guo, Sudeep Holla, Robin Murphy, christian.koenig,
	nwatters

On 20/02/18 23:19, Jacob Pan wrote:
> On Mon, 12 Feb 2018 18:33:24 +0000
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> 
>>  
>> +/**
>> + * enum page_response_code - Return status of fault handlers,
>> telling the IOMMU
>> + * driver how to proceed with the fault.
>> + *
>> + * @IOMMU_FAULT_STATUS_HANDLED: Stop processing the fault, and do
>> not send a
>> + *	reply to the device.
>> + * @IOMMU_FAULT_STATUS_CONTINUE: Fault was not handled. Call the
>> next handler,
>> + *	or terminate.
>> + * @IOMMU_FAULT_STATUS_SUCCESS: Fault has been handled and the page
>> tables
>> + *	populated, retry the access. This is "Success" in PCI PRI.
>> + * @IOMMU_FAULT_STATUS_FAILURE: General error. Drop all subsequent
>> faults from
>> + *	this device if possible. This is "Response Failure" in PCI
>> PRI.
>> + * @IOMMU_FAULT_STATUS_INVALID: Could not handle this fault, don't
>> retry the
>> + *	access. This is "Invalid Request" in PCI PRI.
>> + */
>> +enum page_response_code {
>> +	IOMMU_PAGE_RESP_HANDLED = 0,
>> +	IOMMU_PAGE_RESP_CONTINUE,
>> +	IOMMU_PAGE_RESP_SUCCESS,
>> +	IOMMU_PAGE_RESP_INVALID,
>> +	IOMMU_PAGE_RESP_FAILURE,
>> +};
> it seems to me two things are mixed here:
> 1. driver handler response status (HANDLED, CONTINUE)
> 2. PCI standard page response code (the rest)
> Can we leave them separate? then we don't have to convert this enum
> to/from PCI ATS page response code.

Except when the producer is a platform device instead of PCI :) But I get
your point. I liked combining them into one enum because it may simplify
some device drivers. I can separate HANDLED/CONTINUE and have drivers
always call page_response().

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* RE: [PATCH 01/37] iommu: Introduce Shared Virtual Addressing API
@ 2018-02-27  6:21                     ` Tian, Kevin
  0 siblings, 0 replies; 317+ messages in thread
From: Tian, Kevin @ 2018-02-27  6:21 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu, kvm
  Cc: Mark Rutland, ilias.apalodimas, Catalin Marinas, xuzaibo,
	Will Deacon, okaya, Raj, Ashok, bharatku, rfranz, lenb, robh+dt,
	bhelgaas, shunyong.yang, dwmw2, rjw, Sudeep Holla,
	christian.koenig, Joerg Roedel

> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Thursday, February 15, 2018 8:42 PM
> 
> On 13/02/18 23:43, Tian, Kevin wrote:
> >> From: Jean-Philippe Brucker
> >> Sent: Tuesday, February 13, 2018 8:40 PM
> >>
> >>
> >> [...]
> >>>> +
> >>>> +/**
> >>>> + * iommu_sva_device_init() - Initialize Shared Virtual Addressing for
> a
> >>>> device
> >>>> + * @dev: the device
> >>>> + * @features: bitmask of features that need to be initialized
> >>>> + * @max_pasid: max PASID value supported by the device
> >>>> + *
> >>>> + * Users of the bind()/unbind() API must call this function to initialize
> all
> >>>> + * features required for SVA.
> >>>> + *
> >>>> + * - If the device should support multiple address spaces (e.g. PCI
> >> PASID),
> >>>> + *   IOMMU_SVA_FEAT_PASID must be requested.
> >>>
> >>> I think it is by default assumed when using this API, based on definition
> of
> >>> SVA. Can you elaborate the situation where this flag can be cleared?
> >>
> >> When passing a device to userspace, you could also share its non-pasid
> >> address space with the process. It requires a new domain type so is left
> >> as a TODO in patch 2/37. I did get requests for this feature, though I
> >> think it was mostly for prototyping. I guess I could remove the flag, and
> >> reintroduce it as IOMMU_SVA_FEAT_NO_PASID later on.
> >
> > sorry I still didn't get the definition of non-pasid address space.
> > Did you mean the GPA/IOVA address space and no_pasid implies
> > actually some default PASID associated?
> 
> Yes I mean merging the process address space and IOVA space. There are no
> PASIDs involved if the device or the IOMMU doesn't support it. Instead of
> private DMA page tables you program the mm pgd into the IOMMU. A VFIO
> userspace driver, instead of sending MAP/UNMAP ioctl, could simply issue a
> BIND.

Got it. Yes, it's better to remove it for now, which can avoid
unnecessary confusion. :-)

> 
> Technically nothing prevents it, but now the resv problem discussed on
> patch 2/37 stands out. For example on x86 you'd probably need to carve the
> IOAPIC MSI range out of the process address space. On Arm you'd need to
> create a write-only mapping for MSIs (IOMMU translates it to the IRQ chip
> address, but thankfully accessing the doorbell from CPU side doesn't
> trigger an MSI.)

So if an overlap already exists when binding a process address space
(since binding may happen much later than creating the process),
I assume the call will simply fail, since carving out at that point is not
possible?

> 
> >> [...]
> >>>> +	ret = domain->ops->sva_device_init(dev, features, &min_pasid,
> >>>> +					   &max_pasid);
> >>>> +	if (ret)
> >>>> +		return ret;
> >>>> +
> >>>> +	/* FIXME: racy. Next version should have a mutex (same as fault
> >>>> handler) */
> >>>> +	dev_param->sva_features = features;
> >>>> +	dev_param->min_pasid = min_pasid;
> >>>> +	dev_param->max_pasid = max_pasid;
> >>>
> >>> what's the point of min_pasid here?
> >>
> >> Arm SMMUv3 uses entry 0 of the PASID table for the default (non-pasid)
> >> context, so it needs to set min_pasid to 1. AMD IOMMU recently added
> a
> >> similar feature (GIoSup), if I understood correctly.
> >>
> >
> > just for such purpose maybe we should just define a reserved_pasid
> > otherwise there will be some waste if an implementation allows it
> > non-zero.
> 
> What's wasted? It's slightly simpler to use min_pasid because we just pass
> that limit to idr_alloc(). With a reserved_pasid we'll have to call
> idr_alloc(reserved_pasid) once, for the same result.
> 

I was thinking about the case where an implementation allows
software to define an arbitrary reserved_pasid; banning all PASIDs
below the reserved one could then be a waste. But after more thought
it is not a big problem: we can require such a driver to use 0 as its
reserved_pasid, which gives the same situation as on the Arm side.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 16/37] iommu: Add generic PASID table library
@ 2018-02-27 18:51         ` Jacob Pan
  0 siblings, 0 replies; 317+ messages in thread
From: Jacob Pan @ 2018-02-27 18:51 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: mark.rutland, xieyisheng1, ilias.apalodimas, kvm, linux-pci,
	xuzaibo, jonathan.cameron, will.deacon, okaya, yi.l.liu,
	lorenzo.pieralisi, ashok.raj, tn, joro, robdclark, bharatku,
	linux-acpi, catalin.marinas, rfranz, lenb, devicetree,
	jacob.jun.pan, alex.williamson, robh+dt, thunder.leizhen,
	bhelgaas, linux-arm-kernel, shunyong.yang, dwmw2, liubo95, rjw,
	jcrouse, iommu, hanjun.guo, sudeep.holla, robin.murphy,
	christian.koenig, nwatters

On Mon, 12 Feb 2018 18:33:31 +0000
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> Add a small API within the IOMMU subsystem to handle different
> formats of PASID tables. It uses the same principle as io-pgtable:
> 
> * The IOMMU driver registers a PASID table with some invalidation
>   callbacks.
> * The pasid-table lib allocates a set of tables of the right format,
> and returns an iommu_pasid_table_ops structure.
> * The IOMMU driver allocates entries and writes them using the
> provided ops.
> * The pasid-table lib calls the IOMMU driver back for invalidation
> when necessary.
> * The IOMMU driver unregisters the ops which frees the tables when
>   finished.
> 
> An example user will be Arm SMMU in a subsequent patch.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  drivers/iommu/Kconfig       |   8 +++
>  drivers/iommu/Makefile      |   1 +
>  drivers/iommu/iommu-pasid.c |  53 +++++++++++++++++
>  drivers/iommu/iommu-pasid.h | 142
> ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 204
> insertions(+) create mode 100644 drivers/iommu/iommu-pasid.c
>  create mode 100644 drivers/iommu/iommu-pasid.h
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index e751bb9958ba..8add90ba9b75 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -60,6 +60,14 @@ config IOMMU_IO_PGTABLE_ARMV7S_SELFTEST
>  
>  endmenu
>  
> +menu "Generic PASID table support"
> +
> +# Selected by the actual PASID table implementations
> +config IOMMU_PASID_TABLE
> +	bool
> +
> +endmenu
> +
>  config IOMMU_IOVA
>  	tristate
>  
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index f4324e29035e..338e59c93131 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -8,6 +8,7 @@ obj-$(CONFIG_IOMMU_FAULT) += io-pgfault.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> +obj-$(CONFIG_IOMMU_PASID_TABLE) += iommu-pasid.o
>  obj-$(CONFIG_IOMMU_IOVA) += iova.o
>  obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
>  obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
> diff --git a/drivers/iommu/iommu-pasid.c b/drivers/iommu/iommu-pasid.c
> new file mode 100644
> index 000000000000..6b21d369d514
> --- /dev/null
> +++ b/drivers/iommu/iommu-pasid.c
> @@ -0,0 +1,53 @@
> +/*
> + * PASID table management for the IOMMU
> + *
> + * Copyright (C) 2018 ARM Ltd.
> + * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0
> + */
> +
> +#include <linux/kernel.h>
> +
> +#include "iommu-pasid.h"
> +
> +static const struct iommu_pasid_init_fns *
> +pasid_table_init_fns[PASID_TABLE_NUM_FMTS] = {
> +};
> +
> +struct iommu_pasid_table_ops *
> +iommu_alloc_pasid_ops(enum iommu_pasid_table_fmt fmt,
> +		      struct iommu_pasid_table_cfg *cfg, void
> *cookie) +{
I guess you don't need to pass in cookie here.
> +	struct iommu_pasid_table *table;
> +	const struct iommu_pasid_init_fns *fns;
> +
> +	if (fmt >= PASID_TABLE_NUM_FMTS)
> +		return NULL;
> +
> +	fns = pasid_table_init_fns[fmt];
> +	if (!fns)
> +		return NULL;
> +
> +	table = fns->alloc(cfg, cookie);
> +	if (!table)
> +		return NULL;
> +
> +	table->fmt = fmt;
> +	table->cookie = cookie;
> +	table->cfg = *cfg;
> +
The ops are already IOMMU-model specific, so why do you need to pass cfg
back?
> +	return &table->ops;
If there is no common code that uses these ops, I don't see the benefit
of having these APIs. Or is the plan to consolidate even further, such
that a reference to the PASID table can be attached per iommu_domain etc.?
But that would be a model-specific choice.
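For reference, my reading of the intended usage is roughly the following
(my own sketch; the SMMUv3 user is not part of this patch, and the format
constant and driver structures below are made up):

/* Driver's invalidation callbacks, called back by the pasid-table lib */
static void my_smmu_cfg_flush(void *cookie, int pasid, bool leaf) { /* ... */ }
static void my_smmu_cfg_flush_all(void *cookie) { /* ... */ }
static void my_smmu_tlb_flush(void *cookie, int pasid,
			      struct iommu_pasid_entry *entry) { /* ... */ }

static const struct iommu_pasid_sync_ops my_smmu_sync_ops = {
	.cfg_flush	= my_smmu_cfg_flush,
	.cfg_flush_all	= my_smmu_cfg_flush_all,
	.tlb_flush	= my_smmu_tlb_flush,
};

static int my_smmu_bind_mm(struct my_smmu_domain *domain,
			   struct mm_struct *mm, int pasid)
{
	struct iommu_pasid_entry *entry;
	struct iommu_pasid_table_cfg cfg = {
		.iommu_dev	= domain->smmu->dev,
		.order		= 20,		/* 20 PASID bits */
		.sync		= &my_smmu_sync_ops,
	};

	/* MY_PASID_TABLE_FMT is a placeholder, the fmt enum is empty so far */
	domain->pasid_ops = iommu_alloc_pasid_ops(MY_PASID_TABLE_FMT, &cfg,
						  domain);
	if (!domain->pasid_ops)
		return -ENOMEM;

	entry = domain->pasid_ops->alloc_shared_entry(domain->pasid_ops, mm);
	if (IS_ERR(entry))
		return PTR_ERR(entry);

	return domain->pasid_ops->set_entry(domain->pasid_ops, pasid, entry);
}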

Jacob 
> +}
> +
> +void iommu_free_pasid_ops(struct iommu_pasid_table_ops *ops)
> +{
> +	struct iommu_pasid_table *table;
> +
> +	if (!ops)
> +		return;
> +
> +	table = container_of(ops, struct iommu_pasid_table, ops);
> +	iommu_pasid_flush_all(table);
> +	pasid_table_init_fns[table->fmt]->free(table);
> +}
> diff --git a/drivers/iommu/iommu-pasid.h b/drivers/iommu/iommu-pasid.h
> new file mode 100644
> index 000000000000..40a27d35c1e0
> --- /dev/null
> +++ b/drivers/iommu/iommu-pasid.h
> @@ -0,0 +1,142 @@
> +/*
> + * PASID table management for the IOMMU
> + *
> + * Copyright (C) 2017 ARM Ltd.
> + * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0
> + */
> +#ifndef __IOMMU_PASID_H
> +#define __IOMMU_PASID_H
> +
> +#include <linux/types.h>
> +#include "io-pgtable.h"
> +
> +struct mm_struct;
> +
> +enum iommu_pasid_table_fmt {
> +	PASID_TABLE_NUM_FMTS,
> +};
> +
> +/**
> + * iommu_pasid_entry - Entry of a PASID table
> + *
> + * @token:	architecture-specific data needed to uniquely
> identify the
> + *		entry. Most notably used for TLB invalidation
> + */
> +struct iommu_pasid_entry {
> +	u64		tag;
> +};
> +
> +/**
> + * iommu_pasid_table_ops - Operations on a PASID table
> + *
> + * @alloc_shared_entry:	allocate an entry for sharing an mm
> (SVA)
> + *			Returns the pointer to a new entry or an
> error
> + * @alloc_priv_entry:	allocate an entry for map/unmap
> operations
> + *			Returns the pointer to a new entry or an
> error
> + * @free_entry:		free an entry obtained with
> alloc_entry
> + * @set_entry:		write PASID table entry
> + * @clear_entry:	clear PASID table entry
> + */
> +struct iommu_pasid_table_ops {
> +	struct iommu_pasid_entry *
> +	(*alloc_shared_entry)(struct iommu_pasid_table_ops *ops,
> +			      struct mm_struct *mm);
> +	struct iommu_pasid_entry *
> +	(*alloc_priv_entry)(struct iommu_pasid_table_ops *ops,
> +			    enum io_pgtable_fmt fmt,
> +			    struct io_pgtable_cfg *cfg);
> +	void (*free_entry)(struct iommu_pasid_table_ops *ops,
> +			   struct iommu_pasid_entry *entry);
> +	int (*set_entry)(struct iommu_pasid_table_ops *ops, int pasid,
> +			 struct iommu_pasid_entry *entry);
> +	void (*clear_entry)(struct iommu_pasid_table_ops *ops, int pasid,
> +			    struct iommu_pasid_entry *entry);
> +};
> +
> +/**
> + * iommu_pasid_sync_ops - Callbacks into the IOMMU driver
> + *
> + * @cfg_flush:		flush cached configuration for one entry. For a
> + *			multi-level PASID table, 'leaf' tells whether to only
> + *			flush cached leaf entries or intermediate levels as
> + *			well.
> + * @cfg_flush_all:	flush cached configuration for all entries of the PASID
> + *			table
> + * @tlb_flush:		flush TLB entries for one entry
> + */
> +struct iommu_pasid_sync_ops {
> +	void (*cfg_flush)(void *cookie, int pasid, bool leaf);
> +	void (*cfg_flush_all)(void *cookie);
> +	void (*tlb_flush)(void *cookie, int pasid,
> +			  struct iommu_pasid_entry *entry);
> +};
> +
> +/**
> + * struct iommu_pasid_table_cfg - Configuration data for a set of PASID tables.
> + *
> + * @iommu_dev:	device performing the DMA table walks
> + * @order:	number of PASID bits, set by IOMMU driver
> + * @sync:	TLB management callbacks for this set of tables.
> + *
> + * @base:	DMA address of the allocated table, set by the allocator.
> + */
> +struct iommu_pasid_table_cfg {
> +	struct device			*iommu_dev;
> +	size_t				order;
> +	const struct iommu_pasid_sync_ops *sync;
> +
> +	dma_addr_t			base;
> +};
> +
> +struct iommu_pasid_table_ops *
> +iommu_alloc_pasid_ops(enum iommu_pasid_table_fmt fmt,
> +		      struct iommu_pasid_table_cfg *cfg,
> +		      void *cookie);
> +void iommu_free_pasid_ops(struct iommu_pasid_table_ops *ops);
> +
> +/**
> + * struct iommu_pasid_table - describes a set of PASID tables
> + *
> + * @fmt:	The PASID table format.
> + * @cookie:	An opaque token provided by the IOMMU driver and passed back to
> + *		any callback routine.
> + * @cfg:	A copy of the PASID table configuration.
> + * @ops:	The PASID table operations in use for this set of page tables.
> + */
> +struct iommu_pasid_table {
> +	enum iommu_pasid_table_fmt	fmt;
> +	void				*cookie;
> +	struct iommu_pasid_table_cfg	cfg;
> +	struct iommu_pasid_table_ops	ops;
> +};
> +
> +#define iommu_pasid_table_ops_to_table(ops) \
> +	container_of((ops), struct iommu_pasid_table, ops)
> +
> +struct iommu_pasid_init_fns {
> +	struct iommu_pasid_table *(*alloc)(struct iommu_pasid_table_cfg *cfg,
> +					   void *cookie);
> +	void (*free)(struct iommu_pasid_table *table);
> +};
> +
> +static inline void iommu_pasid_flush_all(struct iommu_pasid_table *table)
> +{
> +	table->cfg.sync->cfg_flush_all(table->cookie);
> +}
> +
> +static inline void iommu_pasid_flush(struct iommu_pasid_table *table,
> +					 int pasid, bool leaf)
> +{
> +	table->cfg.sync->cfg_flush(table->cookie, pasid, leaf);
> +}
> +
> +static inline void iommu_pasid_flush_tlbs(struct iommu_pasid_table *table,
> +					  int pasid,
> +					  struct iommu_pasid_entry *entry)
> +{
> +	table->cfg.sync->tlb_flush(table->cookie, pasid, entry);
> +}
> +
> +#endif /* __IOMMU_PASID_H */
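
To round out the picture, the per-PASID flow through the ops would
presumably look something like this (again just a sketch; the arm_smmu_*
names and the error handling are made up, not taken from the series):

static int arm_smmu_bind_mm(struct arm_smmu_domain *smmu_domain,
			    struct mm_struct *mm, int pasid)
{
	struct iommu_pasid_table_ops *ops = smmu_domain->pasid_ops;
	struct iommu_pasid_entry *entry;

	/* Build an entry that points to the mm's page tables */
	entry = ops->alloc_shared_entry(ops, mm);
	if (IS_ERR(entry))
		return PTR_ERR(entry);

	/* Install it; the library calls back into cfg_flush() as needed */
	return ops->set_entry(ops, pasid, entry);
}

static void arm_smmu_unbind_mm(struct arm_smmu_domain *smmu_domain,
			       int pasid, struct iommu_pasid_entry *entry)
{
	struct iommu_pasid_table_ops *ops = smmu_domain->pasid_ops;

	ops->clear_entry(ops, pasid, entry);
	ops->free_entry(ops, entry);
}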

[Jacob Pan]

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 37/37] vfio: Add support for Shared Virtual Addressing
  2018-02-12 18:33   ` Jean-Philippe Brucker
  (?)
@ 2018-02-28  1:26       ` Sinan Kaya
  -1 siblings, 0 replies; 317+ messages in thread
From: Sinan Kaya @ 2018-02-28  1:26 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: mark.rutland-5wv7dgnIgG8,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	catalin.marinas-5wv7dgnIgG8, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	will.deacon-5wv7dgnIgG8, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	rjw-LthD3rsA81gm4RdzfppkhA, sudeep.holla-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
> Add two new ioctls for VFIO containers. VFIO_IOMMU_BIND_PROCESS creates a
> bond between a container and a process address space, identified by a
> device-specific ID named PASID. This allows the device to target DMA
> transactions at the process virtual addresses without a need for mapping
> and unmapping buffers explicitly in the IOMMU. The process page tables are
> shared with the IOMMU, and mechanisms such as PCI ATS/PRI are used to
> handle faults. VFIO_IOMMU_UNBIND_PROCESS removes a bond created with
> VFIO_IOMMU_BIND_PROCESS.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 399 ++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h       |  76 ++++++++
>  2 files changed, 475 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index e30e29ae4819..cac066f0026b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -30,6 +30,7 @@
>  #include <linux/iommu.h>
>  #include <linux/module.h>
>  #include <linux/mm.h>
> +#include <linux/ptrace.h>
>  #include <linux/rbtree.h>
>  #include <linux/sched/signal.h>
>  #include <linux/sched/mm.h>
> @@ -60,6 +61,7 @@ MODULE_PARM_DESC(disable_hugepages,
>  
>  struct vfio_iommu {
>  	struct list_head	domain_list;
> +	struct list_head	mm_list;
>  	struct vfio_domain	*external_domain; /* domain for external user */
>  	struct mutex		lock;
>  	struct rb_root		dma_list;
> @@ -90,6 +92,15 @@ struct vfio_dma {
>  struct vfio_group {
>  	struct iommu_group	*iommu_group;
>  	struct list_head	next;
> +	bool			sva_enabled;
> +};
> +
> +struct vfio_mm {
> +#define VFIO_PASID_INVALID	(-1)
> +	spinlock_t		lock;
> +	int			pasid;
> +	struct mm_struct	*mm;
> +	struct list_head	next;
>  };
>  
>  /*
> @@ -1117,6 +1128,157 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
>  	return 0;
>  }
>  
> +static int vfio_iommu_mm_exit(struct device *dev, int pasid, void *data)
> +{
> +	struct vfio_mm *vfio_mm = data;
> +
> +	/*
> +	 * The mm_exit callback cannot block, so we can't take the iommu mutex
> +	 * and remove this vfio_mm from the list. Hopefully the SVA code will
> +	 * relax its locking requirement in the future.
> +	 *
> +	 * We mostly care about attach_group, which will attempt to replay all
> +	 * binds in this container. Ensure that it doesn't touch this defunct mm
> +	 * struct, by clearing the pointer. The structure will be freed when the
> +	 * group is removed from the container.
> +	 */
> +	spin_lock(&vfio_mm->lock);
> +	vfio_mm->mm = NULL;
> +	spin_unlock(&vfio_mm->lock);
> +
> +	return 0;
> +}
> +
> +static int vfio_iommu_sva_init(struct device *dev, void *data)
> +{

The 'data' argument is not getting used.

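Presumably it has to be there because iommu_group_for_each_dev() imposes
the callback prototype:

	int iommu_group_for_each_dev(struct iommu_group *group, void *data,
				     int (*fn)(struct device *, void *));
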
> +
> +	int ret;
> +
> +	ret = iommu_sva_device_init(dev, IOMMU_SVA_FEAT_PASID |
> +				    IOMMU_SVA_FEAT_IOPF, 0);
> +	if (ret)
> +		return ret;
> +
> +	return iommu_register_mm_exit_handler(dev, vfio_iommu_mm_exit);
> +}
> +
> +static int vfio_iommu_sva_shutdown(struct device *dev, void *data)
> +{
> +	iommu_sva_device_shutdown(dev);
> +	iommu_unregister_mm_exit_handler(dev);
> +
> +	return 0;
> +}
> +
> +static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
> +				 struct vfio_group *group,
> +				 struct vfio_mm *vfio_mm)
> +{
> +	int ret;
> +	int pasid;
> +
> +	if (!group->sva_enabled) {
> +		ret = iommu_group_for_each_dev(group->iommu_group, NULL,
> +					       vfio_iommu_sva_init);
> +		if (ret)
> +			return ret;
> +
> +		group->sva_enabled = true;
> +	}
> +
> +	ret = iommu_sva_bind_group(group->iommu_group, vfio_mm->mm, &pasid,
> +				   IOMMU_SVA_FEAT_PASID | IOMMU_SVA_FEAT_IOPF,
> +				   vfio_mm);
> +	if (ret)
> +		return ret;

Don't you need to clean up the work done by vfio_iommu_sva_init() here?

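Something along these lines (just a sketch; 'enabled_here' is a new local)
would undo the init on failure without disturbing a group that was already
SVA-enabled before this call:

	bool enabled_here = false;

	if (!group->sva_enabled) {
		ret = iommu_group_for_each_dev(group->iommu_group, NULL,
					       vfio_iommu_sva_init);
		if (ret)
			return ret;
		group->sva_enabled = enabled_here = true;
	}

	ret = iommu_sva_bind_group(group->iommu_group, vfio_mm->mm, &pasid,
				   IOMMU_SVA_FEAT_PASID | IOMMU_SVA_FEAT_IOPF,
				   vfio_mm);
	if (ret) {
		if (enabled_here) {
			/* Roll back only what this call enabled */
			iommu_group_for_each_dev(group->iommu_group, NULL,
						 vfio_iommu_sva_shutdown);
			group->sva_enabled = false;
		}
		return ret;
	}
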
> +
> +	if (WARN_ON(vfio_mm->pasid != VFIO_PASID_INVALID && pasid !=
> +		    vfio_mm->pasid))
> +		return -EFAULT;
> +
> +	vfio_mm->pasid = pasid;
> +
> +	return 0;
> +}
> +
> +static void vfio_iommu_unbind_group(struct vfio_group *group,
> +				    struct vfio_mm *vfio_mm)
> +{
> +	iommu_sva_unbind_group(group->iommu_group, vfio_mm->pasid);
> +}
> +
> +static void vfio_iommu_unbind(struct vfio_iommu *iommu,
> +			      struct vfio_mm *vfio_mm)
> +{
> +	struct vfio_group *group;
> +	struct vfio_domain *domain;
> +
> +	list_for_each_entry(domain, &iommu->domain_list, next)
> +		list_for_each_entry(group, &domain->group_list, next)
> +			vfio_iommu_unbind_group(group, vfio_mm);
> +}
> +
> +static bool vfio_mm_get(struct vfio_mm *vfio_mm)
> +{
> +	bool ret;
> +
> +	spin_lock(&vfio_mm->lock);
> +	ret = vfio_mm->mm && mmget_not_zero(vfio_mm->mm);
> +	spin_unlock(&vfio_mm->lock);
> +
> +	return ret;
> +}
> +
> +static void vfio_mm_put(struct vfio_mm *vfio_mm)
> +{
> +	mmput(vfio_mm->mm);
> +}
> +
> +static int vfio_iommu_replay_bind(struct vfio_iommu *iommu, struct vfio_group *group)
> +{
> +	int ret = 0;
> +	struct vfio_mm *vfio_mm;
> +
> +	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
> +		/*
> +		 * Ensure mm doesn't exit while we're binding it to the new
> +		 * group.
> +		 */
> +		if (!vfio_mm_get(vfio_mm))
> +			continue;
> +		ret = vfio_iommu_bind_group(iommu, group, vfio_mm);
> +		vfio_mm_put(vfio_mm);
> +
> +		if (ret)
> +			goto out_unbind;
> +	}
> +
> +	return 0;
> +
> +out_unbind:
> +	list_for_each_entry_continue_reverse(vfio_mm, &iommu->mm_list, next) {
> +		if (!vfio_mm_get(vfio_mm))
> +			continue;
> +		iommu_sva_unbind_group(group->iommu_group, vfio_mm->pasid);
> +		vfio_mm_put(vfio_mm);
> +	}
> +
> +	return ret;
> +}
> +
> +static void vfio_iommu_free_all_mm(struct vfio_iommu *iommu)
> +{
> +	struct vfio_mm *vfio_mm, *tmp;
> +
> +	/*
> +	 * No need for unbind() here. Since all groups are detached from this
> +	 * iommu, bonds have been removed.
> +	 */
> +	list_for_each_entry_safe(vfio_mm, tmp, &iommu->mm_list, next)
> +		kfree(vfio_mm);
> +	INIT_LIST_HEAD(&iommu->mm_list);
> +}
> +
>  /*
>   * We change our unmap behavior slightly depending on whether the IOMMU
>   * supports fine-grained superpages.  IOMMUs like AMD-Vi will use a superpage
> @@ -1301,6 +1463,15 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  		    d->prot == domain->prot) {
>  			iommu_detach_group(domain->domain, iommu_group);
>  			if (!iommu_attach_group(d->domain, iommu_group)) {
> +				if (vfio_iommu_replay_bind(iommu, group)) {
> +					iommu_detach_group(d->domain, iommu_group);
> +					ret = iommu_attach_group(domain->domain,
> +								 iommu_group);
> +					if (ret)
> +						goto out_domain;
> +					continue;
> +				}
> +
>  				list_add(&group->next, &d->group_list);
>  				iommu_domain_free(domain->domain);
>  				kfree(domain);
> @@ -1321,6 +1492,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  	if (ret)
>  		goto out_detach;
>  
> +	ret = vfio_iommu_replay_bind(iommu, group);
> +	if (ret)
> +		goto out_detach;
> +
>  	if (resv_msi) {
>  		ret = iommu_get_msi_cookie(domain->domain, resv_msi_base);
>  		if (ret)
> @@ -1426,6 +1601,11 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
>  			continue;
>  
>  		iommu_detach_group(domain->domain, iommu_group);
> +		if (group->sva_enabled) {
> +			iommu_group_for_each_dev(iommu_group, NULL,
> +						 vfio_iommu_sva_shutdown);
> +			group->sva_enabled = false;
> +		}
>  		list_del(&group->next);
>  		kfree(group);
>  		/*
> @@ -1441,6 +1621,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
>  					vfio_iommu_unmap_unpin_all(iommu);
>  				else
>  					vfio_iommu_unmap_unpin_reaccount(iommu);
> +				vfio_iommu_free_all_mm(iommu);
>  			}
>  			iommu_domain_free(domain->domain);
>  			list_del(&domain->next);
> @@ -1475,6 +1656,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
>  	}
>  
>  	INIT_LIST_HEAD(&iommu->domain_list);
> +	INIT_LIST_HEAD(&iommu->mm_list);
>  	iommu->dma_list = RB_ROOT;
>  	mutex_init(&iommu->lock);
>  	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
> @@ -1509,6 +1691,7 @@ static void vfio_iommu_type1_release(void *iommu_data)
>  		kfree(iommu->external_domain);
>  	}
>  
> +	vfio_iommu_free_all_mm(iommu);
>  	vfio_iommu_unmap_unpin_all(iommu);
>  
>  	list_for_each_entry_safe(domain, domain_tmp,
> @@ -1537,6 +1720,184 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>  	return ret;
>  }
>  
> +static struct mm_struct *vfio_iommu_get_mm_by_vpid(pid_t vpid)
> +{
> +	struct mm_struct *mm;
> +	struct task_struct *task;
> +
> +	rcu_read_lock();
> +	task = find_task_by_vpid(vpid);
> +	if (task)
> +		get_task_struct(task);
> +	rcu_read_unlock();
> +	if (!task)
> +		return ERR_PTR(-ESRCH);
> +
> +	/* Ensure that current has RW access on the mm */
> +	mm = mm_access(task, PTRACE_MODE_ATTACH_REALCREDS);
> +	put_task_struct(task);
> +
> +	if (!mm)
> +		return ERR_PTR(-ESRCH);
> +
> +	return mm;
> +}
> +
> +static long vfio_iommu_type1_bind_process(struct vfio_iommu *iommu,
> +					  void __user *arg,
> +					  struct vfio_iommu_type1_bind *bind)
> +{
> +	struct vfio_iommu_type1_bind_process params;
> +	struct vfio_domain *domain;
> +	struct vfio_group *group;
> +	struct vfio_mm *vfio_mm;
> +	struct mm_struct *mm;
> +	unsigned long minsz;
> +	int ret = 0;
> +
> +	minsz = sizeof(*bind) + sizeof(params);
> +	if (bind->argsz < minsz)
> +		return -EINVAL;
> +
> +	arg += sizeof(*bind);
> +	if (copy_from_user(&params, arg, sizeof(params)))
> +		return -EFAULT;
> +
> +	if (params.flags & ~VFIO_IOMMU_BIND_PID)
> +		return -EINVAL;
> +
> +	if (params.flags & VFIO_IOMMU_BIND_PID) {
> +		mm = vfio_iommu_get_mm_by_vpid(params.pid);
> +		if (IS_ERR(mm))
> +			return PTR_ERR(mm);
> +	} else {
> +		mm = get_task_mm(current);
> +		if (!mm)
> +			return -EINVAL;
> +	}

I think you can merge the mm failure handling for both cases.

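For instance (sketch only; note it would turn the -EINVAL for a failed
get_task_mm() into -ESRCH):

	if (params.flags & VFIO_IOMMU_BIND_PID) {
		mm = vfio_iommu_get_mm_by_vpid(params.pid);
	} else {
		mm = get_task_mm(current);
		if (!mm)
			mm = ERR_PTR(-ESRCH);
	}
	if (IS_ERR(mm))
		return PTR_ERR(mm);
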
> +
> +	mutex_lock(&iommu->lock);
> +	if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
> +		ret = -EINVAL;
> +		goto out_put_mm;
> +	}
> +
> +	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
> +		if (vfio_mm->mm != mm)
> +			continue;
> +
> +		params.pasid = vfio_mm->pasid;
> +
> +		ret = copy_to_user(arg, &params, sizeof(params)) ? -EFAULT : 0;
> +		goto out_put_mm;
> +	}
> +
> +	vfio_mm = kzalloc(sizeof(*vfio_mm), GFP_KERNEL);
> +	if (!vfio_mm) {
> +		ret = -ENOMEM;
> +		goto out_put_mm;
> +	}
> +
> +	vfio_mm->mm = mm;
> +	vfio_mm->pasid = VFIO_PASID_INVALID;
> +	spin_lock_init(&vfio_mm->lock);
> +
> +	list_for_each_entry(domain, &iommu->domain_list, next) {
> +		list_for_each_entry(group, &domain->group_list, next) {
> +			ret = vfio_iommu_bind_group(iommu, group, vfio_mm);
> +			if (ret)
> +				break;
> +		}
> +		if (ret)
> +			break;
> +	}
> +
> +	if (ret) {
> +		/* Undo all binds that already succeeded */
> +		list_for_each_entry_continue_reverse(group, &domain->group_list,
> +						     next)
> +			vfio_iommu_unbind_group(group, vfio_mm);
> +		list_for_each_entry_continue_reverse(domain, &iommu->domain_list,
> +						     next)
> +			list_for_each_entry(group, &domain->group_list, next)
> +				vfio_iommu_unbind_group(group, vfio_mm);
> +		kfree(vfio_mm);
> +	} else {
> +		list_add(&vfio_mm->next, &iommu->mm_list);
> +
> +		params.pasid = vfio_mm->pasid;
> +		ret = copy_to_user(arg, &params, sizeof(params)) ? -EFAULT : 0;
> +		if (ret) {
> +			vfio_iommu_unbind(iommu, vfio_mm);
> +			kfree(vfio_mm);
> +		}
> +	}
> +
> +out_put_mm:
> +	mutex_unlock(&iommu->lock);
> +	mmput(mm);
> +
> +	return ret;
> +}
> +
> +static long vfio_iommu_type1_unbind_process(struct vfio_iommu *iommu,
> +					    void __user *arg,
> +					    struct vfio_iommu_type1_bind *bind)
> +{
> +	int ret = -EINVAL;
> +	unsigned long minsz;
> +	struct mm_struct *mm;
> +	struct vfio_mm *vfio_mm;
> +	struct vfio_iommu_type1_bind_process params;
> +
> +	minsz = sizeof(*bind) + sizeof(params);
> +	if (bind->argsz < minsz)
> +		return -EINVAL;
> +
> +	arg += sizeof(*bind);
> +	if (copy_from_user(&params, arg, sizeof(params)))
> +		return -EFAULT;
> +
> +	if (params.flags & ~VFIO_IOMMU_BIND_PID)
> +		return -EINVAL;
> +
> +	/*
> +	 * We can't simply unbind a foreign process by PASID, because the
> +	 * process might have died and the PASID might have been reallocated to
> +	 * another process. Instead we need to fetch that process mm by PID
> +	 * again to make sure we remove the right vfio_mm. In addition, holding
> +	 * the mm guarantees that mm_users isn't dropped while we unbind and the
> +	 * exit_mm handler doesn't fire. While not strictly necessary, not
> +	 * having to care about that race simplifies everyone's life.
> +	 */
> +	if (params.flags & VFIO_IOMMU_BIND_PID) {
> +		mm = vfio_iommu_get_mm_by_vpid(params.pid);
> +		if (IS_ERR(mm))
> +			return PTR_ERR(mm);
> +	} else {
> +		mm = get_task_mm(current);
> +		if (!mm)
> +			return -EINVAL;
> +	}
> +

Same as above: I think you can merge the mm failure handling for both cases.

> +	ret = -ESRCH;
> +	mutex_lock(&iommu->lock);
> +	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
> +		if (vfio_mm->mm != mm)
> +			continue;
> +

These loops look weird:
1. for loop + break
2. for loop + goto

How about closing the for loop here, and then returning early if no
matching vfio_mm was found?

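Something like this (untested sketch, using a new 'found' pointer) keeps
the lookup and the teardown separate:

	struct vfio_mm *found = NULL;

	mutex_lock(&iommu->lock);
	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
		if (vfio_mm->mm == mm) {
			found = vfio_mm;
			break;
		}
	}

	if (found) {
		vfio_iommu_unbind(iommu, found);
		list_del(&found->next);
		kfree(found);
	}
	mutex_unlock(&iommu->lock);
	mmput(mm);

	return found ? 0 : -ESRCH;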

> +		vfio_iommu_unbind(iommu, vfio_mm);
> +		list_del(&vfio_mm->next);
> +		kfree(vfio_mm);
> +		ret = 0;
> +		break;
> +	}
> +	mutex_unlock(&iommu->lock);
> +	mmput(mm);
> +
> +	return ret;
> +}
> +

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 01/37] iommu: Introduce Shared Virtual Addressing API
  2018-02-27  6:21                     ` Tian, Kevin
  (?)
@ 2018-02-28 16:20                         ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-28 16:20 UTC (permalink / raw)
  To: Tian, Kevin, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Mark Rutland, bharatku-gjFFaj9aHVfQT0dZR+AlfA, Raj, Ashok,
	rjw-LthD3rsA81gm4RdzfppkhA, Catalin Marinas,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, Sudeep Holla,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	christian.koenig-5C7GfCeVMHo, lenb-DgEjT+Ai2ygdnm+yROfE0A

On 27/02/18 06:21, Tian, Kevin wrote:
[...]
>> Technically nothing prevents it, but now the resv problem discussed on
>> patch 2/37 stands out. For example on x86 you'd probably need to carve
>> the
>> IOAPIC MSI range out of the process address space. On Arm you'd need to
>> create a write-only mapping for MSIs (IOMMU translates it to the IRQ chip
>> address, but thankfully accessing the doorbell from CPU side doesn't
>> trigger an MSI.)
> 
> so if overlap already exists when binding a process address space
> (since binding may happen much later than creating the process),
> I assume the call will simply fail since carve out at this point is not
> possible?

Yes, in this case I think it's safer to abort the bind() call.
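As a rough illustration only (not part of this series), such a check at bind()
time might look like the sketch below; the helper name and the error code are
assumptions:

#include <linux/iommu.h>
#include <linux/mm.h>

/*
 * Sketch: fail bind() if a reserved region of the device (e.g. an MSI
 * doorbell) already overlaps a VMA of the process, since the range can
 * no longer be carved out of the address space.
 */
static int sva_check_resv_regions(struct device *dev, struct mm_struct *mm)
{
	int ret = 0;
	struct iommu_resv_region *region;
	LIST_HEAD(resv_regions);

	iommu_get_resv_regions(dev, &resv_regions);

	down_read(&mm->mmap_sem);
	list_for_each_entry(region, &resv_regions, list) {
		if (find_vma_intersection(mm, region->start,
					  region->start + region->length)) {
			ret = -EADDRINUSE;	/* assumed error code */
			break;
		}
	}
	up_read(&mm->mmap_sem);

	iommu_put_resv_regions(dev, &resv_regions);
	return ret;
}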

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 16/37] iommu: Add generic PASID table library
  2018-02-27 18:51         ` Jacob Pan
  (?)
@ 2018-02-28 16:22           ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-28 16:22 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Mark Rutland, ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, Catalin Marinas,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, rjw

On 27/02/18 18:51, Jacob Pan wrote:
[...]
>> +struct iommu_pasid_table_ops *
>> +iommu_alloc_pasid_ops(enum iommu_pasid_table_fmt fmt,
>> +		      struct iommu_pasid_table_cfg *cfg, void
>> *cookie) +{
> I guess you don't need to pass in cookie here.

The cookie is stored in the table driver and passed back to the IOMMU
driver when invalidating a PASID table entry

>> +	struct iommu_pasid_table *table;
>> +	const struct iommu_pasid_init_fns *fns;
>> +
>> +	if (fmt >= PASID_TABLE_NUM_FMTS)
>> +		return NULL;
>> +
>> +	fns = pasid_table_init_fns[fmt];
>> +	if (!fns)
>> +		return NULL;
>> +
>> +	table = fns->alloc(cfg, cookie);
>> +	if (!table)
>> +		return NULL;
>> +
>> +	table->fmt = fmt;
>> +	table->cookie = cookie;
>> +	table->cfg = *cfg;
>> +
> the ops is already IOMMU model specific, why do you need to pass cfg
> back?

The table driver needs some config information at runtime. Callbacks such
as iommu_pasid_table_ops::alloc_shared_entry() receive the
iommu_pasid_table_ops instance as argument. They can then get the
iommu_pasid_table structure with container_of() and retrieve the config
stored in table->cfg.
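
As a sketch (the callback name is made up; only the structures and fields come
from the patch), a table driver would recover that state like this:

/* Sketch only: recovering cfg and cookie from the ops pointer */
static void example_flush_entry(struct iommu_pasid_table_ops *ops, int pasid)
{
	struct iommu_pasid_table *table =
		container_of(ops, struct iommu_pasid_table, ops);
	struct iommu_pasid_table_cfg *cfg = &table->cfg;

	/*
	 * cfg holds the configuration passed to iommu_alloc_pasid_ops(),
	 * and table->cookie is the opaque pointer handed back to the IOMMU
	 * driver when a PASID table entry needs to be invalidated.
	 */
	pr_debug("PASID %d: cookie %p cfg %p\n", pasid, table->cookie, cfg);
}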

>> +	return &table->ops;
> If there is no common code that uses these ops, I don't see the benefit
> of having these APIs. Or the plan is to consolidate even further such
> that referene to pasid table can be attached at per iommu_domain etc,
> but that would be model specific choice.

I don't plan to consolidate further. This API is for multiple IOMMU
drivers with different transports implementing the same PASID table
formats. For example my vSVA implementation uses this API in virtio-iommu
for assigning PASID tables to the guest (All fairly experimental at this
point. I initially intended to assign just the page directories, but
passing the whole PASID table seemed more popular.)

In the future there might be other vendor IOMMUs implementing the same
PASID table formats, just like there are currently 6 IOMMU drivers using
the page-table code implemented by the io-pgtable.c lib (which I copied in
this patch).

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 37/37] vfio: Add support for Shared Virtual Addressing
  2018-02-28  1:26       ` Sinan Kaya
  (?)
@ 2018-02-28 16:25           ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-28 16:25 UTC (permalink / raw)
  To: Sinan Kaya, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Mark Rutland, ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	Catalin Marinas, xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	rjw-LthD3rsA81gm4RdzfppkhA, Sudeep Holla,
	christian.koenig-5C7GfCeVMHo

On 28/02/18 01:26, Sinan Kaya wrote:
[...]
>> +static int vfio_iommu_sva_init(struct device *dev, void *data)
>> +{
> 
> data is not getting used.

That's the pointer passed to "iommu_group_for_each_dev", NULL at the
moment. Next version of this patch will keep some state in data to
ensure one device per group.
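A possible shape for that (an assumed sketch, not the actual next version) is
to pass a counter through the 'data' pointer:

/* Sketch: refuse SVA when the group contains more than one device */
static int vfio_iommu_sva_init(struct device *dev, void *data)
{
	int ret;
	int *nr_devs = data;	/* assumption: caller passes &count instead of NULL */

	if ((*nr_devs)++)
		return -EINVAL;

	ret = iommu_sva_device_init(dev, IOMMU_SVA_FEAT_PASID |
				    IOMMU_SVA_FEAT_IOPF, 0);
	if (ret)
		return ret;

	return iommu_register_mm_exit_handler(dev, vfio_iommu_mm_exit);
}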

>> +
>> +	int ret;
>> +
>> +	ret = iommu_sva_device_init(dev, IOMMU_SVA_FEAT_PASID |
>> +				    IOMMU_SVA_FEAT_IOPF, 0);
>> +	if (ret)
>> +		return ret;
>> +
>> +	return iommu_register_mm_exit_handler(dev, vfio_iommu_mm_exit);
>> +}
>> +
>> +static int vfio_iommu_sva_shutdown(struct device *dev, void *data)
>> +{
>> +	iommu_sva_device_shutdown(dev);
>> +	iommu_unregister_mm_exit_handler(dev);
>> +
>> +	return 0;
>> +}
>> +
>> +static int vfio_iommu_bind_group(struct vfio_iommu *iommu,
>> +				 struct vfio_group *group,
>> +				 struct vfio_mm *vfio_mm)
>> +{
>> +	int ret;
>> +	int pasid;
>> +
>> +	if (!group->sva_enabled) {
>> +		ret = iommu_group_for_each_dev(group->iommu_group, NULL,
>> +					       vfio_iommu_sva_init);
>> +		if (ret)
>> +			return ret;
>> +
>> +		group->sva_enabled = true;
>> +	}
>> +
>> +	ret = iommu_sva_bind_group(group->iommu_group, vfio_mm->mm, &pasid,
>> +				   IOMMU_SVA_FEAT_PASID | IOMMU_SVA_FEAT_IOPF,
>> +				   vfio_mm);
>> +	if (ret)
>> +		return ret;
> 
> don't you need to clean up the work done by vfio_iommu_sva_init() here.

Yes, I suppose we can, if we enabled SVA during this bind.
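Something along these lines, as a sketch of what the next version might do
(the 'enabled_now' flag is an assumption):

	bool enabled_now = false;

	if (!group->sva_enabled) {
		ret = iommu_group_for_each_dev(group->iommu_group, NULL,
					       vfio_iommu_sva_init);
		if (ret)
			return ret;

		group->sva_enabled = enabled_now = true;
	}

	ret = iommu_sva_bind_group(group->iommu_group, vfio_mm->mm, &pasid,
				   IOMMU_SVA_FEAT_PASID | IOMMU_SVA_FEAT_IOPF,
				   vfio_mm);
	if (ret && enabled_now) {
		/* Undo vfio_iommu_sva_init() only if this bind enabled SVA */
		iommu_group_for_each_dev(group->iommu_group, NULL,
					 vfio_iommu_sva_shutdown);
		group->sva_enabled = false;
	}
	if (ret)
		return ret;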

[...]
>> +static long vfio_iommu_type1_bind_process(struct vfio_iommu *iommu,
>> +					  void __user *arg,
>> +					  struct vfio_iommu_type1_bind *bind)
>> +{
>> +	struct vfio_iommu_type1_bind_process params;
>> +	struct vfio_domain *domain;
>> +	struct vfio_group *group;
>> +	struct vfio_mm *vfio_mm;
>> +	struct mm_struct *mm;
>> +	unsigned long minsz;
>> +	int ret = 0;
>> +
>> +	minsz = sizeof(*bind) + sizeof(params);
>> +	if (bind->argsz < minsz)
>> +		return -EINVAL;
>> +
>> +	arg += sizeof(*bind);
>> +	if (copy_from_user(&params, arg, sizeof(params)))
>> +		return -EFAULT;
>> +
>> +	if (params.flags & ~VFIO_IOMMU_BIND_PID)
>> +		return -EINVAL;
>> +
>> +	if (params.flags & VFIO_IOMMU_BIND_PID) {
>> +		mm = vfio_iommu_get_mm_by_vpid(params.pid);
>> +		if (IS_ERR(mm))
>> +			return PTR_ERR(mm);
>> +	} else {
>> +		mm = get_task_mm(current);
>> +		if (!mm)
>> +			return -EINVAL;
>> +	}
> 
> I think you can merge the mm failure handling for both cases.

Yes, I think vfio_iommu_get_mm_by_vpid could return NULL instead of an
error pointer, and we can throw -ESRCH in all cases (the existing
get_task_mm() failure in this driver does return -ESRCH, so it would be
consistent.)
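
That would reduce to something like this (sketch, assuming
vfio_iommu_get_mm_by_vpid() is reworked to return NULL on failure):

	if (params.flags & VFIO_IOMMU_BIND_PID)
		mm = vfio_iommu_get_mm_by_vpid(params.pid);
	else
		mm = get_task_mm(current);
	if (!mm)
		return -ESRCH;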

[...]
>> +	/*
>> +	 * We can't simply unbind a foreign process by PASID, because the
>> +	 * process might have died and the PASID might have been reallocated to
>> +	 * another process. Instead we need to fetch that process mm by PID
>> +	 * again to make sure we remove the right vfio_mm. In addition, holding
>> +	 * the mm guarantees that mm_users isn't dropped while we unbind and the
>> +	 * exit_mm handler doesn't fire. While not strictly necessary, not
>> +	 * having to care about that race simplifies everyone's life.
>> +	 */
>> +	if (params.flags & VFIO_IOMMU_BIND_PID) {
>> +		mm = vfio_iommu_get_mm_by_vpid(params.pid);
>> +		if (IS_ERR(mm))
>> +			return PTR_ERR(mm);
>> +	} else {
>> +		mm = get_task_mm(current);
>> +		if (!mm)
>> +			return -EINVAL;
>> +	}
>> +
> 
> I think you can merge the mm failure handling for both cases.

ok

>> +	ret = -ESRCH;
>> +	mutex_lock(&iommu->lock);
>> +	list_for_each_entry(vfio_mm, &iommu->mm_list, next) {
>> +		if (vfio_mm->mm != mm)
>> +			continue;
>> +
> 
> these loops look weird:
> 1. for loop + break
> 2. for loop + goto
> 
> how about closing the for loop here, and then returning if vfio_mm is
> not found?

ok

>> +		vfio_iommu_unbind(iommu, vfio_mm);
>> +		list_del(&vfio_mm->next);
>> +		kfree(vfio_mm);
>> +		ret = 0;
>> +		break;
>> +	}
>> +	mutex_unlock(&iommu->lock);
>> +	mmput(mm);
>> +
>> +	return ret;
>> +}
>> +
> 

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
  2018-02-12 18:33   ` Jean-Philippe Brucker
  (?)
@ 2018-02-28 20:34       ` Sinan Kaya
  -1 siblings, 0 replies; 317+ messages in thread
From: Sinan Kaya @ 2018-02-28 20:34 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: mark.rutland-5wv7dgnIgG8,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	catalin.marinas-5wv7dgnIgG8, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	will.deacon-5wv7dgnIgG8, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	rjw-LthD3rsA81gm4RdzfppkhA, sudeep.holla-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
> +int iommu_sva_unbind_group(struct iommu_group *group, int pasid)
> +{
> +	struct group_device *device;
> +
> +	mutex_lock(&group->mutex);
> +	list_for_each_entry(device, &group->devices, list)
> +		iommu_sva_unbind_device(device->dev, pasid);
> +	mutex_unlock(&group->mutex);
> +
> +	return 0;
> +}

I think we should handle the errors returned by iommu_sva_unbind_device() here,
or at least print a warning if we want to continue unbinding anyway.
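For instance, a sketch of the warn-and-continue variant (keeping the first
error as the return value) could be:

int iommu_sva_unbind_group(struct iommu_group *group, int pasid)
{
	int ret = 0;
	struct group_device *device;

	mutex_lock(&group->mutex);
	list_for_each_entry(device, &group->devices, list) {
		int err = iommu_sva_unbind_device(device->dev, pasid);

		if (err) {
			dev_warn(device->dev, "failed to unbind PASID %d (%d)\n",
				 pasid, err);
			if (!ret)
				ret = err;
		}
	}
	mutex_unlock(&group->mutex);

	return ret;
}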

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* RE: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
  2018-02-15 12:40             ` Jean-Philippe Brucker
  (?)
@ 2018-03-01  3:03                 ` Liu, Yi L
  -1 siblings, 0 replies; 317+ messages in thread
From: Liu, Yi L @ 2018-03-01  3:03 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Tian, Kevin,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Mark Rutland, ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	mykyta.iziumtsev-QSEj5FYQhm4dnm+yROfE0A, Catalin Marinas,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, Raj, Ashok,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	rjw-LthD3rsA81gm4RdzfppkhA, Sudeep Holla,
	christian.koenig-5C7GfCeVMHo

Hi Jean,

> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org]
> Sent: Thursday, February 15, 2018 8:41 PM
> Subject: Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
> 
> On 13/02/18 23:34, Tian, Kevin wrote:
> >> From: Jean-Philippe Brucker
> >> Sent: Tuesday, February 13, 2018 8:57 PM
> >>
> >> On 13/02/18 07:54, Tian, Kevin wrote:
> >>>> From: Jean-Philippe Brucker
> >>>> Sent: Tuesday, February 13, 2018 2:33 AM
> >>>>
> >>>> Add bind() and unbind() operations to the IOMMU API. Device drivers
> >> can
> >>>> use them to share process page tables with their devices.
> >>>> bind_group() is provided for VFIO's convenience, as it needs to
> >>>> provide a coherent interface on containers. Other device drivers
> >>>> will most likely want to use bind_device(), which binds a single device in the
> group.
> >>>
> >>> I saw your bind_group implementation tries to bind the address space
> >>> for all devices within a group, which IMO has some problem. Based on
> >> PCIe
> >>> spec, packet routing on the bus doesn't take PASID into consideration.
> >>> since devices within same group cannot be isolated based on
> >>> requestor-
> >> ID
> >>> i.e. traffic not guaranteed going to IOMMU, enabling SVA on multiple
> >> devices
> >>> could cause undesired p2p.
> >> But so does enabling "classic" DMA... If two devices are not
> >> protected by ACS for example, they are put in the same IOMMU group,
> >> and one device might be able to snoop the other's DMA. VFIO allows
> >> userspace to create a container for them and use MAP/UNMAP, but makes
> >> it explicit to the user that for DMA, these devices are not isolated
> >> and must be considered as a single device (you can't pass them to
> >> different VMs or put them in different containers). So I tried to
> >> keep the same idea as MAP/UNMAP for SVA, performing BIND/UNBIND
> >> operations on the VFIO container instead of the device.
> >
> > there is a small difference. for classic DMA we can reserve PCI BARs
> > when allocating IOVA, thus multiple devices in the same group can
> > still work correctly applied with same translation, if isolation is
> > not cared in between. However for SVA it's CPU virtual addresses
> > managed by kernel mm thus difficult to introduce similar address
> > reservation. Then it's possible for a VA falling into other device's
> > BAR in the same group and cause undesired p2p traffic. In such regard,
> > SVA is actually functionally-broken.
> 
> I think the problem exists even if there is a single device in the group.
> If for example, malloc() returns a VA that corresponds to a PCI host bridge in IOVA
> space, performing DMA on that buffer won't reach the IOMMU and will cause
> undesirable side-effects.

If there is only a single device in a group, does that mean there is ACS
support in the path from this device to the root complex? In that case any
memory request from this device would be routed upstream to the root complex,
so undesired p2p traffic should be avoided. So I tend to believe that even if
we bind at the group level, we actually expect it to work only in the case
where there is a single device within the group.

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 03/37] iommu/sva: Manage process address spaces
  2018-02-12 18:33     ` Jean-Philippe Brucker
  (?)
@ 2018-03-01  6:52         ` Lu Baolu
  -1 siblings, 0 replies; 317+ messages in thread
From: Lu Baolu @ 2018-03-01  6:52 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: mark.rutland-5wv7dgnIgG8, bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, rjw-LthD3rsA81gm4RdzfppkhA,
	catalin.marinas-5wv7dgnIgG8, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, sudeep.holla-5wv7dgnIgG8,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	christian.koenig-5C7GfCeVMHo, lenb-DgEjT+Ai2ygdnm+yROfE0A

Hi Jean,

On 02/13/2018 02:33 AM, Jean-Philippe Brucker wrote:
> Introduce boilerplate code for allocating IOMMU mm structures and binding
> them to devices. Four operations are added to IOMMU drivers:
>
> * mm_alloc(): to create an io_mm structure and perform architecture-
>   specific operations required to grab the process (for instance on ARM,
>   pin down the CPU ASID so that the process doesn't get assigned a new
>   ASID on rollover).
>
>   There is a single valid io_mm structure per Linux mm. Future extensions
>   may also use io_mm for kernel-managed address spaces, populated with
>   map()/unmap() calls instead of bound to process address spaces. This
>   patch focuses on "shared" io_mm.
>
> * mm_attach(): attach an mm to a device. The IOMMU driver checks that the
>   device is capable of sharing an address space, and writes the PASID
>   table entry to install the pgd.
>
>   Some IOMMU drivers will have a single PASID table per domain, for
>   convenience. Other can implement it differently but to help these
>   drivers, mm_attach and mm_detach take 'attach_domain' and
>   'detach_domain' parameters, that tell whether they need to set and clear
>   the PASID entry or only send the required TLB invalidations.
>
> * mm_detach(): detach an mm from a device. The IOMMU driver removes the
>   PASID table entry and invalidates the IOTLBs.
>
> * mm_free(): free a structure allocated by mm_alloc(), and let arch
>   release the process.
>
> mm_attach and mm_detach operations are serialized with a spinlock. At the
> moment it is global, but if we try to optimize it, the core should at
> least prevent concurrent attach()/detach() on the same domain (so
> multi-level PASID table code can allocate tables lazily). mm_alloc() can
> sleep, but mm_free must not (because we'll have to call it from call_srcu
> later on.)
>
> At the moment we use an IDR for allocating PASIDs and retrieving contexts.
> We also use a single spinlock. These can be refined and optimized later (a
> custom allocator will be needed for top-down PASID allocation).
>
> Keeping track of address spaces requires the use of MMU notifiers.
> Handling process exit with regard to unbind() is tricky, so it is left for
> another patch and we explicitly fail mm_alloc() for the moment.
>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> ---
>  drivers/iommu/iommu-sva.c | 382 +++++++++++++++++++++++++++++++++++++++++++++-
>  drivers/iommu/iommu.c     |   2 +
>  include/linux/iommu.h     |  25 +++
>  3 files changed, 406 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 593685d891bf..f9af9d66b3ed 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -7,11 +7,321 @@
>   * SPDX-License-Identifier: GPL-2.0
>   */
>  
> +#include <linux/idr.h>
>  #include <linux/iommu.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +
> +/**
> + * DOC: io_mm model
> + *
> + * The io_mm keeps track of process address spaces shared between CPU and IOMMU.
> + * The following example illustrates the relation between structures
> + * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between io_mm and
> + * device. A device can have multiple io_mm and an io_mm may be bound to
> + * multiple devices.
> + *              ___________________________
> + *             |  IOMMU domain A           |
> + *             |  ________________         |
> + *             | |  IOMMU group   |        +------- io_pgtables
> + *             | |                |        |
> + *             | |   dev 00:00.0 ----+------- bond --- io_mm X
> + *             | |________________|   \    |
> + *             |                       '----- bond ---.
> + *             |___________________________|           \
> + *              ___________________________             \
> + *             |  IOMMU domain B           |           io_mm Y
> + *             |  ________________         |           / /
> + *             | |  IOMMU group   |        |          / /
> + *             | |                |        |         / /
> + *             | |   dev 00:01.0 ------------ bond -' /
> + *             | |   dev 00:01.1 ------------ bond --'
> + *             | |________________|        |
> + *             |                           +------- io_pgtables
> + *             |___________________________|
> + *
> + * In this example, device 00:00.0 is in domain A, devices 00:01.* are in domain
> + * B. All devices within the same domain access the same address spaces. Device
> + * 00:00.0 accesses address spaces X and Y, each corresponding to an mm_struct.
> + * Devices 00:01.* only access address space Y. In addition each
> + * IOMMU_DOMAIN_DMA domain has a private address space, io_pgtable, that is
> + * managed with iommu_map()/iommu_unmap(), and isn't shared with the CPU MMU.
> + *
> + * To obtain the above configuration, users would for instance issue the
> + * following calls:
> + *
> + *     iommu_sva_bind_device(dev 00:00.0, mm X, ...) -> PASID 1
> + *     iommu_sva_bind_device(dev 00:00.0, mm Y, ...) -> PASID 2
> + *     iommu_sva_bind_device(dev 00:01.0, mm Y, ...) -> PASID 2
> + *     iommu_sva_bind_device(dev 00:01.1, mm Y, ...) -> PASID 2
> + *
> + * A single Process Address Space ID (PASID) is allocated for each mm. In the
> + * example, devices use PASID 1 to read/write into address space X and PASID 2
> + * to read/write into address space Y.
> + *
> + * Hardware tables describing this configuration in the IOMMU would typically
> + * look like this:
> + *
> + *                                PASID tables
> + *                                 of domain A
> + *                              .->+--------+
> + *                             / 0 |        |-------> io_pgtable
> + *                            /    +--------+
> + *            Device tables  /   1 |        |-------> pgd X
> + *              +--------+  /      +--------+
> + *      00:00.0 |      A |-'     2 |        |--.
> + *              +--------+         +--------+   \
> + *              :        :       3 |        |    \
> + *              +--------+         +--------+     --> pgd Y
> + *      00:01.0 |      B |--.                    /
> + *              +--------+   \                  |
> + *      00:01.1 |      B |----+   PASID tables  |
> + *              +--------+     \   of domain B  |
> + *                              '->+--------+   |
> + *                               0 |        |-- | --> io_pgtable
> + *                                 +--------+   |
> + *                               1 |        |   |
> + *                                 +--------+   |
> + *                               2 |        |---'
> + *                                 +--------+
> + *                               3 |        |
> + *                                 +--------+
> + *
> + * With this model, a single call binds all devices in a given domain to an
> + * address space. Other devices in the domain will get the same bond implicitly.
> + * However, users must issue one bind() for each device, because IOMMUs may
> + * implement SVA differently. Furthermore, mandating one bind() per device
> + * allows the driver to perform sanity-checks on device capabilities.
> + *
> + * On Arm and AMD IOMMUs, entry 0 of the PASID table can be used to hold
> + * non-PASID translations. In this case PASID 0 is reserved and entry 0 points
> + * to the io_pgtable base. On Intel IOMMU, the io_pgtable base would be held in
> + * the device table and PASID 0 would be available to the allocator.
> + */
>  
>  /* TODO: stub for the fault queue. Remove later. */
>  #define iommu_fault_queue_flush(...)
>  
> +struct iommu_bond {
> +	struct io_mm		*io_mm;
> +	struct device		*dev;
> +	struct iommu_domain	*domain;
> +
> +	struct list_head	mm_head;
> +	struct list_head	dev_head;
> +	struct list_head	domain_head;
> +
> +	void			*drvdata;
> +
> +	/* Number of bind() calls */
> +	refcount_t		refs;
> +};
> +
> +/*
> + * Because we're using an IDR, PASIDs are limited to 31 bits (the sign bit is
> + * used for returning errors). In practice implementations will use at most 20
> + * bits, which is the PCI limit.
> + */
> +static DEFINE_IDR(iommu_pasid_idr);
> +
> +/*
> + * For the moment this is an all-purpose lock. It serializes
> + * access/modifications to bonds, access/modifications to the PASID IDR, and
> + * changes to io_mm refcount as well.
> + */
> +static DEFINE_SPINLOCK(iommu_sva_lock);
> +
> +static struct io_mm *
> +io_mm_alloc(struct iommu_domain *domain, struct device *dev,
> +	    struct mm_struct *mm)
> +{
> +	int ret;
> +	int pasid;
> +	struct io_mm *io_mm;
> +	struct iommu_param *dev_param = dev->iommu_param;
> +
> +	if (!dev_param || !domain->ops->mm_alloc || !domain->ops->mm_free)
> +		return ERR_PTR(-ENODEV);
> +
> +	io_mm = domain->ops->mm_alloc(domain, mm);
> +	if (IS_ERR(io_mm))
> +		return io_mm;
> +	if (!io_mm)
> +		return ERR_PTR(-ENOMEM);
> +
> +	/*
> +	 * The mm must not be freed until after the driver frees the io_mm
> +	 * (which may involve unpinning the CPU ASID for instance, requiring a
> +	 * valid mm struct.)
> +	 */
> +	mmgrab(mm);
> +
> +	io_mm->mm		= mm;
> +	io_mm->release		= domain->ops->mm_free;
> +	INIT_LIST_HEAD(&io_mm->devices);
> +
> +	idr_preload(GFP_KERNEL);
> +	spin_lock(&iommu_sva_lock);
> +	pasid = idr_alloc_cyclic(&iommu_pasid_idr, io_mm, dev_param->min_pasid,
> +				 dev_param->max_pasid + 1, GFP_ATOMIC);

Can the PASID management code be moved into a common library?
PASID is not specific to SVA. An IOMMU model device could be designed
to use PASID for second-level translation (classical DMA translation)
as well.
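
For illustration, the kind of common allocator being suggested might start out
as a thin wrapper around the IDR already used in the patch (the interface
names below are hypothetical):

#include <linux/idr.h>
#include <linux/spinlock.h>

static DEFINE_IDR(iommu_pasid_idr);
static DEFINE_SPINLOCK(iommu_pasid_lock);

/* Allocate a PASID in [min, max] and associate it with 'private' */
int iommu_pasid_alloc(int min, int max, void *private)
{
	int pasid;

	idr_preload(GFP_KERNEL);
	spin_lock(&iommu_pasid_lock);
	pasid = idr_alloc_cyclic(&iommu_pasid_idr, private, min, max + 1,
				 GFP_ATOMIC);
	spin_unlock(&iommu_pasid_lock);
	idr_preload_end();

	return pasid;
}

void iommu_pasid_free(int pasid)
{
	spin_lock(&iommu_pasid_lock);
	idr_remove(&iommu_pasid_idr, pasid);
	spin_unlock(&iommu_pasid_lock);
}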

Best regards,
Lu Baolu

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 03/37] iommu/sva: Manage process address spaces
@ 2018-03-01  6:52         ` Lu Baolu
  0 siblings, 0 replies; 317+ messages in thread
From: Lu Baolu @ 2018-03-01  6:52 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu, kvm
  Cc: mark.rutland, ilias.apalodimas, catalin.marinas, xuzaibo,
	will.deacon, okaya, ashok.raj, bharatku, rfranz, lenb, robh+dt,
	bhelgaas, shunyong.yang, dwmw2, rjw, sudeep.holla,
	christian.koenig

Hi Jean,

On 02/13/2018 02:33 AM, Jean-Philippe Brucker wrote:
> Introduce boilerplate code for allocating IOMMU mm structures and binding
> them to devices. Four operations are added to IOMMU drivers:
>
> * mm_alloc(): to create an io_mm structure and perform architecture-
>   specific operations required to grab the process (for instance on ARM,
>   pin down the CPU ASID so that the process doesn't get assigned a new
>   ASID on rollover).
>
>   There is a single valid io_mm structure per Linux mm. Future extensions
>   may also use io_mm for kernel-managed address spaces, populated with
>   map()/unmap() calls instead of bound to process address spaces. This
>   patch focuses on "shared" io_mm.
>
> * mm_attach(): attach an mm to a device. The IOMMU driver checks that the
>   device is capable of sharing an address space, and writes the PASID
>   table entry to install the pgd.
>
>   Some IOMMU drivers will have a single PASID table per domain, for
>   convenience. Other can implement it differently but to help these
>   drivers, mm_attach and mm_detach take 'attach_domain' and
>   'detach_domain' parameters, that tell whether they need to set and clear
>   the PASID entry or only send the required TLB invalidations.
>
> * mm_detach(): detach an mm from a device. The IOMMU driver removes the
>   PASID table entry and invalidates the IOTLBs.
>
> * mm_free(): free a structure allocated by mm_alloc(), and let arch
>   release the process.
>
> mm_attach and mm_detach operations are serialized with a spinlock. At the
> moment it is global, but if we try to optimize it, the core should at
> least prevent concurrent attach()/detach() on the same domain (so
> multi-level PASID table code can allocate tables lazily). mm_alloc() can
> sleep, but mm_free must not (because we'll have to call it from call_srcu
> later on.)
>
> At the moment we use an IDR for allocating PASIDs and retrieving contexts.
> We also use a single spinlock. These can be refined and optimized later (a
> custom allocator will be needed for top-down PASID allocation).
>
> Keeping track of address spaces requires the use of MMU notifiers.
> Handling process exit with regard to unbind() is tricky, so it is left for
> another patch and we explicitly fail mm_alloc() for the moment.
>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  drivers/iommu/iommu-sva.c | 382 +++++++++++++++++++++++++++++++++++++++++++++-
>  drivers/iommu/iommu.c     |   2 +
>  include/linux/iommu.h     |  25 +++
>  3 files changed, 406 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 593685d891bf..f9af9d66b3ed 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -7,11 +7,321 @@
>   * SPDX-License-Identifier: GPL-2.0
>   */
>  
> +#include <linux/idr.h>
>  #include <linux/iommu.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +
> +/**
> + * DOC: io_mm model
> + *
> + * The io_mm keeps track of process address spaces shared between CPU and IOMMU.
> + * The following example illustrates the relation between structures
> + * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between io_mm and
> + * device. A device can have multiple io_mm and an io_mm may be bound to
> + * multiple devices.
> + *              ___________________________
> + *             |  IOMMU domain A           |
> + *             |  ________________         |
> + *             | |  IOMMU group   |        +------- io_pgtables
> + *             | |                |        |
> + *             | |   dev 00:00.0 ----+------- bond --- io_mm X
> + *             | |________________|   \    |
> + *             |                       '----- bond ---.
> + *             |___________________________|           \
> + *              ___________________________             \
> + *             |  IOMMU domain B           |           io_mm Y
> + *             |  ________________         |           / /
> + *             | |  IOMMU group   |        |          / /
> + *             | |                |        |         / /
> + *             | |   dev 00:01.0 ------------ bond -' /
> + *             | |   dev 00:01.1 ------------ bond --'
> + *             | |________________|        |
> + *             |                           +------- io_pgtables
> + *             |___________________________|
> + *
> + * In this example, device 00:00.0 is in domain A, devices 00:01.* are in domain
> + * B. All devices within the same domain access the same address spaces. Device
> + * 00:00.0 accesses address spaces X and Y, each corresponding to an mm_struct.
> + * Devices 00:01.* only access address space Y. In addition each
> + * IOMMU_DOMAIN_DMA domain has a private address space, io_pgtable, that is
> + * managed with iommu_map()/iommu_unmap(), and isn't shared with the CPU MMU.
> + *
> + * To obtain the above configuration, users would for instance issue the
> + * following calls:
> + *
> + *     iommu_sva_bind_device(dev 00:00.0, mm X, ...) -> PASID 1
> + *     iommu_sva_bind_device(dev 00:00.0, mm Y, ...) -> PASID 2
> + *     iommu_sva_bind_device(dev 00:01.0, mm Y, ...) -> PASID 2
> + *     iommu_sva_bind_device(dev 00:01.1, mm Y, ...) -> PASID 2
> + *
> + * A single Process Address Space ID (PASID) is allocated for each mm. In the
> + * example, devices use PASID 1 to read/write into address space X and PASID 2
> + * to read/write into address space Y.
> + *
> + * Hardware tables describing this configuration in the IOMMU would typically
> + * look like this:
> + *
> + *                                PASID tables
> + *                                 of domain A
> + *                              .->+--------+
> + *                             / 0 |        |-------> io_pgtable
> + *                            /    +--------+
> + *            Device tables  /   1 |        |-------> pgd X
> + *              +--------+  /      +--------+
> + *      00:00.0 |      A |-'     2 |        |--.
> + *              +--------+         +--------+   \
> + *              :        :       3 |        |    \
> + *              +--------+         +--------+     --> pgd Y
> + *      00:01.0 |      B |--.                    /
> + *              +--------+   \                  |
> + *      00:01.1 |      B |----+   PASID tables  |
> + *              +--------+     \   of domain B  |
> + *                              '->+--------+   |
> + *                               0 |        |-- | --> io_pgtable
> + *                                 +--------+   |
> + *                               1 |        |   |
> + *                                 +--------+   |
> + *                               2 |        |---'
> + *                                 +--------+
> + *                               3 |        |
> + *                                 +--------+
> + *
> + * With this model, a single call binds all devices in a given domain to an
> + * address space. Other devices in the domain will get the same bond implicitly.
> + * However, users must issue one bind() for each device, because IOMMUs may
> + * implement SVA differently. Furthermore, mandating one bind() per device
> + * allows the driver to perform sanity-checks on device capabilities.
> + *
> + * On Arm and AMD IOMMUs, entry 0 of the PASID table can be used to hold
> + * non-PASID translations. In this case PASID 0 is reserved and entry 0 points
> + * to the io_pgtable base. On Intel IOMMU, the io_pgtable base would be held in
> + * the device table and PASID 0 would be available to the allocator.
> + */
>  
>  /* TODO: stub for the fault queue. Remove later. */
>  #define iommu_fault_queue_flush(...)
>  
> +struct iommu_bond {
> +	struct io_mm		*io_mm;
> +	struct device		*dev;
> +	struct iommu_domain	*domain;
> +
> +	struct list_head	mm_head;
> +	struct list_head	dev_head;
> +	struct list_head	domain_head;
> +
> +	void			*drvdata;
> +
> +	/* Number of bind() calls */
> +	refcount_t		refs;
> +};
> +
> +/*
> + * Because we're using an IDR, PASIDs are limited to 31 bits (the sign bit is
> + * used for returning errors). In practice implementations will use at most 20
> + * bits, which is the PCI limit.
> + */
> +static DEFINE_IDR(iommu_pasid_idr);
> +
> +/*
> + * For the moment this is an all-purpose lock. It serializes
> + * access/modifications to bonds, access/modifications to the PASID IDR, and
> + * changes to io_mm refcount as well.
> + */
> +static DEFINE_SPINLOCK(iommu_sva_lock);
> +
> +static struct io_mm *
> +io_mm_alloc(struct iommu_domain *domain, struct device *dev,
> +	    struct mm_struct *mm)
> +{
> +	int ret;
> +	int pasid;
> +	struct io_mm *io_mm;
> +	struct iommu_param *dev_param = dev->iommu_param;
> +
> +	if (!dev_param || !domain->ops->mm_alloc || !domain->ops->mm_free)
> +		return ERR_PTR(-ENODEV);
> +
> +	io_mm = domain->ops->mm_alloc(domain, mm);
> +	if (IS_ERR(io_mm))
> +		return io_mm;
> +	if (!io_mm)
> +		return ERR_PTR(-ENOMEM);
> +
> +	/*
> +	 * The mm must not be freed until after the driver frees the io_mm
> +	 * (which may involve unpinning the CPU ASID for instance, requiring a
> +	 * valid mm struct.)
> +	 */
> +	mmgrab(mm);
> +
> +	io_mm->mm		= mm;
> +	io_mm->release		= domain->ops->mm_free;
> +	INIT_LIST_HEAD(&io_mm->devices);
> +
> +	idr_preload(GFP_KERNEL);
> +	spin_lock(&iommu_sva_lock);
> +	pasid = idr_alloc_cyclic(&iommu_pasid_idr, io_mm, dev_param->min_pasid,
> +				 dev_param->max_pasid + 1, GFP_ATOMIC);

Can the pasid management code be moved into a common library?
PASID is not specific to SVA. An IOMMU model device could be designed
to use PASID for second level translation (classical DMA translation)
as well.

Best regards,
Lu Baolu

^ permalink raw reply	[flat|nested] 317+ messages in thread

* [PATCH 03/37] iommu/sva: Manage process address spaces
@ 2018-03-01  6:52         ` Lu Baolu
  0 siblings, 0 replies; 317+ messages in thread
From: Lu Baolu @ 2018-03-01  6:52 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Jean,

On 02/13/2018 02:33 AM, Jean-Philippe Brucker wrote:
> Introduce boilerplate code for allocating IOMMU mm structures and binding
> them to devices. Four operations are added to IOMMU drivers:
>
> * mm_alloc(): to create an io_mm structure and perform architecture-
>   specific operations required to grab the process (for instance on ARM,
>   pin down the CPU ASID so that the process doesn't get assigned a new
>   ASID on rollover).
>
>   There is a single valid io_mm structure per Linux mm. Future extensions
>   may also use io_mm for kernel-managed address spaces, populated with
>   map()/unmap() calls instead of bound to process address spaces. This
>   patch focuses on "shared" io_mm.
>
> * mm_attach(): attach an mm to a device. The IOMMU driver checks that the
>   device is capable of sharing an address space, and writes the PASID
>   table entry to install the pgd.
>
>   Some IOMMU drivers will have a single PASID table per domain, for
>   convenience. Others can implement it differently, but to help these
>   drivers, mm_attach and mm_detach take 'attach_domain' and
>   'detach_domain' parameters that tell whether they need to set and clear
>   the PASID entry or only send the required TLB invalidations.
>
> * mm_detach(): detach an mm from a device. The IOMMU driver removes the
>   PASID table entry and invalidates the IOTLBs.
>
> * mm_free(): free a structure allocated by mm_alloc(), and let arch
>   release the process.
>
> mm_attach and mm_detach operations are serialized with a spinlock. At the
> moment it is global, but if we try to optimize it, the core should at
> least prevent concurrent attach()/detach() on the same domain (so
> multi-level PASID table code can allocate tables lazily). mm_alloc() can
> sleep, but mm_free must not (because we'll have to call it from call_srcu
> later on.)
>
> At the moment we use an IDR for allocating PASIDs and retrieving contexts.
> We also use a single spinlock. These can be refined and optimized later (a
> custom allocator will be needed for top-down PASID allocation).
>
> Keeping track of address spaces requires the use of MMU notifiers.
> Handling process exit with regard to unbind() is tricky, so it is left for
> another patch and we explicitly fail mm_alloc() for the moment.
>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  drivers/iommu/iommu-sva.c | 382 +++++++++++++++++++++++++++++++++++++++++++++-
>  drivers/iommu/iommu.c     |   2 +
>  include/linux/iommu.h     |  25 +++
>  3 files changed, 406 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 593685d891bf..f9af9d66b3ed 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -7,11 +7,321 @@
>   * SPDX-License-Identifier: GPL-2.0
>   */
>  
> +#include <linux/idr.h>
>  #include <linux/iommu.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +
> +/**
> + * DOC: io_mm model
> + *
> + * The io_mm keeps track of process address spaces shared between CPU and IOMMU.
> + * The following example illustrates the relation between structures
> + * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between io_mm and
> + * device. A device can have multiple io_mm and an io_mm may be bound to
> + * multiple devices.
> + *              ___________________________
> + *             |  IOMMU domain A           |
> + *             |  ________________         |
> + *             | |  IOMMU group   |        +------- io_pgtables
> + *             | |                |        |
> + *             | |   dev 00:00.0 ----+------- bond --- io_mm X
> + *             | |________________|   \    |
> + *             |                       '----- bond ---.
> + *             |___________________________|           \
> + *              ___________________________             \
> + *             |  IOMMU domain B           |           io_mm Y
> + *             |  ________________         |           / /
> + *             | |  IOMMU group   |        |          / /
> + *             | |                |        |         / /
> + *             | |   dev 00:01.0 ------------ bond -' /
> + *             | |   dev 00:01.1 ------------ bond --'
> + *             | |________________|        |
> + *             |                           +------- io_pgtables
> + *             |___________________________|
> + *
> + * In this example, device 00:00.0 is in domain A, devices 00:01.* are in domain
> + * B. All devices within the same domain access the same address spaces. Device
> + * 00:00.0 accesses address spaces X and Y, each corresponding to an mm_struct.
> + * Devices 00:01.* only access address space Y. In addition each
> + * IOMMU_DOMAIN_DMA domain has a private address space, io_pgtable, that is
> + * managed with iommu_map()/iommu_unmap(), and isn't shared with the CPU MMU.
> + *
> + * To obtain the above configuration, users would for instance issue the
> + * following calls:
> + *
> + *     iommu_sva_bind_device(dev 00:00.0, mm X, ...) -> PASID 1
> + *     iommu_sva_bind_device(dev 00:00.0, mm Y, ...) -> PASID 2
> + *     iommu_sva_bind_device(dev 00:01.0, mm Y, ...) -> PASID 2
> + *     iommu_sva_bind_device(dev 00:01.1, mm Y, ...) -> PASID 2
> + *
> + * A single Process Address Space ID (PASID) is allocated for each mm. In the
> + * example, devices use PASID 1 to read/write into address space X and PASID 2
> + * to read/write into address space Y.
> + *
> + * Hardware tables describing this configuration in the IOMMU would typically
> + * look like this:
> + *
> + *                                PASID tables
> + *                                 of domain A
> + *                              .->+--------+
> + *                             / 0 |        |-------> io_pgtable
> + *                            /    +--------+
> + *            Device tables  /   1 |        |-------> pgd X
> + *              +--------+  /      +--------+
> + *      00:00.0 |      A |-'     2 |        |--.
> + *              +--------+         +--------+   \
> + *              :        :       3 |        |    \
> + *              +--------+         +--------+     --> pgd Y
> + *      00:01.0 |      B |--.                    /
> + *              +--------+   \                  |
> + *      00:01.1 |      B |----+   PASID tables  |
> + *              +--------+     \   of domain B  |
> + *                              '->+--------+   |
> + *                               0 |        |-- | --> io_pgtable
> + *                                 +--------+   |
> + *                               1 |        |   |
> + *                                 +--------+   |
> + *                               2 |        |---'
> + *                                 +--------+
> + *                               3 |        |
> + *                                 +--------+
> + *
> + * With this model, a single call binds all devices in a given domain to an
> + * address space. Other devices in the domain will get the same bond implicitly.
> + * However, users must issue one bind() for each device, because IOMMUs may
> + * implement SVA differently. Furthermore, mandating one bind() per device
> + * allows the driver to perform sanity-checks on device capabilities.
> + *
> + * On Arm and AMD IOMMUs, entry 0 of the PASID table can be used to hold
> + * non-PASID translations. In this case PASID 0 is reserved and entry 0 points
> + * to the io_pgtable base. On Intel IOMMU, the io_pgtable base would be held in
> + * the device table and PASID 0 would be available to the allocator.
> + */
>  
>  /* TODO: stub for the fault queue. Remove later. */
>  #define iommu_fault_queue_flush(...)
>  
> +struct iommu_bond {
> +	struct io_mm		*io_mm;
> +	struct device		*dev;
> +	struct iommu_domain	*domain;
> +
> +	struct list_head	mm_head;
> +	struct list_head	dev_head;
> +	struct list_head	domain_head;
> +
> +	void			*drvdata;
> +
> +	/* Number of bind() calls */
> +	refcount_t		refs;
> +};
> +
> +/*
> + * Because we're using an IDR, PASIDs are limited to 31 bits (the sign bit is
> + * used for returning errors). In practice implementations will use at most 20
> + * bits, which is the PCI limit.
> + */
> +static DEFINE_IDR(iommu_pasid_idr);
> +
> +/*
> + * For the moment this is an all-purpose lock. It serializes
> + * access/modifications to bonds, access/modifications to the PASID IDR, and
> + * changes to io_mm refcount as well.
> + */
> +static DEFINE_SPINLOCK(iommu_sva_lock);
> +
> +static struct io_mm *
> +io_mm_alloc(struct iommu_domain *domain, struct device *dev,
> +	    struct mm_struct *mm)
> +{
> +	int ret;
> +	int pasid;
> +	struct io_mm *io_mm;
> +	struct iommu_param *dev_param = dev->iommu_param;
> +
> +	if (!dev_param || !domain->ops->mm_alloc || !domain->ops->mm_free)
> +		return ERR_PTR(-ENODEV);
> +
> +	io_mm = domain->ops->mm_alloc(domain, mm);
> +	if (IS_ERR(io_mm))
> +		return io_mm;
> +	if (!io_mm)
> +		return ERR_PTR(-ENOMEM);
> +
> +	/*
> +	 * The mm must not be freed until after the driver frees the io_mm
> +	 * (which may involve unpinning the CPU ASID for instance, requiring a
> +	 * valid mm struct.)
> +	 */
> +	mmgrab(mm);
> +
> +	io_mm->mm		= mm;
> +	io_mm->release		= domain->ops->mm_free;
> +	INIT_LIST_HEAD(&io_mm->devices);
> +
> +	idr_preload(GFP_KERNEL);
> +	spin_lock(&iommu_sva_lock);
> +	pasid = idr_alloc_cyclic(&iommu_pasid_idr, io_mm, dev_param->min_pasid,
> +				 dev_param->max_pasid + 1, GFP_ATOMIC);

Can the pasid management code be moved into a common library?
PASID is not specific to SVA. An IOMMU model device could be designed
to use PASID for second level translation (classical DMA translation)
as well.

Best regards,
Lu Baolu

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 03/37] iommu/sva: Manage process address spaces
  2018-03-01  6:52         ` Lu Baolu
  (?)
@ 2018-03-01  8:04             ` Christian König
  -1 siblings, 0 replies; 317+ messages in thread
From: Christian König @ 2018-03-01  8:04 UTC (permalink / raw)
  To: Lu Baolu, Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: mark.rutland-5wv7dgnIgG8, bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, rjw-LthD3rsA81gm4RdzfppkhA,
	catalin.marinas-5wv7dgnIgG8, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, sudeep.holla-5wv7dgnIgG8,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	lenb-DgEjT+Ai2ygdnm+yROfE0A

On 01.03.2018 at 07:52, Lu Baolu wrote:
> Hi Jean,
>
> On 02/13/2018 02:33 AM, Jean-Philippe Brucker wrote:
>> [SNIP]
>> +	pasid = idr_alloc_cyclic(&iommu_pasid_idr, io_mm, dev_param->min_pasid,
>> +				 dev_param->max_pasid + 1, GFP_ATOMIC);
> Can the pasid management code be moved into a common library?
> PASID is not stick to SVA. An IOMMU model device could be designed
> to use PASID for second level translation (classical DMA translation)
> as well.

Yeah, we have the same problem on amdgpu.

We assign PASIDs to clients even when IOMMU isn't present in the system 
just because we need it for debugging.

E.g. when the hardware detects that some shader program is doing 
something nasty we get the PASID in the interrupt and could for example 
use it to inform the client about the fault.

Regards,
Christian.

>
> Best regards,
> Lu Baolu

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 03/37] iommu/sva: Manage process address spaces
@ 2018-03-01  8:04             ` Christian König
  0 siblings, 0 replies; 317+ messages in thread
From: Christian König @ 2018-03-01  8:04 UTC (permalink / raw)
  To: Lu Baolu, Jean-Philippe Brucker, linux-arm-kernel, linux-pci,
	linux-acpi, devicetree, iommu, kvm
  Cc: mark.rutland, ilias.apalodimas, catalin.marinas, xuzaibo,
	will.deacon, okaya, ashok.raj, bharatku, rfranz, lenb, robh+dt,
	bhelgaas, shunyong.yang, dwmw2, rjw, sudeep.holla

On 01.03.2018 at 07:52, Lu Baolu wrote:
> Hi Jean,
>
> On 02/13/2018 02:33 AM, Jean-Philippe Brucker wrote:
>> [SNIP]
>> +	pasid = idr_alloc_cyclic(&iommu_pasid_idr, io_mm, dev_param->min_pasid,
>> +				 dev_param->max_pasid + 1, GFP_ATOMIC);
> Can the pasid management code be moved into a common library?
> PASID is not stick to SVA. An IOMMU model device could be designed
> to use PASID for second level translation (classical DMA translation)
> as well.

Yeah, we have the same problem on amdgpu.

We assign PASIDs to clients even when IOMMU isn't present in the system 
just because we need it for debugging.

E.g. when the hardware detects that some shader program is doing 
something nasty we get the PASID in the interrupt and could for example 
use it to inform the client about the fault.

Regards,
Christian.

>
> Best regards,
> Lu Baolu


^ permalink raw reply	[flat|nested] 317+ messages in thread

* [PATCH 03/37] iommu/sva: Manage process address spaces
@ 2018-03-01  8:04             ` Christian König
  0 siblings, 0 replies; 317+ messages in thread
From: Christian König @ 2018-03-01  8:04 UTC (permalink / raw)
  To: linux-arm-kernel

On 01.03.2018 at 07:52, Lu Baolu wrote:
> Hi Jean,
>
> On 02/13/2018 02:33 AM, Jean-Philippe Brucker wrote:
>> [SNIP]
>> +	pasid = idr_alloc_cyclic(&iommu_pasid_idr, io_mm, dev_param->min_pasid,
>> +				 dev_param->max_pasid + 1, GFP_ATOMIC);
> Can the pasid management code be moved into a common library?
> PASID is not stick to SVA. An IOMMU model device could be designed
> to use PASID for second level translation (classical DMA translation)
> as well.

Yeah, we have the same problem on amdgpu.

We assign PASIDs to clients even when IOMMU isn't present in the system 
just because we need it for debugging.

E.g. when the hardware detects that some shader program is doing 
something nasty we get the PASID in the interrupt and could for example 
use it to inform the client about the fault.

Regards,
Christian.

>
> Best regards,
> Lu Baolu

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
  2018-02-28 20:34       ` Sinan Kaya
  (?)
@ 2018-03-02 12:32           ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-02 12:32 UTC (permalink / raw)
  To: Sinan Kaya, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Mark Rutland, ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	Catalin Marinas, xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	rjw-LthD3rsA81gm4RdzfppkhA, Sudeep Holla,
	christian.koenig-5C7GfCeVMHo

On 28/02/18 20:34, Sinan Kaya wrote:
> On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
>> +int iommu_sva_unbind_group(struct iommu_group *group, int pasid)
>> +{
>> +	struct group_device *device;
>> +
>> +	mutex_lock(&group->mutex);
>> +	list_for_each_entry(device, &group->devices, list)
>> +		iommu_sva_unbind_device(device->dev, pasid);
>> +	mutex_unlock(&group->mutex);
>> +
>> +	return 0;
>> +}
> 
> I think we should handle the errors returned by iommu_sva_unbind_device() here
> or at least print a warning if we want to still continue unbinding. 

Agreed, though bind_group/unbind_group are probably going away in the
next series.
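
If we keep them, a rough sketch of what the error handling could look
like (only warning and carrying on, since stopping halfway through the
group would leave it in a worse state):

	int iommu_sva_unbind_group(struct iommu_group *group, int pasid)
	{
		int ret;
		struct group_device *device;

		mutex_lock(&group->mutex);
		list_for_each_entry(device, &group->devices, list) {
			ret = iommu_sva_unbind_device(device->dev, pasid);
			if (ret)
				dev_warn(device->dev,
					 "failed to unbind PASID %d: %d\n",
					 pasid, ret);
		}
		mutex_unlock(&group->mutex);

		return 0;
	}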

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
@ 2018-03-02 12:32           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-02 12:32 UTC (permalink / raw)
  To: Sinan Kaya, linux-arm-kernel, linux-pci, linux-acpi, devicetree,
	iommu, kvm
  Cc: Mark Rutland, xieyisheng1, ilias.apalodimas, Catalin Marinas,
	xuzaibo, jonathan.cameron, Will Deacon, yi.l.liu,
	Lorenzo Pieralisi, ashok.raj, tn, joro, bharatku, rfranz, lenb,
	jacob.jun.pan, alex.williamson, robh+dt, thunder.leizhen,
	bhelgaas, shunyong.yang, dwmw2, liubo95, rjw, jcrouse, robdclark,
	hanjun.guo, Sudeep Holla, Robin Murphy, christian.koenig,
	nwatters

On 28/02/18 20:34, Sinan Kaya wrote:
> On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
>> +int iommu_sva_unbind_group(struct iommu_group *group, int pasid)
>> +{
>> +	struct group_device *device;
>> +
>> +	mutex_lock(&group->mutex);
>> +	list_for_each_entry(device, &group->devices, list)
>> +		iommu_sva_unbind_device(device->dev, pasid);
>> +	mutex_unlock(&group->mutex);
>> +
>> +	return 0;
>> +}
> 
> I think we should handle the errors returned by iommu_sva_unbind_device() here
> or at least print a warning if we want to still continue unbinding. 

Agreed, though bind_group/unbind_group are probably going away in the
next series.

Thanks,
Jean


^ permalink raw reply	[flat|nested] 317+ messages in thread

* [PATCH 02/37] iommu/sva: Bind process address spaces to devices
@ 2018-03-02 12:32           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-02 12:32 UTC (permalink / raw)
  To: linux-arm-kernel

On 28/02/18 20:34, Sinan Kaya wrote:
> On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
>> +int iommu_sva_unbind_group(struct iommu_group *group, int pasid)
>> +{
>> +	struct group_device *device;
>> +
>> +	mutex_lock(&group->mutex);
>> +	list_for_each_entry(device, &group->devices, list)
>> +		iommu_sva_unbind_device(device->dev, pasid);
>> +	mutex_unlock(&group->mutex);
>> +
>> +	return 0;
>> +}
> 
> I think we should handle the errors returned by iommu_sva_unbind_device() here
> or at least print a warning if we want to still continue unbinding. 

Agreed, though bind_group/unbind_group are probably going away in the
next series.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
  2018-03-01  3:03                 ` Liu, Yi L
  (?)
@ 2018-03-02 16:03                     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-02 16:03 UTC (permalink / raw)
  To: Liu, Yi L, Tian, Kevin,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Mark Rutland, ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	mykyta.iziumtsev-QSEj5FYQhm4dnm+yROfE0A, Catalin Marinas,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, Raj, Ashok,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	rjw-LthD3rsA81gm4RdzfppkhA, Sudeep Holla,
	christian.koenig-5C7GfCeVMHo

On 01/03/18 03:03, Liu, Yi L wrote:
> Hi Jean,
> 
>> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org]
>> Sent: Thursday, February 15, 2018 8:41 PM
>> Subject: Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
>>
>> On 13/02/18 23:34, Tian, Kevin wrote:
>>>> From: Jean-Philippe Brucker
>>>> Sent: Tuesday, February 13, 2018 8:57 PM
>>>>
>>>> On 13/02/18 07:54, Tian, Kevin wrote:
>>>>>> From: Jean-Philippe Brucker
>>>>>> Sent: Tuesday, February 13, 2018 2:33 AM
>>>>>>
>>>>>> Add bind() and unbind() operations to the IOMMU API. Device drivers
>>>> can
>>>>>> use them to share process page tables with their devices.
>>>>>> bind_group() is provided for VFIO's convenience, as it needs to
>>>>>> provide a coherent interface on containers. Other device drivers
>>>>>> will most likely want to use bind_device(), which binds a single device in the
>> group.
>>>>>
>>>>> I saw your bind_group implementation tries to bind the address space
>>>>> for all devices within a group, which IMO has some problem. Based on
>>>> PCIe
>>>>> spec, packet routing on the bus doesn't take PASID into consideration.
>>>>> since devices within same group cannot be isolated based on
>>>>> requestor-
>>>> ID
>>>>> i.e. traffic not guaranteed going to IOMMU, enabling SVA on multiple
>>>> devices
>>>>> could cause undesired p2p.
>>>> But so does enabling "classic" DMA... If two devices are not
>>>> protected by ACS for example, they are put in the same IOMMU group,
>>>> and one device might be able to snoop the other's DMA. VFIO allows
>>>> userspace to create a container for them and use MAP/UNMAP, but makes
>>>> it explicit to the user that for DMA, these devices are not isolated
>>>> and must be considered as a single device (you can't pass them to
>>>> different VMs or put them in different containers). So I tried to
>>>> keep the same idea as MAP/UNMAP for SVA, performing BIND/UNBIND
>>>> operations on the VFIO container instead of the device.
>>>
>>> there is a small difference. for classic DMA we can reserve PCI BARs
>>> when allocating IOVA, thus multiple devices in the same group can
>>> still work correctly applied with same translation, if isolation is
>>> not cared in between. However for SVA it's CPU virtual addresses
>>> managed by kernel mm thus difficult to introduce similar address
>>> reservation. Then it's possible for a VA falling into other device's
>>> BAR in the same group and cause undesired p2p traffic. In such regard,
>>> SVA is actually functionally-broken.
>>
>> I think the problem exists even if there is a single device in the group.
>> If for example, malloc() returns a VA that corresponds to a PCI host bridge in IOVA
>> space, performing DMA on that buffer won't reach the IOMMU and will cause
>> undesirable side-effects.
> 
> If only a single device in a group, should it mean there is ACS support in
> the path from this device to root complex? It means any memory request
> from this device would be upstreamed to root complex, thus it should be
> able to avoid undesired p2p traffics. So I intend to believe, even we do
> bind in group level, we actually expect to make it work only for the case
> where a single device within a group.

Yes if each device has its own group then ACS is properly enabled.

Even without thinking about ACS or p2p, not all memory requests
necessarily make it to the IOMMU. For example, transactions targeting the
PCI host bridge MMIO window (marked as RESV_RESERVED by dma-iommu.c),
may get eaten by the RC and not reach the IOMMU (I'm blindly following
the code here, don't have anything in the spec to back me up). Commit
fade1ec055dc also refers to "faults, corruption and other badness"
though I don't know if that's only for PCI or could also affect future
systems.

And I don't think prefixing transactions with a PASID changes the
situation. I couldn't find anything in the PCIe spec contradicting it
and I guess it's up to the root complex implementation. So I tend to
take a conservative approach and assume that RESV_RESERVED regions will
also apply to PASID-prefixed traffic.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
@ 2018-03-02 16:03                     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-02 16:03 UTC (permalink / raw)
  To: Liu, Yi L, Tian, Kevin, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu, kvm
  Cc: Mark Rutland, xieyisheng1, ilias.apalodimas, mykyta.iziumtsev,
	Catalin Marinas, xuzaibo, jonathan.cameron, Will Deacon, okaya,
	Lorenzo Pieralisi, Raj, Ashok, tn, joro, bharatku, rfranz, lenb,
	jacob.jun.pan, alex.williamson, robh+dt, thunder.leizhen,
	bhelgaas, shunyong.yang, dwmw2, liubo95, rjw, jcrouse, robdclark,
	hanjun.guo, Sudeep Holla, Robin Murphy, christian.koenig,
	nwatters

On 01/03/18 03:03, Liu, Yi L wrote:
> Hi Jean,
> 
>> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
>> Sent: Thursday, February 15, 2018 8:41 PM
>> Subject: Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
>>
>> On 13/02/18 23:34, Tian, Kevin wrote:
>>>> From: Jean-Philippe Brucker
>>>> Sent: Tuesday, February 13, 2018 8:57 PM
>>>>
>>>> On 13/02/18 07:54, Tian, Kevin wrote:
>>>>>> From: Jean-Philippe Brucker
>>>>>> Sent: Tuesday, February 13, 2018 2:33 AM
>>>>>>
>>>>>> Add bind() and unbind() operations to the IOMMU API. Device drivers
>>>> can
>>>>>> use them to share process page tables with their devices.
>>>>>> bind_group() is provided for VFIO's convenience, as it needs to
>>>>>> provide a coherent interface on containers. Other device drivers
>>>>>> will most likely want to use bind_device(), which binds a single device in the
>> group.
>>>>>
>>>>> I saw your bind_group implementation tries to bind the address space
>>>>> for all devices within a group, which IMO has some problem. Based on
>>>> PCIe
>>>>> spec, packet routing on the bus doesn't take PASID into consideration.
>>>>> since devices within same group cannot be isolated based on
>>>>> requestor-
>>>> ID
>>>>> i.e. traffic not guaranteed going to IOMMU, enabling SVA on multiple
>>>> devices
>>>>> could cause undesired p2p.
>>>> But so does enabling "classic" DMA... If two devices are not
>>>> protected by ACS for example, they are put in the same IOMMU group,
>>>> and one device might be able to snoop the other's DMA. VFIO allows
>>>> userspace to create a container for them and use MAP/UNMAP, but makes
>>>> it explicit to the user that for DMA, these devices are not isolated
>>>> and must be considered as a single device (you can't pass them to
>>>> different VMs or put them in different containers). So I tried to
>>>> keep the same idea as MAP/UNMAP for SVA, performing BIND/UNBIND
>>>> operations on the VFIO container instead of the device.
>>>
>>> there is a small difference. for classic DMA we can reserve PCI BARs
>>> when allocating IOVA, thus multiple devices in the same group can
>>> still work correctly applied with same translation, if isolation is
>>> not cared in between. However for SVA it's CPU virtual addresses
>>> managed by kernel mm thus difficult to introduce similar address
>>> reservation. Then it's possible for a VA falling into other device's
>>> BAR in the same group and cause undesired p2p traffic. In such regard,
>>> SVA is actually functionally-broken.
>>
>> I think the problem exists even if there is a single device in the group.
>> If for example, malloc() returns a VA that corresponds to a PCI host bridge in IOVA
>> space, performing DMA on that buffer won't reach the IOMMU and will cause
>> undesirable side-effects.
> 
> If only a single device in a group, should it mean there is ACS support in
> the path from this device to root complex? It means any memory request
> from this device would be upstreamed to root complex, thus it should be
> able to avoid undesired p2p traffics. So I intend to believe, even we do
> bind in group level, we actually expect to make it work only for the case
> where a single device within a group.

Yes if each device has its own group then ACS is properly enabled.

Even without thinking about ACS or p2p, not all memory requests
necessarily make it to the IOMMU. For example, transactions targeting the
PCI host bridge MMIO window (marked as RESV_RESERVED by dma-iommu.c),
may get eaten by the RC and not reach the IOMMU (I'm blindly following
the code here, don't have anything in the spec to back me up). Commit
fade1ec055dc also refers to "faults, corruption and other badness"
though I don't know if that's only for PCI or could also affect future
systems.

And I don't think prefixing transactions with a PASID changes the
situation. I couldn't find anything in the PCIe spec contradicting it
and I guess it's up to the root complex implementation. So I tend to
take a conservative approach and assume that RESV_RESERVED regions will
also apply to PASID-prefixed traffic.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* [PATCH 02/37] iommu/sva: Bind process address spaces to devices
@ 2018-03-02 16:03                     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-02 16:03 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/03/18 03:03, Liu, Yi L wrote:
> Hi Jean,
> 
>> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker at arm.com]
>> Sent: Thursday, February 15, 2018 8:41 PM
>> Subject: Re: [PATCH 02/37] iommu/sva: Bind process address spaces to devices
>>
>> On 13/02/18 23:34, Tian, Kevin wrote:
>>>> From: Jean-Philippe Brucker
>>>> Sent: Tuesday, February 13, 2018 8:57 PM
>>>>
>>>> On 13/02/18 07:54, Tian, Kevin wrote:
>>>>>> From: Jean-Philippe Brucker
>>>>>> Sent: Tuesday, February 13, 2018 2:33 AM
>>>>>>
>>>>>> Add bind() and unbind() operations to the IOMMU API. Device drivers
>>>> can
>>>>>> use them to share process page tables with their devices.
>>>>>> bind_group() is provided for VFIO's convenience, as it needs to
>>>>>> provide a coherent interface on containers. Other device drivers
>>>>>> will most likely want to use bind_device(), which binds a single device in the
>> group.
>>>>>
>>>>> I saw your bind_group implementation tries to bind the address space
>>>>> for all devices within a group, which IMO has some problem. Based on
>>>> PCIe
>>>>> spec, packet routing on the bus doesn't take PASID into consideration.
>>>>> since devices within same group cannot be isolated based on
>>>>> requestor-
>>>> ID
>>>>> i.e. traffic not guaranteed going to IOMMU, enabling SVA on multiple
>>>> devices
>>>>> could cause undesired p2p.
>>>> But so does enabling "classic" DMA... If two devices are not
>>>> protected by ACS for example, they are put in the same IOMMU group,
>>>> and one device might be able to snoop the other's DMA. VFIO allows
>>>> userspace to create a container for them and use MAP/UNMAP, but makes
>>>> it explicit to the user that for DMA, these devices are not isolated
>>>> and must be considered as a single device (you can't pass them to
>>>> different VMs or put them in different containers). So I tried to
>>>> keep the same idea as MAP/UNMAP for SVA, performing BIND/UNBIND
>>>> operations on the VFIO container instead of the device.
>>>
>>> there is a small difference. for classic DMA we can reserve PCI BARs
>>> when allocating IOVA, thus multiple devices in the same group can
>>> still work correctly applied with same translation, if isolation is
>>> not cared in between. However for SVA it's CPU virtual addresses
>>> managed by kernel mm thus difficult to introduce similar address
>>> reservation. Then it's possible for a VA falling into other device's
>>> BAR in the same group and cause undesired p2p traffic. In such regard,
>>> SVA is actually functionally-broken.
>>
>> I think the problem exists even if there is a single device in the group.
>> If for example, malloc() returns a VA that corresponds to a PCI host bridge in IOVA
>> space, performing DMA on that buffer won't reach the IOMMU and will cause
>> undesirable side-effects.
> 
> If only a single device in a group, should it mean there is ACS support in
> the path from this device to root complex? It means any memory request
> from this device would be upstreamed to root complex, thus it should be
> able to avoid undesired p2p traffics. So I intend to believe, even we do
> bind in group level, we actually expect to make it work only for the case
> where a single device within a group.

Yes if each device has its own group then ACS is properly enabled.

Even without thinking about ACS or p2p, not all memory requests
necessarily make it to the IOMMU. For example, transactions targeting the
PCI host bridge MMIO window (marked as RESV_RESERVED by dma-iommu.c),
may get eaten by the RC and not reach the IOMMU (I'm blindly following
the code here, don't have anything in the spec to back me up). Commit
fade1ec055dc also refers to "faults, corruption and other badness"
though I don't know if that's only for PCI or could also affect future
systems.

And I don't think prefixing transactions with a PASID changes the
situation. I couldn't find anything in the PCIe spec contradicting it
and I guess it's up to the root complex implementation. So I tend to
take a conservative approach and assume that RESV_RESERVED regions will
also apply to PASID-prefixed traffic.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 03/37] iommu/sva: Manage process address spaces
  2018-03-01  6:52         ` Lu Baolu
  (?)
@ 2018-03-02 16:19             ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-02 16:19 UTC (permalink / raw)
  To: Lu Baolu, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Mark Rutland, bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, rjw-LthD3rsA81gm4RdzfppkhA,
	Catalin Marinas, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, Sudeep Holla,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	christian.koenig-5C7GfCeVMHo, lenb-DgEjT+Ai2ygdnm+yROfE0A

On 01/03/18 06:52, Lu Baolu wrote:
> Can the pasid management code be moved into a common library?
> PASID is not stick to SVA. An IOMMU model device could be designed
> to use PASID for second level translation (classical DMA translation)
> as well.

What do you mean by second level translation? Do you see a use-case with
nesting translation within the host?

I agree that PASID + classical DMA is desirable. A device driver would
allocate PASIDs and perform iommu_sva_map(domain, pasid, iova, pa, size,
prot) and iommu_sva_unmap(domain, pasid, iova, size). I'm hoping that we
can also augment the DMA API with PASIDs, and that a driver can use both
shared and private contexts simultaneously. So that it can use a few
PASIDs for management purpose, and assign the rest to userspace.
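
To illustrate (nothing below exists today; iommu_sva_alloc_pasid() and
iommu_sva_free_pasid() are only hypothetical names for what such an
interface could look like):

	/* Hypothetical sketch: classical DMA on a driver-allocated PASID */
	static int example_dma_to_pasid(struct iommu_domain *domain,
					struct device *dev, dma_addr_t iova,
					struct page *page, size_t size)
	{
		int ret, pasid;

		ret = iommu_sva_alloc_pasid(domain, dev, &pasid);	/* hypothetical */
		if (ret)
			return ret;

		ret = iommu_sva_map(domain, pasid, iova, page_to_phys(page),
				    size, IOMMU_READ | IOMMU_WRITE);
		if (ret) {
			iommu_sva_free_pasid(domain, pasid);		/* hypothetical */
			return ret;
		}

		/* ... device performs DMA tagged with 'pasid' ... */

		iommu_sva_unmap(domain, pasid, iova, size);
		iommu_sva_free_pasid(domain, pasid);

		return 0;
	}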

The intent is for iommu-sva.c to be this common library. Work for
"private" PASID allocation is underway, see Jordan Crouse's series
posted last week https://www.spinics.net/lists/arm-kernel/msg635857.html

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 03/37] iommu/sva: Manage process address spaces
@ 2018-03-02 16:19             ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-02 16:19 UTC (permalink / raw)
  To: Lu Baolu, linux-arm-kernel, linux-pci, linux-acpi, devicetree,
	iommu, kvm
  Cc: Mark Rutland, bharatku, ashok.raj, shunyong.yang, rjw,
	Catalin Marinas, xuzaibo, ilias.apalodimas, Will Deacon, okaya,
	bhelgaas, robh+dt, Sudeep Holla, rfranz, dwmw2, christian.koenig,
	lenb

On 01/03/18 06:52, Lu Baolu wrote:
> Can the pasid management code be moved into a common library?
> PASID is not stick to SVA. An IOMMU model device could be designed
> to use PASID for second level translation (classical DMA translation)
> as well.

What do you mean by second level translation? Do you see a use-case with
nesting translation within the host?

I agree that PASID + classical DMA is desirable. A device driver would
allocate PASIDs and perform iommu_sva_map(domain, pasid, iova, pa, size,
prot) and iommu_sva_unmap(domain, pasid, iova, size). I'm hoping that we
can also augment the DMA API with PASIDs, and that a driver can use both
shared and private contexts simultaneously. So that it can use a few
PASIDs for management purpose, and assign the rest to userspace.

The intent is for iommu-sva.c to be this common library. Work for
"private" PASID allocation is underway, see Jordan Crouse's series
posted last week https://www.spinics.net/lists/arm-kernel/msg635857.html

Thanks,
Jean


^ permalink raw reply	[flat|nested] 317+ messages in thread

* [PATCH 03/37] iommu/sva: Manage process address spaces
@ 2018-03-02 16:19             ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-02 16:19 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/03/18 06:52, Lu Baolu wrote:
> Can the pasid management code be moved into a common library?
> PASID is not stick to SVA. An IOMMU model device could be designed
> to use PASID for second level translation (classical DMA translation)
> as well.

What do you mean by second level translation? Do you see a use-case with
nesting translation within the host?

I agree that PASID + classical DMA is desirable. A device driver would
allocate PASIDs and perform iommu_sva_map(domain, pasid, iova, pa, size,
prot) and iommu_sva_unmap(domain, pasid, iova, size). I'm hoping that we
can also augment the DMA API with PASIDs, and that a driver can use both
shared and private contexts simultaneously. So that it can use a few
PASIDs for management purpose, and assign the rest to userspace.

The intent is for iommu-sva.c to be this common library. Work for
"private" PASID allocation is underway, see Jordan Crouse's series
posted last week https://www.spinics.net/lists/arm-kernel/msg635857.html

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 03/37] iommu/sva: Manage process address spaces
  2018-03-01  8:04             ` Christian König
  (?)
@ 2018-03-02 16:42                 ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-02 16:42 UTC (permalink / raw)
  To: Christian König, Lu Baolu,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Mark Rutland, bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w, rjw-LthD3rsA81gm4RdzfppkhA,
	Catalin Marinas, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, Sudeep Holla,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	lenb-DgEjT+Ai2ygdnm+yROfE0A

On 01/03/18 08:04, Christian König wrote:
> On 01.03.2018 at 07:52, Lu Baolu wrote:
>> Hi Jean,
>>
>> On 02/13/2018 02:33 AM, Jean-Philippe Brucker wrote:
>>> [SNIP]
>>> +	pasid = idr_alloc_cyclic(&iommu_pasid_idr, io_mm, dev_param->min_pasid,
>>> +				 dev_param->max_pasid + 1, GFP_ATOMIC);
>> Can the pasid management code be moved into a common library?
>> PASID is not stick to SVA. An IOMMU model device could be designed
>> to use PASID for second level translation (classical DMA translation)
>> as well.
> 
> Yeah, we have the same problem on amdgpu.
> 
> We assign PASIDs to clients even when IOMMU isn't present in the system 
> just because we need it for debugging.
> 
> E.g. when the hardware detects that some shader program is doing 
> something nasty we get the PASID in the interrupt and could for example 
> use it to inform the client about the fault.

This seems like a new requirement altogether, and a bit outside the
scope of this series to be honest. Is the client userspace in this
context? I guess it would be mostly for prototyping then? Otherwise you
probably don't want to hand GPU contexts to userspace without an IOMMU
isolating them?

If you don't need mm tracking/sharing or iommu_map/unmap, then maybe an
IDR private to the GPU driver would be enough? If you do need mm
tracking, I suppose we could modify iommu_sva_bind() to allocate and
track io_mm even if the given device doesn't have an IOMMU, but it seems
a bit backward.
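
By "IDR private to the GPU driver" I mean something as simple as the
following (hypothetical sketch, not actual amdgpu code; gpu_client stands
for whatever per-context structure the driver already keeps):

	struct gpu_client;

	static DEFINE_IDR(gpu_pasid_idr);
	static DEFINE_SPINLOCK(gpu_pasid_lock);

	int gpu_pasid_alloc(struct gpu_client *client)
	{
		int pasid;
		unsigned long flags;

		idr_preload(GFP_KERNEL);
		spin_lock_irqsave(&gpu_pasid_lock, flags);
		/* PCIe limits PASIDs to 20 bits */
		pasid = idr_alloc_cyclic(&gpu_pasid_idr, client, 1, 1 << 20,
					 GFP_ATOMIC);
		spin_unlock_irqrestore(&gpu_pasid_lock, flags);
		idr_preload_end();

		return pasid;
	}

	void gpu_pasid_free(int pasid)
	{
		unsigned long flags;

		spin_lock_irqsave(&gpu_pasid_lock, flags);
		idr_remove(&gpu_pasid_idr, pasid);
		spin_unlock_irqrestore(&gpu_pasid_lock, flags);
	}

	/* In the fault interrupt, map the PASID back to a client */
	struct gpu_client *gpu_pasid_find(int pasid)
	{
		unsigned long flags;
		struct gpu_client *client;

		spin_lock_irqsave(&gpu_pasid_lock, flags);
		client = idr_find(&gpu_pasid_idr, pasid);
		spin_unlock_irqrestore(&gpu_pasid_lock, flags);

		return client;
	}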

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 03/37] iommu/sva: Manage process address spaces
@ 2018-03-02 16:42                 ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-02 16:42 UTC (permalink / raw)
  To: Christian König, Lu Baolu, linux-arm-kernel, linux-pci,
	linux-acpi, devicetree, iommu, kvm
  Cc: Mark Rutland, bharatku, ashok.raj, shunyong.yang, rjw,
	Catalin Marinas, xuzaibo, ilias.apalodimas, Will Deacon, okaya,
	bhelgaas, robh+dt, Sudeep Holla, rfranz, dwmw2, lenb

On 01/03/18 08:04, Christian König wrote:
> On 01.03.2018 at 07:52, Lu Baolu wrote:
>> Hi Jean,
>>
>> On 02/13/2018 02:33 AM, Jean-Philippe Brucker wrote:
>>> [SNIP]
>>> +	pasid = idr_alloc_cyclic(&iommu_pasid_idr, io_mm, dev_param->min_pasid,
>>> +				 dev_param->max_pasid + 1, GFP_ATOMIC);
>> Can the pasid management code be moved into a common library?
>> PASID is not stick to SVA. An IOMMU model device could be designed
>> to use PASID for second level translation (classical DMA translation)
>> as well.
> 
> Yeah, we have the same problem on amdgpu.
> 
> We assign PASIDs to clients even when IOMMU isn't present in the system
> just because we need it for debugging.
> 
> E.g. when the hardware detects that some shader program is doing
> something nasty we get the PASID in the interrupt and could for example
> use it to inform the client about the fault.

This seems like a new requirement altogether, and a bit outside the
scope of this series to be honest. Is the client userspace in this
context? I guess it would be mostly for prototyping then? Otherwise you
probably don't want to hand GPU contexts to userspace without an IOMMU
isolating them?

If you don't need mm tracking/sharing or iommu_map/unmap, then maybe an
IDR private to the GPU driver would be enough? If you do need mm
tracking, I suppose we could modify iommu_sva_bind() to allocate and
track io_mm even if the given device doesn't have an IOMMU, but it seems
a bit backward.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* [PATCH 03/37] iommu/sva: Manage process address spaces
@ 2018-03-02 16:42                 ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-02 16:42 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/03/18 08:04, Christian König wrote:
> On 01.03.2018 at 07:52, Lu Baolu wrote:
>> Hi Jean,
>>
>> On 02/13/2018 02:33 AM, Jean-Philippe Brucker wrote:
>>> [SNIP]
>>> +	pasid = idr_alloc_cyclic(&iommu_pasid_idr, io_mm, dev_param->min_pasid,
>>> +				 dev_param->max_pasid + 1, GFP_ATOMIC);
>> Can the pasid management code be moved into a common library?
>> PASID is not stick to SVA. An IOMMU model device could be designed
>> to use PASID for second level translation (classical DMA translation)
>> as well.
> 
> Yeah, we have the same problem on amdgpu.
> 
> We assign PASIDs to clients even when IOMMU isn't present in the system 
> just because we need it for debugging.
> 
> E.g. when the hardware detects that some shader program is doing 
> something nasty we get the PASID in the interrupt and could for example 
> use it to inform the client about the fault.

This seems like a new requirement altogether, and a bit outside the
scope of this series to be honest. Is the client userspace in this
context? I guess it would be mostly for prototyping then? Otherwise you
probably don't want to hand GPU contexts to userspace without an IOMMU
isolating them?

If you don't need mm tracking/sharing or iommu_map/unmap, then maybe an
IDR private to the GPU driver would be enough? If you do need mm
tracking, I suppose we could modify iommu_sva_bind() to allocate and
track io_mm even if the given device doesn't have an IOMMU, but it seems
a bit backward.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 35/37] iommu/arm-smmu-v3: Add support for PRI
  2018-02-12 18:33     ` Jean-Philippe Brucker
  (?)
  (?)
@ 2018-03-05 12:29         ` Dongdong Liu
  -1 siblings, 0 replies; 317+ messages in thread
From: Dongdong Liu @ 2018-03-05 12:29 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: mark.rutland-5wv7dgnIgG8,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	catalin.marinas-5wv7dgnIgG8, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	will.deacon-5wv7dgnIgG8, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	rjw-LthD3rsA81gm4RdzfppkhA, sudeep.holla-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

>
> +static int arm_smmu_enable_pri(struct arm_smmu_master_data *master)
> +{
> +	int ret, pos;
> +	struct pci_dev *pdev;
> +	/*
> +	 * TODO: find a good inflight PPR number. We should divide the PRI queue
> +	 * by the number of PRI-capable devices, but it's impossible to know
> +	 * about current and future (hotplugged) devices. So we're at risk of
> +	 * dropping PPRs (and leaking pending requests in the FQ).
> +	 */
> +	size_t max_inflight_pprs = 16;
> +	struct arm_smmu_device *smmu = master->smmu;
> +
> +	if (!(smmu->features & ARM_SMMU_FEAT_PRI) || !dev_is_pci(master->dev))
> +		return -ENOSYS;
> +
> +	pdev = to_pci_dev(master->dev);
> +
 From here
> +	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> +	if (!pos)
> +		return -ENOSYS;
to here: this check seems unnecessary, as the same lookup is already
done in pci_reset_pri().
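
With the lookup dropped, the function could look like this (untested
sketch, same logic as your patch otherwise):

	static int arm_smmu_enable_pri(struct arm_smmu_master_data *master)
	{
		int ret;
		struct pci_dev *pdev;
		/* TODO: find a good inflight PPR number. */
		size_t max_inflight_pprs = 16;
		struct arm_smmu_device *smmu = master->smmu;

		if (!(smmu->features & ARM_SMMU_FEAT_PRI) ||
		    !dev_is_pci(master->dev))
			return -ENOSYS;

		pdev = to_pci_dev(master->dev);

		/* pci_reset_pri() fails anyway when the PRI capability is absent */
		ret = pci_reset_pri(pdev);
		if (ret)
			return ret;

		ret = pci_enable_pri(pdev, max_inflight_pprs);
		if (ret) {
			dev_err(master->dev, "cannot enable PRI: %d\n", ret);
			return ret;
		}

		master->can_fault = true;
		master->ste.prg_resp_needs_ssid = pci_prg_resp_requires_prefix(pdev);

		dev_dbg(master->dev, "enabled PRI");

		return 0;
	}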

Thanks,
Dongdong
> +
> +	ret = pci_reset_pri(pdev);
> +	if (ret)
> +		return ret;
> +
> +	ret = pci_enable_pri(pdev, max_inflight_pprs);
> +	if (ret) {
> +		dev_err(master->dev, "cannot enable PRI: %d\n", ret);
> +		return ret;
> +	}
> +
> +	master->can_fault = true;
> +	master->ste.prg_resp_needs_ssid = pci_prg_resp_requires_prefix(pdev);
> +
> +	dev_dbg(master->dev, "enabled PRI");
> +
> +	return 0;
> +}
> +
>  static void arm_smmu_disable_ats(struct arm_smmu_master_data *master)
>  {
>  	struct pci_dev *pdev;
> @@ -2548,6 +2592,22 @@ static void arm_smmu_disable_ats(struct arm_smmu_master_data *master)
>  	pci_disable_ats(pdev);
>  }
>
> +static void arm_smmu_disable_pri(struct arm_smmu_master_data *master)
> +{
> +	struct pci_dev *pdev;
> +
> +	if (!dev_is_pci(master->dev))
> +		return;
> +
> +	pdev = to_pci_dev(master->dev);
> +
> +	if (!pdev->pri_enabled)
> +		return;
> +
> +	pci_disable_pri(pdev);
> +	master->can_fault = false;
> +}
> +
>  static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>  				  struct arm_smmu_master_data *master)
>  {
> @@ -2668,12 +2728,13 @@ static int arm_smmu_add_device(struct device *dev)
>  		master->ste.can_stall = true;
>  	}
>
> -	arm_smmu_enable_ats(master);
> +	if (!arm_smmu_enable_ats(master))
> +		arm_smmu_enable_pri(master);
>
>  	group = iommu_group_get_for_dev(dev);
>  	if (IS_ERR(group)) {
>  		ret = PTR_ERR(group);
> -		goto err_disable_ats;
> +		goto err_disable_pri;
>  	}
>
>  	iommu_group_put(group);
> @@ -2682,7 +2743,8 @@ static int arm_smmu_add_device(struct device *dev)
>
>  	return 0;
>
> -err_disable_ats:
> +err_disable_pri:
> +	arm_smmu_disable_pri(master);
>  	arm_smmu_disable_ats(master);
>
>  	return ret;
> @@ -2702,6 +2764,8 @@ static void arm_smmu_remove_device(struct device *dev)
>  	if (master && master->ste.assigned)
>  		arm_smmu_detach_dev(dev);
>  	arm_smmu_remove_master(smmu, master);
> +
> +	arm_smmu_disable_pri(master);
>  	arm_smmu_disable_ats(master);
>
>  	iommu_group_remove_device(dev);
>

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 35/37] iommu/arm-smmu-v3: Add support for PRI
@ 2018-03-05 13:09             ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-05 13:09 UTC (permalink / raw)
  To: Dongdong Liu, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu, kvm
  Cc: Mark Rutland, xieyisheng1, ilias.apalodimas, Catalin Marinas,
	xuzaibo, jonathan.cameron, Will Deacon, okaya, yi.l.liu,
	Lorenzo Pieralisi, ashok.raj, tn, joro, bharatku, rfranz, lenb,
	jacob.jun.pan, alex.williamson, robh+dt, thunder.leizhen,
	bhelgaas, shunyong.yang, dwmw2, liubo95, rjw, jcrouse, robdclark,
	hanjun.guo, Sudeep Holla, Robin Murphy, christian.koenig,
	nwatters

On 05/03/18 12:29, Dongdong Liu wrote:
>>
>> +static int arm_smmu_enable_pri(struct arm_smmu_master_data *master)
>> +{
>> +	int ret, pos;
>> +	struct pci_dev *pdev;
>> +	/*
>> +	 * TODO: find a good inflight PPR number. We should divide the PRI queue
>> +	 * by the number of PRI-capable devices, but it's impossible to know
>> +	 * about current and future (hotplugged) devices. So we're at risk of
>> +	 * dropping PPRs (and leaking pending requests in the FQ).
>> +	 */
>> +	size_t max_inflight_pprs = 16;
>> +	struct arm_smmu_device *smmu = master->smmu;
>> +
>> +	if (!(smmu->features & ARM_SMMU_FEAT_PRI) || !dev_is_pci(master->dev))
>> +		return -ENOSYS;
>> +
>> +	pdev = to_pci_dev(master->dev);
>> +
>  From here
>> +	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>> +	if (!pos)
>> +		return -ENOSYS;
> to here, seems this code is not needed as it is already done in
> pci_reset_pri().

Indeed, thanks. It would allow us to differentiate a device that doesn't
support PRI from a reset error, but since we ignore the return value at
the moment, I'll remove it.
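
With the lookup gone, the function would look roughly like this (a sketch
derived from the hunk quoted above, not the final respin):

	static int arm_smmu_enable_pri(struct arm_smmu_master_data *master)
	{
		int ret;
		struct pci_dev *pdev;
		/* TODO: find a good inflight PPR number */
		size_t max_inflight_pprs = 16;
		struct arm_smmu_device *smmu = master->smmu;

		if (!(smmu->features & ARM_SMMU_FEAT_PRI) || !dev_is_pci(master->dev))
			return -ENOSYS;

		pdev = to_pci_dev(master->dev);

		/* Fails if the device has no PRI capability */
		ret = pci_reset_pri(pdev);
		if (ret)
			return ret;

		ret = pci_enable_pri(pdev, max_inflight_pprs);
		if (ret) {
			dev_err(master->dev, "cannot enable PRI: %d\n", ret);
			return ret;
		}

		master->can_fault = true;
		master->ste.prg_resp_needs_ssid = pci_prg_resp_requires_prefix(pdev);

		dev_dbg(master->dev, "enabled PRI");

		return 0;
	}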

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 03/37] iommu/sva: Manage process address spaces
@ 2018-03-05 15:28         ` Sinan Kaya
  0 siblings, 0 replies; 317+ messages in thread
From: Sinan Kaya @ 2018-03-05 15:28 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, jcrouse, rfranz,
	dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
> +static void io_mm_free(struct io_mm *io_mm)
> +{
> +	struct mm_struct *mm;
> +	void (*release)(struct io_mm *);
> +
> +	release = io_mm->release;
> +	mm = io_mm->mm;
> +
> +	release(io_mm);

Is there any reason why you can't call iommu->release()
here directly? Why do you need the release local variable?

> +	mmdrop(mm);
> +}
> +


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 07/37] iommu: Add a page fault handler
@ 2018-03-05 21:44       ` Sinan Kaya
  0 siblings, 0 replies; 317+ messages in thread
From: Sinan Kaya @ 2018-03-05 21:44 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu, kvm
  Cc: mark.rutland, xieyisheng1, ilias.apalodimas, catalin.marinas,
	xuzaibo, jonathan.cameron, will.deacon, yi.l.liu,
	lorenzo.pieralisi, ashok.raj, tn, joro, bharatku, rfranz, lenb,
	jacob.jun.pan, alex.williamson, robh+dt, thunder.leizhen,
	bhelgaas, shunyong.yang, dwmw2, liubo95, rjw, jcrouse, robdclark,
	hanjun.guo, sudeep.holla, robin.murphy, christian.koenig,
	nwatters

On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
> +static int iommu_queue_fault(struct iommu_domain *domain, struct device *dev,
> +			     struct iommu_fault_event *evt)
> +{
> +	struct iommu_fault_group *group;
> +	struct iommu_fault_context *fault, *next;
> +
> +	if (!iommu_fault_queue)
> +		return -ENOSYS;
> +
> +	if (!evt->last_req) {
> +		fault = kzalloc(sizeof(*fault), GFP_KERNEL);
> +		if (!fault)
> +			return -ENOMEM;
> +
> +		fault->evt = *evt;
> +		fault->dev = dev;
> +
> +		/* Non-last request of a group. Postpone until the last one */
> +		spin_lock(&iommu_partial_faults_lock);
> +		list_add_tail(&fault->head, &iommu_partial_faults);
> +		spin_unlock(&iommu_partial_faults_lock);
> +
> +		return IOMMU_PAGE_RESP_HANDLED;
> +	}
> +
> +	group = kzalloc(sizeof(*group), GFP_KERNEL);
> +	if (!group)
> +		return -ENOMEM;

Release the requests in iommu_partial_faults here.
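
Roughly what that would look like in the -ENOMEM path (a sketch of the
suggestion, not code from the series):

	if (!group) {
		/* Drop the partial faults stashed for this group */
		spin_lock(&iommu_partial_faults_lock);
		list_for_each_entry_safe(fault, next, &iommu_partial_faults, head) {
			if (fault->evt.page_req_group_id == evt->page_req_group_id &&
			    fault->dev == dev) {
				list_del(&fault->head);
				kfree(fault);
			}
		}
		spin_unlock(&iommu_partial_faults_lock);
		return -ENOMEM;
	}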

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 07/37] iommu: Add a page fault handler
@ 2018-03-05 21:53       ` Sinan Kaya
  0 siblings, 0 replies; 317+ messages in thread
From: Sinan Kaya @ 2018-03-05 21:53 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu, kvm
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, jcrouse, rfranz,
	dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
> +static struct workqueue_struct *iommu_fault_queue;

Is there any way we can make this fault queue per struct device?
Since this is common code, I think it needs some care.


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 07/37] iommu: Add a page fault handler
@ 2018-03-06 10:24           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-06 10:24 UTC (permalink / raw)
  To: Sinan Kaya, linux-arm-kernel, linux-pci, linux-acpi, devicetree,
	iommu, kvm
  Cc: Mark Rutland, xieyisheng1, ilias.apalodimas, Catalin Marinas,
	xuzaibo, jonathan.cameron, Will Deacon, yi.l.liu,
	Lorenzo Pieralisi, ashok.raj, tn, joro, bharatku, rfranz, lenb,
	jacob.jun.pan, alex.williamson, robh+dt, thunder.leizhen,
	bhelgaas, shunyong.yang, dwmw2, liubo95, rjw, jcrouse, robdclark,
	hanjun.guo, Sudeep Holla, Robin Murphy, christian.koenig,
	nwatters

On 05/03/18 21:44, Sinan Kaya wrote:
> On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
>> +static int iommu_queue_fault(struct iommu_domain *domain, struct device *dev,
>> +			     struct iommu_fault_event *evt)
>> +{
>> +	struct iommu_fault_group *group;
>> +	struct iommu_fault_context *fault, *next;
>> +
>> +	if (!iommu_fault_queue)
>> +		return -ENOSYS;
>> +
>> +	if (!evt->last_req) {
>> +		fault = kzalloc(sizeof(*fault), GFP_KERNEL);
>> +		if (!fault)
>> +			return -ENOMEM;
>> +
>> +		fault->evt = *evt;
>> +		fault->dev = dev;
>> +
>> +		/* Non-last request of a group. Postpone until the last one */
>> +		spin_lock(&iommu_partial_faults_lock);
>> +		list_add_tail(&fault->head, &iommu_partial_faults);
>> +		spin_unlock(&iommu_partial_faults_lock);
>> +
>> +		return IOMMU_PAGE_RESP_HANDLED;
>> +	}
>> +
>> +	group = kzalloc(sizeof(*group), GFP_KERNEL);
>> +	if (!group)
>> +		return -ENOMEM;
> 
> Release the requests in iommu_partial_faults here.

We move these requests to the group->faults list (which btw should use
list_move instead of the current list_del+list_add) and we release them in
iommu_fault_handle_group().
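
With that change, the loop in iommu_queue_fault() would read roughly
(sketch):

	/* See if we have pending faults for this group */
	spin_lock(&iommu_partial_faults_lock);
	list_for_each_entry_safe(fault, next, &iommu_partial_faults, head) {
		if (fault->evt.page_req_group_id == evt->page_req_group_id &&
		    fault->dev == dev)
			/* Insert *before* the last fault */
			list_move(&fault->head, &group->faults);
	}
	spin_unlock(&iommu_partial_faults_lock);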

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 03/37] iommu/sva: Manage process address spaces
@ 2018-03-06 10:37             ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-06 10:37 UTC (permalink / raw)
  To: Sinan Kaya, linux-arm-kernel, linux-pci, linux-acpi, devicetree,
	iommu, kvm
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	jonathan.cameron, shunyong.yang, nwatters, jcrouse, rfranz,
	dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark,
	christian.koenig, bharatku

On 05/03/18 15:28, Sinan Kaya wrote:
> On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
>> +static void io_mm_free(struct io_mm *io_mm)
>> +{
>> +	struct mm_struct *mm;
>> +	void (*release)(struct io_mm *);
>> +
>> +	release = io_mm->release;
>> +	mm = io_mm->mm;
>> +
>> +	release(io_mm);
> 
> Is there any reason why you can't call iommu->release()
> here directly? Why do you need the release local variable?

I think I can remove the local variable.
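
Something along these lines, keeping the mm local since ->release() is
expected to free the io_mm itself (sketch only):

	static void io_mm_free(struct io_mm *io_mm)
	{
		struct mm_struct *mm = io_mm->mm;

		/* The release callback frees io_mm, don't touch it afterwards */
		io_mm->release(io_mm);
		mmdrop(mm);
	}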

Thanks,
Jean


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 07/37] iommu: Add a page fault handler
@ 2018-03-06 10:46           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-06 10:46 UTC (permalink / raw)
  To: Sinan Kaya, linux-arm-kernel, linux-pci, linux-acpi, devicetree,
	iommu, kvm
  Cc: Mark Rutland, xieyisheng1, ilias.apalodimas, Catalin Marinas,
	xuzaibo, jonathan.cameron, Will Deacon, yi.l.liu,
	Lorenzo Pieralisi, ashok.raj, tn, joro, bharatku, rfranz, lenb,
	jacob.jun.pan, alex.williamson, robh+dt, thunder.leizhen,
	bhelgaas, shunyong.yang, dwmw2, liubo95, rjw, jcrouse, robdclark,
	hanjun.guo, Sudeep Holla, Robin Murphy, christian.koenig,
	nwatters

On 05/03/18 21:53, Sinan Kaya wrote:
> On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
>> +static struct workqueue_struct *iommu_fault_queue;
> 
> Is there anyway we can make this fault queue per struct device?
> Since this is common code, I think it needs some care.

I don't think it's better; the workqueue struct seems large. Maybe having
one wq per IOMMU is a good compromise? As said in my other reply for this
patch, doing so isn't completely straightforward. I'll consider adding an
iommu pointer to the iommu_param struct attached to each device.
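
Roughly the kind of change being considered (illustrative only; the members
shown are pared down and the new field names are invented here, they are not
part of the posted series):

	struct iommu_param {
		struct iommu_fault_param *fault_param;
		/* New: owning IOMMU, so faults can go to a per-IOMMU workqueue */
		struct iommu_device	*iommu;
	};

	/* iommu_queue_fault() could then do something like */
	queue_work(dev->iommu_param->iommu->fault_queue, &group->work);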

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 07/37] iommu: Add a page fault handler
@ 2018-03-06 12:52               ` okaya
  0 siblings, 0 replies; 317+ messages in thread
From: okaya @ 2018-03-06 12:52 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Mark Rutland, xieyisheng1, ilias.apalodimas, kvm, linux-pci,
	xuzaibo, jonathan.cameron, Will Deacon, yi.l.liu,
	Lorenzo Pieralisi, ashok.raj, tn, joro, robdclark, bharatku,
	linux-acpi, Catalin Marinas, rfranz, lenb, devicetree,
	jacob.jun.pan, alex.williamson, robh+dt, thunder.leizhen,
	bhelgaas, linux-arm-kernel, shunyong.yang, dwmw2, liubo95, rjw,
	jcrouse, iommu, hanjun.guo, Sudeep Holla, linux-acpi-owner,
	Robin Murphy, christian.koenig, nwatters

On 2018-03-06 05:46, Jean-Philippe Brucker wrote:
> On 05/03/18 21:53, Sinan Kaya wrote:
>> On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
>>> +static struct workqueue_struct *iommu_fault_queue;
>> 
>> Is there anyway we can make this fault queue per struct device?
>> Since this is common code, I think it needs some care.
> 
> I don't think it's better, the workqueue struct seems large. Maybe 
> having
> one wq per IOMMU is a good compromise?

Yes, one per iommu sounds reasonable.


> As said in my other reply for this patch, doing so isn't completely
> straightforward. I'll consider adding an iommu pointer to the iommu_param
> struct attached to each device.
> 
> Thanks,
> Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 07/37] iommu: Add a page fault handler
  2018-02-12 18:33   ` Jean-Philippe Brucker
  (?)
@ 2018-03-08 15:40       ` Jonathan Cameron
  -1 siblings, 0 replies; 317+ messages in thread
From: Jonathan Cameron @ 2018-03-08 15:40 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: mark.rutland-5wv7dgnIgG8,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, catalin.marinas-5wv7dgnIgG8,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, rjw-LthD3rsA81gm4RdzfppkhA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	sudeep.holla-5wv7dgnIgG8, christian.koenig-5C7GfCeVMHo

On Mon, 12 Feb 2018 18:33:22 +0000
Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:

> Some systems allow devices to handle IOMMU translation faults in the core
> mm. For example systems supporting the PCI PRI extension or Arm SMMU stall
> model. Infrastructure for reporting such recoverable page faults was
> recently added to the IOMMU core, for SVA virtualization. Extend
> iommu_report_device_fault() to handle host page faults as well.
> 
> * IOMMU drivers instantiate a fault workqueue, using
>   iommu_fault_queue_init() and iommu_fault_queue_destroy().
> 
> * When it receives a fault event, supposedly in an IRQ handler, the IOMMU
>   driver reports the fault using iommu_report_device_fault()
> 
> * If the device driver registered a handler (e.g. VFIO), pass down the
>   fault event. Otherwise submit it to the fault queue, to be handled in a
>   thread.
> 
> * When the fault corresponds to an io_mm, call the mm fault handler on it
>   (in next patch).
> 
> * Once the fault is handled, the mm wrapper or the device driver reports
>   success or failure with iommu_page_response(). The translation is either
>   retried or aborted, depending on the response code.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
A few really minor points inline...  Basically looks good to me.

> ---
>  drivers/iommu/Kconfig      |  10 ++
>  drivers/iommu/Makefile     |   1 +
>  drivers/iommu/io-pgfault.c | 282 +++++++++++++++++++++++++++++++++++++++++++++
>  drivers/iommu/iommu-sva.c  |   3 -
>  drivers/iommu/iommu.c      |  31 ++---
>  include/linux/iommu.h      |  34 +++++-
>  6 files changed, 339 insertions(+), 22 deletions(-)
>  create mode 100644 drivers/iommu/io-pgfault.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 146eebe9a4bb..e751bb9958ba 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -85,6 +85,15 @@ config IOMMU_SVA
>  
>  	  If unsure, say N here.
>  
> +config IOMMU_FAULT
> +	bool "Fault handler for the IOMMU API"
> +	select IOMMU_API
> +	help
> +	  Enable the generic fault handler for the IOMMU API, that handles
> +	  recoverable page faults or inject them into guests.
> +
> +	  If unsure, say N here.
> +
>  config FSL_PAMU
>  	bool "Freescale IOMMU support"
>  	depends on PCI
> @@ -156,6 +165,7 @@ config INTEL_IOMMU
>  	select IOMMU_API
>  	select IOMMU_IOVA
>  	select DMAR_TABLE
> +	select IOMMU_FAULT
>  	help
>  	  DMA remapping (DMAR) devices support enables independent address
>  	  translations for Direct Memory Access (DMA) from devices.
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 1dbcc89ebe4c..f4324e29035e 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
>  obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
>  obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
> +obj-$(CONFIG_IOMMU_FAULT) += io-pgfault.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
> new file mode 100644
> index 000000000000..33309ed316d2
> --- /dev/null
> +++ b/drivers/iommu/io-pgfault.c
> @@ -0,0 +1,282 @@
> +/*
> + * Handle device page faults
> + *
> + * Copyright (C) 2018 ARM Ltd.
> + * Author: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> + *
> + * SPDX-License-Identifier: GPL-2.0
> + */
> +
> +#include <linux/iommu.h>
> +#include <linux/list.h>
> +#include <linux/slab.h>
> +#include <linux/workqueue.h>
> +
> +static struct workqueue_struct *iommu_fault_queue;
> +static DECLARE_RWSEM(iommu_fault_queue_sem);
> +static refcount_t iommu_fault_queue_refs = REFCOUNT_INIT(0);
> +static BLOCKING_NOTIFIER_HEAD(iommu_fault_queue_flush_notifiers);
> +
> +/* Used to store incomplete fault groups */
> +static LIST_HEAD(iommu_partial_faults);
> +static DEFINE_SPINLOCK(iommu_partial_faults_lock);
> +
> +struct iommu_fault_context {
> +	struct device			*dev;
> +	struct iommu_fault_event	evt;
> +	struct list_head		head;
> +};
> +
> +struct iommu_fault_group {
> +	struct iommu_domain		*domain;
> +	struct iommu_fault_context	last_fault;
> +	struct list_head		faults;
> +	struct work_struct		work;
> +};
> +
> +/*
> + * iommu_fault_complete() - Finish handling a fault
> + *
> + * Send a response if necessary and pass on the sanitized status code
> + */
> +static int iommu_fault_complete(struct iommu_domain *domain, struct device *dev,
> +				struct iommu_fault_event *evt, int status)
> +{
> +	struct page_response_msg resp = {
> +		.addr		= evt->addr,
> +		.pasid		= evt->pasid,
> +		.pasid_present	= evt->pasid_valid,
> +		.page_req_group_id = evt->page_req_group_id,
Really trivial, but if you want to align the equals signs, they all need
indenting by one more tab.
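
i.e. something like this, with one extra tab in front of each value:

	struct page_response_msg resp = {
		.addr			= evt->addr,
		.pasid			= evt->pasid,
		.pasid_present		= evt->pasid_valid,
		.page_req_group_id	= evt->page_req_group_id,
		.type			= IOMMU_PAGE_GROUP_RESP,
		.private_data		= evt->iommu_private,
	};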

> +		.type		= IOMMU_PAGE_GROUP_RESP,
> +		.private_data	= evt->iommu_private,
> +	};
> +
> +	/*
> +	 * There is no "handling" an unrecoverable fault, so the only valid
> +	 * return values are 0 or an error.
> +	 */
> +	if (evt->type == IOMMU_FAULT_DMA_UNRECOV)
> +		return status > 0 ? 0 : status;
> +
> +	/* Someone took ownership of the fault and will complete it later */
> +	if (status == IOMMU_PAGE_RESP_HANDLED)
> +		return 0;
> +
> +	/*
> +	 * There was an internal error with handling the recoverable fault. Try
> +	 * to complete the fault if possible.
> +	 */
> +	if (status < 0)
> +		status = IOMMU_PAGE_RESP_INVALID;
> +
> +	if (WARN_ON(!domain->ops->page_response))
> +		/*
> +		 * The IOMMU driver shouldn't have submitted recoverable faults
> +		 * if it cannot receive a response.
> +		 */
> +		return -EINVAL;
> +
> +	resp.resp_code = status;
> +	return domain->ops->page_response(domain, dev, &resp);
> +}
> +
> +static int iommu_fault_handle_single(struct iommu_fault_context *fault)
> +{
> +	/* TODO */
> +	return -ENODEV;
> +}
> +
> +static void iommu_fault_handle_group(struct work_struct *work)
> +{
> +	struct iommu_fault_group *group;
> +	struct iommu_fault_context *fault, *next;
> +	int status = IOMMU_PAGE_RESP_SUCCESS;
> +
> +	group = container_of(work, struct iommu_fault_group, work);
> +
> +	list_for_each_entry_safe(fault, next, &group->faults, head) {
> +		struct iommu_fault_event *evt = &fault->evt;
> +		/*
> +		 * Errors are sticky: don't handle subsequent faults in the
> +		 * group if there is an error.
> +		 */
> +		if (status == IOMMU_PAGE_RESP_SUCCESS)
> +			status = iommu_fault_handle_single(fault);
> +
> +		if (!evt->last_req)
> +			kfree(fault);
> +	}
> +
> +	iommu_fault_complete(group->domain, group->last_fault.dev,
> +			     &group->last_fault.evt, status);
> +	kfree(group);
> +}
> +
> +static int iommu_queue_fault(struct iommu_domain *domain, struct device *dev,
> +			     struct iommu_fault_event *evt)
> +{
> +	struct iommu_fault_group *group;
> +	struct iommu_fault_context *fault, *next;
> +
> +	if (!iommu_fault_queue)
> +		return -ENOSYS;
> +
> +	if (!evt->last_req) {
> +		fault = kzalloc(sizeof(*fault), GFP_KERNEL);
> +		if (!fault)
> +			return -ENOMEM;
> +
> +		fault->evt = *evt;
> +		fault->dev = dev;
> +
> +		/* Non-last request of a group. Postpone until the last one */
> +		spin_lock(&iommu_partial_faults_lock);
> +		list_add_tail(&fault->head, &iommu_partial_faults);
> +		spin_unlock(&iommu_partial_faults_lock);
> +
> +		return IOMMU_PAGE_RESP_HANDLED;
> +	}
> +
> +	group = kzalloc(sizeof(*group), GFP_KERNEL);
> +	if (!group)
> +		return -ENOMEM;
> +
> +	group->last_fault.evt = *evt;
> +	group->last_fault.dev = dev;
> +	group->domain = domain;
> +	INIT_LIST_HEAD(&group->faults);
> +	list_add(&group->last_fault.head, &group->faults);
> +	INIT_WORK(&group->work, iommu_fault_handle_group);
> +
> +	/* See if we have pending faults for this group */
> +	spin_lock(&iommu_partial_faults_lock);
> +	list_for_each_entry_safe(fault, next, &iommu_partial_faults, head) {
> +		if (fault->evt.page_req_group_id == evt->page_req_group_id &&
> +		    fault->dev == dev) {
> +			list_del(&fault->head);
> +			/* Insert *before* the last fault */
> +			list_add(&fault->head, &group->faults);
> +		}
> +	}
> +	spin_unlock(&iommu_partial_faults_lock);
> +
> +	queue_work(iommu_fault_queue, &group->work);
> +
> +	/* Postpone the fault completion */
> +	return IOMMU_PAGE_RESP_HANDLED;
> +}
> +
> +/**
> + * iommu_report_device_fault() - Handle fault in device driver or mm
> + *
> + * If the device driver expressed interest in handling fault, report it through
> + * the callback. If the fault is recoverable, try to page in the address.
> + */
> +int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
> +{
> +	int ret = -ENOSYS;
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +
> +	if (!domain)
> +		return -ENODEV;
> +
> +	/*
> +	 * if upper layers showed interest and installed a fault handler,
> +	 * invoke it.
> +	 */
> +	if (iommu_has_device_fault_handler(dev)) {
> +		struct iommu_fault_param *param = dev->iommu_param->fault_param;
> +
> +		return param->handler(evt, param->data);
> +	}
> +
> +	/* If the handler is blocking, handle fault in the workqueue */
> +	if (evt->type == IOMMU_FAULT_PAGE_REQ)
> +		ret = iommu_queue_fault(domain, dev, evt);
> +
> +	return iommu_fault_complete(domain, dev, evt, ret);
> +}
> +EXPORT_SYMBOL_GPL(iommu_report_device_fault);
> +
> +/**
> + * iommu_fault_queue_register() - register an IOMMU driver to the fault queue
> + * @flush_notifier: a notifier block that is called before the fault queue is
> + * flushed. The IOMMU driver should commit all faults that are pending in its
> + * low-level queues at the time of the call, into the fault queue. The notifier
> + * takes a device pointer as argument, hinting what endpoint is causing the
> + * flush. When the device is NULL, all faults should be committed.
> + */
> +int iommu_fault_queue_register(struct notifier_block *flush_notifier)
> +{
> +	/*
> +	 * The WQ is unordered because the low-level handler enqueues faults by
> +	 * group. PRI requests within a group have to be ordered, but once
> +	 * that's dealt with, the high-level function can handle groups out of
> +	 * order.
> +	 */
> +	down_write(&iommu_fault_queue_sem);
> +	if (!iommu_fault_queue) {
> +		iommu_fault_queue = alloc_workqueue("iommu_fault_queue",
> +						    WQ_UNBOUND, 0);
> +		if (iommu_fault_queue)
> +			refcount_set(&iommu_fault_queue_refs, 1);
> +	} else {
> +		refcount_inc(&iommu_fault_queue_refs);
> +	}
> +	up_write(&iommu_fault_queue_sem);
> +
> +	if (!iommu_fault_queue)
> +		return -ENOMEM;
> +
> +	if (flush_notifier)
> +		blocking_notifier_chain_register(&iommu_fault_queue_flush_notifiers,
> +						 flush_notifier);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iommu_fault_queue_register);
> +
> +/**
> + * iommu_fault_queue_flush() - Ensure that all queued faults have been
> + * processed.
> + * @dev: the endpoint whose faults need to be flushed. If NULL, flush all
> + *       pending faults.
> + *
> + * Users must call this function when releasing a PASID, to ensure that all
> + * pending faults affecting this PASID have been handled, and won't affect the
> + * address space of a subsequent process that reuses this PASID.
> + */
> +void iommu_fault_queue_flush(struct device *dev)
> +{
> +	blocking_notifier_call_chain(&iommu_fault_queue_flush_notifiers, 0, dev);
> +
> +	down_read(&iommu_fault_queue_sem);
> +	/*
> +	 * Don't flush the partial faults list. All PRGs with the PASID are
> +	 * complete and have been submitted to the queue.
> +	 */
> +	if (iommu_fault_queue)
> +		flush_workqueue(iommu_fault_queue);
> +	up_read(&iommu_fault_queue_sem);
> +}
> +EXPORT_SYMBOL_GPL(iommu_fault_queue_flush);
> +
> +/**
> + * iommu_fault_queue_unregister() - Unregister an IOMMU driver from the fault
> + * queue.
> + * @flush_notifier: same parameter as iommu_fault_queue_register
> + */
> +void iommu_fault_queue_unregister(struct notifier_block *flush_notifier)
> +{
> +	down_write(&iommu_fault_queue_sem);
> +	if (refcount_dec_and_test(&iommu_fault_queue_refs)) {
> +		destroy_workqueue(iommu_fault_queue);
> +		iommu_fault_queue = NULL;
> +	}
> +	up_write(&iommu_fault_queue_sem);
> +
> +	if (flush_notifier)
> +		blocking_notifier_chain_unregister(&iommu_fault_queue_flush_notifiers,
> +						   flush_notifier);
I would expect the ordering in queue_unregister to be the reverse of
queue_register (to make it obvious there are no races).

That would put this last block at the start, before potentially destroying
the work queue.  If I'm missing something, then perhaps add a comment to
explain why the ordering is not the obvious one?

> +}
> +EXPORT_SYMBOL_GPL(iommu_fault_queue_unregister);
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 4bc2a8c12465..d7b231cd7355 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -102,9 +102,6 @@
>   * the device table and PASID 0 would be available to the allocator.
>   */
>  
> -/* TODO: stub for the fault queue. Remove later. */
> -#define iommu_fault_queue_flush(...)
> -
>  struct iommu_bond {
>  	struct io_mm		*io_mm;
>  	struct device		*dev;
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 1d60b32a6744..c475893ec7dc 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -798,6 +798,17 @@ int iommu_group_unregister_notifier(struct iommu_group *group,
>  }
>  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
>  
> +/**
> + * iommu_register_device_fault_handler() - Register a device fault handler
> + * @dev: the device
> + * @handler: the fault handler
> + * @data: private data passed as argument to the callback
> + *
> + * When an IOMMU fault event is received, call this handler with the fault event
> + * and data as argument.
> + *
> + * Return 0 if the fault handler was installed successfully, or an error.
> + */
>  int iommu_register_device_fault_handler(struct device *dev,
>  					iommu_dev_fault_handler_t handler,
>  					void *data)
> @@ -825,6 +836,13 @@ int iommu_register_device_fault_handler(struct device *dev,
>  }
>  EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
>  
> +/**
> + * iommu_unregister_device_fault_handler() - Unregister the device fault handler
> + * @dev: the device
> + *
> + * Remove the device fault handler installed with
> + * iommu_register_device_fault_handler().
> + */
>  int iommu_unregister_device_fault_handler(struct device *dev)
>  {
>  	struct iommu_param *idata = dev->iommu_param;
> @@ -840,19 +858,6 @@ int iommu_unregister_device_fault_handler(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
>  
> -
> -int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
> -{
> -	/* we only report device fault if there is a handler registered */
> -	if (!dev->iommu_param || !dev->iommu_param->fault_param ||
> -		!dev->iommu_param->fault_param->handler)
> -		return -ENOSYS;
> -
> -	return dev->iommu_param->fault_param->handler(evt,
> -						dev->iommu_param->fault_param->data);
> -}
> -EXPORT_SYMBOL_GPL(iommu_report_device_fault);
> -
>  /**
>   * iommu_group_id - Return ID for a group
>   * @group: the group to ID
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 226ab4f3ae0e..65e56f28e0ce 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -205,6 +205,7 @@ struct page_response_msg {
>  	u32 resp_code:4;
>  #define IOMMU_PAGE_RESP_SUCCESS	0
>  #define IOMMU_PAGE_RESP_INVALID	1
> +#define IOMMU_PAGE_RESP_HANDLED	2
>  #define IOMMU_PAGE_RESP_FAILURE	0xF
>  
>  	u32 pasid_present:1;
> @@ -534,7 +535,6 @@ extern int iommu_register_device_fault_handler(struct device *dev,
>  
>  extern int iommu_unregister_device_fault_handler(struct device *dev);
>  
> -extern int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt);
>  extern int iommu_page_response(struct iommu_domain *domain, struct device *dev,
>  			       struct page_response_msg *msg);
>  
> @@ -836,11 +836,6 @@ static inline bool iommu_has_device_fault_handler(struct device *dev)
>  	return false;
>  }
>  
> -static inline int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
> -{
> -	return 0;
> -}
> -
>  static inline int iommu_page_response(struct iommu_domain *domain, struct device *dev,
>  				      struct page_response_msg *msg)
>  {
> @@ -1005,4 +1000,31 @@ static inline struct mm_struct *iommu_sva_find(int pasid)
>  }
>  #endif /* CONFIG_IOMMU_SVA */
>  
> +#ifdef CONFIG_IOMMU_FAULT
> +extern int iommu_fault_queue_register(struct notifier_block *flush_notifier);
> +extern void iommu_fault_queue_flush(struct device *dev);
> +extern void iommu_fault_queue_unregister(struct notifier_block *flush_notifier);
> +extern int iommu_report_device_fault(struct device *dev,
> +				     struct iommu_fault_event *evt);
> +#else /* CONFIG_IOMMU_FAULT */
> +static inline int iommu_fault_queue_register(struct notifier_block *flush_notifier)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline void iommu_fault_queue_flush(struct device *dev)
> +{
> +}
> +
> +static inline void iommu_fault_queue_unregister(struct notifier_block *flush_notifier)
> +{
> +}
> +
> +static inline int iommu_report_device_fault(struct device *dev,
> +					    struct iommu_fault_event *evt)
> +{
> +	return 0;
> +}
> +#endif /* CONFIG_IOMMU_FAULT */
> +
>  #endif /* __LINUX_IOMMU_H */

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 07/37] iommu: Add a page fault handler
@ 2018-03-08 15:40       ` Jonathan Cameron
  0 siblings, 0 replies; 317+ messages in thread
From: Jonathan Cameron @ 2018-03-08 15:40 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm,
	joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	shunyong.yang, nwatters, okaya, jcrouse, rfranz, dwmw2,
	jacob.jun.pan, yi.l.liu, ashok.raj, robdclark, christian.koenig,
	bharatku

On Mon, 12 Feb 2018 18:33:22 +0000
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> Some systems allow devices to handle IOMMU translation faults in the core
> mm. For example systems supporting the PCI PRI extension or Arm SMMU stall
> model. Infrastructure for reporting such recoverable page faults was
> recently added to the IOMMU core, for SVA virtualization. Extend
> iommu_report_device_fault() to handle host page faults as well.
> 
> * IOMMU drivers instantiate a fault workqueue, using
>   iommu_fault_queue_init() and iommu_fault_queue_destroy().
> 
> * When it receives a fault event, typically in an IRQ handler, the IOMMU
>   driver reports the fault using iommu_report_device_fault().
> 
> * If the device driver registered a handler (e.g. VFIO), pass down the
>   fault event. Otherwise submit it to the fault queue, to be handled in a
>   thread.
> 
> * When the fault corresponds to an io_mm, call the mm fault handler on it
>   (in next patch).
> 
> * Once the fault is handled, the mm wrapper or the device driver reports
>   success or failure with iommu_page_response(). The translation is either
>   retried or aborted, depending on the response code.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
A few really minor points inline...  Basically looks good to me.
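
For readers coming to this interface fresh, the expected plumbing looks roughly
like the sketch below (the foo_* names are invented for illustration and are not
part of this series): the IOMMU driver registers once with the fault queue,
passing a notifier that drains its hardware event queue, and then reports each
event from its (threaded) event IRQ handler.

/* Sketch only -- the foo_* helpers and structures are made up */
static int foo_smmu_flush_evtq(struct notifier_block *nb, unsigned long action,
			       void *data)
{
	struct device *dev = data;	/* NULL means "flush everything" */

	/* Commit events pending in the hardware queue into the fault queue */
	foo_smmu_drain_evtq(dev);
	return NOTIFY_OK;
}

static struct notifier_block foo_smmu_flush_nb = {
	.notifier_call = foo_smmu_flush_evtq,
};

static int foo_smmu_probe_one(struct foo_smmu_device *smmu)
{
	/* ... hardware setup ... */
	return iommu_fault_queue_register(&foo_smmu_flush_nb);
}

/* Called from the threaded event IRQ handler for each page request */
static void foo_smmu_handle_ppr(struct device *dev, u64 addr, u32 pasid,
				u32 grpid, bool last)
{
	struct iommu_fault_event evt = {
		.type			= IOMMU_FAULT_PAGE_REQ,
		.addr			= addr,
		.pasid			= pasid,
		.pasid_valid		= true,
		.page_req_group_id	= grpid,
		.last_req		= last,
	};

	iommu_report_device_fault(dev, &evt);
}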

> ---
>  drivers/iommu/Kconfig      |  10 ++
>  drivers/iommu/Makefile     |   1 +
>  drivers/iommu/io-pgfault.c | 282 +++++++++++++++++++++++++++++++++++++++++++++
>  drivers/iommu/iommu-sva.c  |   3 -
>  drivers/iommu/iommu.c      |  31 ++---
>  include/linux/iommu.h      |  34 +++++-
>  6 files changed, 339 insertions(+), 22 deletions(-)
>  create mode 100644 drivers/iommu/io-pgfault.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 146eebe9a4bb..e751bb9958ba 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -85,6 +85,15 @@ config IOMMU_SVA
>  
>  	  If unsure, say N here.
>  
> +config IOMMU_FAULT
> +	bool "Fault handler for the IOMMU API"
> +	select IOMMU_API
> +	help
> +	  Enable the generic fault handler for the IOMMU API, which handles
> +	  recoverable page faults or injects them into guests.
> +
> +	  If unsure, say N here.
> +
>  config FSL_PAMU
>  	bool "Freescale IOMMU support"
>  	depends on PCI
> @@ -156,6 +165,7 @@ config INTEL_IOMMU
>  	select IOMMU_API
>  	select IOMMU_IOVA
>  	select DMAR_TABLE
> +	select IOMMU_FAULT
>  	help
>  	  DMA remapping (DMAR) devices support enables independent address
>  	  translations for Direct Memory Access (DMA) from devices.
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 1dbcc89ebe4c..f4324e29035e 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
>  obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
>  obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
> +obj-$(CONFIG_IOMMU_FAULT) += io-pgfault.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
> new file mode 100644
> index 000000000000..33309ed316d2
> --- /dev/null
> +++ b/drivers/iommu/io-pgfault.c
> @@ -0,0 +1,282 @@
> +/*
> + * Handle device page faults
> + *
> + * Copyright (C) 2018 ARM Ltd.
> + * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0
> + */
> +
> +#include <linux/iommu.h>
> +#include <linux/list.h>
> +#include <linux/slab.h>
> +#include <linux/workqueue.h>
> +
> +static struct workqueue_struct *iommu_fault_queue;
> +static DECLARE_RWSEM(iommu_fault_queue_sem);
> +static refcount_t iommu_fault_queue_refs = REFCOUNT_INIT(0);
> +static BLOCKING_NOTIFIER_HEAD(iommu_fault_queue_flush_notifiers);
> +
> +/* Used to store incomplete fault groups */
> +static LIST_HEAD(iommu_partial_faults);
> +static DEFINE_SPINLOCK(iommu_partial_faults_lock);
> +
> +struct iommu_fault_context {
> +	struct device			*dev;
> +	struct iommu_fault_event	evt;
> +	struct list_head		head;
> +};
> +
> +struct iommu_fault_group {
> +	struct iommu_domain		*domain;
> +	struct iommu_fault_context	last_fault;
> +	struct list_head		faults;
> +	struct work_struct		work;
> +};
> +
> +/*
> + * iommu_fault_complete() - Finish handling a fault
> + *
> + * Send a response if necessary and pass on the sanitized status code
> + */
> +static int iommu_fault_complete(struct iommu_domain *domain, struct device *dev,
> +				struct iommu_fault_event *evt, int status)
> +{
> +	struct page_response_msg resp = {
> +		.addr		= evt->addr,
> +		.pasid		= evt->pasid,
> +		.pasid_present	= evt->pasid_valid,
> +		.page_req_group_id = evt->page_req_group_id,
Really trivial, but if you want to align the equals signs, they all need indenting
by one more tab.

> +		.type		= IOMMU_PAGE_GROUP_RESP,
> +		.private_data	= evt->iommu_private,
> +	};
> +
> +	/*
> +	 * There is no "handling" an unrecoverable fault, so the only valid
> +	 * return values are 0 or an error.
> +	 */
> +	if (evt->type == IOMMU_FAULT_DMA_UNRECOV)
> +		return status > 0 ? 0 : status;
> +
> +	/* Someone took ownership of the fault and will complete it later */
> +	if (status == IOMMU_PAGE_RESP_HANDLED)
> +		return 0;
> +
> +	/*
> +	 * There was an internal error with handling the recoverable fault. Try
> +	 * to complete the fault if possible.
> +	 */
> +	if (status < 0)
> +		status = IOMMU_PAGE_RESP_INVALID;
> +
> +	if (WARN_ON(!domain->ops->page_response))
> +		/*
> +		 * The IOMMU driver shouldn't have submitted recoverable faults
> +		 * if it cannot receive a response.
> +		 */
> +		return -EINVAL;
> +
> +	resp.resp_code = status;
> +	return domain->ops->page_response(domain, dev, &resp);
> +}
> +
> +static int iommu_fault_handle_single(struct iommu_fault_context *fault)
> +{
> +	/* TODO */
> +	return -ENODEV;
> +}
> +
> +static void iommu_fault_handle_group(struct work_struct *work)
> +{
> +	struct iommu_fault_group *group;
> +	struct iommu_fault_context *fault, *next;
> +	int status = IOMMU_PAGE_RESP_SUCCESS;
> +
> +	group = container_of(work, struct iommu_fault_group, work);
> +
> +	list_for_each_entry_safe(fault, next, &group->faults, head) {
> +		struct iommu_fault_event *evt = &fault->evt;
> +		/*
> +		 * Errors are sticky: don't handle subsequent faults in the
> +		 * group if there is an error.
> +		 */
> +		if (status == IOMMU_PAGE_RESP_SUCCESS)
> +			status = iommu_fault_handle_single(fault);
> +
> +		if (!evt->last_req)
> +			kfree(fault);
> +	}
> +
> +	iommu_fault_complete(group->domain, group->last_fault.dev,
> +			     &group->last_fault.evt, status);
> +	kfree(group);
> +}
> +
> +static int iommu_queue_fault(struct iommu_domain *domain, struct device *dev,
> +			     struct iommu_fault_event *evt)
> +{
> +	struct iommu_fault_group *group;
> +	struct iommu_fault_context *fault, *next;
> +
> +	if (!iommu_fault_queue)
> +		return -ENOSYS;
> +
> +	if (!evt->last_req) {
> +		fault = kzalloc(sizeof(*fault), GFP_KERNEL);
> +		if (!fault)
> +			return -ENOMEM;
> +
> +		fault->evt = *evt;
> +		fault->dev = dev;
> +
> +		/* Non-last request of a group. Postpone until the last one */
> +		spin_lock(&iommu_partial_faults_lock);
> +		list_add_tail(&fault->head, &iommu_partial_faults);
> +		spin_unlock(&iommu_partial_faults_lock);
> +
> +		return IOMMU_PAGE_RESP_HANDLED;
> +	}
> +
> +	group = kzalloc(sizeof(*group), GFP_KERNEL);
> +	if (!group)
> +		return -ENOMEM;
> +
> +	group->last_fault.evt = *evt;
> +	group->last_fault.dev = dev;
> +	group->domain = domain;
> +	INIT_LIST_HEAD(&group->faults);
> +	list_add(&group->last_fault.head, &group->faults);
> +	INIT_WORK(&group->work, iommu_fault_handle_group);
> +
> +	/* See if we have pending faults for this group */
> +	spin_lock(&iommu_partial_faults_lock);
> +	list_for_each_entry_safe(fault, next, &iommu_partial_faults, head) {
> +		if (fault->evt.page_req_group_id == evt->page_req_group_id &&
> +		    fault->dev == dev) {
> +			list_del(&fault->head);
> +			/* Insert *before* the last fault */
> +			list_add(&fault->head, &group->faults);
> +		}
> +	}
> +	spin_unlock(&iommu_partial_faults_lock);
> +
> +	queue_work(iommu_fault_queue, &group->work);
> +
> +	/* Postpone the fault completion */
> +	return IOMMU_PAGE_RESP_HANDLED;
> +}
> +
> +/**
> + * iommu_report_device_fault() - Handle fault in device driver or mm
> + *
> + * If the device driver expressed interest in handling faults, report the fault
> + * through its callback. If the fault is recoverable, try to page in the address.
> + */
> +int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
> +{
> +	int ret = -ENOSYS;
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +
> +	if (!domain)
> +		return -ENODEV;
> +
> +	/*
> +	 * if upper layers showed interest and installed a fault handler,
> +	 * invoke it.
> +	 */
> +	if (iommu_has_device_fault_handler(dev)) {
> +		struct iommu_fault_param *param = dev->iommu_param->fault_param;
> +
> +		return param->handler(evt, param->data);
> +	}
> +
> +	/* If the handler is blocking, handle fault in the workqueue */
> +	if (evt->type == IOMMU_FAULT_PAGE_REQ)
> +		ret = iommu_queue_fault(domain, dev, evt);
> +
> +	return iommu_fault_complete(domain, dev, evt, ret);
> +}
> +EXPORT_SYMBOL_GPL(iommu_report_device_fault);
> +
> +/**
> + * iommu_fault_queue_register() - register an IOMMU driver to the fault queue
> + * @flush_notifier: a notifier block that is called before the fault queue is
> + * flushed. The IOMMU driver should commit all faults that are pending in its
> + * low-level queues at the time of the call, into the fault queue. The notifier
> + * takes a device pointer as argument, hinting what endpoint is causing the
> + * flush. When the device is NULL, all faults should be committed.
> + */
> +int iommu_fault_queue_register(struct notifier_block *flush_notifier)
> +{
> +	/*
> +	 * The WQ is unordered because the low-level handler enqueues faults by
> +	 * group. PRI requests within a group have to be ordered, but once
> +	 * that's dealt with, the high-level function can handle groups out of
> +	 * order.
> +	 */
> +	down_write(&iommu_fault_queue_sem);
> +	if (!iommu_fault_queue) {
> +		iommu_fault_queue = alloc_workqueue("iommu_fault_queue",
> +						    WQ_UNBOUND, 0);
> +		if (iommu_fault_queue)
> +			refcount_set(&iommu_fault_queue_refs, 1);
> +	} else {
> +		refcount_inc(&iommu_fault_queue_refs);
> +	}
> +	up_write(&iommu_fault_queue_sem);
> +
> +	if (!iommu_fault_queue)
> +		return -ENOMEM;
> +
> +	if (flush_notifier)
> +		blocking_notifier_chain_register(&iommu_fault_queue_flush_notifiers,
> +						 flush_notifier);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iommu_fault_queue_register);
> +
> +/**
> + * iommu_fault_queue_flush() - Ensure that all queued faults have been
> + * processed.
> + * @dev: the endpoint whose faults need to be flushed. If NULL, flush all
> + *       pending faults.
> + *
> + * Users must call this function when releasing a PASID, to ensure that all
> + * pending faults affecting this PASID have been handled, and won't affect the
> + * address space of a subsequent process that reuses this PASID.
> + */
> +void iommu_fault_queue_flush(struct device *dev)
> +{
> +	blocking_notifier_call_chain(&iommu_fault_queue_flush_notifiers, 0, dev);
> +
> +	down_read(&iommu_fault_queue_sem);
> +	/*
> +	 * Don't flush the partial faults list. All PRGs with the PASID are
> +	 * complete and have been submitted to the queue.
> +	 */
> +	if (iommu_fault_queue)
> +		flush_workqueue(iommu_fault_queue);
> +	up_read(&iommu_fault_queue_sem);
> +}
> +EXPORT_SYMBOL_GPL(iommu_fault_queue_flush);
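
As a usage note, the expected caller here is the PASID release path, roughly
(sketch; iommu_sva_free_pasid() is a hypothetical helper, not from this series):

	/* After clearing the PASID table entry and invalidating TLBs/ATC */
	iommu_fault_queue_flush(dev);
	/* Only now is it safe to recycle the PASID */
	iommu_sva_free_pasid(pasid);
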
> +
> +/**
> + * iommu_fault_queue_unregister() - Unregister an IOMMU driver from the fault
> + * queue.
> + * @flush_notifier: same parameter as iommu_fault_queue_register
> + */
> +void iommu_fault_queue_unregister(struct notifier_block *flush_notifier)
> +{
> +	down_write(&iommu_fault_queue_sem);
> +	if (refcount_dec_and_test(&iommu_fault_queue_refs)) {
> +		destroy_workqueue(iommu_fault_queue);
> +		iommu_fault_queue = NULL;
> +	}
> +	up_write(&iommu_fault_queue_sem);
> +
> +	if (flush_notifier)
> +		blocking_notifier_chain_unregister(&iommu_fault_queue_flush_notifiers,
> +						   flush_notifier);
I would expect the ordering in queue_unregister to be the reverse of
queue_register (to make it obvious there are no races).

That would put this last block at the start, before potentially destroying
the work queue.  If I'm missing something, then perhaps add a comment to
explain why the ordering is not the obvious one?
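
Something like this (untested sketch), mirroring the register path:

void iommu_fault_queue_unregister(struct notifier_block *flush_notifier)
{
	if (flush_notifier)
		blocking_notifier_chain_unregister(&iommu_fault_queue_flush_notifiers,
						   flush_notifier);

	down_write(&iommu_fault_queue_sem);
	if (refcount_dec_and_test(&iommu_fault_queue_refs)) {
		destroy_workqueue(iommu_fault_queue);
		iommu_fault_queue = NULL;
	}
	up_write(&iommu_fault_queue_sem);
}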

> +}
> +EXPORT_SYMBOL_GPL(iommu_fault_queue_unregister);
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index 4bc2a8c12465..d7b231cd7355 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -102,9 +102,6 @@
>   * the device table and PASID 0 would be available to the allocator.
>   */
>  
> -/* TODO: stub for the fault queue. Remove later. */
> -#define iommu_fault_queue_flush(...)
> -
>  struct iommu_bond {
>  	struct io_mm		*io_mm;
>  	struct device		*dev;
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 1d60b32a6744..c475893ec7dc 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -798,6 +798,17 @@ int iommu_group_unregister_notifier(struct iommu_group *group,
>  }
>  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
>  
> +/**
> + * iommu_register_device_fault_handler() - Register a device fault handler
> + * @dev: the device
> + * @handler: the fault handler
> + * @data: private data passed as argument to the callback
> + *
> + * When an IOMMU fault event is received, call this handler with the fault event
> + * and data as argument.
> + *
> + * Return 0 if the fault handler was installed successfully, or an error.
> + */
>  int iommu_register_device_fault_handler(struct device *dev,
>  					iommu_dev_fault_handler_t handler,
>  					void *data)
> @@ -825,6 +836,13 @@ int iommu_register_device_fault_handler(struct device *dev,
>  }
>  EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
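
For context, a consumer such as VFIO is expected to take ownership of faults
roughly as follows (sketch; the foo_vfio_* names are invented):

/* Called directly from iommu_report_device_fault() */
static int foo_vfio_iommu_fault(struct iommu_fault_event *evt, void *data)
{
	struct foo_vfio_domain *vdomain = data;

	/*
	 * Queue the fault for userspace/the guest; it is completed later with
	 * iommu_page_response() once a reply comes back.
	 */
	foo_vfio_queue_fault(vdomain, evt);
	return 0;
}

	...
	ret = iommu_register_device_fault_handler(dev, foo_vfio_iommu_fault,
						   vdomain);
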
>  
> +/**
> + * iommu_unregister_device_fault_handler() - Unregister the device fault handler
> + * @dev: the device
> + *
> + * Remove the device fault handler installed with
> + * iommu_register_device_fault_handler().
> + */
>  int iommu_unregister_device_fault_handler(struct device *dev)
>  {
>  	struct iommu_param *idata = dev->iommu_param;
> @@ -840,19 +858,6 @@ int iommu_unregister_device_fault_handler(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
>  
> -
> -int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
> -{
> -	/* we only report device fault if there is a handler registered */
> -	if (!dev->iommu_param || !dev->iommu_param->fault_param ||
> -		!dev->iommu_param->fault_param->handler)
> -		return -ENOSYS;
> -
> -	return dev->iommu_param->fault_param->handler(evt,
> -						dev->iommu_param->fault_param->data);
> -}
> -EXPORT_SYMBOL_GPL(iommu_report_device_fault);
> -
>  /**
>   * iommu_group_id - Return ID for a group
>   * @group: the group to ID
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 226ab4f3ae0e..65e56f28e0ce 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -205,6 +205,7 @@ struct page_response_msg {
>  	u32 resp_code:4;
>  #define IOMMU_PAGE_RESP_SUCCESS	0
>  #define IOMMU_PAGE_RESP_INVALID	1
> +#define IOMMU_PAGE_RESP_HANDLED	2
>  #define IOMMU_PAGE_RESP_FAILURE	0xF
>  
>  	u32 pasid_present:1;
> @@ -534,7 +535,6 @@ extern int iommu_register_device_fault_handler(struct device *dev,
>  
>  extern int iommu_unregister_device_fault_handler(struct device *dev);
>  
> -extern int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt);
>  extern int iommu_page_response(struct iommu_domain *domain, struct device *dev,
>  			       struct page_response_msg *msg);
>  
> @@ -836,11 +836,6 @@ static inline bool iommu_has_device_fault_handler(struct device *dev)
>  	return false;
>  }
>  
> -static inline int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
> -{
> -	return 0;
> -}
> -
>  static inline int iommu_page_response(struct iommu_domain *domain, struct device *dev,
>  				      struct page_response_msg *msg)
>  {
> @@ -1005,4 +1000,31 @@ static inline struct mm_struct *iommu_sva_find(int pasid)
>  }
>  #endif /* CONFIG_IOMMU_SVA */
>  
> +#ifdef CONFIG_IOMMU_FAULT
> +extern int iommu_fault_queue_register(struct notifier_block *flush_notifier);
> +extern void iommu_fault_queue_flush(struct device *dev);
> +extern void iommu_fault_queue_unregister(struct notifier_block *flush_notifier);
> +extern int iommu_report_device_fault(struct device *dev,
> +				     struct iommu_fault_event *evt);
> +#else /* CONFIG_IOMMU_FAULT */
> +static inline int iommu_fault_queue_register(struct notifier_block *flush_notifier)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline void iommu_fault_queue_flush(struct device *dev)
> +{
> +}
> +
> +static inline void iommu_fault_queue_unregister(struct notifier_block *flush_notifier)
> +{
> +}
> +
> +static inline int iommu_report_device_fault(struct device *dev,
> +					    struct iommu_fault_event *evt)
> +{
> +	return 0;
> +}
> +#endif /* CONFIG_IOMMU_FAULT */
> +
>  #endif /* __LINUX_IOMMU_H */


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 31/37] iommu/arm-smmu-v3: Add support for PCI ATS
  2018-02-12 18:33     ` Jean-Philippe Brucker
@ 2018-03-08 16:17         ` Jonathan Cameron
  -1 siblings, 0 replies; 317+ messages in thread
From: Jonathan Cameron @ 2018-03-08 16:17 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: mark.rutland, ilias.apalodimas, kvm, linux-pci, xuzaibo,
	will.deacon, okaya, ashok.raj, bharatku, linux-acpi,
	catalin.marinas, rfranz, lenb, devicetree, robh+dt, bhelgaas,
	linux-arm-kernel, dwmw2, rjw, iommu, sudeep.holla,
	christian.koenig

On Mon, 12 Feb 2018 18:33:46 +0000
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> PCIe devices can implement their own TLB, named Address Translation Cache
> (ATC). Enable Address Translation Service (ATS) for devices that support
> it and send them invalidation requests whenever we invalidate the IOTLBs.
> 
>   Range calculation
>   -----------------
> 
> The invalidation packet itself is a bit awkward: the range must be naturally
> aligned, which means that the start address is a multiple of the range
> size. In addition, the size must be a power-of-two number of 4k pages. We
> have a few options to enforce this constraint:
> 
> (1) Find the smallest naturally aligned region that covers the requested
>     range. This is simple to compute and only takes one ATC_INV, but it
>     will spill on lots of neighbouring ATC entries.
> 
> (2) Align the start address to the region size (rounded up to a power of
>     two), and send a second invalidation for the next range of the same
>     size. Still not great, but reduces spilling.
> 
> (3) Cover the range exactly with the smallest number of naturally aligned
>     regions. This would be interesting to implement but as for (2),
>     requires multiple ATC_INV.
> 
> As I suspect ATC invalidation packets will be a very scarce resource, I'll
> go with option (1) for now, and only send one big invalidation. We can
> move to (2), which is both easier to read and more gentle with the ATC,
> once we've observed on real systems that we can send multiple smaller
> Invalidation Requests for roughly the same price as a single big one.
> 
> Note that with io-pgtable, the unmap function is called for each page, so
> this doesn't matter. The problem shows up when sharing page tables with
> the MMU.
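
As a worked example of option (1), with made-up numbers: invalidating pages
[0x11; 0x15] (iova 0x11000, size 0x5000) gives

	page_start = 0x11;			/* iova >> 12 */
	page_end   = 0x15;			/* (iova + size - 1) >> 12 */
	log2_span  = fls_long(0x11 ^ 0x15);	/* fls(0b00100) = 3 */
	span_mask  = (1UL << 3) - 1;		/* 0x7 */
	page_start &= ~span_mask;		/* 0x10 */

so a single ATC_INV with a span of 8 pages covers [0x10; 0x17], spilling onto
three neighbouring entries (0x10, 0x16 and 0x17).
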
> 
>   Timeout
>   -------
> 
> ATC invalidation is allowed to take up to 90 seconds, according to the
> PCIe spec, so it is possible to hit the SMMU command queue timeout during
> normal operations.
> 
> Some SMMU implementations will raise a CERROR_ATC_INV_SYNC when a CMD_SYNC
> fails because of an ATC invalidation. Some will just abort the CMD_SYNC.
> Others might let CMD_SYNC complete and have an asynchronous IMPDEF
> mechanism to record the error. When we receive a CERROR_ATC_INV_SYNC, we
> could retry sending all ATC_INV since last successful CMD_SYNC. When a
> CMD_SYNC fails without CERROR_ATC_INV_SYNC, we could retry sending *all*
> commands since last successful CMD_SYNC.
> 
> We cannot afford to wait 90 seconds in iommu_unmap, let alone MMU
> notifiers. So we'd have to introduce a more clever system if this timeout
> becomes a problem, like keeping hold of mappings and invalidating in the
> background. Implementing safe delayed invalidations is a very complex
> problem and deserves a series of its own. We'll assess whether more work
> is needed to properly handle ATC invalidation timeouts once this code runs
> on real hardware.
> 
>   Misc
>   ----
> 
> I didn't put ATC and TLB invalidations in the same functions for three
> reasons:
> 
> * TLB invalidation by range is batched and committed with a single sync.
>   Batching ATC invalidation is inconvenient, endpoints limit the number of
>   inflight invalidations. We'd have to count the number of invalidations
>   queued and send a sync periodically. In addition, I suspect we always
>   need a sync between TLB and ATC invalidation for the same page.
> 
> * Doing ATC invalidation outside tlb_inv_range also allows us to send fewer
>   requests, since TLB invalidations are done per page or block, while ATC
>   invalidations target IOVA ranges.
> 
> * TLB invalidation by context is performed when freeing the domain, at
>   which point there isn't any device attached anymore.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Few minor error path related comments inline..

> ---
>  drivers/iommu/arm-smmu-v3.c | 236 ++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 226 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 8b9f5dd06be0..76513135310f 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -37,6 +37,7 @@
>  #include <linux/of_iommu.h>
>  #include <linux/of_platform.h>
>  #include <linux/pci.h>
> +#include <linux/pci-ats.h>
>  #include <linux/platform_device.h>
>  #include <linux/sched/mm.h>
>  
> @@ -109,6 +110,7 @@
>  #define IDR5_OAS_48_BIT			(5 << IDR5_OAS_SHIFT)
>  
>  #define ARM_SMMU_CR0			0x20
> +#define CR0_ATSCHK			(1 << 4)
>  #define CR0_CMDQEN			(1 << 3)
>  #define CR0_EVTQEN			(1 << 2)
>  #define CR0_PRIQEN			(1 << 1)
> @@ -304,6 +306,7 @@
>  #define CMDQ_ERR_CERROR_NONE_IDX	0
>  #define CMDQ_ERR_CERROR_ILL_IDX		1
>  #define CMDQ_ERR_CERROR_ABT_IDX		2
> +#define CMDQ_ERR_CERROR_ATC_INV_IDX	3
>  
>  #define CMDQ_0_OP_SHIFT			0
>  #define CMDQ_0_OP_MASK			0xffUL
> @@ -327,6 +330,15 @@
>  #define CMDQ_TLBI_1_VA_MASK		~0xfffUL
>  #define CMDQ_TLBI_1_IPA_MASK		0xfffffffff000UL
>  
> +#define CMDQ_ATC_0_SSID_SHIFT		12
> +#define CMDQ_ATC_0_SSID_MASK		0xfffffUL
> +#define CMDQ_ATC_0_SID_SHIFT		32
> +#define CMDQ_ATC_0_SID_MASK		0xffffffffUL
> +#define CMDQ_ATC_0_GLOBAL		(1UL << 9)
> +#define CMDQ_ATC_1_SIZE_SHIFT		0
> +#define CMDQ_ATC_1_SIZE_MASK		0x3fUL
> +#define CMDQ_ATC_1_ADDR_MASK		~0xfffUL
> +
>  #define CMDQ_PRI_0_SSID_SHIFT		12
>  #define CMDQ_PRI_0_SSID_MASK		0xfffffUL
>  #define CMDQ_PRI_0_SID_SHIFT		32
> @@ -425,6 +437,11 @@ module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
>  MODULE_PARM_DESC(disable_bypass,
>  	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
>  
> +static bool disable_ats_check;
> +module_param_named(disable_ats_check, disable_ats_check, bool, S_IRUGO);
> +MODULE_PARM_DESC(disable_ats_check,
> +	"By default, the SMMU checks whether each incoming transaction marked as translated is allowed by the stream configuration. This option disables the check.");
> +
>  enum pri_resp {
>  	PRI_RESP_DENY,
>  	PRI_RESP_FAIL,
> @@ -498,6 +515,16 @@ struct arm_smmu_cmdq_ent {
>  			u64			addr;
>  		} tlbi;
>  
> +		#define CMDQ_OP_ATC_INV		0x40
> +		#define ATC_INV_SIZE_ALL	52
> +		struct {
> +			u32			sid;
> +			u32			ssid;
> +			u64			addr;
> +			u8			size;
> +			bool			global;
> +		} atc;
> +
>  		#define CMDQ_OP_PRI_RESP	0x41
>  		struct {
>  			u32			sid;
> @@ -928,6 +955,14 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>  	case CMDQ_OP_TLBI_EL2_ASID:
>  		cmd[0] |= (u64)ent->tlbi.asid << CMDQ_TLBI_0_ASID_SHIFT;
>  		break;
> +	case CMDQ_OP_ATC_INV:
> +		cmd[0] |= ent->substream_valid ? CMDQ_0_SSV : 0;
> +		cmd[0] |= ent->atc.global ? CMDQ_ATC_0_GLOBAL : 0;
> +		cmd[0] |= ent->atc.ssid << CMDQ_ATC_0_SSID_SHIFT;
> +		cmd[0] |= (u64)ent->atc.sid << CMDQ_ATC_0_SID_SHIFT;
> +		cmd[1] |= ent->atc.size << CMDQ_ATC_1_SIZE_SHIFT;
> +		cmd[1] |= ent->atc.addr & CMDQ_ATC_1_ADDR_MASK;
> +		break;
>  	case CMDQ_OP_PRI_RESP:
>  		cmd[0] |= ent->substream_valid ? CMDQ_0_SSV : 0;
>  		cmd[0] |= ent->pri.ssid << CMDQ_PRI_0_SSID_SHIFT;
> @@ -984,6 +1019,7 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
>  		[CMDQ_ERR_CERROR_NONE_IDX]	= "No error",
>  		[CMDQ_ERR_CERROR_ILL_IDX]	= "Illegal command",
>  		[CMDQ_ERR_CERROR_ABT_IDX]	= "Abort on command fetch",
> +		[CMDQ_ERR_CERROR_ATC_INV_IDX]	= "ATC invalidate timeout",
>  	};
>  
>  	int i;
> @@ -1003,6 +1039,14 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
>  		dev_err(smmu->dev, "retrying command fetch\n");
>  	case CMDQ_ERR_CERROR_NONE_IDX:
>  		return;
> +	case CMDQ_ERR_CERROR_ATC_INV_IDX:
> +		/*
> +		 * ATC Invalidation Completion timeout. CONS is still pointing
> +		 * at the CMD_SYNC. Attempt to complete other pending commands
> +		 * by repeating the CMD_SYNC, though we might well end up back
> +		 * here since the ATC invalidation may still be pending.
> +		 */
> +		return;
>  	case CMDQ_ERR_CERROR_ILL_IDX:
>  		/* Fallthrough */
>  	default:
> @@ -1261,9 +1305,6 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
>  			 STRTAB_STE_1_S1C_CACHE_WBRA
>  			 << STRTAB_STE_1_S1COR_SHIFT |
>  			 STRTAB_STE_1_S1C_SH_ISH << STRTAB_STE_1_S1CSH_SHIFT |
> -#ifdef CONFIG_PCI_ATS
> -			 STRTAB_STE_1_EATS_TRANS << STRTAB_STE_1_EATS_SHIFT |
> -#endif
>  			 (smmu->features & ARM_SMMU_FEAT_E2H ?
>  			  STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1) <<
>  			 STRTAB_STE_1_STRW_SHIFT);
> @@ -1300,6 +1341,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
>  		val |= STRTAB_STE_0_CFG_S2_TRANS;
>  	}
>  
> +	if (IS_ENABLED(CONFIG_PCI_ATS))
> +		dst[1] |= cpu_to_le64(STRTAB_STE_1_EATS_TRANS
> +				      << STRTAB_STE_1_EATS_SHIFT);
> +
>  	arm_smmu_sync_ste_for_sid(smmu, sid);
>  	dst[0] = cpu_to_le64(val);
>  	arm_smmu_sync_ste_for_sid(smmu, sid);
> @@ -1680,6 +1725,104 @@ static irqreturn_t arm_smmu_combined_irq_handler(int irq, void *dev)
>  	return IRQ_WAKE_THREAD;
>  }
>  
> +/* ATS invalidation */
> +static bool arm_smmu_master_has_ats(struct arm_smmu_master_data *master)
> +{
> +	return dev_is_pci(master->dev) && to_pci_dev(master->dev)->ats_enabled;
> +}
> +
> +static void
> +arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size,
> +			struct arm_smmu_cmdq_ent *cmd)
> +{
> +	size_t log2_span;
> +	size_t span_mask;
> +	/* ATC invalidates are always on 4096 bytes pages */
> +	size_t inval_grain_shift = 12;
> +	unsigned long page_start, page_end;
> +
> +	*cmd = (struct arm_smmu_cmdq_ent) {
> +		.opcode			= CMDQ_OP_ATC_INV,
> +		.substream_valid	= !!ssid,
> +		.atc.ssid		= ssid,
> +	};
> +
> +	if (!size) {
> +		cmd->atc.size = ATC_INV_SIZE_ALL;
> +		return;
> +	}
> +
> +	page_start	= iova >> inval_grain_shift;
> +	page_end	= (iova + size - 1) >> inval_grain_shift;
> +
> +	/*
> +	 * Find the smallest power of two that covers the range. Most
> +	 * significant differing bit between start and end address indicates the
> +	 * required span, ie. fls(start ^ end). For example:
> +	 *
> +	 * We want to invalidate pages [8; 11]. This is already the ideal range:
> +	 *		x = 0b1000 ^ 0b1011 = 0b11
> +	 *		span = 1 << fls(x) = 4
> +	 *
> +	 * To invalidate pages [7; 10], we need to invalidate [0; 15]:
> +	 *		x = 0b0111 ^ 0b1010 = 0b1101
> +	 *		span = 1 << fls(x) = 16
> +	 */
> +	log2_span	= fls_long(page_start ^ page_end);
> +	span_mask	= (1ULL << log2_span) - 1;
> +
> +	page_start	&= ~span_mask;
> +
> +	cmd->atc.addr	= page_start << inval_grain_shift;
> +	cmd->atc.size	= log2_span;
> +}
> +
> +static int arm_smmu_atc_inv_master(struct arm_smmu_master_data *master,
> +				   struct arm_smmu_cmdq_ent *cmd)
> +{
> +	int i;
> +	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
> +
> +	if (!arm_smmu_master_has_ats(master))
> +		return 0;
> +
> +	for (i = 0; i < fwspec->num_ids; i++) {
> +		cmd->atc.sid = fwspec->ids[i];
> +		arm_smmu_cmdq_issue_cmd(master->smmu, cmd);
> +	}
> +
> +	arm_smmu_cmdq_issue_sync(master->smmu);
> +
> +	return 0;
> +}
> +
> +static int arm_smmu_atc_inv_master_all(struct arm_smmu_master_data *master,
> +				       int ssid)
> +{
> +	struct arm_smmu_cmdq_ent cmd;
> +
> +	arm_smmu_atc_inv_to_cmd(ssid, 0, 0, &cmd);
> +	return arm_smmu_atc_inv_master(master, &cmd);
> +}
> +
> +static size_t
> +arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
> +			unsigned long iova, size_t size)
> +{
> +	unsigned long flags;
> +	struct arm_smmu_cmdq_ent cmd;
> +	struct arm_smmu_master_data *master;
> +
> +	arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd);
> +
> +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> +	list_for_each_entry(master, &smmu_domain->devices, list)
> +		arm_smmu_atc_inv_master(master, &cmd);
> +	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> +
> +	return size;
> +}
> +
>  /* IO_PGTABLE API */
>  static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
>  {
> @@ -2092,6 +2235,8 @@ static void arm_smmu_detach_dev(struct device *dev)
>  	if (smmu_domain) {
>  		__iommu_sva_unbind_dev_all(dev);
>  
> +		arm_smmu_atc_inv_master_all(master, 0);
> +
>  		spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>  		list_del(&master->list);
>  		spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> @@ -2179,12 +2324,19 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>  static size_t
>  arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>  {
> -	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> +	int ret;
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>  
>  	if (!ops)
>  		return 0;
>  
> -	return ops->unmap(ops, iova, size);
> +	ret = ops->unmap(ops, iova, size);
> +
> +	if (ret && smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS)
> +		ret = arm_smmu_atc_inv_domain(smmu_domain, 0, iova, size);
> +
> +	return ret;
>  }
>  
>  static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
> @@ -2342,6 +2494,48 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
>  	return sid < limit;
>  }
>  
> +static int arm_smmu_enable_ats(struct arm_smmu_master_data *master)
> +{
> +	int ret;
> +	size_t stu;
> +	struct pci_dev *pdev;
> +	struct arm_smmu_device *smmu = master->smmu;
> +	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
> +
> +	if (!(smmu->features & ARM_SMMU_FEAT_ATS) || !dev_is_pci(master->dev) ||
> +	    (fwspec->flags & IOMMU_FWSPEC_PCI_NO_ATS))
> +		return -ENOSYS;
> +
> +	pdev = to_pci_dev(master->dev);
> +
> +	/* Smallest Translation Unit: log2 of the smallest supported granule */
> +	stu = __ffs(smmu->pgsize_bitmap);
> +
> +	ret = pci_enable_ats(pdev, stu);
> +	if (ret)
> +		return ret;
> +
> +	dev_dbg(&pdev->dev, "enabled ATS (STU=%zu, QDEP=%d)\n", stu,
> +		pci_ats_queue_depth(pdev));
> +
> +	return 0;
> +}
> +
> +static void arm_smmu_disable_ats(struct arm_smmu_master_data *master)
> +{
> +	struct pci_dev *pdev;
> +
> +	if (!dev_is_pci(master->dev))
> +		return;
> +
> +	pdev = to_pci_dev(master->dev);
> +
> +	if (!pdev->ats_enabled)
> +		return;
> +
> +	pci_disable_ats(pdev);
> +}
> +
>  static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>  				  struct arm_smmu_master_data *master)
>  {
> @@ -2462,14 +2656,24 @@ static int arm_smmu_add_device(struct device *dev)
>  		master->ste.can_stall = true;
>  	}
>  
> +	arm_smmu_enable_ats(master);
It's a bit nasty not to handle the errors that this could return (other than
the -ENOSYS for when ATS simply isn't available). It would be nice to at
least add a note to the log when people expect it to work and it won't
because some condition or other isn't met.
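
For illustration only (not a patch), one way to act on this inside
arm_smmu_add_device(), assuming the return convention of the
arm_smmu_enable_ats() added here (-ENOSYS when ATS is simply not
applicable, another errno on a genuine failure):

	ret = arm_smmu_enable_ats(master);
	if (ret && ret != -ENOSYS)
		dev_warn(dev, "could not enable ATS: %d\n", ret);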

> +
>  	group = iommu_group_get_for_dev(dev);
> -	if (!IS_ERR(group)) {
> -		arm_smmu_insert_master(smmu, master);
> -		iommu_group_put(group);
> -		iommu_device_link(&smmu->iommu, dev);
> +	if (IS_ERR(group)) {
> +		ret = PTR_ERR(group);
> +		goto err_disable_ats;
>  	}
>  
> -	return PTR_ERR_OR_ZERO(group);
> +	iommu_group_put(group);
> +	arm_smmu_insert_master(smmu, master);
> +	iommu_device_link(&smmu->iommu, dev);
> +
> +	return 0;
> +
> +err_disable_ats:
> +	arm_smmu_disable_ats(master);
master is leaked here, I think...
Possibly other things too, as this error path doesn't mirror
arm_smmu_remove_device() the way I'd mostly have expected it to.

There are some slightly fishy bits of ordering in the original code
anyway that I'm not seeing a justification for (why is
the iommu_device_unlink() call later than one might expect, for
example?).

> +
> +	return ret;
>  }
>  
>  static void arm_smmu_remove_device(struct device *dev)
> @@ -2486,6 +2690,8 @@ static void arm_smmu_remove_device(struct device *dev)
>  	if (master && master->ste.assigned)
>  		arm_smmu_detach_dev(dev);
>  	arm_smmu_remove_master(smmu, master);
> +	arm_smmu_disable_ats(master);
> +
>  	iommu_group_remove_device(dev);
>  	iommu_device_unlink(&smmu->iommu, dev);
>  	kfree(master);
> @@ -3094,6 +3300,16 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>  		}
>  	}
>  
> +	if (smmu->features & ARM_SMMU_FEAT_ATS && !disable_ats_check) {
> +		enables |= CR0_ATSCHK;
> +		ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
> +					      ARM_SMMU_CR0ACK);
> +		if (ret) {
> +			dev_err(smmu->dev, "failed to enable ATS check\n");
> +			return ret;
> +		}
> +	}
> +
>  	ret = arm_smmu_setup_irqs(smmu);
>  	if (ret) {
>  		dev_err(smmu->dev, "failed to setup irqs\n");

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 31/37] iommu/arm-smmu-v3: Add support for PCI ATS
@ 2018-03-08 16:17         ` Jonathan Cameron
  0 siblings, 0 replies; 317+ messages in thread
From: Jonathan Cameron @ 2018-03-08 16:17 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm,
	joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	shunyong.yang, nwatters, okaya, jcrouse, rfranz, dwmw2,
	jacob.jun.pan, yi.l.liu, ashok.raj, robdclark, christian.koenig,
	bharatku

On Mon, 12 Feb 2018 18:33:46 +0000
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> PCIe devices can implement their own TLB, named Address Translation Cache
> (ATC). Enable Address Translation Service (ATS) for devices that support
> it and send them invalidation requests whenever we invalidate the IOTLBs.
> 
>   Range calculation
>   -----------------
> 
> The invalidation packet itself is a bit awkward: range must be naturally
> aligned, which means that the start address is a multiple of the range
> size. In addition, the size must be a power of two number of 4k pages. We
> have a few options to enforce this constraint:
> 
> (1) Find the smallest naturally aligned region that covers the requested
>     range. This is simple to compute and only takes one ATC_INV, but it
>     will spill on lots of neighbouring ATC entries.
> 
> (2) Align the start address to the region size (rounded up to a power of
>     two), and send a second invalidation for the next range of the same
>     size. Still not great, but reduces spilling.
> 
> (3) Cover the range exactly with the smallest number of naturally aligned
>     regions. This would be interesting to implement but as for (2),
>     requires multiple ATC_INV.
> 
> As I suspect ATC invalidation packets will be a very scarce resource, I'll
> go with option (1) for now, and only send one big invalidation. We can
> move to (2), which is both easier to read and more gentle with the ATC,
> once we've observed on real systems that we can send multiple smaller
> Invalidation Requests for roughly the same price as a single big one.
> 
> Note that with io-pgtable, the unmap function is called for each page, so
> this doesn't matter. The problem shows up when sharing page tables with
> the MMU.
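
As a side note, option (2) above is also easy to prototype in isolation. A
purely illustrative userspace sketch, working on page numbers only (not SMMU
code):

	#include <stdio.h>

	static unsigned long roundup_pow2(unsigned long n)
	{
		unsigned long r = 1;

		while (r < n)
			r <<= 1;
		return r;
	}

	int main(void)
	{
		unsigned long first = 7, last = 10;	/* pages [7; 10] */
		unsigned long span = roundup_pow2(last - first + 1);
		unsigned long base = first & ~(span - 1);

		printf("inv [%lu; %lu]\n", base, base + span - 1);
		if (base + span <= last)
			printf("inv [%lu; %lu]\n", base + span,
			       base + 2 * span - 1);
		return 0;
	}

For pages [7; 10] this sends two 4-page invalidations, [4; 7] and [8; 11],
instead of the single 16-page one that option (1) produces.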
> 
>   Timeout
>   -------
> 
> ATC invalidation is allowed to take up to 90 seconds, according to the
> PCIe spec, so it is possible to hit the SMMU command queue timeout during
> normal operations.
> 
> Some SMMU implementations will raise a CERROR_ATC_INV_SYNC when a CMD_SYNC
> fails because of an ATC invalidation. Some will just abort the CMD_SYNC.
> Others might let CMD_SYNC complete and have an asynchronous IMPDEF
> mechanism to record the error. When we receive a CERROR_ATC_INV_SYNC, we
> could retry sending all ATC_INV since last successful CMD_SYNC. When a
> CMD_SYNC fails without CERROR_ATC_INV_SYNC, we could retry sending *all*
> commands since last successful CMD_SYNC.
> 
> We cannot afford to wait 90 seconds in iommu_unmap, let alone MMU
> notifiers. So we'd have to introduce a more clever system if this timeout
> becomes a problem, like keeping hold of mappings and invalidating in the
> background. Implementing safe delayed invalidations is a very complex
> problem and deserves a series of its own. We'll assess whether more work
> is needed to properly handle ATC invalidation timeouts once this code runs
> on real hardware.
> 
>   Misc
>   ----
> 
> I didn't put ATC and TLB invalidations in the same functions for three
> reasons:
> 
> * TLB invalidation by range is batched and committed with a single sync.
>   Batching ATC invalidation is inconvenient, endpoints limit the number of
>   inflight invalidations. We'd have to count the number of invalidations
>   queued and send a sync periodically. In addition, I suspect we always
>   need a sync between TLB and ATC invalidation for the same page.
> 
> * Doing ATC invalidation outside tlb_inv_range also allows us to send fewer
>   requests, since TLB invalidations are done per page or block, while ATC
>   invalidations target IOVA ranges.
> 
> * TLB invalidation by context is performed when freeing the domain, at
>   which point there isn't any device attached anymore.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
A few minor error-path-related comments inline.

> ---
>  drivers/iommu/arm-smmu-v3.c | 236 ++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 226 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 8b9f5dd06be0..76513135310f 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -37,6 +37,7 @@
>  #include <linux/of_iommu.h>
>  #include <linux/of_platform.h>
>  #include <linux/pci.h>
> +#include <linux/pci-ats.h>
>  #include <linux/platform_device.h>
>  #include <linux/sched/mm.h>
>  
> @@ -109,6 +110,7 @@
>  #define IDR5_OAS_48_BIT			(5 << IDR5_OAS_SHIFT)
>  
>  #define ARM_SMMU_CR0			0x20
> +#define CR0_ATSCHK			(1 << 4)
>  #define CR0_CMDQEN			(1 << 3)
>  #define CR0_EVTQEN			(1 << 2)
>  #define CR0_PRIQEN			(1 << 1)
> @@ -304,6 +306,7 @@
>  #define CMDQ_ERR_CERROR_NONE_IDX	0
>  #define CMDQ_ERR_CERROR_ILL_IDX		1
>  #define CMDQ_ERR_CERROR_ABT_IDX		2
> +#define CMDQ_ERR_CERROR_ATC_INV_IDX	3
>  
>  #define CMDQ_0_OP_SHIFT			0
>  #define CMDQ_0_OP_MASK			0xffUL
> @@ -327,6 +330,15 @@
>  #define CMDQ_TLBI_1_VA_MASK		~0xfffUL
>  #define CMDQ_TLBI_1_IPA_MASK		0xfffffffff000UL
>  
> +#define CMDQ_ATC_0_SSID_SHIFT		12
> +#define CMDQ_ATC_0_SSID_MASK		0xfffffUL
> +#define CMDQ_ATC_0_SID_SHIFT		32
> +#define CMDQ_ATC_0_SID_MASK		0xffffffffUL
> +#define CMDQ_ATC_0_GLOBAL		(1UL << 9)
> +#define CMDQ_ATC_1_SIZE_SHIFT		0
> +#define CMDQ_ATC_1_SIZE_MASK		0x3fUL
> +#define CMDQ_ATC_1_ADDR_MASK		~0xfffUL
> +
>  #define CMDQ_PRI_0_SSID_SHIFT		12
>  #define CMDQ_PRI_0_SSID_MASK		0xfffffUL
>  #define CMDQ_PRI_0_SID_SHIFT		32
> @@ -425,6 +437,11 @@ module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
>  MODULE_PARM_DESC(disable_bypass,
>  	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
>  
> +static bool disable_ats_check;
> +module_param_named(disable_ats_check, disable_ats_check, bool, S_IRUGO);
> +MODULE_PARM_DESC(disable_ats_check,
> +	"By default, the SMMU checks whether each incoming transaction marked as translated is allowed by the stream configuration. This option disables the check.");
> +
>  enum pri_resp {
>  	PRI_RESP_DENY,
>  	PRI_RESP_FAIL,
> @@ -498,6 +515,16 @@ struct arm_smmu_cmdq_ent {
>  			u64			addr;
>  		} tlbi;
>  
> +		#define CMDQ_OP_ATC_INV		0x40
> +		#define ATC_INV_SIZE_ALL	52
> +		struct {
> +			u32			sid;
> +			u32			ssid;
> +			u64			addr;
> +			u8			size;
> +			bool			global;
> +		} atc;
> +
>  		#define CMDQ_OP_PRI_RESP	0x41
>  		struct {
>  			u32			sid;
> @@ -928,6 +955,14 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>  	case CMDQ_OP_TLBI_EL2_ASID:
>  		cmd[0] |= (u64)ent->tlbi.asid << CMDQ_TLBI_0_ASID_SHIFT;
>  		break;
> +	case CMDQ_OP_ATC_INV:
> +		cmd[0] |= ent->substream_valid ? CMDQ_0_SSV : 0;
> +		cmd[0] |= ent->atc.global ? CMDQ_ATC_0_GLOBAL : 0;
> +		cmd[0] |= ent->atc.ssid << CMDQ_ATC_0_SSID_SHIFT;
> +		cmd[0] |= (u64)ent->atc.sid << CMDQ_ATC_0_SID_SHIFT;
> +		cmd[1] |= ent->atc.size << CMDQ_ATC_1_SIZE_SHIFT;
> +		cmd[1] |= ent->atc.addr & CMDQ_ATC_1_ADDR_MASK;
> +		break;
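
Just as an illustration of the field layout used here, a standalone userspace
check of the packing (it duplicates the CMDQ_ATC_* values from the patch and
leaves the CMDQ_0_SSV bit out):

	#include <stdio.h>
	#include <stdint.h>

	#define CMDQ_OP_ATC_INV		0x40
	#define CMDQ_ATC_0_SSID_SHIFT	12
	#define CMDQ_ATC_0_SID_SHIFT	32
	#define CMDQ_ATC_1_SIZE_SHIFT	0
	#define CMDQ_ATC_1_ADDR_MASK	(~0xfffULL)

	int main(void)
	{
		uint64_t cmd[2] = { 0, 0 };
		uint32_t sid = 0x10, ssid = 0x1;
		uint64_t addr = 0x8000;		/* page 8 */
		uint8_t size = 2;		/* log2: 4 pages */

		/* same ORing as the CMDQ_OP_ATC_INV case above */
		cmd[0] |= CMDQ_OP_ATC_INV;
		cmd[0] |= (uint64_t)ssid << CMDQ_ATC_0_SSID_SHIFT;
		cmd[0] |= (uint64_t)sid << CMDQ_ATC_0_SID_SHIFT;
		cmd[1] |= (uint64_t)size << CMDQ_ATC_1_SIZE_SHIFT;
		cmd[1] |= addr & CMDQ_ATC_1_ADDR_MASK;

		printf("cmd[0] = %#018llx\ncmd[1] = %#018llx\n",
		       (unsigned long long)cmd[0],
		       (unsigned long long)cmd[1]);
		return 0;
	}
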
>  	case CMDQ_OP_PRI_RESP:
>  		cmd[0] |= ent->substream_valid ? CMDQ_0_SSV : 0;
>  		cmd[0] |= ent->pri.ssid << CMDQ_PRI_0_SSID_SHIFT;
> @@ -984,6 +1019,7 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
>  		[CMDQ_ERR_CERROR_NONE_IDX]	= "No error",
>  		[CMDQ_ERR_CERROR_ILL_IDX]	= "Illegal command",
>  		[CMDQ_ERR_CERROR_ABT_IDX]	= "Abort on command fetch",
> +		[CMDQ_ERR_CERROR_ATC_INV_IDX]	= "ATC invalidate timeout",
>  	};
>  
>  	int i;
> @@ -1003,6 +1039,14 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
>  		dev_err(smmu->dev, "retrying command fetch\n");
>  	case CMDQ_ERR_CERROR_NONE_IDX:
>  		return;
> +	case CMDQ_ERR_CERROR_ATC_INV_IDX:
> +		/*
> +		 * ATC Invalidation Completion timeout. CONS is still pointing
> +		 * at the CMD_SYNC. Attempt to complete other pending commands
> +		 * by repeating the CMD_SYNC, though we might well end up back
> +		 * here since the ATC invalidation may still be pending.
> +		 */
> +		return;
>  	case CMDQ_ERR_CERROR_ILL_IDX:
>  		/* Fallthrough */
>  	default:
> @@ -1261,9 +1305,6 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
>  			 STRTAB_STE_1_S1C_CACHE_WBRA
>  			 << STRTAB_STE_1_S1COR_SHIFT |
>  			 STRTAB_STE_1_S1C_SH_ISH << STRTAB_STE_1_S1CSH_SHIFT |
> -#ifdef CONFIG_PCI_ATS
> -			 STRTAB_STE_1_EATS_TRANS << STRTAB_STE_1_EATS_SHIFT |
> -#endif
>  			 (smmu->features & ARM_SMMU_FEAT_E2H ?
>  			  STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1) <<
>  			 STRTAB_STE_1_STRW_SHIFT);
> @@ -1300,6 +1341,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
>  		val |= STRTAB_STE_0_CFG_S2_TRANS;
>  	}
>  
> +	if (IS_ENABLED(CONFIG_PCI_ATS))
> +		dst[1] |= cpu_to_le64(STRTAB_STE_1_EATS_TRANS
> +				      << STRTAB_STE_1_EATS_SHIFT);
> +
>  	arm_smmu_sync_ste_for_sid(smmu, sid);
>  	dst[0] = cpu_to_le64(val);
>  	arm_smmu_sync_ste_for_sid(smmu, sid);
> @@ -1680,6 +1725,104 @@ static irqreturn_t arm_smmu_combined_irq_handler(int irq, void *dev)
>  	return IRQ_WAKE_THREAD;
>  }
>  
> +/* ATS invalidation */
> +static bool arm_smmu_master_has_ats(struct arm_smmu_master_data *master)
> +{
> +	return dev_is_pci(master->dev) && to_pci_dev(master->dev)->ats_enabled;
> +}
> +
> +static void
> +arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size,
> +			struct arm_smmu_cmdq_ent *cmd)
> +{
> +	size_t log2_span;
> +	size_t span_mask;
> +	/* ATC invalidates are always on 4096 bytes pages */
> +	size_t inval_grain_shift = 12;
> +	unsigned long page_start, page_end;
> +
> +	*cmd = (struct arm_smmu_cmdq_ent) {
> +		.opcode			= CMDQ_OP_ATC_INV,
> +		.substream_valid	= !!ssid,
> +		.atc.ssid		= ssid,
> +	};
> +
> +	if (!size) {
> +		cmd->atc.size = ATC_INV_SIZE_ALL;
> +		return;
> +	}
> +
> +	page_start	= iova >> inval_grain_shift;
> +	page_end	= (iova + size - 1) >> inval_grain_shift;
> +
> +	/*
> +	 * Find the smallest power of two that covers the range. Most
> +	 * significant differing bit between start and end address indicates the
> +	 * required span, ie. fls(start ^ end). For example:
> +	 *
> +	 * We want to invalidate pages [8; 11]. This is already the ideal range:
> +	 *		x = 0b1000 ^ 0b1011 = 0b11
> +	 *		span = 1 << fls(x) = 4
> +	 *
> +	 * To invalidate pages [7; 10], we need to invalidate [0; 15]:
> +	 *		x = 0b0111 ^ 0b1010 = 0b1101
> +	 *		span = 1 << fls(x) = 16
> +	 */
> +	log2_span	= fls_long(page_start ^ page_end);
> +	span_mask	= (1ULL << log2_span) - 1;
> +
> +	page_start	&= ~span_mask;
> +
> +	cmd->atc.addr	= page_start << inval_grain_shift;
> +	cmd->atc.size	= log2_span;
> +}
> +
> +static int arm_smmu_atc_inv_master(struct arm_smmu_master_data *master,
> +				   struct arm_smmu_cmdq_ent *cmd)
> +{
> +	int i;
> +	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
> +
> +	if (!arm_smmu_master_has_ats(master))
> +		return 0;
> +
> +	for (i = 0; i < fwspec->num_ids; i++) {
> +		cmd->atc.sid = fwspec->ids[i];
> +		arm_smmu_cmdq_issue_cmd(master->smmu, cmd);
> +	}
> +
> +	arm_smmu_cmdq_issue_sync(master->smmu);
> +
> +	return 0;
> +}
> +
> +static int arm_smmu_atc_inv_master_all(struct arm_smmu_master_data *master,
> +				       int ssid)
> +{
> +	struct arm_smmu_cmdq_ent cmd;
> +
> +	arm_smmu_atc_inv_to_cmd(ssid, 0, 0, &cmd);
> +	return arm_smmu_atc_inv_master(master, &cmd);
> +}
> +
> +static size_t
> +arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
> +			unsigned long iova, size_t size)
> +{
> +	unsigned long flags;
> +	struct arm_smmu_cmdq_ent cmd;
> +	struct arm_smmu_master_data *master;
> +
> +	arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd);
> +
> +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> +	list_for_each_entry(master, &smmu_domain->devices, list)
> +		arm_smmu_atc_inv_master(master, &cmd);
> +	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> +
> +	return size;
> +}
> +
>  /* IO_PGTABLE API */
>  static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
>  {
> @@ -2092,6 +2235,8 @@ static void arm_smmu_detach_dev(struct device *dev)
>  	if (smmu_domain) {
>  		__iommu_sva_unbind_dev_all(dev);
>  
> +		arm_smmu_atc_inv_master_all(master, 0);
> +
>  		spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>  		list_del(&master->list);
>  		spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> @@ -2179,12 +2324,19 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>  static size_t
>  arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>  {
> -	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> +	int ret;
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>  
>  	if (!ops)
>  		return 0;
>  
> -	return ops->unmap(ops, iova, size);
> +	ret = ops->unmap(ops, iova, size);
> +
> +	if (ret && smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS)
> +		ret = arm_smmu_atc_inv_domain(smmu_domain, 0, iova, size);
> +
> +	return ret;
>  }
>  
>  static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
> @@ -2342,6 +2494,48 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
>  	return sid < limit;
>  }
>  
> +static int arm_smmu_enable_ats(struct arm_smmu_master_data *master)
> +{
> +	int ret;
> +	size_t stu;
> +	struct pci_dev *pdev;
> +	struct arm_smmu_device *smmu = master->smmu;
> +	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
> +
> +	if (!(smmu->features & ARM_SMMU_FEAT_ATS) || !dev_is_pci(master->dev) ||
> +	    (fwspec->flags & IOMMU_FWSPEC_PCI_NO_ATS))
> +		return -ENOSYS;
> +
> +	pdev = to_pci_dev(master->dev);
> +
> +	/* Smallest Translation Unit: log2 of the smallest supported granule */
> +	stu = __ffs(smmu->pgsize_bitmap);
> +
> +	ret = pci_enable_ats(pdev, stu);
> +	if (ret)
> +		return ret;
> +
> +	dev_dbg(&pdev->dev, "enabled ATS (STU=%zu, QDEP=%d)\n", stu,
> +		pci_ats_queue_depth(pdev));
> +
> +	return 0;
> +}
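
Small aside on the STU line above: __ffs() is the index of the least
significant set bit, so for a typical 4K/2M/1G pgsize_bitmap the smallest
translation unit passed to pci_enable_ats() is log2(4K) = 12. A quick
userspace check, using the compiler builtin instead of the kernel helper:

	#include <stdio.h>

	int main(void)
	{
		unsigned long long pgsize_bitmap =
			(1ULL << 12) | (1ULL << 21) | (1ULL << 30);

		printf("STU = %d\n", __builtin_ctzll(pgsize_bitmap));
		return 0;
	}

This prints "STU = 12".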
> +
> +static void arm_smmu_disable_ats(struct arm_smmu_master_data *master)
> +{
> +	struct pci_dev *pdev;
> +
> +	if (!dev_is_pci(master->dev))
> +		return;
> +
> +	pdev = to_pci_dev(master->dev);
> +
> +	if (!pdev->ats_enabled)
> +		return;
> +
> +	pci_disable_ats(pdev);
> +}
> +
>  static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>  				  struct arm_smmu_master_data *master)
>  {
> @@ -2462,14 +2656,24 @@ static int arm_smmu_add_device(struct device *dev)
>  		master->ste.can_stall = true;
>  	}
>  
> +	arm_smmu_enable_ats(master);
It's a bit nasty not to handle the errors that this could return (other than
the -ENOSYS for when ATS simply isn't available). It would be nice to at
least add a note to the log when people expect it to work and it won't
because some condition or other isn't met.

> +
>  	group = iommu_group_get_for_dev(dev);
> -	if (!IS_ERR(group)) {
> -		arm_smmu_insert_master(smmu, master);
> -		iommu_group_put(group);
> -		iommu_device_link(&smmu->iommu, dev);
> +	if (IS_ERR(group)) {
> +		ret = PTR_ERR(group);
> +		goto err_disable_ats;
>  	}
>  
> -	return PTR_ERR_OR_ZERO(group);
> +	iommu_group_put(group);
> +	arm_smmu_insert_master(smmu, master);
> +	iommu_device_link(&smmu->iommu, dev);
> +
> +	return 0;
> +
> +err_disable_ats:
> +	arm_smmu_disable_ats(master);
master is leaked here, I think...
Possibly other things too, as this error path doesn't mirror
arm_smmu_remove_device() the way I'd mostly have expected it to.

There are some slightly fishy bits of ordering in the original code
anyway that I'm not seeing a justification for (why is
the iommu_device_unlink() call later than one might expect, for
example?).

> +
> +	return ret;
>  }
>  
>  static void arm_smmu_remove_device(struct device *dev)
> @@ -2486,6 +2690,8 @@ static void arm_smmu_remove_device(struct device *dev)
>  	if (master && master->ste.assigned)
>  		arm_smmu_detach_dev(dev);
>  	arm_smmu_remove_master(smmu, master);
> +	arm_smmu_disable_ats(master);
> +
>  	iommu_group_remove_device(dev);
>  	iommu_device_unlink(&smmu->iommu, dev);
>  	kfree(master);
> @@ -3094,6 +3300,16 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>  		}
>  	}
>  
> +	if (smmu->features & ARM_SMMU_FEAT_ATS && !disable_ats_check) {
> +		enables |= CR0_ATSCHK;
> +		ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
> +					      ARM_SMMU_CR0ACK);
> +		if (ret) {
> +			dev_err(smmu->dev, "failed to enable ATS check\n");
> +			return ret;
> +		}
> +	}
> +
>  	ret = arm_smmu_setup_irqs(smmu);
>  	if (ret) {
>  		dev_err(smmu->dev, "failed to setup irqs\n");


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 35/37] iommu/arm-smmu-v3: Add support for PRI
@ 2018-03-08 16:24         ` Jonathan Cameron
  -1 siblings, 0 replies; 317+ messages in thread
From: Jonathan Cameron @ 2018-03-08 16:24 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: mark.rutland, ilias.apalodimas, kvm, linux-pci, xuzaibo,
	will.deacon, okaya, ashok.raj, bharatku, linux-acpi,
	catalin.marinas, rfranz, lenb, devicetree, robh+dt, bhelgaas,
	linux-arm-kernel, dwmw2, rjw, iommu, sudeep.holla,
	christian.koenig

On Mon, 12 Feb 2018 18:33:50 +0000
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> For PCI devices that support it, enable the PRI capability and handle
> PRI Page Requests with the generic fault handler.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
A couple of nitpicks.

> ---
>  drivers/iommu/arm-smmu-v3.c | 174 ++++++++++++++++++++++++++++++--------------
>  1 file changed, 119 insertions(+), 55 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 8d09615fab35..ace2f995b0c0 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -271,6 +271,7 @@
>  #define STRTAB_STE_1_S1COR_SHIFT	4
>  #define STRTAB_STE_1_S1CSH_SHIFT	6
>  
> +#define STRTAB_STE_1_PPAR		(1UL << 18)
>  #define STRTAB_STE_1_S1STALLD		(1UL << 27)
>  
>  #define STRTAB_STE_1_EATS_ABT		0UL
> @@ -346,9 +347,9 @@
>  #define CMDQ_PRI_1_GRPID_SHIFT		0
>  #define CMDQ_PRI_1_GRPID_MASK		0x1ffUL
>  #define CMDQ_PRI_1_RESP_SHIFT		12
> -#define CMDQ_PRI_1_RESP_DENY		(0UL << CMDQ_PRI_1_RESP_SHIFT)
> -#define CMDQ_PRI_1_RESP_FAIL		(1UL << CMDQ_PRI_1_RESP_SHIFT)
> -#define CMDQ_PRI_1_RESP_SUCC		(2UL << CMDQ_PRI_1_RESP_SHIFT)
> +#define CMDQ_PRI_1_RESP_FAILURE		(0UL << CMDQ_PRI_1_RESP_SHIFT)
> +#define CMDQ_PRI_1_RESP_INVALID		(1UL << CMDQ_PRI_1_RESP_SHIFT)
> +#define CMDQ_PRI_1_RESP_SUCCESS		(2UL << CMDQ_PRI_1_RESP_SHIFT)
Mixing the renaming in with the rest of the patch does make things a
little harder to read than they would have been as separate patches.
Worth splitting?

>  
>  #define CMDQ_RESUME_0_SID_SHIFT		32
>  #define CMDQ_RESUME_0_SID_MASK		0xffffffffUL
> @@ -442,12 +443,6 @@ module_param_named(disable_ats_check, disable_ats_check, bool, S_IRUGO);
>  MODULE_PARM_DESC(disable_ats_check,
>  	"By default, the SMMU checks whether each incoming transaction marked as translated is allowed by the stream configuration. This option disables the check.");
>  
> -enum pri_resp {
> -	PRI_RESP_DENY,
> -	PRI_RESP_FAIL,
> -	PRI_RESP_SUCC,
> -};
> -
>  enum arm_smmu_msi_index {
>  	EVTQ_MSI_INDEX,
>  	GERROR_MSI_INDEX,
> @@ -530,7 +525,7 @@ struct arm_smmu_cmdq_ent {
>  			u32			sid;
>  			u32			ssid;
>  			u16			grpid;
> -			enum pri_resp		resp;
> +			enum page_response_code	resp;
>  		} pri;
>  
>  		#define CMDQ_OP_RESUME		0x44
> @@ -615,6 +610,7 @@ struct arm_smmu_strtab_ent {
>  	struct arm_smmu_s2_cfg		*s2_cfg;
>  
>  	bool				can_stall;
> +	bool				prg_resp_needs_ssid;
>  };
>  
>  struct arm_smmu_strtab_cfg {
> @@ -969,14 +965,14 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>  		cmd[0] |= (u64)ent->pri.sid << CMDQ_PRI_0_SID_SHIFT;
>  		cmd[1] |= ent->pri.grpid << CMDQ_PRI_1_GRPID_SHIFT;
>  		switch (ent->pri.resp) {
> -		case PRI_RESP_DENY:
> -			cmd[1] |= CMDQ_PRI_1_RESP_DENY;
> +		case IOMMU_PAGE_RESP_FAILURE:
> +			cmd[1] |= CMDQ_PRI_1_RESP_FAILURE;
>  			break;
> -		case PRI_RESP_FAIL:
> -			cmd[1] |= CMDQ_PRI_1_RESP_FAIL;
> +		case IOMMU_PAGE_RESP_INVALID:
> +			cmd[1] |= CMDQ_PRI_1_RESP_INVALID;
>  			break;
> -		case PRI_RESP_SUCC:
> -			cmd[1] |= CMDQ_PRI_1_RESP_SUCC;
> +		case IOMMU_PAGE_RESP_SUCCESS:
> +			cmd[1] |= CMDQ_PRI_1_RESP_SUCCESS;
>  			break;
>  		default:
>  			return -EINVAL;
> @@ -1180,9 +1176,16 @@ static int arm_smmu_page_response(struct iommu_domain *domain,
>  		cmd.resume.sid		= sid;
>  		cmd.resume.stag		= resp->page_req_group_id;
>  		cmd.resume.resp		= resp->resp_code;
> +	} else if (master->can_fault) {
> +		cmd.opcode		= CMDQ_OP_PRI_RESP;
> +		cmd.substream_valid	= resp->pasid_present &&
> +					  master->ste.prg_resp_needs_ssid;
> +		cmd.pri.sid		= sid;
> +		cmd.pri.ssid		= resp->pasid;
> +		cmd.pri.grpid		= resp->page_req_group_id;
> +		cmd.pri.resp		= resp->resp_code;
>  	} else {
> -		/* TODO: put PRI response here */
> -		return -EINVAL;
> +		return -ENODEV;
>  	}
>  
>  	arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
> @@ -1309,6 +1312,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
>  			  STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1) <<
>  			 STRTAB_STE_1_STRW_SHIFT);
>  
> +		if (ste->prg_resp_needs_ssid)
> +			dst[1] |= STRTAB_STE_1_PPAR;
> +
>  		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
>  		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE) &&
>  		   !ste->can_stall)
> @@ -1536,40 +1542,32 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>  
>  static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
>  {
> -	u32 sid, ssid;
> -	u16 grpid;
> -	bool ssv, last;
> -
> -	sid = evt[0] >> PRIQ_0_SID_SHIFT & PRIQ_0_SID_MASK;
> -	ssv = evt[0] & PRIQ_0_SSID_V;
> -	ssid = ssv ? evt[0] >> PRIQ_0_SSID_SHIFT & PRIQ_0_SSID_MASK : 0;
> -	last = evt[0] & PRIQ_0_PRG_LAST;
> -	grpid = evt[1] >> PRIQ_1_PRG_IDX_SHIFT & PRIQ_1_PRG_IDX_MASK;
> -
> -	dev_info(smmu->dev, "unexpected PRI request received:\n");
> -	dev_info(smmu->dev,
> -		 "\tsid 0x%08x.0x%05x: [%u%s] %sprivileged %s%s%s access at iova 0x%016llx\n",
> -		 sid, ssid, grpid, last ? "L" : "",
> -		 evt[0] & PRIQ_0_PERM_PRIV ? "" : "un",
> -		 evt[0] & PRIQ_0_PERM_READ ? "R" : "",
> -		 evt[0] & PRIQ_0_PERM_WRITE ? "W" : "",
> -		 evt[0] & PRIQ_0_PERM_EXEC ? "X" : "",
> -		 evt[1] & PRIQ_1_ADDR_MASK << PRIQ_1_ADDR_SHIFT);
> -
> -	if (last) {
> -		struct arm_smmu_cmdq_ent cmd = {
> -			.opcode			= CMDQ_OP_PRI_RESP,
> -			.substream_valid	= ssv,
> -			.pri			= {
> -				.sid	= sid,
> -				.ssid	= ssid,
> -				.grpid	= grpid,
> -				.resp	= PRI_RESP_DENY,
> -			},
> -		};
> +	u32 sid = evt[0] >> PRIQ_0_SID_SHIFT & PRIQ_0_SID_MASK;
>  
> -		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
> -	}
> +	struct arm_smmu_master_data *master;
> +	struct iommu_fault_event fault = {
> +		.type		= IOMMU_FAULT_PAGE_REQ,
> +		.last_req	= !!(evt[0] & PRIQ_0_PRG_LAST),
> +		.pasid_valid	= !!(evt[0] & PRIQ_0_SSID_V),
> +		.pasid		= evt[0] >> PRIQ_0_SSID_SHIFT & PRIQ_0_SSID_MASK,
> +		.page_req_group_id = evt[1] >> PRIQ_1_PRG_IDX_SHIFT & PRIQ_1_PRG_IDX_MASK,
> +		.addr		= evt[1] & PRIQ_1_ADDR_MASK << PRIQ_1_ADDR_SHIFT,
> +	};
> +
> +	if (evt[0] & PRIQ_0_PERM_READ)
> +		fault.prot |= IOMMU_FAULT_READ;
> +	if (evt[0] & PRIQ_0_PERM_WRITE)
> +		fault.prot |= IOMMU_FAULT_WRITE;
> +	if (evt[0] & PRIQ_0_PERM_EXEC)
> +		fault.prot |= IOMMU_FAULT_EXEC;
> +	if (evt[0] & PRIQ_0_PERM_PRIV)
> +		fault.prot |= IOMMU_FAULT_PRIV;
> +
> +	master = arm_smmu_find_master(smmu, sid);
> +	if (WARN_ON(!master))
> +		return;
> +
> +	iommu_report_device_fault(master->dev, &fault);
>  }
>  
>  static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
> @@ -1594,6 +1592,11 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
>  		}
>  
>  		if (queue_sync_prod(q) == -EOVERFLOW)
> +			/*
> +			 * TODO: flush pending faults, since the SMMU might have
> +			 * auto-responded to the Last request of a pending
> +			 * group
> +			 */
>  			dev_err(smmu->dev, "PRIQ overflow detected -- requests lost\n");
>  	} while (!queue_empty(q));
>  
> @@ -1647,7 +1650,8 @@ static int arm_smmu_flush_queues(struct notifier_block *nb,
>  	if (master) {
>  		if (master->ste.can_stall)
>  			arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
> -		/* TODO: add support for PRI */
> +		else if (master->can_fault)
> +			arm_smmu_flush_queue(smmu, &smmu->priq.q, "priq");
>  		return 0;
>  	}
>  
> @@ -2533,6 +2537,46 @@ static int arm_smmu_enable_ats(struct arm_smmu_master_data *master)
>  	return 0;
>  }
>  
> +static int arm_smmu_enable_pri(struct arm_smmu_master_data *master)
> +{
> +	int ret, pos;
> +	struct pci_dev *pdev;
> +	/*
> +	 * TODO: find a good inflight PPR number. We should divide the PRI queue
> +	 * by the number of PRI-capable devices, but it's impossible to know
> +	 * about current and future (hotplugged) devices. So we're at risk of
> +	 * dropping PPRs (and leaking pending requests in the FQ).
> +	 */
> +	size_t max_inflight_pprs = 16;
> +	struct arm_smmu_device *smmu = master->smmu;
> +
> +	if (!(smmu->features & ARM_SMMU_FEAT_PRI) || !dev_is_pci(master->dev))
> +		return -ENOSYS;
> +
> +	pdev = to_pci_dev(master->dev);
> +
> +	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> +	if (!pos)
> +		return -ENOSYS;
> +
> +	ret = pci_reset_pri(pdev);
> +	if (ret)
> +		return ret;
> +
> +	ret = pci_enable_pri(pdev, max_inflight_pprs);
> +	if (ret) {
> +		dev_err(master->dev, "cannot enable PRI: %d\n", ret);
> +		return ret;
> +	}
> +
> +	master->can_fault = true;
> +	master->ste.prg_resp_needs_ssid = pci_prg_resp_requires_prefix(pdev);
> +
> +	dev_dbg(master->dev, "enabled PRI");
> +
> +	return 0;
> +}
> +
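
On the TODO at the top of arm_smmu_enable_pri() above: one purely
hypothetical heuristic would be to split a fixed PRI queue budget between
the PRI-capable devices seen so far. The helper below is made up for
illustration only, not proposed code:

	/*
	 * Hypothetical helper, only to illustrate the TODO above: divide
	 * the PRI queue between known PRI-capable devices, keeping half
	 * of it as slack for devices hotplugged later.
	 */
	static size_t arm_smmu_ppr_budget(size_t priq_entries, size_t nr_pri_devices)
	{
		size_t budget;

		if (!nr_pri_devices)
			return priq_entries;

		budget = priq_entries / (2 * nr_pri_devices);
		return budget ? budget : 1;
	}
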

The function ordering gets a bit random as you add all the new ones.
It might be better to keep each disable following its enable.

>  static void arm_smmu_disable_ats(struct arm_smmu_master_data *master)
>  {
>  	struct pci_dev *pdev;
> @@ -2548,6 +2592,22 @@ static void arm_smmu_disable_ats(struct arm_smmu_master_data *master)
>  	pci_disable_ats(pdev);
>  }
>  
> +static void arm_smmu_disable_pri(struct arm_smmu_master_data *master)
> +{
> +	struct pci_dev *pdev;
> +
> +	if (!dev_is_pci(master->dev))
> +		return;
> +
> +	pdev = to_pci_dev(master->dev);
> +
> +	if (!pdev->pri_enabled)
> +		return;
> +
> +	pci_disable_pri(pdev);
> +	master->can_fault = false;
> +}
> +
>  static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>  				  struct arm_smmu_master_data *master)
>  {
> @@ -2668,12 +2728,13 @@ static int arm_smmu_add_device(struct device *dev)
>  		master->ste.can_stall = true;
>  	}
>  
> -	arm_smmu_enable_ats(master);
> +	if (!arm_smmu_enable_ats(master))
> +		arm_smmu_enable_pri(master);
>  
>  	group = iommu_group_get_for_dev(dev);
>  	if (IS_ERR(group)) {
>  		ret = PTR_ERR(group);
> -		goto err_disable_ats;
> +		goto err_disable_pri;
>  	}
>  
>  	iommu_group_put(group);
> @@ -2682,7 +2743,8 @@ static int arm_smmu_add_device(struct device *dev)
>  
>  	return 0;
>  
> -err_disable_ats:
> +err_disable_pri:
> +	arm_smmu_disable_pri(master);
>  	arm_smmu_disable_ats(master);
>  
>  	return ret;
> @@ -2702,6 +2764,8 @@ static void arm_smmu_remove_device(struct device *dev)
>  	if (master && master->ste.assigned)
>  		arm_smmu_detach_dev(dev);
>  	arm_smmu_remove_master(smmu, master);
> +
> +	arm_smmu_disable_pri(master);
>  	arm_smmu_disable_ats(master);
>  
>  	iommu_group_remove_device(dev);

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 35/37] iommu/arm-smmu-v3: Add support for PRI
@ 2018-03-08 16:24         ` Jonathan Cameron
  0 siblings, 0 replies; 317+ messages in thread
From: Jonathan Cameron @ 2018-03-08 16:24 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm,
	joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	shunyong.yang, nwatters, okaya, jcrouse, rfranz, dwmw2,
	jacob.jun.pan, yi.l.liu, ashok.raj, robdclark, christian.koenig,
	bharatku

On Mon, 12 Feb 2018 18:33:50 +0000
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> For PCI devices that support it, enable the PRI capability and handle
> PRI Page Requests with the generic fault handler.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
A couple of nitpicks.

> ---
>  drivers/iommu/arm-smmu-v3.c | 174 ++++++++++++++++++++++++++++++--------------
>  1 file changed, 119 insertions(+), 55 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 8d09615fab35..ace2f995b0c0 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -271,6 +271,7 @@
>  #define STRTAB_STE_1_S1COR_SHIFT	4
>  #define STRTAB_STE_1_S1CSH_SHIFT	6
>  
> +#define STRTAB_STE_1_PPAR		(1UL << 18)
>  #define STRTAB_STE_1_S1STALLD		(1UL << 27)
>  
>  #define STRTAB_STE_1_EATS_ABT		0UL
> @@ -346,9 +347,9 @@
>  #define CMDQ_PRI_1_GRPID_SHIFT		0
>  #define CMDQ_PRI_1_GRPID_MASK		0x1ffUL
>  #define CMDQ_PRI_1_RESP_SHIFT		12
> -#define CMDQ_PRI_1_RESP_DENY		(0UL << CMDQ_PRI_1_RESP_SHIFT)
> -#define CMDQ_PRI_1_RESP_FAIL		(1UL << CMDQ_PRI_1_RESP_SHIFT)
> -#define CMDQ_PRI_1_RESP_SUCC		(2UL << CMDQ_PRI_1_RESP_SHIFT)
> +#define CMDQ_PRI_1_RESP_FAILURE		(0UL << CMDQ_PRI_1_RESP_SHIFT)
> +#define CMDQ_PRI_1_RESP_INVALID		(1UL << CMDQ_PRI_1_RESP_SHIFT)
> +#define CMDQ_PRI_1_RESP_SUCCESS		(2UL << CMDQ_PRI_1_RESP_SHIFT)
Mixing the renaming in with the rest of the patch does make things a
little harder to read than they would have been as separate patches.
Worth splitting?

>  
>  #define CMDQ_RESUME_0_SID_SHIFT		32
>  #define CMDQ_RESUME_0_SID_MASK		0xffffffffUL
> @@ -442,12 +443,6 @@ module_param_named(disable_ats_check, disable_ats_check, bool, S_IRUGO);
>  MODULE_PARM_DESC(disable_ats_check,
>  	"By default, the SMMU checks whether each incoming transaction marked as translated is allowed by the stream configuration. This option disables the check.");
>  
> -enum pri_resp {
> -	PRI_RESP_DENY,
> -	PRI_RESP_FAIL,
> -	PRI_RESP_SUCC,
> -};
> -
>  enum arm_smmu_msi_index {
>  	EVTQ_MSI_INDEX,
>  	GERROR_MSI_INDEX,
> @@ -530,7 +525,7 @@ struct arm_smmu_cmdq_ent {
>  			u32			sid;
>  			u32			ssid;
>  			u16			grpid;
> -			enum pri_resp		resp;
> +			enum page_response_code	resp;
>  		} pri;
>  
>  		#define CMDQ_OP_RESUME		0x44
> @@ -615,6 +610,7 @@ struct arm_smmu_strtab_ent {
>  	struct arm_smmu_s2_cfg		*s2_cfg;
>  
>  	bool				can_stall;
> +	bool				prg_resp_needs_ssid;
>  };
>  
>  struct arm_smmu_strtab_cfg {
> @@ -969,14 +965,14 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>  		cmd[0] |= (u64)ent->pri.sid << CMDQ_PRI_0_SID_SHIFT;
>  		cmd[1] |= ent->pri.grpid << CMDQ_PRI_1_GRPID_SHIFT;
>  		switch (ent->pri.resp) {
> -		case PRI_RESP_DENY:
> -			cmd[1] |= CMDQ_PRI_1_RESP_DENY;
> +		case IOMMU_PAGE_RESP_FAILURE:
> +			cmd[1] |= CMDQ_PRI_1_RESP_FAILURE;
>  			break;
> -		case PRI_RESP_FAIL:
> -			cmd[1] |= CMDQ_PRI_1_RESP_FAIL;
> +		case IOMMU_PAGE_RESP_INVALID:
> +			cmd[1] |= CMDQ_PRI_1_RESP_INVALID;
>  			break;
> -		case PRI_RESP_SUCC:
> -			cmd[1] |= CMDQ_PRI_1_RESP_SUCC;
> +		case IOMMU_PAGE_RESP_SUCCESS:
> +			cmd[1] |= CMDQ_PRI_1_RESP_SUCCESS;
>  			break;
>  		default:
>  			return -EINVAL;
> @@ -1180,9 +1176,16 @@ static int arm_smmu_page_response(struct iommu_domain *domain,
>  		cmd.resume.sid		= sid;
>  		cmd.resume.stag		= resp->page_req_group_id;
>  		cmd.resume.resp		= resp->resp_code;
> +	} else if (master->can_fault) {
> +		cmd.opcode		= CMDQ_OP_PRI_RESP;
> +		cmd.substream_valid	= resp->pasid_present &&
> +					  master->ste.prg_resp_needs_ssid;
> +		cmd.pri.sid		= sid;
> +		cmd.pri.ssid		= resp->pasid;
> +		cmd.pri.grpid		= resp->page_req_group_id;
> +		cmd.pri.resp		= resp->resp_code;
>  	} else {
> -		/* TODO: put PRI response here */
> -		return -EINVAL;
> +		return -ENODEV;
>  	}
>  
>  	arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
> @@ -1309,6 +1312,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
>  			  STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1) <<
>  			 STRTAB_STE_1_STRW_SHIFT);
>  
> +		if (ste->prg_resp_needs_ssid)
> +			dst[1] |= STRTAB_STE_1_PPAR;
> +
>  		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
>  		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE) &&
>  		   !ste->can_stall)
> @@ -1536,40 +1542,32 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>  
>  static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
>  {
> -	u32 sid, ssid;
> -	u16 grpid;
> -	bool ssv, last;
> -
> -	sid = evt[0] >> PRIQ_0_SID_SHIFT & PRIQ_0_SID_MASK;
> -	ssv = evt[0] & PRIQ_0_SSID_V;
> -	ssid = ssv ? evt[0] >> PRIQ_0_SSID_SHIFT & PRIQ_0_SSID_MASK : 0;
> -	last = evt[0] & PRIQ_0_PRG_LAST;
> -	grpid = evt[1] >> PRIQ_1_PRG_IDX_SHIFT & PRIQ_1_PRG_IDX_MASK;
> -
> -	dev_info(smmu->dev, "unexpected PRI request received:\n");
> -	dev_info(smmu->dev,
> -		 "\tsid 0x%08x.0x%05x: [%u%s] %sprivileged %s%s%s access at iova 0x%016llx\n",
> -		 sid, ssid, grpid, last ? "L" : "",
> -		 evt[0] & PRIQ_0_PERM_PRIV ? "" : "un",
> -		 evt[0] & PRIQ_0_PERM_READ ? "R" : "",
> -		 evt[0] & PRIQ_0_PERM_WRITE ? "W" : "",
> -		 evt[0] & PRIQ_0_PERM_EXEC ? "X" : "",
> -		 evt[1] & PRIQ_1_ADDR_MASK << PRIQ_1_ADDR_SHIFT);
> -
> -	if (last) {
> -		struct arm_smmu_cmdq_ent cmd = {
> -			.opcode			= CMDQ_OP_PRI_RESP,
> -			.substream_valid	= ssv,
> -			.pri			= {
> -				.sid	= sid,
> -				.ssid	= ssid,
> -				.grpid	= grpid,
> -				.resp	= PRI_RESP_DENY,
> -			},
> -		};
> +	u32 sid = evt[0] >> PRIQ_0_SID_SHIFT & PRIQ_0_SID_MASK;
>  
> -		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
> -	}
> +	struct arm_smmu_master_data *master;
> +	struct iommu_fault_event fault = {
> +		.type		= IOMMU_FAULT_PAGE_REQ,
> +		.last_req	= !!(evt[0] & PRIQ_0_PRG_LAST),
> +		.pasid_valid	= !!(evt[0] & PRIQ_0_SSID_V),
> +		.pasid		= evt[0] >> PRIQ_0_SSID_SHIFT & PRIQ_0_SSID_MASK,
> +		.page_req_group_id = evt[1] >> PRIQ_1_PRG_IDX_SHIFT & PRIQ_1_PRG_IDX_MASK,
> +		.addr		= evt[1] & PRIQ_1_ADDR_MASK << PRIQ_1_ADDR_SHIFT,
> +	};
> +
> +	if (evt[0] & PRIQ_0_PERM_READ)
> +		fault.prot |= IOMMU_FAULT_READ;
> +	if (evt[0] & PRIQ_0_PERM_WRITE)
> +		fault.prot |= IOMMU_FAULT_WRITE;
> +	if (evt[0] & PRIQ_0_PERM_EXEC)
> +		fault.prot |= IOMMU_FAULT_EXEC;
> +	if (evt[0] & PRIQ_0_PERM_PRIV)
> +		fault.prot |= IOMMU_FAULT_PRIV;
> +
> +	master = arm_smmu_find_master(smmu, sid);
> +	if (WARN_ON(!master))
> +		return;
> +
> +	iommu_report_device_fault(master->dev, &fault);
>  }
>  
>  static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
> @@ -1594,6 +1592,11 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
>  		}
>  
>  		if (queue_sync_prod(q) == -EOVERFLOW)
> +			/*
> +			 * TODO: flush pending faults, since the SMMU might have
> +			 * auto-responded to the Last request of a pending
> +			 * group
> +			 */
>  			dev_err(smmu->dev, "PRIQ overflow detected -- requests lost\n");
>  	} while (!queue_empty(q));
>  
> @@ -1647,7 +1650,8 @@ static int arm_smmu_flush_queues(struct notifier_block *nb,
>  	if (master) {
>  		if (master->ste.can_stall)
>  			arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
> -		/* TODO: add support for PRI */
> +		else if (master->can_fault)
> +			arm_smmu_flush_queue(smmu, &smmu->priq.q, "priq");
>  		return 0;
>  	}
>  
> @@ -2533,6 +2537,46 @@ static int arm_smmu_enable_ats(struct arm_smmu_master_data *master)
>  	return 0;
>  }
>  
> +static int arm_smmu_enable_pri(struct arm_smmu_master_data *master)
> +{
> +	int ret, pos;
> +	struct pci_dev *pdev;
> +	/*
> +	 * TODO: find a good inflight PPR number. We should divide the PRI queue
> +	 * by the number of PRI-capable devices, but it's impossible to know
> +	 * about current and future (hotplugged) devices. So we're at risk of
> +	 * dropping PPRs (and leaking pending requests in the FQ).
> +	 */
> +	size_t max_inflight_pprs = 16;
> +	struct arm_smmu_device *smmu = master->smmu;
> +
> +	if (!(smmu->features & ARM_SMMU_FEAT_PRI) || !dev_is_pci(master->dev))
> +		return -ENOSYS;
> +
> +	pdev = to_pci_dev(master->dev);
> +
> +	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> +	if (!pos)
> +		return -ENOSYS;
> +
> +	ret = pci_reset_pri(pdev);
> +	if (ret)
> +		return ret;
> +
> +	ret = pci_enable_pri(pdev, max_inflight_pprs);
> +	if (ret) {
> +		dev_err(master->dev, "cannot enable PRI: %d\n", ret);
> +		return ret;
> +	}
> +
> +	master->can_fault = true;
> +	master->ste.prg_resp_needs_ssid = pci_prg_resp_requires_prefix(pdev);
> +
> +	dev_dbg(master->dev, "enabled PRI");
> +
> +	return 0;
> +}
> +

The function ordering gets a bit random as you add all the new ones.
It might be better to keep each disable immediately following its enable.

>  static void arm_smmu_disable_ats(struct arm_smmu_master_data *master)
>  {
>  	struct pci_dev *pdev;
> @@ -2548,6 +2592,22 @@ static void arm_smmu_disable_ats(struct arm_smmu_master_data *master)
>  	pci_disable_ats(pdev);
>  }
>  
> +static void arm_smmu_disable_pri(struct arm_smmu_master_data *master)
> +{
> +	struct pci_dev *pdev;
> +
> +	if (!dev_is_pci(master->dev))
> +		return;
> +
> +	pdev = to_pci_dev(master->dev);
> +
> +	if (!pdev->pri_enabled)
> +		return;
> +
> +	pci_disable_pri(pdev);
> +	master->can_fault = false;
> +}
> +
>  static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>  				  struct arm_smmu_master_data *master)
>  {
> @@ -2668,12 +2728,13 @@ static int arm_smmu_add_device(struct device *dev)
>  		master->ste.can_stall = true;
>  	}
>  
> -	arm_smmu_enable_ats(master);
> +	if (!arm_smmu_enable_ats(master))
> +		arm_smmu_enable_pri(master);
>  
>  	group = iommu_group_get_for_dev(dev);
>  	if (IS_ERR(group)) {
>  		ret = PTR_ERR(group);
> -		goto err_disable_ats;
> +		goto err_disable_pri;
>  	}
>  
>  	iommu_group_put(group);
> @@ -2682,7 +2743,8 @@ static int arm_smmu_add_device(struct device *dev)
>  
>  	return 0;
>  
> -err_disable_ats:
> +err_disable_pri:
> +	arm_smmu_disable_pri(master);
>  	arm_smmu_disable_ats(master);
>  
>  	return ret;
> @@ -2702,6 +2764,8 @@ static void arm_smmu_remove_device(struct device *dev)
>  	if (master && master->ste.assigned)
>  		arm_smmu_detach_dev(dev);
>  	arm_smmu_remove_master(smmu, master);
> +
> +	arm_smmu_disable_pri(master);
>  	arm_smmu_disable_ats(master);
>  
>  	iommu_group_remove_device(dev);

^ permalink raw reply	[flat|nested] 317+ messages in thread

* [PATCH 35/37] iommu/arm-smmu-v3: Add support for PRI
@ 2018-03-08 16:24         ` Jonathan Cameron
  0 siblings, 0 replies; 317+ messages in thread
From: Jonathan Cameron @ 2018-03-08 16:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 12 Feb 2018 18:33:50 +0000
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> For PCI devices that support it, enable the PRI capability and handle
> PRI Page Requests with the generic fault handler.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
A couple of nitpicks.

> ---
>  drivers/iommu/arm-smmu-v3.c | 174 ++++++++++++++++++++++++++++++--------------
>  1 file changed, 119 insertions(+), 55 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 8d09615fab35..ace2f995b0c0 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -271,6 +271,7 @@
>  #define STRTAB_STE_1_S1COR_SHIFT	4
>  #define STRTAB_STE_1_S1CSH_SHIFT	6
>  
> +#define STRTAB_STE_1_PPAR		(1UL << 18)
>  #define STRTAB_STE_1_S1STALLD		(1UL << 27)
>  
>  #define STRTAB_STE_1_EATS_ABT		0UL
> @@ -346,9 +347,9 @@
>  #define CMDQ_PRI_1_GRPID_SHIFT		0
>  #define CMDQ_PRI_1_GRPID_MASK		0x1ffUL
>  #define CMDQ_PRI_1_RESP_SHIFT		12
> -#define CMDQ_PRI_1_RESP_DENY		(0UL << CMDQ_PRI_1_RESP_SHIFT)
> -#define CMDQ_PRI_1_RESP_FAIL		(1UL << CMDQ_PRI_1_RESP_SHIFT)
> -#define CMDQ_PRI_1_RESP_SUCC		(2UL << CMDQ_PRI_1_RESP_SHIFT)
> +#define CMDQ_PRI_1_RESP_FAILURE		(0UL << CMDQ_PRI_1_RESP_SHIFT)
> +#define CMDQ_PRI_1_RESP_INVALID		(1UL << CMDQ_PRI_1_RESP_SHIFT)
> +#define CMDQ_PRI_1_RESP_SUCCESS		(2UL << CMDQ_PRI_1_RESP_SHIFT)
Mixing this renaming in with the rest of the patch does make things a
little harder to read than they would have been if done as separate patches.
Worth splitting?

>  
>  #define CMDQ_RESUME_0_SID_SHIFT		32
>  #define CMDQ_RESUME_0_SID_MASK		0xffffffffUL
> @@ -442,12 +443,6 @@ module_param_named(disable_ats_check, disable_ats_check, bool, S_IRUGO);
>  MODULE_PARM_DESC(disable_ats_check,
>  	"By default, the SMMU checks whether each incoming transaction marked as translated is allowed by the stream configuration. This option disables the check.");
>  
> -enum pri_resp {
> -	PRI_RESP_DENY,
> -	PRI_RESP_FAIL,
> -	PRI_RESP_SUCC,
> -};
> -
>  enum arm_smmu_msi_index {
>  	EVTQ_MSI_INDEX,
>  	GERROR_MSI_INDEX,
> @@ -530,7 +525,7 @@ struct arm_smmu_cmdq_ent {
>  			u32			sid;
>  			u32			ssid;
>  			u16			grpid;
> -			enum pri_resp		resp;
> +			enum page_response_code	resp;
>  		} pri;
>  
>  		#define CMDQ_OP_RESUME		0x44
> @@ -615,6 +610,7 @@ struct arm_smmu_strtab_ent {
>  	struct arm_smmu_s2_cfg		*s2_cfg;
>  
>  	bool				can_stall;
> +	bool				prg_resp_needs_ssid;
>  };
>  
>  struct arm_smmu_strtab_cfg {
> @@ -969,14 +965,14 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>  		cmd[0] |= (u64)ent->pri.sid << CMDQ_PRI_0_SID_SHIFT;
>  		cmd[1] |= ent->pri.grpid << CMDQ_PRI_1_GRPID_SHIFT;
>  		switch (ent->pri.resp) {
> -		case PRI_RESP_DENY:
> -			cmd[1] |= CMDQ_PRI_1_RESP_DENY;
> +		case IOMMU_PAGE_RESP_FAILURE:
> +			cmd[1] |= CMDQ_PRI_1_RESP_FAILURE;
>  			break;
> -		case PRI_RESP_FAIL:
> -			cmd[1] |= CMDQ_PRI_1_RESP_FAIL;
> +		case IOMMU_PAGE_RESP_INVALID:
> +			cmd[1] |= CMDQ_PRI_1_RESP_INVALID;
>  			break;
> -		case PRI_RESP_SUCC:
> -			cmd[1] |= CMDQ_PRI_1_RESP_SUCC;
> +		case IOMMU_PAGE_RESP_SUCCESS:
> +			cmd[1] |= CMDQ_PRI_1_RESP_SUCCESS;
>  			break;
>  		default:
>  			return -EINVAL;
> @@ -1180,9 +1176,16 @@ static int arm_smmu_page_response(struct iommu_domain *domain,
>  		cmd.resume.sid		= sid;
>  		cmd.resume.stag		= resp->page_req_group_id;
>  		cmd.resume.resp		= resp->resp_code;
> +	} else if (master->can_fault) {
> +		cmd.opcode		= CMDQ_OP_PRI_RESP;
> +		cmd.substream_valid	= resp->pasid_present &&
> +					  master->ste.prg_resp_needs_ssid;
> +		cmd.pri.sid		= sid;
> +		cmd.pri.ssid		= resp->pasid;
> +		cmd.pri.grpid		= resp->page_req_group_id;
> +		cmd.pri.resp		= resp->resp_code;
>  	} else {
> -		/* TODO: put PRI response here */
> -		return -EINVAL;
> +		return -ENODEV;
>  	}
>  
>  	arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
> @@ -1309,6 +1312,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
>  			  STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1) <<
>  			 STRTAB_STE_1_STRW_SHIFT);
>  
> +		if (ste->prg_resp_needs_ssid)
> +			dst[1] |= STRTAB_STE_1_PPAR;
> +
>  		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
>  		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE) &&
>  		   !ste->can_stall)
> @@ -1536,40 +1542,32 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>  
>  static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
>  {
> -	u32 sid, ssid;
> -	u16 grpid;
> -	bool ssv, last;
> -
> -	sid = evt[0] >> PRIQ_0_SID_SHIFT & PRIQ_0_SID_MASK;
> -	ssv = evt[0] & PRIQ_0_SSID_V;
> -	ssid = ssv ? evt[0] >> PRIQ_0_SSID_SHIFT & PRIQ_0_SSID_MASK : 0;
> -	last = evt[0] & PRIQ_0_PRG_LAST;
> -	grpid = evt[1] >> PRIQ_1_PRG_IDX_SHIFT & PRIQ_1_PRG_IDX_MASK;
> -
> -	dev_info(smmu->dev, "unexpected PRI request received:\n");
> -	dev_info(smmu->dev,
> -		 "\tsid 0x%08x.0x%05x: [%u%s] %sprivileged %s%s%s access at iova 0x%016llx\n",
> -		 sid, ssid, grpid, last ? "L" : "",
> -		 evt[0] & PRIQ_0_PERM_PRIV ? "" : "un",
> -		 evt[0] & PRIQ_0_PERM_READ ? "R" : "",
> -		 evt[0] & PRIQ_0_PERM_WRITE ? "W" : "",
> -		 evt[0] & PRIQ_0_PERM_EXEC ? "X" : "",
> -		 evt[1] & PRIQ_1_ADDR_MASK << PRIQ_1_ADDR_SHIFT);
> -
> -	if (last) {
> -		struct arm_smmu_cmdq_ent cmd = {
> -			.opcode			= CMDQ_OP_PRI_RESP,
> -			.substream_valid	= ssv,
> -			.pri			= {
> -				.sid	= sid,
> -				.ssid	= ssid,
> -				.grpid	= grpid,
> -				.resp	= PRI_RESP_DENY,
> -			},
> -		};
> +	u32 sid = evt[0] >> PRIQ_0_SID_SHIFT & PRIQ_0_SID_MASK;
>  
> -		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
> -	}
> +	struct arm_smmu_master_data *master;
> +	struct iommu_fault_event fault = {
> +		.type		= IOMMU_FAULT_PAGE_REQ,
> +		.last_req	= !!(evt[0] & PRIQ_0_PRG_LAST),
> +		.pasid_valid	= !!(evt[0] & PRIQ_0_SSID_V),
> +		.pasid		= evt[0] >> PRIQ_0_SSID_SHIFT & PRIQ_0_SSID_MASK,
> +		.page_req_group_id = evt[1] >> PRIQ_1_PRG_IDX_SHIFT & PRIQ_1_PRG_IDX_MASK,
> +		.addr		= evt[1] & PRIQ_1_ADDR_MASK << PRIQ_1_ADDR_SHIFT,
> +	};
> +
> +	if (evt[0] & PRIQ_0_PERM_READ)
> +		fault.prot |= IOMMU_FAULT_READ;
> +	if (evt[0] & PRIQ_0_PERM_WRITE)
> +		fault.prot |= IOMMU_FAULT_WRITE;
> +	if (evt[0] & PRIQ_0_PERM_EXEC)
> +		fault.prot |= IOMMU_FAULT_EXEC;
> +	if (evt[0] & PRIQ_0_PERM_PRIV)
> +		fault.prot |= IOMMU_FAULT_PRIV;
> +
> +	master = arm_smmu_find_master(smmu, sid);
> +	if (WARN_ON(!master))
> +		return;
> +
> +	iommu_report_device_fault(master->dev, &fault);
>  }
>  
>  static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
> @@ -1594,6 +1592,11 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
>  		}
>  
>  		if (queue_sync_prod(q) == -EOVERFLOW)
> +			/*
> +			 * TODO: flush pending faults, since the SMMU might have
> +			 * auto-responded to the Last request of a pending
> +			 * group
> +			 */
>  			dev_err(smmu->dev, "PRIQ overflow detected -- requests lost\n");
>  	} while (!queue_empty(q));
>  
> @@ -1647,7 +1650,8 @@ static int arm_smmu_flush_queues(struct notifier_block *nb,
>  	if (master) {
>  		if (master->ste.can_stall)
>  			arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
> -		/* TODO: add support for PRI */
> +		else if (master->can_fault)
> +			arm_smmu_flush_queue(smmu, &smmu->priq.q, "priq");
>  		return 0;
>  	}
>  
> @@ -2533,6 +2537,46 @@ static int arm_smmu_enable_ats(struct arm_smmu_master_data *master)
>  	return 0;
>  }
>  
> +static int arm_smmu_enable_pri(struct arm_smmu_master_data *master)
> +{
> +	int ret, pos;
> +	struct pci_dev *pdev;
> +	/*
> +	 * TODO: find a good inflight PPR number. We should divide the PRI queue
> +	 * by the number of PRI-capable devices, but it's impossible to know
> +	 * about current and future (hotplugged) devices. So we're at risk of
> +	 * dropping PPRs (and leaking pending requests in the FQ).
> +	 */
> +	size_t max_inflight_pprs = 16;
> +	struct arm_smmu_device *smmu = master->smmu;
> +
> +	if (!(smmu->features & ARM_SMMU_FEAT_PRI) || !dev_is_pci(master->dev))
> +		return -ENOSYS;
> +
> +	pdev = to_pci_dev(master->dev);
> +
> +	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> +	if (!pos)
> +		return -ENOSYS;
> +
> +	ret = pci_reset_pri(pdev);
> +	if (ret)
> +		return ret;
> +
> +	ret = pci_enable_pri(pdev, max_inflight_pprs);
> +	if (ret) {
> +		dev_err(master->dev, "cannot enable PRI: %d\n", ret);
> +		return ret;
> +	}
> +
> +	master->can_fault = true;
> +	master->ste.prg_resp_needs_ssid = pci_prg_resp_requires_prefix(pdev);
> +
> +	dev_dbg(master->dev, "enabled PRI");
> +
> +	return 0;
> +}
> +

The function ordering gets a bit random as you add all the new ones.
It might be better to keep each disable immediately following its enable.

>  static void arm_smmu_disable_ats(struct arm_smmu_master_data *master)
>  {
>  	struct pci_dev *pdev;
> @@ -2548,6 +2592,22 @@ static void arm_smmu_disable_ats(struct arm_smmu_master_data *master)
>  	pci_disable_ats(pdev);
>  }
>  
> +static void arm_smmu_disable_pri(struct arm_smmu_master_data *master)
> +{
> +	struct pci_dev *pdev;
> +
> +	if (!dev_is_pci(master->dev))
> +		return;
> +
> +	pdev = to_pci_dev(master->dev);
> +
> +	if (!pdev->pri_enabled)
> +		return;
> +
> +	pci_disable_pri(pdev);
> +	master->can_fault = false;
> +}
> +
>  static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>  				  struct arm_smmu_master_data *master)
>  {
> @@ -2668,12 +2728,13 @@ static int arm_smmu_add_device(struct device *dev)
>  		master->ste.can_stall = true;
>  	}
>  
> -	arm_smmu_enable_ats(master);
> +	if (!arm_smmu_enable_ats(master))
> +		arm_smmu_enable_pri(master);
>  
>  	group = iommu_group_get_for_dev(dev);
>  	if (IS_ERR(group)) {
>  		ret = PTR_ERR(group);
> -		goto err_disable_ats;
> +		goto err_disable_pri;
>  	}
>  
>  	iommu_group_put(group);
> @@ -2682,7 +2743,8 @@ static int arm_smmu_add_device(struct device *dev)
>  
>  	return 0;
>  
> -err_disable_ats:
> +err_disable_pri:
> +	arm_smmu_disable_pri(master);
>  	arm_smmu_disable_ats(master);
>  
>  	return ret;
> @@ -2702,6 +2764,8 @@ static void arm_smmu_remove_device(struct device *dev)
>  	if (master && master->ste.assigned)
>  		arm_smmu_detach_dev(dev);
>  	arm_smmu_remove_master(smmu, master);
> +
> +	arm_smmu_disable_pri(master);
>  	arm_smmu_disable_ats(master);
>  
>  	iommu_group_remove_device(dev);

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 28/37] iommu/arm-smmu-v3: Maintain a SID->device structure
  2018-02-12 18:33     ` Jean-Philippe Brucker
  (?)
@ 2018-03-08 17:34         ` Jonathan Cameron
  -1 siblings, 0 replies; 317+ messages in thread
From: Jonathan Cameron @ 2018-03-08 17:34 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: mark.rutland-5wv7dgnIgG8,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, catalin.marinas-5wv7dgnIgG8,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, rjw-LthD3rsA81gm4RdzfppkhA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	sudeep.holla-5wv7dgnIgG8, christian.koenig-5C7GfCeVMHo

On Mon, 12 Feb 2018 18:33:43 +0000
Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:

> When handling faults from the event or PRI queue, we need to find the
> struct device associated to a SID. Add a rb_tree to keep track of SIDs.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
Nitpick inline.


> ---
>  drivers/iommu/arm-smmu-v3.c | 105 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 105 insertions(+)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index c5b3a43becaf..2430b2140f8d 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -615,10 +615,19 @@ struct arm_smmu_device {
>  	/* IOMMU core code handle */
>  	struct iommu_device		iommu;
>  
> +	struct rb_root			streams;
> +	struct mutex			streams_mutex;
> +
>  	/* Notifier for the fault queue */
>  	struct notifier_block		faultq_nb;
>  };
>  
> +struct arm_smmu_stream {
> +	u32				id;
> +	struct arm_smmu_master_data	*master;
> +	struct rb_node			node;
> +};
> +
>  /* SMMU private data for each master */
>  struct arm_smmu_master_data {
>  	struct arm_smmu_device		*smmu;
> @@ -626,6 +635,7 @@ struct arm_smmu_master_data {
>  
>  	struct arm_smmu_domain		*domain;
>  	struct list_head		list; /* domain->devices */
> +	struct arm_smmu_stream		*streams;
>  
>  	struct device			*dev;
>  
> @@ -1250,6 +1260,31 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
>  	return 0;
>  }
>  
> +static struct arm_smmu_master_data *
> +arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
> +{
> +	struct rb_node *node;
> +	struct arm_smmu_stream *stream;
> +	struct arm_smmu_master_data *master = NULL;
> +
> +	mutex_lock(&smmu->streams_mutex);
> +	node = smmu->streams.rb_node;
> +	while (node) {
> +		stream = rb_entry(node, struct arm_smmu_stream, node);
> +		if (stream->id < sid) {
> +			node = node->rb_right;
> +		} else if (stream->id > sid) {
> +			node = node->rb_left;
> +		} else {
> +			master = stream->master;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&smmu->streams_mutex);
> +
> +	return master;
> +}
> +
>  /* IRQ and event handlers */
>  static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>  {
> @@ -2146,6 +2181,71 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
>  	return sid < limit;
>  }
>  
> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
> +				  struct arm_smmu_master_data *master)
> +{
> +	int i;
> +	int ret = 0;
> +	struct arm_smmu_stream *new_stream, *cur_stream;
> +	struct rb_node **new_node, *parent_node = NULL;
> +	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
> +
> +	master->streams = kcalloc(fwspec->num_ids,
> +				  sizeof(struct arm_smmu_stream), GFP_KERNEL);
> +	if (!master->streams)
> +		return -ENOMEM;
> +
> +	mutex_lock(&smmu->streams_mutex);
> +	for (i = 0; i < fwspec->num_ids && !ret; i++) {
> +		new_stream = &master->streams[i];
> +		new_stream->id = fwspec->ids[i];
> +		new_stream->master = master;
> +
> +		new_node = &(smmu->streams.rb_node);
> +		while (*new_node) {
> +			cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
> +					      node);
> +			parent_node = *new_node;
> +			if (cur_stream->id > new_stream->id) {
> +				new_node = &((*new_node)->rb_left);
> +			} else if (cur_stream->id < new_stream->id) {
> +				new_node = &((*new_node)->rb_right);
> +			} else {
> +				dev_warn(master->dev,
> +					 "stream %u already in tree\n",
> +					 cur_stream->id);
> +				ret = -EINVAL;
> +				break;
> +			}
> +		}
> +
> +		if (!ret) {
> +			rb_link_node(&new_stream->node, parent_node, new_node);
> +			rb_insert_color(&new_stream->node, &smmu->streams);
> +		}
> +	}
> +	mutex_unlock(&smmu->streams_mutex);
> +
> +	return ret;
> +}
> +
> +static void arm_smmu_remove_master(struct arm_smmu_device *smmu,
> +				   struct arm_smmu_master_data *master)
> +{
> +	int i;
> +	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
> +
> +	if (!master->streams)
> +		return;
> +
> +	mutex_lock(&smmu->streams_mutex);
> +	for (i = 0; i < fwspec->num_ids; i++)
> +		rb_erase(&master->streams[i].node, &smmu->streams);
> +	mutex_unlock(&smmu->streams_mutex);
> +
> +	kfree(master->streams);
> +}
> +
>  static struct iommu_ops arm_smmu_ops;
>  
>  static int arm_smmu_add_device(struct device *dev)
> @@ -2198,6 +2298,7 @@ static int arm_smmu_add_device(struct device *dev)
>  
>  	group = iommu_group_get_for_dev(dev);
>  	if (!IS_ERR(group)) {
> +		arm_smmu_insert_master(smmu, master);
There are some error cases it would be good to take notice of when
inserting the master.  Admittedly the same is true of iommu_device_link,
so I guess you are keeping with the existing code style.

It would also be nice if the later bit of rework that drops these out
of the if statement were done before this patch in the series.
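Something along these lines is roughly what I had in mind. It is only an
untested sketch on top of this patch, assuming the add_device rework from
later in the series is applied first so the error can actually be
propagated (all names as used in this series):

	group = iommu_group_get_for_dev(dev);
	if (IS_ERR(group))
		return PTR_ERR(group);

	/* Propagate -ENOMEM / duplicate-SID failures instead of ignoring them */
	ret = arm_smmu_insert_master(smmu, master);
	iommu_group_put(group);
	if (ret)
		return ret;

	iommu_device_link(&smmu->iommu, dev);

	return 0;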


>  		iommu_group_put(group);
>  		iommu_device_link(&smmu->iommu, dev);
>  	}
> @@ -2218,6 +2319,7 @@ static void arm_smmu_remove_device(struct device *dev)
>  	smmu = master->smmu;
>  	if (master && master->ste.assigned)
>  		arm_smmu_detach_dev(dev);
> +	arm_smmu_remove_master(smmu, master);
>  	iommu_group_remove_device(dev);
>  	iommu_device_unlink(&smmu->iommu, dev);
>  	kfree(master);
> @@ -2527,6 +2629,9 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
>  	int ret;
>  
>  	atomic_set(&smmu->sync_nr, 0);
> +	mutex_init(&smmu->streams_mutex);
> +	smmu->streams = RB_ROOT;
> +
>  	ret = arm_smmu_init_queues(smmu);
>  	if (ret)
>  		return ret;

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 28/37] iommu/arm-smmu-v3: Maintain a SID->device structure
@ 2018-03-08 17:34         ` Jonathan Cameron
  0 siblings, 0 replies; 317+ messages in thread
From: Jonathan Cameron @ 2018-03-08 17:34 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm,
	joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	shunyong.yang, nwatters, okaya, jcrouse, rfranz, dwmw2,
	jacob.jun.pan, yi.l.liu, ashok.raj, robdclark, christian.koenig,
	bharatku

On Mon, 12 Feb 2018 18:33:43 +0000
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> When handling faults from the event or PRI queue, we need to find the
> struct device associated to a SID. Add a rb_tree to keep track of SIDs.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Nitpick inline.


> ---
>  drivers/iommu/arm-smmu-v3.c | 105 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 105 insertions(+)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index c5b3a43becaf..2430b2140f8d 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -615,10 +615,19 @@ struct arm_smmu_device {
>  	/* IOMMU core code handle */
>  	struct iommu_device		iommu;
>  
> +	struct rb_root			streams;
> +	struct mutex			streams_mutex;
> +
>  	/* Notifier for the fault queue */
>  	struct notifier_block		faultq_nb;
>  };
>  
> +struct arm_smmu_stream {
> +	u32				id;
> +	struct arm_smmu_master_data	*master;
> +	struct rb_node			node;
> +};
> +
>  /* SMMU private data for each master */
>  struct arm_smmu_master_data {
>  	struct arm_smmu_device		*smmu;
> @@ -626,6 +635,7 @@ struct arm_smmu_master_data {
>  
>  	struct arm_smmu_domain		*domain;
>  	struct list_head		list; /* domain->devices */
> +	struct arm_smmu_stream		*streams;
>  
>  	struct device			*dev;
>  
> @@ -1250,6 +1260,31 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
>  	return 0;
>  }
>  
> +static struct arm_smmu_master_data *
> +arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
> +{
> +	struct rb_node *node;
> +	struct arm_smmu_stream *stream;
> +	struct arm_smmu_master_data *master = NULL;
> +
> +	mutex_lock(&smmu->streams_mutex);
> +	node = smmu->streams.rb_node;
> +	while (node) {
> +		stream = rb_entry(node, struct arm_smmu_stream, node);
> +		if (stream->id < sid) {
> +			node = node->rb_right;
> +		} else if (stream->id > sid) {
> +			node = node->rb_left;
> +		} else {
> +			master = stream->master;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&smmu->streams_mutex);
> +
> +	return master;
> +}
> +
>  /* IRQ and event handlers */
>  static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>  {
> @@ -2146,6 +2181,71 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
>  	return sid < limit;
>  }
>  
> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
> +				  struct arm_smmu_master_data *master)
> +{
> +	int i;
> +	int ret = 0;
> +	struct arm_smmu_stream *new_stream, *cur_stream;
> +	struct rb_node **new_node, *parent_node = NULL;
> +	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
> +
> +	master->streams = kcalloc(fwspec->num_ids,
> +				  sizeof(struct arm_smmu_stream), GFP_KERNEL);
> +	if (!master->streams)
> +		return -ENOMEM;
> +
> +	mutex_lock(&smmu->streams_mutex);
> +	for (i = 0; i < fwspec->num_ids && !ret; i++) {
> +		new_stream = &master->streams[i];
> +		new_stream->id = fwspec->ids[i];
> +		new_stream->master = master;
> +
> +		new_node = &(smmu->streams.rb_node);
> +		while (*new_node) {
> +			cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
> +					      node);
> +			parent_node = *new_node;
> +			if (cur_stream->id > new_stream->id) {
> +				new_node = &((*new_node)->rb_left);
> +			} else if (cur_stream->id < new_stream->id) {
> +				new_node = &((*new_node)->rb_right);
> +			} else {
> +				dev_warn(master->dev,
> +					 "stream %u already in tree\n",
> +					 cur_stream->id);
> +				ret = -EINVAL;
> +				break;
> +			}
> +		}
> +
> +		if (!ret) {
> +			rb_link_node(&new_stream->node, parent_node, new_node);
> +			rb_insert_color(&new_stream->node, &smmu->streams);
> +		}
> +	}
> +	mutex_unlock(&smmu->streams_mutex);
> +
> +	return ret;
> +}
> +
> +static void arm_smmu_remove_master(struct arm_smmu_device *smmu,
> +				   struct arm_smmu_master_data *master)
> +{
> +	int i;
> +	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
> +
> +	if (!master->streams)
> +		return;
> +
> +	mutex_lock(&smmu->streams_mutex);
> +	for (i = 0; i < fwspec->num_ids; i++)
> +		rb_erase(&master->streams[i].node, &smmu->streams);
> +	mutex_unlock(&smmu->streams_mutex);
> +
> +	kfree(master->streams);
> +}
> +
>  static struct iommu_ops arm_smmu_ops;
>  
>  static int arm_smmu_add_device(struct device *dev)
> @@ -2198,6 +2298,7 @@ static int arm_smmu_add_device(struct device *dev)
>  
>  	group = iommu_group_get_for_dev(dev);
>  	if (!IS_ERR(group)) {
> +		arm_smmu_insert_master(smmu, master);
There are some error cases it would be good to take notice of when
inserting the master.  Admittedly the same is true of iommu_device_link,
so I guess you are keeping with the existing code style.

It would also be nice if the later bit of rework that drops these out
of the if statement were done before this patch in the series.


>  		iommu_group_put(group);
>  		iommu_device_link(&smmu->iommu, dev);
>  	}
> @@ -2218,6 +2319,7 @@ static void arm_smmu_remove_device(struct device *dev)
>  	smmu = master->smmu;
>  	if (master && master->ste.assigned)
>  		arm_smmu_detach_dev(dev);
> +	arm_smmu_remove_master(smmu, master);
>  	iommu_group_remove_device(dev);
>  	iommu_device_unlink(&smmu->iommu, dev);
>  	kfree(master);
> @@ -2527,6 +2629,9 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
>  	int ret;
>  
>  	atomic_set(&smmu->sync_nr, 0);
> +	mutex_init(&smmu->streams_mutex);
> +	smmu->streams = RB_ROOT;
> +
>  	ret = arm_smmu_init_queues(smmu);
>  	if (ret)
>  		return ret;


^ permalink raw reply	[flat|nested] 317+ messages in thread

* [PATCH 28/37] iommu/arm-smmu-v3: Maintain a SID->device structure
@ 2018-03-08 17:34         ` Jonathan Cameron
  0 siblings, 0 replies; 317+ messages in thread
From: Jonathan Cameron @ 2018-03-08 17:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 12 Feb 2018 18:33:43 +0000
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> When handling faults from the event or PRI queue, we need to find the
> struct device associated to a SID. Add a rb_tree to keep track of SIDs.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Nitpick inline.


> ---
>  drivers/iommu/arm-smmu-v3.c | 105 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 105 insertions(+)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index c5b3a43becaf..2430b2140f8d 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -615,10 +615,19 @@ struct arm_smmu_device {
>  	/* IOMMU core code handle */
>  	struct iommu_device		iommu;
>  
> +	struct rb_root			streams;
> +	struct mutex			streams_mutex;
> +
>  	/* Notifier for the fault queue */
>  	struct notifier_block		faultq_nb;
>  };
>  
> +struct arm_smmu_stream {
> +	u32				id;
> +	struct arm_smmu_master_data	*master;
> +	struct rb_node			node;
> +};
> +
>  /* SMMU private data for each master */
>  struct arm_smmu_master_data {
>  	struct arm_smmu_device		*smmu;
> @@ -626,6 +635,7 @@ struct arm_smmu_master_data {
>  
>  	struct arm_smmu_domain		*domain;
>  	struct list_head		list; /* domain->devices */
> +	struct arm_smmu_stream		*streams;
>  
>  	struct device			*dev;
>  
> @@ -1250,6 +1260,31 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
>  	return 0;
>  }
>  
> +static struct arm_smmu_master_data *
> +arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
> +{
> +	struct rb_node *node;
> +	struct arm_smmu_stream *stream;
> +	struct arm_smmu_master_data *master = NULL;
> +
> +	mutex_lock(&smmu->streams_mutex);
> +	node = smmu->streams.rb_node;
> +	while (node) {
> +		stream = rb_entry(node, struct arm_smmu_stream, node);
> +		if (stream->id < sid) {
> +			node = node->rb_right;
> +		} else if (stream->id > sid) {
> +			node = node->rb_left;
> +		} else {
> +			master = stream->master;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&smmu->streams_mutex);
> +
> +	return master;
> +}
> +
>  /* IRQ and event handlers */
>  static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>  {
> @@ -2146,6 +2181,71 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
>  	return sid < limit;
>  }
>  
> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
> +				  struct arm_smmu_master_data *master)
> +{
> +	int i;
> +	int ret = 0;
> +	struct arm_smmu_stream *new_stream, *cur_stream;
> +	struct rb_node **new_node, *parent_node = NULL;
> +	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
> +
> +	master->streams = kcalloc(fwspec->num_ids,
> +				  sizeof(struct arm_smmu_stream), GFP_KERNEL);
> +	if (!master->streams)
> +		return -ENOMEM;
> +
> +	mutex_lock(&smmu->streams_mutex);
> +	for (i = 0; i < fwspec->num_ids && !ret; i++) {
> +		new_stream = &master->streams[i];
> +		new_stream->id = fwspec->ids[i];
> +		new_stream->master = master;
> +
> +		new_node = &(smmu->streams.rb_node);
> +		while (*new_node) {
> +			cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
> +					      node);
> +			parent_node = *new_node;
> +			if (cur_stream->id > new_stream->id) {
> +				new_node = &((*new_node)->rb_left);
> +			} else if (cur_stream->id < new_stream->id) {
> +				new_node = &((*new_node)->rb_right);
> +			} else {
> +				dev_warn(master->dev,
> +					 "stream %u already in tree\n",
> +					 cur_stream->id);
> +				ret = -EINVAL;
> +				break;
> +			}
> +		}
> +
> +		if (!ret) {
> +			rb_link_node(&new_stream->node, parent_node, new_node);
> +			rb_insert_color(&new_stream->node, &smmu->streams);
> +		}
> +	}
> +	mutex_unlock(&smmu->streams_mutex);
> +
> +	return ret;
> +}
> +
> +static void arm_smmu_remove_master(struct arm_smmu_device *smmu,
> +				   struct arm_smmu_master_data *master)
> +{
> +	int i;
> +	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
> +
> +	if (!master->streams)
> +		return;
> +
> +	mutex_lock(&smmu->streams_mutex);
> +	for (i = 0; i < fwspec->num_ids; i++)
> +		rb_erase(&master->streams[i].node, &smmu->streams);
> +	mutex_unlock(&smmu->streams_mutex);
> +
> +	kfree(master->streams);
> +}
> +
>  static struct iommu_ops arm_smmu_ops;
>  
>  static int arm_smmu_add_device(struct device *dev)
> @@ -2198,6 +2298,7 @@ static int arm_smmu_add_device(struct device *dev)
>  
>  	group = iommu_group_get_for_dev(dev);
>  	if (!IS_ERR(group)) {
> +		arm_smmu_insert_master(smmu, master);
There are some error cases it would be good to take notice of when
inserting the master.  Admittedly the same is true of iommu_device_link,
so I guess you are keeping with the existing code style.

It would also be nice if the later bit of rework that drops these out
of the if statement were done before this patch in the series.


>  		iommu_group_put(group);
>  		iommu_device_link(&smmu->iommu, dev);
>  	}
> @@ -2218,6 +2319,7 @@ static void arm_smmu_remove_device(struct device *dev)
>  	smmu = master->smmu;
>  	if (master && master->ste.assigned)
>  		arm_smmu_detach_dev(dev);
> +	arm_smmu_remove_master(smmu, master);
>  	iommu_group_remove_device(dev);
>  	iommu_device_unlink(&smmu->iommu, dev);
>  	kfree(master);
> @@ -2527,6 +2629,9 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
>  	int ret;
>  
>  	atomic_set(&smmu->sync_nr, 0);
> +	mutex_init(&smmu->streams_mutex);
> +	smmu->streams = RB_ROOT;
> +
>  	ret = arm_smmu_init_queues(smmu);
>  	if (ret)
>  		return ret;

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 27/37] iommu/arm-smmu-v3: Register fault workqueue
  2018-02-12 18:33     ` Jean-Philippe Brucker
  (?)
@ 2018-03-08 17:44         ` Jonathan Cameron
  -1 siblings, 0 replies; 317+ messages in thread
From: Jonathan Cameron @ 2018-03-08 17:44 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: mark.rutland-5wv7dgnIgG8,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, catalin.marinas-5wv7dgnIgG8,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, rjw-LthD3rsA81gm4RdzfppkhA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	sudeep.holla-5wv7dgnIgG8, christian.koenig-5C7GfCeVMHo

On Mon, 12 Feb 2018 18:33:42 +0000
Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:

> When using PRI or Stall, the PRI or event handler enqueues faults into the
> core fault queue. Register it based on the SMMU features.
> 
> When the core stops using a PASID, it notifies the SMMU to flush all
> instances of this PASID from the PRI queue. Add a way to flush the PRI and
> event queue. PRI and event thread now take a spinlock while processing the
> queue. The flush handler takes this lock to inspect the queue state.
> We avoid livelock, where the SMMU adds fault to the queue faster than we
> can consume them, by incrementing a 'batch' number on every cycle so the
> flush handler only has to wait a complete cycle (two batch increments.)
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
I think you have a potential incorrect free issue... See inline.

Jonathan
> ---
>  drivers/iommu/Kconfig       |   1 +
>  drivers/iommu/arm-smmu-v3.c | 103 +++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 103 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index d434f7085dc2..d79c68754bb9 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -354,6 +354,7 @@ config ARM_SMMU_V3
>  	depends on ARM64
>  	select IOMMU_API
>  	select IOMMU_SVA
> +	select IOMMU_FAULT
>  	select IOMMU_IO_PGTABLE_LPAE
>  	select ARM_SMMU_V3_CONTEXT
>  	select GENERIC_MSI_IRQ_DOMAIN
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 8528704627b5..c5b3a43becaf 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -494,6 +494,10 @@ struct arm_smmu_queue {
>  
>  	u32 __iomem			*prod_reg;
>  	u32 __iomem			*cons_reg;
> +
> +	/* Event and PRI */
> +	u64				batch;
> +	wait_queue_head_t		wq;
>  };
>  
>  struct arm_smmu_cmdq {
> @@ -610,6 +614,9 @@ struct arm_smmu_device {
>  
>  	/* IOMMU core code handle */
>  	struct iommu_device		iommu;
> +
> +	/* Notifier for the fault queue */
> +	struct notifier_block		faultq_nb;
>  };
>  
>  /* SMMU private data for each master */
> @@ -1247,14 +1254,23 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
>  static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>  {
>  	int i;
> +	int num_handled = 0;
>  	struct arm_smmu_device *smmu = dev;
>  	struct arm_smmu_queue *q = &smmu->evtq.q;
> +	size_t queue_size = 1 << q->max_n_shift;
>  	u64 evt[EVTQ_ENT_DWORDS];
>  
> +	spin_lock(&q->wq.lock);
>  	do {
>  		while (!queue_remove_raw(q, evt)) {
>  			u8 id = evt[0] >> EVTQ_0_ID_SHIFT & EVTQ_0_ID_MASK;
>  
> +			if (++num_handled == queue_size) {
> +				q->batch++;
> +				wake_up_locked(&q->wq);
> +				num_handled = 0;
> +			}
> +
>  			dev_info(smmu->dev, "event 0x%02x received:\n", id);
>  			for (i = 0; i < ARRAY_SIZE(evt); ++i)
>  				dev_info(smmu->dev, "\t0x%016llx\n",
> @@ -1272,6 +1288,11 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>  
>  	/* Sync our overflow flag, as we believe we're up to speed */
>  	q->cons = Q_OVF(q, q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
> +
> +	q->batch++;
> +	wake_up_locked(&q->wq);
> +	spin_unlock(&q->wq.lock);
> +
>  	return IRQ_HANDLED;
>  }
>  
> @@ -1315,13 +1336,24 @@ static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
>  
>  static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
>  {
> +	int num_handled = 0;
>  	struct arm_smmu_device *smmu = dev;
>  	struct arm_smmu_queue *q = &smmu->priq.q;
> +	size_t queue_size = 1 << q->max_n_shift;
>  	u64 evt[PRIQ_ENT_DWORDS];
>  
> +	spin_lock(&q->wq.lock);
>  	do {
> -		while (!queue_remove_raw(q, evt))
> +		while (!queue_remove_raw(q, evt)) {
> +			spin_unlock(&q->wq.lock);
>  			arm_smmu_handle_ppr(smmu, evt);
> +			spin_lock(&q->wq.lock);
> +			if (++num_handled == queue_size) {
> +				q->batch++;
> +				wake_up_locked(&q->wq);
> +				num_handled = 0;
> +			}
> +		}
>  
>  		if (queue_sync_prod(q) == -EOVERFLOW)
>  			dev_err(smmu->dev, "PRIQ overflow detected -- requests lost\n");
> @@ -1329,9 +1361,65 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
>  
>  	/* Sync our overflow flag, as we believe we're up to speed */
>  	q->cons = Q_OVF(q, q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
> +
> +	q->batch++;
> +	wake_up_locked(&q->wq);
> +	spin_unlock(&q->wq.lock);
> +
>  	return IRQ_HANDLED;
>  }
>  
> +/*
> + * arm_smmu_flush_queue - wait until all events/PPRs currently in the queue have
> + * been consumed.
> + *
> + * Wait until the queue thread finished a batch, or until the queue is empty.
> + * Note that we don't handle overflows on q->batch. If it occurs, just wait for
> + * the queue to be empty.
> + */
> +static int arm_smmu_flush_queue(struct arm_smmu_device *smmu,
> +				struct arm_smmu_queue *q, const char *name)
> +{
> +	int ret;
> +	u64 batch;
> +
> +	spin_lock(&q->wq.lock);
> +	if (queue_sync_prod(q) == -EOVERFLOW)
> +		dev_err(smmu->dev, "%s overflow detected -- requests lost\n", name);
> +
> +	batch = q->batch;
> +	ret = wait_event_interruptible_locked(q->wq, queue_empty(q) ||
> +					      q->batch >= batch + 2);
> +	spin_unlock(&q->wq.lock);
> +
> +	return ret;
> +}
> +
> +static int arm_smmu_flush_queues(struct notifier_block *nb,
> +				 unsigned long action, void *data)
> +{
> +	struct arm_smmu_device *smmu = container_of(nb, struct arm_smmu_device,
> +						    faultq_nb);
> +	struct device *dev = data;
> +	struct arm_smmu_master_data *master = NULL;
> +
> +	if (dev)
> +		master = dev->iommu_fwspec->iommu_priv;
> +
> +	if (master) {
> +		/* TODO: add support for PRI and Stall */
> +		return 0;
> +	}
> +
> +	/* No target device, flush all queues. */
> +	if (smmu->features & ARM_SMMU_FEAT_STALLS)
> +		arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
> +	if (smmu->features & ARM_SMMU_FEAT_PRI)
> +		arm_smmu_flush_queue(smmu, &smmu->priq.q, "priq");
> +
> +	return 0;
> +}
> +
>  static int arm_smmu_device_disable(struct arm_smmu_device *smmu);
>  
>  static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
> @@ -2288,6 +2376,10 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
>  		     << Q_BASE_LOG2SIZE_SHIFT;
>  
>  	q->prod = q->cons = 0;
> +
> +	init_waitqueue_head(&q->wq);
> +	q->batch = 0;
> +
>  	return 0;
>  }
>  
> @@ -3168,6 +3260,13 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>  	if (ret)
>  		return ret;
>  
> +	if (smmu->features & (ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_PRI)) {
> +		smmu->faultq_nb.notifier_call = arm_smmu_flush_queues;
> +		ret = iommu_fault_queue_register(&smmu->faultq_nb);
Here you register only if this SMMU supports stalls or PRI, which is fine,
but see the unregister path.

> +		if (ret)
> +			return ret;
> +	}
> +
>  	/* And we're up. Go go go! */
>  	ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
>  				     "smmu3.%pa", &ioaddr);
> @@ -3210,6 +3309,8 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
>  {
>  	struct arm_smmu_device *smmu = platform_get_drvdata(pdev);
>  
> +	iommu_fault_queue_unregister(&smmu->faultq_nb);

Here you unregister from the fault queue unconditionally.  That is mostly
safe, but it seems to drop a reference to the shared fault workqueue, and
potentially destroy it, while it is still in use by another SMMU instance
that does support page faulting.
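Mirroring the feature check from the probe path would avoid that. A rough,
untested sketch against this patch (names as used in this series):

static int arm_smmu_device_remove(struct platform_device *pdev)
{
	struct arm_smmu_device *smmu = platform_get_drvdata(pdev);

	/* Only drop our fault queue reference if probe registered one */
	if (smmu->features & (ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_PRI))
		iommu_fault_queue_unregister(&smmu->faultq_nb);

	arm_smmu_device_disable(smmu);

	return 0;
}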

> +
>  	arm_smmu_device_disable(smmu);
>  
>  	return 0;

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 27/37] iommu/arm-smmu-v3: Register fault workqueue
@ 2018-03-08 17:44         ` Jonathan Cameron
  0 siblings, 0 replies; 317+ messages in thread
From: Jonathan Cameron @ 2018-03-08 17:44 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm,
	joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	shunyong.yang, nwatters, okaya, jcrouse, rfranz, dwmw2,
	jacob.jun.pan, yi.l.liu, ashok.raj, robdclark, christian.koenig,
	bharatku

On Mon, 12 Feb 2018 18:33:42 +0000
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> When using PRI or Stall, the PRI or event handler enqueues faults into the
> core fault queue. Register it based on the SMMU features.
> 
> When the core stops using a PASID, it notifies the SMMU to flush all
> instances of this PASID from the PRI queue. Add a way to flush the PRI and
> event queue. PRI and event thread now take a spinlock while processing the
> queue. The flush handler takes this lock to inspect the queue state.
> We avoid livelock, where the SMMU adds fault to the queue faster than we
> can consume them, by incrementing a 'batch' number on every cycle so the
> flush handler only has to wait a complete cycle (two batch increments.)
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
I think you have a potential incorrect free issue... See inline.

Jonathan
> ---
>  drivers/iommu/Kconfig       |   1 +
>  drivers/iommu/arm-smmu-v3.c | 103 +++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 103 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index d434f7085dc2..d79c68754bb9 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -354,6 +354,7 @@ config ARM_SMMU_V3
>  	depends on ARM64
>  	select IOMMU_API
>  	select IOMMU_SVA
> +	select IOMMU_FAULT
>  	select IOMMU_IO_PGTABLE_LPAE
>  	select ARM_SMMU_V3_CONTEXT
>  	select GENERIC_MSI_IRQ_DOMAIN
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 8528704627b5..c5b3a43becaf 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -494,6 +494,10 @@ struct arm_smmu_queue {
>  
>  	u32 __iomem			*prod_reg;
>  	u32 __iomem			*cons_reg;
> +
> +	/* Event and PRI */
> +	u64				batch;
> +	wait_queue_head_t		wq;
>  };
>  
>  struct arm_smmu_cmdq {
> @@ -610,6 +614,9 @@ struct arm_smmu_device {
>  
>  	/* IOMMU core code handle */
>  	struct iommu_device		iommu;
> +
> +	/* Notifier for the fault queue */
> +	struct notifier_block		faultq_nb;
>  };
>  
>  /* SMMU private data for each master */
> @@ -1247,14 +1254,23 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
>  static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>  {
>  	int i;
> +	int num_handled = 0;
>  	struct arm_smmu_device *smmu = dev;
>  	struct arm_smmu_queue *q = &smmu->evtq.q;
> +	size_t queue_size = 1 << q->max_n_shift;
>  	u64 evt[EVTQ_ENT_DWORDS];
>  
> +	spin_lock(&q->wq.lock);
>  	do {
>  		while (!queue_remove_raw(q, evt)) {
>  			u8 id = evt[0] >> EVTQ_0_ID_SHIFT & EVTQ_0_ID_MASK;
>  
> +			if (++num_handled == queue_size) {
> +				q->batch++;
> +				wake_up_locked(&q->wq);
> +				num_handled = 0;
> +			}
> +
>  			dev_info(smmu->dev, "event 0x%02x received:\n", id);
>  			for (i = 0; i < ARRAY_SIZE(evt); ++i)
>  				dev_info(smmu->dev, "\t0x%016llx\n",
> @@ -1272,6 +1288,11 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>  
>  	/* Sync our overflow flag, as we believe we're up to speed */
>  	q->cons = Q_OVF(q, q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
> +
> +	q->batch++;
> +	wake_up_locked(&q->wq);
> +	spin_unlock(&q->wq.lock);
> +
>  	return IRQ_HANDLED;
>  }
>  
> @@ -1315,13 +1336,24 @@ static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
>  
>  static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
>  {
> +	int num_handled = 0;
>  	struct arm_smmu_device *smmu = dev;
>  	struct arm_smmu_queue *q = &smmu->priq.q;
> +	size_t queue_size = 1 << q->max_n_shift;
>  	u64 evt[PRIQ_ENT_DWORDS];
>  
> +	spin_lock(&q->wq.lock);
>  	do {
> -		while (!queue_remove_raw(q, evt))
> +		while (!queue_remove_raw(q, evt)) {
> +			spin_unlock(&q->wq.lock);
>  			arm_smmu_handle_ppr(smmu, evt);
> +			spin_lock(&q->wq.lock);
> +			if (++num_handled == queue_size) {
> +				q->batch++;
> +				wake_up_locked(&q->wq);
> +				num_handled = 0;
> +			}
> +		}
>  
>  		if (queue_sync_prod(q) == -EOVERFLOW)
>  			dev_err(smmu->dev, "PRIQ overflow detected -- requests lost\n");
> @@ -1329,9 +1361,65 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
>  
>  	/* Sync our overflow flag, as we believe we're up to speed */
>  	q->cons = Q_OVF(q, q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
> +
> +	q->batch++;
> +	wake_up_locked(&q->wq);
> +	spin_unlock(&q->wq.lock);
> +
>  	return IRQ_HANDLED;
>  }
>  
> +/*
> + * arm_smmu_flush_queue - wait until all events/PPRs currently in the queue have
> + * been consumed.
> + *
> + * Wait until the queue thread finished a batch, or until the queue is empty.
> + * Note that we don't handle overflows on q->batch. If it occurs, just wait for
> + * the queue to be empty.
> + */
> +static int arm_smmu_flush_queue(struct arm_smmu_device *smmu,
> +				struct arm_smmu_queue *q, const char *name)
> +{
> +	int ret;
> +	u64 batch;
> +
> +	spin_lock(&q->wq.lock);
> +	if (queue_sync_prod(q) == -EOVERFLOW)
> +		dev_err(smmu->dev, "%s overflow detected -- requests lost\n", name);
> +
> +	batch = q->batch;
> +	ret = wait_event_interruptible_locked(q->wq, queue_empty(q) ||
> +					      q->batch >= batch + 2);
> +	spin_unlock(&q->wq.lock);
> +
> +	return ret;
> +}
> +
> +static int arm_smmu_flush_queues(struct notifier_block *nb,
> +				 unsigned long action, void *data)
> +{
> +	struct arm_smmu_device *smmu = container_of(nb, struct arm_smmu_device,
> +						    faultq_nb);
> +	struct device *dev = data;
> +	struct arm_smmu_master_data *master = NULL;
> +
> +	if (dev)
> +		master = dev->iommu_fwspec->iommu_priv;
> +
> +	if (master) {
> +		/* TODO: add support for PRI and Stall */
> +		return 0;
> +	}
> +
> +	/* No target device, flush all queues. */
> +	if (smmu->features & ARM_SMMU_FEAT_STALLS)
> +		arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
> +	if (smmu->features & ARM_SMMU_FEAT_PRI)
> +		arm_smmu_flush_queue(smmu, &smmu->priq.q, "priq");
> +
> +	return 0;
> +}
> +
>  static int arm_smmu_device_disable(struct arm_smmu_device *smmu);
>  
>  static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
> @@ -2288,6 +2376,10 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
>  		     << Q_BASE_LOG2SIZE_SHIFT;
>  
>  	q->prod = q->cons = 0;
> +
> +	init_waitqueue_head(&q->wq);
> +	q->batch = 0;
> +
>  	return 0;
>  }
>  
> @@ -3168,6 +3260,13 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>  	if (ret)
>  		return ret;
>  
> +	if (smmu->features & (ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_PRI)) {
> +		smmu->faultq_nb.notifier_call = arm_smmu_flush_queues;
> +		ret = iommu_fault_queue_register(&smmu->faultq_nb);
Here you register only if this SMMU supports stalls or PRI, which is fine,
but see the unregister path.

> +		if (ret)
> +			return ret;
> +	}
> +
>  	/* And we're up. Go go go! */
>  	ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
>  				     "smmu3.%pa", &ioaddr);
> @@ -3210,6 +3309,8 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
>  {
>  	struct arm_smmu_device *smmu = platform_get_drvdata(pdev);
>  
> +	iommu_fault_queue_unregister(&smmu->faultq_nb);

Here you unregister from the fault queue unconditionally.  That is mostly
safe, but it seems to drop a reference to the shared fault workqueue, and
potentially destroy it, while it is still in use by another SMMU instance
that does support page faulting.

> +
>  	arm_smmu_device_disable(smmu);
>  
>  	return 0;


^ permalink raw reply	[flat|nested] 317+ messages in thread

* [PATCH 27/37] iommu/arm-smmu-v3: Register fault workqueue
@ 2018-03-08 17:44         ` Jonathan Cameron
  0 siblings, 0 replies; 317+ messages in thread
From: Jonathan Cameron @ 2018-03-08 17:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 12 Feb 2018 18:33:42 +0000
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> When using PRI or Stall, the PRI or event handler enqueues faults into the
> core fault queue. Register it based on the SMMU features.
> 
> When the core stops using a PASID, it notifies the SMMU to flush all
> instances of this PASID from the PRI queue. Add a way to flush the PRI and
> event queue. PRI and event thread now take a spinlock while processing the
> queue. The flush handler takes this lock to inspect the queue state.
> We avoid livelock, where the SMMU adds fault to the queue faster than we
> can consume them, by incrementing a 'batch' number on every cycle so the
> flush handler only has to wait a complete cycle (two batch increments.)
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
I think you have a potential incorrect free issue... See inline.

Jonathan
> ---
>  drivers/iommu/Kconfig       |   1 +
>  drivers/iommu/arm-smmu-v3.c | 103 +++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 103 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index d434f7085dc2..d79c68754bb9 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -354,6 +354,7 @@ config ARM_SMMU_V3
>  	depends on ARM64
>  	select IOMMU_API
>  	select IOMMU_SVA
> +	select IOMMU_FAULT
>  	select IOMMU_IO_PGTABLE_LPAE
>  	select ARM_SMMU_V3_CONTEXT
>  	select GENERIC_MSI_IRQ_DOMAIN
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 8528704627b5..c5b3a43becaf 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -494,6 +494,10 @@ struct arm_smmu_queue {
>  
>  	u32 __iomem			*prod_reg;
>  	u32 __iomem			*cons_reg;
> +
> +	/* Event and PRI */
> +	u64				batch;
> +	wait_queue_head_t		wq;
>  };
>  
>  struct arm_smmu_cmdq {
> @@ -610,6 +614,9 @@ struct arm_smmu_device {
>  
>  	/* IOMMU core code handle */
>  	struct iommu_device		iommu;
> +
> +	/* Notifier for the fault queue */
> +	struct notifier_block		faultq_nb;
>  };
>  
>  /* SMMU private data for each master */
> @@ -1247,14 +1254,23 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
>  static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>  {
>  	int i;
> +	int num_handled = 0;
>  	struct arm_smmu_device *smmu = dev;
>  	struct arm_smmu_queue *q = &smmu->evtq.q;
> +	size_t queue_size = 1 << q->max_n_shift;
>  	u64 evt[EVTQ_ENT_DWORDS];
>  
> +	spin_lock(&q->wq.lock);
>  	do {
>  		while (!queue_remove_raw(q, evt)) {
>  			u8 id = evt[0] >> EVTQ_0_ID_SHIFT & EVTQ_0_ID_MASK;
>  
> +			if (++num_handled == queue_size) {
> +				q->batch++;
> +				wake_up_locked(&q->wq);
> +				num_handled = 0;
> +			}
> +
>  			dev_info(smmu->dev, "event 0x%02x received:\n", id);
>  			for (i = 0; i < ARRAY_SIZE(evt); ++i)
>  				dev_info(smmu->dev, "\t0x%016llx\n",
> @@ -1272,6 +1288,11 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>  
>  	/* Sync our overflow flag, as we believe we're up to speed */
>  	q->cons = Q_OVF(q, q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
> +
> +	q->batch++;
> +	wake_up_locked(&q->wq);
> +	spin_unlock(&q->wq.lock);
> +
>  	return IRQ_HANDLED;
>  }
>  
> @@ -1315,13 +1336,24 @@ static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
>  
>  static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
>  {
> +	int num_handled = 0;
>  	struct arm_smmu_device *smmu = dev;
>  	struct arm_smmu_queue *q = &smmu->priq.q;
> +	size_t queue_size = 1 << q->max_n_shift;
>  	u64 evt[PRIQ_ENT_DWORDS];
>  
> +	spin_lock(&q->wq.lock);
>  	do {
> -		while (!queue_remove_raw(q, evt))
> +		while (!queue_remove_raw(q, evt)) {
> +			spin_unlock(&q->wq.lock);
>  			arm_smmu_handle_ppr(smmu, evt);
> +			spin_lock(&q->wq.lock);
> +			if (++num_handled == queue_size) {
> +				q->batch++;
> +				wake_up_locked(&q->wq);
> +				num_handled = 0;
> +			}
> +		}
>  
>  		if (queue_sync_prod(q) == -EOVERFLOW)
>  			dev_err(smmu->dev, "PRIQ overflow detected -- requests lost\n");
> @@ -1329,9 +1361,65 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
>  
>  	/* Sync our overflow flag, as we believe we're up to speed */
>  	q->cons = Q_OVF(q, q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
> +
> +	q->batch++;
> +	wake_up_locked(&q->wq);
> +	spin_unlock(&q->wq.lock);
> +
>  	return IRQ_HANDLED;
>  }
>  
> +/*
> + * arm_smmu_flush_queue - wait until all events/PPRs currently in the queue have
> + * been consumed.
> + *
> + * Wait until the queue thread finished a batch, or until the queue is empty.
> + * Note that we don't handle overflows on q->batch. If it occurs, just wait for
> + * the queue to be empty.
> + */
> +static int arm_smmu_flush_queue(struct arm_smmu_device *smmu,
> +				struct arm_smmu_queue *q, const char *name)
> +{
> +	int ret;
> +	u64 batch;
> +
> +	spin_lock(&q->wq.lock);
> +	if (queue_sync_prod(q) == -EOVERFLOW)
> +		dev_err(smmu->dev, "%s overflow detected -- requests lost\n", name);
> +
> +	batch = q->batch;
> +	ret = wait_event_interruptible_locked(q->wq, queue_empty(q) ||
> +					      q->batch >= batch + 2);
> +	spin_unlock(&q->wq.lock);
> +
> +	return ret;
> +}
> +
> +static int arm_smmu_flush_queues(struct notifier_block *nb,
> +				 unsigned long action, void *data)
> +{
> +	struct arm_smmu_device *smmu = container_of(nb, struct arm_smmu_device,
> +						    faultq_nb);
> +	struct device *dev = data;
> +	struct arm_smmu_master_data *master = NULL;
> +
> +	if (dev)
> +		master = dev->iommu_fwspec->iommu_priv;
> +
> +	if (master) {
> +		/* TODO: add support for PRI and Stall */
> +		return 0;
> +	}
> +
> +	/* No target device, flush all queues. */
> +	if (smmu->features & ARM_SMMU_FEAT_STALLS)
> +		arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
> +	if (smmu->features & ARM_SMMU_FEAT_PRI)
> +		arm_smmu_flush_queue(smmu, &smmu->priq.q, "priq");
> +
> +	return 0;
> +}
> +
>  static int arm_smmu_device_disable(struct arm_smmu_device *smmu);
>  
>  static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
> @@ -2288,6 +2376,10 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
>  		     << Q_BASE_LOG2SIZE_SHIFT;
>  
>  	q->prod = q->cons = 0;
> +
> +	init_waitqueue_head(&q->wq);
> +	q->batch = 0;
> +
>  	return 0;
>  }
>  
> @@ -3168,6 +3260,13 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>  	if (ret)
>  		return ret;
>  
> +	if (smmu->features & (ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_PRI)) {
> +		smmu->faultq_nb.notifier_call = arm_smmu_flush_queues;
> +		ret = iommu_fault_queue_register(&smmu->faultq_nb);
Here you register only if this SMMU supports stalls or PRI, which is fine, but
see the unregister path below.

> +		if (ret)
> +			return ret;
> +	}
> +
>  	/* And we're up. Go go go! */
>  	ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
>  				     "smmu3.%pa", &ioaddr);
> @@ -3210,6 +3309,8 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
>  {
>  	struct arm_smmu_device *smmu = platform_get_drvdata(pdev);
>  
> +	iommu_fault_queue_unregister(&smmu->faultq_nb);

Here you unregister from the fault queue unconditionally.  That is mostly
safe, but it seems to decrement, and potentially destroy, the workqueue that
is still in use by another SMMU instance that does support page faulting.
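
A minimal, untested sketch of what I have in mind (it just mirrors the
feature check already used on the registration path) would be:

	/* Only drop the fault queue reference if we took one at probe */
	if (smmu->features & (ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_PRI))
		iommu_fault_queue_unregister(&smmu->faultq_nb);

That keeps the refcount on the shared fault workqueue balanced when an SMMU
that never registered is removed.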

> +
>  	arm_smmu_device_disable(smmu);
>  
>  	return 0;

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 17/37] iommu/arm-smmu-v3: Move context descriptor code
@ 2018-03-09 11:44         ` Jonathan Cameron
  0 siblings, 0 replies; 317+ messages in thread
From: Jonathan Cameron @ 2018-03-09 11:44 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, kvm,
	joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, xuzaibo, ilias.apalodimas,
	shunyong.yang, nwatters, okaya, jcrouse, rfranz, dwmw2,
	jacob.jun.pan, yi.l.liu, ashok.raj, robdclark, christian.koenig,
	bharatku

On Mon, 12 Feb 2018 18:33:32 +0000
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> In order to add support for substream ID, move the context descriptor code
> into a separate library. At the moment it only manages context descriptor
> 0, which is used for non-PASID translations.
> 
> One important behavior change is the ASID allocator, which is now global
> instead of per-SMMU. If we end up needing per-SMMU ASIDs after all, it
> would be relatively simple to move back to per-device allocator instead
> of a global one. Sharing ASIDs will require an IDR, so implement the
> ASID allocator with an IDA instead of porting the bitmap, to ease the
> transition.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Hi Jean-Philippe,

This would have been easier to review if split into a 'move' and additional
patches actually making the changes described.

Superficially it looks like there may be more going on in here than the
above description suggests.  I'm unsure why we are gaining the CFGI_CD_ALL
and similar in this patch, as there is just too much going on.

Thanks,

Jonathan
> ---
>  MAINTAINERS                         |   2 +-
>  drivers/iommu/Kconfig               |  11 ++
>  drivers/iommu/Makefile              |   1 +
>  drivers/iommu/arm-smmu-v3-context.c | 289 ++++++++++++++++++++++++++++++++++++
>  drivers/iommu/arm-smmu-v3.c         | 265 +++++++++++++++------------------
>  drivers/iommu/iommu-pasid.c         |   1 +
>  drivers/iommu/iommu-pasid.h         |  27 ++++
>  7 files changed, 451 insertions(+), 145 deletions(-)
>  create mode 100644 drivers/iommu/arm-smmu-v3-context.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 9cb8ced8322a..93507bfe03a6 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1104,7 +1104,7 @@ R:	Robin Murphy <robin.murphy@arm.com>
>  L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
>  S:	Maintained
>  F:	drivers/iommu/arm-smmu.c
> -F:	drivers/iommu/arm-smmu-v3.c
> +F:	drivers/iommu/arm-smmu-v3*
>  F:	drivers/iommu/io-pgtable-arm.c
>  F:	drivers/iommu/io-pgtable-arm.h
>  F:	drivers/iommu/io-pgtable-arm-v7s.c
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 8add90ba9b75..4b272925ee78 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -66,6 +66,16 @@ menu "Generic PASID table support"
>  config IOMMU_PASID_TABLE
>  	bool
>  
> +config ARM_SMMU_V3_CONTEXT
> +	bool "ARM SMMU v3 Context Descriptor tables"
> +	select IOMMU_PASID_TABLE
> +	depends on ARM64
> +	help
> +	Enable support for ARM SMMU v3 Context Descriptor tables, used for DMA
> +	and PASID support.
> +
> +	If unsure, say N here.
> +
>  endmenu
>  
>  config IOMMU_IOVA
> @@ -344,6 +354,7 @@ config ARM_SMMU_V3
>  	depends on ARM64
>  	select IOMMU_API
>  	select IOMMU_IO_PGTABLE_LPAE
> +	select ARM_SMMU_V3_CONTEXT
>  	select GENERIC_MSI_IRQ_DOMAIN
>  	help
>  	  Support for implementations of the ARM System MMU architecture
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 338e59c93131..22758960ed02 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -9,6 +9,7 @@ obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
>  obj-$(CONFIG_IOMMU_PASID_TABLE) += iommu-pasid.o
> +obj-$(CONFIG_ARM_SMMU_V3_CONTEXT) += arm-smmu-v3-context.o
>  obj-$(CONFIG_IOMMU_IOVA) += iova.o
>  obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
>  obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
> diff --git a/drivers/iommu/arm-smmu-v3-context.c b/drivers/iommu/arm-smmu-v3-context.c
> new file mode 100644
> index 000000000000..e910cb356f45
> --- /dev/null
> +++ b/drivers/iommu/arm-smmu-v3-context.c
> @@ -0,0 +1,289 @@
> +/*
> + * Context descriptor table driver for SMMUv3
> + *
> + * Copyright (C) 2018 ARM Ltd.
> + *
> + * SPDX-License-Identifier: GPL-2.0
> + */
> +
> +#include <linux/device.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/idr.h>
> +#include <linux/kernel.h>
> +#include <linux/slab.h>
> +
> +#include "iommu-pasid.h"
> +
> +#define CTXDESC_CD_DWORDS		8
> +#define CTXDESC_CD_0_TCR_T0SZ_SHIFT	0
> +#define ARM64_TCR_T0SZ_SHIFT		0
> +#define ARM64_TCR_T0SZ_MASK		0x1fUL
> +#define CTXDESC_CD_0_TCR_TG0_SHIFT	6
> +#define ARM64_TCR_TG0_SHIFT		14
> +#define ARM64_TCR_TG0_MASK		0x3UL
> +#define CTXDESC_CD_0_TCR_IRGN0_SHIFT	8
> +#define ARM64_TCR_IRGN0_SHIFT		8
> +#define ARM64_TCR_IRGN0_MASK		0x3UL
> +#define CTXDESC_CD_0_TCR_ORGN0_SHIFT	10
> +#define ARM64_TCR_ORGN0_SHIFT		10
> +#define ARM64_TCR_ORGN0_MASK		0x3UL
> +#define CTXDESC_CD_0_TCR_SH0_SHIFT	12
> +#define ARM64_TCR_SH0_SHIFT		12
> +#define ARM64_TCR_SH0_MASK		0x3UL
> +#define CTXDESC_CD_0_TCR_EPD0_SHIFT	14
> +#define ARM64_TCR_EPD0_SHIFT		7
> +#define ARM64_TCR_EPD0_MASK		0x1UL
> +#define CTXDESC_CD_0_TCR_EPD1_SHIFT	30
> +#define ARM64_TCR_EPD1_SHIFT		23
> +#define ARM64_TCR_EPD1_MASK		0x1UL
> +
> +#define CTXDESC_CD_0_ENDI		(1UL << 15)
> +#define CTXDESC_CD_0_V			(1UL << 31)
> +
> +#define CTXDESC_CD_0_TCR_IPS_SHIFT	32
> +#define ARM64_TCR_IPS_SHIFT		32
> +#define ARM64_TCR_IPS_MASK		0x7UL
> +#define CTXDESC_CD_0_TCR_TBI0_SHIFT	38
> +#define ARM64_TCR_TBI0_SHIFT		37
> +#define ARM64_TCR_TBI0_MASK		0x1UL
> +
> +#define CTXDESC_CD_0_AA64		(1UL << 41)
> +#define CTXDESC_CD_0_S			(1UL << 44)
> +#define CTXDESC_CD_0_R			(1UL << 45)
> +#define CTXDESC_CD_0_A			(1UL << 46)
> +#define CTXDESC_CD_0_ASET_SHIFT		47
> +#define CTXDESC_CD_0_ASET_SHARED	(0UL << CTXDESC_CD_0_ASET_SHIFT)
> +#define CTXDESC_CD_0_ASET_PRIVATE	(1UL << CTXDESC_CD_0_ASET_SHIFT)
> +#define CTXDESC_CD_0_ASID_SHIFT		48
> +#define CTXDESC_CD_0_ASID_MASK		0xffffUL
> +
> +#define CTXDESC_CD_1_TTB0_SHIFT		4
> +#define CTXDESC_CD_1_TTB0_MASK		0xfffffffffffUL
> +
> +#define CTXDESC_CD_3_MAIR_SHIFT		0
> +
> +/* Convert between AArch64 (CPU) TCR format and SMMU CD format */
> +#define ARM_SMMU_TCR2CD(tcr, fld)					\
> +	(((tcr) >> ARM64_TCR_##fld##_SHIFT & ARM64_TCR_##fld##_MASK)	\
> +	 << CTXDESC_CD_0_TCR_##fld##_SHIFT)
> +
> +
> +struct arm_smmu_cd {
> +	struct iommu_pasid_entry	entry;
> +
> +	u64				ttbr;
> +	u64				tcr;
> +	u64				mair;
> +};
> +
> +#define pasid_entry_to_cd(entry) \
> +	container_of((entry), struct arm_smmu_cd, entry)
> +
> +struct arm_smmu_cd_tables {
> +	struct iommu_pasid_table	pasid;
> +
> +	void				*ptr;
> +	dma_addr_t			ptr_dma;
> +};
> +
> +#define pasid_to_cd_tables(pasid_table) \
> +	container_of((pasid_table), struct arm_smmu_cd_tables, pasid)
> +
> +#define pasid_ops_to_tables(ops) \
> +	pasid_to_cd_tables(iommu_pasid_table_ops_to_table(ops))
> +
> +static DEFINE_IDA(asid_ida);
> +
> +static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
> +{
> +	u64 val = 0;
> +
> +	/* Repack the TCR. Just care about TTBR0 for now */
> +	val |= ARM_SMMU_TCR2CD(tcr, T0SZ);
> +	val |= ARM_SMMU_TCR2CD(tcr, TG0);
> +	val |= ARM_SMMU_TCR2CD(tcr, IRGN0);
> +	val |= ARM_SMMU_TCR2CD(tcr, ORGN0);
> +	val |= ARM_SMMU_TCR2CD(tcr, SH0);
> +	val |= ARM_SMMU_TCR2CD(tcr, EPD0);
> +	val |= ARM_SMMU_TCR2CD(tcr, EPD1);
> +	val |= ARM_SMMU_TCR2CD(tcr, IPS);
> +	val |= ARM_SMMU_TCR2CD(tcr, TBI0);
> +
> +	return val;
> +}
> +
> +static int arm_smmu_write_ctx_desc(struct arm_smmu_cd_tables *tbl, int ssid,
> +				    struct arm_smmu_cd *cd)
> +{
> +	u64 val;
> +	__u64 *cdptr = tbl->ptr;
> +	struct arm_smmu_context_cfg *cfg = &tbl->pasid.cfg.arm_smmu;
> +
> +	if (!cd || WARN_ON(ssid))
> +		return -EINVAL;
> +
> +	/*
> +	 * We don't need to issue any invalidation here, as we'll invalidate
> +	 * the STE when installing the new entry anyway.
> +	 */
> +	val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
> +#ifdef __BIG_ENDIAN
> +	      CTXDESC_CD_0_ENDI |
> +#endif
> +	      CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET_PRIVATE |
> +	      CTXDESC_CD_0_AA64 | cd->entry.tag << CTXDESC_CD_0_ASID_SHIFT |
> +	      CTXDESC_CD_0_V;
> +
> +	if (cfg->stall)
> +		val |= CTXDESC_CD_0_S;
> +
> +	cdptr[0] = cpu_to_le64(val);
> +
> +	val = cd->ttbr & CTXDESC_CD_1_TTB0_MASK << CTXDESC_CD_1_TTB0_SHIFT;
> +	cdptr[1] = cpu_to_le64(val);
> +
> +	cdptr[3] = cpu_to_le64(cd->mair << CTXDESC_CD_3_MAIR_SHIFT);
> +
> +	return 0;
> +}
> +
> +static struct iommu_pasid_entry *
> +arm_smmu_alloc_shared_cd(struct iommu_pasid_table_ops *ops, struct mm_struct *mm)
> +{
> +	return ERR_PTR(-ENODEV);
> +}
> +
> +static struct iommu_pasid_entry *
> +arm_smmu_alloc_priv_cd(struct iommu_pasid_table_ops *ops,
> +		       enum io_pgtable_fmt fmt,
> +		       struct io_pgtable_cfg *cfg)
> +{
> +	int ret;
> +	int asid;
> +	struct arm_smmu_cd *cd;
> +	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
> +	struct arm_smmu_context_cfg *ctx_cfg = &tbl->pasid.cfg.arm_smmu;
> +
> +	cd = kzalloc(sizeof(*cd), GFP_KERNEL);
> +	if (!cd)
> +		return ERR_PTR(-ENOMEM);
> +
> +	asid = ida_simple_get(&asid_ida, 0, 1 << ctx_cfg->asid_bits,
> +			      GFP_KERNEL);
> +	if (asid < 0) {
> +		kfree(cd);
> +		return ERR_PTR(asid);
> +	}
> +
> +	cd->entry.tag = asid;
> +
> +	switch (fmt) {
> +	case ARM_64_LPAE_S1:
> +		cd->ttbr	= cfg->arm_lpae_s1_cfg.ttbr[0];
> +		cd->tcr		= cfg->arm_lpae_s1_cfg.tcr;
> +		cd->mair	= cfg->arm_lpae_s1_cfg.mair[0];
> +		break;
> +	default:
> +		pr_err("Unsupported pgtable format 0x%x\n", fmt);
> +		ret = -EINVAL;
> +		goto err_free_asid;
> +	}
> +
> +	return &cd->entry;
> +
> +err_free_asid:
> +	ida_simple_remove(&asid_ida, asid);
> +
> +	kfree(cd);
> +
> +	return ERR_PTR(ret);
> +}
> +
> +static void arm_smmu_free_cd(struct iommu_pasid_table_ops *ops,
> +			     struct iommu_pasid_entry *entry)
> +{
> +	struct arm_smmu_cd *cd = pasid_entry_to_cd(entry);
> +
> +	ida_simple_remove(&asid_ida, (u16)entry->tag);
> +	kfree(cd);
> +}
> +
> +static int arm_smmu_set_cd(struct iommu_pasid_table_ops *ops, int pasid,
> +			   struct iommu_pasid_entry *entry)
> +{
> +	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
> +	struct arm_smmu_cd *cd = pasid_entry_to_cd(entry);
> +
> +	if (WARN_ON(pasid > (1 << tbl->pasid.cfg.order)))
> +		return -EINVAL;
> +
> +	return arm_smmu_write_ctx_desc(tbl, pasid, cd);
> +}
> +
> +static void arm_smmu_clear_cd(struct iommu_pasid_table_ops *ops, int pasid,
> +			      struct iommu_pasid_entry *entry)
> +{
> +	struct arm_smmu_cd_tables *tbl = pasid_ops_to_tables(ops);
> +
> +	if (WARN_ON(pasid > (1 << tbl->pasid.cfg.order)))
> +		return;
> +
> +	arm_smmu_write_ctx_desc(tbl, pasid, NULL);
> +}
> +
> +static struct iommu_pasid_table *
> +arm_smmu_alloc_cd_tables(struct iommu_pasid_table_cfg *cfg, void *cookie)
> +{
> +	struct arm_smmu_cd_tables *tbl;
> +	struct device *dev = cfg->iommu_dev;
> +
> +	if (cfg->order) {
> +		/* TODO: support SSID */
> +		return NULL;
> +	}
> +
> +	tbl = devm_kzalloc(dev, sizeof(*tbl), GFP_KERNEL);
> +	if (!tbl)
> +		return NULL;
> +
> +	tbl->ptr = dmam_alloc_coherent(dev, CTXDESC_CD_DWORDS << 3,
> +				       &tbl->ptr_dma, GFP_KERNEL | __GFP_ZERO);
> +	if (!tbl->ptr) {
> +		dev_warn(dev, "failed to allocate context descriptor\n");
> +		goto err_free_tbl;
> +	}
> +
> +	tbl->pasid.ops = (struct iommu_pasid_table_ops) {
> +		.alloc_priv_entry	= arm_smmu_alloc_priv_cd,
> +		.alloc_shared_entry	= arm_smmu_alloc_shared_cd,
> +		.free_entry		= arm_smmu_free_cd,
> +		.set_entry		= arm_smmu_set_cd,
> +		.clear_entry		= arm_smmu_clear_cd,
> +	};
> +
> +	cfg->base		= tbl->ptr_dma;
> +	cfg->arm_smmu.s1fmt	= ARM_SMMU_S1FMT_LINEAR;
> +
> +	return &tbl->pasid;
> +
> +err_free_tbl:
> +	devm_kfree(dev, tbl);
> +
> +	return NULL;
> +}
> +
> +static void arm_smmu_free_cd_tables(struct iommu_pasid_table *pasid_table)
> +{
> +	struct iommu_pasid_table_cfg *cfg = &pasid_table->cfg;
> +	struct device *dev = cfg->iommu_dev;
> +	struct arm_smmu_cd_tables *tbl = pasid_to_cd_tables(pasid_table);
> +
> +	dmam_free_coherent(dev, CTXDESC_CD_DWORDS << 3,
> +			   tbl->ptr, tbl->ptr_dma);
> +	devm_kfree(dev, tbl);
> +}
> +
> +struct iommu_pasid_init_fns arm_smmu_v3_pasid_init_fns = {
> +	.alloc	= arm_smmu_alloc_cd_tables,
> +	.free	= arm_smmu_free_cd_tables,
> +};
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index fb2507ffcdaf..b6d8c90fafb3 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -40,6 +40,7 @@
>  #include <linux/amba/bus.h>
>  
>  #include "io-pgtable.h"
> +#include "iommu-pasid.h"
>  
>  /* MMIO registers */
>  #define ARM_SMMU_IDR0			0x0
> @@ -281,60 +282,6 @@
>  #define STRTAB_STE_3_S2TTB_SHIFT	4
>  #define STRTAB_STE_3_S2TTB_MASK		0xfffffffffffUL
>  
> -/* Context descriptor (stage-1 only) */
> -#define CTXDESC_CD_DWORDS		8
> -#define CTXDESC_CD_0_TCR_T0SZ_SHIFT	0
> -#define ARM64_TCR_T0SZ_SHIFT		0
> -#define ARM64_TCR_T0SZ_MASK		0x1fUL
> -#define CTXDESC_CD_0_TCR_TG0_SHIFT	6
> -#define ARM64_TCR_TG0_SHIFT		14
> -#define ARM64_TCR_TG0_MASK		0x3UL
> -#define CTXDESC_CD_0_TCR_IRGN0_SHIFT	8
> -#define ARM64_TCR_IRGN0_SHIFT		8
> -#define ARM64_TCR_IRGN0_MASK		0x3UL
> -#define CTXDESC_CD_0_TCR_ORGN0_SHIFT	10
> -#define ARM64_TCR_ORGN0_SHIFT		10
> -#define ARM64_TCR_ORGN0_MASK		0x3UL
> -#define CTXDESC_CD_0_TCR_SH0_SHIFT	12
> -#define ARM64_TCR_SH0_SHIFT		12
> -#define ARM64_TCR_SH0_MASK		0x3UL
> -#define CTXDESC_CD_0_TCR_EPD0_SHIFT	14
> -#define ARM64_TCR_EPD0_SHIFT		7
> -#define ARM64_TCR_EPD0_MASK		0x1UL
> -#define CTXDESC_CD_0_TCR_EPD1_SHIFT	30
> -#define ARM64_TCR_EPD1_SHIFT		23
> -#define ARM64_TCR_EPD1_MASK		0x1UL
> -
> -#define CTXDESC_CD_0_ENDI		(1UL << 15)
> -#define CTXDESC_CD_0_V			(1UL << 31)
> -
> -#define CTXDESC_CD_0_TCR_IPS_SHIFT	32
> -#define ARM64_TCR_IPS_SHIFT		32
> -#define ARM64_TCR_IPS_MASK		0x7UL
> -#define CTXDESC_CD_0_TCR_TBI0_SHIFT	38
> -#define ARM64_TCR_TBI0_SHIFT		37
> -#define ARM64_TCR_TBI0_MASK		0x1UL
> -
> -#define CTXDESC_CD_0_AA64		(1UL << 41)
> -#define CTXDESC_CD_0_S			(1UL << 44)
> -#define CTXDESC_CD_0_R			(1UL << 45)
> -#define CTXDESC_CD_0_A			(1UL << 46)
> -#define CTXDESC_CD_0_ASET_SHIFT		47
> -#define CTXDESC_CD_0_ASET_SHARED	(0UL << CTXDESC_CD_0_ASET_SHIFT)
> -#define CTXDESC_CD_0_ASET_PRIVATE	(1UL << CTXDESC_CD_0_ASET_SHIFT)
> -#define CTXDESC_CD_0_ASID_SHIFT		48
> -#define CTXDESC_CD_0_ASID_MASK		0xffffUL
> -
> -#define CTXDESC_CD_1_TTB0_SHIFT		4
> -#define CTXDESC_CD_1_TTB0_MASK		0xfffffffffffUL
> -
> -#define CTXDESC_CD_3_MAIR_SHIFT		0
> -
> -/* Convert between AArch64 (CPU) TCR format and SMMU CD format */
> -#define ARM_SMMU_TCR2CD(tcr, fld)					\
> -	(((tcr) >> ARM64_TCR_##fld##_SHIFT & ARM64_TCR_##fld##_MASK)	\
> -	 << CTXDESC_CD_0_TCR_##fld##_SHIFT)
> -
>  /* Command queue */
>  #define CMDQ_ENT_DWORDS			2
>  #define CMDQ_MAX_SZ_SHIFT		8
> @@ -353,6 +300,8 @@
>  #define CMDQ_PREFETCH_1_SIZE_SHIFT	0
>  #define CMDQ_PREFETCH_1_ADDR_MASK	~0xfffUL
>  
> +#define CMDQ_CFGI_0_SSID_SHIFT		12
> +#define CMDQ_CFGI_0_SSID_MASK		0xfffffUL
>  #define CMDQ_CFGI_0_SID_SHIFT		32
>  #define CMDQ_CFGI_0_SID_MASK		0xffffffffUL
>  #define CMDQ_CFGI_1_LEAF		(1UL << 0)
> @@ -476,8 +425,11 @@ struct arm_smmu_cmdq_ent {
>  
>  		#define CMDQ_OP_CFGI_STE	0x3
>  		#define CMDQ_OP_CFGI_ALL	0x4
> +		#define CMDQ_OP_CFGI_CD		0x5
> +		#define CMDQ_OP_CFGI_CD_ALL	0x6
>  		struct {
>  			u32			sid;
> +			u32			ssid;
>  			union {
>  				bool		leaf;
>  				u8		span;
> @@ -552,15 +504,9 @@ struct arm_smmu_strtab_l1_desc {
>  };
>  
>  struct arm_smmu_s1_cfg {
> -	__le64				*cdptr;
> -	dma_addr_t			cdptr_dma;
> -
> -	struct arm_smmu_ctx_desc {
> -		u16	asid;
> -		u64	ttbr;
> -		u64	tcr;
> -		u64	mair;
> -	}				cd;
> +	struct iommu_pasid_table_cfg	tables;
> +	struct iommu_pasid_table_ops	*ops;
> +	struct iommu_pasid_entry	*cd0; /* Default context */
>  };
>  
>  struct arm_smmu_s2_cfg {
> @@ -629,9 +575,7 @@ struct arm_smmu_device {
>  	unsigned long			oas; /* PA */
>  	unsigned long			pgsize_bitmap;
>  
> -#define ARM_SMMU_MAX_ASIDS		(1 << 16)
>  	unsigned int			asid_bits;
> -	DECLARE_BITMAP(asid_map, ARM_SMMU_MAX_ASIDS);
>  
>  #define ARM_SMMU_MAX_VMIDS		(1 << 16)
>  	unsigned int			vmid_bits;
> @@ -855,10 +799,16 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>  		cmd[1] |= ent->prefetch.size << CMDQ_PREFETCH_1_SIZE_SHIFT;
>  		cmd[1] |= ent->prefetch.addr & CMDQ_PREFETCH_1_ADDR_MASK;
>  		break;
> +	case CMDQ_OP_CFGI_CD:
> +		cmd[0] |= ent->cfgi.ssid << CMDQ_CFGI_0_SSID_SHIFT;
> +		/* Fallthrough */
>  	case CMDQ_OP_CFGI_STE:
>  		cmd[0] |= (u64)ent->cfgi.sid << CMDQ_CFGI_0_SID_SHIFT;
>  		cmd[1] |= ent->cfgi.leaf ? CMDQ_CFGI_1_LEAF : 0;
>  		break;
> +	case CMDQ_OP_CFGI_CD_ALL:
> +		cmd[0] |= (u64)ent->cfgi.sid << CMDQ_CFGI_0_SID_SHIFT;
> +		break;
>  	case CMDQ_OP_CFGI_ALL:
>  		/* Cover the entire SID range */
>  		cmd[1] |= CMDQ_CFGI_1_RANGE_MASK << CMDQ_CFGI_1_RANGE_SHIFT;
> @@ -1059,54 +1009,6 @@ static void arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
>  		dev_err_ratelimited(smmu->dev, "CMD_SYNC timeout\n");
>  }
>  
> -/* Context descriptor manipulation functions */
> -static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
> -{
> -	u64 val = 0;
> -
> -	/* Repack the TCR. Just care about TTBR0 for now */
> -	val |= ARM_SMMU_TCR2CD(tcr, T0SZ);
> -	val |= ARM_SMMU_TCR2CD(tcr, TG0);
> -	val |= ARM_SMMU_TCR2CD(tcr, IRGN0);
> -	val |= ARM_SMMU_TCR2CD(tcr, ORGN0);
> -	val |= ARM_SMMU_TCR2CD(tcr, SH0);
> -	val |= ARM_SMMU_TCR2CD(tcr, EPD0);
> -	val |= ARM_SMMU_TCR2CD(tcr, EPD1);
> -	val |= ARM_SMMU_TCR2CD(tcr, IPS);
> -	val |= ARM_SMMU_TCR2CD(tcr, TBI0);
> -
> -	return val;
> -}
> -
> -static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
> -				    struct arm_smmu_s1_cfg *cfg)
> -{
> -	u64 val;
> -
> -	/*
> -	 * We don't need to issue any invalidation here, as we'll invalidate
> -	 * the STE when installing the new entry anyway.
> -	 */
> -	val = arm_smmu_cpu_tcr_to_cd(cfg->cd.tcr) |
> -#ifdef __BIG_ENDIAN
> -	      CTXDESC_CD_0_ENDI |
> -#endif
> -	      CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET_PRIVATE |
> -	      CTXDESC_CD_0_AA64 | (u64)cfg->cd.asid << CTXDESC_CD_0_ASID_SHIFT |
> -	      CTXDESC_CD_0_V;
> -
> -	/* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
> -	if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
> -		val |= CTXDESC_CD_0_S;
> -
> -	cfg->cdptr[0] = cpu_to_le64(val);
> -
> -	val = cfg->cd.ttbr & CTXDESC_CD_1_TTB0_MASK << CTXDESC_CD_1_TTB0_SHIFT;
> -	cfg->cdptr[1] = cpu_to_le64(val);
> -
> -	cfg->cdptr[3] = cpu_to_le64(cfg->cd.mair << CTXDESC_CD_3_MAIR_SHIFT);
> -}
> -
>  /* Stream table manipulation functions */
>  static void
>  arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
> @@ -1222,7 +1124,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
>  		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
>  			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
>  
> -		val |= (ste->s1_cfg->cdptr_dma & STRTAB_STE_0_S1CTXPTR_MASK
> +		val |= (ste->s1_cfg->tables.base & STRTAB_STE_0_S1CTXPTR_MASK
>  		        << STRTAB_STE_0_S1CTXPTR_SHIFT) |
>  			STRTAB_STE_0_CFG_S1_TRANS;
>  	}
> @@ -1466,8 +1368,10 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>  	struct arm_smmu_cmdq_ent cmd;
>  
>  	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> +		if (unlikely(!smmu_domain->s1_cfg.cd0))
> +			return;
>  		cmd.opcode	= CMDQ_OP_TLBI_NH_ASID;
> -		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
> +		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd0->tag;
>  		cmd.tlbi.vmid	= 0;
>  	} else {
>  		cmd.opcode	= CMDQ_OP_TLBI_S12_VMALL;
> @@ -1491,8 +1395,10 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
>  	};
>  
>  	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> +		if (unlikely(!smmu_domain->s1_cfg.cd0))
> +			return;
>  		cmd.opcode	= CMDQ_OP_TLBI_NH_VA;
> -		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
> +		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd0->tag;
>  	} else {
>  		cmd.opcode	= CMDQ_OP_TLBI_S2_IPA;
>  		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
> @@ -1510,6 +1416,71 @@ static const struct iommu_gather_ops arm_smmu_gather_ops = {
>  	.tlb_sync	= arm_smmu_tlb_sync,
>  };
>  
> +/* PASID TABLE API */
> +static void __arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
> +			       struct arm_smmu_cmdq_ent *cmd)
> +{
> +	size_t i;
> +	unsigned long flags;
> +	struct arm_smmu_master_data *master;
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +
> +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> +	list_for_each_entry(master, &smmu_domain->devices, list) {
> +		struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
> +
> +		for (i = 0; i < fwspec->num_ids; i++) {
> +			cmd->cfgi.sid = fwspec->ids[i];
> +			arm_smmu_cmdq_issue_cmd(smmu, cmd);
> +		}
> +	}
> +	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> +
> +	__arm_smmu_tlb_sync(smmu);
> +}
> +
> +static void arm_smmu_sync_cd(void *cookie, int ssid, bool leaf)
> +{
> +	struct arm_smmu_cmdq_ent cmd = {
> +		.opcode	= CMDQ_OP_CFGI_CD_ALL,
> +		.cfgi	= {
> +			.ssid	= ssid,
> +			.leaf	= leaf,
> +		},
> +	};
> +
> +	__arm_smmu_sync_cd(cookie, &cmd);
> +}
> +
> +static void arm_smmu_sync_cd_all(void *cookie)
> +{
> +	struct arm_smmu_cmdq_ent cmd = {
> +		.opcode	= CMDQ_OP_CFGI_CD_ALL,
> +	};
> +
> +	__arm_smmu_sync_cd(cookie, &cmd);
> +}
> +
> +static void arm_smmu_tlb_inv_ssid(void *cookie, int ssid,
> +				  struct iommu_pasid_entry *entry)
> +{
> +	struct arm_smmu_domain *smmu_domain = cookie;
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	struct arm_smmu_cmdq_ent cmd = {
> +		.opcode		= CMDQ_OP_TLBI_NH_ASID,
> +		.tlbi.asid	= entry->tag,
> +	};
> +
> +	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
> +	__arm_smmu_tlb_sync(smmu);
> +}
> +
> +static struct iommu_pasid_sync_ops arm_smmu_ctx_sync = {
> +	.cfg_flush	= arm_smmu_sync_cd,
> +	.cfg_flush_all	= arm_smmu_sync_cd_all,
> +	.tlb_flush	= arm_smmu_tlb_inv_ssid,
> +};
> +
>  /* IOMMU API */
>  static bool arm_smmu_capable(enum iommu_cap cap)
>  {
> @@ -1582,15 +1553,11 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
>  
>  	/* Free the CD and ASID, if we allocated them */
>  	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> -		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> -
> -		if (cfg->cdptr) {
> -			dmam_free_coherent(smmu_domain->smmu->dev,
> -					   CTXDESC_CD_DWORDS << 3,
> -					   cfg->cdptr,
> -					   cfg->cdptr_dma);
> +		struct iommu_pasid_table_ops *ops = smmu_domain->s1_cfg.ops;
>  
> -			arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
> +		if (ops) {
> +			ops->free_entry(ops, smmu_domain->s1_cfg.cd0);
> +			iommu_free_pasid_ops(ops);
>  		}
>  	} else {
>  		struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
> @@ -1605,31 +1572,42 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>  				       struct io_pgtable_cfg *pgtbl_cfg)
>  {
>  	int ret;
> -	int asid;
> -	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	struct iommu_pasid_entry *entry;
> +	struct iommu_pasid_table_ops *ops;
>  	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	struct iommu_pasid_table_cfg pasid_cfg = {
> +		.iommu_dev		= smmu->dev,
> +		.sync			= &arm_smmu_ctx_sync,
> +		.arm_smmu = {
> +			.stall		= !!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE),
> +			.asid_bits	= smmu->asid_bits,
> +		},
> +	};
>  
> -	asid = arm_smmu_bitmap_alloc(smmu->asid_map, smmu->asid_bits);
> -	if (asid < 0)
> -		return asid;
> +	ops = iommu_alloc_pasid_ops(PASID_TABLE_ARM_SMMU_V3, &pasid_cfg,
> +				    smmu_domain);
> +	if (!ops)
> +		return -ENOMEM;
>  
> -	cfg->cdptr = dmam_alloc_coherent(smmu->dev, CTXDESC_CD_DWORDS << 3,
> -					 &cfg->cdptr_dma,
> -					 GFP_KERNEL | __GFP_ZERO);
> -	if (!cfg->cdptr) {
> -		dev_warn(smmu->dev, "failed to allocate context descriptor\n");
> -		ret = -ENOMEM;
> -		goto out_free_asid;
> +	/* Create default entry */
> +	entry = ops->alloc_priv_entry(ops, ARM_64_LPAE_S1, pgtbl_cfg);
> +	if (IS_ERR(entry)) {
> +		iommu_free_pasid_ops(ops);
> +		return PTR_ERR(entry);
>  	}
>  
> -	cfg->cd.asid	= (u16)asid;
> -	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
> -	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
> -	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
> -	return 0;
> +	ret = ops->set_entry(ops, 0, entry);
> +	if (ret) {
> +		ops->free_entry(ops, entry);
> +		iommu_free_pasid_ops(ops);
> +		return ret;
> +	}
> +
> +	cfg->tables	= pasid_cfg;
> +	cfg->ops	= ops;
> +	cfg->cd0	= entry;
>  
> -out_free_asid:
> -	arm_smmu_bitmap_free(smmu->asid_map, asid);
>  	return ret;
>  }
>  
> @@ -1832,7 +1810,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>  		ste->s1_cfg = &smmu_domain->s1_cfg;
>  		ste->s2_cfg = NULL;
> -		arm_smmu_write_ctx_desc(smmu, ste->s1_cfg);
>  	} else {
>  		ste->s1_cfg = NULL;
>  		ste->s2_cfg = &smmu_domain->s2_cfg;
> diff --git a/drivers/iommu/iommu-pasid.c b/drivers/iommu/iommu-pasid.c
> index 6b21d369d514..239b91e18543 100644
> --- a/drivers/iommu/iommu-pasid.c
> +++ b/drivers/iommu/iommu-pasid.c
> @@ -13,6 +13,7 @@
>  
>  static const struct iommu_pasid_init_fns *
>  pasid_table_init_fns[PASID_TABLE_NUM_FMTS] = {
> +	[PASID_TABLE_ARM_SMMU_V3] = &arm_smmu_v3_pasid_init_fns,
>  };
>  
>  struct iommu_pasid_table_ops *
> diff --git a/drivers/iommu/iommu-pasid.h b/drivers/iommu/iommu-pasid.h
> index 40a27d35c1e0..77e449a1655b 100644
> --- a/drivers/iommu/iommu-pasid.h
> +++ b/drivers/iommu/iommu-pasid.h
> @@ -15,6 +15,7 @@
>  struct mm_struct;
>  
>  enum iommu_pasid_table_fmt {
> +	PASID_TABLE_ARM_SMMU_V3,
>  	PASID_TABLE_NUM_FMTS,
>  };
>  
> @@ -73,6 +74,25 @@ struct iommu_pasid_sync_ops {
>  			  struct iommu_pasid_entry *entry);
>  };
>  
> +/**
> + * arm_smmu_context_cfg - PASID table configuration for ARM SMMU v3
> + *
> + * SMMU properties:
> + * @stall:	devices attached to the domain are allowed to stall.
> + * @asid_bits:	number of ASID bits supported by the SMMU
> + *
> + * @s1fmt:	PASID table format, chosen by the allocator.
> + */
> +struct arm_smmu_context_cfg {
> +	u8				stall:1;
> +	u8				asid_bits;
> +
> +#define ARM_SMMU_S1FMT_LINEAR		0x0
> +#define ARM_SMMU_S1FMT_4K_L2		0x1
> +#define ARM_SMMU_S1FMT_64K_L2		0x2
> +	u8				s1fmt;
> +};
> +
>  /**
>   * struct iommu_pasid_table_cfg - Configuration data for a set of PASID tables.
>   *
> @@ -88,6 +108,11 @@ struct iommu_pasid_table_cfg {
>  	const struct iommu_pasid_sync_ops *sync;
>  
>  	dma_addr_t			base;
> +
> +	/* Low-level data specific to the IOMMU */
> +	union {
> +		struct arm_smmu_context_cfg arm_smmu;
> +	};
>  };
>  
>  struct iommu_pasid_table_ops *
> @@ -139,4 +164,6 @@ static inline void iommu_pasid_flush_tlbs(struct iommu_pasid_table *table,
>  	table->cfg.sync->tlb_flush(table->cookie, pasid, entry);
>  }
>  
> +extern struct iommu_pasid_init_fns arm_smmu_v3_pasid_init_fns;
> +
>  #endif /* __IOMMU_PASID_H */
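
Condensing the hunk above: arm_smmu_domain_finalise_s1() now drives the
generic PASID table API for the default context roughly as follows (a
summary sketch using only the helpers visible in this patch, with error
handling omitted):

	struct iommu_pasid_table_ops *ops;
	struct iommu_pasid_entry *entry;

	/* Allocate the context descriptor table for this domain */
	ops = iommu_alloc_pasid_ops(PASID_TABLE_ARM_SMMU_V3, &pasid_cfg,
				    smmu_domain);

	/* Back a private (non-mm) entry with the domain's page tables */
	entry = ops->alloc_priv_entry(ops, ARM_64_LPAE_S1, pgtbl_cfg);

	/* Install it at SSID 0, the context used for non-PASID DMA */
	ret = ops->set_entry(ops, 0, entry);

	/* Teardown, as in arm_smmu_domain_free() */
	ops->free_entry(ops, entry);
	iommu_free_pasid_ops(ops);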


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 07/37] iommu: Add a page fault handler
@ 2018-03-14 13:08           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-14 13:08 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Mark Rutland, xieyisheng1, ilias.apalodimas, kvm, linux-pci,
	xuzaibo, Will Deacon, okaya, yi.l.liu, Lorenzo Pieralisi,
	ashok.raj, tn, joro, robdclark, bharatku, linux-acpi,
	Catalin Marinas, rfranz, lenb, devicetree, jacob.jun.pan,
	alex.williamson, robh+dt, thunder.leizhen, bhelgaas,
	linux-arm-kernel, shunyong.yang, dwmw2, liubo95, rjw, jcrouse,
	iommu, hanjun.guo, Sudeep Holla, Robin Murphy, christian.koenig,
	nwatters

Hi Jonathan,

Thanks for reviewing

On 08/03/18 15:40, Jonathan Cameron wrote:
>> +/**
>> + * iommu_fault_queue_unregister() - Unregister an IOMMU driver from the fault
>> + * queue.
>> + * @flush_notifier: same parameter as iommu_fault_queue_register
>> + */
>> +void iommu_fault_queue_unregister(struct notifier_block *flush_notifier)
>> +{
>> +	down_write(&iommu_fault_queue_sem);
>> +	if (refcount_dec_and_test(&iommu_fault_queue_refs)) {
>> +		destroy_workqueue(iommu_fault_queue);
>> +		iommu_fault_queue = NULL;
>> +	}
>> +	up_write(&iommu_fault_queue_sem);
>> +
>> +	if (flush_notifier)
>> +		blocking_notifier_chain_unregister(&iommu_fault_queue_flush_notifiers,
>> +						   flush_notifier);
> I would expect the ordering in queue_unregister to be the reverse of queue
> register (to make it obvious there are no races).
> 
> That would put this last block at the start before potentially destroying
> the work queue.  If I'm missing something then perhaps a comment to
> explain why the ordering is not the obvious one?

Sure, I'll fix the order; I don't think there was any good reason for it.
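
For illustration, the reordered unregister path might look like the
sketch below (put together from the functions quoted above, not the
final code):

void iommu_fault_queue_unregister(struct notifier_block *flush_notifier)
{
	/* Unhook the flush notifier first, mirroring the register order */
	if (flush_notifier)
		blocking_notifier_chain_unregister(&iommu_fault_queue_flush_notifiers,
						   flush_notifier);

	down_write(&iommu_fault_queue_sem);
	if (refcount_dec_and_test(&iommu_fault_queue_refs)) {
		destroy_workqueue(iommu_fault_queue);
		iommu_fault_queue = NULL;
	}
	up_write(&iommu_fault_queue_sem);
}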

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 17/37] iommu/arm-smmu-v3: Move context descriptor code
@ 2018-03-14 13:08             ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-14 13:08 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Mark Rutland, xieyisheng1, ilias.apalodimas, kvm, linux-pci,
	xuzaibo, Will Deacon, okaya, yi.l.liu, Lorenzo Pieralisi,
	ashok.raj, tn, joro, robdclark, bharatku, linux-acpi,
	Catalin Marinas, rfranz, lenb, devicetree, jacob.jun.pan,
	alex.williamson, robh+dt, thunder.leizhen, bhelgaas,
	linux-arm-kernel, shunyong.yang, dwmw2, liubo95, rjw, jcrouse,
	iommu, hanjun.guo, Sudeep Holla, Robin Murphy, christian.koenig,
	nwatters

On 09/03/18 11:44, Jonathan Cameron wrote:
> On Mon, 12 Feb 2018 18:33:32 +0000
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> 
>> In order to add support for substream ID, move the context descriptor code
>> into a separate library. At the moment it only manages context descriptor
>> 0, which is used for non-PASID translations.
>>
>> One important behavior change is the ASID allocator, which is now global
>> instead of per-SMMU. If we end up needing per-SMMU ASIDs after all, it
>> would be relatively simple to move back to per-device allocator instead
>> of a global one. Sharing ASIDs will require an IDR, so implement the
>> ASID allocator with an IDA instead of porting the bitmap, to ease the
>> transition.
>>
>> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Hi Jean-Philippe,
> 
> This would have been easier to review if split into a 'move' and additional
> patches actually making the changes described.
> 
> Superficially it looks like there may be more going on in here than the
> above description suggests.  I'm unsure why we are gaining 
> the CFGI_CD_ALL and similar in this patch as there is just too much going on.

Ok I'll try to split this

Thanks,
Jean
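
As an aside on the allocator change described in the commit message: if
per-SMMU ASIDs do turn out to be needed, one hypothetical way back is to
let the SMMU driver own the IDA and pass it down through the low-level
config (asid_ida below is an assumed new field, not part of this series):

static int arm_smmu_cd_alloc_asid(struct arm_smmu_context_cfg *cfg)
{
	/* cfg->asid_ida would point at the owning SMMU's private IDA */
	return ida_simple_get(cfg->asid_ida, 0, 1 << cfg->asid_bits,
			      GFP_KERNEL);
}

static void arm_smmu_cd_free_asid(struct arm_smmu_context_cfg *cfg, int asid)
{
	ida_simple_remove(cfg->asid_ida, asid);
}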

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 27/37] iommu/arm-smmu-v3: Register fault workqueue
@ 2018-03-14 13:08             ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-14 13:08 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Mark Rutland, xieyisheng1, ilias.apalodimas, kvm, linux-pci,
	xuzaibo, Will Deacon, okaya, yi.l.liu, Lorenzo Pieralisi,
	ashok.raj, tn, joro, robdclark, bharatku, linux-acpi,
	Catalin Marinas, rfranz, lenb, devicetree, jacob.jun.pan,
	alex.williamson, robh+dt, thunder.leizhen, bhelgaas,
	linux-arm-kernel, shunyong.yang, dwmw2, liubo95, rjw, jcrouse,
	iommu, hanjun.guo, Sudeep Holla, Robin Murphy, christian.koenig,
	nwatters

On 08/03/18 17:44, Jonathan Cameron wrote:
>> @@ -3168,6 +3260,13 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>>  	if (ret)
>>  		return ret;
>>  
>> +	if (smmu->features & (ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_PRI)) {
>> +		smmu->faultq_nb.notifier_call = arm_smmu_flush_queues;
>> +		ret = iommu_fault_queue_register(&smmu->faultq_nb);
> Here you register only if this smmu supports stalls or pri which is fine, but
> see the unregister path.
> 
>> +		if (ret)
>> +			return ret;
>> +	}
>> +
>>  	/* And we're up. Go go go! */
>>  	ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
>>  				     "smmu3.%pa", &ioaddr);
>> @@ -3210,6 +3309,8 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
>>  {
>>  	struct arm_smmu_device *smmu = platform_get_drvdata(pdev);
>>  
>> +	iommu_fault_queue_unregister(&smmu->faultq_nb);
> 
> Here you unregister from the fault queue unconditionally.  That is mostly
> safe but it seems to decrement and potentially destroy the work queue that
> is in use by another smmu instance that does support page faulting.

Ah yes, we'll need to check this
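
One way to keep the reference count balanced would be to mirror the
probe-side condition on the remove path, along these lines (a sketch,
not necessarily the eventual fix):

static int arm_smmu_device_remove(struct platform_device *pdev)
{
	struct arm_smmu_device *smmu = platform_get_drvdata(pdev);

	/* Only drop the fault queue reference if probe() registered one */
	if (smmu->features & (ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_PRI))
		iommu_fault_queue_unregister(&smmu->faultq_nb);

	arm_smmu_device_disable(smmu);

	return 0;
}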

Thanks,
Jean


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 28/37] iommu/arm-smmu-v3: Maintain a SID->device structure
@ 2018-03-14 13:09             ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-14 13:09 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Mark Rutland, xieyisheng1, ilias.apalodimas, kvm, linux-pci,
	xuzaibo, Will Deacon, okaya, yi.l.liu, Lorenzo Pieralisi,
	ashok.raj, tn, joro, robdclark, bharatku, linux-acpi,
	Catalin Marinas, rfranz, lenb, devicetree, jacob.jun.pan,
	alex.williamson, robh+dt, thunder.leizhen, bhelgaas,
	linux-arm-kernel, shunyong.yang, dwmw2, liubo95, rjw, jcrouse,
	iommu, hanjun.guo, Sudeep Holla, Robin Murphy, christian.koenig,
	nwatters

On 08/03/18 17:34, Jonathan Cameron wrote:
>>  static int arm_smmu_add_device(struct device *dev)
>> @@ -2198,6 +2298,7 @@ static int arm_smmu_add_device(struct device *dev)
>>  
>>  	group = iommu_group_get_for_dev(dev);
>>  	if (!IS_ERR(group)) {
>> +		arm_smmu_insert_master(smmu, master);
> There are some error cases it would be good to take notice of when
> inserting the master.  Admittedly the same is true of iommu_device_link
> so I guess you are keeping with the existing code style.
> 
> Would also be nice if the later bit of rework to drop these out
> of the if statement was done before this patch in the series.

Not sure that's worth a separate patch, maybe we can do it here.
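
If it is done here, the tail of arm_smmu_add_device() could check the
return value along these lines (a sketch, assuming arm_smmu_insert_master()
reports failures as an errno):

	group = iommu_group_get_for_dev(dev);
	if (IS_ERR(group))
		return PTR_ERR(group);
	iommu_group_put(group);

	ret = arm_smmu_insert_master(smmu, master);
	if (ret)
		return ret;

	iommu_device_link(&smmu->iommu, dev);

	return 0;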

Thanks,
Jean

> 
>>  		iommu_group_put(group);
>>  		iommu_device_link(&smmu->iommu, dev);
>>  	}
>> @@ -2218,6 +2319,7 @@ static void arm_smmu_remove_device(struct device *dev)
>>  	smmu = master->smmu;
>>  	if (master && master->ste.assigned)
>>  		arm_smmu_detach_dev(dev);
>> +	arm_smmu_remove_master(smmu, master);
>>  	iommu_group_remove_device(dev);
>>  	iommu_device_unlink(&smmu->iommu, dev);
>>  	kfree(master);
>> @@ -2527,6 +2629,9 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
>>  	int ret;
>>  
>>  	atomic_set(&smmu->sync_nr, 0);
>> +	mutex_init(&smmu->streams_mutex);
>> +	smmu->streams = RB_ROOT;
>> +
>>  	ret = arm_smmu_init_queues(smmu);
>>  	if (ret)
>>  		return ret;
> 
> 


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 31/37] iommu/arm-smmu-v3: Add support for PCI ATS
@ 2018-03-14 13:09             ` Jean-Philippe Brucker
  0 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-14 13:09 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Mark Rutland, xieyisheng1, ilias.apalodimas, kvm, linux-pci,
	xuzaibo, Will Deacon, okaya, yi.l.liu, Lorenzo Pieralisi,
	ashok.raj, tn, joro, robdclark, bharatku, linux-acpi,
	Catalin Marinas, rfranz, lenb, devicetree, jacob.jun.pan,
	alex.williamson, robh+dt, thunder.leizhen, bhelgaas,
	linux-arm-kernel, shunyong.yang, dwmw2, liubo95, rjw, jcrouse,
	iommu, hanjun.guo, Sudeep Holla, Robin Murphy, christian.koenig,
	nwatters

On 08/03/18 16:17, Jonathan Cameron wrote:
>> +	arm_smmu_enable_ats(master);
> It's a bit nasty not to handle the errors that this could output (other than
> the ENOSYS for when it's not available). Seems that it would be nice to at
> least add a note to the log if people are expecting it to work and it won't
> because some condition or other isn't met.

I agree it's not ideal. Last time this came up the problem was that
checking if ATS is supported requires an ugly ifdef. A proper
implementation requires more support in the PCI core, e.g. a
pci_ats_supported() function.

https://www.spinics.net/lists/kvm/msg145932.html

>> +
>>  	group = iommu_group_get_for_dev(dev);
>> -	if (!IS_ERR(group)) {
>> -		arm_smmu_insert_master(smmu, master);
>> -		iommu_group_put(group);
>> -		iommu_device_link(&smmu->iommu, dev);
>> +	if (IS_ERR(group)) {
>> +		ret = PTR_ERR(group);
>> +		goto err_disable_ats;
>>  	}
>>  
>> -	return PTR_ERR_OR_ZERO(group);
>> +	iommu_group_put(group);
>> +	arm_smmu_insert_master(smmu, master);
>> +	iommu_device_link(&smmu->iommu, dev);
>> +
>> +	return 0;
>> +
>> +err_disable_ats:
>> +	arm_smmu_disable_ats(master);
> master is leaked here I think...
> Possibly other things as this doesn't line up with the
> remove which I'd have mostly expected it to do.

> There are some slightly fishy bits of ordering in the original code
> anyway that I'm not seeing justification for (why is
> the iommu_device_unlink later than one might expect for
> example).

Yeah, knowing the rest of the probing code, there may exist subtle legacy
reasons for not freeing the master here and the strange orderings. I try
to keep existing behaviors where possible since I barely even have the
bandwidth to fix my own code.
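
If the leak does get fixed, the unwinding would presumably end up looking
something like this (a rough sketch; it assumes master is still reachable
through dev->iommu_fwspec->iommu_priv as in the existing driver):

err_disable_ats:
	arm_smmu_disable_ats(master);
err_free_master:
	fwspec->iommu_priv = NULL;
	kfree(master);
	return ret;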

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 35/37] iommu/arm-smmu-v3: Add support for PRI
  2018-03-08 16:24         ` Jonathan Cameron
  (?)
@ 2018-03-14 13:10             ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-14 13:10 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Mark Rutland, ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, Catalin Marinas,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, rjw

On 08/03/18 16:24, Jonathan Cameron wrote:
> On Mon, 12 Feb 2018 18:33:50 +0000
> Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:
> 
>> For PCI devices that support it, enable the PRI capability and handle
>> PRI Page Requests with the generic fault handler.
>>
>> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> A couple of nitpicks.
> 
>> ---
>>  drivers/iommu/arm-smmu-v3.c | 174 ++++++++++++++++++++++++++++++--------------
>>  1 file changed, 119 insertions(+), 55 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 8d09615fab35..ace2f995b0c0 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -271,6 +271,7 @@
>>  #define STRTAB_STE_1_S1COR_SHIFT	4
>>  #define STRTAB_STE_1_S1CSH_SHIFT	6
>>  
>> +#define STRTAB_STE_1_PPAR		(1UL << 18)
>>  #define STRTAB_STE_1_S1STALLD		(1UL << 27)
>>  
>>  #define STRTAB_STE_1_EATS_ABT		0UL
>> @@ -346,9 +347,9 @@
>>  #define CMDQ_PRI_1_GRPID_SHIFT		0
>>  #define CMDQ_PRI_1_GRPID_MASK		0x1ffUL
>>  #define CMDQ_PRI_1_RESP_SHIFT		12
>> -#define CMDQ_PRI_1_RESP_DENY		(0UL << CMDQ_PRI_1_RESP_SHIFT)
>> -#define CMDQ_PRI_1_RESP_FAIL		(1UL << CMDQ_PRI_1_RESP_SHIFT)
>> -#define CMDQ_PRI_1_RESP_SUCC		(2UL << CMDQ_PRI_1_RESP_SHIFT)
>> +#define CMDQ_PRI_1_RESP_FAILURE		(0UL << CMDQ_PRI_1_RESP_SHIFT)
>> +#define CMDQ_PRI_1_RESP_INVALID		(1UL << CMDQ_PRI_1_RESP_SHIFT)
>> +#define CMDQ_PRI_1_RESP_SUCCESS		(2UL << CMDQ_PRI_1_RESP_SHIFT)
> Mixing fixing up this naming with the rest of the patch does make things a
> little harder to read than they would have been if done as separate patches.
> Worth splitting?

ok

[...]
> 
> The function ordering gets a bit random as you add all the new ones;
> might be better to keep each disable following each enable.

Agreed

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 03/37] iommu/sva: Manage process address spaces
  2018-02-12 18:33     ` Jean-Philippe Brucker
  (?)
@ 2018-04-10 18:53       ` Sinan Kaya
  -1 siblings, 0 replies; 317+ messages in thread
From: Sinan Kaya @ 2018-04-10 18:53 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu, kvm
  Cc: mark.rutland, xieyisheng1, ilias.apalodimas, catalin.marinas,
	xuzaibo, jonathan.cameron, will.deacon, yi.l.liu,
	lorenzo.pieralisi, ashok.raj, tn, joro, bharatku, rfranz, lenb,
	jacob.jun.pan, alex.williamson, robh+dt, thunder.leizhen,
	bhelgaas, shunyong.yang, dwmw2, liubo95, rjw, jcrouse, robdclark,
	hanjun.guo, sudeep.holla, robin.murphy, christian.koenig,
	nwatters

On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
> +static void io_mm_detach_all_locked(struct iommu_bond *bond)
> +{
> +	while (!io_mm_detach_locked(bond));
> +}
> +

I don't remember if I mentioned this before or not, but I think this loop
needs a little bit of relaxation with a yield, and maybe an informational
message, which might help if the wait exceeds some time.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 03/37] iommu/sva: Manage process address spaces
  2018-04-10 18:53       ` Sinan Kaya
  (?)
@ 2018-04-13 10:59           ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-04-13 10:59 UTC (permalink / raw)
  To: Sinan Kaya, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Mark Rutland, ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	Catalin Marinas, xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	rjw-LthD3rsA81gm4RdzfppkhA, Sudeep Holla,
	christian.koenig-5C7GfCeVMHo

On 10/04/18 19:53, Sinan Kaya wrote:
> On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
>> +static void io_mm_detach_all_locked(struct iommu_bond *bond)
>> +{
>> +	while (!io_mm_detach_locked(bond));
>> +}
>> +
> 
> I don't remember if I mentioned this before or not, but I think this loop
> needs a little bit of relaxation with a yield, and maybe an informational
> message, which might help if the wait exceeds some time.

Right, at the very least we should have a cpu_relax here. I think this
bit is going away, though, because I want to lift the possibility of
calling bind() for the same dev/mm pair multiple times. It's not useful
in my opinion because that call could only be issued by a given driver.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread
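
For illustration, a relaxed version of that loop along the lines discussed
above could look like the following (a sketch only; the one-second warning
threshold is an arbitrary choice, not something from the series):

static void io_mm_detach_all_locked(struct iommu_bond *bond)
{
	unsigned long warn_at = jiffies + msecs_to_jiffies(1000);

	while (!io_mm_detach_locked(bond)) {
		cpu_relax();
		if (time_after(jiffies, warn_at))
			pr_warn_once("iommu/sva: bond is taking unusually long to detach\n");
	}
}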

* Re: [PATCH 03/37] iommu/sva: Manage process address spaces
  2018-02-12 18:33     ` Jean-Philippe Brucker
  (?)
@ 2018-04-24  1:32         ` Sinan Kaya
  -1 siblings, 0 replies; 317+ messages in thread
From: Sinan Kaya @ 2018-04-24  1:32 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: mark.rutland-5wv7dgnIgG8,
	ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	catalin.marinas-5wv7dgnIgG8, xuzaibo-hv44wF8Li93QT0dZR+AlfA,
	will.deacon-5wv7dgnIgG8, ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	rjw-LthD3rsA81gm4RdzfppkhA, sudeep.holla-5wv7dgnIgG8,
	christian.koenig-5C7GfCeVMHo

On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
> /**
>   * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a device
>   * @dev: the device
> @@ -129,7 +439,10 @@ EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
>  int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
>  			  unsigned long flags, void *drvdata)
>  {
> +	int i, ret;
> +	struct io_mm *io_mm = NULL;
>  	struct iommu_domain *domain;
> +	struct iommu_bond *bond = NULL, *tmp;
>  	struct iommu_param *dev_param = dev->iommu_param;
>  
>  	domain = iommu_get_domain_for_dev(dev);
> @@ -145,7 +458,42 @@ int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
>  	if (flags != (IOMMU_SVA_FEAT_PASID | IOMMU_SVA_FEAT_IOPF))
>  		return -EINVAL;
>  
> -	return -ENOSYS; /* TODO */
> +	/* If an io_mm already exists, use it */
> +	spin_lock(&iommu_sva_lock);
> +	idr_for_each_entry(&iommu_pasid_idr, io_mm, i) {
> +		if (io_mm->mm != mm || !io_mm_get_locked(io_mm))
> +			continue;
> +
> +		/* Is it already bound to this device? */
> +		list_for_each_entry(tmp, &io_mm->devices, mm_head) {
> +			if (tmp->dev != dev)
> +				continue;
> +
> +			bond = tmp;
> +			refcount_inc(&bond->refs);
> +			io_mm_put_locked(io_mm);
> +			break;
> +		}
> +		break;
> +	}
> +	spin_unlock(&iommu_sva_lock);
> +
> +	if (bond)

Please return pasid when you find an io_mm that is already bound. Something like
*pasid = io_mm->pasid should do the work here when bond is true.

> +		return 0;


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 317+ messages in thread
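
In code, the suggestion amounts to something like this at the point where
the existing bond is found (a sketch based on the quoted hunk; when bond is
set, io_mm still points at the matching entry):

	spin_unlock(&iommu_sva_lock);

	if (bond) {
		/* Already bound: report the existing PASID and succeed */
		*pasid = io_mm->pasid;
		return 0;
	}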

* Re: [PATCH 03/37] iommu/sva: Manage process address spaces
  2018-04-24  1:32         ` Sinan Kaya
  (?)
@ 2018-04-24  9:33             ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-04-24  9:33 UTC (permalink / raw)
  To: Sinan Kaya, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Mark Rutland, ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	Catalin Marinas, xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	rjw-LthD3rsA81gm4RdzfppkhA, Sudeep Holla,
	christian.koenig-5C7GfCeVMHo

On 24/04/18 02:32, Sinan Kaya wrote:
> On 2/12/2018 1:33 PM, Jean-Philippe Brucker wrote:
>> /**
>>   * iommu_sva_device_init() - Initialize Shared Virtual Addressing for a device
>>   * @dev: the device
>> @@ -129,7 +439,10 @@ EXPORT_SYMBOL_GPL(iommu_sva_device_shutdown);
>>  int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
>>  			  unsigned long flags, void *drvdata)
>>  {
>> +	int i, ret;
>> +	struct io_mm *io_mm = NULL;
>>  	struct iommu_domain *domain;
>> +	struct iommu_bond *bond = NULL, *tmp;
>>  	struct iommu_param *dev_param = dev->iommu_param;
>>  
>>  	domain = iommu_get_domain_for_dev(dev);
>> @@ -145,7 +458,42 @@ int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
>>  	if (flags != (IOMMU_SVA_FEAT_PASID | IOMMU_SVA_FEAT_IOPF))
>>  		return -EINVAL;
>>  
>> -	return -ENOSYS; /* TODO */
>> +	/* If an io_mm already exists, use it */
>> +	spin_lock(&iommu_sva_lock);
>> +	idr_for_each_entry(&iommu_pasid_idr, io_mm, i) {
>> +		if (io_mm->mm != mm || !io_mm_get_locked(io_mm))
>> +			continue;
>> +
>> +		/* Is it already bound to this device? */
>> +		list_for_each_entry(tmp, &io_mm->devices, mm_head) {
>> +			if (tmp->dev != dev)
>> +				continue;
>> +
>> +			bond = tmp;
>> +			refcount_inc(&bond->refs);
>> +			io_mm_put_locked(io_mm);
>> +			break;
>> +		}
>> +		break;
>> +	}
>> +	spin_unlock(&iommu_sva_lock);
>> +
>> +	if (bond)
> 
> Please return pasid when you find an io_mm that is already bound. Something like
> *pasid = io_mm->pasid should do the work here when bond is true.

Right. I think we should also keep returning 0, not switch to -EEXIST or
similar. So in the next version a driver can call bind(devX, mmY) multiple
times, but the first unbind() removes the bond.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 03/37] iommu/sva: Manage process address spaces
  2018-04-24  9:33             ` Jean-Philippe Brucker
  (?)
@ 2018-04-24 17:17                 ` Sinan Kaya
  -1 siblings, 0 replies; 317+ messages in thread
From: Sinan Kaya @ 2018-04-24 17:17 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Mark Rutland, ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	Catalin Marinas, xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	rjw-LthD3rsA81gm4RdzfppkhA, Sudeep Holla,
	christian.koenig-5C7GfCeVMHo

On 4/24/2018 5:33 AM, Jean-Philippe Brucker wrote:
>> Please return pasid when you find an io_mm that is already bound. Something like
>> *pasid = io_mm->pasid should do the work here when bond is true.
> Right. I think we should also keep returning 0, not switch to -EEXIST or
> similar. So in the next version a driver can call bind(devX, mmY) multiple
> times, but the first unbind() removes the bond.

If we are going to allow multiple binds, then the last unbind should
remove the bond rather than the first one via reference counting.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 317+ messages in thread
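
A minimal sketch of what last-unbind-wins reference counting could look
like (illustrative only; bond->refs, mm_head and io_mm_put_locked() appear
in the quoted code, while the dev_head list, the bond->io_mm back-pointer
and the function name are assumptions):

static void iommu_sva_unbind_locked(struct iommu_bond *bond)
{
	if (!refcount_dec_and_test(&bond->refs))
		return;	/* earlier bind() calls still hold references */

	/* Last reference: tear the bond down for real */
	list_del(&bond->mm_head);
	list_del(&bond->dev_head);
	io_mm_put_locked(bond->io_mm);
	kfree(bond);
}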

* Re: [PATCH 03/37] iommu/sva: Manage process address spaces
  2018-04-24 17:17                 ` Sinan Kaya
@ 2018-04-24 18:52                     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker via iommu @ 2018-04-24 18:52 UTC (permalink / raw)
  To: Sinan Kaya, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Mark Rutland, ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	Catalin Marinas, xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	rjw-LthD3rsA81gm4RdzfppkhA, Sudeep Holla,
	christian.koenig-5C7GfCeVMHo

On 24/04/18 18:17, Sinan Kaya wrote:
> On 4/24/2018 5:33 AM, Jean-Philippe Brucker wrote:
>>> Please return pasid when you find an io_mm that is already bound. Something like
>>> *pasid = io_mm->pasid should do the work here when bond is true.
>> Right. I think we should also keep returning 0, not switch to -EEXIST or
>> similar. So in the next version a driver can call bind(devX, mmY) multiple
>> times, but the first unbind() removes the bond.
> 
> If we are going to allow multiple binds, then the last unbind should
> remove the bond rather than the first one via reference counting.

Yeah that's probably better. Since a bond belongs to a device driver it
doesn't need multiple bind/unbind, so earlier in this thread (1/37) I
talked about removing the bond->refs. But thinking about it, there still
is a need for it. When mm exits, we now need to call the device driver's
mm_exit handler outside of the spinlock, so we have to take a ref in
order to prevent a concurrent unbind() from freeing the bond.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread
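
The mm-exit case described above is the usual pattern of pinning an object
before dropping the lock to call back into a driver; roughly, as a sketch
(the mm_exit callback and drvdata fields on the bond are assumptions here,
and the unbind helper sketched earlier is reused to drop the reference):

static void io_mm_notify_mm_exit(struct iommu_bond *bond)
{
	refcount_inc(&bond->refs);	/* keep the bond alive across the callback */
	spin_unlock(&iommu_sva_lock);

	if (bond->mm_exit)
		bond->mm_exit(bond->dev, bond->drvdata);

	spin_lock(&iommu_sva_lock);
	iommu_sva_unbind_locked(bond);	/* drop the reference taken above */
}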

* Re: [PATCH 27/37] iommu/arm-smmu-v3: Register fault workqueue
  2018-03-22  1:09               ` Yisheng Xie
  (?)
@ 2018-04-04 10:13                   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-04-04 10:13 UTC (permalink / raw)
  To: Yisheng Xie
  Cc: Mark Rutland, ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, robh-DgEjT+Ai2ygdnm+yROfE0A,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, Catalin Marinas,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, rjw-LthD3rsA81gHJAlzAAQw8A

On 22/03/18 01:09, Yisheng Xie wrote:
> Hi Jean,
> 
> On 2018/3/21 21:24, Jean-Philippe Brucker wrote:
>> Hi Yisheng,
>>
>> On 19/03/18 11:03, Yisheng Xie wrote:
>>> Hi Jean,
>>>
>>> [...]
>>>> @@ -3168,6 +3260,13 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>>>>  	if (ret)
>>>>  		return ret;
>>>>  
>>>> +	if (smmu->features & (ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_PRI)) {
>>>> +		smmu->faultq_nb.notifier_call = arm_smmu_flush_queues;
>>>> +		ret = iommu_fault_queue_register(&smmu->faultq_nb);
>>>> +		if (ret)
>>>> +			return ret;
>>>
>>> I find a case here: with CONFIG_IOMMU_FAULT=n and an SMMU that supports the
>>> STALLS or PRI feature, the device probe will fail here. Is this what we want?
>>
>> Since CONFIG_ARM_SMMU_V3 selects CONFIG_IOMMU_FAULT, I don't think it
>> can happen. 
> 
> But CONFIG_IOMMU_FAULT can be changed after it is selected; maybe we can make
> it unchangeable.

Seems sensible, I don't see a reason to leave IOMMU_FAULT selectable
manually.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 27/37] iommu/arm-smmu-v3: Register fault workqueue
  2018-03-21 13:24           ` Jean-Philippe Brucker
  (?)
@ 2018-03-22  1:09               ` Yisheng Xie
  -1 siblings, 0 replies; 317+ messages in thread
From: Yisheng Xie @ 2018-03-22  1:09 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Mark Rutland, ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, robh-DgEjT+Ai2ygdnm+yROfE0A,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, Catalin Marinas,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, rjw-LthD3rsA81gHJAlzAAQw8A

Hi Jean,

On 2018/3/21 21:24, Jean-Philippe Brucker wrote:
> Hi Yisheng,
> 
> On 19/03/18 11:03, Yisheng Xie wrote:
>> Hi Jean,
>>
>> [...]
>>> @@ -3168,6 +3260,13 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>>>  	if (ret)
>>>  		return ret;
>>>  
>>> +	if (smmu->features & (ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_PRI)) {
>>> +		smmu->faultq_nb.notifier_call = arm_smmu_flush_queues;
>>> +		ret = iommu_fault_queue_register(&smmu->faultq_nb);
>>> +		if (ret)
>>> +			return ret;
>>
>> I find a case here: with CONFIG_IOMMU_FAULT=n and an SMMU that supports the
>> STALLS or PRI feature, the device probe will fail here. Is this what we want?
> 
> Since CONFIG_ARM_SMMU_V3 selects CONFIG_IOMMU_FAULT, I don't think it
> can happen. 

But CONFIG_IOMMU_FAULT can be changed after it is selected; maybe we can make
it unchangeable.

Thanks
Yisheng

> I'm not sure what the best practices are, but I feel like it
> would be too much work to guard against config combinations that violate
> an explicit "select" in Kconfig
> 
> Thanks,
> Jean
> 
> .
> 

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 27/37] iommu/arm-smmu-v3: Register fault workqueue
  2018-03-19 11:03       ` Yisheng Xie
  (?)
@ 2018-03-21 13:24           ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 317+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-21 13:24 UTC (permalink / raw)
  To: Yisheng Xie
  Cc: Mark Rutland, ilias.apalodimas-QSEj5FYQhm4dnm+yROfE0A,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	xuzaibo-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, robh-DgEjT+Ai2ygdnm+yROfE0A,
	ashok.raj-ral2JQCrhuEAvxtiuMwx3w,
	bharatku-gjFFaj9aHVfQT0dZR+AlfA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA, Catalin Marinas,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, rjw-LthD3rsA81gHJAlzAAQw8A

Hi Yisheng,

On 19/03/18 11:03, Yisheng Xie wrote:
> Hi Jean,
> 
> [...]
>> @@ -3168,6 +3260,13 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>>  	if (ret)
>>  		return ret;
>>  
>> +	if (smmu->features & (ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_PRI)) {
>> +		smmu->faultq_nb.notifier_call = arm_smmu_flush_queues;
>> +		ret = iommu_fault_queue_register(&smmu->faultq_nb);
>> +		if (ret)
>> +			return ret;
> 
> I find a case here: with CONFIG_IOMMU_FAULT=n and an SMMU that supports the
> STALLS or PRI feature, the device probe will fail here. Is this what we want?

Since CONFIG_ARM_SMMU_V3 selects CONFIG_IOMMU_FAULT, I don't think it
can happen. I'm not sure what the best practices are, but I feel like it
would be too much work to guard against config combinations that violate
an explicit "select" in Kconfig

Thanks,
Jean

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH 27/37] iommu/arm-smmu-v3: Register fault workqueue
@ 2018-03-19 11:03       ` Yisheng Xie
  0 siblings, 0 replies; 317+ messages in thread
From: Yisheng Xie @ 2018-03-19 11:03 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Mark Rutland, ilias.apalodimas, kvm, linux-pci, xuzaibo,
	jonathan.cameron, Will Deacon, okaya, robh, Lorenzo Pieralisi,
	ashok.raj, tn, joro, robdclark, bharatku, linux-acpi,
	Catalin Marinas, rfranz, lenb, devicetree, jacob.jun.pan,
	alex.williamson, yi.l.liu, thunder.leizhen, bhelgaas,
	linux-arm-kernel, shunyong.yang, dwmw2, liubo95, rjw, jcrouse,
	iommu, hanjun.guo, Sudeep Holla, Robin Murphy, christian.koenig,
	nwatters

Hi Jean,

[...]
> @@ -3168,6 +3260,13 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>  	if (ret)
>  		return ret;
>  
> +	if (smmu->features & (ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_PRI)) {
> +		smmu->faultq_nb.notifier_call = arm_smmu_flush_queues;
> +		ret = iommu_fault_queue_register(&smmu->faultq_nb);
> +		if (ret)
> +			return ret;

I found a case here: with CONFIG_IOMMU_FAULT=n and an SMMU that supports the
STALLS or PRI feature, the device probe will fail here. Is this what we want?
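
If that combination ever did need to be tolerated, I suppose the usual pattern
would be a no-op stub when CONFIG_IOMMU_FAULT is not set, something like the
sketch below (the function name is taken from the quoted patch; I have not
checked whether the series already provides such a fallback):

	#include <linux/notifier.h>

	#ifdef CONFIG_IOMMU_FAULT
	int iommu_fault_queue_register(struct notifier_block *flush_notifier);
	#else
	static inline int
	iommu_fault_queue_register(struct notifier_block *flush_notifier)
	{
		/*
		 * No fault queue in this configuration: return success so
		 * that arm_smmu_device_probe() continues, accepting that
		 * stall/PRI faults will never be serviced.
		 */
		return 0;
	}
	#endif /* CONFIG_IOMMU_FAULT */

That would let the probe succeed, at the cost of silently losing fault handling.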

Thanks
Yisheng

> +	}
> +
>  	/* And we're up. Go go go! */
>  	ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
>  				     "smmu3.%pa", &ioaddr);


^ permalink raw reply	[flat|nested] 317+ messages in thread

end of thread, other threads:[~2018-04-24 18:52 UTC | newest]

Thread overview: 317+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-12 18:33 [PATCH 00/37] Shared Virtual Addressing for the IOMMU Jean-Philippe Brucker
2018-02-12 18:33 ` Jean-Philippe Brucker
2018-02-12 18:33 ` [PATCH 01/37] iommu: Introduce Shared Virtual Addressing API Jean-Philippe Brucker
2018-02-12 18:33   ` Jean-Philippe Brucker
     [not found]   ` <20180212183352.22730-2-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-02-13  7:31     ` Tian, Kevin
2018-02-13  7:31       ` Tian, Kevin
2018-02-13  7:31       ` Tian, Kevin
     [not found]       ` <AADFC41AFE54684AB9EE6CBC0274A5D191002823-0J0gbvR4kThpB2pF5aRoyrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2018-02-13 12:40         ` Jean-Philippe Brucker
2018-02-13 12:40           ` Jean-Philippe Brucker
2018-02-13 12:40           ` Jean-Philippe Brucker
2018-02-13 23:43           ` Tian, Kevin
2018-02-13 23:43             ` Tian, Kevin
2018-02-13 23:43             ` Tian, Kevin
     [not found]             ` <AADFC41AFE54684AB9EE6CBC0274A5D191003B1B-0J0gbvR4kThpB2pF5aRoyrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2018-02-15 12:42               ` Jean-Philippe Brucker
2018-02-15 12:42                 ` Jean-Philippe Brucker
2018-02-15 12:42                 ` Jean-Philippe Brucker
     [not found]                 ` <0b579768-3090-dd50-58b1-3385be92ef21-5wv7dgnIgG8@public.gmane.org>
2018-02-27  6:21                   ` Tian, Kevin
2018-02-27  6:21                     ` Tian, Kevin
2018-02-27  6:21                     ` Tian, Kevin
     [not found]                     ` <AADFC41AFE54684AB9EE6CBC0274A5D19101C8A7-0J0gbvR4kThpB2pF5aRoyrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2018-02-28 16:20                       ` Jean-Philippe Brucker
2018-02-28 16:20                         ` Jean-Philippe Brucker
2018-02-28 16:20                         ` Jean-Philippe Brucker
2018-02-15  9:59   ` Joerg Roedel
2018-02-15  9:59     ` Joerg Roedel
2018-02-15  9:59     ` Joerg Roedel
     [not found]     ` <20180215095909.r4nwqjhuijusssuy-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2018-02-15 12:43       ` Jean-Philippe Brucker
2018-02-15 12:43         ` Jean-Philippe Brucker
2018-02-15 12:43         ` Jean-Philippe Brucker
2018-02-12 18:33 ` [PATCH 02/37] iommu/sva: Bind process address spaces to devices Jean-Philippe Brucker
2018-02-12 18:33   ` Jean-Philippe Brucker
2018-02-13  7:54   ` Tian, Kevin
2018-02-13  7:54     ` Tian, Kevin
2018-02-13  7:54     ` Tian, Kevin
2018-02-13 12:57     ` Jean-Philippe Brucker
2018-02-13 12:57       ` Jean-Philippe Brucker
2018-02-13 12:57       ` Jean-Philippe Brucker
2018-02-13 12:57       ` Jean-Philippe Brucker
2018-02-13 23:34       ` Tian, Kevin
2018-02-13 23:34         ` Tian, Kevin
2018-02-13 23:34         ` Tian, Kevin
     [not found]         ` <AADFC41AFE54684AB9EE6CBC0274A5D191003AD6-0J0gbvR4kThpB2pF5aRoyrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2018-02-15 12:40           ` Jean-Philippe Brucker
2018-02-15 12:40             ` Jean-Philippe Brucker
2018-02-15 12:40             ` Jean-Philippe Brucker
     [not found]             ` <ca4d4992-0c8b-dae6-e443-7c7f7164be60-5wv7dgnIgG8@public.gmane.org>
2018-03-01  3:03               ` Liu, Yi L
2018-03-01  3:03                 ` Liu, Yi L
2018-03-01  3:03                 ` Liu, Yi L
     [not found]                 ` <A2975661238FB949B60364EF0F2C257439B829DA-0J0gbvR4kTg/UvCtAeCM4rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2018-03-02 16:03                   ` Jean-Philippe Brucker
2018-03-02 16:03                     ` Jean-Philippe Brucker
2018-03-02 16:03                     ` Jean-Philippe Brucker
     [not found]       ` <b9eacb30-817f-9027-bc0a-1f01cf9f13f9-5wv7dgnIgG8@public.gmane.org>
2018-02-15 10:21         ` joro-zLv9SwRftAIdnm+yROfE0A
2018-02-15 10:21           ` joro at 8bytes.org
2018-02-15 10:21           ` joro-zLv9SwRftAIdnm+yROfE0A
2018-02-15 10:21           ` joro
     [not found]           ` <20180215102113.c7t7rrnyzgazmdli-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2018-02-15 12:29             ` Christian König
2018-02-15 12:29               ` Christian König
2018-02-15 12:29               ` Christian König
2018-02-15 12:29               ` Christian König
2018-02-15 12:46             ` Jean-Philippe Brucker
2018-02-15 12:46               ` Jean-Philippe Brucker
2018-02-15 12:46               ` Jean-Philippe Brucker
     [not found]   ` <20180212183352.22730-3-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-02-28 20:34     ` Sinan Kaya
2018-02-28 20:34       ` Sinan Kaya
2018-02-28 20:34       ` Sinan Kaya
     [not found]       ` <bce32071-4159-3bdd-1e03-77f540ee4509-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2018-03-02 12:32         ` Jean-Philippe Brucker
2018-03-02 12:32           ` Jean-Philippe Brucker
2018-03-02 12:32           ` Jean-Philippe Brucker
2018-02-12 18:33 ` [PATCH 05/37] iommu/sva: Track mm changes with an MMU notifier Jean-Philippe Brucker
2018-02-12 18:33   ` Jean-Philippe Brucker
2018-02-12 18:33 ` [PATCH 06/37] iommu/sva: Search mm by PASID Jean-Philippe Brucker
2018-02-12 18:33   ` Jean-Philippe Brucker
2018-02-12 18:33 ` [PATCH 07/37] iommu: Add a page fault handler Jean-Philippe Brucker
2018-02-12 18:33   ` Jean-Philippe Brucker
2018-02-14  7:18   ` Jacob Pan
2018-02-14  7:18     ` Jacob Pan
2018-02-14  7:18     ` Jacob Pan
2018-02-15 13:49     ` Jean-Philippe Brucker
2018-02-15 13:49       ` Jean-Philippe Brucker
2018-02-15 13:49       ` Jean-Philippe Brucker
     [not found]   ` <20180212183352.22730-8-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-03-05 21:44     ` Sinan Kaya
2018-03-05 21:44       ` Sinan Kaya
2018-03-05 21:44       ` Sinan Kaya
     [not found]       ` <b2a3d2a7-7042-aef3-0def-05e64e39d046-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2018-03-06 10:24         ` Jean-Philippe Brucker
2018-03-06 10:24           ` Jean-Philippe Brucker
2018-03-06 10:24           ` Jean-Philippe Brucker
2018-03-05 21:53     ` Sinan Kaya
2018-03-05 21:53       ` Sinan Kaya
2018-03-05 21:53       ` Sinan Kaya
     [not found]       ` <77afa195-4842-a112-eba5-409b861b5315-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2018-03-06 10:46         ` Jean-Philippe Brucker
2018-03-06 10:46           ` Jean-Philippe Brucker
2018-03-06 10:46           ` Jean-Philippe Brucker
     [not found]           ` <430e9754-4cf7-0aa8-7899-fc13e6a2e079-5wv7dgnIgG8@public.gmane.org>
2018-03-06 12:52             ` okaya-sgV2jX0FEOL9JmXXK+q4OQ
2018-03-06 12:52               ` okaya at codeaurora.org
2018-03-06 12:52               ` okaya
2018-03-08 15:40     ` Jonathan Cameron
2018-03-08 15:40       ` Jonathan Cameron
2018-03-08 15:40       ` Jonathan Cameron
     [not found]       ` <20180308164035.000065c2-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-03-14 13:08         ` Jean-Philippe Brucker
2018-03-14 13:08           ` Jean-Philippe Brucker
2018-03-14 13:08           ` Jean-Philippe Brucker
2018-02-12 18:33 ` [PATCH 08/37] iommu/fault: Handle mm faults Jean-Philippe Brucker
2018-02-12 18:33   ` Jean-Philippe Brucker
     [not found]   ` <20180212183352.22730-9-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-02-14 18:46     ` Jacob Pan
2018-02-14 18:46       ` Jacob Pan
2018-02-14 18:46       ` Jacob Pan
2018-02-15 13:51       ` Jean-Philippe Brucker
2018-02-15 13:51         ` Jean-Philippe Brucker
2018-02-15 13:51         ` Jean-Philippe Brucker
2018-02-12 18:33 ` [PATCH 09/37] iommu/fault: Let handler return a fault response Jean-Philippe Brucker
2018-02-12 18:33   ` Jean-Philippe Brucker
     [not found]   ` <20180212183352.22730-10-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-02-20 23:19     ` Jacob Pan
2018-02-20 23:19       ` Jacob Pan
2018-02-20 23:19       ` Jacob Pan
2018-02-21 10:28       ` Jean-Philippe Brucker
2018-02-21 10:28         ` Jean-Philippe Brucker
2018-02-21 10:28         ` Jean-Philippe Brucker
2018-02-12 18:33 ` [PATCH 11/37] dt-bindings: document stall and PASID properties for IOMMU masters Jean-Philippe Brucker
2018-02-12 18:33   ` Jean-Philippe Brucker
     [not found]   ` <20180212183352.22730-12-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-02-19  2:51     ` Rob Herring
2018-02-19  2:51       ` Rob Herring
2018-02-19  2:51       ` Rob Herring
2018-02-20 11:28       ` Jean-Philippe Brucker
2018-02-20 11:28         ` Jean-Philippe Brucker
2018-02-20 11:28         ` Jean-Philippe Brucker
2018-02-12 18:33 ` [PATCH 12/37] iommu/of: Add stall and pasid properties to iommu_fwspec Jean-Philippe Brucker
2018-02-12 18:33   ` Jean-Philippe Brucker
2018-02-12 18:33 ` [PATCH 15/37] iommu/io-pgtable-arm: Factor out ARM LPAE register defines Jean-Philippe Brucker
2018-02-12 18:33   ` Jean-Philippe Brucker
2018-02-12 18:33 ` [PATCH 18/37] iommu/arm-smmu-v3: Add support for Substream IDs Jean-Philippe Brucker
2018-02-12 18:33   ` Jean-Philippe Brucker
2018-02-12 18:33 ` [PATCH 20/37] iommu/arm-smmu-v3: Share process page tables Jean-Philippe Brucker
2018-02-12 18:33   ` Jean-Philippe Brucker
2018-02-12 18:33 ` [PATCH 23/37] iommu/arm-smmu-v3: Enable broadcast TLB maintenance Jean-Philippe Brucker
2018-02-12 18:33   ` Jean-Philippe Brucker
2018-02-12 18:33 ` [PATCH 26/37] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update Jean-Philippe Brucker
2018-02-12 18:33   ` Jean-Philippe Brucker
     [not found] ` <20180212183352.22730-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-02-12 18:33   ` [PATCH 03/37] iommu/sva: Manage process address spaces Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
     [not found]     ` <20180212183352.22730-4-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-03-01  6:52       ` Lu Baolu
2018-03-01  6:52         ` Lu Baolu
2018-03-01  6:52         ` Lu Baolu
     [not found]         ` <5A97A324.9050605-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2018-03-01  8:04           ` Christian König
2018-03-01  8:04             ` Christian König
2018-03-01  8:04             ` Christian König
     [not found]             ` <cd4d7a98-e45e-7066-345f-52d8eef926a2-5C7GfCeVMHo@public.gmane.org>
2018-03-02 16:42               ` Jean-Philippe Brucker
2018-03-02 16:42                 ` Jean-Philippe Brucker
2018-03-02 16:42                 ` Jean-Philippe Brucker
2018-03-02 16:19           ` Jean-Philippe Brucker
2018-03-02 16:19             ` Jean-Philippe Brucker
2018-03-02 16:19             ` Jean-Philippe Brucker
2018-03-05 15:28       ` Sinan Kaya
2018-03-05 15:28         ` Sinan Kaya
2018-03-05 15:28         ` Sinan Kaya
     [not found]         ` <27a044ee-0ed7-0470-0fef-289d0d5cf5e8-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2018-03-06 10:37           ` Jean-Philippe Brucker
2018-03-06 10:37             ` Jean-Philippe Brucker
2018-03-06 10:37             ` Jean-Philippe Brucker
2018-04-24  1:32       ` Sinan Kaya
2018-04-24  1:32         ` Sinan Kaya
2018-04-24  1:32         ` Sinan Kaya
     [not found]         ` <57d77955-caa7-ddac-df7d-7eef1f05dbb2-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2018-04-24  9:33           ` Jean-Philippe Brucker
2018-04-24  9:33             ` Jean-Philippe Brucker
2018-04-24  9:33             ` Jean-Philippe Brucker
     [not found]             ` <66ec18ca-ea4e-d224-c9c5-8dbee5da8a72-5wv7dgnIgG8@public.gmane.org>
2018-04-24 17:17               ` Sinan Kaya
2018-04-24 17:17                 ` Sinan Kaya
2018-04-24 17:17                 ` Sinan Kaya
     [not found]                 ` <e7c4053a-20cc-d2db-16da-100b1157eca4-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2018-04-24 18:52                   ` Jean-Philippe Brucker via iommu
2018-04-24 18:52                     ` Jean-Philippe Brucker
2018-04-10 18:53     ` Sinan Kaya
2018-04-10 18:53       ` Sinan Kaya
2018-04-10 18:53       ` Sinan Kaya
     [not found]       ` <04d4d161-ed72-f6b6-9b94-1d60bd79ef94-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2018-04-13 10:59         ` Jean-Philippe Brucker
2018-04-13 10:59           ` Jean-Philippe Brucker
2018-04-13 10:59           ` Jean-Philippe Brucker
2018-02-12 18:33   ` [PATCH 04/37] iommu/sva: Add a mm_exit callback for device drivers Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
     [not found]     ` <20180212183352.22730-5-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-02-13  8:11       ` Tian, Kevin
2018-02-13  8:11         ` Tian, Kevin
2018-02-13  8:11         ` Tian, Kevin
2018-02-13 12:57         ` Jean-Philippe Brucker
2018-02-13 12:57           ` Jean-Philippe Brucker
2018-02-13 12:57           ` Jean-Philippe Brucker
2018-02-13 12:57           ` Jean-Philippe Brucker
2018-02-12 18:33   ` [PATCH 10/37] iommu/fault: Allow blocking fault handlers Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33   ` [PATCH 13/37] arm64: mm: Pin down ASIDs for sharing mm with devices Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33   ` [PATCH 14/37] iommu/arm-smmu-v3: Link domains and devices Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33   ` [PATCH 16/37] iommu: Add generic PASID table library Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
     [not found]     ` <20180212183352.22730-17-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-02-27 18:51       ` Jacob Pan
2018-02-27 18:51         ` Jacob Pan
2018-02-27 18:51         ` Jacob Pan
2018-02-28 16:22         ` Jean-Philippe Brucker
2018-02-28 16:22           ` Jean-Philippe Brucker
2018-02-28 16:22           ` Jean-Philippe Brucker
2018-02-12 18:33   ` [PATCH 17/37] iommu/arm-smmu-v3: Move context descriptor code Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
     [not found]     ` <20180212183352.22730-18-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-03-09 11:44       ` Jonathan Cameron
2018-03-09 11:44         ` Jonathan Cameron
2018-03-09 11:44         ` Jonathan Cameron
     [not found]         ` <20180309124445.00005e08-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-03-14 13:08           ` Jean-Philippe Brucker
2018-03-14 13:08             ` Jean-Philippe Brucker
2018-03-14 13:08             ` Jean-Philippe Brucker
2018-02-12 18:33   ` [PATCH 19/37] iommu/arm-smmu-v3: Add second level of context descriptor table Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33   ` [PATCH 21/37] iommu/arm-smmu-v3: Seize private ASID Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33   ` [PATCH 22/37] iommu/arm-smmu-v3: Add support for VHE Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33   ` [PATCH 24/37] iommu/arm-smmu-v3: Add SVA feature checking Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33   ` [PATCH 25/37] iommu/arm-smmu-v3: Implement mm operations Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33   ` [PATCH 27/37] iommu/arm-smmu-v3: Register fault workqueue Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
     [not found]     ` <20180212183352.22730-28-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-03-08 17:44       ` Jonathan Cameron
2018-03-08 17:44         ` Jonathan Cameron
2018-03-08 17:44         ` Jonathan Cameron
     [not found]         ` <20180308184454.00000b4e-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-03-14 13:08           ` Jean-Philippe Brucker
2018-03-14 13:08             ` Jean-Philippe Brucker
2018-03-14 13:08             ` Jean-Philippe Brucker
2018-02-12 18:33   ` [PATCH 28/37] iommu/arm-smmu-v3: Maintain a SID->device structure Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
     [not found]     ` <20180212183352.22730-29-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-03-08 17:34       ` Jonathan Cameron
2018-03-08 17:34         ` Jonathan Cameron
2018-03-08 17:34         ` Jonathan Cameron
     [not found]         ` <20180308183431.00005f86-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-03-14 13:09           ` Jean-Philippe Brucker
2018-03-14 13:09             ` Jean-Philippe Brucker
2018-03-14 13:09             ` Jean-Philippe Brucker
2018-02-12 18:33   ` [PATCH 29/37] iommu/arm-smmu-v3: Add stall support for platform devices Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-13  1:46     ` Xu Zaibo
2018-02-13  1:46       ` Xu Zaibo
2018-02-13  1:46       ` Xu Zaibo
2018-02-13  1:46       ` Xu Zaibo
     [not found]       ` <5A824359.1080005-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-02-13 12:58         ` Jean-Philippe Brucker
2018-02-13 12:58           ` Jean-Philippe Brucker
2018-02-13 12:58           ` Jean-Philippe Brucker
2018-02-12 18:33   ` [PATCH 30/37] ACPI/IORT: Check ATS capability in root complex nodes Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33   ` [PATCH 31/37] iommu/arm-smmu-v3: Add support for PCI ATS Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
     [not found]     ` <20180212183352.22730-32-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-03-08 16:17       ` Jonathan Cameron
2018-03-08 16:17         ` Jonathan Cameron
2018-03-08 16:17         ` Jonathan Cameron
     [not found]         ` <20180308171725.0000763c-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-03-14 13:09           ` Jean-Philippe Brucker
2018-03-14 13:09             ` Jean-Philippe Brucker
2018-03-14 13:09             ` Jean-Philippe Brucker
2018-02-12 18:33   ` [PATCH 32/37] iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33   ` [PATCH 33/37] iommu/arm-smmu-v3: Disable tagged pointers Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33   ` [PATCH 35/37] iommu/arm-smmu-v3: Add support for PRI Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
2018-02-12 18:33     ` Jean-Philippe Brucker
     [not found]     ` <20180212183352.22730-36-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-03-05 12:29       ` Dongdong Liu
2018-03-05 12:29         ` Dongdong Liu
2018-03-05 12:29         ` Dongdong Liu
2018-03-05 12:29         ` Dongdong Liu
     [not found]         ` <6f55afcf-04b0-0dc4-6c75-064b70e6851c-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-03-05 13:09           ` Jean-Philippe Brucker
2018-03-05 13:09             ` Jean-Philippe Brucker
2018-03-05 13:09             ` Jean-Philippe Brucker
2018-03-08 16:24       ` Jonathan Cameron
2018-03-08 16:24         ` Jonathan Cameron
2018-03-08 16:24         ` Jonathan Cameron
     [not found]         ` <20180308172436.00006554-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-03-14 13:10           ` Jean-Philippe Brucker
2018-03-14 13:10             ` Jean-Philippe Brucker
2018-03-14 13:10             ` Jean-Philippe Brucker
2018-02-12 18:33 ` [PATCH 34/37] PCI: Make "PRG Response PASID Required" handling common Jean-Philippe Brucker
2018-02-12 18:33   ` Jean-Philippe Brucker
2018-02-12 18:33 ` [PATCH 36/37] iommu/arm-smmu-v3: Add support for PCI PASID Jean-Philippe Brucker
2018-02-12 18:33   ` Jean-Philippe Brucker
2018-02-12 18:33 ` [PATCH 37/37] vfio: Add support for Shared Virtual Addressing Jean-Philippe Brucker
2018-02-12 18:33   ` Jean-Philippe Brucker
     [not found]   ` <20180212183352.22730-38-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2018-02-16 19:33     ` Alex Williamson
2018-02-16 19:33       ` Alex Williamson
2018-02-16 19:33       ` Alex Williamson
     [not found]       ` <20180216123329.10f6dc23-DGNDKt5SQtizQB+pC5nmwQ@public.gmane.org>
2018-02-20 11:26         ` Jean-Philippe Brucker
2018-02-20 11:26           ` Jean-Philippe Brucker
2018-02-20 11:26           ` Jean-Philippe Brucker
2018-02-28  1:26     ` Sinan Kaya
2018-02-28  1:26       ` Sinan Kaya
2018-02-28  1:26       ` Sinan Kaya
     [not found]       ` <1e76c66c-952e-71bd-d831-d3a1ded9559c-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2018-02-28 16:25         ` Jean-Philippe Brucker
2018-02-28 16:25           ` Jean-Philippe Brucker
2018-02-28 16:25           ` Jean-Philippe Brucker
     [not found] <1519280641-30258-1-git-send-email-xieyisheng1@huawei.com>
     [not found] ` <1519280641-30258-27-git-send-email-xieyisheng1@huawei.com>
     [not found]   ` <1519280641-30258-27-git-send-email-xieyisheng1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-03-19 11:03     ` [PATCH 27/37] iommu/arm-smmu-v3: Register fault workqueue Yisheng Xie
2018-03-19 11:03       ` Yisheng Xie
2018-03-19 11:03       ` Yisheng Xie
     [not found]       ` <f2841aac-4c5e-4ac2-fa4d-81d6b2857503-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-03-21 13:24         ` Jean-Philippe Brucker
2018-03-21 13:24           ` Jean-Philippe Brucker
2018-03-21 13:24           ` Jean-Philippe Brucker
     [not found]           ` <cabd71fd-c54c-c21a-a5b7-227e69fa4286-5wv7dgnIgG8@public.gmane.org>
2018-03-22  1:09             ` Yisheng Xie
2018-03-22  1:09               ` Yisheng Xie
2018-03-22  1:09               ` Yisheng Xie
     [not found]               ` <c4f0a441-e975-395a-fc38-3686db21227d-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2018-04-04 10:13                 ` Jean-Philippe Brucker
2018-04-04 10:13                   ` Jean-Philippe Brucker
2018-04-04 10:13                   ` Jean-Philippe Brucker
