* [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3
@ 2017-10-06 13:31 ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

Following discussions at plumbers and elsewhere, it seems like we need to
unify some of the Shared Virtual Memory (SVM) code, in order to define
clear semantics for the SVM API.

My previous RFC [1] was centered on the SMMUv3, but some of this code will
need to be reused by the SMMUv2 and virtio-iommu drivers. This second
proposal focuses on abstracting a little more into the core IOMMU API, and
also trying to find common ground for all SVM-capable IOMMUs.

SVM is, in the context of the IOMMU, sharing page tables between a process
and a device. Traditionally it requires IO Page Fault and Process Address
Space ID capabilities in both the device and the IOMMU.

* A device driver can bind a process to a device with iommu_process_bind.
  Internally we hold on to the mm and get notified of its activity with an
  mmu_notifier. The bond is removed on exit_mm, or by a call to
  iommu_process_unbind or iommu_detach_device.

* iommu_process_bind returns a 20-bit PASID (PCI terminology) to the
  device driver, which programs it into the device to access the process
  address space.

* The device and the IOMMU support recoverable page faults. This can be
  either ATS+PRI for PCI, or platform-specific mechanisms such as Stall
  for SMMU.

Ideally, systems wanting to use SVM would support all three features, but
in practice we'll see implementations supporting just a subset of them,
especially in validation environments. So even though this particular
patchset assumes all three capabilities, it should also be possible to
support PASID without IOPF (by pinning everything; see non-system SVM in
OpenCL), or IOPF without PASID (sharing the single device address space
with a process, which could be useful for DPDK+VFIO).

Implementing both of these cases would enable PT sharing alone. Some
people would also like IOPF alone without SVM (covered by this series),
or process management without shared PT (not covered). Using these
features individually is also important for testing: SVM is in its
infancy, and providing easy ways to test is essential to reduce the
number of quirks down the line.

  Process management
  ==================

The first part of this series introduces boilerplate code for managing
PASIDs and processes bound to devices. It's something any IOMMU driver
that wants to support bind/unbind will have to do, and it is difficult to
get right.

Patches
1: iommu_process and PASID allocation, attach and release
2: process_exit callback for device drivers
3: iommu_process search by PASID
4: track process changes with an mmu_notifier
5: bind and unbind operations

My proposal uses the following model:

* The PASID space is system-wide. This means that a Linux process will
  have a single PASID. I introduce the iommu_process structure and a
  global IDR to manage this.

* An iommu_process can be bound to multiple domains, and a domain can have
  multiple iommu_process.

* IOMMU groups share the same PASID table. IOMMU groups are a convenient
  way to cover various hardware weaknesses that prevent a group of devices
  from being isolated by the IOMMU (an untrusted bridge, for instance).
  It's foolish to assume that all PASID implementations will perfectly
  isolate devices within a bus and functions within a device, so let's
  assume all devices within an IOMMU group have to share PASID traffic as
  well. In general there will be a single device per group.

* It's up to the driver implementation to decide where to implement the
  PASID tables. For SMMU it's more convenient to have a single PASID table
  per domain. I also think this model fits better with the existing IOMMU
  API: IOVA traffic is shared by all devices in a domain, so PASID traffic
  should be too.

  This isn't a hard requirement though; an implementation can still have a
  PASID table for each device.

  Fault handling
  ==============

The second part adds a few helpers for distributing recoverable and
unrecoverable faults to other parts of the kernel:

* to the mm subsystem, when process page tables are shared with a device,
* to VFIO, allowing it to forward translation faults to guests and let
  them recover from them,
* to device drivers that need to do something a bit more complex than just
  displaying a fault in dmesg.

You'll notice that this overlaps with the work carried out by Jacob Pan
for vSVM fault reporting (published a few hours ago! [2]), which goes in
the same direction. For the iommu_fault definition and handler
registration it's probably best to go with his more complete patchset,
but I needed some code to present the full solution, and a way to
describe both PRI and stall data.

Patches
6: a new fault handler registration for device drivers (see also [2])
7: report faults to device drivers or add them to a workqueue (ditto)
8: call handle_mm_fault for recoverable faults
9: allow device driver to register blocking handlers

For the moment the interactions between process and fault queue are the
following; hopefully this is sufficient.

* When unbinding a process, the fault queue has to be flushed to ensure
  that no old fault will hit a future process that obtains the same PASID.

* When handling a fault, find a process by PASID and handle the fault on
  its mm. The process structure is refcounted, so releasing it in the
  fault handler might free the process.

Patch 10 adds a VFIO interface for binding a device owned by a userspace
driver to processes. I didn't add capability detection now, leaving that
discussion for later (also needed by vSVM).

  ARM SMMUv3 support
  ==================

The third part adds an example user, the SMMUv3 driver. A lot of
preparatory work is still needed to support these features; I only
extracted a small part of the previous series to make it common.

If you don't care about SMMU, I advise looking at patch 21, which uses
the new process management interface. Patches 27, 29 and 35 use the new
fault queue for PRI and Stall.

Patches:
11:     track domain-master links (for ATS and CD invalidation)
12-13:  add stall and PASID properties to the device tree
     -> New.
14-15:  add SSID support to the SMMU
     -> Now initializes the CD tables from the value found in DT.
16-20:  share ASID and page tables part 1
21:     implement iommu-process operations
     -> New.
22-26:  share ASID and page tables part 2
27:     use the new fault queue
     -> New.
28:     find masters by SID
     -> New.
29:     add stall support
     -> New.
30-36:  add PCI ATS, PRI and PASID
     -> Now uses mostly core code

This series is available on my svm/rfc2 branch [3]. It is based on v4.14
with Yisheng's stall fix [4]. Patch 8 also requires mmput_async, which
should be added back soon enough [5]. Updates and fixes will go on the
svm/current branch until the next version.

Hoping this helps,
Jean

[1] https://lists.linuxfoundation.org/pipermail/iommu/2017-February/020599.html
[2] https://patchwork.kernel.org/patch/9988089/
[3] git://linux-arm.org/linux-jpb svm/rfc2
[4] https://patchwork.kernel.org/patch/9963863/
[5] https://patchwork.kernel.org/patch/9952257/

Jean-Philippe Brucker (36):
  iommu: Keep track of processes and PASIDs
  iommu: Add a process_exit callback for device drivers
  iommu/process: Add public function to search for a process
  iommu/process: Track process changes with an mmu_notifier
  iommu/process: Bind and unbind process to and from devices
  iommu: Extend fault reporting
  iommu: Add a fault handler
  iommu/fault: Handle mm faults
  iommu/fault: Allow blocking fault handlers
  vfio: Add support for Shared Virtual Memory
  iommu/arm-smmu-v3: Link domains and devices
  dt-bindings: document stall and PASID properties for IOMMU masters
  iommu/of: Add stall and pasid properties to iommu_fwspec
  iommu/arm-smmu-v3: Add support for Substream IDs
  iommu/arm-smmu-v3: Add second level of context descriptor table
  iommu/arm-smmu-v3: Add support for VHE
  iommu/arm-smmu-v3: Support broadcast TLB maintenance
  iommu/arm-smmu-v3: Add SVM feature checking
  arm64: mm: Pin down ASIDs for sharing contexts with devices
  iommu/arm-smmu-v3: Track ASID state
  iommu/arm-smmu-v3: Implement process operations
  iommu/io-pgtable-arm: Factor out ARM LPAE register defines
  iommu/arm-smmu-v3: Share process page tables
  iommu/arm-smmu-v3: Steal private ASID from a domain
  iommu/arm-smmu-v3: Use shared ASID set
  iommu/arm-smmu-v3: Add support for Hardware Translation Table Update
  iommu/arm-smmu-v3: Register fault workqueue
  iommu/arm-smmu-v3: Maintain a SID->device structure
  iommu/arm-smmu-v3: Add stall support for platform devices
  ACPI/IORT: Check ATS capability in root complex nodes
  iommu/arm-smmu-v3: Add support for PCI ATS
  iommu/arm-smmu-v3: Hook ATC invalidation to process ops
  iommu/arm-smmu-v3: Disable tagged pointers
  PCI: Make "PRG Response PASID Required" handling common
  iommu/arm-smmu-v3: Add support for PRI
  iommu/arm-smmu-v3: Add support for PCI PASID

 Documentation/devicetree/bindings/iommu/iommu.txt |   24 +
 MAINTAINERS                                       |    1 +
 arch/arm64/include/asm/mmu.h                      |    1 +
 arch/arm64/include/asm/mmu_context.h              |   11 +-
 arch/arm64/mm/context.c                           |   80 +-
 drivers/acpi/arm64/iort.c                         |   11 +
 drivers/iommu/Kconfig                             |   19 +
 drivers/iommu/Makefile                            |    2 +
 drivers/iommu/amd_iommu.c                         |   19 +-
 drivers/iommu/arm-smmu-v3.c                       | 1990 ++++++++++++++++++---
 drivers/iommu/io-pgfault.c                        |  421 +++++
 drivers/iommu/io-pgtable-arm.c                    |   48 +-
 drivers/iommu/io-pgtable-arm.h                    |   67 +
 drivers/iommu/iommu-process.c                     |  604 +++++++
 drivers/iommu/iommu.c                             |  113 ++
 drivers/iommu/of_iommu.c                          |   10 +
 drivers/pci/ats.c                                 |   17 +
 drivers/vfio/vfio_iommu_type1.c                   |  243 ++-
 include/linux/iommu.h                             |  254 ++-
 include/linux/pci-ats.h                           |    8 +
 include/uapi/linux/pci_regs.h                     |    1 +
 include/uapi/linux/vfio.h                         |   69 +
 22 files changed, 3690 insertions(+), 323 deletions(-)
 create mode 100644 drivers/iommu/io-pgfault.c
 create mode 100644 drivers/iommu/io-pgtable-arm.h
 create mode 100644 drivers/iommu/iommu-process.c

-- 
2.13.3


* [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3
@ 2017-10-06 13:31 ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel

Following discussions at plumbers and elsewhere, it seems like we need to
unify some of the Shared Virtual Memory (SVM) code, in order to define
clear semantics for the SVM API.

My previous RFC [1] was centered on the SMMUv3, but some of this code will
need to be reused by the SMMUv2 and virtio-iommu drivers. This second
proposal focuses on abstracting a little more into the core IOMMU API, and
also trying to find common ground for all SVM-capable IOMMUs.

SVM is, in the context of the IOMMU, sharing page tables between a process
and a device. Traditionally it requires IO Page Fault and Process Address
Space ID capabilities in device and IOMMU.

* A device driver can bind a process to a device, with iommu_process_bind.
  Internally we hold on to the mm and get notified of its activity with an
  mmu_notifier. The bond is removed by exit_mm, by a call to
  iommu_process_unbind or iommu_detach_device.

* iommu_process_bind returns a 20-bit PASID (PCI terminology) to the
  device driver, which programs it into the device to access the process
  address space.

* The device and the IOMMU support recoverable page faults. This can be
  either ATS+PRI for PCI, or platform-specific mechanisms such as Stall
  for SMMU.

Ideally systems wanting to use SVM have to support these three features,
but in practice we'll see implementations supporting just a subset of
them, especially in validation environments. So even if this particular
patchset assumes all three capabilities, it should also be possible to
support PASID without IOPF (by pinning everything, see non-system SVM in
OpenCL), or IOPF without PASID (sharing the single device address space
with a process, could be useful for DPDK+VFIO).

Implementing both these cases would enable PT sharing alone. Some people
would also like IOPF alone without SVM (covered by this series) or process
management without shared PT (not covered). Using these features
individually is also important for testing, as SVM is in its infancy and
providing easy ways to test is essential to reduce the number of quirks
down the line.

  Process management
  ==================

The first part of this series introduces boilerplate code for managing
PASIDs and processes bound to devices. It's something any IOMMU driver
that wants to support bind/unbind will have to do, and it is difficult to
get right.

Patches
1: iommu_process and PASID allocation, attach and release
2: process_exit callback for device drivers
3: iommu_process search by PASID
4: track process changes with an MMU notifiers
5: bind and unbind operations

My proposal uses the following model:

* The PASID space is system-wide. This means that a Linux process will
  have a single PASID. I introduce the iommu_process structure and a
  global IDR to manage this.

* An iommu_process can be bound to multiple domains, and a domain can have
  multiple iommu_process.

* IOMMU groups share same PASID table. IOMMU groups are a convenient way
  to cover various hardware weaknesses that prevent a group of device to
  be isolated by the IOMMU (untrusted bridge, for instance). It's foolish
  to assume that all PASID implementations will perfectly isolate devices
  within a bus and functions within a device, so let's assume all devices
  within an IOMMU group have to share PASID traffic as well. In general
  there will be a single device per group.

* It's up to the driver implementation to decide where to implement the
  PASID tables. For SMMU it's more convenient to have a single PASID table
  per domain. And I think the model fits better with the existing IOMMU
  API: IOVA traffic is shared by all devices in a domain, so should PASID
  traffic.

  This isn't a hard requirement though, an implementation can still have a
  PASID table for each device.

  Fault handling
  ==============

The second part adds a few helpers for distributing recoverable and
unrecoverable faults to other parts of the kernel:

* to the mm subsystem, when process page tables are shared with a device,
* to VFIO allowing it to forward translation faults to guests, and let
  them to recover from it,
* to device drivers that need to do something a bit more complex than just
  displaying a fault on dmesg.

You'll notice that this overlaps the work carried out by Jacob Pan for
vSVM fault reporting (published a few hours ago! [2]), which goes in the
same direction. For iommu_fault definition and handler registration it's
probably best to go with his more complete patchset, but I needed some
code to present the full solution and a way to describe both PRI and stall
data.

Patches
6: a new fault handler registration for device drivers (see also [2])
7: report faults to device drivers or add them to a workqueue (ditto)
8: call handle_mm_fault for recoverable faults
9: allow device driver to register blocking handlers

For the moment the interactions between process and fault queue are the
following. Hopefully it should be sufficient.

* When unbinding a process, the fault queue has to be flushed to ensure
  that no old fault will hit a future process that obtains the same PASID.

* When handling a fault, find a process by PASID and handle the fault on
  its mm. The process structure is refcounted, so releasing it in the
  fault handler might free the process.

Patch 10 adds a VFIO interface for binding a device owned by a userspace
driver to processes. I didn't add capability detection now, leaving that
discussion for later (also needed by vSVM).

  ARM SMMUv3 support
  ==================

The third part adds an example user, the SMMUv3 driver. A lot of
preparatory work is still needed to support these features, I only
extracted a small part of the previous series to make it common.

If you don't care about SMMU I advise to look at patches 21, which uses
the new process management interface. Patches 27, 29 and 35 use the new
fault queue for PRI and Stall.

Patches:
11:     track domain-master links (for ATS and CD invalidation)
12-13   add stall and PASID properties to the device tree
     -> New.
14-15:  add SSID support to the SMMU
     -> Now initializes the CD tables from the value found in DT.
16-20:  share ASID and page tables part 1
21:     implement iommu-process operations
     -> New.
22-26:  share ASID and page tables part 2
27:     use the new fault queue
     -> New.
28:     find masters by SID
     -> New.
29:     add stall support
     -> New.
30-36:  add PCI ATS, PRI and PASID
     -> Now uses mostly core code

This series is available on my svm/rfc2 branch [3]. It is based on v4.14
with Yisheng's stall fix [4]. Patch 8 also requires mmput_async which
should be added back soon enough [5]. Updates and fixes will go on
branch svm/current until next version.

Hoping this helps,
Jean

[1] https://lists.linuxfoundation.org/pipermail/iommu/2017-February/020599.html
[2] https://patchwork.kernel.org/patch/9988089/
[3] git://linux-arm.org/linux-jpb svm/rfc2
[4] https://patchwork.kernel.org/patch/9963863/
[5] https://patchwork.kernel.org/patch/9952257/

Jean-Philippe Brucker (36):
  iommu: Keep track of processes and PASIDs
  iommu: Add a process_exit callback for device drivers
  iommu/process: Add public function to search for a process
  iommu/process: Track process changes with an mmu_notifier
  iommu/process: Bind and unbind process to and from devices
  iommu: Extend fault reporting
  iommu: Add a fault handler
  iommu/fault: Handle mm faults
  iommu/fault: Allow blocking fault handlers
  vfio: Add support for Shared Virtual Memory
  iommu/arm-smmu-v3: Link domains and devices
  dt-bindings: document stall and PASID properties for IOMMU masters
  iommu/of: Add stall and pasid properties to iommu_fwspec
  iommu/arm-smmu-v3: Add support for Substream IDs
  iommu/arm-smmu-v3: Add second level of context descriptor table
  iommu/arm-smmu-v3: Add support for VHE
  iommu/arm-smmu-v3: Support broadcast TLB maintenance
  iommu/arm-smmu-v3: Add SVM feature checking
  arm64: mm: Pin down ASIDs for sharing contexts with devices
  iommu/arm-smmu-v3: Track ASID state
  iommu/arm-smmu-v3: Implement process operations
  iommu/io-pgtable-arm: Factor out ARM LPAE register defines
  iommu/arm-smmu-v3: Share process page tables
  iommu/arm-smmu-v3: Steal private ASID from a domain
  iommu/arm-smmu-v3: Use shared ASID set
  iommu/arm-smmu-v3: Add support for Hardware Translation Table Update
  iommu/arm-smmu-v3: Register fault workqueue
  iommu/arm-smmu-v3: Maintain a SID->device structure
  iommu/arm-smmu-v3: Add stall support for platform devices
  ACPI/IORT: Check ATS capability in root complex nodes
  iommu/arm-smmu-v3: Add support for PCI ATS
  iommu/arm-smmu-v3: Hook ATC invalidation to process ops
  iommu/arm-smmu-v3: Disable tagged pointers
  PCI: Make "PRG Response PASID Required" handling common
  iommu/arm-smmu-v3: Add support for PRI
  iommu/arm-smmu-v3: Add support for PCI PASID

 Documentation/devicetree/bindings/iommu/iommu.txt |   24 +
 MAINTAINERS                                       |    1 +
 arch/arm64/include/asm/mmu.h                      |    1 +
 arch/arm64/include/asm/mmu_context.h              |   11 +-
 arch/arm64/mm/context.c                           |   80 +-
 drivers/acpi/arm64/iort.c                         |   11 +
 drivers/iommu/Kconfig                             |   19 +
 drivers/iommu/Makefile                            |    2 +
 drivers/iommu/amd_iommu.c                         |   19 +-
 drivers/iommu/arm-smmu-v3.c                       | 1990 ++++++++++++++++++---
 drivers/iommu/io-pgfault.c                        |  421 +++++
 drivers/iommu/io-pgtable-arm.c                    |   48 +-
 drivers/iommu/io-pgtable-arm.h                    |   67 +
 drivers/iommu/iommu-process.c                     |  604 +++++++
 drivers/iommu/iommu.c                             |  113 ++
 drivers/iommu/of_iommu.c                          |   10 +
 drivers/pci/ats.c                                 |   17 +
 drivers/vfio/vfio_iommu_type1.c                   |  243 ++-
 include/linux/iommu.h                             |  254 ++-
 include/linux/pci-ats.h                           |    8 +
 include/uapi/linux/pci_regs.h                     |    1 +
 include/uapi/linux/vfio.h                         |   69 +
 22 files changed, 3690 insertions(+), 323 deletions(-)
 create mode 100644 drivers/iommu/io-pgfault.c
 create mode 100644 drivers/iommu/io-pgtable-arm.h
 create mode 100644 drivers/iommu/iommu-process.c

-- 
2.13.3


* [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

IOMMU drivers need a way to bind Linux processes to devices. This is used
for Shared Virtual Memory (SVM), where devices support paging. In that
mode, DMA can directly target virtual addresses of a process.

Introduce boilerplate code for allocating process structures and binding
them to devices. Four operations are added to IOMMU drivers:

* process_alloc, process_free: to create an iommu_process structure and
  perform architecture-specific operations required to grab the process
  (for instance on ARM SMMU, pin down the CPU ASID). There is a single
  iommu_process structure per Linux process.

* process_attach: attach a process to a device. The IOMMU driver checks
  that the device is capable of sharing an address space with this
  process, and writes the PASID table entry to install the process page
  directory.

  Some IOMMU drivers (e.g. ARM SMMU and virtio-iommu) will have a single
  PASID table per domain, for convenience. Others can implement it
  differently, but to help these drivers, process_attach and process_detach
  take a 'first' or 'last' parameter telling them whether they need to
  install/remove the PASID entry or only send the required TLB
  invalidations.

* process_detach: detach a process from a device. The IOMMU driver removes
  the PASID table entry and invalidates the IOTLBs.

process_attach and process_detach operations are serialized with a
spinlock. At the moment it is global, but if we try to optimize it, the
core should at least prevent concurrent attach/detach on the same domain,
so that multi-level PASID table code can allocate tables lazily without
having to go through the io-pgtable concurrency nightmare. process_alloc
can sleep, but process_free must not (because we'll have to call it from
call_srcu).

At the moment we use an IDR for allocating PASIDs and retrieving contexts.
We also use a single spinlock. These can be refined and optimized later (a
custom allocator will be needed for top-down PASID allocation).

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/Kconfig         |  10 ++
 drivers/iommu/Makefile        |   1 +
 drivers/iommu/iommu-process.c | 225 ++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu.c         |   1 +
 include/linux/iommu.h         |  24 +++++
 5 files changed, 261 insertions(+)
 create mode 100644 drivers/iommu/iommu-process.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f3a21343e636..1ea5c90e37be 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -74,6 +74,16 @@ config IOMMU_DMA
 	select IOMMU_IOVA
 	select NEED_SG_DMA_LENGTH
 
+config IOMMU_PROCESS
+	bool "Process management API for the IOMMU"
+	select IOMMU_API
+	help
+	  Enable process management for the IOMMU API. In systems that support
+	  it, device drivers can bind processes to devices and share their page
+	  tables using this API.
+
+	  If unsure, say N here.
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PCI
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index b910aea813a1..a2832edbfaa2 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
+obj-$(CONFIG_IOMMU_PROCESS) += iommu-process.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
diff --git a/drivers/iommu/iommu-process.c b/drivers/iommu/iommu-process.c
new file mode 100644
index 000000000000..a7e5a1c94305
--- /dev/null
+++ b/drivers/iommu/iommu-process.c
@@ -0,0 +1,225 @@
+/*
+ * Track processes bound to devices
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ *
+ * Copyright (C) 2017 ARM Ltd.
+ *
+ * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
+ */
+
+#include <linux/idr.h>
+#include <linux/iommu.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+/* Link between a domain and a process */
+struct iommu_context {
+	struct iommu_process	*process;
+	struct iommu_domain	*domain;
+
+	struct list_head	process_head;
+	struct list_head	domain_head;
+
+	/* Number of devices that use this context */
+	refcount_t		ref;
+};
+
+/*
+ * Because we're using an IDR, PASIDs are limited to 31 bits (the sign bit is
+ * used for returning errors). In practice implementations will use at most 20
+ * bits, which is the PCI limit.
+ */
+static DEFINE_IDR(iommu_process_idr);
+
+/*
+ * For the moment this is an all-purpose lock. It serializes access and
+ * modifications to contexts (process-domain links), to the PASID IDR, and
+ * to process refcounts.
+ */
+static DEFINE_SPINLOCK(iommu_process_lock);
+
+/*
+ * Allocate an iommu_process structure for the given task.
+ *
+ * Ideally we shouldn't need the domain parameter, since iommu_process is
+ * system-wide, but we use it to retrieve the driver's allocation ops and a
+ * PASID range.
+ */
+static struct iommu_process *
+iommu_process_alloc(struct iommu_domain *domain, struct task_struct *task)
+{
+	int err;
+	int pasid;
+	struct iommu_process *process;
+
+	if (WARN_ON(!domain->ops->process_alloc || !domain->ops->process_free))
+		return ERR_PTR(-ENODEV);
+
+	process = domain->ops->process_alloc(task);
+	if (IS_ERR(process))
+		return process;
+	if (!process)
+		return ERR_PTR(-ENOMEM);
+
+	process->pid		= get_task_pid(task, PIDTYPE_PID);
+	process->release	= domain->ops->process_free;
+	INIT_LIST_HEAD(&process->domains);
+	kref_init(&process->kref);
+
+	if (!process->pid) {
+		err = -EINVAL;
+		goto err_free_process;
+	}
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&iommu_process_lock);
+	pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid,
+				 domain->max_pasid + 1, GFP_ATOMIC);
+	process->pasid = pasid;
+	spin_unlock(&iommu_process_lock);
+	idr_preload_end();
+
+	if (pasid < 0) {
+		err = pasid;
+		goto err_put_pid;
+	}
+
+	return process;
+
+err_put_pid:
+	put_pid(process->pid);
+
+err_free_process:
+	domain->ops->process_free(process);
+
+	return ERR_PTR(err);
+}
+
+static void iommu_process_release(struct kref *kref)
+{
+	struct iommu_process *process;
+	void (*release)(struct iommu_process *);
+
+	assert_spin_locked(&iommu_process_lock);
+
+	process = container_of(kref, struct iommu_process, kref);
+	release = process->release;
+
+	WARN_ON(!list_empty(&process->domains));
+
+	idr_remove(&iommu_process_idr, process->pasid);
+	put_pid(process->pid);
+	release(process);
+}
+
+/*
+ * Returns non-zero if a reference to the process was successfully taken.
+ * Returns zero if the process is being freed and should not be used.
+ */
+static int iommu_process_get_locked(struct iommu_process *process)
+{
+	assert_spin_locked(&iommu_process_lock);
+
+	if (process)
+		return kref_get_unless_zero(&process->kref);
+
+	return 0;
+}
+
+static void iommu_process_put_locked(struct iommu_process *process)
+{
+	assert_spin_locked(&iommu_process_lock);
+
+	kref_put(&process->kref, iommu_process_release);
+}
+
+static int iommu_process_attach(struct iommu_domain *domain, struct device *dev,
+				struct iommu_process *process)
+{
+	int err;
+	int pasid = process->pasid;
+	struct iommu_context *context;
+
+	if (WARN_ON(!domain->ops->process_attach || !domain->ops->process_detach))
+		return -ENODEV;
+
+	if (pasid > domain->max_pasid || pasid < domain->min_pasid)
+		return -ENOSPC;
+
+	context = kzalloc(sizeof(*context), GFP_KERNEL);
+	if (!context)
+		return -ENOMEM;
+
+	context->process	= process;
+	context->domain		= domain;
+	refcount_set(&context->ref, 1);
+
+	spin_lock(&iommu_process_lock);
+	err = domain->ops->process_attach(domain, dev, process, true);
+	if (err) {
+		kfree(context);
+		spin_unlock(&iommu_process_lock);
+		return err;
+	}
+
+	list_add(&context->process_head, &process->domains);
+	list_add(&context->domain_head, &domain->processes);
+	spin_unlock(&iommu_process_lock);
+
+	return 0;
+}
+
+static void iommu_context_free(struct iommu_context *context)
+{
+	assert_spin_locked(&iommu_process_lock);
+
+	if (WARN_ON(!context->process || !context->domain))
+		return;
+
+	list_del(&context->process_head);
+	list_del(&context->domain_head);
+	iommu_process_put_locked(context->process);
+
+	kfree(context);
+}
+
+/* Attach an existing context to the device */
+static int iommu_process_attach_locked(struct iommu_context *context,
+				       struct device *dev)
+{
+	assert_spin_locked(&iommu_process_lock);
+
+	refcount_inc(&context->ref);
+	return context->domain->ops->process_attach(context->domain, dev,
+						    context->process, false);
+}
+
+/* Detach device from context and release it if necessary */
+static void iommu_process_detach_locked(struct iommu_context *context,
+					struct device *dev)
+{
+	bool last = false;
+	struct iommu_domain *domain = context->domain;
+
+	assert_spin_locked(&iommu_process_lock);
+
+	if (refcount_dec_and_test(&context->ref))
+		last = true;
+
+	domain->ops->process_detach(domain, dev, context->process, last);
+
+	if (last)
+		iommu_context_free(context);
+}
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3de5c0bcb5cc..b2b34cf7c978 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1264,6 +1264,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
 	domain->type = type;
 	/* Assume all sizes by default; the driver may override this later */
 	domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
+	INIT_LIST_HEAD(&domain->processes);
 
 	return domain;
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 41b8c5757859..3978dc094706 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -94,6 +94,19 @@ struct iommu_domain {
 	void *handler_token;
 	struct iommu_domain_geometry geometry;
 	void *iova_cookie;
+
+	unsigned int min_pasid, max_pasid;
+	struct list_head processes;
+};
+
+struct iommu_process {
+	struct pid		*pid;
+	int			pasid;
+	struct list_head	domains;
+	struct kref		kref;
+
+	/* Release callback for this process */
+	void (*release)(struct iommu_process *process);
 };
 
 enum iommu_cap {
@@ -164,6 +177,11 @@ struct iommu_resv_region {
  * @domain_free: free iommu domain
  * @attach_dev: attach device to an iommu domain
  * @detach_dev: detach device from an iommu domain
+ * @process_alloc: allocate iommu process
+ * @process_free: free iommu process
+ * @process_attach: attach iommu process to a domain
+ * @process_detach: detach iommu process from a domain. Remove PASID entry and
+ *                  flush associated TLB entries.
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @map_sg: map a scatter-gather list of physically contiguous memory chunks
@@ -197,6 +215,12 @@ struct iommu_ops {
 
 	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
 	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
+	struct iommu_process *(*process_alloc)(struct task_struct *task);
+	void (*process_free)(struct iommu_process *process);
+	int (*process_attach)(struct iommu_domain *domain, struct device *dev,
+			      struct iommu_process *process, bool first);
+	void (*process_detach)(struct iommu_domain *domain, struct device *dev,
+			       struct iommu_process *process, bool last);
 	int (*map)(struct iommu_domain *domain, unsigned long iova,
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
-- 
2.13.3


* [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

IOMMU drivers need a way to bind Linux processes to devices. This is used
for Shared Virtual Memory (SVM), where devices support paging. In that
mode, DMA can directly target virtual addresses of a process.

Introduce boilerplate code for allocating process structures and binding
them to devices. Four operations are added to IOMMU drivers:

* process_alloc, process_free: to create an iommu_process structure and
  perform architecture-specific operations required to grab the process
  (for instance on ARM SMMU, pin down the CPU ASID). There is a single
  iommu_process structure per Linux process.

* process_attach: attach a process to a device. The IOMMU driver checks
  that the device is capable of sharing an address space with this
  process, and writes the PASID table entry to install the process page
  directory.

  Some IOMMU drivers (e.g. ARM SMMU and virtio-iommu) will have a single
  PASID table per domain, for convenience. Other can implement it
  differently but to help these drivers, process_attach and process_detach
  take a 'first' or 'last' parameter telling whether they need to
  install/remove the PASID entry or only send the required TLB
  invalidations.

* process_detach: detach a process from a device. The IOMMU driver removes
  the PASID table entry and invalidates the IOTLBs.

process_attach and process_detach operations are serialized with a
spinlock. At the moment it is global, but if we try to optimize it, the
core should at least prevent concurrent attach/detach on the same domain.
(so multi-level PASID table code can allocate tables lazily without having
to go through the io-pgtable concurrency nightmare). process_alloc can
sleep, but process_free must not (because we'll have to call it from
call_srcu.)

At the moment we use an IDR for allocating PASIDs and retrieving contexts.
We also use a single spinlock. These can be refined and optimized later (a
custom allocator will be needed for top-down PASID allocation).

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/Kconfig         |  10 ++
 drivers/iommu/Makefile        |   1 +
 drivers/iommu/iommu-process.c | 225 ++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu.c         |   1 +
 include/linux/iommu.h         |  24 +++++
 5 files changed, 261 insertions(+)
 create mode 100644 drivers/iommu/iommu-process.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f3a21343e636..1ea5c90e37be 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -74,6 +74,16 @@ config IOMMU_DMA
 	select IOMMU_IOVA
 	select NEED_SG_DMA_LENGTH
 
+config IOMMU_PROCESS
+	bool "Process management API for the IOMMU"
+	select IOMMU_API
+	help
+	  Enable process management for the IOMMU API. In systems that support
+	  it, device drivers can bind processes to devices and share their page
+	  tables using this API.
+
+	  If unsure, say N here.
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PCI
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index b910aea813a1..a2832edbfaa2 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
+obj-$(CONFIG_IOMMU_PROCESS) += iommu-process.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
diff --git a/drivers/iommu/iommu-process.c b/drivers/iommu/iommu-process.c
new file mode 100644
index 000000000000..a7e5a1c94305
--- /dev/null
+++ b/drivers/iommu/iommu-process.c
@@ -0,0 +1,225 @@
+/*
+ * Track processes bound to devices
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ *
+ * Copyright (C) 2017 ARM Ltd.
+ *
+ * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
+ */
+
+#include <linux/idr.h>
+#include <linux/iommu.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+/* Link between a domain and a process */
+struct iommu_context {
+	struct iommu_process	*process;
+	struct iommu_domain	*domain;
+
+	struct list_head	process_head;
+	struct list_head	domain_head;
+
+	/* Number of devices that use this context */
+	refcount_t		ref;
+};
+
+/*
+ * Because we're using an IDR, PASIDs are limited to 31 bits (the sign bit is
+ * used for returning errors). In practice implementations will use at most 20
+ * bits, which is the PCI limit.
+ */
+static DEFINE_IDR(iommu_process_idr);
+
+/*
+ * For the moment this is an all-purpose lock. It serializes
+ * access/modifications to contexts (process-domain links), access/modifications
+ * to the PASID IDR, and changes to process refcount as well.
+ */
+static DEFINE_SPINLOCK(iommu_process_lock);
+
+/*
+ * Allocate a iommu_process structure for the given task.
+ *
+ * Ideally we shouldn't need the domain parameter, since iommu_process is
+ * system-wide, but we use it to retrieve the driver's allocation ops and a
+ * PASID range.
+ */
+static struct iommu_process *
+iommu_process_alloc(struct iommu_domain *domain, struct task_struct *task)
+{
+	int err;
+	int pasid;
+	struct iommu_process *process;
+
+	if (WARN_ON(!domain->ops->process_alloc || !domain->ops->process_free))
+		return ERR_PTR(-ENODEV);
+
+	process = domain->ops->process_alloc(task);
+	if (IS_ERR(process))
+		return process;
+	if (!process)
+		return ERR_PTR(-ENOMEM);
+
+	process->pid		= get_task_pid(task, PIDTYPE_PID);
+	process->release	= domain->ops->process_free;
+	INIT_LIST_HEAD(&process->domains);
+	kref_init(&process->kref);
+
+	if (!process->pid) {
+		err = -EINVAL;
+		goto err_free_process;
+	}
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&iommu_process_lock);
+	pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid,
+				 domain->max_pasid + 1, GFP_ATOMIC);
+	process->pasid = pasid;
+	spin_unlock(&iommu_process_lock);
+	idr_preload_end();
+
+	if (pasid < 0) {
+		err = pasid;
+		goto err_put_pid;
+	}
+
+	return process;
+
+err_put_pid:
+	put_pid(process->pid);
+
+err_free_process:
+	domain->ops->process_free(process);
+
+	return ERR_PTR(err);
+}
+
+static void iommu_process_release(struct kref *kref)
+{
+	struct iommu_process *process;
+	void (*release)(struct iommu_process *);
+
+	assert_spin_locked(&iommu_process_lock);
+
+	process = container_of(kref, struct iommu_process, kref);
+	release = process->release;
+
+	WARN_ON(!list_empty(&process->domains));
+
+	idr_remove(&iommu_process_idr, process->pasid);
+	put_pid(process->pid);
+	release(process);
+}
+
+/*
+ * Returns non-zero if a reference to the process was successfully taken.
+ * Returns zero if the process is being freed and should not be used.
+ */
+static int iommu_process_get_locked(struct iommu_process *process)
+{
+	assert_spin_locked(&iommu_process_lock);
+
+	if (process)
+		return kref_get_unless_zero(&process->kref);
+
+	return 0;
+}
+
+static void iommu_process_put_locked(struct iommu_process *process)
+{
+	assert_spin_locked(&iommu_process_lock);
+
+	kref_put(&process->kref, iommu_process_release);
+}
+
+static int iommu_process_attach(struct iommu_domain *domain, struct device *dev,
+				struct iommu_process *process)
+{
+	int err;
+	int pasid = process->pasid;
+	struct iommu_context *context;
+
+	if (WARN_ON(!domain->ops->process_attach || !domain->ops->process_detach))
+		return -ENODEV;
+
+	if (pasid > domain->max_pasid || pasid < domain->min_pasid)
+		return -ENOSPC;
+
+	context = kzalloc(sizeof(*context), GFP_KERNEL);
+	if (!context)
+		return -ENOMEM;
+
+	context->process	= process;
+	context->domain		= domain;
+	refcount_set(&context->ref, 1);
+
+	spin_lock(&iommu_process_lock);
+	err = domain->ops->process_attach(domain, dev, process, true);
+	if (err) {
+		kfree(context);
+		spin_unlock(&iommu_process_lock);
+		return err;
+	}
+
+	list_add(&context->process_head, &process->domains);
+	list_add(&context->domain_head, &domain->processes);
+	spin_unlock(&iommu_process_lock);
+
+	return 0;
+}
+
+static void iommu_context_free(struct iommu_context *context)
+{
+	assert_spin_locked(&iommu_process_lock);
+
+	if (WARN_ON(!context->process || !context->domain))
+		return;
+
+	list_del(&context->process_head);
+	list_del(&context->domain_head);
+	iommu_process_put_locked(context->process);
+
+	kfree(context);
+}
+
+/* Attach an existing context to the device */
+static int iommu_process_attach_locked(struct iommu_context *context,
+				       struct device *dev)
+{
+	assert_spin_locked(&iommu_process_lock);
+
+	refcount_inc(&context->ref);
+	return context->domain->ops->process_attach(context->domain, dev,
+						    context->process, false);
+}
+
+/* Detach device from context and release it if necessary */
+static void iommu_process_detach_locked(struct iommu_context *context,
+					struct device *dev)
+{
+	bool last = false;
+	struct iommu_domain *domain = context->domain;
+
+	assert_spin_locked(&iommu_process_lock);
+
+	if (refcount_dec_and_test(&context->ref))
+		last = true;
+
+	domain->ops->process_detach(domain, dev, context->process, last);
+
+	if (last)
+		iommu_context_free(context);
+}
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3de5c0bcb5cc..b2b34cf7c978 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1264,6 +1264,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
 	domain->type = type;
 	/* Assume all sizes by default; the driver may override this later */
 	domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
+	INIT_LIST_HEAD(&domain->processes);
 
 	return domain;
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 41b8c5757859..3978dc094706 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -94,6 +94,19 @@ struct iommu_domain {
 	void *handler_token;
 	struct iommu_domain_geometry geometry;
 	void *iova_cookie;
+
+	unsigned int min_pasid, max_pasid;
+	struct list_head processes;
+};
+
+struct iommu_process {
+	struct pid		*pid;
+	int			pasid;
+	struct list_head	domains;
+	struct kref		kref;
+
+	/* Release callback for this process */
+	void (*release)(struct iommu_process *process);
 };
 
 enum iommu_cap {
@@ -164,6 +177,11 @@ struct iommu_resv_region {
  * @domain_free: free iommu domain
  * @attach_dev: attach device to an iommu domain
  * @detach_dev: detach device from an iommu domain
+ * @process_alloc: allocate iommu process
+ * @process_free: free iommu process
+ * @process_attach: attach iommu process to a domain
+ * @process_detach: detach iommu process from a domain. Remove PASID entry and
+ *                  flush associated TLB entries.
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @map_sg: map a scatter-gather list of physically contiguous memory chunks
@@ -197,6 +215,12 @@ struct iommu_ops {
 
 	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
 	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
+	struct iommu_process *(*process_alloc)(struct task_struct *task);
+	void (*process_free)(struct iommu_process *process);
+	int (*process_attach)(struct iommu_domain *domain, struct device *dev,
+			      struct iommu_process *process, bool first);
+	void (*process_detach)(struct iommu_domain *domain, struct device *dev,
+			       struct iommu_process *process, bool last);
 	int (*map)(struct iommu_domain *domain, unsigned long iova,
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel

IOMMU drivers need a way to bind Linux processes to devices. This is used
for Shared Virtual Memory (SVM), where devices support paging. In that
mode, DMA can directly target virtual addresses of a process.

Introduce boilerplate code for allocating process structures and binding
them to devices. Four operations are added to IOMMU drivers:

* process_alloc, process_free: to create an iommu_process structure and
  perform architecture-specific operations required to grab the process
  (for instance on ARM SMMU, pin down the CPU ASID). There is a single
  iommu_process structure per Linux process.

* process_attach: attach a process to a device. The IOMMU driver checks
  that the device is capable of sharing an address space with this
  process, and writes the PASID table entry to install the process page
  directory.

  Some IOMMU drivers (e.g. ARM SMMU and virtio-iommu) will have a single
  PASID table per domain, for convenience. Other can implement it
  differently but to help these drivers, process_attach and process_detach
  take a 'first' or 'last' parameter telling whether they need to
  install/remove the PASID entry or only send the required TLB
  invalidations.

* process_detach: detach a process from a device. The IOMMU driver removes
  the PASID table entry and invalidates the IOTLBs.

process_attach and process_detach operations are serialized with a
spinlock. At the moment it is global, but if we try to optimize it, the
core should at least prevent concurrent attach/detach on the same domain.
(so multi-level PASID table code can allocate tables lazily without having
to go through the io-pgtable concurrency nightmare). process_alloc can
sleep, but process_free must not (because we'll have to call it from
call_srcu.)

At the moment we use an IDR for allocating PASIDs and retrieving contexts.
We also use a single spinlock. These can be refined and optimized later (a
custom allocator will be needed for top-down PASID allocation).

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/Kconfig         |  10 ++
 drivers/iommu/Makefile        |   1 +
 drivers/iommu/iommu-process.c | 225 ++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu.c         |   1 +
 include/linux/iommu.h         |  24 +++++
 5 files changed, 261 insertions(+)
 create mode 100644 drivers/iommu/iommu-process.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f3a21343e636..1ea5c90e37be 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -74,6 +74,16 @@ config IOMMU_DMA
 	select IOMMU_IOVA
 	select NEED_SG_DMA_LENGTH
 
+config IOMMU_PROCESS
+	bool "Process management API for the IOMMU"
+	select IOMMU_API
+	help
+	  Enable process management for the IOMMU API. In systems that support
+	  it, device drivers can bind processes to devices and share their page
+	  tables using this API.
+
+	  If unsure, say N here.
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PCI
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index b910aea813a1..a2832edbfaa2 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
+obj-$(CONFIG_IOMMU_PROCESS) += iommu-process.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
diff --git a/drivers/iommu/iommu-process.c b/drivers/iommu/iommu-process.c
new file mode 100644
index 000000000000..a7e5a1c94305
--- /dev/null
+++ b/drivers/iommu/iommu-process.c
@@ -0,0 +1,225 @@
+/*
+ * Track processes bound to devices
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ *
+ * Copyright (C) 2017 ARM Ltd.
+ *
+ * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
+ */
+
+#include <linux/idr.h>
+#include <linux/iommu.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+/* Link between a domain and a process */
+struct iommu_context {
+	struct iommu_process	*process;
+	struct iommu_domain	*domain;
+
+	struct list_head	process_head;
+	struct list_head	domain_head;
+
+	/* Number of devices that use this context */
+	refcount_t		ref;
+};
+
+/*
+ * Because we're using an IDR, PASIDs are limited to 31 bits (the sign bit is
+ * used for returning errors). In practice implementations will use at most 20
+ * bits, which is the PCI limit.
+ */
+static DEFINE_IDR(iommu_process_idr);
+
+/*
+ * For the moment this is an all-purpose lock. It serializes
+ * access/modifications to contexts (process-domain links), access/modifications
+ * to the PASID IDR, and changes to process refcount as well.
+ */
+static DEFINE_SPINLOCK(iommu_process_lock);
+
+/*
+ * Allocate an iommu_process structure for the given task.
+ *
+ * Ideally we shouldn't need the domain parameter, since iommu_process is
+ * system-wide, but we use it to retrieve the driver's allocation ops and a
+ * PASID range.
+ */
+static struct iommu_process *
+iommu_process_alloc(struct iommu_domain *domain, struct task_struct *task)
+{
+	int err;
+	int pasid;
+	struct iommu_process *process;
+
+	if (WARN_ON(!domain->ops->process_alloc || !domain->ops->process_free))
+		return ERR_PTR(-ENODEV);
+
+	process = domain->ops->process_alloc(task);
+	if (IS_ERR(process))
+		return process;
+	if (!process)
+		return ERR_PTR(-ENOMEM);
+
+	process->pid		= get_task_pid(task, PIDTYPE_PID);
+	process->release	= domain->ops->process_free;
+	INIT_LIST_HEAD(&process->domains);
+	kref_init(&process->kref);
+
+	if (!process->pid) {
+		err = -EINVAL;
+		goto err_free_process;
+	}
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&iommu_process_lock);
+	pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid,
+				 domain->max_pasid + 1, GFP_ATOMIC);
+	process->pasid = pasid;
+	spin_unlock(&iommu_process_lock);
+	idr_preload_end();
+
+	if (pasid < 0) {
+		err = pasid;
+		goto err_put_pid;
+	}
+
+	return process;
+
+err_put_pid:
+	put_pid(process->pid);
+
+err_free_process:
+	domain->ops->process_free(process);
+
+	return ERR_PTR(err);
+}
+
+static void iommu_process_release(struct kref *kref)
+{
+	struct iommu_process *process;
+	void (*release)(struct iommu_process *);
+
+	assert_spin_locked(&iommu_process_lock);
+
+	process = container_of(kref, struct iommu_process, kref);
+	release = process->release;
+
+	WARN_ON(!list_empty(&process->domains));
+
+	idr_remove(&iommu_process_idr, process->pasid);
+	put_pid(process->pid);
+	release(process);
+}
+
+/*
+ * Returns non-zero if a reference to the process was successfully taken.
+ * Returns zero if the process is being freed and should not be used.
+ */
+static int iommu_process_get_locked(struct iommu_process *process)
+{
+	assert_spin_locked(&iommu_process_lock);
+
+	if (process)
+		return kref_get_unless_zero(&process->kref);
+
+	return 0;
+}
+
+static void iommu_process_put_locked(struct iommu_process *process)
+{
+	assert_spin_locked(&iommu_process_lock);
+
+	kref_put(&process->kref, iommu_process_release);
+}
+
+static int iommu_process_attach(struct iommu_domain *domain, struct device *dev,
+				struct iommu_process *process)
+{
+	int err;
+	int pasid = process->pasid;
+	struct iommu_context *context;
+
+	if (WARN_ON(!domain->ops->process_attach || !domain->ops->process_detach))
+		return -ENODEV;
+
+	if (pasid > domain->max_pasid || pasid < domain->min_pasid)
+		return -ENOSPC;
+
+	context = kzalloc(sizeof(*context), GFP_KERNEL);
+	if (!context)
+		return -ENOMEM;
+
+	context->process	= process;
+	context->domain		= domain;
+	refcount_set(&context->ref, 1);
+
+	spin_lock(&iommu_process_lock);
+	err = domain->ops->process_attach(domain, dev, process, true);
+	if (err) {
+		kfree(context);
+		spin_unlock(&iommu_process_lock);
+		return err;
+	}
+
+	list_add(&context->process_head, &process->domains);
+	list_add(&context->domain_head, &domain->processes);
+	spin_unlock(&iommu_process_lock);
+
+	return 0;
+}
+
+static void iommu_context_free(struct iommu_context *context)
+{
+	assert_spin_locked(&iommu_process_lock);
+
+	if (WARN_ON(!context->process || !context->domain))
+		return;
+
+	list_del(&context->process_head);
+	list_del(&context->domain_head);
+	iommu_process_put_locked(context->process);
+
+	kfree(context);
+}
+
+/* Attach an existing context to the device */
+static int iommu_process_attach_locked(struct iommu_context *context,
+				       struct device *dev)
+{
+	assert_spin_locked(&iommu_process_lock);
+
+	refcount_inc(&context->ref);
+	return context->domain->ops->process_attach(context->domain, dev,
+						    context->process, false);
+}
+
+/* Detach device from context and release it if necessary */
+static void iommu_process_detach_locked(struct iommu_context *context,
+					struct device *dev)
+{
+	bool last = false;
+	struct iommu_domain *domain = context->domain;
+
+	assert_spin_locked(&iommu_process_lock);
+
+	if (refcount_dec_and_test(&context->ref))
+		last = true;
+
+	domain->ops->process_detach(domain, dev, context->process, last);
+
+	if (last)
+		iommu_context_free(context);
+}
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3de5c0bcb5cc..b2b34cf7c978 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1264,6 +1264,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
 	domain->type = type;
 	/* Assume all sizes by default; the driver may override this later */
 	domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
+	INIT_LIST_HEAD(&domain->processes);
 
 	return domain;
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 41b8c5757859..3978dc094706 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -94,6 +94,19 @@ struct iommu_domain {
 	void *handler_token;
 	struct iommu_domain_geometry geometry;
 	void *iova_cookie;
+
+	unsigned int min_pasid, max_pasid;
+	struct list_head processes;
+};
+
+struct iommu_process {
+	struct pid		*pid;
+	int			pasid;
+	struct list_head	domains;
+	struct kref		kref;
+
+	/* Release callback for this process */
+	void (*release)(struct iommu_process *process);
 };
 
 enum iommu_cap {
@@ -164,6 +177,11 @@ struct iommu_resv_region {
  * @domain_free: free iommu domain
  * @attach_dev: attach device to an iommu domain
  * @detach_dev: detach device from an iommu domain
+ * @process_alloc: allocate iommu process
+ * @process_free: free iommu process
+ * @process_attach: attach iommu process to a domain
+ * @process_detach: detach iommu process from a domain. Remove PASID entry and
+ *                  flush associated TLB entries.
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @map_sg: map a scatter-gather list of physically contiguous memory chunks
@@ -197,6 +215,12 @@ struct iommu_ops {
 
 	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
 	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
+	struct iommu_process *(*process_alloc)(struct task_struct *task);
+	void (*process_free)(struct iommu_process *process);
+	int (*process_attach)(struct iommu_domain *domain, struct device *dev,
+			      struct iommu_process *process, bool first);
+	void (*process_detach)(struct iommu_domain *domain, struct device *dev,
+			       struct iommu_process *process, bool last);
 	int (*map)(struct iommu_domain *domain, unsigned long iova,
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
-- 
2.13.3

* [RFCv2 PATCH 02/36] iommu: Add a process_exit callback for device drivers
  2017-10-06 13:31 ` Jean-Philippe Brucker
  (?)
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

When a process exits, we need to ensure that devices attached to it stop
issuing transactions with its PASID. Let device drivers register a
callback to be notified on process exit.

At the moment the callback is set on the domain like the fault handler,
because we don't have a structure available for IOMMU masters. This can
become problematic if different devices in a domain are managed by
distinct device drivers (for example multiple devices in the same group).
The problem is the same for the fault handler, so we'll probably fix them
all at once.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu-process.c | 31 +++++++++++++++++++++++++++++++
 include/linux/iommu.h         | 19 +++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/drivers/iommu/iommu-process.c b/drivers/iommu/iommu-process.c
index a7e5a1c94305..61ca0bd707c0 100644
--- a/drivers/iommu/iommu-process.c
+++ b/drivers/iommu/iommu-process.c
@@ -223,3 +223,34 @@ static void iommu_process_detach_locked(struct iommu_context *context,
 	if (last)
 		iommu_context_free(context);
 }
+
+/**
+ * iommu_set_process_exit_handler() - set a callback for stopping the use of
+ * PASID in a device.
+ * @dev: the device
+ * @handler: exit handler
+ * @token: user data, will be passed back to the exit handler
+ *
+ * Users of the bind/unbind API should call this function to set a
+ * device-specific callback telling them when a process is exiting.
+ *
+ * After the callback returns, the device must not issue any more transactions
+ * with the PASIDs given as argument to the handler. It can be a single PASID
+ * value or the special IOMMU_PROCESS_EXIT_ALL.
+ *
+ * The handler itself should return 0 on success, and an appropriate error code
+ * otherwise.
+ */
+void iommu_set_process_exit_handler(struct device *dev,
+				    iommu_process_exit_handler_t handler,
+				    void *token)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	if (WARN_ON(!domain))
+		return;
+
+	domain->process_exit = handler;
+	domain->process_exit_token = token;
+}
+EXPORT_SYMBOL_GPL(iommu_set_process_exit_handler);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 3978dc094706..8d74f9058f30 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -56,6 +56,11 @@ struct notifier_block;
 typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
 			struct device *, unsigned long, int, void *);
 
+/* All processes are being detached from this device */
+#define IOMMU_PROCESS_EXIT_ALL			(-1)
+typedef int (*iommu_process_exit_handler_t)(struct iommu_domain *, struct device *dev,
+					    int pasid, void *);
+
 struct iommu_domain_geometry {
 	dma_addr_t aperture_start; /* First address that can be mapped    */
 	dma_addr_t aperture_end;   /* Last address that can be mapped     */
@@ -92,6 +97,8 @@ struct iommu_domain {
 	unsigned long pgsize_bitmap;	/* Bitmap of page sizes in use */
 	iommu_fault_handler_t handler;
 	void *handler_token;
+	iommu_process_exit_handler_t process_exit;
+	void *process_exit_token;
 	struct iommu_domain_geometry geometry;
 	void *iova_cookie;
 
@@ -722,4 +729,16 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
 
 #endif /* CONFIG_IOMMU_API */
 
+#ifdef CONFIG_IOMMU_PROCESS
+extern void iommu_set_process_exit_handler(struct device *dev,
+					   iommu_process_exit_handler_t cb,
+					   void *token);
+#else /* CONFIG_IOMMU_PROCESS */
+static inline void iommu_set_process_exit_handler(struct device *dev,
+						  iommu_process_exit_handler_t cb,
+						  void *token)
+{
+}
+#endif /* CONFIG_IOMMU_PROCESS */
+
 #endif /* __LINUX_IOMMU_H */
-- 
2.13.3



* [RFCv2 PATCH 03/36] iommu/process: Add public function to search for a process
  2017-10-06 13:31 ` Jean-Philippe Brucker
  (?)
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

The fault handler will need to find a process given its PASID. This is
the reason we have an IDR for storing processes, so hook it up.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu-process.c | 35 +++++++++++++++++++++++++++++++++++
 include/linux/iommu.h         | 12 ++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/drivers/iommu/iommu-process.c b/drivers/iommu/iommu-process.c
index 61ca0bd707c0..8f4c98632d58 100644
--- a/drivers/iommu/iommu-process.c
+++ b/drivers/iommu/iommu-process.c
@@ -145,6 +145,41 @@ static void iommu_process_put_locked(struct iommu_process *process)
 	kref_put(&process->kref, iommu_process_release);
 }
 
+/**
+ * iommu_process_put - Put reference to process, freeing it if necessary.
+ */
+void iommu_process_put(struct iommu_process *process)
+{
+	spin_lock(&iommu_process_lock);
+	iommu_process_put_locked(process);
+	spin_unlock(&iommu_process_lock);
+}
+EXPORT_SYMBOL_GPL(iommu_process_put);
+
+/**
+ * iommu_process_find - Find the process associated with the given PASID
+ *
+ * Returns the IOMMU process corresponding to this PASID, or NULL if not found.
+ * A reference to the iommu_process is kept, and must be released with
+ * iommu_process_put.
+ */
+struct iommu_process *iommu_process_find(int pasid)
+{
+	struct iommu_process *process;
+
+	spin_lock(&iommu_process_lock);
+	process = idr_find(&iommu_process_idr, pasid);
+	if (process) {
+		if (!iommu_process_get_locked(process))
+			/* kref is 0, process is defunct */
+			process = NULL;
+	}
+	spin_unlock(&iommu_process_lock);
+
+	return process;
+}
+EXPORT_SYMBOL_GPL(iommu_process_find);
+
 static int iommu_process_attach(struct iommu_domain *domain, struct device *dev,
 				struct iommu_process *process)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 8d74f9058f30..e9528fcacab1 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -733,12 +733,24 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
 extern void iommu_set_process_exit_handler(struct device *dev,
 					   iommu_process_exit_handler_t cb,
 					   void *token);
+extern struct iommu_process *iommu_process_find(int pasid);
+extern void iommu_process_put(struct iommu_process *process);
+
 #else /* CONFIG_IOMMU_PROCESS */
 static inline void iommu_set_process_exit_handler(struct device *dev,
 						  iommu_process_exit_handler_t cb,
 						  void *token)
 {
 }
+
+static inline struct iommu_process *iommu_process_find(int pasid)
+{
+	return NULL;
+}
+
+static inline void iommu_process_put(struct iommu_process *process)
+{
+}
 #endif /* CONFIG_IOMMU_PROCESS */
 
 #endif /* __LINUX_IOMMU_H */
-- 
2.13.3


* [RFCv2 PATCH 04/36] iommu/process: Track process changes with an mmu_notifier
  2017-10-06 13:31 ` Jean-Philippe Brucker
  (?)
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

When creating an iommu_process structure, register a notifier to be
informed of changes to the virtual address space and to know when the
process exits.

Two new operations are added to the IOMMU driver:

* process_invalidate when a range of addresses is unmapped, to let the
  IOMMU driver send TLB invalidations.

* process_exit when the mm is released. It's a bit more involved in this
  case, as the IOMMU driver has to tell all device drivers to stop using
  this PASID, then clear the PASID table and invalidate TLBs.

Adding the notifier into the mix complicates process release. In one case
device drivers free the process explicitly by calling unbind (or detaching
the device). In the other, the process exits or crashes before unbind is
called, and the release notifier has to do all the work.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu-process.c | 165 ++++++++++++++++++++++++++++++++++++++++--
 include/linux/iommu.h         |  12 +++
 2 files changed, 170 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/iommu-process.c b/drivers/iommu/iommu-process.c
index 8f4c98632d58..1ef3f55b962b 100644
--- a/drivers/iommu/iommu-process.c
+++ b/drivers/iommu/iommu-process.c
@@ -21,9 +21,14 @@
 
 #include <linux/idr.h>
 #include <linux/iommu.h>
+#include <linux/mmu_notifier.h>
 #include <linux/slab.h>
+#include <linux/sched/mm.h>
 #include <linux/spinlock.h>
 
+/* FIXME: stub for the fault queue. Remove later. */
+#define iommu_fault_queue_flush(...)
+
 /* Link between a domain and a process */
 struct iommu_context {
 	struct iommu_process	*process;
@@ -50,6 +55,8 @@ static DEFINE_IDR(iommu_process_idr);
  */
 static DEFINE_SPINLOCK(iommu_process_lock);
 
+static struct mmu_notifier_ops iommu_process_mmu_notfier;
+
 /*
  * Allocate a iommu_process structure for the given task.
  *
@@ -74,15 +81,21 @@ iommu_process_alloc(struct iommu_domain *domain, struct task_struct *task)
 		return ERR_PTR(-ENOMEM);
 
 	process->pid		= get_task_pid(task, PIDTYPE_PID);
+	process->mm		= get_task_mm(task);
+	process->notifier.ops	= &iommu_process_mmu_notfier;
 	process->release	= domain->ops->process_free;
 	INIT_LIST_HEAD(&process->domains);
-	kref_init(&process->kref);
 
 	if (!process->pid) {
 		err = -EINVAL;
 		goto err_free_process;
 	}
 
+	if (!process->mm) {
+		err = -EINVAL;
+		goto err_put_pid;
+	}
+
 	idr_preload(GFP_KERNEL);
 	spin_lock(&iommu_process_lock);
 	pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid,
@@ -93,11 +106,44 @@ iommu_process_alloc(struct iommu_domain *domain, struct task_struct *task)
 
 	if (pasid < 0) {
 		err = pasid;
-		goto err_put_pid;
+		goto err_put_mm;
 	}
 
+	err = mmu_notifier_register(&process->notifier, process->mm);
+	if (err)
+		goto err_free_pasid;
+
+	/*
+	 * Now that the MMU notifier is valid, we can allow users to grab this
+	 * process by setting a valid refcount. Before that it was accessible in
+	 * the IDR but invalid.
+	 *
+	 * Users of the process structure obtain it with inc_not_zero, which
+	 * provides a control dependency to ensure that they don't modify the
+	 * structure if they didn't acquire the ref. So I think we need a write
+	 * barrier here to pair with that control dependency (XXX probably
+	 * nonsense.)
+	 */
+	smp_wmb();
+	kref_init(&process->kref);
+
+	/* A mm_count reference is kept by the notifier */
+	mmput(process->mm);
+
 	return process;
 
+err_free_pasid:
+	/*
+	 * Even if the process is accessible from the IDR at this point, kref is
+	 * 0 so no user could get a reference to it. Free it manually.
+	 */
+	spin_lock(&iommu_process_lock);
+	idr_remove(&iommu_process_idr, process->pasid);
+	spin_unlock(&iommu_process_lock);
+
+err_put_mm:
+	mmput(process->mm);
+
 err_put_pid:
 	put_pid(process->pid);
 
@@ -107,21 +153,46 @@ iommu_process_alloc(struct iommu_domain *domain, struct task_struct *task)
 	return ERR_PTR(err);
 }
 
-static void iommu_process_release(struct kref *kref)
+static void iommu_process_free(struct rcu_head *rcu)
 {
 	struct iommu_process *process;
 	void (*release)(struct iommu_process *);
 
+	process = container_of(rcu, struct iommu_process, rcu);
+	release = process->release;
+
+	release(process);
+}
+
+static void iommu_process_release(struct kref *kref)
+{
+	struct iommu_process *process;
+
 	assert_spin_locked(&iommu_process_lock);
 
 	process = container_of(kref, struct iommu_process, kref);
-	release = process->release;
-
 	WARN_ON(!list_empty(&process->domains));
 
 	idr_remove(&iommu_process_idr, process->pasid);
 	put_pid(process->pid);
-	release(process);
+
+	/*
+	 * If we're being released from process exit, the notifier callback
+	 * ->release has already been called. Otherwise we don't need to go
+	 * through there, the process isn't attached to anything anymore. Hence
+	 * no_release.
+	 */
+	mmu_notifier_unregister_no_release(&process->notifier, process->mm);
+
+	/*
+	 * We can't free the structure here, because ->release might be
+	 * attempting to grab it concurrently. And in the other case, if the
+	 * structure is being released from within ->release, then
+	 * __mmu_notifier_release expects to still have a valid mn when
+	 * returning. So free the structure when it's safe, after the RCU grace
+	 * period elapsed.
+	 */
+	mmu_notifier_call_srcu(&process->rcu, iommu_process_free);
 }
 
 /*
@@ -187,7 +258,8 @@ static int iommu_process_attach(struct iommu_domain *domain, struct device *dev,
 	int pasid = process->pasid;
 	struct iommu_context *context;
 
-	if (WARN_ON(!domain->ops->process_attach || !domain->ops->process_detach))
+	if (WARN_ON(!domain->ops->process_attach || !domain->ops->process_detach ||
+		    !domain->ops->process_exit || !domain->ops->process_invalidate))
 		return -ENODEV;
 
 	if (pasid > domain->max_pasid || pasid < domain->min_pasid)
@@ -259,6 +331,85 @@ static void iommu_process_detach_locked(struct iommu_context *context,
 		iommu_context_free(context);
 }
 
+/*
+ * Called when the process exits. Might race with unbind or any other function
+ * dropping the last reference to the process. As the mmu notifier doesn't hold
+ * any reference to the process when calling ->release, try to take a reference.
+ */
+static void iommu_notifier_release(struct mmu_notifier *mn, struct mm_struct *mm)
+{
+	struct iommu_context *context, *next;
+	struct iommu_process *process = container_of(mn, struct iommu_process, notifier);
+
+	/*
+	 * If the process is exiting then domains are still attached to the
+	 * process. A few things need to be done before it is safe to release
+	 *
+	 * 1) Tell the IOMMU driver to stop using this PASID (and forward the
+	 *    message to attached device drivers). It can then clear the PASID
+	 *    table and invalidate relevant TLBs.
+	 *
+	 * 2) Drop all references to this process, by freeing the contexts.
+	 */
+	spin_lock(&iommu_process_lock);
+	if (!iommu_process_get_locked(process)) {
+		/* Someone's already taking care of it. */
+		spin_unlock(&iommu_process_lock);
+		return;
+	}
+
+	list_for_each_entry_safe(context, next, &process->domains, process_head) {
+		context->domain->ops->process_exit(context->domain, process);
+		iommu_context_free(context);
+	}
+	spin_unlock(&iommu_process_lock);
+
+	iommu_fault_queue_flush(NULL);
+
+	/*
+	 * We're now reasonably certain that no more faults are being handled for
+	 * this process, since we just flushed them all out of the fault queue.
+	 * Release the last reference to free the process.
+	 */
+	iommu_process_put(process);
+}
+
+static void iommu_notifier_invalidate_range(struct mmu_notifier *mn, struct mm_struct *mm,
+					    unsigned long start, unsigned long end)
+{
+	struct iommu_context *context;
+	struct iommu_process *process = container_of(mn, struct iommu_process, notifier);
+
+	spin_lock(&iommu_process_lock);
+	list_for_each_entry(context, &process->domains, process_head) {
+		context->domain->ops->process_invalidate(context->domain,
+						 process, start, end - start);
+	}
+	spin_unlock(&iommu_process_lock);
+}
+
+static int iommu_notifier_clear_flush_young(struct mmu_notifier *mn,
+					    struct mm_struct *mm,
+					    unsigned long start,
+					    unsigned long end)
+{
+	iommu_notifier_invalidate_range(mn, mm, start, end);
+	return 0;
+}
+
+static void iommu_notifier_change_pte(struct mmu_notifier *mn, struct mm_struct *mm,
+				      unsigned long address, pte_t pte)
+{
+	iommu_notifier_invalidate_range(mn, mm, address, address + PAGE_SIZE);
+}
+
+static struct mmu_notifier_ops iommu_process_mmu_notfier = {
+	.release		= iommu_notifier_release,
+	.clear_flush_young	= iommu_notifier_clear_flush_young,
+	.change_pte		= iommu_notifier_change_pte,
+	.invalidate_range	= iommu_notifier_invalidate_range,
+};
+
 /**
  * iommu_set_process_exit_handler() - set a callback for stopping the use of
  * PASID in a device.
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e9528fcacab1..42b818437fa1 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -25,6 +25,7 @@
 #include <linux/errno.h>
 #include <linux/err.h>
 #include <linux/of.h>
+#include <linux/mmu_notifier.h>
 
 #define IOMMU_READ	(1 << 0)
 #define IOMMU_WRITE	(1 << 1)
@@ -111,9 +112,13 @@ struct iommu_process {
 	int			pasid;
 	struct list_head	domains;
 	struct kref		kref;
+	struct mmu_notifier	notifier;
+	struct mm_struct	*mm;
 
 	/* Release callback for this process */
 	void (*release)(struct iommu_process *process);
+	/* For postponed release */
+	struct rcu_head		rcu;
 };
 
 enum iommu_cap {
@@ -189,6 +194,9 @@ struct iommu_resv_region {
  * @process_attach: attach iommu process to a domain
  * @process_detach: detach iommu process from a domain. Remove PASID entry and
  *                  flush associated TLB entries.
+ * @process_invalidate: Invalidate a range of mappings for a process.
+ * @process_exit: A process is exiting. Stop using the PASID, remove PASID entry
+ *                and flush associated TLB entries.
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @map_sg: map a scatter-gather list of physically contiguous memory chunks
@@ -228,6 +236,10 @@ struct iommu_ops {
 			      struct iommu_process *process, bool first);
 	void (*process_detach)(struct iommu_domain *domain, struct device *dev,
 			       struct iommu_process *process, bool last);
+	void (*process_invalidate)(struct iommu_domain *domain,
+				   struct iommu_process *process,
+				   unsigned long iova, size_t size);
+	void (*process_exit)(struct iommu_domain *domain, struct iommu_process *process);
 	int (*map)(struct iommu_domain *domain, unsigned long iova,
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 05/36] iommu/process: Bind and unbind process to and from devices
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

Add bind and unbind operations to the IOMMU API. Device drivers can use
them to share process page tables with their device.
iommu_process_bind_group is provided for VFIO's convenience, since it needs
to present a coherent interface for containers. Device drivers will most
likely want to use iommu_process_bind_device, which doesn't bind the whole
group.

PASIDs are de facto shared between all devices in a group (because of
hardware weaknesses), but we don't do anything about it at the API level.
Making bind_device call bind_group is probably the wrong way around,
because it requires more work on our side for no benefit. We'd have to
replay all binds each time a device is hotplugged into a group. But when a
device is hotplugged into a group, the device driver will have to do a
bind before using its PASID anyway and we can reject inconsistencies at
that point.

Concurrent calls to iommu_process_bind_device for the same process are not
supported at the moment (they'll race on process_alloc, which will only
succeed for the first caller; the others will have to retry the bind). I also
don't support calling bind() on a dying process; I'm not sure whether it matters.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu-process.c | 165 ++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu.c         |  64 ++++++++++++++++
 include/linux/iommu.h         |  41 +++++++++++
 3 files changed, 270 insertions(+)

diff --git a/drivers/iommu/iommu-process.c b/drivers/iommu/iommu-process.c
index 1ef3f55b962b..dee7691e3791 100644
--- a/drivers/iommu/iommu-process.c
+++ b/drivers/iommu/iommu-process.c
@@ -411,6 +411,171 @@ static struct mmu_notifier_ops iommu_process_mmu_notifier = {
 };
 
 /**
+ * iommu_process_bind_device - Bind a process address space to a device
+ * @dev: the device
+ * @task: the process to bind
+ * @pasid: valid address where the PASID will be stored
+ * @flags: bond properties (IOMMU_PROCESS_BIND_*)
+ *
+ * Create a bond between device and task, allowing the device to access the
+ * process address space using the returned PASID.
+ *
+ * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
+ * is returned.
+ */
+int iommu_process_bind_device(struct device *dev, struct task_struct *task,
+			      int *pasid, int flags)
+{
+	int err, i;
+	int nesting;
+	struct pid *pid;
+	struct iommu_domain *domain;
+	struct iommu_process *process;
+	struct iommu_context *cur_context;
+	struct iommu_context *context = NULL;
+
+	domain = iommu_get_domain_for_dev(dev);
+	if (WARN_ON(!domain))
+		return -EINVAL;
+
+	if (!iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &nesting) &&
+	    nesting)
+		return -EINVAL;
+
+	pid = get_task_pid(task, PIDTYPE_PID);
+	if (!pid)
+		return -EINVAL;
+
+	/* If an iommu_process already exists, use it */
+	spin_lock(&iommu_process_lock);
+	idr_for_each_entry(&iommu_process_idr, process, i) {
+		if (process->pid != pid)
+			continue;
+
+		if (!iommu_process_get_locked(process)) {
+			/* Process is defunct, create a new one */
+			process = NULL;
+			break;
+		}
+
+		/* Great, is it also bound to this domain? */
+		list_for_each_entry(cur_context, &process->domains,
+				    process_head) {
+			if (cur_context->domain != domain)
+				continue;
+
+			context = cur_context;
+			*pasid = process->pasid;
+
+			/* Splendid, tell the driver and increase the ref */
+			err = iommu_process_attach_locked(context, dev);
+			if (err)
+				iommu_process_put_locked(process);
+
+			break;
+		}
+		break;
+	}
+	spin_unlock(&iommu_process_lock);
+	put_pid(pid);
+
+	if (context)
+		return err;
+
+	if (!process) {
+		process = iommu_process_alloc(domain, task);
+		if (IS_ERR(process))
+			return PTR_ERR(process);
+	}
+
+	err = iommu_process_attach(domain, dev, process);
+	if (err) {
+		iommu_process_put(process);
+		return err;
+	}
+
+	*pasid = process->pasid;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_process_bind_device);
+
+/**
+ * iommu_process_unbind_device - Remove a bond created with
+ * iommu_process_bind_device.
+ *
+ * @dev: the device
+ * @pasid: the pasid returned by bind
+ */
+int iommu_process_unbind_device(struct device *dev, int pasid)
+{
+	struct iommu_domain *domain;
+	struct iommu_process *process;
+	struct iommu_context *cur_context;
+	struct iommu_context *context = NULL;
+
+	domain = iommu_get_domain_for_dev(dev);
+	if (WARN_ON(!domain))
+		return -EINVAL;
+
+	/*
+	 * Caller stopped the device from issuing PASIDs, now make sure they are
+	 * out of the fault queue.
+	 */
+	iommu_fault_queue_flush(dev);
+
+	spin_lock(&iommu_process_lock);
+	process = idr_find(&iommu_process_idr, pasid);
+	if (!process) {
+		spin_unlock(&iommu_process_lock);
+		return -ESRCH;
+	}
+
+	list_for_each_entry(cur_context, &process->domains, process_head) {
+		if (cur_context->domain == domain) {
+			context = cur_context;
+			break;
+		}
+	}
+
+	if (context)
+		iommu_process_detach_locked(context, dev);
+	spin_unlock(&iommu_process_lock);
+
+	return context ? 0 : -ESRCH;
+}
+EXPORT_SYMBOL_GPL(iommu_process_unbind_device);
+
+/**
+ * __iommu_process_unbind_dev_all - Detach all processes attached to this
+ * device.
+ *
+ * When detaching @dev from @domain, IOMMU drivers have to use this function.
+ */
+void __iommu_process_unbind_dev_all(struct iommu_domain *domain, struct device *dev)
+{
+	struct iommu_context *context, *next;
+
+	/* Ask device driver to stop using all PASIDs */
+	spin_lock(&iommu_process_lock);
+	if (domain->process_exit) {
+		list_for_each_entry(context, &domain->processes, domain_head)
+			domain->process_exit(domain, dev,
+					     context->process->pasid,
+					     domain->process_exit_token);
+	}
+	spin_unlock(&iommu_process_lock);
+
+	iommu_fault_queue_flush(dev);
+
+	spin_lock(&iommu_process_lock);
+	list_for_each_entry_safe(context, next, &domain->processes, domain_head)
+		iommu_process_detach_locked(context, dev);
+	spin_unlock(&iommu_process_lock);
+}
+EXPORT_SYMBOL_GPL(__iommu_process_unbind_dev_all);
+
+/**
  * iommu_set_process_exit_handler() - set a callback for stopping the use of
  * PASID in a device.
  * @dev: the device
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b2b34cf7c978..f9cb89dd28f5 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1460,6 +1460,70 @@ void iommu_detach_group(struct iommu_domain *domain, struct iommu_group *group)
 }
 EXPORT_SYMBOL_GPL(iommu_detach_group);
 
+/**
+ * iommu_process_bind_group - Share process address space with all devices in
+ * the group.
+ * @group: the iommu group
+ * @task: the process to bind
+ * @pasid: valid address where the PASID will be stored
+ * @flags: bond properties (IOMMU_PROCESS_BIND_*)
+ *
+ * Create a bond between group and process, allowing devices in the group to
+ * access the process address space using @pasid.
+ *
+ * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
+ * is returned.
+ */
+int iommu_process_bind_group(struct iommu_group *group,
+			     struct task_struct *task, int *pasid, int flags)
+{
+	struct group_device *device;
+	int ret = -ENODEV;
+
+	if (!pasid)
+		return -EINVAL;
+
+	if (!group->domain)
+		return -EINVAL;
+
+	mutex_lock(&group->mutex);
+	list_for_each_entry(device, &group->devices, list) {
+		ret = iommu_process_bind_device(device->dev, task, pasid,
+						flags);
+		if (ret)
+			break;
+	}
+
+	if (ret) {
+		list_for_each_entry_continue_reverse(device, &group->devices, list)
+			iommu_process_unbind_device(device->dev, *pasid);
+	}
+	mutex_unlock(&group->mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_process_bind_group);
+
+/**
+ * iommu_process_unbind_group - Remove a bond created with
+ * iommu_process_bind_group
+ *
+ * @group: the group
+ * @pasid: the pasid returned by bind
+ */
+int iommu_process_unbind_group(struct iommu_group *group, int pasid)
+{
+	struct group_device *device;
+
+	mutex_lock(&group->mutex);
+	list_for_each_entry(device, &group->devices, list)
+		iommu_process_unbind_device(device->dev, pasid);
+	mutex_unlock(&group->mutex);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_process_unbind_group);
+
 phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
 {
 	if (unlikely(domain->ops->iova_to_phys == NULL))
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 42b818437fa1..e64c2711ea8d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -454,6 +454,10 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
 void iommu_fwspec_free(struct device *dev);
 int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
 const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
+extern int iommu_process_bind_group(struct iommu_group *group,
+				    struct task_struct *task, int *pasid,
+				    int flags);
+extern int iommu_process_unbind_group(struct iommu_group *group, int pasid);
 
 #else /* CONFIG_IOMMU_API */
 
@@ -739,6 +743,19 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
 	return NULL;
 }
 
+static inline int iommu_process_bind_group(struct iommu_group *group,
+					   struct task_struct *task, int *pasid,
+					   int flags)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_process_unbind_group(struct iommu_group *group,
+					     int pasid)
+{
+	return -ENODEV;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #ifdef CONFIG_IOMMU_PROCESS
@@ -747,6 +764,12 @@ extern void iommu_set_process_exit_handler(struct device *dev,
 					   void *token);
 extern struct iommu_process *iommu_process_find(int pasid);
 extern void iommu_process_put(struct iommu_process *process);
+extern int iommu_process_bind_device(struct device *dev,
+				     struct task_struct *task, int *pasid,
+				     int flags);
+extern int iommu_process_unbind_device(struct device *dev, int pasid);
+extern void __iommu_process_unbind_dev_all(struct iommu_domain *domain,
+					   struct device *dev);
 
 #else /* CONFIG_IOMMU_PROCESS */
 static inline void iommu_set_process_exit_handler(struct device *dev,
@@ -763,6 +786,24 @@ static inline struct iommu_process *iommu_process_find(int pasid)
 static inline void iommu_process_put(struct iommu_process *process)
 {
 }
+
+static inline int iommu_process_bind_device(struct device *dev,
+					    struct task_struct *task,
+					    int *pasid, int flags)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_process_unbind_device(struct device *dev, int pasid)
+{
+	return -ENODEV;
+}
+
+static inline void __iommu_process_unbind_dev_all(struct iommu_domain *domain,
+						  struct device *dev)
+{
+}
+
 #endif /* CONFIG_IOMMU_PROCESS */
 
 #endif /* __LINUX_IOMMU_H */
-- 
2.13.3


 
@@ -739,6 +743,19 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
 	return NULL;
 }
 
+static inline int iommu_process_bind_group(struct iommu_group *group,
+					   struct task_struct *task, int *pasid,
+					   int flags)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_process_unbind_group(struct iommu_group *group,
+					     int pasid)
+{
+	return -ENODEV;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #ifdef CONFIG_IOMMU_PROCESS
@@ -747,6 +764,12 @@ extern void iommu_set_process_exit_handler(struct device *dev,
 					   void *token);
 extern struct iommu_process *iommu_process_find(int pasid);
 extern void iommu_process_put(struct iommu_process *process);
+extern int iommu_process_bind_device(struct device *dev,
+				     struct task_struct *task, int *pasid,
+				     int flags);
+extern int iommu_process_unbind_device(struct device *dev, int pasid);
+extern void __iommu_process_unbind_dev_all(struct iommu_domain *domain,
+					   struct device *dev);
 
 #else /* CONFIG_IOMMU_PROCESS */
 static inline void iommu_set_process_exit_handler(struct device *dev,
@@ -763,6 +786,24 @@ static inline struct iommu_process *iommu_process_find(int pasid)
 static inline void iommu_process_put(struct iommu_process *process)
 {
 }
+
+static inline int iommu_process_bind_device(struct device *dev,
+					    struct task_struct *task,
+					    int *pasid, int flags)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_process_unbind_device(struct device *dev, int pasid)
+{
+	return -ENODEV;
+}
+
+static inline void __iommu_process_unbind_dev_all(struct iommu_domain *domain,
+						  struct device *dev)
+{
+}
+
 #endif /* CONFIG_IOMMU_PROCESS */
 
 #endif /* __LINUX_IOMMU_H */
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 06/36] iommu: Extend fault reporting
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: mark.rutland, gabriele.paoloni, catalin.marinas, will.deacon,
	okaya, rfranz, lenb, robh+dt, bhelgaas, dwmw2, rjw, sudeep.holla

A number of new users will need additional information in the IOMMU fault
report, such as PASID and/or PRI group. Pass a new iommu_fault structure
to the driver callbacks.

For the moment add the new API in parallel, with an "ext" prefix, to let
users move to the new API at their own pace. I think it would be nice to
end up with a single API though: only four device drivers use the old one,
and receiving an iommu_fault instead of iova/flags wouldn't hurt them
much.

For the same reason as the process_exit handler, set_fault_handler is done
on a device rather than a domain (although for the moment the handler is
stored in the domain). Even when multiple heterogeneous devices are in the
same IOMMU group, each of their drivers might want to register a fault
handler. At the moment they race to set the handler, and the winning
driver receives fault reports from the other devices as well.

The new registering function also takes a flags argument, giving future
users a way to specify at which point of the fault process they want to be
called.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h | 18 ++++++++++++++++++
 2 files changed, 60 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f9cb89dd28f5..ee956b5fc301 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1234,6 +1234,8 @@ EXPORT_SYMBOL_GPL(iommu_capable);
  * This function should be used by IOMMU users which want to be notified
  * whenever an IOMMU fault happens.
  *
+ * Note that new users should use iommu_set_ext_fault_handler instead.
+ *
  * The fault handler itself should return 0 on success, and an appropriate
  * error code otherwise.
  */
@@ -1243,11 +1245,44 @@ void iommu_set_fault_handler(struct iommu_domain *domain,
 {
 	BUG_ON(!domain);
 
+	if (WARN_ON(domain->ext_handler))
+		return;
+
 	domain->handler = handler;
 	domain->handler_token = token;
 }
 EXPORT_SYMBOL_GPL(iommu_set_fault_handler);
 
+/**
+ * iommu_set_ext_fault_handler() - set a fault handler for a device
+ * @dev: the device
+ * @handler: fault handler
+ * @token: user data, will be passed back to the fault handler
+ * @flags: IOMMU_FAULT_HANDLER_* parameters.
+ *
+ * This function should be used by IOMMU users which want to be notified
+ * whenever an IOMMU fault happens.
+ *
+ * The fault handler itself should return 0 on success, and an appropriate
+ * error code otherwise.
+ */
+void iommu_set_ext_fault_handler(struct device *dev,
+				 iommu_ext_fault_handler_t handler,
+				 void *token, int flags)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	if (WARN_ON(!domain))
+		return;
+
+	if (WARN_ON(domain->handler || domain->ext_handler))
+		return;
+
+	domain->ext_handler = handler;
+	domain->handler_token = token;
+}
+EXPORT_SYMBOL_GPL(iommu_set_ext_fault_handler);
+
 static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
 						 unsigned type)
 {
@@ -1787,6 +1822,10 @@ int report_iommu_fault(struct iommu_domain *domain, struct device *dev,
 		       unsigned long iova, int flags)
 {
 	int ret = -ENOSYS;
+	struct iommu_fault fault = {
+		.address	= iova,
+		.flags		= flags,
+	};
 
 	/*
 	 * if upper layers showed interest and installed a fault handler,
@@ -1795,6 +1834,9 @@ int report_iommu_fault(struct iommu_domain *domain, struct device *dev,
 	if (domain->handler)
 		ret = domain->handler(domain, dev, iova, flags,
 						domain->handler_token);
+	else if (domain->ext_handler)
+		ret = domain->ext_handler(domain, dev, &fault,
+					  domain->handler_token);
 
 	trace_io_page_fault(dev, iova, flags);
 	return ret;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e64c2711ea8d..ea4eaf585eb4 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -57,6 +57,14 @@ struct notifier_block;
 typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
 			struct device *, unsigned long, int, void *);
 
+struct iommu_fault {
+	unsigned long		address;
+	unsigned int		flags;
+};
+
+typedef int (*iommu_ext_fault_handler_t)(struct iommu_domain *, struct device *,
+					 struct iommu_fault *, void *);
+
 /* All process are being detached from this device */
 #define IOMMU_PROCESS_EXIT_ALL			(-1)
 typedef int (*iommu_process_exit_handler_t)(struct iommu_domain *, struct device *dev,
@@ -97,6 +105,7 @@ struct iommu_domain {
 	const struct iommu_ops *ops;
 	unsigned long pgsize_bitmap;	/* Bitmap of page sizes in use */
 	iommu_fault_handler_t handler;
+	iommu_ext_fault_handler_t ext_handler;
 	void *handler_token;
 	iommu_process_exit_handler_t process_exit;
 	void *process_exit_token;
@@ -352,6 +361,9 @@ extern size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long io
 extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova);
 extern void iommu_set_fault_handler(struct iommu_domain *domain,
 			iommu_fault_handler_t handler, void *token);
+extern void iommu_set_ext_fault_handler(struct device *dev,
+					iommu_ext_fault_handler_t handler,
+					void *token, int flags);
 
 extern void iommu_get_resv_regions(struct device *dev, struct list_head *list);
 extern void iommu_put_resv_regions(struct device *dev, struct list_head *list);
@@ -566,6 +578,12 @@ static inline void iommu_set_fault_handler(struct iommu_domain *domain,
 {
 }
 
+static inline void iommu_set_ext_fault_handler(struct device *dev,
+				 iommu_ext_fault_handler_t handler, void *token,
+				 int flags)
+{
+}
+
 static inline void iommu_get_resv_regions(struct device *dev,
 					struct list_head *list)
 {
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread


* [RFCv2 PATCH 07/36] iommu: Add a fault handler
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: mark.rutland, gabriele.paoloni, catalin.marinas, will.deacon,
	okaya, rfranz, lenb, robh+dt, bhelgaas, dwmw2, rjw, sudeep.holla

Some systems allow devices to do paging, for example those supporting
PCI's PRI extension or the ARM SMMU's stall model. As more IOMMU drivers are
adding support for page faults, we see a number of patterns that are
common to all implementations. Let's try to unify some of the generic
code.

Add boilerplate code to handle device page requests:

* IOMMU drivers instantiate a fault workqueue if necessary, using
  iommu_fault_queue_init and iommu_fault_queue_destroy.

* When it receives a fault report, typically in an IRQ handler, the IOMMU
  driver reports the fault using handle_iommu_fault (as opposed to the
  current report_iommu_fault).

* Then depending on the domain configuration, we either immediately
  forward it to a device driver, or submit it to the fault queue, to be
  handled in a thread.

* When the fault corresponds to a process context, call the mm fault
  handler on it (in the next patch).

* Once the fault is handled, it is completed. This is either done
  automatically by the mm wrapper, or manually by a device driver (e.g.
  VFIO).

A new operation, fault_response, is added to IOMMU drivers. It takes the
same fault context passed to handle_iommu_fault and a status, allowing the
driver to complete the fault, for instance by sending a PRG Response in
PCI PRI.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/Kconfig         |   9 ++
 drivers/iommu/Makefile        |   1 +
 drivers/iommu/io-pgfault.c    | 330 ++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu-process.c |   3 -
 include/linux/iommu.h         | 102 ++++++++++++-
 5 files changed, 440 insertions(+), 5 deletions(-)
 create mode 100644 drivers/iommu/io-pgfault.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 1ea5c90e37be..a34d268d8ed3 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -84,6 +84,15 @@ config IOMMU_PROCESS
 
 	  If unsure, say N here.
 
+config IOMMU_FAULT
+	bool "Fault handler for the IOMMU API"
+	select IOMMU_API
+	help
+	  Enable the generic fault handler for the IOMMU API, which handles
+	  recoverable page faults or injects them into guests.
+
+	  If unsure, say N here.
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PCI
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index a2832edbfaa2..c34cbea482f0 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -2,6 +2,7 @@ obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_PROCESS) += iommu-process.o
+obj-$(CONFIG_IOMMU_FAULT) += io-pgfault.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
new file mode 100644
index 000000000000..f31bc24534b0
--- /dev/null
+++ b/drivers/iommu/io-pgfault.c
@@ -0,0 +1,330 @@
+/*
+ * Handle device page faults
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ *
+ * Copyright (C) 2017 ARM Ltd.
+ *
+ * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
+ */
+
+#include <linux/iommu.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+
+static struct workqueue_struct *iommu_fault_queue;
+static DECLARE_RWSEM(iommu_fault_queue_sem);
+static refcount_t iommu_fault_queue_refs = REFCOUNT_INIT(0);
+static BLOCKING_NOTIFIER_HEAD(iommu_fault_queue_flush_notifiers);
+
+/* Used to store incomplete fault groups */
+static LIST_HEAD(iommu_partial_faults);
+static DEFINE_SPINLOCK(iommu_partial_faults_lock);
+
+struct iommu_fault_context {
+	struct iommu_domain	*domain;
+	struct device		*dev;
+	struct iommu_fault	params;
+	struct list_head	head;
+};
+
+struct iommu_fault_group {
+	struct list_head	faults;
+	struct work_struct	work;
+};
+
+/*
+ * iommu_fault_finish - Finish handling a fault
+ *
+ * Send a response if necessary and pass on the sanitized status code
+ */
+static int iommu_fault_finish(struct iommu_domain *domain, struct device *dev,
+			      struct iommu_fault *fault, int status)
+{
+	/*
+	 * There is no "handling" an unrecoverable fault, so the only valid
+	 * return values are 0 or an error.
+	 */
+	if (!(fault->flags & IOMMU_FAULT_RECOVERABLE))
+		return status > 0 ? 0 : status;
+
+	/* Device driver took ownership of the fault and will complete it later */
+	if (status == IOMMU_FAULT_STATUS_IGNORE)
+		return 0;
+
+	/*
+	 * There was an internal error with handling the recoverable fault (e.g.
+	 * OOM or no handler). Try to complete the fault if possible.
+	 */
+	if (status <= 0)
+		status = IOMMU_FAULT_STATUS_INVALID;
+
+	if (WARN_ON(!domain->ops->fault_response))
+		/*
+		 * The IOMMU driver shouldn't have submitted recoverable faults
+		 * if it cannot receive a response.
+		 */
+		return -EINVAL;
+
+	return domain->ops->fault_response(domain, dev, fault, status);
+}
+
+static int iommu_fault_handle_single(struct iommu_fault_context *fault)
+{
+	/* TODO */
+	return -ENODEV;
+}
+
+static void iommu_fault_handle_group(struct work_struct *work)
+{
+	struct iommu_fault_group *group;
+	struct iommu_fault_context *fault, *next;
+	int status = IOMMU_FAULT_STATUS_HANDLED;
+
+	group = container_of(work, struct iommu_fault_group, work);
+
+	list_for_each_entry_safe(fault, next, &group->faults, head) {
+		struct iommu_fault *params = &fault->params;
+		/*
+		 * Errors are sticky: don't handle subsequent faults in the
+		 * group if there is an error.
+		 */
+		if (status == IOMMU_FAULT_STATUS_HANDLED)
+			status = iommu_fault_handle_single(fault);
+
+		if (params->flags & IOMMU_FAULT_LAST ||
+		    !(params->flags & IOMMU_FAULT_GROUP)) {
+			iommu_fault_finish(fault->domain, fault->dev,
+					   &fault->params, status);
+		}
+
+		kfree(fault);
+	}
+
+	kfree(group);
+}
+
+static int iommu_queue_fault(struct iommu_domain *domain, struct device *dev,
+			     struct iommu_fault *params)
+{
+	struct iommu_fault_group *group;
+	struct iommu_fault_context *fault = kzalloc(sizeof(*fault), GFP_KERNEL);
+
+	/*
+	 * FIXME There is a race here, with queue_register. The last IOMMU
+	 * driver has to ensure no fault is reported anymore before
+	 * unregistering, so that doesn't matter. But you could have an IOMMU
+	 * device that didn't register to the fault queue and is still reporting
+	 * faults while the last queue user disappears. It really shouldn't get
+	 * here, but it currently does if there is a blocking handler.
+	 */
+	if (!iommu_fault_queue)
+		return -ENOSYS;
+
+	if (!fault)
+		return -ENOMEM;
+
+	fault->dev = dev;
+	fault->domain = domain;
+	fault->params = *params;
+
+	if ((params->flags & IOMMU_FAULT_LAST) || !(params->flags & IOMMU_FAULT_GROUP)) {
+		group = kzalloc(sizeof(*group), GFP_KERNEL);
+		if (!group) {
+			kfree(fault);
+			return -ENOMEM;
+		}
+
+		INIT_LIST_HEAD(&group->faults);
+		list_add(&fault->head, &group->faults);
+		INIT_WORK(&group->work, iommu_fault_handle_group);
+	} else {
+		/* Non-last request of a group. Postpone until the last one */
+		spin_lock(&iommu_partial_faults_lock);
+		list_add(&fault->head, &iommu_partial_faults);
+		spin_unlock(&iommu_partial_faults_lock);
+
+		return IOMMU_FAULT_STATUS_IGNORE;
+	}
+
+	if (params->flags & IOMMU_FAULT_GROUP) {
+		struct iommu_fault_context *cur, *next;
+
+		/* See if we have pending faults for this group */
+		spin_lock(&iommu_partial_faults_lock);
+		list_for_each_entry_safe(cur, next, &iommu_partial_faults, head) {
+			if (cur->params.id == params->id && cur->dev == dev) {
+				list_del(&cur->head);
+				/* Insert *before* the last fault */
+				list_add(&cur->head, &group->faults);
+			}
+		}
+		spin_unlock(&iommu_partial_faults_lock);
+	}
+
+	queue_work(iommu_fault_queue, &group->work);
+
+	/* Postpone the fault completion */
+	return IOMMU_FAULT_STATUS_IGNORE;
+}
+
+/**
+ * handle_iommu_fault - Handle fault in device driver or mm
+ *
+ * If the device driver expressed interest in handling faults, report it through
+ * the domain handler. If the fault is recoverable, try to page in the address.
+ */
+int handle_iommu_fault(struct iommu_domain *domain, struct device *dev,
+		       struct iommu_fault *fault)
+{
+	int ret = -ENOSYS;
+
+	/*
+	 * if upper layers showed interest and installed a fault handler,
+	 * invoke it.
+	 */
+	if (domain->ext_handler) {
+		ret = domain->ext_handler(domain, dev, fault,
+					  domain->handler_token);
+
+		if (ret != IOMMU_FAULT_STATUS_NONE)
+			return iommu_fault_finish(domain, dev, fault, ret);
+	} else if (domain->handler && !(fault->flags &
+		   (IOMMU_FAULT_RECOVERABLE | IOMMU_FAULT_PASID))) {
+		/* Fall back to the old method if possible */
+		ret = domain->handler(domain, dev, fault->address,
+				      fault->flags, domain->handler_token);
+		if (ret)
+			return ret;
+	}
+
+	/* If the handler is blocking, handle fault in the workqueue */
+	if (fault->flags & IOMMU_FAULT_RECOVERABLE)
+		ret = iommu_queue_fault(domain, dev, fault);
+
+	return iommu_fault_finish(domain, dev, fault, ret);
+}
+EXPORT_SYMBOL_GPL(handle_iommu_fault);
+
+/**
+ * iommu_fault_response - Complete a recoverable fault
+ * @domain: iommu domain passed to the handler
+ * @dev: device passed to the handler
+ * @fault: fault passed to the handler
+ * @status: action to perform
+ *
+ * An atomic handler that took ownership of the fault (by returning
+ * IOMMU_FAULT_STATUS_IGNORE) must complete the fault by calling this function.
+ */
+int iommu_fault_response(struct iommu_domain *domain, struct device *dev,
+			 struct iommu_fault *fault, enum iommu_fault_status status)
+{
+	/* No response is needed for unrecoverable faults */
+	if (!(fault->flags & IOMMU_FAULT_RECOVERABLE))
+		return -EINVAL;
+
+	/* Ignore is certainly the wrong thing to do at this point */
+	if (WARN_ON(status == IOMMU_FAULT_STATUS_IGNORE ||
+		    status == IOMMU_FAULT_STATUS_NONE))
+		status = IOMMU_FAULT_STATUS_INVALID;
+
+	return iommu_fault_finish(domain, dev, fault, status);
+}
+EXPORT_SYMBOL_GPL(iommu_fault_response);
+
+/**
+ * iommu_fault_queue_register - register an IOMMU driver to the global fault
+ * queue
+ *
+ * @flush_notifier: a notifier block that is called before the fault queue is
+ * flushed. The IOMMU driver should commit all faults that are pending in its
+ * low-level queues at the time of the call, into the fault queue. The notifier
+ * takes a device pointer as argument, hinting at which endpoint is causing the
+ * flush. When the device is NULL, all faults should be committed.
+ */
+int iommu_fault_queue_register(struct notifier_block *flush_notifier)
+{
+	/*
+	 * The WQ is unordered because the low-level handler enqueues faults by
+	 * group. PRI requests within a group have to be ordered, but once
+	 * that's dealt with, the high-level function can handle groups out of
+	 * order.
+	 */
+	down_write(&iommu_fault_queue_sem);
+	if (!iommu_fault_queue) {
+		iommu_fault_queue = alloc_workqueue("iommu_fault_queue",
+						    WQ_UNBOUND, 0);
+		if (iommu_fault_queue)
+			refcount_set(&iommu_fault_queue_refs, 1);
+	} else {
+		refcount_inc(&iommu_fault_queue_refs);
+	}
+	up_write(&iommu_fault_queue_sem);
+
+	if (!iommu_fault_queue)
+		return -ENOMEM;
+
+	if (flush_notifier)
+		blocking_notifier_chain_register(&iommu_fault_queue_flush_notifiers,
+						 flush_notifier);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_fault_queue_register);
+
+/**
+ * iommu_fault_queue_flush - Ensure that all queued faults have been processed.
+ * @dev: the endpoint whose faults need to be flushed. If NULL, flush all
+ *       pending faults.
+ *
+ * Users must call this function when releasing a PASID, to ensure that all
+ * pending faults affecting this PASID have been handled, and won't affect the
+ * address space of a subsequent process that reuses this PASID.
+ */
+void iommu_fault_queue_flush(struct device *dev)
+{
+	blocking_notifier_call_chain(&iommu_fault_queue_flush_notifiers, 0, dev);
+
+	down_read(&iommu_fault_queue_sem);
+	/*
+	 * Don't flush the partial faults list. All PRGs with the PASID are
+	 * complete and have been submitted to the queue.
+	 */
+	if (iommu_fault_queue)
+		flush_workqueue(iommu_fault_queue);
+	up_read(&iommu_fault_queue_sem);
+}
+EXPORT_SYMBOL_GPL(iommu_fault_queue_flush);
+
+/**
+ * iommu_fault_queue_unregister - Unregister an IOMMU driver from the global
+ * fault queue.
+ *
+ * @flush_notifier: same parameter as iommu_fault_queue_register
+ */
+void iommu_fault_queue_unregister(struct notifier_block *flush_notifier)
+{
+	down_write(&iommu_fault_queue_sem);
+	if (refcount_dec_and_test(&iommu_fault_queue_refs)) {
+		destroy_workqueue(iommu_fault_queue);
+		iommu_fault_queue = NULL;
+	}
+	up_write(&iommu_fault_queue_sem);
+
+	if (flush_notifier)
+		blocking_notifier_chain_unregister(&iommu_fault_queue_flush_notifiers,
+						   flush_notifier);
+}
+EXPORT_SYMBOL_GPL(iommu_fault_queue_unregister);
diff --git a/drivers/iommu/iommu-process.c b/drivers/iommu/iommu-process.c
index dee7691e3791..092240708b78 100644
--- a/drivers/iommu/iommu-process.c
+++ b/drivers/iommu/iommu-process.c
@@ -26,9 +26,6 @@
 #include <linux/sched/mm.h>
 #include <linux/spinlock.h>
 
-/* FIXME: stub for the fault queue. Remove later. */
-#define iommu_fault_queue_flush(...)
-
 /* Link between a domain and a process */
 struct iommu_context {
 	struct iommu_process	*process;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ea4eaf585eb4..37fafaf07ee2 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -51,15 +51,69 @@ struct iommu_domain;
 struct notifier_block;
 
 /* iommu fault flags */
-#define IOMMU_FAULT_READ	0x0
-#define IOMMU_FAULT_WRITE	0x1
+#define IOMMU_FAULT_READ		(1 << 0)
+#define IOMMU_FAULT_WRITE		(1 << 1)
+#define IOMMU_FAULT_EXEC		(1 << 2)
+#define IOMMU_FAULT_PRIV		(1 << 3)
+/*
+ * If a fault is recoverable, then it *must* be completed, once handled, with
+ * iommu_fault_response.
+ */
+#define IOMMU_FAULT_RECOVERABLE		(1 << 4)
+/* The PASID field is valid */
+#define IOMMU_FAULT_PASID		(1 << 5)
+/* Fault is part of a group (PCI PRG) */
+#define IOMMU_FAULT_GROUP		(1 << 6)
+/* Fault is last of its group */
+#define IOMMU_FAULT_LAST		(1 << 7)
+
+/**
+ * enum iommu_fault_status - Return status of fault handlers, telling the IOMMU
+ *	driver how to proceed with the fault.
+ *
+ * @IOMMU_FAULT_STATUS_NONE: Fault was not handled. Call the next handler, or
+ *	terminate.
+ * @IOMMU_FAULT_STATUS_FAILURE: General error. Drop all subsequent faults from
+ *	this device if possible. This is "Response Failure" in PCI PRI.
+ * @IOMMU_FAULT_STATUS_INVALID: Could not handle this fault, don't retry the
+ *	access. This is "Invalid Request" in PCI PRI.
+ * @IOMMU_FAULT_STATUS_HANDLED: Fault has been handled and the page tables
+ *	populated, retry the access.
+ * @IOMMU_FAULT_STATUS_IGNORE: Stop processing the fault, and do not send a
+ *	reply to the device.
+ *
+ * For unrecoverable faults, the only valid status is IOMMU_FAULT_STATUS_NONE.
+ * For a recoverable fault, if no one handled the fault, treat as
+ * IOMMU_FAULT_STATUS_INVALID.
+ */
+enum iommu_fault_status {
+	IOMMU_FAULT_STATUS_NONE = 0,
+	IOMMU_FAULT_STATUS_FAILURE,
+	IOMMU_FAULT_STATUS_INVALID,
+	IOMMU_FAULT_STATUS_HANDLED,
+	IOMMU_FAULT_STATUS_IGNORE,
+};
 
 typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
 			struct device *, unsigned long, int, void *);
 
 struct iommu_fault {
+	/* Faulting address */
 	unsigned long		address;
+	/* Fault flags */
 	unsigned int		flags;
+	/* Process address space ID (if IOMMU_FAULT_PASID is present) */
+	u32			pasid;
+	/*
+	 * For PCI PRI, 'id' is the PRG. For others, it's a tag identifying a
+	 * single fault.
+	 */
+	unsigned int		id;
+	/*
+	 * IOMMU vendor-specific things. This cannot be a private pointer
+	 * because the fault report may leave the kernel and be passed to a guest.
+	 */
+	u64			iommu_data;
 };
 
 typedef int (*iommu_ext_fault_handler_t)(struct iommu_domain *, struct device *,
@@ -228,6 +282,7 @@ struct iommu_resv_region {
  * @domain_set_windows: Set the number of windows for a domain
  * @domain_get_windows: Return the number of windows for a domain
  * @of_xlate: add OF master IDs to iommu grouping
+ * @fault_response: complete a recoverable fault
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  */
 struct iommu_ops {
@@ -287,6 +342,10 @@ struct iommu_ops {
 	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
 	bool (*is_attach_deferred)(struct iommu_domain *domain, struct device *dev);
 
+	int (*fault_response)(struct iommu_domain *domain, struct device *dev,
+			      struct iommu_fault *fault,
+			      enum iommu_fault_status status);
+
 	unsigned long pgsize_bitmap;
 };
 
@@ -824,4 +883,43 @@ static inline void __iommu_process_unbind_dev_all(struct iommu_domain *domain,
 
 #endif /* CONFIG_IOMMU_PROCESS */
 
+#ifdef CONFIG_IOMMU_FAULT
+extern int handle_iommu_fault(struct iommu_domain *domain, struct device *dev,
+			      struct iommu_fault *fault);
+extern int iommu_fault_response(struct iommu_domain *domain, struct device *dev,
+				struct iommu_fault *fault,
+				enum iommu_fault_status status);
+extern int iommu_fault_queue_register(struct notifier_block *flush_notifier);
+extern void iommu_fault_queue_flush(struct device *dev);
+extern void iommu_fault_queue_unregister(struct notifier_block *flush_notifier);
+#else /* CONFIG_IOMMU_FAULT */
+static inline int handle_iommu_fault(struct iommu_domain *domain,
+				     struct device *dev,
+				     struct iommu_fault *fault)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_fault_response(struct iommu_domain *domain,
+				       struct device *dev,
+				       struct iommu_fault *fault,
+				       enum iommu_fault_status status)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_fault_queue_register(struct notifier_block *flush_notifier)
+{
+	return -ENODEV;
+}
+
+static inline void iommu_fault_queue_flush(struct device *dev)
+{
+}
+
+static inline void iommu_fault_queue_unregister(struct notifier_block *flush_notifier)
+{
+}
+#endif /* CONFIG_IOMMU_FAULT */
+
 #endif /* __LINUX_IOMMU_H */
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 07/36] iommu: Add a fault handler
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel

Some systems allow devices to do paging. For example systems supporting
PCI's PRI extension or ARM SMMU's stall model. As more IOMMU drivers are
adding support for page faults, we see a number of patterns that are
common to all implementations. Let's try to unify some of the generic
code.

Add boilerplate code to handle device page requests:

* An IOMMU driver instantiates a fault workqueue if necessary, using
  iommu_fault_queue_register and iommu_fault_queue_unregister.

* When it receives a fault report, typically in an IRQ handler, the IOMMU
  driver reports the fault using handle_iommu_fault (as opposed to the
  current report_iommu_fault).

* Then depending on the domain configuration, we either immediately
  forward it to a device driver, or submit it to the fault queue, to be
  handled in a thread.

* When the fault corresponds to a process context, call the mm fault
  handler on it (in the next patch).

* Once the fault is handled, it is completed. This is either done
  automatically by the mm wrapper, or manually by a device driver (e.g.
  VFIO).
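As a rough sketch of the queuing step, the grouping of PCI Page Request
Groups (PRGs) can be modeled in plain userspace C. This is illustrative
only: `struct fault`, `queue_fault` and the flag names below are stand-ins,
not the kernel structures. Non-last members of a group are stashed on a
partial-faults list, and the arrival of the last member collects the whole
group:

```c
#include <stddef.h>

/* Illustrative flags, modeled on IOMMU_FAULT_GROUP and IOMMU_FAULT_LAST */
#define FAULT_GROUP	(1 << 0)
#define FAULT_LAST	(1 << 1)

struct fault {
	unsigned int	id;	/* PRG index, or a tag for a single fault */
	unsigned int	flags;
	struct fault	*next;
};

/* Faults whose group is not yet complete, like iommu_partial_faults */
static struct fault *partial_faults;

/*
 * Model of the grouping in iommu_queue_fault: returns the number of
 * faults that would be handed to the workqueue now, or 0 if the fault
 * is postponed until its group is complete.
 */
static int queue_fault(struct fault *f)
{
	int collected = 1;	/* the fault itself */

	if ((f->flags & FAULT_GROUP) && !(f->flags & FAULT_LAST)) {
		/* Non-last member of a group: postpone it */
		f->next = partial_faults;
		partial_faults = f;
		return 0;
	}

	if (f->flags & FAULT_GROUP) {
		/* Last member: unlink all pending faults of the same group */
		struct fault **p = &partial_faults;

		while (*p) {
			if ((*p)->id == f->id) {
				*p = (*p)->next;
				collected++;
			} else {
				p = &(*p)->next;
			}
		}
	}

	return collected;
}
```

A standalone fault (no FAULT_GROUP flag) is dispatched immediately. The
kernel code additionally matches on the reporting device, takes a spinlock
around the list, and queues a work item rather than returning a count.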

A new operation, fault_response, is added to IOMMU drivers. It takes the
same fault context passed to handle_iommu_fault and a status, allowing the
driver to complete the fault, for instance by sending a PRG Response in
PCI PRI.
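The completion path can be sketched the same way. In this userspace model
(`sanitize_status` is not a kernel function; the enum mirrors the patch's
iommu_fault_status values), whatever a handler returned is sanitized before
a response goes back to the device, roughly as iommu_fault_finish does for
recoverable faults:

```c
/* Status values, modeled on enum iommu_fault_status in the patch */
enum fault_status {
	STATUS_NONE = 0,	/* fault was not handled */
	STATUS_FAILURE,		/* "Response Failure" in PCI PRI */
	STATUS_INVALID,		/* "Invalid Request" in PCI PRI */
	STATUS_HANDLED,		/* page tables populated, retry access */
	STATUS_IGNORE,		/* no reply; owner completes it later */
};

/*
 * Return the status that would be sent back to the device for a
 * recoverable fault, or STATUS_NONE when no response is sent because
 * a handler took ownership and will complete the fault itself.
 */
static enum fault_status sanitize_status(int handler_ret)
{
	if (handler_ret == STATUS_IGNORE)
		return STATUS_NONE;

	/*
	 * Internal error (a negative errno) or nobody handled the fault:
	 * report the request as invalid so the device does not retry.
	 */
	if (handler_ret <= 0)
		return STATUS_INVALID;

	return (enum fault_status)handler_ret;
}
```

Unrecoverable faults never reach this point in the patch; they get no
response at all, only an error code back to the reporter.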

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/Kconfig         |   9 ++
 drivers/iommu/Makefile        |   1 +
 drivers/iommu/io-pgfault.c    | 330 ++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu-process.c |   3 -
 include/linux/iommu.h         | 102 ++++++++++++-
 5 files changed, 440 insertions(+), 5 deletions(-)
 create mode 100644 drivers/iommu/io-pgfault.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 1ea5c90e37be..a34d268d8ed3 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -84,6 +84,15 @@ config IOMMU_PROCESS
 
 	  If unsure, say N here.
 
+config IOMMU_FAULT
+	bool "Fault handler for the IOMMU API"
+	select IOMMU_API
+	help
+	  Enable the generic fault handler for the IOMMU API, which handles
+	  recoverable page faults or injects them into guests.
+
+	  If unsure, say N here.
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PCI
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index a2832edbfaa2..c34cbea482f0 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -2,6 +2,7 @@ obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_PROCESS) += iommu-process.o
+obj-$(CONFIG_IOMMU_FAULT) += io-pgfault.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
new file mode 100644
index 000000000000..f31bc24534b0
--- /dev/null
+++ b/drivers/iommu/io-pgfault.c
@@ -0,0 +1,330 @@
+/*
+ * Handle device page faults
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ *
+ * Copyright (C) 2017 ARM Ltd.
+ *
+ * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
+ */
+
+#include <linux/iommu.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+
+static struct workqueue_struct *iommu_fault_queue;
+static DECLARE_RWSEM(iommu_fault_queue_sem);
+static refcount_t iommu_fault_queue_refs = REFCOUNT_INIT(0);
+static BLOCKING_NOTIFIER_HEAD(iommu_fault_queue_flush_notifiers);
+
+/* Used to store incomplete fault groups */
+static LIST_HEAD(iommu_partial_faults);
+static DEFINE_SPINLOCK(iommu_partial_faults_lock);
+
+struct iommu_fault_context {
+	struct iommu_domain	*domain;
+	struct device		*dev;
+	struct iommu_fault	params;
+	struct list_head	head;
+};
+
+struct iommu_fault_group {
+	struct list_head	faults;
+	struct work_struct	work;
+};
+
+/*
+ * iommu_fault_finish - Finish handling a fault
+ *
+ * Send a response if necessary and pass on the sanitized status code
+ */
+static int iommu_fault_finish(struct iommu_domain *domain, struct device *dev,
+			      struct iommu_fault *fault, int status)
+{
+	/*
+	 * There is no "handling" an unrecoverable fault, so the only valid
+	 * return values are 0 or an error.
+	 */
+	if (!(fault->flags & IOMMU_FAULT_RECOVERABLE))
+		return status > 0 ? 0 : status;
+
+	/* Device driver took ownership of the fault and will complete it later */
+	if (status == IOMMU_FAULT_STATUS_IGNORE)
+		return 0;
+
+	/*
+	 * There was an internal error with handling the recoverable fault (e.g.
+	 * OOM or no handler). Try to complete the fault if possible.
+	 */
+	if (status <= 0)
+		status = IOMMU_FAULT_STATUS_INVALID;
+
+	if (WARN_ON(!domain->ops->fault_response))
+		/*
+		 * The IOMMU driver shouldn't have submitted recoverable faults
+		 * if it cannot receive a response.
+		 */
+		return -EINVAL;
+
+	return domain->ops->fault_response(domain, dev, fault, status);
+}
+
+static int iommu_fault_handle_single(struct iommu_fault_context *fault)
+{
+	/* TODO */
+	return -ENODEV;
+}
+
+static void iommu_fault_handle_group(struct work_struct *work)
+{
+	struct iommu_fault_group *group;
+	struct iommu_fault_context *fault, *next;
+	int status = IOMMU_FAULT_STATUS_HANDLED;
+
+	group = container_of(work, struct iommu_fault_group, work);
+
+	list_for_each_entry_safe(fault, next, &group->faults, head) {
+		struct iommu_fault *params = &fault->params;
+		/*
+		 * Errors are sticky: don't handle subsequent faults in the
+		 * group if there is an error.
+		 */
+		if (status == IOMMU_FAULT_STATUS_HANDLED)
+			status = iommu_fault_handle_single(fault);
+
+		if (params->flags & IOMMU_FAULT_LAST ||
+		    !(params->flags & IOMMU_FAULT_GROUP)) {
+			iommu_fault_finish(fault->domain, fault->dev,
+					   &fault->params, status);
+		}
+
+		kfree(fault);
+	}
+
+	kfree(group);
+}
+
+static int iommu_queue_fault(struct iommu_domain *domain, struct device *dev,
+			     struct iommu_fault *params)
+{
+	struct iommu_fault_group *group;
+	struct iommu_fault_context *fault = kzalloc(sizeof(*fault), GFP_KERNEL);
+
+	/*
+	 * FIXME There is a race here, with queue_register. The last IOMMU
+	 * driver has to ensure no fault is reported anymore before
+	 * unregistering, so that doesn't matter. But you could have an IOMMU
+	 * device that didn't register to the fault queue and is still reporting
+	 * faults while the last queue user disappears. It really shouldn't get
+	 * here, but it currently does if there is a blocking handler.
+	 */
+	if (!iommu_fault_queue)
+		return -ENOSYS;
+
+	if (!fault)
+		return -ENOMEM;
+
+	fault->dev = dev;
+	fault->domain = domain;
+	fault->params = *params;
+
+	if ((params->flags & IOMMU_FAULT_LAST) || !(params->flags & IOMMU_FAULT_GROUP)) {
+		group = kzalloc(sizeof(*group), GFP_KERNEL);
+		if (!group) {
+			kfree(fault);
+			return -ENOMEM;
+		}
+
+		INIT_LIST_HEAD(&group->faults);
+		list_add(&fault->head, &group->faults);
+		INIT_WORK(&group->work, iommu_fault_handle_group);
+	} else {
+		/* Non-last request of a group. Postpone until the last one */
+		spin_lock(&iommu_partial_faults_lock);
+		list_add(&fault->head, &iommu_partial_faults);
+		spin_unlock(&iommu_partial_faults_lock);
+
+		return IOMMU_FAULT_STATUS_IGNORE;
+	}
+
+	if (params->flags & IOMMU_FAULT_GROUP) {
+		struct iommu_fault_context *cur, *next;
+
+		/* See if we have pending faults for this group */
+		spin_lock(&iommu_partial_faults_lock);
+		list_for_each_entry_safe(cur, next, &iommu_partial_faults, head) {
+			if (cur->params.id == params->id && cur->dev == dev) {
+				list_del(&cur->head);
+				/* Insert *before* the last fault */
+				list_add(&cur->head, &group->faults);
+			}
+		}
+		spin_unlock(&iommu_partial_faults_lock);
+	}
+
+	queue_work(iommu_fault_queue, &group->work);
+
+	/* Postpone the fault completion */
+	return IOMMU_FAULT_STATUS_IGNORE;
+}
+
+/**
+ * handle_iommu_fault - Handle fault in device driver or mm
+ *
+ * If the device driver expressed interest in handling the fault, report it through
+ * the domain handler. If the fault is recoverable, try to page in the address.
+ */
+int handle_iommu_fault(struct iommu_domain *domain, struct device *dev,
+		       struct iommu_fault *fault)
+{
+	int ret = -ENOSYS;
+
+	/*
+	 * if upper layers showed interest and installed a fault handler,
+	 * invoke it.
+	 */
+	if (domain->ext_handler) {
+		ret = domain->ext_handler(domain, dev, fault,
+					  domain->handler_token);
+
+		if (ret != IOMMU_FAULT_STATUS_NONE)
+			return iommu_fault_finish(domain, dev, fault, ret);
+	} else if (domain->handler && !(fault->flags &
+		   (IOMMU_FAULT_RECOVERABLE | IOMMU_FAULT_PASID))) {
+		/* Fall back to the old method if possible */
+		ret = domain->handler(domain, dev, fault->address,
+				      fault->flags, domain->handler_token);
+		if (ret)
+			return ret;
+	}
+
+	/* If the handler is blocking, handle fault in the workqueue */
+	if (fault->flags & IOMMU_FAULT_RECOVERABLE)
+		ret = iommu_queue_fault(domain, dev, fault);
+
+	return iommu_fault_finish(domain, dev, fault, ret);
+}
+EXPORT_SYMBOL_GPL(handle_iommu_fault);
+
+/**
+ * iommu_fault_response - Complete a recoverable fault
+ * @domain: iommu domain passed to the handler
+ * @dev: device passed to the handler
+ * @fault: fault passed to the handler
+ * @status: action to perform
+ *
+ * An atomic handler that took ownership of the fault (by returning
+ * IOMMU_FAULT_STATUS_IGNORE) must complete the fault by calling this function.
+ */
+int iommu_fault_response(struct iommu_domain *domain, struct device *dev,
+			 struct iommu_fault *fault, enum iommu_fault_status status)
+{
+	/* No response is needed for unrecoverable faults... */
+	if (!(fault->flags & IOMMU_FAULT_RECOVERABLE))
+		return -EINVAL;
+
+	/* Ignore is certainly the wrong thing to do at this point */
+	if (WARN_ON(status == IOMMU_FAULT_STATUS_IGNORE ||
+		    status == IOMMU_FAULT_STATUS_NONE))
+		status = IOMMU_FAULT_STATUS_INVALID;
+
+	return iommu_fault_finish(domain, dev, fault, status);
+}
+EXPORT_SYMBOL_GPL(iommu_fault_response);
+
+/**
+ * iommu_fault_queue_register - register an IOMMU driver to the global fault
+ * queue
+ *
+ * @flush_notifier: a notifier block that is called before the fault queue is
+ * flushed. The IOMMU driver should commit all faults that are pending in its
+ * low-level queues at the time of the call, into the fault queue. The notifier
+ * takes a device pointer as argument, hinting at which endpoint is causing the
+ * flush. When the device is NULL, all faults should be committed.
+ */
+int iommu_fault_queue_register(struct notifier_block *flush_notifier)
+{
+	/*
+	 * The WQ is unordered because the low-level handler enqueues faults by
+	 * group. PRI requests within a group have to be ordered, but once
+	 * that's dealt with, the high-level function can handle groups out of
+	 * order.
+	 */
+	down_write(&iommu_fault_queue_sem);
+	if (!iommu_fault_queue) {
+		iommu_fault_queue = alloc_workqueue("iommu_fault_queue",
+						    WQ_UNBOUND, 0);
+		if (iommu_fault_queue)
+			refcount_set(&iommu_fault_queue_refs, 1);
+	} else {
+		refcount_inc(&iommu_fault_queue_refs);
+	}
+	up_write(&iommu_fault_queue_sem);
+
+	if (!iommu_fault_queue)
+		return -ENOMEM;
+
+	if (flush_notifier)
+		blocking_notifier_chain_register(&iommu_fault_queue_flush_notifiers,
+						 flush_notifier);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_fault_queue_register);
+
+/**
+ * iommu_fault_queue_flush - Ensure that all queued faults have been processed.
+ * @dev: the endpoint whose faults need to be flushed. If NULL, flush all
+ *       pending faults.
+ *
+ * Users must call this function when releasing a PASID, to ensure that all
+ * pending faults affecting this PASID have been handled, and won't affect the
+ * address space of a subsequent process that reuses this PASID.
+ */
+void iommu_fault_queue_flush(struct device *dev)
+{
+	blocking_notifier_call_chain(&iommu_fault_queue_flush_notifiers, 0, dev);
+
+	down_read(&iommu_fault_queue_sem);
+	/*
+	 * Don't flush the partial faults list. All PRGs with the PASID are
+	 * complete and have been submitted to the queue.
+	 */
+	if (iommu_fault_queue)
+		flush_workqueue(iommu_fault_queue);
+	up_read(&iommu_fault_queue_sem);
+}
+EXPORT_SYMBOL_GPL(iommu_fault_queue_flush);
+
+/**
+ * iommu_fault_queue_unregister - Unregister an IOMMU driver from the global
+ * fault queue.
+ *
+ * @flush_notifier: same parameter as iommu_fault_queue_register
+ */
+void iommu_fault_queue_unregister(struct notifier_block *flush_notifier)
+{
+	down_write(&iommu_fault_queue_sem);
+	if (refcount_dec_and_test(&iommu_fault_queue_refs)) {
+		destroy_workqueue(iommu_fault_queue);
+		iommu_fault_queue = NULL;
+	}
+	up_write(&iommu_fault_queue_sem);
+
+	if (flush_notifier)
+		blocking_notifier_chain_unregister(&iommu_fault_queue_flush_notifiers,
+						   flush_notifier);
+}
+EXPORT_SYMBOL_GPL(iommu_fault_queue_unregister);
diff --git a/drivers/iommu/iommu-process.c b/drivers/iommu/iommu-process.c
index dee7691e3791..092240708b78 100644
--- a/drivers/iommu/iommu-process.c
+++ b/drivers/iommu/iommu-process.c
@@ -26,9 +26,6 @@
 #include <linux/sched/mm.h>
 #include <linux/spinlock.h>
 
-/* FIXME: stub for the fault queue. Remove later. */
-#define iommu_fault_queue_flush(...)
-
 /* Link between a domain and a process */
 struct iommu_context {
 	struct iommu_process	*process;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ea4eaf585eb4..37fafaf07ee2 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -51,15 +51,69 @@ struct iommu_domain;
 struct notifier_block;
 
 /* iommu fault flags */
-#define IOMMU_FAULT_READ	0x0
-#define IOMMU_FAULT_WRITE	0x1
+#define IOMMU_FAULT_READ		(1 << 0)
+#define IOMMU_FAULT_WRITE		(1 << 1)
+#define IOMMU_FAULT_EXEC		(1 << 2)
+#define IOMMU_FAULT_PRIV		(1 << 3)
+/*
+ * If a fault is recoverable, then it *must* be completed, once handled, with
+ * iommu_fault_response.
+ */
+#define IOMMU_FAULT_RECOVERABLE		(1 << 4)
+/* The PASID field is valid */
+#define IOMMU_FAULT_PASID		(1 << 5)
+/* Fault is part of a group (PCI PRG) */
+#define IOMMU_FAULT_GROUP		(1 << 6)
+/* Fault is last of its group */
+#define IOMMU_FAULT_LAST		(1 << 7)
+
+/**
+ * enum iommu_fault_status - Return status of fault handlers, telling the IOMMU
+ *	driver how to proceed with the fault.
+ *
+ * @IOMMU_FAULT_STATUS_NONE: Fault was not handled. Call the next handler, or
+ *	terminate.
+ * @IOMMU_FAULT_STATUS_FAILURE: General error. Drop all subsequent faults from
+ *	this device if possible. This is "Response Failure" in PCI PRI.
+ * @IOMMU_FAULT_STATUS_INVALID: Could not handle this fault, don't retry the
+ *	access. This is "Invalid Request" in PCI PRI.
+ * @IOMMU_FAULT_STATUS_HANDLED: Fault has been handled and the page tables
+ *	populated, retry the access.
+ * @IOMMU_FAULT_STATUS_IGNORE: Stop processing the fault, and do not send a
+ *	reply to the device.
+ *
+ * For unrecoverable faults, the only valid status is IOMMU_FAULT_STATUS_NONE.
+ * For a recoverable fault, if no one handled the fault, treat as
+ * IOMMU_FAULT_STATUS_INVALID.
+ */
+enum iommu_fault_status {
+	IOMMU_FAULT_STATUS_NONE = 0,
+	IOMMU_FAULT_STATUS_FAILURE,
+	IOMMU_FAULT_STATUS_INVALID,
+	IOMMU_FAULT_STATUS_HANDLED,
+	IOMMU_FAULT_STATUS_IGNORE,
+};
 
 typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
 			struct device *, unsigned long, int, void *);
 
 struct iommu_fault {
+	/* Faulting address */
 	unsigned long		address;
+	/* Fault flags */
 	unsigned int		flags;
+	/* Process address space ID (if IOMMU_FAULT_PASID is present) */
+	u32			pasid;
+	/*
+	 * For PCI PRI, 'id' is the PRG. For others, it's a tag identifying a
+	 * single fault.
+	 */
+	unsigned int		id;
+	/*
+	 * IOMMU vendor-specific things. This cannot be a private pointer
+	 * because the fault report might leave the kernel and be passed to a guest.
+	 */
+	u64			iommu_data;
 };
 
 typedef int (*iommu_ext_fault_handler_t)(struct iommu_domain *, struct device *,
@@ -228,6 +282,7 @@ struct iommu_resv_region {
  * @domain_set_windows: Set the number of windows for a domain
  * @domain_get_windows: Return the number of windows for a domain
  * @of_xlate: add OF master IDs to iommu grouping
+ * @fault_response: complete a recoverable fault
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  */
 struct iommu_ops {
@@ -287,6 +342,10 @@ struct iommu_ops {
 	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
 	bool (*is_attach_deferred)(struct iommu_domain *domain, struct device *dev);
 
+	int (*fault_response)(struct iommu_domain *domain, struct device *dev,
+			      struct iommu_fault *fault,
+			      enum iommu_fault_status status);
+
 	unsigned long pgsize_bitmap;
 };
 
@@ -824,4 +883,43 @@ static inline void __iommu_process_unbind_dev_all(struct iommu_domain *domain,
 
 #endif /* CONFIG_IOMMU_PROCESS */
 
+#ifdef CONFIG_IOMMU_FAULT
+extern int handle_iommu_fault(struct iommu_domain *domain, struct device *dev,
+			      struct iommu_fault *fault);
+extern int iommu_fault_response(struct iommu_domain *domain, struct device *dev,
+				struct iommu_fault *fault,
+				enum iommu_fault_status status);
+extern int iommu_fault_queue_register(struct notifier_block *flush_notifier);
+extern void iommu_fault_queue_flush(struct device *dev);
+extern void iommu_fault_queue_unregister(struct notifier_block *flush_notifier);
+#else /* CONFIG_IOMMU_FAULT */
+static inline int handle_iommu_fault(struct iommu_domain *domain,
+				     struct device *dev,
+				     struct iommu_fault *fault)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_fault_response(struct iommu_domain *domain,
+				       struct device *dev,
+				       struct iommu_fault *fault,
+				       enum iommu_fault_status status)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_fault_queue_register(struct notifier_block *flush_notifier)
+{
+	return -ENODEV;
+}
+
+static inline void iommu_fault_queue_flush(struct device *dev)
+{
+}
+
+static inline void iommu_fault_queue_unregister(struct notifier_block *flush_notifier)
+{
+}
+#endif /* CONFIG_IOMMU_FAULT */
+
 #endif /* __LINUX_IOMMU_H */
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 08/36] iommu/fault: Handle mm faults
  2017-10-06 13:31 ` Jean-Philippe Brucker
  (?)
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: mark.rutland-5wv7dgnIgG8,
	gabriele.paoloni-hv44wF8Li93QT0dZR+AlfA,
	catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	rjw-LthD3rsA81gm4RdzfppkhA, sudeep.holla-5wv7dgnIgG8

When a recoverable page fault is handled by the fault workqueue, find the
associated process and call handle_mm_fault.

In theory, we don't even need to take a reference to the iommu_process,
because any release of the structure is preceded by a flush of the queue.
I don't feel comfortable removing the pinning at the moment, though.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/io-pgfault.c | 83 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 81 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index f31bc24534b0..532bdb9ce519 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -21,6 +21,7 @@
 
 #include <linux/iommu.h>
 #include <linux/list.h>
+#include <linux/sched/mm.h>
 #include <linux/slab.h>
 #include <linux/workqueue.h>
 
@@ -83,8 +84,86 @@ static int iommu_fault_finish(struct iommu_domain *domain, struct device *dev,
 
 static int iommu_fault_handle_single(struct iommu_fault_context *fault)
 {
-	/* TODO */
-	return -ENODEV;
+	struct mm_struct *mm;
+	struct vm_area_struct *vma;
+	struct iommu_process *process;
+	int ret = IOMMU_FAULT_STATUS_INVALID;
+	unsigned int access_flags = 0;
+	unsigned int fault_flags = FAULT_FLAG_REMOTE;
+	struct iommu_fault *params = &fault->params;
+
+	if (!(params->flags & IOMMU_FAULT_PASID))
+		return ret;
+
+	process = iommu_process_find(params->pasid);
+	if (!process)
+		return ret;
+
+	if ((params->flags & (IOMMU_FAULT_LAST | IOMMU_FAULT_READ |
+			      IOMMU_FAULT_WRITE)) == IOMMU_FAULT_LAST) {
+		/* Special case: PASID Stop Marker doesn't require a response */
+		ret = IOMMU_FAULT_STATUS_IGNORE;
+		goto out_put_process;
+	}
+
+	mm = process->mm;
+	if (!mmget_not_zero(mm)) {
+		/* Process is dead */
+		goto out_put_process;
+	}
+
+	down_read(&mm->mmap_sem);
+
+	vma = find_extend_vma(mm, params->address);
+	if (!vma)
+		/* Unmapped area */
+		goto out_put_mm;
+
+	if (params->flags & IOMMU_FAULT_READ)
+		access_flags |= VM_READ;
+
+	if (params->flags & IOMMU_FAULT_WRITE) {
+		access_flags |= VM_WRITE;
+		fault_flags |= FAULT_FLAG_WRITE;
+	}
+
+	if (params->flags & IOMMU_FAULT_EXEC) {
+		access_flags |= VM_EXEC;
+		fault_flags |= FAULT_FLAG_INSTRUCTION;
+	}
+
+	if (!(params->flags & IOMMU_FAULT_PRIV))
+		fault_flags |= FAULT_FLAG_USER;
+
+	if (access_flags & ~vma->vm_flags)
+		/* Access fault */
+		goto out_put_mm;
+
+	ret = handle_mm_fault(vma, params->address, fault_flags);
+	ret = ret & VM_FAULT_ERROR ? IOMMU_FAULT_STATUS_INVALID :
+		IOMMU_FAULT_STATUS_HANDLED;
+
+out_put_mm:
+	up_read(&mm->mmap_sem);
+
+	/*
+	 * Here's a fun scenario: the process exits while we're handling the
+	 * fault on its mm. Since we're the last mm_user, mmput will call
+	 * mm_exit immediately. exit_mm releases the mmu notifier, which calls
+	 * iommu_notifier_release, which has to flush the fault queue that we're
+	 * executing on... It's actually easy to reproduce with a DMA engine,
+	 * and I did observe a lockdep splat. Therefore move the release of the
+	 * mm to another thread, if we're the last user.
+	 *
+	 * mmput_async was removed in 4.14, and added back in 4.15(?)
+	 * https://patchwork.kernel.org/patch/9952257/
+	 */
+	mmput_async(mm);
+
+out_put_process:
+	iommu_process_put(process);
+
+	return ret;
 }
 
 static void iommu_fault_handle_group(struct work_struct *work)
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 09/36] iommu/fault: Allow blocking fault handlers
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

Allow device drivers to register their fault handler at various stages of
the handling path, by adding flags to iommu_set_ext_fault_handler. Since
we now have a fault workqueue, it is quite easy to call their handler from
thread context instead of the IRQ handler.

A driver can request to be called both in blocking and non-blocking
context, so it can filter faults early and only execute the blocking code
for some of them. Add the IOMMU_FAULT_ATOMIC fault flag to tell the driver
where we're calling it from.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---

Rob, would this do what you want? The MSM driver can register its handler
with ATOMIC | BLOCKING flags. When called in IRQ context, it can ignore
the fault by returning IOMMU_FAULT_STATUS_NONE, or drop it by returning
IOMMU_FAULT_STATUS_INVALID. When called in thread context, it can sleep
and then return IOMMU_FAULT_STATUS_INVALID to terminate the fault.
---
 drivers/iommu/io-pgfault.c | 16 ++++++++++++++--
 drivers/iommu/iommu.c      | 12 +++++++++---
 include/linux/iommu.h      | 20 +++++++++++++++++++-
 3 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 532bdb9ce519..3ec8179f58b5 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -91,6 +91,14 @@ static int iommu_fault_handle_single(struct iommu_fault_context *fault)
 	unsigned int access_flags = 0;
 	unsigned int fault_flags = FAULT_FLAG_REMOTE;
 	struct iommu_fault *params = &fault->params;
+	struct iommu_domain *domain = fault->domain;
+
+	if (domain->handler_flags & IOMMU_FAULT_HANDLER_BLOCKING) {
+		ret = domain->ext_handler(domain, fault->dev, &fault->params,
+					  domain->handler_token);
+		if (ret != IOMMU_FAULT_STATUS_NONE)
+			return ret;
+	}
 
 	if (!(params->flags & IOMMU_FAULT_PASID))
 		return ret;
@@ -274,7 +282,8 @@ int handle_iommu_fault(struct iommu_domain *domain, struct device *dev,
 	 * if upper layers showed interest and installed a fault handler,
 	 * invoke it.
 	 */
-	if (domain->ext_handler) {
+	if (domain->handler_flags & IOMMU_FAULT_HANDLER_ATOMIC) {
+		fault->flags |= IOMMU_FAULT_ATOMIC;
 		ret = domain->ext_handler(domain, dev, fault,
 					  domain->handler_token);
 
@@ -290,8 +299,11 @@ int handle_iommu_fault(struct iommu_domain *domain, struct device *dev,
 	}
 
 	/* If the handler is blocking, handle fault in the workqueue */
-	if (fault->flags & IOMMU_FAULT_RECOVERABLE)
+	if ((fault->flags & IOMMU_FAULT_RECOVERABLE) ||
+	    (domain->handler_flags & IOMMU_FAULT_HANDLER_BLOCKING)) {
+		fault->flags &= ~IOMMU_FAULT_ATOMIC;
 		ret = iommu_queue_fault(domain, dev, fault);
+	}
 
 	return iommu_fault_finish(domain, dev, fault, ret);
 }
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ee956b5fc301..c189648ab7b4 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1258,7 +1258,9 @@ EXPORT_SYMBOL_GPL(iommu_set_fault_handler);
  * @dev: the device
  * @handler: fault handler
  * @token: user data, will be passed back to the fault handler
- * @flags: IOMMU_FAULT_HANDLER_* parameters.
+ * @flags: IOMMU_FAULT_HANDLER_* parameters. Allows the driver to tell when it
+ * wants to be notified. By default the handler will only be called from atomic
+ * context.
  *
  * This function should be used by IOMMU users which want to be notified
  * whenever an IOMMU fault happens.
@@ -1275,11 +1277,15 @@ void iommu_set_ext_fault_handler(struct device *dev,
 	if (WARN_ON(!domain))
 		return;
 
+	if (!flags)
+		flags |= IOMMU_FAULT_HANDLER_ATOMIC;
+
 	if (WARN_ON(domain->handler || domain->ext_handler))
 		return;
 
 	domain->ext_handler = handler;
 	domain->handler_token = token;
+	domain->handler_flags = flags;
 }
 EXPORT_SYMBOL_GPL(iommu_set_ext_fault_handler);
 
@@ -1824,7 +1830,7 @@ int report_iommu_fault(struct iommu_domain *domain, struct device *dev,
 	int ret = -ENOSYS;
 	struct iommu_fault fault = {
 		.address	= iova,
-		.flags		= flags,
+		.flags		= flags | IOMMU_FAULT_ATOMIC,
 	};
 
 	/*
@@ -1834,7 +1840,7 @@ int report_iommu_fault(struct iommu_domain *domain, struct device *dev,
 	if (domain->handler)
 		ret = domain->handler(domain, dev, iova, flags,
 						domain->handler_token);
-	else if (domain->ext_handler)
+	else if (domain->handler_flags & IOMMU_FAULT_HANDLER_ATOMIC)
 		ret = domain->ext_handler(domain, dev, &fault,
 					  domain->handler_token);
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 37fafaf07ee2..a6d417785c7b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -66,6 +66,8 @@ struct notifier_block;
 #define IOMMU_FAULT_GROUP		(1 << 6)
 /* Fault is last of its group */
 #define IOMMU_FAULT_LAST		(1 << 7)
+/* The fault handler is being called from atomic context */
+#define IOMMU_FAULT_ATOMIC		(1 << 8)
 
 /**
  * enum iommu_fault_status - Return status of fault handlers, telling the IOMMU
@@ -97,6 +99,21 @@ enum iommu_fault_status {
 typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
 			struct device *, unsigned long, int, void *);
 
+/*
+ * IOMMU_FAULT_HANDLER_ATOMIC: Notify device driver from within atomic context
+ * (IRQ handler). The callback is not allowed to sleep. If the fault is
+ * recoverable, the driver must either return a fault status telling the IOMMU
+ * driver how to complete the fault (FAILURE, INVALID, HANDLED) or complete the
+ * fault later with iommu_fault_response.
+ */
+#define IOMMU_FAULT_HANDLER_ATOMIC	(1 << 0)
+/*
+ * IOMMU_FAULT_HANDLER_BLOCKING: Notify device driver from a thread. If the fault
+ * is recoverable, the driver must return a fault status telling the IOMMU
+ * driver how to complete the fault (FAILURE, INVALID, HANDLED)
+ */
+#define IOMMU_FAULT_HANDLER_BLOCKING	(1 << 1)
+
 struct iommu_fault {
 	/* Faulting address */
 	unsigned long		address;
@@ -161,6 +178,7 @@ struct iommu_domain {
 	iommu_fault_handler_t handler;
 	iommu_ext_fault_handler_t ext_handler;
 	void *handler_token;
+	int handler_flags;
 	iommu_process_exit_handler_t process_exit;
 	void *process_exit_token;
 	struct iommu_domain_geometry geometry;
@@ -633,7 +651,7 @@ static inline phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_ad
 }
 
 static inline void iommu_set_fault_handler(struct iommu_domain *domain,
-				iommu_fault_handler_t handler, void *token)
+				iommu_fault_handler_t handler, void *token, int flags)
 {
 }
 
-- 
2.13.3


^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 10/36] vfio: Add support for Shared Virtual Memory
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

Add two new ioctls for VFIO containers. VFIO_DEVICE_BIND_PROCESS creates a
bond between a container and a process address space, identified by a
device-specific ID named PASID. This allows the device to target DMA
transactions at the process virtual addresses without a need for mapping
and unmapping buffers explicitly in the IOMMU. The process page tables are
shared with the IOMMU, and mechanisms such as PCI ATS/PRI may be used to
handle faults. VFIO_DEVICE_UNBIND_PROCESS removes a bond identified by a
PASID.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/vfio/vfio_iommu_type1.c | 243 +++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/vfio.h       |  69 ++++++++++++
 2 files changed, 311 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 92155cce926d..4bfb92273cb5 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -30,6 +30,7 @@
 #include <linux/iommu.h>
 #include <linux/module.h>
 #include <linux/mm.h>
+#include <linux/ptrace.h>
 #include <linux/rbtree.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/mm.h>
@@ -60,6 +61,7 @@ MODULE_PARM_DESC(disable_hugepages,
 
 struct vfio_iommu {
 	struct list_head	domain_list;
+	struct list_head	process_list;
 	struct vfio_domain	*external_domain; /* domain for external user */
 	struct mutex		lock;
 	struct rb_root		dma_list;
@@ -92,6 +94,12 @@ struct vfio_group {
 	struct list_head	next;
 };
 
+struct vfio_process {
+	int			pasid;
+	struct pid		*pid;
+	struct list_head	next;
+};
+
 /*
  * Guest RAM pinning working set or DMA target
  */
@@ -1114,6 +1122,25 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 	return 0;
 }
 
+static int vfio_iommu_replay_bind(struct vfio_iommu *iommu, struct vfio_group *group)
+{
+	int ret;
+	u32 pasid;
+	struct vfio_process *vfio_process;
+
+	list_for_each_entry(vfio_process, &iommu->process_list, next) {
+		struct task_struct *task = get_pid_task(vfio_process->pid,
+							PIDTYPE_PID);
+
+		ret = iommu_process_bind_group(group->iommu_group, task, &pasid, 0);
+		put_task_struct(task);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 /*
  * We change our unmap behavior slightly depending on whether the IOMMU
  * supports fine-grained superpages.  IOMMUs like AMD-Vi will use a superpage
@@ -1301,8 +1328,9 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 				list_add(&group->next, &d->group_list);
 				iommu_domain_free(domain->domain);
 				kfree(domain);
+				ret = vfio_iommu_replay_bind(iommu, group);
 				mutex_unlock(&iommu->lock);
-				return 0;
+				return ret;
 			}
 
 			ret = iommu_attach_group(domain->domain, iommu_group);
@@ -1318,6 +1346,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	if (ret)
 		goto out_detach;
 
+	ret = vfio_iommu_replay_bind(iommu, group);
+	if (ret)
+		goto out_detach;
+
 	if (resv_msi) {
 		ret = iommu_get_msi_cookie(domain->domain, resv_msi_base);
 		if (ret)
@@ -1349,6 +1381,21 @@ static void vfio_iommu_unmap_unpin_all(struct vfio_iommu *iommu)
 		vfio_remove_dma(iommu, rb_entry(node, struct vfio_dma, node));
 }
 
+static void vfio_iommu_unbind_all(struct vfio_iommu *iommu)
+{
+	struct vfio_process *process, *process_tmp;
+
+	list_for_each_entry_safe(process, process_tmp, &iommu->process_list, next) {
+		/*
+		 * No need to unbind manually, iommu_detach_group should
+		 * do it for us.
+		 */
+		put_pid(process->pid);
+		kfree(process);
+	}
+	INIT_LIST_HEAD(&iommu->process_list);
+}
+
 static void vfio_iommu_unmap_unpin_reaccount(struct vfio_iommu *iommu)
 {
 	struct rb_node *n, *p;
@@ -1438,6 +1485,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 					vfio_iommu_unmap_unpin_all(iommu);
 				else
 					vfio_iommu_unmap_unpin_reaccount(iommu);
+				vfio_iommu_unbind_all(iommu);
 			}
 			iommu_domain_free(domain->domain);
 			list_del(&domain->next);
@@ -1472,6 +1520,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 	}
 
 	INIT_LIST_HEAD(&iommu->domain_list);
+	INIT_LIST_HEAD(&iommu->process_list);
 	iommu->dma_list = RB_ROOT;
 	mutex_init(&iommu->lock);
 	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
@@ -1506,6 +1555,7 @@ static void vfio_iommu_type1_release(void *iommu_data)
 		kfree(iommu->external_domain);
 	}
 
+	vfio_iommu_unbind_all(iommu);
 	vfio_iommu_unmap_unpin_all(iommu);
 
 	list_for_each_entry_safe(domain, domain_tmp,
@@ -1534,6 +1584,159 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
 	return ret;
 }
 
+static long vfio_iommu_type1_bind_process(struct vfio_iommu *iommu,
+					  void __user *arg,
+					  struct vfio_iommu_type1_bind *bind)
+{
+	struct vfio_iommu_type1_bind_process params;
+	struct vfio_process *vfio_process;
+	struct vfio_domain *domain;
+	struct task_struct *task;
+	struct vfio_group *group;
+	struct mm_struct *mm;
+	unsigned long minsz;
+	struct pid *pid;
+	int ret;
+
+	minsz = sizeof(*bind) + sizeof(params);
+	if (bind->argsz < minsz)
+		return -EINVAL;
+
+	arg += sizeof(*bind);
+	ret = copy_from_user(&params, arg, sizeof(params));
+	if (ret)
+		return -EFAULT;
+
+	if (params.flags & ~VFIO_IOMMU_BIND_PID)
+		return -EINVAL;
+
+	if (params.flags & VFIO_IOMMU_BIND_PID) {
+		pid_t vpid;
+
+		minsz += sizeof(pid_t);
+		if (bind->argsz < minsz)
+			return -EINVAL;
+
+		ret = copy_from_user(&vpid, arg + sizeof(params), sizeof(pid_t));
+		if (ret)
+			return -EFAULT;
+
+		rcu_read_lock();
+		task = find_task_by_vpid(vpid);
+		if (task)
+			get_task_struct(task);
+		rcu_read_unlock();
+		if (!task)
+			return -ESRCH;
+
+		/* Ensure current has RW access on the mm */
+		mm = mm_access(task, PTRACE_MODE_ATTACH_REALCREDS);
+		if (!mm || IS_ERR(mm)) {
+			put_task_struct(task);
+			return IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
+		}
+		mmput(mm);
+	} else {
+		get_task_struct(current);
+		task = current;
+	}
+
+	pid = get_task_pid(task, PIDTYPE_PID);
+	mutex_lock(&iommu->lock);
+	list_for_each_entry(vfio_process, &iommu->process_list, next) {
+		if (vfio_process->pid != pid)
+			continue;
+
+		params.pasid = vfio_process->pasid;
+
+		mutex_unlock(&iommu->lock);
+		put_pid(pid);
+		put_task_struct(task);
+		return copy_to_user(arg, &params, sizeof(params)) ?
+			-EFAULT : 0;
+	}
+
+	vfio_process = kzalloc(sizeof(*vfio_process), GFP_KERNEL);
+	if (!vfio_process) {
+		mutex_unlock(&iommu->lock);
+		put_pid(pid);
+		put_task_struct(task);
+		return -ENOMEM;
+	}
+
+	list_for_each_entry(domain, &iommu->domain_list, next) {
+		list_for_each_entry(group, &domain->group_list, next) {
+			ret = iommu_process_bind_group(group->iommu_group, task,
+						       &params.pasid, 0);
+			if (ret)
+				break;
+		}
+		if (ret)
+			break;
+	}
+
+	if (!ret) {
+		vfio_process->pid = pid;
+		vfio_process->pasid = params.pasid;
+		list_add(&vfio_process->next, &iommu->process_list);
+	}
+
+	mutex_unlock(&iommu->lock);
+
+	put_task_struct(task);
+
+	if (ret)
+		kfree(vfio_process);
+	else
+		ret = copy_to_user(arg, &params, sizeof(params)) ?
+			-EFAULT : 0;
+
+	return ret;
+}
+
+static long vfio_iommu_type1_unbind_process(struct vfio_iommu *iommu,
+					    void __user *arg,
+					    struct vfio_iommu_type1_bind *bind)
+{
+	int ret = -EINVAL;
+	unsigned long minsz;
+	struct vfio_process *process;
+	struct vfio_group *group;
+	struct vfio_domain *domain;
+	struct vfio_iommu_type1_bind_process params;
+
+	minsz = sizeof(*bind) + sizeof(params);
+	if (bind->argsz < minsz)
+		return -EINVAL;
+
+	arg += sizeof(*bind);
+	ret = copy_from_user(&params, arg, sizeof(params));
+	if (ret)
+		return -EFAULT;
+
+	if (params.flags)
+		return -EINVAL;
+
+	mutex_lock(&iommu->lock);
+	list_for_each_entry(process, &iommu->process_list, next) {
+		if (process->pasid != params.pasid)
+			continue;
+
+		list_for_each_entry(domain, &iommu->domain_list, next)
+			list_for_each_entry(group, &domain->group_list, next)
+				iommu_process_unbind_group(group->iommu_group,
+							   process->pasid);
+
+		put_pid(process->pid);
+		list_del(&process->next);
+		kfree(process);
+		break;
+	}
+	mutex_unlock(&iommu->lock);
+
+	return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -1604,6 +1807,44 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
 		return copy_to_user((void __user *)arg, &unmap, minsz) ?
 			-EFAULT : 0;
+
+	} else if (cmd == VFIO_IOMMU_BIND) {
+		struct vfio_iommu_type1_bind bind;
+
+		minsz = offsetofend(struct vfio_iommu_type1_bind, mode);
+
+		if (copy_from_user(&bind, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (bind.argsz < minsz)
+			return -EINVAL;
+
+		switch (bind.mode) {
+		case VFIO_IOMMU_BIND_PROCESS:
+			return vfio_iommu_type1_bind_process(iommu, (void *)arg,
+							     &bind);
+		default:
+			return -EINVAL;
+		}
+
+	} else if (cmd == VFIO_IOMMU_UNBIND) {
+		struct vfio_iommu_type1_bind bind;
+
+		minsz = offsetofend(struct vfio_iommu_type1_bind, mode);
+
+		if (copy_from_user(&bind, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (bind.argsz < minsz)
+			return -EINVAL;
+
+		switch (bind.mode) {
+		case VFIO_IOMMU_BIND_PROCESS:
+			return vfio_iommu_type1_unbind_process(iommu, (void *)arg,
+							       &bind);
+		default:
+			return -EINVAL;
+		}
 	}
 
 	return -ENOTTY;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index ae461050661a..6da8321c33dc 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -565,6 +565,75 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/*
+ * Allocate a PASID for a local process, and use it to attach this process to
+ * devices in the container. Devices can then tag their DMA traffic with the
+ * returned @pasid to perform transactions on the associated virtual address
+ * space. Mapping and unmapping of buffers is performed by standard functions
+ * such as mmap and malloc.
+ *
+ * If flag is VFIO_IOMMU_BIND_PID, bind to a process different from the calling
+ * one. data contains the pid of that process, a s32. Given that the caller owns
+ * the device, setting this flag grants the caller read and write permissions on
+ * the entire address space of foreign process described by @pid. Therefore,
+ * permission to perform the bind operation on a foreign process is governed by
+ * the ptrace access mode PTRACE_MODE_ATTACH_REALCREDS check. See man ptrace(2)
+ * for more information.
+ *
+ * On success, VFIO writes a Process Address Space ID (PASID) into @pasid. This
+ * ID is unique to a process and can be used on all devices in the container.
+ *
+ * On fork, the child inherits the device fd and can use the bonds setup by its
+ * parent. Consequently, the child has R/W access on the address spaces bound by
+ * its parent. After an execv, the device fd is closed and the child doesn't
+ * have access to the address space anymore.
+ */
+struct vfio_iommu_type1_bind_process {
+	__u32	flags;
+#define VFIO_IOMMU_BIND_PID		(1 << 0)
+	__u32	pasid;
+	__u8	data[];
+};
+
+/*
+ * Only mode supported at the moment is VFIO_IOMMU_BIND_PROCESS, which takes
+ * vfio_iommu_type1_bind_process in data.
+ */
+struct vfio_iommu_type1_bind {
+	__u32	argsz;
+	__u32	mode;
+#define VFIO_IOMMU_BIND_PROCESS		(1 << 0)
+	__u8	data[];
+};
+
+/*
+ * VFIO_IOMMU_BIND - _IOWR(VFIO_TYPE, VFIO_BASE + 22, struct vfio_iommu_bind)
+ *
+ * Manage address spaces of devices in this container. Initially a TYPE1
+ * container can only have one address space, managed with
+ * VFIO_IOMMU_MAP/UNMAP_DMA.
+ *
+ * An IOMMU of type VFIO_TYPE1_NESTING_IOMMU can be managed by both MAP/UNMAP
+ * and BIND ioctls at the same time. MAP/UNMAP acts on the stage-2 (host) page
+ * tables, and BIND manages the stage-1 (guest) page tables. Other types of
+ * IOMMU may allow MAP/UNMAP and BIND to coexist, where MAP/UNMAP controls
+ * non-PASID traffic and BIND controls PASID traffic. But this depends on the
+ * underlying IOMMU architecture and isn't guaranteed.
+ *
+ * Availability of this feature depends on the device, its bus, the underlying
+ * IOMMU and the CPU architecture.
+ *
+ * returns: 0 on success, -errno on failure.
+ */
+#define VFIO_IOMMU_BIND		_IO(VFIO_TYPE, VFIO_BASE + 22)
+
+/*
+ * VFIO_IOMMU_UNBIND - _IOWR(VFIO_TYPE, VFIO_BASE + 23, struct vfio_iommu_bind)
+ *
+ * Undo what was done by the corresponding VFIO_IOMMU_BIND ioctl.
+ */
+#define VFIO_IOMMU_UNBIND	_IO(VFIO_TYPE, VFIO_BASE + 23)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
2.13.3


^ permalink raw reply related	[flat|nested] 268+ messages in thread

+ * Allocate a PASID for a local process, and use it to attach this process to
+ * devices in the container. Devices can then tag their DMA traffic with the
+ * returned @pasid to perform transactions on the associated virtual address
+ * space. Mapping and unmapping of buffers is performed by standard functions
+ * such as mmap and malloc.
+ *
+ * If flag is VFIO_IOMMU_BIND_PID, bind to a process different from the calling
+ * one. data contains the pid of that process, an s32. Given that the caller owns
+ * the device, setting this flag grants the caller read and write permissions on
+ * the entire address space of foreign process described by @pid. Therefore,
+ * permission to perform the bind operation on a foreign process is governed by
+ * the ptrace access mode PTRACE_MODE_ATTACH_REALCREDS check. See man ptrace(2)
+ * for more information.
+ *
+ * On success, VFIO writes a Process Address Space ID (PASID) into @pasid. This
+ * ID is unique to a process and can be used on all devices in the container.
+ *
+ * On fork, the child inherits the device fd and can use the bonds setup by its
+ * parent. Consequently, the child has R/W access on the address spaces bound by
+ * its parent. After an execv, the device fd is closed and the child doesn't
+ * have access to the address space anymore.
+ */
+struct vfio_iommu_type1_bind_process {
+	__u32	flags;
+#define VFIO_IOMMU_BIND_PID		(1 << 0)
+	__u32	pasid;
+	__u8	data[];
+};
+
+/*
+ * Only mode supported at the moment is VFIO_IOMMU_BIND_PROCESS, which takes
+ * vfio_iommu_type1_bind_process in data.
+ */
+struct vfio_iommu_type1_bind {
+	__u32	argsz;
+	__u32	mode;
+#define VFIO_IOMMU_BIND_PROCESS		(1 << 0)
+	__u8	data[];
+};
+
+/*
+ * VFIO_IOMMU_BIND - _IOWR(VFIO_TYPE, VFIO_BASE + 22, struct vfio_iommu_bind)
+ *
+ * Manage address spaces of devices in this container. Initially a TYPE1
+ * container can only have one address space, managed with
+ * VFIO_IOMMU_MAP/UNMAP_DMA.
+ *
+ * An IOMMU of type VFIO_TYPE1_NESTING_IOMMU can be managed by both MAP/UNMAP
+ * and BIND ioctls at the same time. MAP/UNMAP acts on the stage-2 (host) page
+ * tables, and BIND manages the stage-1 (guest) page tables. Other types of
+ * IOMMU may allow MAP/UNMAP and BIND to coexist, where MAP/UNMAP controls
+ * non-PASID traffic and BIND controls PASID traffic. But this depends on the
+ * underlying IOMMU architecture and isn't guaranteed.
+ *
+ * Availability of this feature depends on the device, its bus, the underlying
+ * IOMMU and the CPU architecture.
+ *
+ * returns: 0 on success, -errno on failure.
+ */
+#define VFIO_IOMMU_BIND		_IO(VFIO_TYPE, VFIO_BASE + 22)
+
+/*
+ * VFIO_IOMMU_UNBIND - _IOWR(VFIO_TYPE, VFIO_BASE + 23, struct vfio_iommu_bind)
+ *
+ * Undo what was done by the corresponding VFIO_IOMMU_BIND ioctl.
+ */
+#define VFIO_IOMMU_UNBIND	_IO(VFIO_TYPE, VFIO_BASE + 23)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread
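
As an editorial aside on the BIND UAPI above: the ioctl nests struct
vfio_iommu_type1_bind_process inside struct vfio_iommu_type1_bind, and the
kernel checks argsz against the sum of both fixed parts. A minimal userspace
sketch of composing that argument (structure definitions copied from the
patch; the helper name is hypothetical, and container/ioctl setup is
omitted):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the UAPI structures added by this patch. */
struct vfio_iommu_type1_bind_process {
	uint32_t flags;
#define VFIO_IOMMU_BIND_PID	(1 << 0)
	uint32_t pasid;
	uint8_t  data[];
};

struct vfio_iommu_type1_bind {
	uint32_t argsz;
	uint32_t mode;
#define VFIO_IOMMU_BIND_PROCESS	(1 << 0)
	uint8_t  data[];
};

/*
 * Fill @buf with a BIND_PROCESS request for the calling process and return
 * the argsz the kernel expects: the outer header plus the fixed part of the
 * nested structure, matching the minsz check in
 * vfio_iommu_type1_bind_process(). (Helper name is illustrative.)
 */
static size_t prepare_bind_process(void *buf)
{
	struct vfio_iommu_type1_bind *bind = buf;
	struct vfio_iommu_type1_bind_process *bp = (void *)bind->data;
	size_t argsz = sizeof(*bind) + sizeof(*bp);

	memset(buf, 0, argsz);
	bind->argsz = argsz;
	bind->mode = VFIO_IOMMU_BIND_PROCESS;
	bp->flags = 0;	/* bind the calling process, not a foreign PID */
	return argsz;
}
```

After filling the buffer, the caller would issue
`ioctl(container_fd, VFIO_IOMMU_BIND, buf)` and, on success, read the
allocated PASID back from the nested structure.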

* [RFCv2 PATCH 11/36] iommu/arm-smmu-v3: Link domains and devices
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

When removing a mapping from a domain, we need to send an invalidation to
all devices that might have stored it in their Address Translation Cache
(ATC). In addition, with SVM, we'll need to invalidate the context
descriptors of all devices attached to a live domain.

Maintain a list of devices in each domain, protected by a spinlock. It is
updated every time we attach or detach devices to and from domains.

It needs to be a spinlock because we'll invalidate ATC entries from
within hardirq-safe contexts, but it may be possible to relax the read
side with RCU later.
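
The attach/detach/invalidate pattern described above can be modelled in
plain C (a single-threaded sketch with a minimal intrusive list standing in
for list_head; the spinlock and the actual CMD_ATC_INV command are elided,
and the names here are illustrative, not the driver's):

```c
#include <stddef.h>

/* Minimal intrusive list mirroring the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next; n->prev = h;
	h->next->prev = n; h->next = n;
}
static void list_del(struct list_head *n)
{
	n->prev->next = n->next; n->next->prev = n->prev;
}

struct domain {
	struct list_head devices;	/* protected by a spinlock in the driver */
};

struct master {
	struct domain *domain;
	struct list_head list;		/* entry on domain->devices */
	int atc_generation;		/* stand-in for the device's ATC state */
};

static void attach(struct domain *d, struct master *m)
{
	m->domain = d;
	list_add(&m->list, &d->devices);
}

static void detach(struct master *m)
{
	if (m->domain) {
		list_del(&m->list);
		m->domain = NULL;
	}
}

/* Walk the device list, as an ATC invalidation would; return count visited. */
static int invalidate_domain(struct domain *d)
{
	int n = 0;
	struct list_head *p;

	for (p = d->devices.next; p != &d->devices; p = p->next) {
		struct master *m = (struct master *)
			((char *)p - offsetof(struct master, list));
		m->atc_generation++;	/* where CMD_ATC_INV would be issued */
		n++;
	}
	return n;
}
```

Unmapping a range from the domain then only has to walk this list, which is
why the list must stay consistent with attach/detach at all times.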

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 22a6b08ef014..ecc424b15749 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -644,6 +644,11 @@ struct arm_smmu_device {
 struct arm_smmu_master_data {
 	struct arm_smmu_device		*smmu;
 	struct arm_smmu_strtab_ent	ste;
+
+	struct arm_smmu_domain		*domain;
+	struct list_head		list; /* domain->devices */
+
+	struct device			*dev;
 };
 
 /* SMMU private data for an IOMMU domain */
@@ -667,6 +672,9 @@ struct arm_smmu_domain {
 	};
 
 	struct iommu_domain		domain;
+
+	struct list_head		devices;
+	spinlock_t			devices_lock;
 };
 
 struct arm_smmu_option_prop {
@@ -1461,6 +1469,9 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 	}
 
 	mutex_init(&smmu_domain->init_mutex);
+	INIT_LIST_HEAD(&smmu_domain->devices);
+	spin_lock_init(&smmu_domain->devices_lock);
+
 	return &smmu_domain->domain;
 }
 
@@ -1666,7 +1677,17 @@ static void arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec)
 
 static void arm_smmu_detach_dev(struct device *dev)
 {
+	unsigned long flags;
 	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
+	struct arm_smmu_domain *smmu_domain = master->domain;
+
+	if (smmu_domain) {
+		spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+		list_del(&master->list);
+		spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+
+		master->domain = NULL;
+	}
 
 	master->ste.assigned = false;
 	arm_smmu_install_ste_for_dev(dev->iommu_fwspec);
@@ -1675,6 +1696,7 @@ static void arm_smmu_detach_dev(struct device *dev)
 static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
 	int ret = 0;
+	unsigned long flags;
 	struct arm_smmu_device *smmu;
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_master_data *master;
@@ -1710,6 +1732,11 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	}
 
 	ste->assigned = true;
+	master->domain = smmu_domain;
+
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_add(&master->list, &smmu_domain->devices);
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_BYPASS) {
 		ste->s1_cfg = NULL;
@@ -1820,6 +1847,7 @@ static int arm_smmu_add_device(struct device *dev)
 			return -ENOMEM;
 
 		master->smmu = smmu;
+		master->dev = dev;
 		fwspec->iommu_priv = master;
 	}
 
-- 
2.13.3


^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 12/36] dt-bindings: document stall and PASID properties for IOMMU masters
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

On ARM systems, some platform devices behind an IOMMU may support stall
and PASID features. Stall is the ability to recover from page faults and
PASID offers multiple process address spaces to the device. Together they
allow a device to do paging. Let the firmware tell us when a device
supports stall and PASID.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 Documentation/devicetree/bindings/iommu/iommu.txt | 24 +++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/iommu.txt b/Documentation/devicetree/bindings/iommu/iommu.txt
index 5a8b4624defc..c589b75f7277 100644
--- a/Documentation/devicetree/bindings/iommu/iommu.txt
+++ b/Documentation/devicetree/bindings/iommu/iommu.txt
@@ -86,6 +86,30 @@ have a means to turn off translation. But it is invalid in such cases to
 disable the IOMMU's device tree node in the first place because it would
 prevent any driver from properly setting up the translations.
 
+Optional properties:
+--------------------
+- dma-can-stall: When present, the master can wait for a transaction to
+  complete for an indefinite amount of time. Upon translation fault some
+  IOMMUs, instead of aborting the translation immediately, may first
+  notify the driver and keep the transaction in flight. This allows the OS
+  to inspect the fault and, for example, make physical pages resident
+  before updating the mappings and completing the transaction. Such an
+  IOMMU accepts a limited number of simultaneous stalled transactions
+  before having to either put back-pressure on the master or abort new
+  faulting transactions.
+
+  Firmware has to opt in to stalling, because most buses and masters don't
+  support it. In particular it isn't compatible with PCI, where
+  transactions have to complete before a time limit. More generally it
+  won't work in systems and masters that haven't been designed for
+  stalling. For example, the OS, in order to handle a stalled transaction,
+  may attempt to retrieve pages from secondary storage in a stalled
+  domain, leading to a deadlock.
+
+- pasid-bits: Some masters support multiple address spaces for DMA, by
+  tagging DMA transactions with an address space identifier. This property
+  gives the number of PASID bits supported by the master. When absent it
+  defaults to 0, meaning that the device only has one address space.
+
 
 Notes:
 ======
-- 
2.13.3


^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 13/36] iommu/of: Add stall and pasid properties to iommu_fwspec
  2017-10-06 13:31 ` Jean-Philippe Brucker
  (?)
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

Add stall and pasid properties to iommu_fwspec, and fill them when
dma-can-stall and pasid-bits properties are present in the device tree.
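
The parsing added here is small: "dma-can-stall" is a boolean property
(present or absent) and "pasid-bits" is a single big-endian cell. A
userspace model of that hunk, with be32_to_cpu replaced by explicit byte
assembly (the struct and function names below are illustrative, not the
kernel's):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Device-tree property cells are big-endian 32-bit values. */
static uint32_t be32_cell_to_cpu(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Model of the two fields this patch adds to iommu_fwspec. */
struct fwspec_model {
	unsigned int num_pasid_bits;
	bool can_stall;
};

/*
 * Model of the of_iommu_configure() hunk: @pasid_prop is the raw
 * "pasid-bits" cell (or NULL when the property is absent), @has_stall
 * reflects the presence of the boolean "dma-can-stall" property.
 */
static void parse_master_props(struct fwspec_model *fw,
			       const uint8_t *pasid_prop, bool has_stall)
{
	fw->can_stall = has_stall;
	fw->num_pasid_bits = pasid_prop ? be32_cell_to_cpu(pasid_prop) : 0;
}
```

An absent "pasid-bits" property therefore yields num_pasid_bits == 0, the
"single address space" default described in the binding.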

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/of_iommu.c | 10 ++++++++++
 include/linux/iommu.h    |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 50947ebb6d17..345286bfdbfc 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -204,6 +204,16 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
 			if (err)
 				break;
 		}
+
+		if (!err && dev->iommu_fwspec) {
+			const __be32 *prop;
+			if (of_get_property(master_np, "dma-can-stall", NULL))
+				dev->iommu_fwspec->can_stall = true;
+
+			prop = of_get_property(master_np, "pasid-bits", NULL);
+			if (prop)
+				dev->iommu_fwspec->num_pasid_bits = be32_to_cpu(*prop);
+		}
 	}
 
 	/*
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a6d417785c7b..2eb65d4724bb 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -535,6 +535,8 @@ struct iommu_fwspec {
 	struct fwnode_handle	*iommu_fwnode;
 	void			*iommu_priv;
 	unsigned int		num_ids;
+	unsigned int		num_pasid_bits;
+	bool			can_stall;
 	u32			ids[1];
 };
 
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for Substream IDs
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

At the moment, the SMMUv3 driver offers only one stage-1 or stage-2
address space to each device. The SMMUv3 architecture allows multiple
address spaces to be associated with each device. In addition to the
Stream ID (SID), which identifies a device, we can now have Substream IDs
(SSID) identifying an address space.
In PCIe lingo, SID is called Requester ID (RID) and SSID is called Process
Address-Space ID (PASID).

Prepare the driver for SSID support, by adding context descriptor tables
in STEs (previously a single static context descriptor). A complete
stage-1 walk is now performed like this by the SMMU:

      Stream tables          Ctx. tables          Page tables
        +--------+   ,------->+-------+   ,------->+-------+
        :        :   |        :       :   |        :       :
        +--------+   |        +-------+   |        +-------+
   SID->|  STE   |---'  SSID->|  CD   |---'  IOVA->|  PTE  |--> IPA
        +--------+            +-------+            +-------+
        :        :            :       :            :       :
        +--------+            +-------+            +-------+

SSIDs are allocated by the core.

Note that we only implement one level of context descriptor table for now,
but as with stream and page tables, an SSID can be split to target
multiple levels of tables.

In all stream table entries, we set S1DSS=SSID0 mode, which forces all
traffic lacking an SSID to be routed to context descriptor 0.
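
With the linear (single-level) CD table implemented here, the SSID simply
indexes an array of context descriptors, and the number of usable contexts
is bounded by both the SMMU's SSID width and the master's PASID width. A
small sketch of that arithmetic (helper names are illustrative;
CTXDESC_CD_DWORDS matches the driver's 8-dword, 64-byte descriptor):

```c
#include <stddef.h>
#include <stdint.h>

#define CTXDESC_CD_DWORDS	8	/* one context descriptor is 64 bytes */

/* Contexts a stream supports: capped by both SMMU and master capabilities. */
static unsigned int num_contexts(unsigned int smmu_ssid_bits,
				 unsigned int master_pasid_bits)
{
	unsigned int bits = master_pasid_bits < smmu_ssid_bits ?
			    master_pasid_bits : smmu_ssid_bits;

	return 1u << bits;
}

/* Byte offset of the CD for @ssid in a linear, single-level CD table. */
static size_t cd_offset(uint32_t ssid)
{
	return (size_t)ssid * CTXDESC_CD_DWORDS * sizeof(uint64_t);
}
```

This is the indexing performed by arm_smmu_write_ctx_desc() when it
computes `cdptr + ssid * CTXDESC_CD_DWORDS`; a later multi-level scheme
would instead split the SSID across table levels, as stream tables do.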

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 228 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 188 insertions(+), 40 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index ecc424b15749..37061e1cbae4 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -244,6 +244,12 @@
 #define STRTAB_STE_0_S1CDMAX_SHIFT	59
 #define STRTAB_STE_0_S1CDMAX_MASK	0x1fUL
 
+#define STRTAB_STE_1_S1DSS_SHIFT	0
+#define STRTAB_STE_1_S1DSS_MASK		0x3UL
+#define STRTAB_STE_1_S1DSS_TERMINATE	(0x0 << STRTAB_STE_1_S1DSS_SHIFT)
+#define STRTAB_STE_1_S1DSS_BYPASS	(0x1 << STRTAB_STE_1_S1DSS_SHIFT)
+#define STRTAB_STE_1_S1DSS_SSID0	(0x2 << STRTAB_STE_1_S1DSS_SHIFT)
+
 #define STRTAB_STE_1_S1C_CACHE_NC	0UL
 #define STRTAB_STE_1_S1C_CACHE_WBRA	1UL
 #define STRTAB_STE_1_S1C_CACHE_WT	2UL
@@ -349,10 +355,14 @@
 #define CMDQ_0_OP_MASK			0xffUL
 #define CMDQ_0_SSV			(1UL << 11)
 
+#define CMDQ_PREFETCH_0_SSID_SHIFT	12
+#define CMDQ_PREFETCH_0_SSID_MASK	0xfffffUL
 #define CMDQ_PREFETCH_0_SID_SHIFT	32
 #define CMDQ_PREFETCH_1_SIZE_SHIFT	0
 #define CMDQ_PREFETCH_1_ADDR_MASK	~0xfffUL
 
+#define CMDQ_CFGI_0_SSID_SHIFT		12
+#define CMDQ_CFGI_0_SSID_MASK		0xfffffUL
 #define CMDQ_CFGI_0_SID_SHIFT		32
 #define CMDQ_CFGI_0_SID_MASK		0xffffffffUL
 #define CMDQ_CFGI_1_LEAF		(1UL << 0)
@@ -469,14 +479,18 @@ struct arm_smmu_cmdq_ent {
 		#define CMDQ_OP_PREFETCH_CFG	0x1
 		struct {
 			u32			sid;
+			u32			ssid;
 			u8			size;
 			u64			addr;
 		} prefetch;
 
 		#define CMDQ_OP_CFGI_STE	0x3
 		#define CMDQ_OP_CFGI_ALL	0x4
+		#define CMDQ_OP_CFGI_CD		0x5
+		#define CMDQ_OP_CFGI_CD_ALL	0x6
 		struct {
 			u32			sid;
+			u32			ssid;
 			union {
 				bool		leaf;
 				u8		span;
@@ -546,16 +560,20 @@ struct arm_smmu_strtab_l1_desc {
 	dma_addr_t			l2ptr_dma;
 };
 
+struct arm_smmu_ctx_desc {
+	u16				asid;
+	u64				ttbr;
+	u64				tcr;
+	u64				mair;
+};
+
 struct arm_smmu_s1_cfg {
 	__le64				*cdptr;
 	dma_addr_t			cdptr_dma;
 
-	struct arm_smmu_ctx_desc {
-		u16	asid;
-		u64	ttbr;
-		u64	tcr;
-		u64	mair;
-	}				cd;
+	size_t				num_contexts;
+
+	struct arm_smmu_ctx_desc	cd; /* Default context (SSID0) */
 };
 
 struct arm_smmu_s2_cfg {
@@ -649,6 +667,8 @@ struct arm_smmu_master_data {
 	struct list_head		list; /* domain->devices */
 
 	struct device			*dev;
+
+	size_t				num_ssids;
 };
 
 /* SMMU private data for an IOMMU domain */
@@ -840,14 +860,22 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 	case CMDQ_OP_TLBI_NSNH_ALL:
 		break;
 	case CMDQ_OP_PREFETCH_CFG:
+		cmd[0] |= ent->substream_valid ? CMDQ_0_SSV : 0;
 		cmd[0] |= (u64)ent->prefetch.sid << CMDQ_PREFETCH_0_SID_SHIFT;
+		cmd[0] |= ent->prefetch.ssid << CMDQ_PREFETCH_0_SSID_SHIFT;
 		cmd[1] |= ent->prefetch.size << CMDQ_PREFETCH_1_SIZE_SHIFT;
 		cmd[1] |= ent->prefetch.addr & CMDQ_PREFETCH_1_ADDR_MASK;
 		break;
+	case CMDQ_OP_CFGI_CD:
+		cmd[0] |= ent->cfgi.ssid << CMDQ_CFGI_0_SSID_SHIFT;
+		/* pass through */
 	case CMDQ_OP_CFGI_STE:
 		cmd[0] |= (u64)ent->cfgi.sid << CMDQ_CFGI_0_SID_SHIFT;
 		cmd[1] |= ent->cfgi.leaf ? CMDQ_CFGI_1_LEAF : 0;
 		break;
+	case CMDQ_OP_CFGI_CD_ALL:
+		cmd[0] |= (u64)ent->cfgi.sid << CMDQ_CFGI_0_SID_SHIFT;
+		break;
 	case CMDQ_OP_CFGI_ALL:
 		/* Cover the entire SID range */
 		cmd[1] |= CMDQ_CFGI_1_RANGE_MASK << CMDQ_CFGI_1_RANGE_SHIFT;
@@ -972,6 +1000,35 @@ static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
 }
 
 /* Context descriptor manipulation functions */
+static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain, u32 ssid)
+{
+	size_t i;
+	unsigned long flags;
+	struct arm_smmu_master_data *master;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct arm_smmu_cmdq_ent cmd = {
+		.opcode = CMDQ_OP_CFGI_CD,
+		.cfgi   = {
+			.ssid   = ssid,
+			.leaf   = true,
+		},
+	};
+
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_for_each_entry(master, &smmu_domain->devices, list) {
+		struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+
+		for (i = 0; i < fwspec->num_ids; i++) {
+			cmd.cfgi.sid = fwspec->ids[i];
+			arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+		}
+	}
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+
+	cmd.opcode = CMDQ_OP_CMD_SYNC;
+	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+}
+
 static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
 {
 	u64 val = 0;
@@ -990,33 +1047,116 @@ static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
 	return val;
 }
 
-static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
-				    struct arm_smmu_s1_cfg *cfg)
+static void arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
+				    u32 ssid, struct arm_smmu_ctx_desc *cd)
 {
 	u64 val;
+	bool cd_live;
+	__u64 *cdptr = (__u64 *)smmu_domain->s1_cfg.cdptr + ssid * CTXDESC_CD_DWORDS;
 
 	/*
-	 * We don't need to issue any invalidation here, as we'll invalidate
-	 * the STE when installing the new entry anyway.
+	 * This function handles the following cases:
+	 *
+	 * (1) Install primary CD, for normal DMA traffic (SSID = 0). In this
+	 *     case, invalidation is performed when installing the STE.
+	 * (2) Install a secondary CD, for SID+SSID traffic, followed by an
+	 *     invalidation.
+	 * (3) Update ASID of primary CD. This is allowed by atomically writing
+	 *     the first 64 bits of the CD, followed by invalidation of the old
+	 *     entry and mappings.
+	 * (4) Remove a secondary CD and invalidate it.
 	 */
-	val = arm_smmu_cpu_tcr_to_cd(cfg->cd.tcr) |
+
+	val = le64_to_cpu(cdptr[0]);
+	cd_live = !!(val & CTXDESC_CD_0_V);
+
+	if (!cd) {
+		/* (4) */
+		cdptr[0] = 0;
+		if (ssid)
+			arm_smmu_sync_cd(smmu_domain, ssid);
+		return;
+	}
+
+	if (cd_live) {
+		/* (3) */
+		val &= ~(CTXDESC_CD_0_ASID_MASK << CTXDESC_CD_0_ASID_SHIFT);
+		val |= (u64)cd->asid << CTXDESC_CD_0_ASID_SHIFT;
+
+		cdptr[0] = cpu_to_le64(val);
+		/*
+		 * Until CD+TLB invalidation, both ASIDs may be used for tagging
+		 * this substream's traffic
+		 */
+
+	} else {
+		/* (1) and (2) */
+		cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK
+				       << CTXDESC_CD_1_TTB0_SHIFT);
+		cdptr[2] = 0;
+		cdptr[3] = cpu_to_le64(cd->mair << CTXDESC_CD_3_MAIR_SHIFT);
+
+		if (ssid)
+			/*
+			 * STE is live, and the SMMU might fetch this CD at any
+			 * time. Ensure it observes the rest of the CD before we
+			 * enable it.
+			 */
+			arm_smmu_sync_cd(smmu_domain, ssid);
+
+		val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
 #ifdef __BIG_ENDIAN
-	      CTXDESC_CD_0_ENDI |
+		      CTXDESC_CD_0_ENDI |
 #endif
-	      CTXDESC_CD_0_R | CTXDESC_CD_0_A | CTXDESC_CD_0_ASET_PRIVATE |
-	      CTXDESC_CD_0_AA64 | (u64)cfg->cd.asid << CTXDESC_CD_0_ASID_SHIFT |
-	      CTXDESC_CD_0_V;
+		      CTXDESC_CD_0_R | CTXDESC_CD_0_A |
+		      CTXDESC_CD_0_ASET_PRIVATE |
+		      CTXDESC_CD_0_AA64 |
+		      (u64)cd->asid << CTXDESC_CD_0_ASID_SHIFT |
+		      CTXDESC_CD_0_V;
+
+		/* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
+		if (smmu_domain->smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
+			val |= CTXDESC_CD_0_S;
+
+		cdptr[0] = cpu_to_le64(val);
+	}
+
+	if (ssid || cd_live)
+		arm_smmu_sync_cd(smmu_domain, ssid);
+}
+
+static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain)
+{
+	int num_ssids;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
+
+	if (WARN_ON(smmu_domain->stage != ARM_SMMU_DOMAIN_S1))
+		return -EINVAL;
+
+	num_ssids = cfg->num_contexts;
+
+	cfg->cdptr = dmam_alloc_coherent(smmu->dev,
+					 num_ssids * (CTXDESC_CD_DWORDS << 3),
+					 &cfg->cdptr_dma,
+					 GFP_KERNEL | __GFP_ZERO);
+	if (!cfg->cdptr)
+		return -ENOMEM;
 
-	/* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
-	if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
-		val |= CTXDESC_CD_0_S;
+	return 0;
+}
 
-	cfg->cdptr[0] = cpu_to_le64(val);
+static void arm_smmu_free_cd_tables(struct arm_smmu_domain *smmu_domain)
+{
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 
-	val = cfg->cd.ttbr & CTXDESC_CD_1_TTB0_MASK << CTXDESC_CD_1_TTB0_SHIFT;
-	cfg->cdptr[1] = cpu_to_le64(val);
+	if (WARN_ON(smmu_domain->stage != ARM_SMMU_DOMAIN_S1))
+		return;
 
-	cfg->cdptr[3] = cpu_to_le64(cfg->cd.mair << CTXDESC_CD_3_MAIR_SHIFT);
+	dmam_free_coherent(smmu->dev,
+			   cfg->num_contexts * (CTXDESC_CD_DWORDS << 3),
+			   cfg->cdptr, cfg->cdptr_dma);
 }
 
 /* Stream table manipulation functions */
@@ -1115,8 +1255,12 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 	}
 
 	if (ste->s1_cfg) {
+		unsigned int s1cdmax = ilog2(ste->s1_cfg->num_contexts);
+
 		BUG_ON(ste_live);
+
 		dst[1] = cpu_to_le64(
+			 STRTAB_STE_1_S1DSS_SSID0 |
 			 STRTAB_STE_1_S1C_CACHE_WBRA
 			 << STRTAB_STE_1_S1CIR_SHIFT |
 			 STRTAB_STE_1_S1C_CACHE_WBRA
@@ -1133,6 +1277,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 
 		val |= (ste->s1_cfg->cdptr_dma & STRTAB_STE_0_S1CTXPTR_MASK
 		        << STRTAB_STE_0_S1CTXPTR_SHIFT) |
+			(u64)(s1cdmax & STRTAB_STE_0_S1CDMAX_MASK)
+			<< STRTAB_STE_0_S1CDMAX_SHIFT |
+			STRTAB_STE_0_S1FMT_LINEAR |
 			STRTAB_STE_0_CFG_S1_TRANS;
 	}
 
@@ -1501,16 +1648,11 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 	iommu_put_dma_cookie(domain);
 	free_io_pgtable_ops(smmu_domain->pgtbl_ops);
 
-	/* Free the CD and ASID, if we allocated them */
+	/* Free the CD table and ASID, if we allocated them */
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
-
-		if (cfg->cdptr) {
-			dmam_free_coherent(smmu_domain->smmu->dev,
-					   CTXDESC_CD_DWORDS << 3,
-					   cfg->cdptr,
-					   cfg->cdptr_dma);
-
+		if (cfg->num_contexts) {
+			arm_smmu_free_cd_tables(smmu_domain);
 			arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
 		}
 	} else {
@@ -1534,14 +1676,9 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 	if (asid < 0)
 		return asid;
 
-	cfg->cdptr = dmam_alloc_coherent(smmu->dev, CTXDESC_CD_DWORDS << 3,
-					 &cfg->cdptr_dma,
-					 GFP_KERNEL | __GFP_ZERO);
-	if (!cfg->cdptr) {
-		dev_warn(smmu->dev, "failed to allocate context descriptor\n");
-		ret = -ENOMEM;
+	ret = arm_smmu_alloc_cd_tables(smmu_domain);
+	if (ret)
 		goto out_free_asid;
-	}
 
 	cfg->cd.asid	= (u16)asid;
 	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
@@ -1571,7 +1708,8 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
 	return 0;
 }
 
-static int arm_smmu_domain_finalise(struct iommu_domain *domain)
+static int arm_smmu_domain_finalise(struct iommu_domain *domain,
+				    struct arm_smmu_master_data *master)
 {
 	int ret;
 	unsigned long ias, oas;
@@ -1600,6 +1738,12 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
 		oas = smmu->ias;
 		fmt = ARM_64_LPAE_S1;
 		finalise_stage_fn = arm_smmu_domain_finalise_s1;
+
+		if (master->num_ssids) {
+			domain->min_pasid = 1;
+			domain->max_pasid = master->num_ssids - 1;
+			smmu_domain->s1_cfg.num_contexts = master->num_ssids;
+		}
 		break;
 	case ARM_SMMU_DOMAIN_NESTED:
 	case ARM_SMMU_DOMAIN_S2:
@@ -1717,7 +1861,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 
 	if (!smmu_domain->smmu) {
 		smmu_domain->smmu = smmu;
-		ret = arm_smmu_domain_finalise(domain);
+		ret = arm_smmu_domain_finalise(domain, master);
 		if (ret) {
 			smmu_domain->smmu = NULL;
 			goto out_unlock;
@@ -1744,7 +1888,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		ste->s1_cfg = &smmu_domain->s1_cfg;
 		ste->s2_cfg = NULL;
-		arm_smmu_write_ctx_desc(smmu, ste->s1_cfg);
+		arm_smmu_write_ctx_desc(smmu_domain, 0, &ste->s1_cfg->cd);
 	} else {
 		ste->s1_cfg = NULL;
 		ste->s2_cfg = &smmu_domain->s2_cfg;
@@ -1866,6 +2010,10 @@ static int arm_smmu_add_device(struct device *dev)
 		}
 	}
 
+	if (smmu->ssid_bits)
+		master->num_ssids = 1 << min(smmu->ssid_bits,
+					     fwspec->num_pasid_bits);
+
 	group = iommu_group_get_for_dev(dev);
 	if (!IS_ERR(group)) {
 		iommu_group_put(group);
-- 
2.13.3


^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 15/36] iommu/arm-smmu-v3: Add second level of context descriptor table
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

The SMMU can support up to 20 bits of SSID. Add a second level of context
descriptor tables to accommodate this. Devices that support more than 1024
SSIDs now have a table of 1024 L1 entries (8kB), pointing to tables of 1024
context descriptors (64kB), allocated on demand.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 198 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 179 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 37061e1cbae4..c444f9e83b91 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -239,6 +239,8 @@
 
 #define STRTAB_STE_0_S1FMT_SHIFT	4
 #define STRTAB_STE_0_S1FMT_LINEAR	(0UL << STRTAB_STE_0_S1FMT_SHIFT)
+#define STRTAB_STE_0_S1FMT_4K_L2	(1UL << STRTAB_STE_0_S1FMT_SHIFT)
+#define STRTAB_STE_0_S1FMT_64K_L2	(2UL << STRTAB_STE_0_S1FMT_SHIFT)
 #define STRTAB_STE_0_S1CTXPTR_SHIFT	6
 #define STRTAB_STE_0_S1CTXPTR_MASK	0x3ffffffffffUL
 #define STRTAB_STE_0_S1CDMAX_SHIFT	59
@@ -287,7 +289,21 @@
 #define STRTAB_STE_3_S2TTB_SHIFT	4
 #define STRTAB_STE_3_S2TTB_MASK		0xfffffffffffUL
 
-/* Context descriptor (stage-1 only) */
+/*
+ * Context descriptor
+ *
+ * Linear: used when at most 1024 SSIDs are supported
+ * 2lvl: at most 1024 L1 entries,
+ *	 1024 lazy entries per table.
+ */
+#define CTXDESC_SPLIT			10
+#define CTXDESC_NUM_L2_ENTRIES		(1 << CTXDESC_SPLIT)
+
+#define CTXDESC_L1_DESC_DWORD		1
+#define CTXDESC_L1_DESC_VALID		1
+#define CTXDESC_L1_DESC_L2PTR_SHIFT	12
+#define CTXDESC_L1_DESC_L2PTR_MASK	0xfffffffffUL
+
 #define CTXDESC_CD_DWORDS		8
 #define CTXDESC_CD_0_TCR_T0SZ_SHIFT	0
 #define ARM64_TCR_T0SZ_SHIFT		0
@@ -567,9 +583,24 @@ struct arm_smmu_ctx_desc {
 	u64				mair;
 };
 
-struct arm_smmu_s1_cfg {
+struct arm_smmu_cd_table {
 	__le64				*cdptr;
 	dma_addr_t			cdptr_dma;
+};
+
+struct arm_smmu_s1_cfg {
+	bool				linear;
+
+	union {
+		struct arm_smmu_cd_table table;
+		struct {
+			__le64		*ptr;
+			dma_addr_t	ptr_dma;
+			size_t		num_entries;
+
+			struct arm_smmu_cd_table *tables;
+		} l1;
+	};
 
 	size_t				num_contexts;
 
@@ -1000,7 +1031,8 @@ static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
 }
 
 /* Context descriptor manipulation functions */
-static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain, u32 ssid)
+static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain, u32 ssid,
+			     bool leaf)
 {
 	size_t i;
 	unsigned long flags;
@@ -1010,7 +1042,7 @@ static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain, u32 ssid)
 		.opcode = CMDQ_OP_CFGI_CD,
 		.cfgi   = {
 			.ssid   = ssid,
-			.leaf   = true,
+			.leaf   = leaf,
 		},
 	};
 
@@ -1029,6 +1061,69 @@ static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain, u32 ssid)
 	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
 }
 
+static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
+					struct arm_smmu_cd_table *desc,
+					size_t num_entries)
+{
+	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
+
+	desc->cdptr = dmam_alloc_coherent(smmu->dev, size, &desc->cdptr_dma,
+					  GFP_ATOMIC | __GFP_ZERO);
+	if (!desc->cdptr)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void arm_smmu_free_cd_leaf_table(struct arm_smmu_device *smmu,
+					struct arm_smmu_cd_table *desc,
+					size_t num_entries)
+{
+	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
+
+	dmam_free_coherent(smmu->dev, size, desc->cdptr, desc->cdptr_dma);
+}
+
+static void arm_smmu_write_cd_l1_desc(__le64 *dst,
+				      struct arm_smmu_cd_table *table)
+{
+	u64 val = (table->cdptr_dma & CTXDESC_L1_DESC_L2PTR_MASK
+		  << CTXDESC_L1_DESC_L2PTR_SHIFT) | CTXDESC_L1_DESC_VALID;
+
+	*dst = cpu_to_le64(val);
+}
+
+static __u64 *arm_smmu_get_cd_ptr(struct arm_smmu_domain *smmu_domain, u32 ssid)
+{
+	unsigned long idx;
+	struct arm_smmu_cd_table *l1_desc;
+	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
+
+	if (cfg->linear)
+		return cfg->table.cdptr + ssid * CTXDESC_CD_DWORDS;
+
+	idx = ssid >> CTXDESC_SPLIT;
+	if (idx >= cfg->l1.num_entries)
+		return NULL;
+
+	l1_desc = &cfg->l1.tables[idx];
+	if (!l1_desc->cdptr) {
+		__le64 *l1ptr = cfg->l1.ptr + idx * CTXDESC_L1_DESC_DWORD;
+
+		if (arm_smmu_alloc_cd_leaf_table(smmu_domain->smmu, l1_desc,
+						 CTXDESC_NUM_L2_ENTRIES))
+			return NULL;
+
+		arm_smmu_write_cd_l1_desc(l1ptr, l1_desc);
+		/* An invalid L1 entry is allowed to be cached */
+		arm_smmu_sync_cd(smmu_domain, idx << CTXDESC_SPLIT, false);
+	}
+
+	idx = ssid & (CTXDESC_NUM_L2_ENTRIES - 1);
+
+	return l1_desc->cdptr + idx * CTXDESC_CD_DWORDS;
+}
+
 static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
 {
 	u64 val = 0;
@@ -1052,7 +1147,7 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 {
 	u64 val;
 	bool cd_live;
-	__u64 *cdptr = (__u64 *)smmu_domain->s1_cfg.cdptr + ssid * CTXDESC_CD_DWORDS;
+	__u64 *cdptr = arm_smmu_get_cd_ptr(smmu_domain, ssid);
 
 	/*
 	 * This function handles the following cases:
@@ -1067,6 +1162,9 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 	 * (4) Remove a secondary CD and invalidate it.
 	 */
 
+	if (WARN_ON(!cdptr))
+		return;
+
 	val = le64_to_cpu(cdptr[0]);
 	cd_live = !!(val & CTXDESC_CD_0_V);
 
@@ -1074,7 +1172,7 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 		/* (4) */
 		cdptr[0] = 0;
 		if (ssid)
-			arm_smmu_sync_cd(smmu_domain, ssid);
+			arm_smmu_sync_cd(smmu_domain, ssid, true);
 		return;
 	}
 
@@ -1102,7 +1200,7 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 			 * time. Ensure it observes the rest of the CD before we
 			 * enable it.
 			 */
-			arm_smmu_sync_cd(smmu_domain, ssid);
+			arm_smmu_sync_cd(smmu_domain, ssid, true);
 
 		val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
 #ifdef __BIG_ENDIAN
@@ -1122,12 +1220,15 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 	}
 
 	if (ssid || cd_live)
-		arm_smmu_sync_cd(smmu_domain, ssid);
+		arm_smmu_sync_cd(smmu_domain, ssid, true);
 }
 
 static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain)
 {
+	int ret;
 	int num_ssids;
+	size_t num_leaf_entries, size = 0;
+	struct arm_smmu_cd_table *leaf_table;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 
@@ -1135,28 +1236,80 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain)
 		return -EINVAL;
 
 	num_ssids = cfg->num_contexts;
+	if (num_ssids <= CTXDESC_NUM_L2_ENTRIES) {
+		/* Fits in a single table */
+		cfg->linear = true;
+		num_leaf_entries = num_ssids;
+		leaf_table = &cfg->table;
+	} else {
+		/*
+		 * SSID[S1CDmax-1:10] indexes 1st-level table, SSID[9:0] indexes
+		 * 2nd-level
+		 */
+		cfg->linear = false;
+		cfg->l1.num_entries = num_ssids / CTXDESC_NUM_L2_ENTRIES;
 
-	cfg->cdptr = dmam_alloc_coherent(smmu->dev,
-					 num_ssids * (CTXDESC_CD_DWORDS << 3),
-					 &cfg->cdptr_dma,
-					 GFP_KERNEL | __GFP_ZERO);
-	if (!cfg->cdptr)
-		return -ENOMEM;
+		cfg->l1.tables = devm_kzalloc(smmu->dev,
+					      sizeof(struct arm_smmu_cd_table) *
+					      cfg->l1.num_entries, GFP_KERNEL);
+		if (!cfg->l1.tables)
+			return -ENOMEM;
+
+		size = cfg->l1.num_entries * (CTXDESC_L1_DESC_DWORD << 3);
+		cfg->l1.ptr = dmam_alloc_coherent(smmu->dev, size,
+						  &cfg->l1.ptr_dma,
+						  GFP_KERNEL | __GFP_ZERO);
+		if (!cfg->l1.ptr) {
+			devm_kfree(smmu->dev, cfg->l1.tables);
+			return -ENOMEM;
+		}
+
+		num_leaf_entries = CTXDESC_NUM_L2_ENTRIES;
+		leaf_table = cfg->l1.tables;
+	}
+
+	ret = arm_smmu_alloc_cd_leaf_table(smmu, leaf_table, num_leaf_entries);
+	if (ret) {
+		if (!cfg->linear) {
+			dmam_free_coherent(smmu->dev, size, cfg->l1.ptr,
+					   cfg->l1.ptr_dma);
+			devm_kfree(smmu->dev, cfg->l1.tables);
+		}
+
+		return ret;
+	}
+
+	if (!cfg->linear)
+		arm_smmu_write_cd_l1_desc(cfg->l1.ptr, leaf_table);
 
 	return 0;
 }
 
 static void arm_smmu_free_cd_tables(struct arm_smmu_domain *smmu_domain)
 {
+	size_t i, size;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 
 	if (WARN_ON(smmu_domain->stage != ARM_SMMU_DOMAIN_S1))
 		return;
 
-	dmam_free_coherent(smmu->dev,
-			   cfg->num_contexts * (CTXDESC_CD_DWORDS << 3),
-			   cfg->cdptr, cfg->cdptr_dma);
+	if (cfg->linear) {
+		arm_smmu_free_cd_leaf_table(smmu, &cfg->table, cfg->num_contexts);
+	} else {
+		for (i = 0; i < cfg->l1.num_entries; i++) {
+			struct arm_smmu_cd_table *desc = &cfg->l1.tables[i];
+
+			if (!desc->cdptr)
+				continue;
+
+			arm_smmu_free_cd_leaf_table(smmu, desc,
+						    CTXDESC_NUM_L2_ENTRIES);
+		}
+
+		size = cfg->l1.num_entries * (CTXDESC_L1_DESC_DWORD << 3);
+		dmam_free_coherent(smmu->dev, size, cfg->l1.ptr, cfg->l1.ptr_dma);
+	}
 }
 
 /* Stream table manipulation functions */
@@ -1255,10 +1408,16 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 	}
 
 	if (ste->s1_cfg) {
+		dma_addr_t s1ctxptr;
 		unsigned int s1cdmax = ilog2(ste->s1_cfg->num_contexts);
 
 		BUG_ON(ste_live);
 
+		if (ste->s1_cfg->linear)
+			s1ctxptr = ste->s1_cfg->table.cdptr_dma;
+		else
+			s1ctxptr = ste->s1_cfg->l1.ptr_dma;
+
 		dst[1] = cpu_to_le64(
 			 STRTAB_STE_1_S1DSS_SSID0 |
 			 STRTAB_STE_1_S1C_CACHE_WBRA
@@ -1275,11 +1434,12 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
 			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 
-		val |= (ste->s1_cfg->cdptr_dma & STRTAB_STE_0_S1CTXPTR_MASK
+		val |= (s1ctxptr & STRTAB_STE_0_S1CTXPTR_MASK
 		        << STRTAB_STE_0_S1CTXPTR_SHIFT) |
 			(u64)(s1cdmax & STRTAB_STE_0_S1CDMAX_MASK)
 			<< STRTAB_STE_0_S1CDMAX_SHIFT |
-			STRTAB_STE_0_S1FMT_LINEAR |
+			(ste->s1_cfg->linear ? STRTAB_STE_0_S1FMT_LINEAR :
+			   STRTAB_STE_0_S1FMT_64K_L2) |
 			STRTAB_STE_0_CFG_S1_TRANS;
 	}
 
-- 
2.13.3


^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 15/36] iommu/arm-smmu-v3: Add second level of context descriptor table
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel

The SMMU can support up to 20 bits of SSID. Add a second level of page
tables to accommodate this. Devices that support more than 1024 SSIDs now
have a table of 1024 L1 entries (8kB), pointing to tables of 1024 context
descriptors (64kB), allocated on demand.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 198 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 179 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 37061e1cbae4..c444f9e83b91 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -239,6 +239,8 @@
 
 #define STRTAB_STE_0_S1FMT_SHIFT	4
 #define STRTAB_STE_0_S1FMT_LINEAR	(0UL << STRTAB_STE_0_S1FMT_SHIFT)
+#define STRTAB_STE_0_S1FMT_4K_L2	(1UL << STRTAB_STE_0_S1FMT_SHIFT)
+#define STRTAB_STE_0_S1FMT_64K_L2	(2UL << STRTAB_STE_0_S1FMT_SHIFT)
 #define STRTAB_STE_0_S1CTXPTR_SHIFT	6
 #define STRTAB_STE_0_S1CTXPTR_MASK	0x3ffffffffffUL
 #define STRTAB_STE_0_S1CDMAX_SHIFT	59
@@ -287,7 +289,21 @@
 #define STRTAB_STE_3_S2TTB_SHIFT	4
 #define STRTAB_STE_3_S2TTB_MASK		0xfffffffffffUL
 
-/* Context descriptor (stage-1 only) */
+/*
+ * Context descriptor
+ *
+ * Linear: when less than 1024 SSIDs are supported
+ * 2lvl: at most 1024 L1 entries,
+ *	 1024 lazy entries per table.
+ */
+#define CTXDESC_SPLIT			10
+#define CTXDESC_NUM_L2_ENTRIES		(1 << CTXDESC_SPLIT)
+
+#define CTXDESC_L1_DESC_DWORD		1
+#define CTXDESC_L1_DESC_VALID		1
+#define CTXDESC_L1_DESC_L2PTR_SHIFT	12
+#define CTXDESC_L1_DESC_L2PTR_MASK	0xfffffffffUL
+
 #define CTXDESC_CD_DWORDS		8
 #define CTXDESC_CD_0_TCR_T0SZ_SHIFT	0
 #define ARM64_TCR_T0SZ_SHIFT		0
@@ -567,9 +583,24 @@ struct arm_smmu_ctx_desc {
 	u64				mair;
 };
 
-struct arm_smmu_s1_cfg {
+struct arm_smmu_cd_table {
 	__le64				*cdptr;
 	dma_addr_t			cdptr_dma;
+};
+
+struct arm_smmu_s1_cfg {
+	bool				linear;
+
+	union {
+		struct arm_smmu_cd_table table;
+		struct {
+			__le64		*ptr;
+			dma_addr_t	ptr_dma;
+			size_t		num_entries;
+
+			struct arm_smmu_cd_table *tables;
+		} l1;
+	};
 
 	size_t				num_contexts;
 
@@ -1000,7 +1031,8 @@ static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
 }
 
 /* Context descriptor manipulation functions */
-static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain, u32 ssid)
+static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain, u32 ssid,
+			     bool leaf)
 {
 	size_t i;
 	unsigned long flags;
@@ -1010,7 +1042,7 @@ static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain, u32 ssid)
 		.opcode = CMDQ_OP_CFGI_CD,
 		.cfgi   = {
 			.ssid   = ssid,
-			.leaf   = true,
+			.leaf   = leaf,
 		},
 	};
 
@@ -1029,6 +1061,69 @@ static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain, u32 ssid)
 	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
 }
 
+static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
+					struct arm_smmu_cd_table *desc,
+					size_t num_entries)
+{
+	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
+
+	desc->cdptr = dmam_alloc_coherent(smmu->dev, size, &desc->cdptr_dma,
+					  GFP_ATOMIC | __GFP_ZERO);
+	if (!desc->cdptr)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void arm_smmu_free_cd_leaf_table(struct arm_smmu_device *smmu,
+					struct arm_smmu_cd_table *desc,
+					size_t num_entries)
+{
+	size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
+
+	dmam_free_coherent(smmu->dev, size, desc->cdptr, desc->cdptr_dma);
+}
+
+static void arm_smmu_write_cd_l1_desc(__le64 *dst,
+				      struct arm_smmu_cd_table *table)
+{
+	u64 val = (table->cdptr_dma & CTXDESC_L1_DESC_L2PTR_MASK
+		  << CTXDESC_L1_DESC_L2PTR_SHIFT) | CTXDESC_L1_DESC_VALID;
+
+	*dst = cpu_to_le64(val);
+}
+
+static __u64 *arm_smmu_get_cd_ptr(struct arm_smmu_domain *smmu_domain, u32 ssid)
+{
+	unsigned long idx;
+	struct arm_smmu_cd_table *l1_desc;
+	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
+
+	if (cfg->linear)
+		return cfg->table.cdptr + ssid * CTXDESC_CD_DWORDS;
+
+	idx = ssid >> CTXDESC_SPLIT;
+	if (idx >= cfg->l1.num_entries)
+		return NULL;
+
+	l1_desc = &cfg->l1.tables[idx];
+	if (!l1_desc->cdptr) {
+		__le64 *l1ptr = cfg->l1.ptr + idx * CTXDESC_L1_DESC_DWORD;
+
+		if (arm_smmu_alloc_cd_leaf_table(smmu_domain->smmu, l1_desc,
+						 CTXDESC_NUM_L2_ENTRIES))
+			return NULL;
+
+		arm_smmu_write_cd_l1_desc(l1ptr, l1_desc);
+		/* An invalid L1 entry is allowed to be cached */
+		arm_smmu_sync_cd(smmu_domain, idx << CTXDESC_SPLIT, false);
+	}
+
+	idx = ssid & (CTXDESC_NUM_L2_ENTRIES - 1);
+
+	return l1_desc->cdptr + idx * CTXDESC_CD_DWORDS;
+}
+
 static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
 {
 	u64 val = 0;
@@ -1052,7 +1147,7 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 {
 	u64 val;
 	bool cd_live;
-	__u64 *cdptr = (__u64 *)smmu_domain->s1_cfg.cdptr + ssid * CTXDESC_CD_DWORDS;
+	__u64 *cdptr = arm_smmu_get_cd_ptr(smmu_domain, ssid);
 
 	/*
 	 * This function handles the following cases:
@@ -1067,6 +1162,9 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 	 * (4) Remove a secondary CD and invalidate it.
 	 */
 
+	if (WARN_ON(!cdptr))
+		return;
+
 	val = le64_to_cpu(cdptr[0]);
 	cd_live = !!(val & CTXDESC_CD_0_V);
 
@@ -1074,7 +1172,7 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 		/* (4) */
 		cdptr[0] = 0;
 		if (ssid)
-			arm_smmu_sync_cd(smmu_domain, ssid);
+			arm_smmu_sync_cd(smmu_domain, ssid, true);
 		return;
 	}
 
@@ -1102,7 +1200,7 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 			 * time. Ensure it observes the rest of the CD before we
 			 * enable it.
 			 */
-			arm_smmu_sync_cd(smmu_domain, ssid);
+			arm_smmu_sync_cd(smmu_domain, ssid, true);
 
 		val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
 #ifdef __BIG_ENDIAN
@@ -1122,12 +1220,15 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 	}
 
 	if (ssid || cd_live)
-		arm_smmu_sync_cd(smmu_domain, ssid);
+		arm_smmu_sync_cd(smmu_domain, ssid, true);
 }
 
 static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain)
 {
+	int ret;
 	int num_ssids;
+	size_t num_leaf_entries, size = 0;
+	struct arm_smmu_cd_table *leaf_table;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 
@@ -1135,28 +1236,80 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain)
 		return -EINVAL;
 
 	num_ssids = cfg->num_contexts;
+	if (num_ssids <= CTXDESC_NUM_L2_ENTRIES) {
+		/* Fits in a single table */
+		cfg->linear = true;
+		num_leaf_entries = num_ssids;
+		leaf_table = &cfg->table;
+	} else {
+		/*
+		 * SSID[S1CDmax-1:10] indexes 1st-level table, SSID[9:0] indexes
+		 * 2nd-level
+		 */
+		cfg->linear = false;
+		cfg->l1.num_entries = num_ssids / CTXDESC_NUM_L2_ENTRIES;
 
-	cfg->cdptr = dmam_alloc_coherent(smmu->dev,
-					 num_ssids * (CTXDESC_CD_DWORDS << 3),
-					 &cfg->cdptr_dma,
-					 GFP_KERNEL | __GFP_ZERO);
-	if (!cfg->cdptr)
-		return -ENOMEM;
+		cfg->l1.tables = devm_kzalloc(smmu->dev,
+					      sizeof(struct arm_smmu_cd_table) *
+					      cfg->l1.num_entries, GFP_KERNEL);
+		if (!cfg->l1.tables)
+			return -ENOMEM;
+
+		size = cfg->l1.num_entries * (CTXDESC_L1_DESC_DWORD << 3);
+		cfg->l1.ptr = dmam_alloc_coherent(smmu->dev, size,
+						  &cfg->l1.ptr_dma,
+						  GFP_KERNEL | __GFP_ZERO);
+		if (!cfg->l1.ptr) {
+			devm_kfree(smmu->dev, cfg->l1.tables);
+			return -ENOMEM;
+		}
+
+		num_leaf_entries = CTXDESC_NUM_L2_ENTRIES;
+		leaf_table = cfg->l1.tables;
+	}
+
+	ret = arm_smmu_alloc_cd_leaf_table(smmu, leaf_table, num_leaf_entries);
+	if (ret) {
+		if (!cfg->linear) {
+			dmam_free_coherent(smmu->dev, size, cfg->l1.ptr,
+					   cfg->l1.ptr_dma);
+			devm_kfree(smmu->dev, cfg->l1.tables);
+		}
+
+		return ret;
+	}
+
+	if (!cfg->linear)
+		arm_smmu_write_cd_l1_desc(cfg->l1.ptr, leaf_table);
 
 	return 0;
 }
 
 static void arm_smmu_free_cd_tables(struct arm_smmu_domain *smmu_domain)
 {
+	size_t i, size;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 
 	if (WARN_ON(smmu_domain->stage != ARM_SMMU_DOMAIN_S1))
 		return;
 
-	dmam_free_coherent(smmu->dev,
-			   cfg->num_contexts * (CTXDESC_CD_DWORDS << 3),
-			   cfg->cdptr, cfg->cdptr_dma);
+	if (cfg->linear) {
+		arm_smmu_free_cd_leaf_table(smmu, &cfg->table, cfg->num_contexts);
+	} else {
+		for (i = 0; i < cfg->l1.num_entries; i++) {
+			struct arm_smmu_cd_table *desc = &cfg->l1.tables[i];
+
+			if (!desc->cdptr)
+				continue;
+
+			arm_smmu_free_cd_leaf_table(smmu, desc,
+						    CTXDESC_NUM_L2_ENTRIES);
+		}
+
+		size = cfg->l1.num_entries * (CTXDESC_L1_DESC_DWORD << 3);
+		dmam_free_coherent(smmu->dev, size, cfg->l1.ptr, cfg->l1.ptr_dma);
+	}
 }
 
 /* Stream table manipulation functions */
@@ -1255,10 +1408,16 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 	}
 
 	if (ste->s1_cfg) {
+		dma_addr_t s1ctxptr;
 		unsigned int s1cdmax = ilog2(ste->s1_cfg->num_contexts);
 
 		BUG_ON(ste_live);
 
+		if (ste->s1_cfg->linear)
+			s1ctxptr = ste->s1_cfg->table.cdptr_dma;
+		else
+			s1ctxptr = ste->s1_cfg->l1.ptr_dma;
+
 		dst[1] = cpu_to_le64(
 			 STRTAB_STE_1_S1DSS_SSID0 |
 			 STRTAB_STE_1_S1C_CACHE_WBRA
@@ -1275,11 +1434,12 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
 			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 
-		val |= (ste->s1_cfg->cdptr_dma & STRTAB_STE_0_S1CTXPTR_MASK
+		val |= (s1ctxptr & STRTAB_STE_0_S1CTXPTR_MASK
 		        << STRTAB_STE_0_S1CTXPTR_SHIFT) |
 			(u64)(s1cdmax & STRTAB_STE_0_S1CDMAX_MASK)
 			<< STRTAB_STE_0_S1CDMAX_SHIFT |
-			STRTAB_STE_0_S1FMT_LINEAR |
+			(ste->s1_cfg->linear ? STRTAB_STE_0_S1FMT_LINEAR :
+			   STRTAB_STE_0_S1FMT_64K_L2) |
 			STRTAB_STE_0_CFG_S1_TRANS;
 	}
 
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 16/36] iommu/arm-smmu-v3: Add support for VHE
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

The ARMv8.1 extensions added Virtualization Host Extensions (VHE), which
allow the host kernel to run at EL2. When using normal DMA, device and CPU
address spaces are orthogonal and do not need to implement the same
capabilities, so VHE hasn't been used on the SMMU side until now.

With shared address spaces however, ASIDs are shared between MMU and SMMU,
and broadcast TLB invalidations issued by a CPU are taken into account by
the SMMU. TLB entries on both sides need to have the same exception level
for a single invalidation to remove them both.

When the CPU is using VHE, enable VHE in the SMMU and for all streams.
Normal DMA mappings will need to use TLBI_EL2 commands instead of TLBI_NH,
but shouldn't be otherwise affected by this change.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index c444f9e83b91..27376e1193c1 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -22,6 +22,7 @@
 
 #include <linux/acpi.h>
 #include <linux/acpi_iort.h>
+#include <linux/cpufeature.h>
 #include <linux/delay.h>
 #include <linux/dma-iommu.h>
 #include <linux/err.h>
@@ -516,6 +517,8 @@ struct arm_smmu_cmdq_ent {
 		#define CMDQ_OP_TLBI_NH_ASID	0x11
 		#define CMDQ_OP_TLBI_NH_VA	0x12
 		#define CMDQ_OP_TLBI_EL2_ALL	0x20
+		#define CMDQ_OP_TLBI_EL2_ASID	0x21
+		#define CMDQ_OP_TLBI_EL2_VA	0x22
 		#define CMDQ_OP_TLBI_S12_VMALL	0x28
 		#define CMDQ_OP_TLBI_S2_IPA	0x2a
 		#define CMDQ_OP_TLBI_NSNH_ALL	0x30
@@ -655,6 +658,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_STALLS		(1 << 11)
 #define ARM_SMMU_FEAT_HYP		(1 << 12)
 #define ARM_SMMU_FEAT_STALL_FORCE	(1 << 13)
+#define ARM_SMMU_FEAT_E2H		(1 << 14)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -912,6 +916,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		cmd[1] |= CMDQ_CFGI_1_RANGE_MASK << CMDQ_CFGI_1_RANGE_SHIFT;
 		break;
 	case CMDQ_OP_TLBI_NH_VA:
+	case CMDQ_OP_TLBI_EL2_VA:
 		cmd[0] |= (u64)ent->tlbi.asid << CMDQ_TLBI_0_ASID_SHIFT;
 		cmd[1] |= ent->tlbi.leaf ? CMDQ_TLBI_1_LEAF : 0;
 		cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK;
@@ -927,6 +932,9 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 	case CMDQ_OP_TLBI_S12_VMALL:
 		cmd[0] |= (u64)ent->tlbi.vmid << CMDQ_TLBI_0_VMID_SHIFT;
 		break;
+	case CMDQ_OP_TLBI_EL2_ASID:
+		cmd[0] |= (u64)ent->tlbi.asid << CMDQ_TLBI_0_ASID_SHIFT;
+		break;
 	case CMDQ_OP_PRI_RESP:
 		cmd[0] |= ent->substream_valid ? CMDQ_0_SSV : 0;
 		cmd[0] |= ent->pri.ssid << CMDQ_PRI_0_SSID_SHIFT;
@@ -1428,7 +1436,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 #ifdef CONFIG_PCI_ATS
 			 STRTAB_STE_1_EATS_TRANS << STRTAB_STE_1_EATS_SHIFT |
 #endif
-			 STRTAB_STE_1_STRW_NSEL1 << STRTAB_STE_1_STRW_SHIFT);
+			 (smmu->features & ARM_SMMU_FEAT_E2H ?
+			  STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1) <<
+			 STRTAB_STE_1_STRW_SHIFT);
 
 		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
 		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
@@ -1694,7 +1704,8 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 	struct arm_smmu_cmdq_ent cmd;
 
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
-		cmd.opcode	= CMDQ_OP_TLBI_NH_ASID;
+		cmd.opcode	= smmu->features & ARM_SMMU_FEAT_E2H ?
+				  CMDQ_OP_TLBI_EL2_ASID : CMDQ_OP_TLBI_NH_ASID;
 		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
 		cmd.tlbi.vmid	= 0;
 	} else {
@@ -1719,7 +1730,8 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 	};
 
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
-		cmd.opcode	= CMDQ_OP_TLBI_NH_VA;
+		cmd.opcode	= smmu->features & ARM_SMMU_FEAT_E2H ?
+				  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
 		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
 	} else {
 		cmd.opcode	= CMDQ_OP_TLBI_S2_IPA;
@@ -2718,7 +2730,11 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR1);
 
 	/* CR2 (random crap) */
-	reg = CR2_PTM | CR2_RECINVSID | CR2_E2H;
+	reg = CR2_PTM | CR2_RECINVSID;
+
+	if (smmu->features & ARM_SMMU_FEAT_E2H)
+		reg |= CR2_E2H;
+
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR2);
 
 	/* Stream table */
@@ -2868,8 +2884,11 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 	if (reg & IDR0_MSI)
 		smmu->features |= ARM_SMMU_FEAT_MSI;
 
-	if (reg & IDR0_HYP)
+	if (reg & IDR0_HYP) {
 		smmu->features |= ARM_SMMU_FEAT_HYP;
+		if (cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN))
+			smmu->features |= ARM_SMMU_FEAT_E2H;
+	}
 
 	/*
 	 * The coherency feature as set by FW is used in preference to the ID
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread


* [RFCv2 PATCH 17/36] iommu/arm-smmu-v3: Support broadcast TLB maintenance
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

The SMMUv3 can handle invalidation targeted at TLB entries with shared
ASIDs.  If the implementation supports broadcast TLB maintenance, enable
it and keep track of it in a feature bit. The SMMU will then take into
account the following CPU instructions for ASIDs in the shared set:

* TLBI VAE1IS(ASID, VA)
* TLBI ASIDE1IS(ASID)

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 27376e1193c1..b23f69aa242e 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -64,6 +64,7 @@
 #define IDR0_ASID16			(1 << 12)
 #define IDR0_ATS			(1 << 10)
 #define IDR0_HYP			(1 << 9)
+#define IDR0_BTM			(1 << 5)
 #define IDR0_COHACC			(1 << 4)
 #define IDR0_TTF_SHIFT			2
 #define IDR0_TTF_MASK			0x3
@@ -659,6 +660,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_HYP		(1 << 12)
 #define ARM_SMMU_FEAT_STALL_FORCE	(1 << 13)
 #define ARM_SMMU_FEAT_E2H		(1 << 14)
+#define ARM_SMMU_FEAT_BTM		(1 << 15)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -2730,11 +2732,14 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR1);
 
 	/* CR2 (random crap) */
-	reg = CR2_PTM | CR2_RECINVSID;
+	reg = CR2_RECINVSID;
 
 	if (smmu->features & ARM_SMMU_FEAT_E2H)
 		reg |= CR2_E2H;
 
+	if (!(smmu->features & ARM_SMMU_FEAT_BTM))
+		reg |= CR2_PTM;
+
 	writel_relaxed(reg, smmu->base + ARM_SMMU_CR2);
 
 	/* Stream table */
@@ -2837,6 +2842,7 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
 	u32 reg;
 	bool coherent = smmu->features & ARM_SMMU_FEAT_COHERENCY;
+	bool vhe = cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN);
 
 	/* IDR0 */
 	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR0);
@@ -2886,11 +2892,20 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 
 	if (reg & IDR0_HYP) {
 		smmu->features |= ARM_SMMU_FEAT_HYP;
-		if (cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN))
+		if (vhe)
 			smmu->features |= ARM_SMMU_FEAT_E2H;
 	}
 
 	/*
+	 * If the CPU is using VHE, but the SMMU doesn't support it, the SMMU
+	 * will create TLB entries for NH-EL1 world and will miss the
+	 * broadcasted TLB invalidations that target EL2-E2H world. Don't enable
+	 * BTM in that case.
+	 */
+	if (reg & IDR0_BTM && (!vhe || reg & IDR0_HYP))
+		smmu->features |= ARM_SMMU_FEAT_BTM;
+
+	/*
 	 * The coherency feature as set by FW is used in preference to the ID
 	 * register, but warn on mismatch.
 	 */
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread


* [RFCv2 PATCH 18/36] iommu/arm-smmu-v3: Add SVM feature checking
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

Aggregate all sanity-checks for sharing CPU page tables with the SMMU
under a single ARM_SMMU_FEAT_SVM bit. For PCIe SVM, users also need to
check FEAT_ATS and FEAT_PRI. For platform SVM, they will most likely have
to check FEAT_STALLS and FEAT_BTM.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 60 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index b23f69aa242e..96347aad605f 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -661,6 +661,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_STALL_FORCE	(1 << 13)
 #define ARM_SMMU_FEAT_E2H		(1 << 14)
 #define ARM_SMMU_FEAT_BTM		(1 << 15)
+#define ARM_SMMU_FEAT_SVM		(1 << 16)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -2838,6 +2839,62 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	return 0;
 }
 
+static bool arm_smmu_supports_svm(struct arm_smmu_device *smmu)
+{
+	unsigned long reg, fld;
+	unsigned long oas;
+	unsigned long asid_bits;
+
+	u32 feat_mask = ARM_SMMU_FEAT_BTM | ARM_SMMU_FEAT_COHERENCY;
+
+	if ((smmu->features & feat_mask) != feat_mask)
+		return false;
+
+	if (!(smmu->pgsize_bitmap & PAGE_SIZE))
+		return false;
+
+	/*
+	 * Get the smallest PA size of all CPUs (sanitized by cpufeature). We're
+	 * not even pretending to support AArch32 here.
+	 */
+	reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
+	fld = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_PARANGE_SHIFT);
+	switch (fld) {
+	case 0x0:
+		oas = 32;
+		break;
+	case 0x1:
+		oas = 36;
+		break;
+	case 0x2:
+		oas = 40;
+		break;
+	case 0x3:
+		oas = 42;
+		break;
+	case 0x4:
+		oas = 44;
+		break;
+	case 0x5:
+		oas = 48;
+		break;
+	default:
+		return false;
+	}
+
+	/* abort if MMU outputs addresses greater than what we support. */
+	if (smmu->oas < oas)
+		return false;
+
+	/* We can support bigger ASIDs than the CPU, but not smaller */
+	fld = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_ASID_SHIFT);
+	asid_bits = fld ? 16 : 8;
+	if (smmu->asid_bits < asid_bits)
+		return false;
+
+	return true;
+}
+
 static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
 	u32 reg;
@@ -3032,6 +3089,9 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 
 	smmu->ias = max(smmu->ias, smmu->oas);
 
+	if (arm_smmu_supports_svm(smmu))
+		smmu->features |= ARM_SMMU_FEAT_SVM;
+
 	dev_info(smmu->dev, "ias %lu-bit, oas %lu-bit (features 0x%08x)\n",
 		 smmu->ias, smmu->oas, smmu->features);
 	return 0;
-- 
2.13.3


^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 18/36] iommu/arm-smmu-v3: Add SVM feature checking
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel

Aggregate all sanity-checks for sharing CPU page tables with the SMMU
under a single ARM_SMMU_FEAT_SVM bit. For PCIe SVM, users also need to
check FEAT_ATS and FEAT_PRI. For platform SVM, they will most likely have
to check FEAT_STALLS and FEAT_BTM.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 60 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index b23f69aa242e..96347aad605f 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -661,6 +661,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_STALL_FORCE	(1 << 13)
 #define ARM_SMMU_FEAT_E2H		(1 << 14)
 #define ARM_SMMU_FEAT_BTM		(1 << 15)
+#define ARM_SMMU_FEAT_SVM		(1 << 16)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -2838,6 +2839,62 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	return 0;
 }
 
+static bool arm_smmu_supports_svm(struct arm_smmu_device *smmu)
+{
+	unsigned long reg, fld;
+	unsigned long oas;
+	unsigned long asid_bits;
+
+	u32 feat_mask = ARM_SMMU_FEAT_BTM | ARM_SMMU_FEAT_COHERENCY;
+
+	if ((smmu->features & feat_mask) != feat_mask)
+		return false;
+
+	if (!(smmu->pgsize_bitmap & PAGE_SIZE))
+		return false;
+
+	/*
+	 * Get the smallest PA size of all CPUs (sanitized by cpufeature). We're
+	 * not even pretending to support AArch32 here.
+	 */
+	reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
+	fld = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_PARANGE_SHIFT);
+	switch (fld) {
+	case 0x0:
+		oas = 32;
+		break;
+	case 0x1:
+		oas = 36;
+		break;
+	case 0x2:
+		oas = 40;
+		break;
+	case 0x3:
+		oas = 42;
+		break;
+	case 0x4:
+		oas = 44;
+		break;
+	case 0x5:
+		oas = 48;
+		break;
+	default:
+		return false;
+	}
+
+	/* abort if MMU outputs addresses greater than what we support. */
+	if (smmu->oas < oas)
+		return false;
+
+	/* We can support bigger ASIDs than the CPU, but not smaller */
+	fld = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_ASID_SHIFT);
+	asid_bits = fld ? 16 : 8;
+	if (smmu->asid_bits < asid_bits)
+		return false;
+
+	return true;
+}
+
 static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
 	u32 reg;
@@ -3032,6 +3089,9 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 
 	smmu->ias = max(smmu->ias, smmu->oas);
 
+	if (arm_smmu_supports_svm(smmu))
+		smmu->features |= ARM_SMMU_FEAT_SVM;
+
 	dev_info(smmu->dev, "ias %lu-bit, oas %lu-bit (features 0x%08x)\n",
 		 smmu->ias, smmu->oas, smmu->features);
 	return 0;
-- 
2.13.3
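
As a side note, the PARANGE decoding in arm_smmu_supports_svm() is a pure
table lookup. A hypothetical userspace rewrite (parange_to_oas is our name,
not kernel code) shows the same mapping as an array instead of a switch:

```c
/* Illustrative userspace sketch of the PARANGE decoding above: map the
 * ID_AA64MMFR0_EL1.PARange field to an output address size in bits,
 * returning 0 for reserved values, which the caller treats as
 * "SVM not supported". */
static unsigned long parange_to_oas(unsigned long fld)
{
	/* Field values 0x0-0x5, matching the switch statement in the patch */
	static const unsigned long oas[] = { 32, 36, 40, 42, 44, 48 };

	return fld < sizeof(oas) / sizeof(oas[0]) ? oas[fld] : 0;
}
```

The kernel version keeps the explicit switch, which is easier to extend
when new PARange encodings are architected.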

^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 19/36] arm64: mm: Pin down ASIDs for sharing contexts with devices
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

In order to enable address space sharing with the IOMMU, we introduce
functions mm_context_get and mm_context_put, which pin down a context and
ensure that its ASID won't be modified willy-nilly after a rollover.

Pinning is necessary because, once a device is using an ASID, it needs a
valid and unique one at all times, whether the associated task is running
or not.

Without pinning, we would need to notify the IOMMU when we're about to use
a new ASID for a task. Things would get messy when a new task is assigned
a shared ASID. Consider the following scenario:

1. Task t1 is running on CPUx with shared ASID (1, 1)
2. Task t2 is scheduled on CPUx, gets ASID (1, 2)
3. Task tn is scheduled on CPUy, a rollover occurs, tn gets ASID (2, 1)
   We would now have to immediately generate a new ASID for t1, notify
   the IOMMU, and finally enable task tn. We are holding the lock during
   all that time, since we can't afford having another CPU trigger a
   rollover.

It gets needlessly complicated, and all we wanted to do was schedule poor
task tn, which has no business with the IOMMU. By letting the IOMMU pin
tasks when needed, we avoid stalling the slow path, and let the pinning
fail when we're out of potential ASIDs.

After a rollover, we assume that there is at least one more ASID than the
number of CPUs. So we can use (NR_ASIDS - NR_CPUS - 1) as a hard limit for
the number of ASIDs we can afford to share with the IOMMU.

Since multiple IOMMUs could pin the same context, we need to keep track of
the number of references. Add a refcount value in mm_context_t for this
purpose.
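
The pinning scheme can be illustrated with a small userspace model.
Everything below (ctx_pin, ctx_unpin, the tiny sizes) is a hypothetical
sketch of the accounting described above, not the kernel implementation:

```c
/* Hypothetical userspace model of ASID pinning: a pin bitmap seeded into
 * the allocator on rollover, a global pinned count with a hard limit, and
 * a per-context refcount so multiple IOMMUs can pin the same context. */
#define NUM_ASIDS	16
#define NR_CPUS		4
/* Leave room for the running tasks' ASIDs after a rollover */
#define MAX_PINNED	(NUM_ASIDS - NR_CPUS - 1)

struct mm_ctx {
	unsigned long asid;
	unsigned long refcount;
};

static unsigned char pinned[NUM_ASIDS];
static unsigned long nr_pinned;

/* Returns the pinned ASID, or 0 on failure (0 is reserved). */
static unsigned long ctx_pin(struct mm_ctx *ctx)
{
	if (ctx->refcount++)		/* already pinned, just take a ref */
		return ctx->asid;

	if (nr_pinned >= MAX_PINNED) {	/* out of shareable ASIDs */
		ctx->refcount--;
		return 0;
	}

	pinned[ctx->asid] = 1;		/* rollover will preserve this bit */
	nr_pinned++;
	return ctx->asid;
}

static void ctx_unpin(struct mm_ctx *ctx)
{
	if (--ctx->refcount == 0) {
		pinned[ctx->asid] = 0;
		nr_pinned--;
	}
}
```

On rollover, the real allocator copies the pin bitmap into asid_map
(bitmap_copy in the patch below) instead of clearing it, so pinned ASIDs
survive while everything else is recycled.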

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 arch/arm64/include/asm/mmu.h         |  1 +
 arch/arm64/include/asm/mmu_context.h | 11 ++++-
 arch/arm64/mm/context.c              | 80 +++++++++++++++++++++++++++++++++++-
 3 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 0d34bf0a89c7..3e687fc49825 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -20,6 +20,7 @@
 
 typedef struct {
 	atomic64_t	id;
+	unsigned long	refcount;
 	void		*vdso;
 	unsigned long	flags;
 } mm_context_t;
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 3257895a9b5e..52c2f8e04a18 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -154,7 +154,13 @@ static inline void cpu_replace_ttbr1(pgd_t *pgd)
 #define destroy_context(mm)		do { } while(0)
 void check_and_switch_context(struct mm_struct *mm, unsigned int cpu);
 
-#define init_new_context(tsk,mm)	({ atomic64_set(&(mm)->context.id, 0); 0; })
+static inline int
+init_new_context(struct task_struct *tsk, struct mm_struct *mm)
+{
+	atomic64_set(&mm->context.id, 0);
+	mm->context.refcount = 0;
+	return 0;
+}
 
 /*
  * This is called when "tsk" is about to enter lazy TLB mode.
@@ -226,6 +232,9 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 
 void verify_cpu_asid_bits(void);
 
+unsigned long mm_context_get(struct mm_struct *mm);
+void mm_context_put(struct mm_struct *mm);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* !__ASM_MMU_CONTEXT_H */
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index ab9f5f0fb2c7..a15c90083a57 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -37,6 +37,10 @@ static DEFINE_PER_CPU(atomic64_t, active_asids);
 static DEFINE_PER_CPU(u64, reserved_asids);
 static cpumask_t tlb_flush_pending;
 
+static unsigned long max_pinned_asids;
+static unsigned long nr_pinned_asids;
+static unsigned long *pinned_asid_map;
+
 #define ASID_MASK		(~GENMASK(asid_bits - 1, 0))
 #define ASID_FIRST_VERSION	(1UL << asid_bits)
 #define NUM_USER_ASIDS		ASID_FIRST_VERSION
@@ -92,7 +96,7 @@ static void flush_context(unsigned int cpu)
 	u64 asid;
 
 	/* Update the list of reserved ASIDs and the ASID bitmap. */
-	bitmap_clear(asid_map, 0, NUM_USER_ASIDS);
+	bitmap_copy(asid_map, pinned_asid_map, NUM_USER_ASIDS);
 
 	set_reserved_asid_bits();
 
@@ -154,6 +158,10 @@ static u64 new_context(struct mm_struct *mm, unsigned int cpu)
 	if (asid != 0) {
 		u64 newasid = generation | (asid & ~ASID_MASK);
 
+		/* That ASID is pinned for us, we're good to go. */
+		if (mm->context.refcount)
+			return newasid;
+
 		/*
 		 * If our current ASID was active during a rollover, we
 		 * can continue to use it and this was just a false alarm.
@@ -235,6 +243,63 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
 		cpu_switch_mm(mm->pgd, mm);
 }
 
+unsigned long mm_context_get(struct mm_struct *mm)
+{
+	unsigned long flags;
+	u64 asid;
+
+	raw_spin_lock_irqsave(&cpu_asid_lock, flags);
+
+	asid = atomic64_read(&mm->context.id);
+
+	if (mm->context.refcount) {
+		mm->context.refcount++;
+		asid &= ~ASID_MASK;
+		goto out_unlock;
+	}
+
+	if (nr_pinned_asids >= max_pinned_asids) {
+		asid = 0;
+		goto out_unlock;
+	}
+
+	if (((asid ^ atomic64_read(&asid_generation)) >> asid_bits)) {
+		/*
+		 * We went through one or more rollovers since that ASID was
+		 * used. Ensure that it is still valid, or generate a new one.
+		 * The cpu argument isn't used by new_context.
+		 */
+		asid = new_context(mm, 0);
+		atomic64_set(&mm->context.id, asid);
+	}
+
+	asid &= ~ASID_MASK;
+
+	nr_pinned_asids++;
+	__set_bit(asid, pinned_asid_map);
+	mm->context.refcount++;
+
+out_unlock:
+	raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
+
+	return asid;
+}
+
+void mm_context_put(struct mm_struct *mm)
+{
+	unsigned long flags;
+	u64 asid = atomic64_read(&mm->context.id) & ~ASID_MASK;
+
+	raw_spin_lock_irqsave(&cpu_asid_lock, flags);
+
+	if (--mm->context.refcount == 0) {
+		__clear_bit(asid, pinned_asid_map);
+		nr_pinned_asids--;
+	}
+
+	raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
+}
+
 static int asids_init(void)
 {
 	asid_bits = get_cpu_asid_bits();
@@ -252,6 +317,19 @@ static int asids_init(void)
 
 	set_reserved_asid_bits();
 
+	pinned_asid_map = kzalloc(BITS_TO_LONGS(NUM_USER_ASIDS)
+				  * sizeof(*pinned_asid_map), GFP_KERNEL);
+	if (!pinned_asid_map)
+		panic("Failed to allocate pinned bitmap\n");
+
+	/*
+	 * We assume that an ASID is always available after a rollover. This
+	 * means that even if all CPUs have a reserved ASID, there still is at
+	 * least one slot available in the asid_bitmap.
+	 */
+	max_pinned_asids = NUM_USER_ASIDS - num_possible_cpus() - 2;
+	nr_pinned_asids = 0;
+
 	pr_info("ASID allocator initialised with %lu entries\n", NUM_USER_ASIDS);
 	return 0;
 }
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 20/36] iommu/arm-smmu-v3: Track ASID state
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

At the moment each SMMU has an 8- or 16-bit ASID set and allocates one ASID
per device via a bitmap. ASIDs are used to differentiate address spaces in
SMMU TLB entries. With SVM, sharing process address spaces with the SMMU,
we need to use CPU ASIDs in SMMU contexts, to ensure that broadcast TLB
invalidations reach the right IOTLB entries.

When binding a process address space to a device, we become slaves to the
arch ASID allocator. We have to use whatever ASID it gives us. If a
domain is currently using it, then we'll either abort or steal that ASID.

To make matters worse, tasks are global, while domains are per-SMMU. SMMU
ASIDs can be aliased across different SMMUs, but the CPU ASID space is
unique across the whole system.

Introduce an IDR for SMMU ASID allocation. It allows us to keep information
about an ASID, for instance which domain it is assigned to or how many
devices are using it.
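
Cyclic allocation (idr_alloc_cyclic in the patch) can be modelled in
userspace. The stand-in below (asid_alloc_cyclic and asid_free are
illustrative names, not the kernel IDR API) shows the key property: a
freed ID is not reused until the allocator has cycled through the rest
of the space, which reduces accidental aliasing:

```c
#include <stddef.h>

/* Hypothetical fixed-size table mapping each ASID to its state pointer,
 * with cyclic allocation so recently freed IDs aren't reused at once. */
#define ASID_SPACE 8

static void *asid_table[ASID_SPACE];
static unsigned int next_asid;	/* where the next search starts */

static int asid_alloc_cyclic(void *state)
{
	unsigned int i;

	for (i = 0; i < ASID_SPACE; i++) {
		unsigned int id = (next_asid + i) % ASID_SPACE;

		if (!asid_table[id]) {
			asid_table[id] = state;
			next_asid = id + 1;	/* resume after this ID */
			return id;
		}
	}
	return -1;	/* ASID space exhausted */
}

static void asid_free(int id)
{
	asid_table[id] = NULL;
}
```

The real IDR additionally lets a lookup by ID retrieve the stored state
pointer, which is what the driver uses to find which domain owns an ASID.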

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 53 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 44 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 96347aad605f..71fc3a2c8a95 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -640,6 +640,10 @@ struct arm_smmu_strtab_cfg {
 	u32				strtab_base_cfg;
 };
 
+struct arm_smmu_asid_state {
+	struct arm_smmu_domain		*domain;
+};
+
 /* An SMMUv3 instance */
 struct arm_smmu_device {
 	struct device			*dev;
@@ -681,7 +685,8 @@ struct arm_smmu_device {
 
 #define ARM_SMMU_MAX_ASIDS		(1 << 16)
 	unsigned int			asid_bits;
-	DECLARE_BITMAP(asid_map, ARM_SMMU_MAX_ASIDS);
+	struct idr			asid_idr;
+	spinlock_t			asid_lock;
 
 #define ARM_SMMU_MAX_VMIDS		(1 << 16)
 	unsigned int			vmid_bits;
@@ -1828,7 +1833,11 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 		if (cfg->num_contexts) {
 			arm_smmu_free_cd_tables(smmu_domain);
-			arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
+
+			spin_lock(&smmu->asid_lock);
+			kfree(idr_find(&smmu->asid_idr, cfg->cd.asid));
+			idr_remove(&smmu->asid_idr, cfg->cd.asid);
+			spin_unlock(&smmu->asid_lock);
 		}
 	} else {
 		struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
@@ -1844,25 +1853,48 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 {
 	int ret;
 	int asid;
+	struct arm_smmu_asid_state *asid_state;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 
-	asid = arm_smmu_bitmap_alloc(smmu->asid_map, smmu->asid_bits);
-	if (asid < 0)
-		return asid;
-
 	ret = arm_smmu_alloc_cd_tables(smmu_domain);
 	if (ret)
-		goto out_free_asid;
+		return ret;
+
+	asid_state = kzalloc(sizeof(*asid_state), GFP_KERNEL);
+	if (!asid_state) {
+		ret = -ENOMEM;
+		goto out_free_tables;
+	}
+
+	asid_state->domain = smmu_domain;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&smmu->asid_lock);
+	asid = idr_alloc_cyclic(&smmu->asid_idr, asid_state, 0,
+				1 << smmu->asid_bits, GFP_ATOMIC);
 
 	cfg->cd.asid	= (u16)asid;
 	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
 	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
 	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
+
+	spin_unlock(&smmu->asid_lock);
+	idr_preload_end();
+
+	if (asid < 0) {
+		ret = asid;
+		goto out_free_asid_state;
+	}
+
 	return 0;
 
-out_free_asid:
-	arm_smmu_bitmap_free(smmu->asid_map, asid);
+out_free_asid_state:
+	kfree(asid_state);
+
+out_free_tables:
+	arm_smmu_free_cd_tables(smmu_domain);
+
 	return ret;
 }
 
@@ -2506,6 +2538,9 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
 {
 	int ret;
 
+	spin_lock_init(&smmu->asid_lock);
+	idr_init(&smmu->asid_idr);
+
 	ret = arm_smmu_init_queues(smmu);
 	if (ret)
 		return ret;
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 20/36] iommu/arm-smmu-v3: Track ASID state
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

At the moment each SMMU has a 8- or 16-bit ASID set and allocates one ASID
per device via a bitmap. ASIDs are used to differentiate address spaces in
SMMU TLB entries. With SVM, sharing process address spaces with the SMMU,
we need to use CPU ASIDs in SMMU contexts, to ensure that broadcast TLB
invalidations reach the right IOTLB entries.

When binding a process address space to a device, we become slaves to the
arch ASID allocator. We have to use whatever ASID they give us. If a
domain is currently using it, then we'll either abort or steal that ASID.

To make matters worse, tasks are global, while domains are per-SMMU. SMMU
ASIDs can be aliased across different SMMUs, but the CPU ASID space is
unique across the whole system.

Introduce an IDR for SMMU ASID allocation. It allows to keep information
about an ASID, for instance which domain it is assigned to or how many
devices are using it.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 53 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 44 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 96347aad605f..71fc3a2c8a95 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -640,6 +640,10 @@ struct arm_smmu_strtab_cfg {
 	u32				strtab_base_cfg;
 };
 
+struct arm_smmu_asid_state {
+	struct arm_smmu_domain		*domain;
+};
+
 /* An SMMUv3 instance */
 struct arm_smmu_device {
 	struct device			*dev;
@@ -681,7 +685,8 @@ struct arm_smmu_device {
 
 #define ARM_SMMU_MAX_ASIDS		(1 << 16)
 	unsigned int			asid_bits;
-	DECLARE_BITMAP(asid_map, ARM_SMMU_MAX_ASIDS);
+	struct idr			asid_idr;
+	spinlock_t			asid_lock;
 
 #define ARM_SMMU_MAX_VMIDS		(1 << 16)
 	unsigned int			vmid_bits;
@@ -1828,7 +1833,11 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 		if (cfg->num_contexts) {
 			arm_smmu_free_cd_tables(smmu_domain);
-			arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
+
+			spin_lock(&smmu->asid_lock);
+			kfree(idr_find(&smmu->asid_idr, cfg->cd.asid));
+			idr_remove(&smmu->asid_idr, cfg->cd.asid);
+			spin_unlock(&smmu->asid_lock);
 		}
 	} else {
 		struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
@@ -1844,25 +1853,48 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 {
 	int ret;
 	int asid;
+	struct arm_smmu_asid_state *asid_state;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 
-	asid = arm_smmu_bitmap_alloc(smmu->asid_map, smmu->asid_bits);
-	if (asid < 0)
-		return asid;
-
 	ret = arm_smmu_alloc_cd_tables(smmu_domain);
 	if (ret)
-		goto out_free_asid;
+		return ret;
+
+	asid_state = kzalloc(sizeof(*asid_state), GFP_KERNEL);
+	if (!asid_state) {
+		ret = -ENOMEM;
+		goto out_free_tables;
+	}
+
+	asid_state->domain = smmu_domain;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&smmu->asid_lock);
+	asid = idr_alloc_cyclic(&smmu->asid_idr, asid_state, 0,
+				1 << smmu->asid_bits, GFP_ATOMIC);
 
 	cfg->cd.asid	= (u16)asid;
 	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
 	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
 	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
+
+	spin_unlock(&smmu->asid_lock);
+	idr_preload_end();
+
+	if (asid < 0) {
+		ret = asid;
+		goto out_free_asid_state;
+	}
+
 	return 0;
 
-out_free_asid:
-	arm_smmu_bitmap_free(smmu->asid_map, asid);
+out_free_asid_state:
+	kfree(asid_state);
+
+out_free_tables:
+	arm_smmu_free_cd_tables(smmu_domain);
+
 	return ret;
 }
 
@@ -2506,6 +2538,9 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
 {
 	int ret;
 
+	spin_lock_init(&smmu->asid_lock);
+	idr_init(&smmu->asid_idr);
+
 	ret = arm_smmu_init_queues(smmu);
 	if (ret)
 		return ret;
-- 
2.13.3


^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 20/36] iommu/arm-smmu-v3: Track ASID state
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel

At the moment each SMMU has an 8- or 16-bit ASID set and allocates one ASID
per domain via a bitmap. ASIDs are used to differentiate address spaces in
SMMU TLB entries. With SVM, which shares process address spaces with the
SMMU, we need to use CPU ASIDs in SMMU contexts, to ensure that broadcast
TLB invalidations reach the right IOTLB entries.

When binding a process address space to a device, we are at the mercy of
the arch ASID allocator: we have to use whatever ASID it gives us. If a
domain is currently using that ASID, we'll either abort or steal it.

To make matters worse, tasks are global, while domains are per-SMMU. SMMU
ASIDs can be aliased across different SMMUs, but the CPU ASID space is
unique across the whole system.

Introduce an IDR for SMMU ASID allocation. It lets us keep information
about an ASID, for instance which domain it is assigned to or how many
devices are using it.

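As an aside, the cyclic-allocation behaviour relied on from
idr_alloc_cyclic() can be modelled in a few lines of plain C. This is a
hypothetical toy allocator for illustration only, not the kernel IDR (the
toy_* names are invented):

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ASIDS 8

/*
 * Toy model of the idr_alloc_cyclic() contract: hand out the first free ID
 * at or after a moving hint, wrapping around, and remember a state pointer
 * for each live ID.
 */
struct toy_asid_allocator {
	void *state[TOY_MAX_ASIDS];	/* NULL means the ASID is free */
	int next;			/* cyclic allocation hint */
};

static int toy_asid_alloc(struct toy_asid_allocator *a, void *state)
{
	int i;

	for (i = 0; i < TOY_MAX_ASIDS; i++) {
		int asid = (a->next + i) % TOY_MAX_ASIDS;

		if (!a->state[asid]) {
			a->state[asid] = state;
			a->next = asid + 1;	/* avoid immediate reuse */
			return asid;
		}
	}
	return -1;	/* ASID space exhausted */
}

static void toy_asid_free(struct toy_asid_allocator *a, int asid)
{
	a->state[asid] = NULL;
}
```

The last assertion below illustrates the point of cyclic allocation: a
freed ID is not handed out again immediately, the hint moves on first.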
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 53 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 44 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 96347aad605f..71fc3a2c8a95 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -640,6 +640,10 @@ struct arm_smmu_strtab_cfg {
 	u32				strtab_base_cfg;
 };
 
+struct arm_smmu_asid_state {
+	struct arm_smmu_domain		*domain;
+};
+
 /* An SMMUv3 instance */
 struct arm_smmu_device {
 	struct device			*dev;
@@ -681,7 +685,8 @@ struct arm_smmu_device {
 
 #define ARM_SMMU_MAX_ASIDS		(1 << 16)
 	unsigned int			asid_bits;
-	DECLARE_BITMAP(asid_map, ARM_SMMU_MAX_ASIDS);
+	struct idr			asid_idr;
+	spinlock_t			asid_lock;
 
 #define ARM_SMMU_MAX_VMIDS		(1 << 16)
 	unsigned int			vmid_bits;
@@ -1828,7 +1833,11 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 		struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 		if (cfg->num_contexts) {
 			arm_smmu_free_cd_tables(smmu_domain);
-			arm_smmu_bitmap_free(smmu->asid_map, cfg->cd.asid);
+
+			spin_lock(&smmu->asid_lock);
+			kfree(idr_find(&smmu->asid_idr, cfg->cd.asid));
+			idr_remove(&smmu->asid_idr, cfg->cd.asid);
+			spin_unlock(&smmu->asid_lock);
 		}
 	} else {
 		struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
@@ -1844,25 +1853,48 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 {
 	int ret;
 	int asid;
+	struct arm_smmu_asid_state *asid_state;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
 
-	asid = arm_smmu_bitmap_alloc(smmu->asid_map, smmu->asid_bits);
-	if (asid < 0)
-		return asid;
-
 	ret = arm_smmu_alloc_cd_tables(smmu_domain);
 	if (ret)
-		goto out_free_asid;
+		return ret;
+
+	asid_state = kzalloc(sizeof(*asid_state), GFP_KERNEL);
+	if (!asid_state) {
+		ret = -ENOMEM;
+		goto out_free_tables;
+	}
+
+	asid_state->domain = smmu_domain;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&smmu->asid_lock);
+	asid = idr_alloc_cyclic(&smmu->asid_idr, asid_state, 0,
+				1 << smmu->asid_bits, GFP_ATOMIC);
 
 	cfg->cd.asid	= (u16)asid;
 	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
 	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
 	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
+
+	spin_unlock(&smmu->asid_lock);
+	idr_preload_end();
+
+	if (asid < 0) {
+		ret = asid;
+		goto out_free_asid_state;
+	}
+
 	return 0;
 
-out_free_asid:
-	arm_smmu_bitmap_free(smmu->asid_map, asid);
+out_free_asid_state:
+	kfree(asid_state);
+
+out_free_tables:
+	arm_smmu_free_cd_tables(smmu_domain);
+
 	return ret;
 }
 
@@ -2506,6 +2538,9 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
 {
 	int ret;
 
+	spin_lock_init(&smmu->asid_lock);
+	idr_init(&smmu->asid_idr);
+
 	ret = arm_smmu_init_queues(smmu);
 	if (ret)
 		return ret;
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 21/36] iommu/arm-smmu-v3: Implement process operations
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

Hook process operations to support PASID and page table sharing with the
SMMUv3:

* process_allocate pins down the process ASID and initializes the context
  descriptor fields.
* process_free releases the ASID.
* process_attach checks device capabilities and writes the context
  descriptor. More work is required to ensure that the process' ASID isn't
  being used for io-pgtables.
* process_detach clears the context descriptor and sends the required
  invalidations.
* process_invalidate sends the required invalidations.
* process_exit stops use of the PASID, clears the context descriptor and
  performs the required invalidations.

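The sharing rules behind process_attach and process_detach boil down to a
small per-ASID state machine, sketched here in standalone C. The toy_*
names are hypothetical stand-ins for the driver's asid_state handling, not
its actual code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_ASIDS	8
#define TOY_EEXIST	17

/*
 * Toy model of the arm_smmu_asid_state rules: an ASID slot is either owned
 * by a private domain (and cannot be shared with a process), or carries a
 * refcount of the processes currently sharing it.
 */
struct toy_asid_state {
	int owned_by_domain;	/* stand-in for asid_state->domain != NULL */
	unsigned long refs;
};

static struct toy_asid_state *toy_asids[TOY_MAX_ASIDS];

static int toy_asid_share(int asid)
{
	struct toy_asid_state *s = toy_asids[asid];

	if (s && s->owned_by_domain)
		return -TOY_EEXIST;	/* used for private io-pgtables */
	if (s) {
		s->refs++;		/* already shared, take a reference */
		return 0;
	}

	s = calloc(1, sizeof(*s));
	if (!s)
		return -1;
	s->refs = 1;
	toy_asids[asid] = s;
	return 0;
}

static void toy_asid_unshare(int asid)
{
	struct toy_asid_state *s = toy_asids[asid];

	if (s && --s->refs == 0) {
		free(s);
		toy_asids[asid] = NULL;
	}
}
```

The commit message of the previous patch mentions either aborting or
stealing an ASID already used by a domain; this sketch only models the
abort path (-EEXIST), matching the current code.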
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 207 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 207 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 71fc3a2c8a95..c86a1182c137 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -29,6 +29,7 @@
 #include <linux/interrupt.h>
 #include <linux/iommu.h>
 #include <linux/iopoll.h>
+#include <linux/mmu_context.h>
 #include <linux/module.h>
 #include <linux/msi.h>
 #include <linux/of.h>
@@ -37,6 +38,7 @@
 #include <linux/of_platform.h>
 #include <linux/pci.h>
 #include <linux/platform_device.h>
+#include <linux/sched/mm.h>
 
 #include <linux/amba/bus.h>
 
@@ -642,6 +644,7 @@ struct arm_smmu_strtab_cfg {
 
 struct arm_smmu_asid_state {
 	struct arm_smmu_domain		*domain;
+	unsigned long			refs;
 };
 
 /* An SMMUv3 instance */
@@ -712,6 +715,9 @@ struct arm_smmu_master_data {
 	struct device			*dev;
 
 	size_t				num_ssids;
+	bool				can_fault;
+	/* Number of processes attached */
+	int				processes;
 };
 
 /* SMMU private data for an IOMMU domain */
@@ -740,6 +746,11 @@ struct arm_smmu_domain {
 	spinlock_t			devices_lock;
 };
 
+struct arm_smmu_process {
+	struct iommu_process		process;
+	struct arm_smmu_ctx_desc	ctx_desc;
+};
+
 struct arm_smmu_option_prop {
 	u32 opt;
 	const char *prop;
@@ -766,6 +777,11 @@ static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
 	return container_of(dom, struct arm_smmu_domain, domain);
 }
 
+static struct arm_smmu_process *to_smmu_process(struct iommu_process *process)
+{
+	return container_of(process, struct arm_smmu_process, process);
+}
+
 static void parse_driver_options(struct arm_smmu_device *smmu)
 {
 	int i = 0;
@@ -2032,6 +2048,13 @@ static void arm_smmu_detach_dev(struct device *dev)
 	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
 	struct arm_smmu_domain *smmu_domain = master->domain;
 
+	/*
+	 * Core is preventing concurrent calls between attach and bind, so this
+	 * read only races with process_exit (FIXME).
+	 */
+	if (master->processes)
+		__iommu_process_unbind_dev_all(&smmu_domain->domain, dev);
+
 	if (smmu_domain) {
 		spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 		list_del(&master->list);
@@ -2143,6 +2166,184 @@ arm_smmu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
 	return ops->iova_to_phys(ops, iova);
 }
 
+static int arm_smmu_process_init_pgtable(struct arm_smmu_process *smmu_process,
+					 struct mm_struct *mm)
+{
+	int asid;
+
+	asid = mm_context_get(mm);
+	if (!asid)
+		return -ENOSPC;
+
+	smmu_process->ctx_desc.asid = asid;
+	/* TODO: init the rest */
+
+	return 0;
+}
+
+static struct iommu_process *arm_smmu_process_alloc(struct task_struct *task)
+{
+	int ret;
+	struct mm_struct *mm;
+	struct arm_smmu_process *smmu_process;
+
+	smmu_process = kzalloc(sizeof(*smmu_process), GFP_KERNEL);
+	if (!smmu_process)
+		return NULL;
+
+	mm = get_task_mm(task);
+	if (!mm) {
+		kfree(smmu_process);
+		return NULL;
+	}
+
+	ret = arm_smmu_process_init_pgtable(smmu_process, mm);
+	mmput(mm);
+	if (ret) {
+		kfree(smmu_process);
+		return NULL;
+	}
+
+	return &smmu_process->process;
+}
+
+static void arm_smmu_process_free(struct iommu_process *process)
+{
+	struct arm_smmu_process *smmu_process = to_smmu_process(process);
+
+	/* Unpin ASID */
+	mm_context_put(process->mm);
+
+	kfree(smmu_process);
+}
+
+static int arm_smmu_process_share(struct arm_smmu_domain *smmu_domain,
+				  struct arm_smmu_process *smmu_process)
+{
+	int asid, ret;
+	struct arm_smmu_asid_state *asid_state;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+	asid = smmu_process->ctx_desc.asid;
+
+	asid_state = idr_find(&smmu->asid_idr, asid);
+	if (asid_state && asid_state->domain) {
+		return -EEXIST;
+	} else if (asid_state) {
+		asid_state->refs++;
+		return 0;
+	}
+
+	asid_state = kzalloc(sizeof(*asid_state), GFP_ATOMIC);
+	if (!asid_state)
+		return -ENOMEM;
+
+	asid_state->refs = 1;
+
+	ret = idr_alloc(&smmu->asid_idr, asid_state, asid, asid + 1, GFP_ATOMIC);
+	return ret < 0 ? ret : 0;
+}
+
+static int arm_smmu_process_attach(struct iommu_domain *domain,
+				   struct device *dev,
+				   struct iommu_process *process, bool first)
+{
+	int ret;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_process *smmu_process = to_smmu_process(process);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
+
+	if (!(smmu->features & ARM_SMMU_FEAT_SVM))
+		return -ENODEV;
+
+	/* TODO: process->no_pasid */
+	if (process->pasid >= master->num_ssids)
+		return -ENODEV;
+
+	/* TODO: process->no_need_for_pri_ill_pin_everything */
+	if (!master->can_fault)
+		return -ENODEV;
+
+	master->processes++;
+
+	if (!first)
+		return 0;
+
+	spin_lock(&smmu->asid_lock);
+	ret = arm_smmu_process_share(smmu_domain, smmu_process);
+	spin_unlock(&smmu->asid_lock);
+	if (ret)
+		return ret;
+
+	arm_smmu_write_ctx_desc(smmu_domain, process->pasid, &smmu_process->ctx_desc);
+
+	return 0;
+}
+
+static void arm_smmu_process_detach(struct iommu_domain *domain,
+				    struct device *dev,
+				    struct iommu_process *process, bool last)
+{
+	struct arm_smmu_asid_state *asid_state;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_process *smmu_process = to_smmu_process(process);
+	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+	master->processes--;
+
+	if (last) {
+		spin_lock(&smmu->asid_lock);
+		asid_state = idr_find(&smmu->asid_idr, smmu_process->ctx_desc.asid);
+		if (--asid_state->refs == 0) {
+			idr_remove(&smmu->asid_idr, smmu_process->ctx_desc.asid);
+			kfree(asid_state);
+		}
+		spin_unlock(&smmu->asid_lock);
+
+		arm_smmu_write_ctx_desc(smmu_domain, process->pasid, NULL);
+	}
+
+	/* TODO: Invalidate ATC. */
+	/* TODO: Invalidate all mappings if last and not DVM. */
+}
+
+static void arm_smmu_process_invalidate(struct iommu_domain *domain,
+					struct iommu_process *process,
+					unsigned long iova, size_t size)
+{
+	/*
+	 * TODO: Invalidate ATC.
+	 * TODO: Invalidate mapping if not DVM
+	 */
+}
+
+static void arm_smmu_process_exit(struct iommu_domain *domain,
+				  struct iommu_process *process)
+{
+	struct arm_smmu_master_data *master;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+	if (!domain->process_exit)
+		return;
+
+	spin_lock(&smmu_domain->devices_lock);
+	list_for_each_entry(master, &smmu_domain->devices, list) {
+		if (!master->processes)
+			continue;
+
+		master->processes--;
+		domain->process_exit(domain, master->dev, process->pasid,
+				     domain->process_exit_token);
+
+		/* TODO: inval ATC */
+	}
+	spin_unlock(&smmu_domain->devices_lock);
+
+	arm_smmu_write_ctx_desc(smmu_domain, process->pasid, NULL);
+
+	/* TODO: Invalidate all mappings if not DVM */
+}
+
 static struct platform_driver arm_smmu_driver;
 
 static int arm_smmu_match_node(struct device *dev, void *data)
@@ -2351,6 +2552,12 @@ static struct iommu_ops arm_smmu_ops = {
 	.domain_alloc		= arm_smmu_domain_alloc,
 	.domain_free		= arm_smmu_domain_free,
 	.attach_dev		= arm_smmu_attach_dev,
+	.process_alloc		= arm_smmu_process_alloc,
+	.process_free		= arm_smmu_process_free,
+	.process_attach		= arm_smmu_process_attach,
+	.process_detach		= arm_smmu_process_detach,
+	.process_invalidate	= arm_smmu_process_invalidate,
+	.process_exit		= arm_smmu_process_exit,
 	.map			= arm_smmu_map,
 	.unmap			= arm_smmu_unmap,
 	.map_sg			= default_iommu_map_sg,
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 22/36] iommu/io-pgtable-arm: Factor out ARM LPAE register defines
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

For SVM, we'll need to extract CPU page table information and mirror it in
the substream setup. Move relevant defines to a common header.

Fix TCR_SZ_MASK while we're at it.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 MAINTAINERS                    |  1 +
 drivers/iommu/io-pgtable-arm.c | 48 +-----------------------------
 drivers/iommu/io-pgtable-arm.h | 67 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+), 47 deletions(-)
 create mode 100644 drivers/iommu/io-pgtable-arm.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 65b0c88d5ee0..cff90315c2ec 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1089,6 +1089,7 @@ S:	Maintained
 F:	drivers/iommu/arm-smmu.c
 F:	drivers/iommu/arm-smmu-v3.c
 F:	drivers/iommu/io-pgtable-arm.c
+F:	drivers/iommu/io-pgtable-arm.h
 F:	drivers/iommu/io-pgtable-arm-v7s.c
 
 ARM SUB-ARCHITECTURES
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index e8018a308868..443234a564a6 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -31,6 +31,7 @@
 #include <asm/barrier.h>
 
 #include "io-pgtable.h"
+#include "io-pgtable-arm.h"
 
 #define ARM_LPAE_MAX_ADDR_BITS		48
 #define ARM_LPAE_S2_MAX_CONCAT_PAGES	16
@@ -118,53 +119,6 @@
 #define ARM_LPAE_PTE_MEMATTR_DEV	(((arm_lpae_iopte)0x1) << 2)
 
 /* Register bits */
-#define ARM_32_LPAE_TCR_EAE		(1 << 31)
-#define ARM_64_LPAE_S2_TCR_RES1		(1 << 31)
-
-#define ARM_LPAE_TCR_EPD1		(1 << 23)
-
-#define ARM_LPAE_TCR_TG0_4K		(0 << 14)
-#define ARM_LPAE_TCR_TG0_64K		(1 << 14)
-#define ARM_LPAE_TCR_TG0_16K		(2 << 14)
-
-#define ARM_LPAE_TCR_SH0_SHIFT		12
-#define ARM_LPAE_TCR_SH0_MASK		0x3
-#define ARM_LPAE_TCR_SH_NS		0
-#define ARM_LPAE_TCR_SH_OS		2
-#define ARM_LPAE_TCR_SH_IS		3
-
-#define ARM_LPAE_TCR_ORGN0_SHIFT	10
-#define ARM_LPAE_TCR_IRGN0_SHIFT	8
-#define ARM_LPAE_TCR_RGN_MASK		0x3
-#define ARM_LPAE_TCR_RGN_NC		0
-#define ARM_LPAE_TCR_RGN_WBWA		1
-#define ARM_LPAE_TCR_RGN_WT		2
-#define ARM_LPAE_TCR_RGN_WB		3
-
-#define ARM_LPAE_TCR_SL0_SHIFT		6
-#define ARM_LPAE_TCR_SL0_MASK		0x3
-
-#define ARM_LPAE_TCR_T0SZ_SHIFT		0
-#define ARM_LPAE_TCR_SZ_MASK		0xf
-
-#define ARM_LPAE_TCR_PS_SHIFT		16
-#define ARM_LPAE_TCR_PS_MASK		0x7
-
-#define ARM_LPAE_TCR_IPS_SHIFT		32
-#define ARM_LPAE_TCR_IPS_MASK		0x7
-
-#define ARM_LPAE_TCR_PS_32_BIT		0x0ULL
-#define ARM_LPAE_TCR_PS_36_BIT		0x1ULL
-#define ARM_LPAE_TCR_PS_40_BIT		0x2ULL
-#define ARM_LPAE_TCR_PS_42_BIT		0x3ULL
-#define ARM_LPAE_TCR_PS_44_BIT		0x4ULL
-#define ARM_LPAE_TCR_PS_48_BIT		0x5ULL
-
-#define ARM_LPAE_MAIR_ATTR_SHIFT(n)	((n) << 3)
-#define ARM_LPAE_MAIR_ATTR_MASK		0xff
-#define ARM_LPAE_MAIR_ATTR_DEVICE	0x04
-#define ARM_LPAE_MAIR_ATTR_NC		0x44
-#define ARM_LPAE_MAIR_ATTR_WBRWA	0xff
 #define ARM_LPAE_MAIR_ATTR_IDX_NC	0
 #define ARM_LPAE_MAIR_ATTR_IDX_CACHE	1
 #define ARM_LPAE_MAIR_ATTR_IDX_DEV	2
diff --git a/drivers/iommu/io-pgtable-arm.h b/drivers/iommu/io-pgtable-arm.h
new file mode 100644
index 000000000000..cb31314971ac
--- /dev/null
+++ b/drivers/iommu/io-pgtable-arm.h
@@ -0,0 +1,67 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (C) 2017 ARM Limited
+ */
+#ifndef __IO_PGTABLE_ARM_H
+#define __IO_PGTABLE_ARM_H
+
+#define ARM_32_LPAE_TCR_EAE		(1 << 31)
+#define ARM_64_LPAE_S2_TCR_RES1		(1 << 31)
+
+#define ARM_LPAE_TCR_EPD1		(1 << 23)
+
+#define ARM_LPAE_TCR_TG0_4K		(0 << 14)
+#define ARM_LPAE_TCR_TG0_64K		(1 << 14)
+#define ARM_LPAE_TCR_TG0_16K		(2 << 14)
+
+#define ARM_LPAE_TCR_SH0_SHIFT		12
+#define ARM_LPAE_TCR_SH0_MASK		0x3
+#define ARM_LPAE_TCR_SH_NS		0
+#define ARM_LPAE_TCR_SH_OS		2
+#define ARM_LPAE_TCR_SH_IS		3
+
+#define ARM_LPAE_TCR_ORGN0_SHIFT	10
+#define ARM_LPAE_TCR_IRGN0_SHIFT	8
+#define ARM_LPAE_TCR_RGN_MASK		0x3
+#define ARM_LPAE_TCR_RGN_NC		0
+#define ARM_LPAE_TCR_RGN_WBWA		1
+#define ARM_LPAE_TCR_RGN_WT		2
+#define ARM_LPAE_TCR_RGN_WB		3
+
+#define ARM_LPAE_TCR_SL0_SHIFT		6
+#define ARM_LPAE_TCR_SL0_MASK		0x3
+
+#define ARM_LPAE_TCR_T0SZ_SHIFT		0
+#define ARM_LPAE_TCR_SZ_MASK		0x3f
+
+#define ARM_LPAE_TCR_PS_SHIFT		16
+#define ARM_LPAE_TCR_PS_MASK		0x7
+
+#define ARM_LPAE_TCR_IPS_SHIFT		32
+#define ARM_LPAE_TCR_IPS_MASK		0x7
+
+#define ARM_LPAE_TCR_PS_32_BIT		0x0ULL
+#define ARM_LPAE_TCR_PS_36_BIT		0x1ULL
+#define ARM_LPAE_TCR_PS_40_BIT		0x2ULL
+#define ARM_LPAE_TCR_PS_42_BIT		0x3ULL
+#define ARM_LPAE_TCR_PS_44_BIT		0x4ULL
+#define ARM_LPAE_TCR_PS_48_BIT		0x5ULL
+
+#define ARM_LPAE_MAIR_ATTR_SHIFT(n)	((n) << 3)
+#define ARM_LPAE_MAIR_ATTR_MASK		0xff
+#define ARM_LPAE_MAIR_ATTR_DEVICE	0x04
+#define ARM_LPAE_MAIR_ATTR_NC		0x44
+#define ARM_LPAE_MAIR_ATTR_WBRWA	0xff
+
+#endif /* __IO_PGTABLE_ARM_H */
-- 
2.13.3


^ permalink raw reply related	[flat|nested] 268+ messages in thread


* [RFCv2 PATCH 23/36] iommu/arm-smmu-v3: Share process page tables
  2017-10-06 13:31 ` Jean-Philippe Brucker
  (?)
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: mark.rutland, gabriele.paoloni, catalin.marinas, will.deacon,
	okaya, rfranz, lenb, robh+dt, bhelgaas, dwmw2, rjw, sudeep.holla

Copy the content of TCR, MAIR and TTBR of a given task into a context
descriptor.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index c86a1182c137..293f260782c2 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -43,6 +43,7 @@
 #include <linux/amba/bus.h>
 
 #include "io-pgtable.h"
+#include "io-pgtable-arm.h"
 
 /* MMIO registers */
 #define ARM_SMMU_IDR0			0x0
@@ -2170,13 +2171,46 @@ static int arm_smmu_process_init_pgtable(struct arm_smmu_process *smmu_process,
 					 struct mm_struct *mm)
 {
 	int asid;
+	unsigned long tcr;
+	unsigned long reg, par;
+	struct arm_smmu_ctx_desc *cfg = &smmu_process->ctx_desc;
 
 	asid = mm_context_get(mm);
 	if (!asid)
 		return -ENOSPC;
 
-	smmu_process->ctx_desc.asid = asid;
-	/* TODO: init the rest */
+	tcr = TCR_T0SZ(VA_BITS) | TCR_IRGN0_WBWA | TCR_ORGN0_WBWA |
+		TCR_SH0_INNER | ARM_LPAE_TCR_EPD1;
+
+	switch (PAGE_SIZE) {
+	case SZ_4K:
+		tcr |= TCR_TG0_4K;
+		break;
+	case SZ_16K:
+		tcr |= TCR_TG0_16K;
+		break;
+	case SZ_64K:
+		tcr |= TCR_TG0_64K;
+		break;
+	default:
+		WARN_ON(1);
+		return -EFAULT;
+	}
+
+	reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
+	par = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_PARANGE_SHIFT);
+	tcr |= par << ARM_LPAE_TCR_IPS_SHIFT;
+
+	tcr |= TCR_TBI0;
+
+	cfg->asid       = asid;
+	cfg->ttbr       = virt_to_phys(mm->pgd);
+	/*
+	 * MAIR value is pretty much constant and global, so we can just get it
+	 * from the current CPU register
+	 */
+	cfg->mair       = read_sysreg(mair_el1);
+	cfg->tcr        = tcr;
 
 	return 0;
 }
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread


* [RFCv2 PATCH 24/36] iommu/arm-smmu-v3: Steal private ASID from a domain
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

The SMMU only has one ASID space, so the process allocator competes with
the domain allocator for ASIDs. Process ASIDs are allocated by the arch
allocator and shared with CPUs, whereas domain ASIDs are private to the
SMMU, and not affected by broadcast TLB invalidations.

When the process allocator pins an mm_context and gets an ASID that is
already in use by the SMMU, that ASID belongs to a domain. At the moment
we simply abort the bind, but we can go one step further: attempt to
assign a new private ASID to the domain, and steal the old one for our
process.

Use the smmu-wide ASID lock to prevent racing with attach_dev over the
foreign domain. We now also need to take this lock when modifying entry 0
of the context table. Concurrent modifications of a given context table
used to be prevented by group->mutex, but with this patch we may modify
the CD of another group.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 53 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 49 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 293f260782c2..e89e6d1263d9 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1731,7 +1731,7 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		cmd.opcode	= smmu->features & ARM_SMMU_FEAT_E2H ?
 				  CMDQ_OP_TLBI_EL2_ASID : CMDQ_OP_TLBI_NH_ASID;
-		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
+		cmd.tlbi.asid	= READ_ONCE(smmu_domain->s1_cfg.cd.asid);
 		cmd.tlbi.vmid	= 0;
 	} else {
 		cmd.opcode	= CMDQ_OP_TLBI_S12_VMALL;
@@ -1757,7 +1757,7 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		cmd.opcode	= smmu->features & ARM_SMMU_FEAT_E2H ?
 				  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
-		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
+		cmd.tlbi.asid	= READ_ONCE(smmu_domain->s1_cfg.cd.asid);
 	} else {
 		cmd.opcode	= CMDQ_OP_TLBI_S2_IPA;
 		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
@@ -2119,7 +2119,9 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
 		ste->s1_cfg = &smmu_domain->s1_cfg;
 		ste->s2_cfg = NULL;
+		spin_lock(&smmu->asid_lock);
 		arm_smmu_write_ctx_desc(smmu_domain, 0, &ste->s1_cfg->cd);
+		spin_unlock(&smmu->asid_lock);
 	} else {
 		ste->s1_cfg = NULL;
 		ste->s2_cfg = &smmu_domain->s2_cfg;
@@ -2253,14 +2255,57 @@ static int arm_smmu_process_share(struct arm_smmu_domain *smmu_domain,
 				  struct arm_smmu_process *smmu_process)
 {
 	int asid, ret;
-	struct arm_smmu_asid_state *asid_state;
+	struct arm_smmu_asid_state *asid_state, *new_state;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 
 	asid = smmu_process->ctx_desc.asid;
 
 	asid_state = idr_find(&smmu->asid_idr, asid);
 	if (asid_state && asid_state->domain) {
-		return -EEXIST;
+		struct arm_smmu_domain *victim_domain = asid_state->domain;
+		struct arm_smmu_cmdq_ent cmd = {
+			.opcode = smmu->features & ARM_SMMU_FEAT_E2H ?
+				CMDQ_OP_TLBI_EL2_ASID : CMDQ_OP_TLBI_NH_ASID,
+		};
+
+		new_state = kzalloc(sizeof(*new_state), GFP_ATOMIC);
+		if (!new_state)
+			return -ENOMEM;
+
+		new_state->domain = victim_domain;
+
+		ret = idr_alloc_cyclic(&smmu->asid_idr, new_state, 0,
+				       1 << smmu->asid_bits, GFP_ATOMIC);
+		if (ret < 0) {
+			kfree(new_state);
+			return ret;
+		}
+
+		/*
+		 * Race with unmap; TLB invalidations will start targeting the
+		 * new ASID, which isn't assigned yet. We'll do an
+		 * invalidate-all on the old ASID later, so it doesn't matter.
+		 */
+		WRITE_ONCE(victim_domain->s1_cfg.cd.asid, ret);
+
+		/*
+		 * Update ASID and invalidate CD in all associated masters.
+		 * There will be some overlap between uses of both ASIDs
+		 * until we invalidate the TLB.
+		 */
+		arm_smmu_write_ctx_desc(victim_domain, 0, &victim_domain->s1_cfg.cd);
+
+		/* Invalidate TLB entries previously associated with that domain */
+		cmd.tlbi.asid = asid;
+		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+		cmd.opcode = CMDQ_OP_CMD_SYNC;
+		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+
+		asid_state->domain = NULL;
+		asid_state->refs = 1;
+
+		return 0;
+
 	} else if (asid_state) {
 		asid_state->refs++;
 		return 0;
-- 
2.13.3


^ permalink raw reply related	[flat|nested] 268+ messages in thread


* [RFCv2 PATCH 25/36] iommu/arm-smmu-v3: Use shared ASID set
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

We now have two exclusive sets of ASIDs: private and shared. SMMUv3 allows
for contexts to take part in distributed TLB maintenance via the ASET bit.
When this bit is 0 for a given context, TLB entries tagged with its ASID
are invalidated by broadcast TLB maintenance. Set ASET=0 for process
contexts.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index e89e6d1263d9..b7355630526a 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1240,7 +1240,8 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 		      CTXDESC_CD_0_ENDI |
 #endif
 		      CTXDESC_CD_0_R | CTXDESC_CD_0_A |
-		      CTXDESC_CD_0_ASET_PRIVATE |
+		      (ssid ? CTXDESC_CD_0_ASET_SHARED :
+			      CTXDESC_CD_0_ASET_PRIVATE) |
 		      CTXDESC_CD_0_AA64 |
 		      (u64)cd->asid << CTXDESC_CD_0_ASID_SHIFT |
 		      CTXDESC_CD_0_V;
-- 
2.13.3


^ permalink raw reply related	[flat|nested] 268+ messages in thread


* [RFCv2 PATCH 26/36] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

If the SMMU supports it and the kernel was built with HTTU support, enable
hardware update of access and dirty flags. This is essential for shared
page tables, to reduce the number of access faults on the fault queue.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 32 ++++++++++++++++++++++++++++----
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index b7355630526a..2b2e2be03de7 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -67,6 +67,8 @@
 #define IDR0_ASID16			(1 << 12)
 #define IDR0_ATS			(1 << 10)
 #define IDR0_HYP			(1 << 9)
+#define IDR0_HD				(1 << 7)
+#define IDR0_HA				(1 << 6)
 #define IDR0_BTM			(1 << 5)
 #define IDR0_COHACC			(1 << 4)
 #define IDR0_TTF_SHIFT			2
@@ -342,7 +344,16 @@
 #define ARM64_TCR_TBI0_SHIFT		37
 #define ARM64_TCR_TBI0_MASK		0x1UL
 
+#define ARM64_TCR_HA_SHIFT		39
+#define ARM64_TCR_HA_MASK		0x1UL
+#define ARM64_TCR_HD_SHIFT		40
+#define ARM64_TCR_HD_MASK		0x1UL
+
 #define CTXDESC_CD_0_AA64		(1UL << 41)
+#define CTXDESC_CD_0_TCR_HD_SHIFT	42
+#define CTXDESC_CD_0_TCR_HA_SHIFT	43
+#define CTXDESC_CD_0_HD			(1UL << CTXDESC_CD_0_TCR_HD_SHIFT)
+#define CTXDESC_CD_0_HA			(1UL << CTXDESC_CD_0_TCR_HA_SHIFT)
 #define CTXDESC_CD_0_S			(1UL << 44)
 #define CTXDESC_CD_0_R			(1UL << 45)
 #define CTXDESC_CD_0_A			(1UL << 46)
@@ -670,6 +681,8 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_E2H		(1 << 14)
 #define ARM_SMMU_FEAT_BTM		(1 << 15)
 #define ARM_SMMU_FEAT_SVM		(1 << 16)
+#define ARM_SMMU_FEAT_HA		(1 << 17)
+#define ARM_SMMU_FEAT_HD		(1 << 18)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -1157,7 +1170,7 @@ static __u64 *arm_smmu_get_cd_ptr(struct arm_smmu_domain *smmu_domain, u32 ssid)
 	return l1_desc->cdptr + idx * CTXDESC_CD_DWORDS;
 }
 
-static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
+static u64 arm_smmu_cpu_tcr_to_cd(struct arm_smmu_device *smmu, u64 tcr)
 {
 	u64 val = 0;
 
@@ -1172,6 +1185,12 @@ static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
 	val |= ARM_SMMU_TCR2CD(tcr, IPS);
 	val |= ARM_SMMU_TCR2CD(tcr, TBI0);
 
+	if (smmu->features & ARM_SMMU_FEAT_HA)
+		val |= ARM_SMMU_TCR2CD(tcr, HA);
+
+	if (smmu->features & ARM_SMMU_FEAT_HD)
+		val |= ARM_SMMU_TCR2CD(tcr, HD);
+
 	return val;
 }
 
@@ -1235,7 +1254,7 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 			 */
 			arm_smmu_sync_cd(smmu_domain, ssid, true);
 
-		val = arm_smmu_cpu_tcr_to_cd(cd->tcr) |
+		val = arm_smmu_cpu_tcr_to_cd(smmu_domain->smmu, cd->tcr) |
 #ifdef __BIG_ENDIAN
 		      CTXDESC_CD_0_ENDI |
 #endif
@@ -2203,8 +2222,7 @@ static int arm_smmu_process_init_pgtable(struct arm_smmu_process *smmu_process,
 	reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
 	par = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_PARANGE_SHIFT);
 	tcr |= par << ARM_LPAE_TCR_IPS_SHIFT;
-
-	tcr |= TCR_TBI0;
+	tcr |= TCR_TBI0 | TCR_HA | TCR_HD;
 
 	cfg->asid       = asid;
 	cfg->ttbr       = virt_to_phys(mm->pgd);
@@ -3275,6 +3293,12 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 			smmu->features |= ARM_SMMU_FEAT_E2H;
 	}
 
+	if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && (reg & (IDR0_HA | IDR0_HD))) {
+		smmu->features |= ARM_SMMU_FEAT_HA;
+		if (reg & IDR0_HD)
+			smmu->features |= ARM_SMMU_FEAT_HD;
+	}
+
 	/*
 	 * If the CPU is using VHE, but the SMMU doesn't support it, the SMMU
 	 * will create TLB entries for NH-EL1 world and will miss the
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 27/36] iommu/arm-smmu-v3: Register fault workqueue
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

When using PRI or Stall, the PRI or event handler enqueues faults into the
core fault queue. Register it based on the SMMU features.

When the core stops using a PASID, it notifies the SMMU to flush all
instances of this PASID from the PRI queue. Add a way to flush the PRI and
event queues. The PRI and event threads now take a spinlock while
processing the queue, and the flush handler takes this lock to inspect the
queue state. We avoid livelock, where the SMMU adds faults to the queue
faster than we can consume them, by incrementing a 'batch' number on every
cycle, so the flush handler only has to wait for a complete cycle (two
batch increments).

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 104 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 102 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 2b2e2be03de7..7d68c6aecb14 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -570,6 +570,10 @@ struct arm_smmu_queue {
 
 	u32 __iomem			*prod_reg;
 	u32 __iomem			*cons_reg;
+
+	/* Event and PRI */
+	u64				batch;
+	wait_queue_head_t		wq;
 };
 
 struct arm_smmu_cmdq {
@@ -716,6 +720,9 @@ struct arm_smmu_device {
 
 	/* IOMMU core code handle */
 	struct iommu_device		iommu;
+
+	/* Notifier for the fault queue */
+	struct notifier_block		faultq_nb;
 };
 
 /* SMMU private data for each master */
@@ -1568,19 +1575,27 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
 	int i;
+	int num_handled = 0;
 	struct arm_smmu_device *smmu = dev;
 	struct arm_smmu_queue *q = &smmu->evtq.q;
+	size_t queue_size = 1 << q->max_n_shift;
 	u64 evt[EVTQ_ENT_DWORDS];
 
+	spin_lock(&q->wq.lock);
 	do {
 		while (!queue_remove_raw(q, evt)) {
 			u8 id = evt[0] >> EVTQ_0_ID_SHIFT & EVTQ_0_ID_MASK;
 
+			if (++num_handled == queue_size) {
+				q->batch++;
+				wake_up_locked(&q->wq);
+				num_handled = 0;
+			}
+
 			dev_info(smmu->dev, "event 0x%02x received:\n", id);
 			for (i = 0; i < ARRAY_SIZE(evt); ++i)
 				dev_info(smmu->dev, "\t0x%016llx\n",
 					 (unsigned long long)evt[i]);
-
 		}
 
 		/*
@@ -1593,6 +1608,11 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 
 	/* Sync our overflow flag, as we believe we're up to speed */
 	q->cons = Q_OVF(q, q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
+
+	q->batch++;
+	wake_up_locked(&q->wq);
+	spin_unlock(&q->wq.lock);
+
 	return IRQ_HANDLED;
 }
 
@@ -1636,13 +1656,24 @@ static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
 
 static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
 {
+	int num_handled = 0;
 	struct arm_smmu_device *smmu = dev;
 	struct arm_smmu_queue *q = &smmu->priq.q;
+	size_t queue_size = 1 << q->max_n_shift;
 	u64 evt[PRIQ_ENT_DWORDS];
 
+	spin_lock(&q->wq.lock);
 	do {
-		while (!queue_remove_raw(q, evt))
+		while (!queue_remove_raw(q, evt)) {
+			spin_unlock(&q->wq.lock);
 			arm_smmu_handle_ppr(smmu, evt);
+			spin_lock(&q->wq.lock);
+			if (++num_handled == queue_size) {
+				q->batch++;
+				wake_up_locked(&q->wq);
+				num_handled = 0;
+			}
+		}
 
 		if (queue_sync_prod(q) == -EOVERFLOW)
 			dev_err(smmu->dev, "PRIQ overflow detected -- requests lost\n");
@@ -1650,9 +1681,65 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
 
 	/* Sync our overflow flag, as we believe we're up to speed */
 	q->cons = Q_OVF(q, q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
+
+	q->batch++;
+	wake_up_locked(&q->wq);
+	spin_unlock(&q->wq.lock);
+
 	return IRQ_HANDLED;
 }
 
+/*
+ * arm_smmu_flush_queue - wait until all events/PRIs currently in the queue have
+ * been consumed.
+ *
+ * Wait until the queue thread finished a batch, or until the queue is empty.
+ * Note that we don't handle overflows on q->batch. If it occurs, just wait for
+ * the queue to be empty.
+ */
+static int arm_smmu_flush_queue(struct arm_smmu_device *smmu,
+				struct arm_smmu_queue *q, const char *name)
+{
+	int ret;
+	u64 batch;
+
+	spin_lock(&q->wq.lock);
+	if (queue_sync_prod(q) == -EOVERFLOW)
+		dev_err(smmu->dev, "%s overflow detected -- requests lost\n", name);
+
+	batch = q->batch;
+	ret = wait_event_interruptible_locked(q->wq, queue_empty(q) ||
+					      q->batch >= batch + 2);
+	spin_unlock(&q->wq.lock);
+
+	return ret;
+}
+
+static int arm_smmu_flush_queues(struct notifier_block *nb,
+				 unsigned long action, void *data)
+{
+	struct arm_smmu_device *smmu = container_of(nb, struct arm_smmu_device,
+						    faultq_nb);
+	struct device *dev = data;
+	struct arm_smmu_master_data *master = NULL;
+
+	if (dev)
+		master = dev->iommu_fwspec->iommu_priv;
+
+	if (master) {
+		/* TODO: add support for PRI and Stall */
+		return 0;
+	}
+
+	/* No target device, flush all queues. */
+	if (smmu->features & ARM_SMMU_FEAT_STALLS)
+		arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
+	if (smmu->features & ARM_SMMU_FEAT_PRI)
+		arm_smmu_flush_queue(smmu, &smmu->priq.q, "priq");
+
+	return 0;
+}
+
 static irqreturn_t arm_smmu_cmdq_sync_handler(int irq, void *dev)
 {
 	/* We don't actually use CMD_SYNC interrupts for anything */
@@ -2697,6 +2784,10 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
 		     << Q_BASE_LOG2SIZE_SHIFT;
 
 	q->prod = q->cons = 0;
+
+	init_waitqueue_head(&q->wq);
+	q->batch = 0;
+
 	return 0;
 }
 
@@ -3594,6 +3685,13 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	if (ret)
 		return ret;
 
+	if (smmu->features & (ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_PRI)) {
+		smmu->faultq_nb.notifier_call = arm_smmu_flush_queues;
+		ret = iommu_fault_queue_register(&smmu->faultq_nb);
+		if (ret)
+			return ret;
+	}
+
 	/* And we're up. Go go go! */
 	ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
 				     "smmu3.%pa", &ioaddr);
@@ -3636,6 +3734,8 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
 {
 	struct arm_smmu_device *smmu = platform_get_drvdata(pdev);
 
+	iommu_fault_queue_unregister(&smmu->faultq_nb);
+
 	arm_smmu_device_disable(smmu);
 
 	return 0;
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 28/36] iommu/arm-smmu-v3: Maintain a SID->device structure
  2017-10-06 13:31 ` Jean-Philippe Brucker
  (?)
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

When handling faults from the event or PRI queue, we need to find the
struct device associated with a SID. Add an rb_tree to keep track of SIDs.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 104 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 7d68c6aecb14..4e915e649643 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -721,10 +721,19 @@ struct arm_smmu_device {
 	/* IOMMU core code handle */
 	struct iommu_device		iommu;
 
+	struct rb_root			streams;
+	struct mutex			streams_mutex;
+
 	/* Notifier for the fault queue */
 	struct notifier_block		faultq_nb;
 };
 
+struct arm_smmu_stream {
+	u32				id;
+	struct arm_smmu_master_data	*master;
+	struct rb_node			node;
+};
+
 /* SMMU private data for each master */
 struct arm_smmu_master_data {
 	struct arm_smmu_device		*smmu;
@@ -732,6 +741,7 @@ struct arm_smmu_master_data {
 
 	struct arm_smmu_domain		*domain;
 	struct list_head		list; /* domain->devices */
+	struct arm_smmu_stream		*streams;
 
 	struct device			*dev;
 
@@ -1571,6 +1581,31 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
 	return 0;
 }
 
+static struct arm_smmu_master_data *
+arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
+{
+	struct rb_node *node;
+	struct arm_smmu_stream *stream;
+	struct arm_smmu_master_data *master = NULL;
+
+	mutex_lock(&smmu->streams_mutex);
+	node = smmu->streams.rb_node;
+	while (node) {
+		stream = rb_entry(node, struct arm_smmu_stream, node);
+		if (stream->id < sid) {
+			node = node->rb_right;
+		} else if (stream->id > sid) {
+			node = node->rb_left;
+		} else {
+			master = stream->master;
+			break;
+		}
+	}
+	mutex_unlock(&smmu->streams_mutex);
+
+	return master;
+}
+
 /* IRQ and event handlers */
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
@@ -2555,6 +2590,71 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
 	return sid < limit;
 }
 
+static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
+				  struct arm_smmu_master_data *master)
+{
+	int i;
+	int ret = 0;
+	struct arm_smmu_stream *new_stream, *cur_stream;
+	struct rb_node **new_node, *parent_node = NULL;
+	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+
+	master->streams = kcalloc(fwspec->num_ids,
+				  sizeof(struct arm_smmu_stream), GFP_KERNEL);
+	if (!master->streams)
+		return -ENOMEM;
+
+	mutex_lock(&smmu->streams_mutex);
+	for (i = 0; i < fwspec->num_ids && !ret; i++) {
+		new_stream = &master->streams[i];
+		new_stream->id = fwspec->ids[i];
+		new_stream->master = master;
+
+		new_node = &(smmu->streams.rb_node);
+		while (*new_node) {
+			cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
+					      node);
+			parent_node = *new_node;
+			if (cur_stream->id > new_stream->id) {
+				new_node = &((*new_node)->rb_left);
+			} else if (cur_stream->id < new_stream->id) {
+				new_node = &((*new_node)->rb_right);
+			} else {
+				dev_warn(master->dev,
+					 "stream %u already in tree\n",
+					 cur_stream->id);
+				ret = -EINVAL;
+				break;
+			}
+		}
+
+		if (!ret) {
+			rb_link_node(&new_stream->node, parent_node, new_node);
+			rb_insert_color(&new_stream->node, &smmu->streams);
+		}
+	}
+	mutex_unlock(&smmu->streams_mutex);
+
+	return ret;
+}
+
+static void arm_smmu_remove_master(struct arm_smmu_device *smmu,
+				   struct arm_smmu_master_data *master)
+{
+	int i;
+	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+
+	if (!master->streams)
+		return;
+
+	mutex_lock(&smmu->streams_mutex);
+	for (i = 0; i < fwspec->num_ids; i++)
+		rb_erase(&master->streams[i].node, &smmu->streams);
+	mutex_unlock(&smmu->streams_mutex);
+
+	kfree(master->streams);
+}
+
 static struct iommu_ops arm_smmu_ops;
 
 static int arm_smmu_add_device(struct device *dev)
@@ -2609,6 +2709,7 @@ static int arm_smmu_add_device(struct device *dev)
 
 	group = iommu_group_get_for_dev(dev);
 	if (!IS_ERR(group)) {
+		arm_smmu_insert_master(smmu, master);
 		iommu_group_put(group);
 		iommu_device_link(&smmu->iommu, dev);
 	}
@@ -2629,6 +2730,7 @@ static void arm_smmu_remove_device(struct device *dev)
 	smmu = master->smmu;
 	if (master && master->ste.assigned)
 		arm_smmu_detach_dev(dev);
+	arm_smmu_remove_master(smmu, master);
 	iommu_group_remove_device(dev);
 	iommu_device_unlink(&smmu->iommu, dev);
 	kfree(master);
@@ -2936,6 +3038,8 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
 
 	spin_lock_init(&smmu->asid_lock);
 	idr_init(&smmu->asid_idr);
+	mutex_init(&smmu->streams_mutex);
+	smmu->streams = RB_ROOT;
 
 	ret = arm_smmu_init_queues(smmu);
 	if (ret)
-- 
2.13.3


^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 28/36] iommu/arm-smmu-v3: Maintain a SID->device structure
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

When handling faults from the event or PRI queue, we need to find the
struct device associated to a SID. Add a rb_tree to keep track of SIDs.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 104 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 7d68c6aecb14..4e915e649643 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -721,10 +721,19 @@ struct arm_smmu_device {
 	/* IOMMU core code handle */
 	struct iommu_device		iommu;
 
+	struct rb_root			streams;
+	struct mutex			streams_mutex;
+
 	/* Notifier for the fault queue */
 	struct notifier_block		faultq_nb;
 };
 
+struct arm_smmu_stream {
+	u32				id;
+	struct arm_smmu_master_data	*master;
+	struct rb_node			node;
+};
+
 /* SMMU private data for each master */
 struct arm_smmu_master_data {
 	struct arm_smmu_device		*smmu;
@@ -732,6 +741,7 @@ struct arm_smmu_master_data {
 
 	struct arm_smmu_domain		*domain;
 	struct list_head		list; /* domain->devices */
+	struct arm_smmu_stream		*streams;
 
 	struct device			*dev;
 
@@ -1571,6 +1581,31 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
 	return 0;
 }
 
+static struct arm_smmu_master_data *
+arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
+{
+	struct rb_node *node;
+	struct arm_smmu_stream *stream;
+	struct arm_smmu_master_data *master = NULL;
+
+	mutex_lock(&smmu->streams_mutex);
+	node = smmu->streams.rb_node;
+	while (node) {
+		stream = rb_entry(node, struct arm_smmu_stream, node);
+		if (stream->id < sid) {
+			node = node->rb_right;
+		} else if (stream->id > sid) {
+			node = node->rb_left;
+		} else {
+			master = stream->master;
+			break;
+		}
+	}
+	mutex_unlock(&smmu->streams_mutex);
+
+	return master;
+}
+
 /* IRQ and event handlers */
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
@@ -2555,6 +2590,71 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
 	return sid < limit;
 }
 
+static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
+				  struct arm_smmu_master_data *master)
+{
+	int i;
+	int ret = 0;
+	struct arm_smmu_stream *new_stream, *cur_stream;
+	struct rb_node **new_node, *parent_node = NULL;
+	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+
+	master->streams = kcalloc(fwspec->num_ids,
+				  sizeof(struct arm_smmu_stream), GFP_KERNEL);
+	if (!master->streams)
+		return -ENOMEM;
+
+	mutex_lock(&smmu->streams_mutex);
+	for (i = 0; i < fwspec->num_ids && !ret; i++) {
+		new_stream = &master->streams[i];
+		new_stream->id = fwspec->ids[i];
+		new_stream->master = master;
+
+		new_node = &(smmu->streams.rb_node);
+		while (*new_node) {
+			cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
+					      node);
+			parent_node = *new_node;
+			if (cur_stream->id > new_stream->id) {
+				new_node = &((*new_node)->rb_left);
+			} else if (cur_stream->id < new_stream->id) {
+				new_node = &((*new_node)->rb_right);
+			} else {
+				dev_warn(master->dev,
+					 "stream %u already in tree\n",
+					 cur_stream->id);
+				ret = -EINVAL;
+				break;
+			}
+		}
+
+		if (!ret) {
+			rb_link_node(&new_stream->node, parent_node, new_node);
+			rb_insert_color(&new_stream->node, &smmu->streams);
+		}
+	}
+	mutex_unlock(&smmu->streams_mutex);
+
+	return ret;
+}
+
+static void arm_smmu_remove_master(struct arm_smmu_device *smmu,
+				   struct arm_smmu_master_data *master)
+{
+	int i;
+	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+
+	if (!master->streams)
+		return;
+
+	mutex_lock(&smmu->streams_mutex);
+	for (i = 0; i < fwspec->num_ids; i++)
+		rb_erase(&master->streams[i].node, &smmu->streams);
+	mutex_unlock(&smmu->streams_mutex);
+
+	kfree(master->streams);
+}
+
 static struct iommu_ops arm_smmu_ops;
 
 static int arm_smmu_add_device(struct device *dev)
@@ -2609,6 +2709,7 @@ static int arm_smmu_add_device(struct device *dev)
 
 	group = iommu_group_get_for_dev(dev);
 	if (!IS_ERR(group)) {
+		arm_smmu_insert_master(smmu, master);
 		iommu_group_put(group);
 		iommu_device_link(&smmu->iommu, dev);
 	}
@@ -2629,6 +2730,7 @@ static void arm_smmu_remove_device(struct device *dev)
 	smmu = master->smmu;
 	if (master && master->ste.assigned)
 		arm_smmu_detach_dev(dev);
+	arm_smmu_remove_master(smmu, master);
 	iommu_group_remove_device(dev);
 	iommu_device_unlink(&smmu->iommu, dev);
 	kfree(master);
@@ -2936,6 +3038,8 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
 
 	spin_lock_init(&smmu->asid_lock);
 	idr_init(&smmu->asid_idr);
+	mutex_init(&smmu->streams_mutex);
+	smmu->streams = RB_ROOT;
 
 	ret = arm_smmu_init_queues(smmu);
 	if (ret)
-- 
2.13.3


^ permalink raw reply related	[flat|nested] 268+ messages in thread


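For readers outside the kernel tree, the SID-to-master lookup maintained by arm_smmu_insert_master() and arm_smmu_find_master() above can be sketched as a plain binary search tree. This is an illustrative stand-in, not the kernel's rb-tree API: the struct names and the void * master pointer are hypothetical, and there is no rebalancing, but the comparison direction matches the patch.

```c
#include <stdlib.h>

/* Minimal stand-in for the kernel structures; names are illustrative. */
struct stream {
	unsigned int id;		/* StreamID (SID) */
	void *master;			/* would be struct arm_smmu_master_data * */
	struct stream *left, *right;
};

/* Insert a stream keyed by SID; returns -1 if the SID is already present,
 * mirroring the duplicate check in arm_smmu_insert_master(). */
int stream_insert(struct stream **root, struct stream *node)
{
	while (*root) {
		if ((*root)->id > node->id)
			root = &(*root)->left;
		else if ((*root)->id < node->id)
			root = &(*root)->right;
		else
			return -1;	/* SID already in tree */
	}
	node->left = node->right = NULL;
	*root = node;
	return 0;
}

/* Walk the tree to find the master owning a SID, as
 * arm_smmu_find_master() does when handling an event. */
void *stream_find(struct stream *root, unsigned int sid)
{
	while (root) {
		if (root->id > sid)
			root = root->left;
		else if (root->id < sid)
			root = root->right;
		else
			return root->master;
	}
	return NULL;
}
```

The real driver additionally serializes insert/remove/find with streams_mutex, since the event and PRI threads can race with device hotplug.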
* [RFCv2 PATCH 29/36] iommu/arm-smmu-v3: Add stall support for platform devices
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: mark.rutland, gabriele.paoloni, catalin.marinas, will.deacon,
	okaya, rfranz, lenb, robh+dt, bhelgaas, dwmw2, rjw, sudeep.holla

The SMMU provides a Stall model for handling page faults in platform
devices. It is similar to PCI PRI, but doesn't require devices to have
their own translation cache. Instead, faulting transactions are parked and
the OS is given a chance to fix the page tables and retry the transaction.

Enable stall for devices that support it (opt-in by firmware). When an
event corresponds to a translation error, call the IOMMU fault handler. If
the fault is recoverable, it will call us back to terminate or continue
the stall.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
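As a rough standalone model of the resume decision below (the CMDQ_OP_RESUME case of arm_smmu_cmdq_build_cmd): a handled fault retries the stalled transaction, any failure terminates it. The enum names here are redefined for illustration and are not the kernel's.

```c
/* Illustrative user-space model of the stall resume decision; the
 * names mirror the patch but are hypothetical stand-ins. */
enum fault_status { FAULT_HANDLED, FAULT_FAILURE, FAULT_INVALID };
enum resume_action { ACTION_RETRY, ACTION_ABORT, ACTION_UNKNOWN };

/* Map the fault handler's verdict to a CMD_RESUME action: only a
 * successfully handled fault lets the stalled transaction retry;
 * failure or an invalid fault terminates it. */
enum resume_action resume_action_for(enum fault_status resp)
{
	switch (resp) {
	case FAULT_HANDLED:
		return ACTION_RETRY;
	case FAULT_FAILURE:
	case FAULT_INVALID:
		return ACTION_ABORT;
	default:
		return ACTION_UNKNOWN;	/* the driver returns -EINVAL */
	}
}
```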
 drivers/iommu/arm-smmu-v3.c | 176 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 172 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 4e915e649643..48a1da0934b4 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -418,6 +418,15 @@
 #define CMDQ_PRI_1_RESP_FAIL		(1UL << CMDQ_PRI_1_RESP_SHIFT)
 #define CMDQ_PRI_1_RESP_SUCC		(2UL << CMDQ_PRI_1_RESP_SHIFT)
 
+#define CMDQ_RESUME_0_SID_SHIFT		32
+#define CMDQ_RESUME_0_SID_MASK		0xffffffffUL
+#define CMDQ_RESUME_0_ACTION_SHIFT	12
+#define CMDQ_RESUME_0_ACTION_TERM	(0UL << CMDQ_RESUME_0_ACTION_SHIFT)
+#define CMDQ_RESUME_0_ACTION_RETRY	(1UL << CMDQ_RESUME_0_ACTION_SHIFT)
+#define CMDQ_RESUME_0_ACTION_ABORT	(2UL << CMDQ_RESUME_0_ACTION_SHIFT)
+#define CMDQ_RESUME_1_STAG_SHIFT	0
+#define CMDQ_RESUME_1_STAG_MASK		0xffffUL
+
 #define CMDQ_SYNC_0_CS_SHIFT		12
 #define CMDQ_SYNC_0_CS_NONE		(0UL << CMDQ_SYNC_0_CS_SHIFT)
 #define CMDQ_SYNC_0_CS_SEV		(2UL << CMDQ_SYNC_0_CS_SHIFT)
@@ -429,6 +438,31 @@
 #define EVTQ_0_ID_SHIFT			0
 #define EVTQ_0_ID_MASK			0xffUL
 
+#define EVT_ID_TRANSLATION_FAULT	0x10
+#define EVT_ID_ADDR_SIZE_FAULT		0x11
+#define EVT_ID_ACCESS_FAULT		0x12
+#define EVT_ID_PERMISSION_FAULT		0x13
+
+#define EVTQ_0_SSV			(1UL << 11)
+#define EVTQ_0_SSID_SHIFT		12
+#define EVTQ_0_SSID_MASK		0xfffffUL
+#define EVTQ_0_SID_SHIFT		32
+#define EVTQ_0_SID_MASK			0xffffffffUL
+#define EVTQ_1_STAG_SHIFT		0
+#define EVTQ_1_STAG_MASK		0xffffUL
+#define EVTQ_1_STALL			(1UL << 31)
+#define EVTQ_1_PRIV			(1UL << 33)
+#define EVTQ_1_EXEC			(1UL << 34)
+#define EVTQ_1_READ			(1UL << 35)
+#define EVTQ_1_S2			(1UL << 39)
+#define EVTQ_1_CLASS_SHIFT		40
+#define EVTQ_1_CLASS_MASK		0x3UL
+#define EVTQ_1_TT_READ			(1UL << 44)
+#define EVTQ_2_ADDR_SHIFT		0
+#define EVTQ_2_ADDR_MASK		0xffffffffffffffffUL
+#define EVTQ_3_IPA_SHIFT		12
+#define EVTQ_3_IPA_MASK			0xffffffffffUL
+
 /* PRI queue */
 #define PRIQ_ENT_DWORDS			2
 #define PRIQ_MAX_SZ_SHIFT		8
@@ -456,6 +490,9 @@
 #define MSI_IOVA_BASE			0x8000000
 #define MSI_IOVA_LENGTH			0x100000
 
+/* Flags for iommu_data in iommu_fault */
+#define ARM_SMMU_FAULT_STALL		(1 << 0)
+
 /* Until ACPICA headers cover IORT rev. C */
 #ifndef ACPI_IORT_SMMU_HISILICON_HI161X
 #define ACPI_IORT_SMMU_HISILICON_HI161X		0x1
@@ -552,6 +589,13 @@ struct arm_smmu_cmdq_ent {
 			enum pri_resp		resp;
 		} pri;
 
+		#define CMDQ_OP_RESUME		0x44
+		struct {
+			u32			sid;
+			u16			stag;
+			enum iommu_fault_status	resp;
+		} resume;
+
 		#define CMDQ_OP_CMD_SYNC	0x46
 	};
 };
@@ -625,6 +669,7 @@ struct arm_smmu_s1_cfg {
 	};
 
 	size_t				num_contexts;
+	bool				can_stall;
 
 	struct arm_smmu_ctx_desc	cd; /* Default context (SSID0) */
 };
@@ -646,6 +691,8 @@ struct arm_smmu_strtab_ent {
 	bool				assigned;
 	struct arm_smmu_s1_cfg		*s1_cfg;
 	struct arm_smmu_s2_cfg		*s2_cfg;
+
+	bool				can_stall;
 };
 
 struct arm_smmu_strtab_cfg {
@@ -1009,6 +1056,21 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 			return -EINVAL;
 		}
 		break;
+	case CMDQ_OP_RESUME:
+		switch (ent->resume.resp) {
+		case IOMMU_FAULT_STATUS_FAILURE:
+		case IOMMU_FAULT_STATUS_INVALID:
+			cmd[0] |= CMDQ_RESUME_0_ACTION_ABORT;
+			break;
+		case IOMMU_FAULT_STATUS_HANDLED:
+			cmd[0] |= CMDQ_RESUME_0_ACTION_RETRY;
+			break;
+		default:
+			return -EINVAL;
+		}
+		cmd[0] |= (u64)ent->resume.sid << CMDQ_RESUME_0_SID_SHIFT;
+		cmd[1] |= ent->resume.stag << CMDQ_RESUME_1_STAG_SHIFT;
+		break;
 	case CMDQ_OP_CMD_SYNC:
 		cmd[0] |= CMDQ_SYNC_0_CS_SEV;
 		break;
@@ -1093,6 +1155,32 @@ static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
 	spin_unlock_irqrestore(&smmu->cmdq.lock, flags);
 }
 
+static int arm_smmu_fault_response(struct iommu_domain *domain,
+				   struct device *dev,
+				   struct iommu_fault *fault,
+				   enum iommu_fault_status resp)
+{
+	int sid = dev->iommu_fwspec->ids[0];
+	struct arm_smmu_cmdq_ent cmd = {0};
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+	if (fault->iommu_data & ARM_SMMU_FAULT_STALL) {
+		cmd.opcode		= CMDQ_OP_RESUME;
+		cmd.resume.sid		= sid;
+		cmd.resume.stag		= fault->id;
+		cmd.resume.resp		= resp;
+	} else {
+		/* TODO: put PRI response here */
+		return -EINVAL;
+	}
+
+	arm_smmu_cmdq_issue_cmd(smmu_domain->smmu, &cmd);
+	cmd.opcode = CMDQ_OP_CMD_SYNC;
+	arm_smmu_cmdq_issue_cmd(smmu_domain->smmu, &cmd);
+
+	return 0;
+}
+
 /* Context descriptor manipulation functions */
 static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain, u32 ssid,
 			     bool leaf)
@@ -1283,7 +1371,8 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 		      CTXDESC_CD_0_V;
 
 		/* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
-		if (smmu_domain->smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
+		if ((smmu_domain->smmu->features & ARM_SMMU_FEAT_STALL_FORCE) ||
+		    (ssid && smmu_domain->s1_cfg.can_stall))
 			val |= CTXDESC_CD_0_S;
 
 		cdptr[0] = cpu_to_le64(val);
@@ -1503,7 +1592,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 			 STRTAB_STE_1_STRW_SHIFT);
 
 		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
-		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
+		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE) &&
+		   !ste->can_stall)
 			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 
 		val |= (s1ctxptr & STRTAB_STE_0_S1CTXPTR_MASK
@@ -1606,10 +1696,72 @@ arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
 	return master;
 }
 
+static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
+{
+	struct iommu_domain *domain;
+	struct arm_smmu_master_data *master;
+	u8 type = evt[0] >> EVTQ_0_ID_SHIFT & EVTQ_0_ID_MASK;
+	u32 sid = evt[0] >> EVTQ_0_SID_SHIFT & EVTQ_0_SID_MASK;
+
+	struct iommu_fault fault = {
+		.id		= evt[1] >> EVTQ_1_STAG_SHIFT & EVTQ_1_STAG_MASK,
+		.address	= evt[2] >> EVTQ_2_ADDR_SHIFT & EVTQ_2_ADDR_MASK,
+		.iommu_data	= ARM_SMMU_FAULT_STALL,
+	};
+
+	switch (type) {
+	case EVT_ID_TRANSLATION_FAULT:
+	case EVT_ID_ADDR_SIZE_FAULT:
+	case EVT_ID_ACCESS_FAULT:
+	case EVT_ID_PERMISSION_FAULT:
+		break;
+	default:
+		return -EFAULT;
+	}
+
+	/* Stage-2 is always pinned at the moment */
+	if (evt[1] & EVTQ_1_S2)
+		return -EFAULT;
+
+	master = arm_smmu_find_master(smmu, sid);
+	if (!master)
+		return -EINVAL;
+
+	/*
+	 * The domain is valid until the fault returns, because detach() flushes
+	 * the fault queue.
+	 */
+	domain = iommu_get_domain_for_dev(master->dev);
+	if (!domain)
+		return -EINVAL;
+
+	if (evt[1] & EVTQ_1_STALL)
+		fault.flags |= IOMMU_FAULT_RECOVERABLE;
+
+	if (evt[1] & EVTQ_1_READ)
+		fault.flags |= IOMMU_FAULT_READ;
+	else
+		fault.flags |= IOMMU_FAULT_WRITE;
+
+	if (evt[1] & EVTQ_1_EXEC)
+		fault.flags |= IOMMU_FAULT_EXEC;
+
+	if (evt[1] & EVTQ_1_PRIV)
+		fault.flags |= IOMMU_FAULT_PRIV;
+
+	if (evt[0] & EVTQ_0_SSV) {
+		fault.flags |= IOMMU_FAULT_PASID;
+		fault.pasid = evt[0] >> EVTQ_0_SSID_SHIFT & EVTQ_0_SSID_MASK;
+	}
+
+	/* Report to device driver or populate the page tables */
+	return handle_iommu_fault(domain, master->dev, &fault);
+}
+
 /* IRQ and event handlers */
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
-	int i;
+	int i, ret;
 	int num_handled = 0;
 	struct arm_smmu_device *smmu = dev;
 	struct arm_smmu_queue *q = &smmu->evtq.q;
@@ -1621,12 +1773,19 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 		while (!queue_remove_raw(q, evt)) {
 			u8 id = evt[0] >> EVTQ_0_ID_SHIFT & EVTQ_0_ID_MASK;
 
+			spin_unlock(&q->wq.lock);
+			ret = arm_smmu_handle_evt(smmu, evt);
+			spin_lock(&q->wq.lock);
+
 			if (++num_handled == queue_size) {
 				q->batch++;
 				wake_up_locked(&q->wq);
 				num_handled = 0;
 			}
 
+			if (!ret)
+				continue;
+
 			dev_info(smmu->dev, "event 0x%02x received:\n", id);
 			for (i = 0; i < ARRAY_SIZE(evt); ++i)
 				dev_info(smmu->dev, "\t0x%016llx\n",
@@ -1762,7 +1921,9 @@ static int arm_smmu_flush_queues(struct notifier_block *nb,
 		master = dev->iommu_fwspec->iommu_priv;
 
 	if (master) {
-		/* TODO: add support for PRI and Stall */
+		if (master->ste.can_stall)
+			arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
+		/* TODO: add support for PRI */
 		return 0;
 	}
 
@@ -2110,6 +2271,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
 			domain->max_pasid = master->num_ssids - 1;
 			smmu_domain->s1_cfg.num_contexts = master->num_ssids;
 		}
+		smmu_domain->s1_cfg.can_stall = master->ste.can_stall;
 		break;
 	case ARM_SMMU_DOMAIN_NESTED:
 	case ARM_SMMU_DOMAIN_S2:
@@ -2707,6 +2869,11 @@ static int arm_smmu_add_device(struct device *dev)
 		master->num_ssids = 1 << min(smmu->ssid_bits,
 					     fwspec->num_pasid_bits);
 
+	if (fwspec->can_stall && smmu->features & ARM_SMMU_FEAT_STALLS) {
+		master->can_fault = true;
+		master->ste.can_stall = true;
+	}
+
 	group = iommu_group_get_for_dev(dev);
 	if (!IS_ERR(group)) {
 		arm_smmu_insert_master(smmu, master);
@@ -2845,6 +3012,7 @@ static struct iommu_ops arm_smmu_ops = {
 	.process_detach		= arm_smmu_process_detach,
 	.process_invalidate	= arm_smmu_process_invalidate,
 	.process_exit		= arm_smmu_process_exit,
+	.fault_response		= arm_smmu_fault_response,
 	.map			= arm_smmu_map,
 	.unmap			= arm_smmu_unmap,
 	.map_sg			= default_iommu_map_sg,
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread

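Assuming the CMDQ_RESUME_* field encodings from the patch above, packing a CMD_RESUME into its two 64-bit command words can be sketched as follows. This is a standalone reimplementation for illustration, not the kernel code; the opcode is assumed to occupy bits [7:0] of the first word, as for other SMMUv3 commands.

```c
#include <stdint.h>

/* Field encodings copied from the CMDQ_RESUME_* defines in the patch. */
#define RESUME_OPCODE		0x44ULL		/* CMDQ_OP_RESUME */
#define RESUME_0_SID_SHIFT	32
#define RESUME_0_ACTION_RETRY	(1ULL << 12)
#define RESUME_0_ACTION_ABORT	(2ULL << 12)
#define RESUME_1_STAG_MASK	0xffffULL

/* Build the two 64-bit words of a CMD_RESUME, as
 * arm_smmu_cmdq_build_cmd() does for CMDQ_OP_RESUME. */
void build_resume_cmd(uint64_t cmd[2], uint32_t sid, uint16_t stag, int retry)
{
	cmd[0] = RESUME_OPCODE;		/* opcode in bits [7:0] of word 0 */
	cmd[0] |= retry ? RESUME_0_ACTION_RETRY : RESUME_0_ACTION_ABORT;
	cmd[0] |= (uint64_t)sid << RESUME_0_SID_SHIFT;
	cmd[1] = stag & RESUME_1_STAG_MASK;	/* stall tag from the event */
}
```

The stag ties the resume command back to the specific stalled transaction reported in the event record, so a device with several outstanding stalls can have each one resumed or terminated independently.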

* [RFCv2 PATCH 29/36] iommu/arm-smmu-v3: Add stall support for platform devices
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel

The SMMU provides a Stall model for handling page faults in platform
devices. It is similar to PCI PRI, but doesn't require devices to have
their own translation cache. Instead, faulting transactions are parked and
the OS is given a chance to fix the page tables and retry the transaction.

Enable stall for devices that support it (opt-in by firmware). When an
event corresponds to a translation error, call the IOMMU fault handler. If
the fault is recoverable, it will call us back to terminate or continue
the stall.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 176 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 172 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 4e915e649643..48a1da0934b4 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -418,6 +418,15 @@
 #define CMDQ_PRI_1_RESP_FAIL		(1UL << CMDQ_PRI_1_RESP_SHIFT)
 #define CMDQ_PRI_1_RESP_SUCC		(2UL << CMDQ_PRI_1_RESP_SHIFT)
 
+#define CMDQ_RESUME_0_SID_SHIFT		32
+#define CMDQ_RESUME_0_SID_MASK		0xffffffffUL
+#define CMDQ_RESUME_0_ACTION_SHIFT	12
+#define CMDQ_RESUME_0_ACTION_TERM	(0UL << CMDQ_RESUME_0_ACTION_SHIFT)
+#define CMDQ_RESUME_0_ACTION_RETRY	(1UL << CMDQ_RESUME_0_ACTION_SHIFT)
+#define CMDQ_RESUME_0_ACTION_ABORT	(2UL << CMDQ_RESUME_0_ACTION_SHIFT)
+#define CMDQ_RESUME_1_STAG_SHIFT	0
+#define CMDQ_RESUME_1_STAG_MASK		0xffffUL
+
 #define CMDQ_SYNC_0_CS_SHIFT		12
 #define CMDQ_SYNC_0_CS_NONE		(0UL << CMDQ_SYNC_0_CS_SHIFT)
 #define CMDQ_SYNC_0_CS_SEV		(2UL << CMDQ_SYNC_0_CS_SHIFT)
@@ -429,6 +438,31 @@
 #define EVTQ_0_ID_SHIFT			0
 #define EVTQ_0_ID_MASK			0xffUL
 
+#define EVT_ID_TRANSLATION_FAULT	0x10
+#define EVT_ID_ADDR_SIZE_FAULT		0x11
+#define EVT_ID_ACCESS_FAULT		0x12
+#define EVT_ID_PERMISSION_FAULT		0x13
+
+#define EVTQ_0_SSV			(1UL << 11)
+#define EVTQ_0_SSID_SHIFT		12
+#define EVTQ_0_SSID_MASK		0xfffffUL
+#define EVTQ_0_SID_SHIFT		32
+#define EVTQ_0_SID_MASK			0xffffffffUL
+#define EVTQ_1_STAG_SHIFT		0
+#define EVTQ_1_STAG_MASK		0xffffUL
+#define EVTQ_1_STALL			(1UL << 31)
+#define EVTQ_1_PRIV			(1UL << 33)
+#define EVTQ_1_EXEC			(1UL << 34)
+#define EVTQ_1_READ			(1UL << 35)
+#define EVTQ_1_S2			(1UL << 39)
+#define EVTQ_1_CLASS_SHIFT		40
+#define EVTQ_1_CLASS_MASK		0x3UL
+#define EVTQ_1_TT_READ			(1UL << 44)
+#define EVTQ_2_ADDR_SHIFT		0
+#define EVTQ_2_ADDR_MASK		0xffffffffffffffffUL
+#define EVTQ_3_IPA_SHIFT		12
+#define EVTQ_3_IPA_MASK			0xffffffffffUL
+
 /* PRI queue */
 #define PRIQ_ENT_DWORDS			2
 #define PRIQ_MAX_SZ_SHIFT		8
@@ -456,6 +490,9 @@
 #define MSI_IOVA_BASE			0x8000000
 #define MSI_IOVA_LENGTH			0x100000
 
+/* Flags for iommu_data in iommu_fault */
+#define ARM_SMMU_FAULT_STALL		(1 << 0)
+
 /* Until ACPICA headers cover IORT rev. C */
 #ifndef ACPI_IORT_SMMU_HISILICON_HI161X
 #define ACPI_IORT_SMMU_HISILICON_HI161X		0x1
@@ -552,6 +589,13 @@ struct arm_smmu_cmdq_ent {
 			enum pri_resp		resp;
 		} pri;
 
+		#define CMDQ_OP_RESUME		0x44
+		struct {
+			u32			sid;
+			u16			stag;
+			enum iommu_fault_status	resp;
+		} resume;
+
 		#define CMDQ_OP_CMD_SYNC	0x46
 	};
 };
@@ -625,6 +669,7 @@ struct arm_smmu_s1_cfg {
 	};
 
 	size_t				num_contexts;
+	bool				can_stall;
 
 	struct arm_smmu_ctx_desc	cd; /* Default context (SSID0) */
 };
@@ -646,6 +691,8 @@ struct arm_smmu_strtab_ent {
 	bool				assigned;
 	struct arm_smmu_s1_cfg		*s1_cfg;
 	struct arm_smmu_s2_cfg		*s2_cfg;
+
+	bool				can_stall;
 };
 
 struct arm_smmu_strtab_cfg {
@@ -1009,6 +1056,21 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 			return -EINVAL;
 		}
 		break;
+	case CMDQ_OP_RESUME:
+		switch (ent->resume.resp) {
+		case IOMMU_FAULT_STATUS_FAILURE:
+		case IOMMU_FAULT_STATUS_INVALID:
+			cmd[0] |= CMDQ_RESUME_0_ACTION_ABORT;
+			break;
+		case IOMMU_FAULT_STATUS_HANDLED:
+			cmd[0] |= CMDQ_RESUME_0_ACTION_RETRY;
+			break;
+		default:
+			return -EINVAL;
+		}
+		cmd[0] |= (u64)ent->resume.sid << CMDQ_RESUME_0_SID_SHIFT;
+		cmd[1] |= ent->resume.stag << CMDQ_RESUME_1_STAG_SHIFT;
+		break;
 	case CMDQ_OP_CMD_SYNC:
 		cmd[0] |= CMDQ_SYNC_0_CS_SEV;
 		break;
@@ -1093,6 +1155,32 @@ static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
 	spin_unlock_irqrestore(&smmu->cmdq.lock, flags);
 }
 
+static int arm_smmu_fault_response(struct iommu_domain *domain,
+				   struct device *dev,
+				   struct iommu_fault *fault,
+				   enum iommu_fault_status resp)
+{
+	int sid = dev->iommu_fwspec->ids[0];
+	struct arm_smmu_cmdq_ent cmd = {0};
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+	if (fault->iommu_data & ARM_SMMU_FAULT_STALL) {
+		cmd.opcode		= CMDQ_OP_RESUME;
+		cmd.resume.sid		= sid;
+		cmd.resume.stag		= fault->id;
+		cmd.resume.resp		= resp;
+	} else {
+		/* TODO: put PRI response here */
+		return -EINVAL;
+	}
+
+	arm_smmu_cmdq_issue_cmd(smmu_domain->smmu, &cmd);
+	cmd.opcode = CMDQ_OP_CMD_SYNC;
+	arm_smmu_cmdq_issue_cmd(smmu_domain->smmu, &cmd);
+
+	return 0;
+}
+
 /* Context descriptor manipulation functions */
 static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain, u32 ssid,
 			     bool leaf)
@@ -1283,7 +1371,8 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain,
 		      CTXDESC_CD_0_V;
 
 		/* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
-		if (smmu_domain->smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
+		if ((smmu_domain->smmu->features & ARM_SMMU_FEAT_STALL_FORCE) ||
+		    (ssid && smmu_domain->s1_cfg.can_stall))
 			val |= CTXDESC_CD_0_S;
 
 		cdptr[0] = cpu_to_le64(val);
@@ -1503,7 +1592,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 			 STRTAB_STE_1_STRW_SHIFT);
 
 		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
-		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
+		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE) &&
+		   !ste->can_stall)
 			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 
 		val |= (s1ctxptr & STRTAB_STE_0_S1CTXPTR_MASK
@@ -1606,10 +1696,72 @@ arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
 	return master;
 }
 
+static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
+{
+	struct iommu_domain *domain;
+	struct arm_smmu_master_data *master;
+	u8 type = evt[0] >> EVTQ_0_ID_SHIFT & EVTQ_0_ID_MASK;
+	u32 sid = evt[0] >> EVTQ_0_SID_SHIFT & EVTQ_0_SID_MASK;
+
+	struct iommu_fault fault = {
+		.id		= evt[1] >> EVTQ_1_STAG_SHIFT & EVTQ_1_STAG_MASK,
+		.address	= evt[2] >> EVTQ_2_ADDR_SHIFT & EVTQ_2_ADDR_MASK,
+		.iommu_data	= ARM_SMMU_FAULT_STALL,
+	};
+
+	switch (type) {
+	case EVT_ID_TRANSLATION_FAULT:
+	case EVT_ID_ADDR_SIZE_FAULT:
+	case EVT_ID_ACCESS_FAULT:
+	case EVT_ID_PERMISSION_FAULT:
+		break;
+	default:
+		return -EFAULT;
+	}
+
+	/* Stage-2 is always pinned at the moment */
+	if (evt[1] & EVTQ_1_S2)
+		return -EFAULT;
+
+	master = arm_smmu_find_master(smmu, sid);
+	if (!master)
+		return -EINVAL;
+
+	/*
+	 * The domain is valid until the fault returns, because detach() flushes
+	 * the fault queue.
+	 */
+	domain = iommu_get_domain_for_dev(master->dev);
+	if (!domain)
+		return -EINVAL;
+
+	if (evt[1] & EVTQ_1_STALL)
+		fault.flags |= IOMMU_FAULT_RECOVERABLE;
+
+	if (evt[1] & EVTQ_1_READ)
+		fault.flags |= IOMMU_FAULT_READ;
+	else
+		fault.flags |= IOMMU_FAULT_WRITE;
+
+	if (evt[1] & EVTQ_1_EXEC)
+		fault.flags |= IOMMU_FAULT_EXEC;
+
+	if (evt[1] & EVTQ_1_PRIV)
+		fault.flags |= IOMMU_FAULT_PRIV;
+
+	if (evt[0] & EVTQ_0_SSV) {
+		fault.flags |= IOMMU_FAULT_PASID;
+		fault.pasid = evt[0] >> EVTQ_0_SSID_SHIFT & EVTQ_0_SSID_MASK;
+	}
+
+	/* Report to device driver or populate the page tables */
+	return handle_iommu_fault(domain, master->dev, &fault);
+}
+
 /* IRQ and event handlers */
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
-	int i;
+	int i, ret;
 	int num_handled = 0;
 	struct arm_smmu_device *smmu = dev;
 	struct arm_smmu_queue *q = &smmu->evtq.q;
@@ -1621,12 +1773,19 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 		while (!queue_remove_raw(q, evt)) {
 			u8 id = evt[0] >> EVTQ_0_ID_SHIFT & EVTQ_0_ID_MASK;
 
+			spin_unlock(&q->wq.lock);
+			ret = arm_smmu_handle_evt(smmu, evt);
+			spin_lock(&q->wq.lock);
+
 			if (++num_handled == queue_size) {
 				q->batch++;
 				wake_up_locked(&q->wq);
 				num_handled = 0;
 			}
 
+			if (!ret)
+				continue;
+
 			dev_info(smmu->dev, "event 0x%02x received:\n", id);
 			for (i = 0; i < ARRAY_SIZE(evt); ++i)
 				dev_info(smmu->dev, "\t0x%016llx\n",
@@ -1762,7 +1921,9 @@ static int arm_smmu_flush_queues(struct notifier_block *nb,
 		master = dev->iommu_fwspec->iommu_priv;
 
 	if (master) {
-		/* TODO: add support for PRI and Stall */
+		if (master->ste.can_stall)
+			arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
+		/* TODO: add support for PRI */
 		return 0;
 	}
 
@@ -2110,6 +2271,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
 			domain->max_pasid = master->num_ssids - 1;
 			smmu_domain->s1_cfg.num_contexts = master->num_ssids;
 		}
+		smmu_domain->s1_cfg.can_stall = master->ste.can_stall;
 		break;
 	case ARM_SMMU_DOMAIN_NESTED:
 	case ARM_SMMU_DOMAIN_S2:
@@ -2707,6 +2869,11 @@ static int arm_smmu_add_device(struct device *dev)
 		master->num_ssids = 1 << min(smmu->ssid_bits,
 					     fwspec->num_pasid_bits);
 
+	if (fwspec->can_stall && smmu->features & ARM_SMMU_FEAT_STALLS) {
+		master->can_fault = true;
+		master->ste.can_stall = true;
+	}
+
 	group = iommu_group_get_for_dev(dev);
 	if (!IS_ERR(group)) {
 		arm_smmu_insert_master(smmu, master);
@@ -2845,6 +3012,7 @@ static struct iommu_ops arm_smmu_ops = {
 	.process_detach		= arm_smmu_process_detach,
 	.process_invalidate	= arm_smmu_process_invalidate,
 	.process_exit		= arm_smmu_process_exit,
+	.fault_response		= arm_smmu_fault_response,
 	.map			= arm_smmu_map,
 	.unmap			= arm_smmu_unmap,
 	.map_sg			= default_iommu_map_sg,
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 30/36] ACPI/IORT: Check ATS capability in root complex nodes
  2017-10-06 13:31 ` Jean-Philippe Brucker
  (?)
@ 2017-10-06 13:31     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

The root complex node in IORT has a bit telling whether it supports ATS or
not. Store this bit in the IOMMU fwspec when setting up a device, so that
it can be accessed later by an IOMMU driver.

Use the negative version at the moment because it's not clear if/how the
bit needs to be integrated in other firmware descriptions. The SMMU has a
feature bit telling whether it supports ATS, which might be sufficient in
most systems for deciding whether or not we should enable the ATS
capability in endpoints.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/acpi/arm64/iort.c | 11 +++++++++++
 include/linux/iommu.h     |  4 ++++
 2 files changed, 15 insertions(+)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 9565d572f8dd..b94eac8ed21e 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -768,6 +768,14 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size)
 	dev_dbg(dev, "dma_pfn_offset(%#08llx)\n", offset);
 }
 
+static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node)
+{
+	struct acpi_iort_root_complex *pci_rc;
+
+	pci_rc = (struct acpi_iort_root_complex *)node->node_data;
+	return pci_rc->ats_attribute & ACPI_IORT_ATS_SUPPORTED;
+}
+
 /**
  * iort_iommu_configure - Set-up IOMMU configuration for a device.
  *
@@ -803,6 +811,9 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev)
 		info.node = node;
 		err = pci_for_each_dma_alias(to_pci_dev(dev),
 					     iort_pci_iommu_init, &info);
+
+		if (!err && !iort_pci_rc_supports_ats(node))
+			dev->iommu_fwspec->flags |= IOMMU_FWSPEC_PCI_NO_ATS;
 	} else {
 		int i = 0;
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2eb65d4724bb..661031aed0c4 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -534,12 +534,16 @@ struct iommu_fwspec {
 	const struct iommu_ops	*ops;
 	struct fwnode_handle	*iommu_fwnode;
 	void			*iommu_priv;
+	u32			flags;
 	unsigned int		num_ids;
 	unsigned int		num_pasid_bits;
 	bool			can_stall;
 	u32			ids[1];
 };
 
+/* Firmware disabled ATS in the root complex */
+#define IOMMU_FWSPEC_PCI_NO_ATS			(1 << 0)
+
 int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
 		      const struct iommu_ops *ops);
 void iommu_fwspec_free(struct device *dev);
-- 
2.13.3


^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 31/36] iommu/arm-smmu-v3: Add support for PCI ATS
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

PCIe devices can implement their own TLB, named Address Translation Cache
(ATC). Enable Address Translation Service (ATS) for devices that support
it and send them invalidation requests whenever we invalidate the IOTLBs.

  Range calculation
  -----------------

The invalidation packet itself is a bit awkward: the range must be
naturally aligned, which means that the start address is a multiple of the
range size. In addition, the size must be a power-of-two number of 4k
pages. We have a few options to enforce this constraint:

(1) Find the smallest naturally aligned region that covers the requested
    range. This is simple to compute and only takes one ATC_INV, but it
    will spill on lots of neighbouring ATC entries.

(2) Align the start address to the region size (rounded up to a power of
    two), and send a second invalidation for the next range of the same
    size. Still not great, but reduces spilling.

(3) Cover the range exactly with the smallest number of naturally aligned
    regions. This would be interesting to implement but as for (2),
    requires multiple ATC_INV.

As I suspect ATC invalidation packets will be a very scarce resource, I'll
go with option (1) for now, and only send one big invalidation. We can
move to (2), which is both easier to read and more gentle with the ATC,
once we've observed on real systems that we can send multiple smaller
Invalidation Requests for roughly the same price as a single big one.

Note that with io-pgtable, the unmap function is called for each page, so
this doesn't matter. The problem shows up when sharing page tables with
the MMU.
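
As a concrete illustration of option (1), here is a minimal, self-contained
sketch of the span computation that arm_smmu_atc_inv_to_cmd() below
implements. The helper name and signature are illustrative only, not the
driver's; it uses the fls(start ^ end) trick described in the code comment:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Standalone sketch (not driver code): given an arbitrary IOVA range,
 * compute the smallest naturally aligned power-of-two span of 4k pages
 * that covers it. The most significant differing bit between the first
 * and last page numbers gives the required span, ie. fls(start ^ end).
 */
static void atc_inv_range(uint64_t iova, uint64_t size,
			  uint64_t *addr, unsigned int *log2_span)
{
	const unsigned int grain = 12;		/* 4096-byte pages */
	uint64_t start = iova >> grain;
	uint64_t end = (iova + size - 1) >> grain;
	uint64_t diff = start ^ end;
	unsigned int span = 0;

	/* span = fls(start ^ end): position of the highest set bit */
	while (diff) {
		span++;
		diff >>= 1;
	}

	/* Align the first page down to the span size */
	start &= ~((1ULL << span) - 1);

	*addr = start << grain;
	*log2_span = span;	/* invalidates 2^span pages */
}
```

A span value of N invalidates 2^N pages starting at addr, matching the
SIZE field encoding of the ATC invalidation command.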

  Timeout
  -------

ATC invalidation is allowed to take up to 90 seconds, according to the
PCIe spec, so it is possible to hit the SMMU command queue timeout during
normal operations.

Some SMMU implementations will raise a CERROR_ATC_INV_SYNC when a CMD_SYNC
fails because of an ATC invalidation. Some will just abort the CMD_SYNC.
Others might let CMD_SYNC complete and have an asynchronous IMPDEF
mechanism to record the error. When we receive a CERROR_ATC_INV_SYNC, we
could retry sending all ATC_INV commands issued since the last successful
CMD_SYNC. When a CMD_SYNC fails without CERROR_ATC_INV_SYNC, we could
retry sending *all* commands issued since the last successful CMD_SYNC.

We cannot afford to wait 90 seconds in iommu_unmap, let alone MMU
notifiers. So we'd have to introduce a more clever system if this timeout
becomes a problem, like keeping hold of mappings and invalidating in the
background. Implementing safe delayed invalidations is a very complex
problem and deserves a series of its own. We'll assess whether more work
is needed to properly handle ATC invalidation timeouts once this code runs
on real hardware.

  Misc
  ----

I didn't put ATC and TLB invalidations in the same functions for three
reasons:

* TLB invalidation by range is batched and committed with a single sync.
  Batching ATC invalidation is inconvenient, as endpoints limit the number
  of in-flight invalidations. We'd have to count the number of
  invalidations queued and send a sync periodically. In addition, I
  suspect we always need a sync between TLB and ATC invalidation for the
  same page.

* Doing ATC invalidation outside tlb_inv_range also allows us to send
  fewer requests, since TLB invalidations are done per page or block,
  while ATC invalidations target IOVA ranges.

* TLB invalidation by context is performed when freeing the domain, at
  which point there isn't any device attached anymore.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 238 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 228 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 48a1da0934b4..d03bec4d4b82 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -37,6 +37,7 @@
 #include <linux/of_iommu.h>
 #include <linux/of_platform.h>
 #include <linux/pci.h>
+#include <linux/pci-ats.h>
 #include <linux/platform_device.h>
 #include <linux/sched/mm.h>
 
@@ -109,6 +110,7 @@
 #define IDR5_OAS_48_BIT			(5 << IDR5_OAS_SHIFT)
 
 #define ARM_SMMU_CR0			0x20
+#define CR0_ATSCHK			(1 << 4)
 #define CR0_CMDQEN			(1 << 3)
 #define CR0_EVTQEN			(1 << 2)
 #define CR0_PRIQEN			(1 << 1)
@@ -382,6 +384,7 @@
 #define CMDQ_ERR_CERROR_NONE_IDX	0
 #define CMDQ_ERR_CERROR_ILL_IDX		1
 #define CMDQ_ERR_CERROR_ABT_IDX		2
+#define CMDQ_ERR_CERROR_ATC_INV_IDX	3
 
 #define CMDQ_0_OP_SHIFT			0
 #define CMDQ_0_OP_MASK			0xffUL
@@ -407,6 +410,15 @@
 #define CMDQ_TLBI_1_VA_MASK		~0xfffUL
 #define CMDQ_TLBI_1_IPA_MASK		0xfffffffff000UL
 
+#define CMDQ_ATC_0_SSID_SHIFT		12
+#define CMDQ_ATC_0_SSID_MASK		0xfffffUL
+#define CMDQ_ATC_0_SID_SHIFT		32
+#define CMDQ_ATC_0_SID_MASK		0xffffffffUL
+#define CMDQ_ATC_0_GLOBAL		(1UL << 9)
+#define CMDQ_ATC_1_SIZE_SHIFT		0
+#define CMDQ_ATC_1_SIZE_MASK		0x3fUL
+#define CMDQ_ATC_1_ADDR_MASK		~0xfffUL
+
 #define CMDQ_PRI_0_SSID_SHIFT		12
 #define CMDQ_PRI_0_SSID_MASK		0xfffffUL
 #define CMDQ_PRI_0_SID_SHIFT		32
@@ -507,6 +519,11 @@ module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_bypass,
 	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
 
+static bool disable_ats_check;
+module_param_named(disable_ats_check, disable_ats_check, bool, S_IRUGO);
+MODULE_PARM_DESC(disable_ats_check,
+	"By default, the SMMU checks whether each incoming transaction marked as translated is allowed by the stream configuration. This option disables the check.");
+
 enum pri_resp {
 	PRI_RESP_DENY,
 	PRI_RESP_FAIL,
@@ -581,6 +598,16 @@ struct arm_smmu_cmdq_ent {
 			u64			addr;
 		} tlbi;
 
+		#define CMDQ_OP_ATC_INV		0x40
+		#define ATC_INV_SIZE_ALL	52
+		struct {
+			u32			sid;
+			u32			ssid;
+			u64			addr;
+			u8			size;
+			bool			global;
+		} atc;
+
 		#define CMDQ_OP_PRI_RESP	0x41
 		struct {
 			u32			sid;
@@ -1037,6 +1064,14 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 	case CMDQ_OP_TLBI_EL2_ASID:
 		cmd[0] |= (u64)ent->tlbi.asid << CMDQ_TLBI_0_ASID_SHIFT;
 		break;
+	case CMDQ_OP_ATC_INV:
+		cmd[0] |= ent->substream_valid ? CMDQ_0_SSV : 0;
+		cmd[0] |= ent->atc.global ? CMDQ_ATC_0_GLOBAL : 0;
+		cmd[0] |= ent->atc.ssid << CMDQ_ATC_0_SSID_SHIFT;
+		cmd[0] |= (u64)ent->atc.sid << CMDQ_ATC_0_SID_SHIFT;
+		cmd[1] |= ent->atc.size << CMDQ_ATC_1_SIZE_SHIFT;
+		cmd[1] |= ent->atc.addr & CMDQ_ATC_1_ADDR_MASK;
+		break;
 	case CMDQ_OP_PRI_RESP:
 		cmd[0] |= ent->substream_valid ? CMDQ_0_SSV : 0;
 		cmd[0] |= ent->pri.ssid << CMDQ_PRI_0_SSID_SHIFT;
@@ -1087,6 +1122,7 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
 		[CMDQ_ERR_CERROR_NONE_IDX]	= "No error",
 		[CMDQ_ERR_CERROR_ILL_IDX]	= "Illegal command",
 		[CMDQ_ERR_CERROR_ABT_IDX]	= "Abort on command fetch",
+		[CMDQ_ERR_CERROR_ATC_INV_IDX]	= "ATC invalidate timeout",
 	};
 
 	int i;
@@ -1106,6 +1142,14 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
 		dev_err(smmu->dev, "retrying command fetch\n");
 	case CMDQ_ERR_CERROR_NONE_IDX:
 		return;
+	case CMDQ_ERR_CERROR_ATC_INV_IDX:
+		/*
+		 * ATC Invalidation Completion timeout. CONS is still pointing
+		 * at the CMD_SYNC. Attempt to complete other pending commands
+		 * by repeating the CMD_SYNC, though we might well end up back
+		 * here since the ATC invalidation may still be pending.
+		 */
+		return;
 	case CMDQ_ERR_CERROR_ILL_IDX:
 		/* Fallthrough */
 	default:
@@ -1584,9 +1628,6 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 			 STRTAB_STE_1_S1C_CACHE_WBRA
 			 << STRTAB_STE_1_S1COR_SHIFT |
 			 STRTAB_STE_1_S1C_SH_ISH << STRTAB_STE_1_S1CSH_SHIFT |
-#ifdef CONFIG_PCI_ATS
-			 STRTAB_STE_1_EATS_TRANS << STRTAB_STE_1_EATS_SHIFT |
-#endif
 			 (smmu->features & ARM_SMMU_FEAT_E2H ?
 			  STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1) <<
 			 STRTAB_STE_1_STRW_SHIFT);
@@ -1623,6 +1664,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 		val |= STRTAB_STE_0_CFG_S2_TRANS;
 	}
 
+	if (IS_ENABLED(CONFIG_PCI_ATS))
+		dst[1] |= cpu_to_le64(STRTAB_STE_1_EATS_TRANS
+				      << STRTAB_STE_1_EATS_SHIFT);
+
 	arm_smmu_sync_ste_for_sid(smmu, sid);
 	dst[0] = cpu_to_le64(val);
 	arm_smmu_sync_ste_for_sid(smmu, sid);
@@ -2078,6 +2123,106 @@ static const struct iommu_gather_ops arm_smmu_gather_ops = {
 	.tlb_sync	= arm_smmu_tlb_sync,
 };
 
+static bool arm_smmu_master_has_ats(struct arm_smmu_master_data *master)
+{
+	return dev_is_pci(master->dev) && to_pci_dev(master->dev)->ats_enabled;
+}
+
+static void
+arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size,
+			struct arm_smmu_cmdq_ent *cmd)
+{
+	size_t log2_span;
+	size_t span_mask;
+	/* ATC invalidates are always on 4096 bytes pages */
+	size_t inval_grain_shift = 12;
+	unsigned long page_start, page_end;
+
+	*cmd = (struct arm_smmu_cmdq_ent) {
+		.opcode			= CMDQ_OP_ATC_INV,
+		.substream_valid	= !!ssid,
+		.atc.ssid		= ssid,
+	};
+
+	if (!size) {
+		cmd->atc.size = ATC_INV_SIZE_ALL;
+		return;
+	}
+
+	page_start	= iova >> inval_grain_shift;
+	page_end	= (iova + size - 1) >> inval_grain_shift;
+
+	/*
+	 * Find the smallest power of two that covers the range. Most
+	 * significant differing bit between start and end address indicates the
+	 * required span, ie. fls(start ^ end). For example:
+	 *
+	 * We want to invalidate pages [8; 11]. This is already the ideal range:
+	 *		x = 0b1000 ^ 0b1011 = 0b11
+	 *		span = 1 << fls(x) = 4
+	 *
+	 * To invalidate pages [7; 10], we need to invalidate [0; 15]:
+	 *		x = 0b0111 ^ 0b1010 = 0b1101
+	 *		span = 1 << fls(x) = 16
+	 */
+	log2_span	= fls_long(page_start ^ page_end);
+	span_mask	= (1ULL << log2_span) - 1;
+
+	page_start	&= ~span_mask;
+
+	cmd->atc.addr	= page_start << inval_grain_shift;
+	cmd->atc.size	= log2_span;
+}
+
+static int arm_smmu_atc_inv_master(struct arm_smmu_master_data *master,
+				   struct arm_smmu_cmdq_ent *cmd)
+{
+	int i;
+	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+	struct arm_smmu_cmdq_ent sync_cmd = {
+		.opcode = CMDQ_OP_CMD_SYNC,
+	};
+
+	if (!arm_smmu_master_has_ats(master))
+		return 0;
+
+	for (i = 0; i < fwspec->num_ids; i++) {
+		cmd->atc.sid = fwspec->ids[i];
+		arm_smmu_cmdq_issue_cmd(master->smmu, cmd);
+	}
+
+	arm_smmu_cmdq_issue_cmd(master->smmu, &sync_cmd);
+
+	return 0;
+}
+
+static int arm_smmu_atc_inv_master_all(struct arm_smmu_master_data *master,
+				       int ssid)
+{
+	struct arm_smmu_cmdq_ent cmd;
+
+	arm_smmu_atc_inv_to_cmd(ssid, 0, 0, &cmd);
+	return arm_smmu_atc_inv_master(master, &cmd);
+}
+
+static size_t
+arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
+			unsigned long iova, size_t size)
+{
+	unsigned long flags;
+	struct arm_smmu_cmdq_ent cmd;
+	struct arm_smmu_master_data *master;
+
+	arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd);
+
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_for_each_entry(master, &smmu_domain->devices, list)
+		arm_smmu_atc_inv_master(master, &cmd);
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+
+	return size;
+}
+
 /* IOMMU API */
 static bool arm_smmu_capable(enum iommu_cap cap)
 {
@@ -2361,6 +2506,8 @@ static void arm_smmu_detach_dev(struct device *dev)
 		__iommu_process_unbind_dev_all(&smmu_domain->domain, dev);
 
 	if (smmu_domain) {
+		arm_smmu_atc_inv_master_all(master, 0);
+
 		spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 		list_del(&master->list);
 		spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
@@ -2451,12 +2598,19 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
 static size_t
 arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 {
-	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+	int ret;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
 
 	if (!ops)
 		return 0;
 
-	return ops->unmap(ops, iova, size);
+	ret = ops->unmap(ops, iova, size);
+
+	if (ret && smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS)
+		ret = arm_smmu_atc_inv_domain(smmu_domain, 0, iova, size);
+
+	return ret;
 }
 
 static phys_addr_t
@@ -2752,6 +2906,48 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
 	return sid < limit;
 }
 
+static int arm_smmu_enable_ats(struct arm_smmu_master_data *master)
+{
+	int ret;
+	size_t stu;
+	struct pci_dev *pdev;
+	struct arm_smmu_device *smmu = master->smmu;
+	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+
+	if (!(smmu->features & ARM_SMMU_FEAT_ATS) || !dev_is_pci(master->dev) ||
+	    (fwspec->flags & IOMMU_FWSPEC_PCI_NO_ATS))
+		return -ENOSYS;
+
+	pdev = to_pci_dev(master->dev);
+
+	/* Smallest Translation Unit: log2 of the smallest supported granule */
+	stu = __ffs(smmu->pgsize_bitmap);
+
+	ret = pci_enable_ats(pdev, stu);
+	if (ret)
+		return ret;
+
+	dev_dbg(&pdev->dev, "enabled ATS (STU=%zu, QDEP=%d)\n", stu,
+		pci_ats_queue_depth(pdev));
+
+	return 0;
+}
+
+static void arm_smmu_disable_ats(struct arm_smmu_master_data *master)
+{
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(master->dev))
+		return;
+
+	pdev = to_pci_dev(master->dev);
+
+	if (!pdev->ats_enabled)
+		return;
+
+	pci_disable_ats(pdev);
+}
+
 static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
 				  struct arm_smmu_master_data *master)
 {
@@ -2874,14 +3070,24 @@ static int arm_smmu_add_device(struct device *dev)
 		master->ste.can_stall = true;
 	}
 
+	arm_smmu_enable_ats(master);
+
 	group = iommu_group_get_for_dev(dev);
-	if (!IS_ERR(group)) {
-		arm_smmu_insert_master(smmu, master);
-		iommu_group_put(group);
-		iommu_device_link(&smmu->iommu, dev);
+	if (IS_ERR(group)) {
+		ret = PTR_ERR(group);
+		goto err_disable_ats;
 	}
 
-	return PTR_ERR_OR_ZERO(group);
+	iommu_group_put(group);
+	arm_smmu_insert_master(smmu, master);
+	iommu_device_link(&smmu->iommu, dev);
+
+	return 0;
+
+err_disable_ats:
+	arm_smmu_disable_ats(master);
+
+	return ret;
 }
 
 static void arm_smmu_remove_device(struct device *dev)
@@ -2898,6 +3104,8 @@ static void arm_smmu_remove_device(struct device *dev)
 	if (master && master->ste.assigned)
 		arm_smmu_detach_dev(dev);
 	arm_smmu_remove_master(smmu, master);
+	arm_smmu_disable_ats(master);
+
 	iommu_group_remove_device(dev);
 	iommu_device_unlink(&smmu->iommu, dev);
 	kfree(master);
@@ -3515,6 +3723,16 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 		}
 	}
 
+	if (smmu->features & ARM_SMMU_FEAT_ATS && !disable_ats_check) {
+		enables |= CR0_ATSCHK;
+		ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
+					      ARM_SMMU_CR0ACK);
+		if (ret) {
+			dev_err(smmu->dev, "failed to enable ATS check\n");
+			return ret;
+		}
+	}
+
 	ret = arm_smmu_setup_irqs(smmu);
 	if (ret) {
 		dev_err(smmu->dev, "failed to setup irqs\n");
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 31/36] iommu/arm-smmu-v3: Add support for PCI ATS
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel

PCIe devices can implement their own TLB, named Address Translation Cache
(ATC). Enable Address Translation Service (ATS) for devices that support
it and send them invalidation requests whenever we invalidate the IOTLBs.

  Range calculation
  -----------------

The invalidation packet itself is a bit awkward: range must be naturally
aligned, which means that the start address is a multiple of the range
size. In addition, the size must be a power of two number of 4k pages. We
have a few options to enforce this constraint:

(1) Find the smallest naturally aligned region that covers the requested
    range. This is simple to compute and only takes one ATC_INV, but it
    will spill on lots of neighbouring ATC entries.

(2) Align the start address to the region size (rounded up to a power of
    two), and send a second invalidation for the next range of the same
    size. Still not great, but reduces spilling.

(3) Cover the range exactly with the smallest number of naturally aligned
    regions. This would be interesting to implement but as for (2),
    requires multiple ATC_INV.

As I suspect ATC invalidation packets will be a very scarce resource, I'll
go with option (1) for now, and only send one big invalidation. We can
move to (2), which is both easier to read and more gentle with the ATC,
once we've observed on real systems that we can send multiple smaller
Invalidation Requests for roughly the same price as a single big one.

Note that with io-pgtable, the unmap function is called for each page, so
this doesn't matter. The problem shows up when sharing page tables with
the MMU.

  Timeout
  -------

ATC invalidation is allowed to take up to 90 seconds, according to the
PCIe spec, so it is possible to hit the SMMU command queue timeout during
normal operations.

Some SMMU implementations will raise a CERROR_ATC_INV_SYNC when a CMD_SYNC
fails because of an ATC invalidation. Some will just abort the CMD_SYNC.
Others might let CMD_SYNC complete and have an asynchronous IMPDEF
mechanism to record the error. When we receive a CERROR_ATC_INV_SYNC, we
could retry sending all ATC_INV since last successful CMD_SYNC. When a
CMD_SYNC fails without CERROR_ATC_INV_SYNC, we could retry sending *all*
commands since last successful CMD_SYNC.

We cannot afford to wait 90 seconds in iommu_unmap, let alone in MMU
notifiers, so we'd have to introduce a smarter system if this timeout
becomes a problem, such as keeping hold of mappings and invalidating in
the background. Implementing safe delayed invalidations is a very complex
problem and deserves a series of its own. We'll assess whether more work
is needed to properly handle ATC invalidation timeouts once this code runs
on real hardware.

  Misc
  ----

I didn't put ATC and TLB invalidations in the same functions for three
reasons:

* TLB invalidation by range is batched and committed with a single sync.
  Batching ATC invalidation is inconvenient, because endpoints limit the
  number of in-flight invalidations. We'd have to count the number of
  invalidations queued and send a sync periodically. In addition, I
  suspect we always need a sync between TLB and ATC invalidation for the
  same page.

* Doing ATC invalidation outside tlb_inv_range also allows us to send
  fewer requests, since TLB invalidations are done per page or block,
  while ATC invalidations target IOVA ranges.

* TLB invalidation by context is performed when freeing the domain, at
  which point there isn't any device attached anymore.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 238 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 228 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 48a1da0934b4..d03bec4d4b82 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -37,6 +37,7 @@
 #include <linux/of_iommu.h>
 #include <linux/of_platform.h>
 #include <linux/pci.h>
+#include <linux/pci-ats.h>
 #include <linux/platform_device.h>
 #include <linux/sched/mm.h>
 
@@ -109,6 +110,7 @@
 #define IDR5_OAS_48_BIT			(5 << IDR5_OAS_SHIFT)
 
 #define ARM_SMMU_CR0			0x20
+#define CR0_ATSCHK			(1 << 4)
 #define CR0_CMDQEN			(1 << 3)
 #define CR0_EVTQEN			(1 << 2)
 #define CR0_PRIQEN			(1 << 1)
@@ -382,6 +384,7 @@
 #define CMDQ_ERR_CERROR_NONE_IDX	0
 #define CMDQ_ERR_CERROR_ILL_IDX		1
 #define CMDQ_ERR_CERROR_ABT_IDX		2
+#define CMDQ_ERR_CERROR_ATC_INV_IDX	3
 
 #define CMDQ_0_OP_SHIFT			0
 #define CMDQ_0_OP_MASK			0xffUL
@@ -407,6 +410,15 @@
 #define CMDQ_TLBI_1_VA_MASK		~0xfffUL
 #define CMDQ_TLBI_1_IPA_MASK		0xfffffffff000UL
 
+#define CMDQ_ATC_0_SSID_SHIFT		12
+#define CMDQ_ATC_0_SSID_MASK		0xfffffUL
+#define CMDQ_ATC_0_SID_SHIFT		32
+#define CMDQ_ATC_0_SID_MASK		0xffffffffUL
+#define CMDQ_ATC_0_GLOBAL		(1UL << 9)
+#define CMDQ_ATC_1_SIZE_SHIFT		0
+#define CMDQ_ATC_1_SIZE_MASK		0x3fUL
+#define CMDQ_ATC_1_ADDR_MASK		~0xfffUL
+
 #define CMDQ_PRI_0_SSID_SHIFT		12
 #define CMDQ_PRI_0_SSID_MASK		0xfffffUL
 #define CMDQ_PRI_0_SID_SHIFT		32
@@ -507,6 +519,11 @@ module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_bypass,
 	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
 
+static bool disable_ats_check;
+module_param_named(disable_ats_check, disable_ats_check, bool, S_IRUGO);
+MODULE_PARM_DESC(disable_ats_check,
+	"By default, the SMMU checks whether each incoming transaction marked as translated is allowed by the stream configuration. This option disables the check.");
+
 enum pri_resp {
 	PRI_RESP_DENY,
 	PRI_RESP_FAIL,
@@ -581,6 +598,16 @@ struct arm_smmu_cmdq_ent {
 			u64			addr;
 		} tlbi;
 
+		#define CMDQ_OP_ATC_INV		0x40
+		#define ATC_INV_SIZE_ALL	52
+		struct {
+			u32			sid;
+			u32			ssid;
+			u64			addr;
+			u8			size;
+			bool			global;
+		} atc;
+
 		#define CMDQ_OP_PRI_RESP	0x41
 		struct {
 			u32			sid;
@@ -1037,6 +1064,14 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 	case CMDQ_OP_TLBI_EL2_ASID:
 		cmd[0] |= (u64)ent->tlbi.asid << CMDQ_TLBI_0_ASID_SHIFT;
 		break;
+	case CMDQ_OP_ATC_INV:
+		cmd[0] |= ent->substream_valid ? CMDQ_0_SSV : 0;
+		cmd[0] |= ent->atc.global ? CMDQ_ATC_0_GLOBAL : 0;
+		cmd[0] |= ent->atc.ssid << CMDQ_ATC_0_SSID_SHIFT;
+		cmd[0] |= (u64)ent->atc.sid << CMDQ_ATC_0_SID_SHIFT;
+		cmd[1] |= ent->atc.size << CMDQ_ATC_1_SIZE_SHIFT;
+		cmd[1] |= ent->atc.addr & CMDQ_ATC_1_ADDR_MASK;
+		break;
 	case CMDQ_OP_PRI_RESP:
 		cmd[0] |= ent->substream_valid ? CMDQ_0_SSV : 0;
 		cmd[0] |= ent->pri.ssid << CMDQ_PRI_0_SSID_SHIFT;
@@ -1087,6 +1122,7 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
 		[CMDQ_ERR_CERROR_NONE_IDX]	= "No error",
 		[CMDQ_ERR_CERROR_ILL_IDX]	= "Illegal command",
 		[CMDQ_ERR_CERROR_ABT_IDX]	= "Abort on command fetch",
+		[CMDQ_ERR_CERROR_ATC_INV_IDX]	= "ATC invalidate timeout",
 	};
 
 	int i;
@@ -1106,6 +1142,14 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
 		dev_err(smmu->dev, "retrying command fetch\n");
 	case CMDQ_ERR_CERROR_NONE_IDX:
 		return;
+	case CMDQ_ERR_CERROR_ATC_INV_IDX:
+		/*
+		 * ATC Invalidation Completion timeout. CONS is still pointing
+		 * at the CMD_SYNC. Attempt to complete other pending commands
+		 * by repeating the CMD_SYNC, though we might well end up back
+		 * here since the ATC invalidation may still be pending.
+		 */
+		return;
 	case CMDQ_ERR_CERROR_ILL_IDX:
 		/* Fallthrough */
 	default:
@@ -1584,9 +1628,6 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 			 STRTAB_STE_1_S1C_CACHE_WBRA
 			 << STRTAB_STE_1_S1COR_SHIFT |
 			 STRTAB_STE_1_S1C_SH_ISH << STRTAB_STE_1_S1CSH_SHIFT |
-#ifdef CONFIG_PCI_ATS
-			 STRTAB_STE_1_EATS_TRANS << STRTAB_STE_1_EATS_SHIFT |
-#endif
 			 (smmu->features & ARM_SMMU_FEAT_E2H ?
 			  STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1) <<
 			 STRTAB_STE_1_STRW_SHIFT);
@@ -1623,6 +1664,10 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 		val |= STRTAB_STE_0_CFG_S2_TRANS;
 	}
 
+	if (IS_ENABLED(CONFIG_PCI_ATS))
+		dst[1] |= cpu_to_le64(STRTAB_STE_1_EATS_TRANS
+				      << STRTAB_STE_1_EATS_SHIFT);
+
 	arm_smmu_sync_ste_for_sid(smmu, sid);
 	dst[0] = cpu_to_le64(val);
 	arm_smmu_sync_ste_for_sid(smmu, sid);
@@ -2078,6 +2123,106 @@ static const struct iommu_gather_ops arm_smmu_gather_ops = {
 	.tlb_sync	= arm_smmu_tlb_sync,
 };
 
+static bool arm_smmu_master_has_ats(struct arm_smmu_master_data *master)
+{
+	return dev_is_pci(master->dev) && to_pci_dev(master->dev)->ats_enabled;
+}
+
+static void
+arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size,
+			struct arm_smmu_cmdq_ent *cmd)
+{
+	size_t log2_span;
+	size_t span_mask;
+	/* ATC invalidates are always on 4096 bytes pages */
+	size_t inval_grain_shift = 12;
+	unsigned long page_start, page_end;
+
+	*cmd = (struct arm_smmu_cmdq_ent) {
+		.opcode			= CMDQ_OP_ATC_INV,
+		.substream_valid	= !!ssid,
+		.atc.ssid		= ssid,
+	};
+
+	if (!size) {
+		cmd->atc.size = ATC_INV_SIZE_ALL;
+		return;
+	}
+
+	page_start	= iova >> inval_grain_shift;
+	page_end	= (iova + size - 1) >> inval_grain_shift;
+
+	/*
+	 * Find the smallest power of two that covers the range. Most
+	 * significant differing bit between start and end address indicates the
+	 * required span, ie. fls(start ^ end). For example:
+	 *
+	 * We want to invalidate pages [8; 11]. This is already the ideal range:
+	 *		x = 0b1000 ^ 0b1011 = 0b11
+	 *		span = 1 << fls(x) = 4
+	 *
+	 * To invalidate pages [7; 10], we need to invalidate [0; 15]:
+	 *		x = 0b0111 ^ 0b1010 = 0b1101
+	 *		span = 1 << fls(x) = 16
+	 */
+	log2_span	= fls_long(page_start ^ page_end);
+	span_mask	= (1ULL << log2_span) - 1;
+
+	page_start	&= ~span_mask;
+
+	cmd->atc.addr	= page_start << inval_grain_shift;
+	cmd->atc.size	= log2_span;
+}
+
+static int arm_smmu_atc_inv_master(struct arm_smmu_master_data *master,
+				   struct arm_smmu_cmdq_ent *cmd)
+{
+	int i;
+	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+	struct arm_smmu_cmdq_ent sync_cmd = {
+		.opcode = CMDQ_OP_CMD_SYNC,
+	};
+
+	if (!arm_smmu_master_has_ats(master))
+		return 0;
+
+	for (i = 0; i < fwspec->num_ids; i++) {
+		cmd->atc.sid = fwspec->ids[i];
+		arm_smmu_cmdq_issue_cmd(master->smmu, cmd);
+	}
+
+	arm_smmu_cmdq_issue_cmd(master->smmu, &sync_cmd);
+
+	return 0;
+}
+
+static int arm_smmu_atc_inv_master_all(struct arm_smmu_master_data *master,
+				       int ssid)
+{
+	struct arm_smmu_cmdq_ent cmd;
+
+	arm_smmu_atc_inv_to_cmd(ssid, 0, 0, &cmd);
+	return arm_smmu_atc_inv_master(master, &cmd);
+}
+
+static size_t
+arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
+			unsigned long iova, size_t size)
+{
+	unsigned long flags;
+	struct arm_smmu_cmdq_ent cmd;
+	struct arm_smmu_master_data *master;
+
+	arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd);
+
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_for_each_entry(master, &smmu_domain->devices, list)
+		arm_smmu_atc_inv_master(master, &cmd);
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+
+	return size;
+}
+
 /* IOMMU API */
 static bool arm_smmu_capable(enum iommu_cap cap)
 {
@@ -2361,6 +2506,8 @@ static void arm_smmu_detach_dev(struct device *dev)
 		__iommu_process_unbind_dev_all(&smmu_domain->domain, dev);
 
 	if (smmu_domain) {
+		arm_smmu_atc_inv_master_all(master, 0);
+
 		spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 		list_del(&master->list);
 		spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
@@ -2451,12 +2598,19 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
 static size_t
 arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 {
-	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+	int ret;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
 
 	if (!ops)
 		return 0;
 
-	return ops->unmap(ops, iova, size);
+	ret = ops->unmap(ops, iova, size);
+
+	if (ret && smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS)
+		ret = arm_smmu_atc_inv_domain(smmu_domain, 0, iova, size);
+
+	return ret;
 }
 
 static phys_addr_t
@@ -2752,6 +2906,48 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
 	return sid < limit;
 }
 
+static int arm_smmu_enable_ats(struct arm_smmu_master_data *master)
+{
+	int ret;
+	size_t stu;
+	struct pci_dev *pdev;
+	struct arm_smmu_device *smmu = master->smmu;
+	struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+
+	if (!(smmu->features & ARM_SMMU_FEAT_ATS) || !dev_is_pci(master->dev) ||
+	    (fwspec->flags & IOMMU_FWSPEC_PCI_NO_ATS))
+		return -ENOSYS;
+
+	pdev = to_pci_dev(master->dev);
+
+	/* Smallest Translation Unit: log2 of the smallest supported granule */
+	stu = __ffs(smmu->pgsize_bitmap);
+
+	ret = pci_enable_ats(pdev, stu);
+	if (ret)
+		return ret;
+
+	dev_dbg(&pdev->dev, "enabled ATS (STU=%zu, QDEP=%d)\n", stu,
+		pci_ats_queue_depth(pdev));
+
+	return 0;
+}
+
+static void arm_smmu_disable_ats(struct arm_smmu_master_data *master)
+{
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(master->dev))
+		return;
+
+	pdev = to_pci_dev(master->dev);
+
+	if (!pdev->ats_enabled)
+		return;
+
+	pci_disable_ats(pdev);
+}
+
 static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
 				  struct arm_smmu_master_data *master)
 {
@@ -2874,14 +3070,24 @@ static int arm_smmu_add_device(struct device *dev)
 		master->ste.can_stall = true;
 	}
 
+	arm_smmu_enable_ats(master);
+
 	group = iommu_group_get_for_dev(dev);
-	if (!IS_ERR(group)) {
-		arm_smmu_insert_master(smmu, master);
-		iommu_group_put(group);
-		iommu_device_link(&smmu->iommu, dev);
+	if (IS_ERR(group)) {
+		ret = PTR_ERR(group);
+		goto err_disable_ats;
 	}
 
-	return PTR_ERR_OR_ZERO(group);
+	iommu_group_put(group);
+	arm_smmu_insert_master(smmu, master);
+	iommu_device_link(&smmu->iommu, dev);
+
+	return 0;
+
+err_disable_ats:
+	arm_smmu_disable_ats(master);
+
+	return ret;
 }
 
 static void arm_smmu_remove_device(struct device *dev)
@@ -2898,6 +3104,8 @@ static void arm_smmu_remove_device(struct device *dev)
 	if (master && master->ste.assigned)
 		arm_smmu_detach_dev(dev);
 	arm_smmu_remove_master(smmu, master);
+	arm_smmu_disable_ats(master);
+
 	iommu_group_remove_device(dev);
 	iommu_device_unlink(&smmu->iommu, dev);
 	kfree(master);
@@ -3515,6 +3723,16 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 		}
 	}
 
+	if (smmu->features & ARM_SMMU_FEAT_ATS && !disable_ats_check) {
+		enables |= CR0_ATSCHK;
+		ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
+					      ARM_SMMU_CR0ACK);
+		if (ret) {
+			dev_err(smmu->dev, "failed to enable ATS check\n");
+			return ret;
+		}
+	}
+
 	ret = arm_smmu_setup_irqs(smmu);
 	if (ret) {
 		dev_err(smmu->dev, "failed to setup irqs\n");
-- 
2.13.3


* [RFCv2 PATCH 32/36] iommu/arm-smmu-v3: Hook ATC invalidation to process ops
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:31   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:31 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

The core calls us when a process is modified. Perform the required ATC
invalidations.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index d03bec4d4b82..f591f1974228 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2839,7 +2839,7 @@ static void arm_smmu_process_detach(struct iommu_domain *domain,
 		arm_smmu_write_ctx_desc(smmu_domain, process->pasid, NULL);
 	}
 
-	/* TODO: Invalidate ATC. */
+	arm_smmu_atc_inv_master_all(master, process->pasid);
 	/* TODO: Invalidate all mappings if last and not DVM. */
 }
 
@@ -2847,8 +2847,9 @@ static void arm_smmu_process_invalidate(struct iommu_domain *domain,
 					struct iommu_process *process,
 					unsigned long iova, size_t size)
 {
+	arm_smmu_atc_inv_domain(to_smmu_domain(domain), process->pasid,
+				iova, size);
 	/*
-	 * TODO: Invalidate ATC.
 	 * TODO: Invalidate mapping if not DVM
 	 */
 }
@@ -2871,7 +2872,7 @@ static void arm_smmu_process_exit(struct iommu_domain *domain,
 		domain->process_exit(domain, master->dev, process->pasid,
 				     domain->process_exit_token);
 
-		/* TODO: inval ATC */
+		arm_smmu_atc_inv_master_all(master, process->pasid);
 	}
 	spin_unlock(&smmu_domain->devices_lock);
 
-- 
2.13.3


* [RFCv2 PATCH 33/36] iommu/arm-smmu-v3: Disable tagged pointers
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:32   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:32 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

The ARM architecture has a "Top Byte Ignore" (TBI) option that makes the
MMU mask out bits [63:56] of an address, allowing a userspace application
to store data in its pointers. This option is incompatible with PCI ATS.

If TBI is enabled in the SMMU and userspace triggers DMA transactions on
tagged pointers, the endpoint might create ATC entries for addresses that
include a tag. Software would then have to send ATC invalidation packets
for all 255 possible aliases of an address, or just wipe the whole address
space. Neither is a viable option, so disable TBI.

The impact of this change is unclear, since there are very few users of
tagged pointers, much less SVM. But the requirement introduced by this
patch doesn't seem excessive: a userspace application using both tagged
pointers and SVM should now sanitize addresses (clear the tag) before
using them for device DMA.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index f591f1974228..f008b4617cd4 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1332,7 +1332,6 @@ static u64 arm_smmu_cpu_tcr_to_cd(struct arm_smmu_device *smmu, u64 tcr)
 	val |= ARM_SMMU_TCR2CD(tcr, EPD0);
 	val |= ARM_SMMU_TCR2CD(tcr, EPD1);
 	val |= ARM_SMMU_TCR2CD(tcr, IPS);
-	val |= ARM_SMMU_TCR2CD(tcr, TBI0);
 
 	if (smmu->features & ARM_SMMU_FEAT_HA)
 		val |= ARM_SMMU_TCR2CD(tcr, HA);
-- 
2.13.3

* [RFCv2 PATCH 34/36] PCI: Make "PRG Response PASID Required" handling common
  2017-10-06 13:31 ` Jean-Philippe Brucker
  (?)
@ 2017-10-06 13:32     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:32 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: mark.rutland, gabriele.paoloni, catalin.marinas, will.deacon,
	okaya, rfranz, lenb, robh+dt, bhelgaas, dwmw2, rjw, sudeep.holla

The PASID ECN to the PCIe spec added a bit in the PRI status register that
allows a Function to declare whether a PRG Response should contain the
PASID prefix or not.

Move the helper that accesses it from amd_iommu into the PCI subsystem,
renaming it to be consistent with the current spec (PRPR - PRG Response
PASID Required).

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/amd_iommu.c     | 19 +------------------
 drivers/pci/ats.c             | 17 +++++++++++++++++
 include/linux/pci-ats.h       |  8 ++++++++
 include/uapi/linux/pci_regs.h |  1 +
 4 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 51f8215877f5..45036a253d63 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2039,23 +2039,6 @@ static int pdev_iommuv2_enable(struct pci_dev *pdev)
 	return ret;
 }
 
-/* FIXME: Move this to PCI code */
-#define PCI_PRI_TLP_OFF		(1 << 15)
-
-static bool pci_pri_tlp_required(struct pci_dev *pdev)
-{
-	u16 status;
-	int pos;
-
-	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
-	if (!pos)
-		return false;
-
-	pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
-
-	return (status & PCI_PRI_TLP_OFF) ? true : false;
-}
-
 /*
  * If a device is not yet associated with a domain, this function
  * assigns it visible for the hardware
@@ -2084,7 +2067,7 @@ static int attach_device(struct device *dev,
 
 			dev_data->ats.enabled = true;
 			dev_data->ats.qdep    = pci_ats_queue_depth(pdev);
-			dev_data->pri_tlp     = pci_pri_tlp_required(pdev);
+			dev_data->pri_tlp     = pci_prg_resp_requires_prefix(pdev);
 		}
 	} else if (amd_iommu_iotlb_sup &&
 		   pci_enable_ats(pdev, PAGE_SHIFT) == 0) {
diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index ad8ddbbbf245..f95e42df728b 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -389,3 +389,20 @@ int pci_max_pasids(struct pci_dev *pdev)
 }
 EXPORT_SYMBOL_GPL(pci_max_pasids);
 #endif /* CONFIG_PCI_PASID */
+
+#if defined(CONFIG_PCI_PASID) && defined(CONFIG_PCI_PRI)
+bool pci_prg_resp_requires_prefix(struct pci_dev *pdev)
+{
+	u16 status;
+	int pos;
+
+	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
+	if (!pos)
+		return false;
+
+	pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
+
+	return !!(status & PCI_PRI_STATUS_PRPR);
+}
+EXPORT_SYMBOL_GPL(pci_prg_resp_requires_prefix);
+#endif /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */
diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
index 782fb8e0755f..367ea9448441 100644
--- a/include/linux/pci-ats.h
+++ b/include/linux/pci-ats.h
@@ -67,5 +67,13 @@ static inline int pci_max_pasids(struct pci_dev *pdev)
 
 #endif /* CONFIG_PCI_PASID */
 
+#if defined(CONFIG_PCI_PASID) && defined(CONFIG_PCI_PRI)
+bool pci_prg_resp_requires_prefix(struct pci_dev *pdev);
+#else
+static inline bool pci_prg_resp_requires_prefix(struct pci_dev *pdev)
+{
+	return false;
+}
+#endif /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */
 
 #endif /* LINUX_PCI_ATS_H*/
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index f8d58045926f..a0eeb16a2bfe 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -862,6 +862,7 @@
 #define  PCI_PRI_STATUS_RF	0x001	/* Response Failure */
 #define  PCI_PRI_STATUS_UPRGI	0x002	/* Unexpected PRG index */
 #define  PCI_PRI_STATUS_STOPPED	0x100	/* PRI Stopped */
+#define  PCI_PRI_STATUS_PRPR	0x8000	/* PRG Response requires PASID prefix */
 #define PCI_PRI_MAX_REQ		0x08	/* PRI max reqs supported */
 #define PCI_PRI_ALLOC_REQ	0x0c	/* PRI max reqs allowed */
 #define PCI_EXT_CAP_PRI_SIZEOF	16
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 34/36] PCI: Make "PRG Response PASID Required" handling common
@ 2017-10-06 13:32     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:32 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

The PASID ECN to the PCIe spec added a bit in the PRI status register that
allows a Function to declare whether a PRG Response should contain the
PASID prefix or not.

Move the helper that accesses it from amd_iommu into the PCI subsystem,
renaming it to be consistent with the current spec (PRPR - PRG Response
PASID Required).

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/amd_iommu.c     | 19 +------------------
 drivers/pci/ats.c             | 17 +++++++++++++++++
 include/linux/pci-ats.h       |  8 ++++++++
 include/uapi/linux/pci_regs.h |  1 +
 4 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 51f8215877f5..45036a253d63 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2039,23 +2039,6 @@ static int pdev_iommuv2_enable(struct pci_dev *pdev)
 	return ret;
 }
 
-/* FIXME: Move this to PCI code */
-#define PCI_PRI_TLP_OFF		(1 << 15)
-
-static bool pci_pri_tlp_required(struct pci_dev *pdev)
-{
-	u16 status;
-	int pos;
-
-	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
-	if (!pos)
-		return false;
-
-	pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
-
-	return (status & PCI_PRI_TLP_OFF) ? true : false;
-}
-
 /*
  * If a device is not yet associated with a domain, this function
  * assigns it visible for the hardware
@@ -2084,7 +2067,7 @@ static int attach_device(struct device *dev,
 
 			dev_data->ats.enabled = true;
 			dev_data->ats.qdep    = pci_ats_queue_depth(pdev);
-			dev_data->pri_tlp     = pci_pri_tlp_required(pdev);
+			dev_data->pri_tlp     = pci_prg_resp_requires_prefix(pdev);
 		}
 	} else if (amd_iommu_iotlb_sup &&
 		   pci_enable_ats(pdev, PAGE_SHIFT) == 0) {
diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index ad8ddbbbf245..f95e42df728b 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -389,3 +389,20 @@ int pci_max_pasids(struct pci_dev *pdev)
 }
 EXPORT_SYMBOL_GPL(pci_max_pasids);
 #endif /* CONFIG_PCI_PASID */
+
+#if defined(CONFIG_PCI_PASID) && defined(CONFIG_PCI_PRI)
+bool pci_prg_resp_requires_prefix(struct pci_dev *pdev)
+{
+	u16 status;
+	int pos;
+
+	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
+	if (!pos)
+		return false;
+
+	pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
+
+	return !!(status & PCI_PRI_STATUS_PRPR);
+}
+EXPORT_SYMBOL_GPL(pci_prg_resp_requires_prefix);
+#endif /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */
diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
index 782fb8e0755f..367ea9448441 100644
--- a/include/linux/pci-ats.h
+++ b/include/linux/pci-ats.h
@@ -67,5 +67,13 @@ static inline int pci_max_pasids(struct pci_dev *pdev)
 
 #endif /* CONFIG_PCI_PASID */
 
+#if defined(CONFIG_PCI_PASID) && defined(CONFIG_PCI_PRI)
+bool pci_prg_resp_requires_prefix(struct pci_dev *pdev);
+#else
+static inline bool pci_prg_resp_requires_prefix(struct pci_dev *pdev)
+{
+	return false;
+}
+#endif /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */
 
 #endif /* LINUX_PCI_ATS_H*/
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index f8d58045926f..a0eeb16a2bfe 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -862,6 +862,7 @@
 #define  PCI_PRI_STATUS_RF	0x001	/* Response Failure */
 #define  PCI_PRI_STATUS_UPRGI	0x002	/* Unexpected PRG index */
 #define  PCI_PRI_STATUS_STOPPED	0x100	/* PRI Stopped */
+#define  PCI_PRI_STATUS_PRPR	0x8000	/* PRG Response requires PASID prefix */
 #define PCI_PRI_MAX_REQ		0x08	/* PRI max reqs supported */
 #define PCI_PRI_ALLOC_REQ	0x0c	/* PRI max reqs allowed */
 #define PCI_EXT_CAP_PRI_SIZEOF	16
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 35/36] iommu/arm-smmu-v3: Add support for PRI
@ 2017-10-06 13:32     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:32 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

For PCI devices that support it, enable the PRI capability and handle
PRI Page Requests with the generic fault handler.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 176 ++++++++++++++++++++++++++++++--------------
 1 file changed, 122 insertions(+), 54 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index f008b4617cd4..852714f35010 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -272,6 +272,7 @@
 #define STRTAB_STE_1_S1COR_SHIFT	4
 #define STRTAB_STE_1_S1CSH_SHIFT	6
 
+#define STRTAB_STE_1_PPAR		(1UL << 18)
 #define STRTAB_STE_1_S1STALLD		(1UL << 27)
 
 #define STRTAB_STE_1_EATS_ABT		0UL
@@ -426,9 +427,9 @@
 #define CMDQ_PRI_1_GRPID_SHIFT		0
 #define CMDQ_PRI_1_GRPID_MASK		0x1ffUL
 #define CMDQ_PRI_1_RESP_SHIFT		12
-#define CMDQ_PRI_1_RESP_DENY		(0UL << CMDQ_PRI_1_RESP_SHIFT)
-#define CMDQ_PRI_1_RESP_FAIL		(1UL << CMDQ_PRI_1_RESP_SHIFT)
-#define CMDQ_PRI_1_RESP_SUCC		(2UL << CMDQ_PRI_1_RESP_SHIFT)
+#define CMDQ_PRI_1_RESP_FAILURE		(0UL << CMDQ_PRI_1_RESP_SHIFT)
+#define CMDQ_PRI_1_RESP_INVALID		(1UL << CMDQ_PRI_1_RESP_SHIFT)
+#define CMDQ_PRI_1_RESP_SUCCESS		(2UL << CMDQ_PRI_1_RESP_SHIFT)
 
 #define CMDQ_RESUME_0_SID_SHIFT		32
 #define CMDQ_RESUME_0_SID_MASK		0xffffffffUL
@@ -504,6 +505,7 @@
 
 /* Flags for iommu_data in iommu_fault */
 #define ARM_SMMU_FAULT_STALL		(1 << 0)
+#define ARM_SMMU_FAULT_RESP_PASID	(1 << 1)
 
 /* Until ACPICA headers cover IORT rev. C */
 #ifndef ACPI_IORT_SMMU_HISILICON_HI161X
@@ -524,12 +526,6 @@ module_param_named(disable_ats_check, disable_ats_check, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_ats_check,
 	"By default, the SMMU checks whether each incoming transaction marked as translated is allowed by the stream configuration. This option disables the check.");
 
-enum pri_resp {
-	PRI_RESP_DENY,
-	PRI_RESP_FAIL,
-	PRI_RESP_SUCC,
-};
-
 enum arm_smmu_msi_index {
 	EVTQ_MSI_INDEX,
 	GERROR_MSI_INDEX,
@@ -613,7 +609,7 @@ struct arm_smmu_cmdq_ent {
 			u32			sid;
 			u32			ssid;
 			u16			grpid;
-			enum pri_resp		resp;
+			enum iommu_fault_status	resp;
 		} pri;
 
 		#define CMDQ_OP_RESUME		0x44
@@ -720,6 +716,7 @@ struct arm_smmu_strtab_ent {
 	struct arm_smmu_s2_cfg		*s2_cfg;
 
 	bool				can_stall;
+	bool				prg_resp_needs_ssid;
 };
 
 struct arm_smmu_strtab_cfg {
@@ -1078,14 +1075,14 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		cmd[0] |= (u64)ent->pri.sid << CMDQ_PRI_0_SID_SHIFT;
 		cmd[1] |= ent->pri.grpid << CMDQ_PRI_1_GRPID_SHIFT;
 		switch (ent->pri.resp) {
-		case PRI_RESP_DENY:
-			cmd[1] |= CMDQ_PRI_1_RESP_DENY;
+		case IOMMU_FAULT_STATUS_FAILURE:
+			cmd[1] |= CMDQ_PRI_1_RESP_FAILURE;
 			break;
-		case PRI_RESP_FAIL:
-			cmd[1] |= CMDQ_PRI_1_RESP_FAIL;
+		case IOMMU_FAULT_STATUS_INVALID:
+			cmd[1] |= CMDQ_PRI_1_RESP_INVALID;
 			break;
-		case PRI_RESP_SUCC:
-			cmd[1] |= CMDQ_PRI_1_RESP_SUCC;
+		case IOMMU_FAULT_STATUS_HANDLED:
+			cmd[1] |= CMDQ_PRI_1_RESP_SUCCESS;
 			break;
 		default:
 			return -EINVAL;
@@ -1214,8 +1211,13 @@ static int arm_smmu_fault_response(struct iommu_domain *domain,
 		cmd.resume.stag		= fault->id;
 		cmd.resume.resp		= resp;
 	} else {
-		/* TODO: put PRI response here */
-		return -EINVAL;
+		cmd.opcode		= CMDQ_OP_PRI_RESP;
+		cmd.substream_valid	= fault->iommu_data &
+					  ARM_SMMU_FAULT_RESP_PASID;
+		cmd.pri.sid		= sid;
+		cmd.pri.ssid		= fault->pasid;
+		cmd.pri.grpid		= fault->id;
+		cmd.pri.resp		= resp;
 	}
 
 	arm_smmu_cmdq_issue_cmd(smmu_domain->smmu, &cmd);
@@ -1631,6 +1633,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 			  STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1) <<
 			 STRTAB_STE_1_STRW_SHIFT);
 
+		if (ste->prg_resp_needs_ssid)
+			dst[1] |= STRTAB_STE_1_PPAR;
+
 		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
 		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE) &&
 		   !ste->can_stall)
@@ -1856,40 +1861,42 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 
 static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
 {
-	u32 sid, ssid;
-	u16 grpid;
-	bool ssv, last;
-
-	sid = evt[0] >> PRIQ_0_SID_SHIFT & PRIQ_0_SID_MASK;
-	ssv = evt[0] & PRIQ_0_SSID_V;
-	ssid = ssv ? evt[0] >> PRIQ_0_SSID_SHIFT & PRIQ_0_SSID_MASK : 0;
-	last = evt[0] & PRIQ_0_PRG_LAST;
-	grpid = evt[1] >> PRIQ_1_PRG_IDX_SHIFT & PRIQ_1_PRG_IDX_MASK;
-
-	dev_info(smmu->dev, "unexpected PRI request received:\n");
-	dev_info(smmu->dev,
-		 "\tsid 0x%08x.0x%05x: [%u%s] %sprivileged %s%s%s access at iova 0x%016llx\n",
-		 sid, ssid, grpid, last ? "L" : "",
-		 evt[0] & PRIQ_0_PERM_PRIV ? "" : "un",
-		 evt[0] & PRIQ_0_PERM_READ ? "R" : "",
-		 evt[0] & PRIQ_0_PERM_WRITE ? "W" : "",
-		 evt[0] & PRIQ_0_PERM_EXEC ? "X" : "",
-		 evt[1] & PRIQ_1_ADDR_MASK << PRIQ_1_ADDR_SHIFT);
+	u32 sid = evt[0] >> PRIQ_0_SID_SHIFT & PRIQ_0_SID_MASK;
 
-	if (last) {
-		struct arm_smmu_cmdq_ent cmd = {
-			.opcode			= CMDQ_OP_PRI_RESP,
-			.substream_valid	= ssv,
-			.pri			= {
-				.sid	= sid,
-				.ssid	= ssid,
-				.grpid	= grpid,
-				.resp	= PRI_RESP_DENY,
-			},
-		};
+	struct arm_smmu_master_data *master;
+	struct iommu_domain *domain;
+	struct iommu_fault fault = {
+		.pasid		= evt[0] >> PRIQ_0_SSID_SHIFT & PRIQ_0_SSID_MASK,
+		.id		= evt[1] >> PRIQ_1_PRG_IDX_SHIFT & PRIQ_1_PRG_IDX_MASK,
+		.address	= evt[1] & PRIQ_1_ADDR_MASK << PRIQ_1_ADDR_SHIFT,
+		.flags		= IOMMU_FAULT_GROUP | IOMMU_FAULT_RECOVERABLE,
+	};
 
-		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
-	}
+	if (evt[0] & PRIQ_0_SSID_V)
+		fault.flags |= IOMMU_FAULT_PASID;
+	if (evt[0] & PRIQ_0_PRG_LAST)
+		fault.flags |= IOMMU_FAULT_LAST;
+	if (evt[0] & PRIQ_0_PERM_READ)
+		fault.flags |= IOMMU_FAULT_READ;
+	if (evt[0] & PRIQ_0_PERM_WRITE)
+		fault.flags |= IOMMU_FAULT_WRITE;
+	if (evt[0] & PRIQ_0_PERM_EXEC)
+		fault.flags |= IOMMU_FAULT_EXEC;
+	if (evt[0] & PRIQ_0_PERM_PRIV)
+		fault.flags |= IOMMU_FAULT_PRIV;
+
+	master = arm_smmu_find_master(smmu, sid);
+	if (WARN_ON(!master))
+		return;
+
+	if (fault.flags & IOMMU_FAULT_PASID && master->ste.prg_resp_needs_ssid)
+		fault.iommu_data |= ARM_SMMU_FAULT_RESP_PASID;
+
+	domain = iommu_get_domain_for_dev(master->dev);
+	if (WARN_ON(!domain))
+		return;
+
+	handle_iommu_fault(domain, master->dev, &fault);
 }
 
 static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
@@ -1967,7 +1974,8 @@ static int arm_smmu_flush_queues(struct notifier_block *nb,
 	if (master) {
 		if (master->ste.can_stall)
 			arm_smmu_flush_queue(smmu, &smmu->evtq.q, "evtq");
-		/* TODO: add support for PRI */
+		else if (master->can_fault)
+			arm_smmu_flush_queue(smmu, &smmu->priq.q, "priq");
 		return 0;
 	}
 
@@ -2933,6 +2941,46 @@ static int arm_smmu_enable_ats(struct arm_smmu_master_data *master)
 	return 0;
 }
 
+static int arm_smmu_enable_pri(struct arm_smmu_master_data *master)
+{
+	int ret, pos;
+	struct pci_dev *pdev;
+	/*
+	 * TODO: find a good inflight PPR number. We should divide the PRI queue
+	 * by the number of PRI-capable devices, but it's impossible to know
+	 * about current and future (hotplugged) devices. So we're at risk of
+	 * dropping PPRs (and leaking pending requests in the FQ).
+	 */
+	size_t max_inflight_pprs = 16;
+	struct arm_smmu_device *smmu = master->smmu;
+
+	if (!(smmu->features & ARM_SMMU_FEAT_PRI) || !dev_is_pci(master->dev))
+		return -ENOSYS;
+
+	pdev = to_pci_dev(master->dev);
+
+	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
+	if (!pos)
+		return -ENOSYS;
+
+	ret = pci_reset_pri(pdev);
+	if (ret)
+		return ret;
+
+	ret = pci_enable_pri(pdev, max_inflight_pprs);
+	if (ret) {
+		dev_err(master->dev, "cannot enable PRI: %d\n", ret);
+		return ret;
+	}
+
+	master->can_fault = true;
+	master->ste.prg_resp_needs_ssid = pci_prg_resp_requires_prefix(pdev);
+
+	dev_dbg(master->dev, "enabled PRI\n");
+
+	return 0;
+}
+
 static void arm_smmu_disable_ats(struct arm_smmu_master_data *master)
 {
 	struct pci_dev *pdev;
@@ -2948,6 +2996,22 @@ static void arm_smmu_disable_ats(struct arm_smmu_master_data *master)
 	pci_disable_ats(pdev);
 }
 
+static void arm_smmu_disable_pri(struct arm_smmu_master_data *master)
+{
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(master->dev))
+		return;
+
+	pdev = to_pci_dev(master->dev);
+
+	if (!pdev->pri_enabled)
+		return;
+
+	pci_disable_pri(pdev);
+	master->can_fault = false;
+}
+
 static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
 				  struct arm_smmu_master_data *master)
 {
@@ -3070,12 +3134,13 @@ static int arm_smmu_add_device(struct device *dev)
 		master->ste.can_stall = true;
 	}
 
-	arm_smmu_enable_ats(master);
+	if (!arm_smmu_enable_ats(master))
+		arm_smmu_enable_pri(master);
 
 	group = iommu_group_get_for_dev(dev);
 	if (IS_ERR(group)) {
 		ret = PTR_ERR(group);
-		goto err_disable_ats;
+		goto err_disable_pri;
 	}
 
 	iommu_group_put(group);
@@ -3084,7 +3149,8 @@ static int arm_smmu_add_device(struct device *dev)
 
 	return 0;
 
-err_disable_ats:
+err_disable_pri:
+	arm_smmu_disable_pri(master);
 	arm_smmu_disable_ats(master);
 
 	return ret;
@@ -3104,6 +3170,8 @@ static void arm_smmu_remove_device(struct device *dev)
 	if (master && master->ste.assigned)
 		arm_smmu_detach_dev(dev);
 	arm_smmu_remove_master(smmu, master);
+
+	arm_smmu_disable_pri(master);
 	arm_smmu_disable_ats(master);
 
 	iommu_group_remove_device(dev);
-- 
2.13.3


^ permalink raw reply related	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 36/36] iommu/arm-smmu-v3: Add support for PCI PASID
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-06 13:32   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-06 13:32 UTC (permalink / raw)
  To: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

Enable PASID for PCI devices that support it.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 52 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 852714f35010..42c8378624ed 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -3012,6 +3012,50 @@ static void arm_smmu_disable_pri(struct arm_smmu_master_data *master)
 	master->can_fault = false;
 }
 
+static int arm_smmu_enable_pasid(struct arm_smmu_master_data *master)
+{
+	int ret;
+	int features;
+	int num_ssids;
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(master->dev))
+		return -ENOSYS;
+
+	pdev = to_pci_dev(master->dev);
+
+	features = pci_pasid_features(pdev);
+	if (features < 0)
+		return -ENOSYS;
+
+	num_ssids = pci_max_pasids(pdev);
+
+	dev_dbg(&pdev->dev, "device supports %#x SSIDs [%s%s]\n", num_ssids,
+		(features & PCI_PASID_CAP_EXEC) ? "x" : "",
+		(features & PCI_PASID_CAP_PRIV) ? "p" : "");
+
+	num_ssids = clamp_val(num_ssids, 1, 1 << master->smmu->ssid_bits);
+	num_ssids = rounddown_pow_of_two(num_ssids);
+
+	ret = pci_enable_pasid(pdev, features);
+	return ret ? ret : num_ssids;
+}
+
+static void arm_smmu_disable_pasid(struct arm_smmu_master_data *master)
+{
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(master->dev))
+		return;
+
+	pdev = to_pci_dev(master->dev);
+
+	if (!pdev->pasid_enabled)
+		return;
+
+	pci_disable_pasid(pdev);
+}
+
 static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
 				  struct arm_smmu_master_data *master)
 {
@@ -3134,6 +3178,11 @@ static int arm_smmu_add_device(struct device *dev)
 		master->ste.can_stall = true;
 	}
 
+	/* PASID must be enabled before ATS */
+	ret = arm_smmu_enable_pasid(master);
+	if (ret > 0)
+		master->num_ssids = ret;
+
 	if (!arm_smmu_enable_ats(master))
 		arm_smmu_enable_pri(master);
 
@@ -3152,6 +3201,7 @@ static int arm_smmu_add_device(struct device *dev)
 err_disable_pri:
 	arm_smmu_disable_pri(master);
 	arm_smmu_disable_ats(master);
+	arm_smmu_disable_pasid(master);
 
 	return ret;
 }
@@ -3172,7 +3222,9 @@ static void arm_smmu_remove_device(struct device *dev)
 	arm_smmu_remove_master(smmu, master);
 
 	arm_smmu_disable_pri(master);
+	/* PASID must be disabled after ATS */
 	arm_smmu_disable_ats(master);
+	arm_smmu_disable_pasid(master);
 
 	iommu_group_remove_device(dev);
 	iommu_device_unlink(&smmu->iommu, dev);
-- 
2.13.3


^ permalink raw reply related	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 34/36] PCI: Make "PRG Response PASID Required" handling common
  2017-10-06 13:32     ` Jean-Philippe Brucker
@ 2017-10-06 18:11         ` Bjorn Helgaas
  -1 siblings, 0 replies; 268+ messages in thread
From: Bjorn Helgaas @ 2017-10-06 18:11 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: mark.rutland, gabriele.paoloni, linux-pci, will.deacon, okaya,
	linux-acpi, catalin.marinas, rfranz, lenb, devicetree, robh+dt,
	bhelgaas, linux-arm-kernel, dwmw2, rjw, iommu, sudeep.holla

On Fri, Oct 06, 2017 at 02:32:01PM +0100, Jean-Philippe Brucker wrote:
> The PASID ECN to the PCIe spec added a bit in the PRI status register that
> allows a Function to declare whether a PRG Response should contain the
> PASID prefix or not.
> 
> Move the helper that accesses it from amd_iommu into the PCI subsystem,
> renaming it to be consistent with the current spec (PRPR - PRG Response
> PASID Required).
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

I assume this will be merged with the rest of the series, probably via an
IOMMU tree.

> ---
>  drivers/iommu/amd_iommu.c     | 19 +------------------
>  drivers/pci/ats.c             | 17 +++++++++++++++++
>  include/linux/pci-ats.h       |  8 ++++++++
>  include/uapi/linux/pci_regs.h |  1 +
>  4 files changed, 27 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 51f8215877f5..45036a253d63 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2039,23 +2039,6 @@ static int pdev_iommuv2_enable(struct pci_dev *pdev)
>  	return ret;
>  }
>  
> -/* FIXME: Move this to PCI code */
> -#define PCI_PRI_TLP_OFF		(1 << 15)
> -
> -static bool pci_pri_tlp_required(struct pci_dev *pdev)
> -{
> -	u16 status;
> -	int pos;
> -
> -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> -	if (!pos)
> -		return false;
> -
> -	pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
> -
> -	return (status & PCI_PRI_TLP_OFF) ? true : false;
> -}
> -
>  /*
>   * If a device is not yet associated with a domain, this function
>   * assigns it visible for the hardware
> @@ -2084,7 +2067,7 @@ static int attach_device(struct device *dev,
>  
>  			dev_data->ats.enabled = true;
>  			dev_data->ats.qdep    = pci_ats_queue_depth(pdev);
> -			dev_data->pri_tlp     = pci_pri_tlp_required(pdev);
> +			dev_data->pri_tlp     = pci_prg_resp_requires_prefix(pdev);
>  		}
>  	} else if (amd_iommu_iotlb_sup &&
>  		   pci_enable_ats(pdev, PAGE_SHIFT) == 0) {
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index ad8ddbbbf245..f95e42df728b 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -389,3 +389,20 @@ int pci_max_pasids(struct pci_dev *pdev)
>  }
>  EXPORT_SYMBOL_GPL(pci_max_pasids);
>  #endif /* CONFIG_PCI_PASID */
> +
> +#if defined(CONFIG_PCI_PASID) && defined(CONFIG_PCI_PRI)
> +bool pci_prg_resp_requires_prefix(struct pci_dev *pdev)
> +{
> +	u16 status;
> +	int pos;
> +
> +	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> +	if (!pos)
> +		return false;
> +
> +	pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
> +
> +	return !!(status & PCI_PRI_STATUS_PRPR);
> +}
> +EXPORT_SYMBOL_GPL(pci_prg_resp_requires_prefix);
> +#endif /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index 782fb8e0755f..367ea9448441 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -67,5 +67,13 @@ static inline int pci_max_pasids(struct pci_dev *pdev)
>  
>  #endif /* CONFIG_PCI_PASID */
>  
> +#if defined(CONFIG_PCI_PASID) && defined(CONFIG_PCI_PRI)
> +bool pci_prg_resp_requires_prefix(struct pci_dev *pdev);
> +#else
> +static inline bool pci_prg_resp_requires_prefix(struct pci_dev *pdev)
> +{
> +	return false;
> +}
> +#endif /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */
>  
>  #endif /* LINUX_PCI_ATS_H*/
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index f8d58045926f..a0eeb16a2bfe 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -862,6 +862,7 @@
>  #define  PCI_PRI_STATUS_RF	0x001	/* Response Failure */
>  #define  PCI_PRI_STATUS_UPRGI	0x002	/* Unexpected PRG index */
>  #define  PCI_PRI_STATUS_STOPPED	0x100	/* PRI Stopped */
> +#define  PCI_PRI_STATUS_PRPR	0x8000	/* PRG Response requires PASID prefix */
>  #define PCI_PRI_MAX_REQ		0x08	/* PRI max reqs supported */
>  #define PCI_PRI_ALLOC_REQ	0x0c	/* PRI max reqs allowed */
>  #define PCI_EXT_CAP_PRI_SIZEOF	16
> -- 
> 2.13.3
> 

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3
  2017-10-06 13:31 ` Jean-Philippe Brucker
@ 2017-10-09  9:49   ` Yisheng Xie
  -1 siblings, 0 replies; 268+ messages in thread
From: Yisheng Xie @ 2017-10-09  9:49 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, gabriele.paoloni, nwatters, okaya, rfranz,
	dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

Hi Jean,

On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
> Following discussions at plumbers and elsewhere, it seems like we need to
> unify some of the Shared Virtual Memory (SVM) code, in order to define
> clear semantics for the SVM API.
> 
> My previous RFC [1] was centered on the SMMUv3, but some of this code will
> need to be reused by the SMMUv2 and virtio-iommu drivers. This second
> proposal focuses on abstracting a little more into the core IOMMU API, and
> also trying to find common ground for all SVM-capable IOMMUs.
> 
> SVM is, in the context of the IOMMU, sharing page tables between a process
> and a device. Traditionally it requires IO Page Fault and Process Address
> Space ID capabilities in device and IOMMU.
> 
> * A device driver can bind a process to a device, with iommu_process_bind.
>   Internally we hold on to the mm and get notified of its activity with an
>   mmu_notifier. The bond is removed by exit_mm, by a call to
>   iommu_process_unbind or iommu_detach_device.
> 
> * iommu_process_bind returns a 20-bit PASID (PCI terminology) to the
>   device driver, which programs it into the device to access the process
>   address space.
> 
> * The device and the IOMMU support recoverable page faults. This can be
>   either ATS+PRI for PCI, or platform-specific mechanisms such as Stall
>   for SMMU.
> 
> Ideally systems wanting to use SVM have to support these three features,
> but in practice we'll see implementations supporting just a subset of
> them, especially in validation environments. So even if this particular
> patchset assumes all three capabilities, it should also be possible to
> support PASID without IOPF (by pinning everything, see non-system SVM in
> OpenCL)
How do we pin everything? If the user mallocs anything we should pin it. Should
the pinning be done by the user or by the driver?

> , or IOPF without PASID (sharing the single device address space
> with a process, could be useful for DPDK+VFIO).
> 
> Implementing both these cases would enable PT sharing alone. Some people
> would also like IOPF alone without SVM (covered by this series) or process
> management without shared PT (not covered). Using these features
> individually is also important for testing, as SVM is in its infancy and
> providing easy ways to test is essential to reduce the number of quirks
> down the line.
> 
>   Process management
>   ==================
> 
> The first part of this series introduces boilerplate code for managing
> PASIDs and processes bound to devices. It's something any IOMMU driver
> that wants to support bind/unbind will have to do, and it is difficult to
> get right.
> 
> Patches
> 1: iommu_process and PASID allocation, attach and release
> 2: process_exit callback for device drivers
> 3: iommu_process search by PASID
> 4: track process changes with an MMU notifiers
> 5: bind and unbind operations
> 
> My proposal uses the following model:
> 
> * The PASID space is system-wide. This means that a Linux process will
>   have a single PASID. I introduce the iommu_process structure and a
>   global IDR to manage this.
> 
> * An iommu_process can be bound to multiple domains, and a domain can have
>   multiple iommu_process.
When binding a task to a device, can we create a single domain for it? I am
thinking about process management without shared PT (for devices that only
support PASID without PRI capability); it seems hard to extend if a domain has
multiple iommu_process. Do you have any idea about this?

> 
> * IOMMU groups share same PASID table. IOMMU groups are a convenient way
>   to cover various hardware weaknesses that prevent a group of device to
>   be isolated by the IOMMU (untrusted bridge, for instance). It's foolish
>   to assume that all PASID implementations will perfectly isolate devices
>   within a bus and functions within a device, so let's assume all devices
>   within an IOMMU group have to share PASID traffic as well. In general
>   there will be a single device per group.
> 
> * It's up to the driver implementation to decide where to implement the
>   PASID tables. For SMMU it's more convenient to have a single PASID table
>   per domain. And I think the model fits better with the existing IOMMU
>   API: IOVA traffic is shared by all devices in a domain, so should PASID
>   traffic.
What is the meaning of "share PASID traffic"? The PASID space is system-wide,
and a domain can have multiple iommu_process, so a domain can have multiple
PASIDs, one PASID per iommu_process, right?

Yisheng Xie
Thanks



^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3
@ 2017-10-09  9:49   ` Yisheng Xie
  0 siblings, 0 replies; 268+ messages in thread
From: Yisheng Xie @ 2017-10-09  9:49 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: mark.rutland, gabriele.paoloni, catalin.marinas, will.deacon,
	okaya, yi.l.liu, lorenzo.pieralisi, ashok.raj, tn, joro, rfranz,
	lenb, jacob.jun.pan, alex.williamson, robh+dt, thunder.leizhen,
	bhelgaas, dwmw2, liubo95, rjw, robdclark, hanjun.guo,
	sudeep.holla, robin.murphy, nwatters

Hi Jean,

On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
> Following discussions at plumbers and elsewhere, it seems like we need to
> unify some of the Shared Virtual Memory (SVM) code, in order to define
> clear semantics for the SVM API.
> 
> My previous RFC [1] was centered on the SMMUv3, but some of this code will
> need to be reused by the SMMUv2 and virtio-iommu drivers. This second
> proposal focuses on abstracting a little more into the core IOMMU API, and
> also trying to find common ground for all SVM-capable IOMMUs.
> 
> SVM is, in the context of the IOMMU, sharing page tables between a process
> and a device. Traditionally it requires IO Page Fault and Process Address
> Space ID capabilities in device and IOMMU.
> 
> * A device driver can bind a process to a device, with iommu_process_bind.
>   Internally we hold on to the mm and get notified of its activity with an
>   mmu_notifier. The bond is removed by exit_mm, by a call to
>   iommu_process_unbind or iommu_detach_device.
> 
> * iommu_process_bind returns a 20-bit PASID (PCI terminology) to the
>   device driver, which programs it into the device to access the process
>   address space.
> 
> * The device and the IOMMU support recoverable page faults. This can be
>   either ATS+PRI for PCI, or platform-specific mechanisms such as Stall
>   for SMMU.
> 
> Ideally systems wanting to use SVM have to support these three features,
> but in practice we'll see implementations supporting just a subset of
> them, especially in validation environments. So even if this particular
> patchset assumes all three capabilities, it should also be possible to
> support PASID without IOPF (by pinning everything, see non-system SVM in
> OpenCL)
How to pin everything? If user malloc anything we should pin it. It should
from user or driver?

> , or IOPF without PASID (sharing the single device address space
> with a process, could be useful for DPDK+VFIO).
> 
> Implementing both these cases would enable PT sharing alone. Some people
> would also like IOPF alone without SVM (covered by this series) or process
> management without shared PT (not covered). Using these features
> individually is also important for testing, as SVM is in its infancy and
> providing easy ways to test is essential to reduce the number of quirks
> down the line.
> 
>   Process management
>   ==================
> 
> The first part of this series introduces boilerplate code for managing
> PASIDs and processes bound to devices. It's something any IOMMU driver
> that wants to support bind/unbind will have to do, and it is difficult to
> get right.
> 
> Patches
> 1: iommu_process and PASID allocation, attach and release
> 2: process_exit callback for device drivers
> 3: iommu_process search by PASID
> 4: track process changes with an MMU notifiers
> 5: bind and unbind operations
> 
> My proposal uses the following model:
> 
> * The PASID space is system-wide. This means that a Linux process will
>   have a single PASID. I introduce the iommu_process structure and a
>   global IDR to manage this.
> 
> * An iommu_process can be bound to multiple domains, and a domain can have
>   multiple iommu_process.
when bind a task to device, can we create a single domain for it? I am thinking
about process management without shared PT(for some device only support PASID
without pri ability), it seems hard to expand if a domain have multiple iommu_process?
Do you have any idea about this?

> 
> * IOMMU groups share same PASID table. IOMMU groups are a convenient way
>   to cover various hardware weaknesses that prevent a group of device to
>   be isolated by the IOMMU (untrusted bridge, for instance). It's foolish
>   to assume that all PASID implementations will perfectly isolate devices
>   within a bus and functions within a device, so let's assume all devices
>   within an IOMMU group have to share PASID traffic as well. In general
>   there will be a single device per group.
> 
> * It's up to the driver implementation to decide where to implement the
>   PASID tables. For SMMU it's more convenient to have a single PASID table
>   per domain. And I think the model fits better with the existing IOMMU
>   API: IOVA traffic is shared by all devices in a domain, so should PASID
>   traffic.
What's the meaning of "share PASID traffic"? PASID space is system-wide,
and a domain can have multiple iommu_process , so a domain can have multiple
PASIDs , one PASID for a iommu_process, right?

Yisheng Xie
Thanks



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3
  2017-10-09  9:49   ` Yisheng Xie
  (?)
@ 2017-10-09 11:36     ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-09 11:36 UTC (permalink / raw)
  To: Yisheng Xie, linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, gabriele.paoloni, nwatters

Hi,

On 09/10/17 10:49, Yisheng Xie wrote:
> Hi Jean,
> 
> On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
>> Following discussions at plumbers and elsewhere, it seems like we need to
>> unify some of the Shared Virtual Memory (SVM) code, in order to define
>> clear semantics for the SVM API.
>>
>> My previous RFC [1] was centered on the SMMUv3, but some of this code will
>> need to be reused by the SMMUv2 and virtio-iommu drivers. This second
>> proposal focuses on abstracting a little more into the core IOMMU API, and
>> also trying to find common ground for all SVM-capable IOMMUs.
>>
>> SVM is, in the context of the IOMMU, sharing page tables between a process
>> and a device. Traditionally it requires IO Page Fault and Process Address
>> Space ID capabilities in device and IOMMU.
>>
>> * A device driver can bind a process to a device, with iommu_process_bind.
>>   Internally we hold on to the mm and get notified of its activity with an
>>   mmu_notifier. The bond is removed by exit_mm, by a call to
>>   iommu_process_unbind or iommu_detach_device.
>>
>> * iommu_process_bind returns a 20-bit PASID (PCI terminology) to the
>>   device driver, which programs it into the device to access the process
>>   address space.
>>
>> * The device and the IOMMU support recoverable page faults. This can be
>>   either ATS+PRI for PCI, or platform-specific mechanisms such as Stall
>>   for SMMU.
>>
>> Ideally systems wanting to use SVM have to support these three features,
>> but in practice we'll see implementations supporting just a subset of
>> them, especially in validation environments. So even if this particular
>> patchset assumes all three capabilities, it should also be possible to
>> support PASID without IOPF (by pinning everything, see non-system SVM in
>> OpenCL)
> How do we pin everything? If the user mallocs anything, we should pin it. Should
> that be done by the user or the driver?

For userspace drivers, I guess it would be via a VFIO ioctl, that does the
same preparatory work as VFIO_IOMMU_MAP_DMA, but doesn't call iommu_map.
For things like OpenCL SVM buffers, it's the kernel driver that does the
pinning, just like VFIO does it, before launching the work on a SVM buffer.

>> , or IOPF without PASID (sharing the single device address space
>> with a process, could be useful for DPDK+VFIO).
>>
>> Implementing both these cases would enable PT sharing alone. Some people
>> would also like IOPF alone without SVM (covered by this series) or process
>> management without shared PT (not covered). Using these features
>> individually is also important for testing, as SVM is in its infancy and
>> providing easy ways to test is essential to reduce the number of quirks
>> down the line.
>>
>>   Process management
>>   ==================
>>
>> The first part of this series introduces boilerplate code for managing
>> PASIDs and processes bound to devices. It's something any IOMMU driver
>> that wants to support bind/unbind will have to do, and it is difficult to
>> get right.
>>
>> Patches
>> 1: iommu_process and PASID allocation, attach and release
>> 2: process_exit callback for device drivers
>> 3: iommu_process search by PASID
>> 4: track process changes with an MMU notifiers
>> 5: bind and unbind operations
>>
>> My proposal uses the following model:
>>
>> * The PASID space is system-wide. This means that a Linux process will
>>   have a single PASID. I introduce the iommu_process structure and a
>>   global IDR to manage this.
>>
>> * An iommu_process can be bound to multiple domains, and a domain can have
>>   multiple iommu_process.
> When binding a task to a device, can we create a single domain for it? I am thinking
> about process management without shared page tables (some devices only support PASID
> without PRI capability); it seems hard to extend if a domain has multiple iommu_process.
> Do you have any idea about this?

A device always has to be in a domain, as far as I know. Not supporting
PRI forces you to pin down all user mappings (or just the ones you use for
DMA) but you should still be able to share page tables. Now if you don't
support shared page tables either, but only PASID, then you'll have to use
io-pgtable and a new map/unmap API on an iommu_process. I don't understand
your concern though: how would the link between process and domains prevent
this use-case?

>> * IOMMU groups share the same PASID table. IOMMU groups are a convenient way
>>   to cover various hardware weaknesses that prevent a group of devices from
>>   being isolated by the IOMMU (an untrusted bridge, for instance). It's foolish
>>   to assume that all PASID implementations will perfectly isolate devices
>>   within a bus and functions within a device, so let's assume all devices
>>   within an IOMMU group have to share PASID traffic as well. In general
>>   there will be a single device per group.
>>
>> * It's up to the driver implementation to decide where to implement the
>>   PASID tables. For SMMU it's more convenient to have a single PASID table
>>   per domain. And I think the model fits better with the existing IOMMU
>>   API: IOVA traffic is shared by all devices in a domain, so should PASID
>>   traffic.
> What is the meaning of "share PASID traffic"? The PASID space is system-wide,
> and a domain can have multiple iommu_process, so a domain can have multiple
> PASIDs, one PASID per iommu_process, right?

Yes, I meant that if a device can access mappings for a specific PASID,
then other devices in that same domain are also able to access them.

A few reasons for this choice in the SMMU:
* As all devices in an IOMMU group will be in the same domain and share
the same PASID traffic, it encompasses that case. Groups are the smallest
isolation granularity, then users are free to choose to put different
IOMMU groups in different domains.
* For architectures that can have both non-PASID and PASID traffic
simultaneously, like the SMMU, it is simpler to reason about PASID tables
being a domain, rather than sharing PASID0 within the domain and handling
all others per device.
* It's the same principle as non-PASID mappings (iommu_map/unmap is on a
domain).
* It implements the classic case of IOMMU architectures where multiple
device descriptors point to the same PASID tables.
* It may be desirable for drivers to share PASIDs within a domain, if they
are actually using domains for conveniently sharing address spaces between
devices. I'm not sure how much this is used as a feature. It does model a
shared bus where each device can snoop DMA, so it may be useful.

bind/unbind operations are done on devices and not domains, though,
because it allows users to know which device supports PASID, PRI, etc.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 05/36] iommu/process: Bind and unbind process to and from devices
  2017-10-06 13:31     ` Jean-Philippe Brucker
@ 2017-10-11 11:33       ` Joerg Roedel
  -1 siblings, 0 replies; 268+ messages in thread
From: Joerg Roedel @ 2017-10-11 11:33 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu,
	robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

Hi Jean-Philippe,

Thanks for your patches, this is definitely a step in the right direction
to get generic support for IO virtual memory into the IOMMU core code.

But I see an issue with the design around task_struct, please see
below.

On Fri, Oct 06, 2017 at 02:31:32PM +0100, Jean-Philippe Brucker wrote:
> +int iommu_process_bind_device(struct device *dev, struct task_struct *task,
> +			      int *pasid, int flags)

I just took this patch as an example, it is in the other patches as
well. The code is designed around 'struct task_struct' while it should
really be designed around 'struct mm_struct'. You are not attaching a
specific process to a device, but the address-space of one or more
processes. And that should be reflected in the design.

There are several bad implications of building it around task_struct,
one is the life-time of the binding. If the address space is detached
from the device when the process exits, only the thread that did the
setup can safely make use of the device, because if the device is
accessed from another thread it will crash when the setup-thread exits.

There are other benefits of using mm_struct, for example that you can
use mmu_notifiers for exit handling.

Here is how I think the base API should look:

	* iommu_iovm_device_init(struct device *dev);
	  iommu_iovm_device_shutdown(struct device *dev);

	  These two functions do the device specific setup/shutdown. For
	  PCI this would include enabling the PRI, PASID, and ATS
	  capabilities and setting up a PASID table for the device.

	* iommu_iovm_bind_mm(struct device *dev, struct mm_struct *mm,
			     iovm_shutdown_cb *cb);
	  iommu_iovm_unbind_mm(struct device *dev, struct mm_struct *mm);

	  These functions add and delete the entries in the PASID table
	  for the device and setup mmu_notifiers for the mm_struct to
	  keep IOMMU TLBs in sync with the CPU TLBs.

	  The shutdown_cb is invoked by the IOMMU core code when the
	  mm_struct goes away, for example because the process
	  segfaults.

	The PASID handling is best done in these functions as well, unless
	there is a strong reason to allow device drivers to do the
	handling themselves.

The context data can be stored directly in mm_struct, including the
PASID for that mm.

There is of course more functionality needed, the above only outlines
the very basic needs.


Regards,

	Joerg


^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 05/36] iommu/process: Bind and unbind process to and from devices
  2017-10-11 11:33       ` Joerg Roedel
  (?)
@ 2017-10-12 11:13         ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-12 11:13 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu,
	robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

Hi Joerg,

On 11/10/17 12:33, Joerg Roedel wrote:
> Hi Jean-Philippe,
> 
> Thanks for your patches, this is definitely a step in the right direction
> to get generic support for IO virtual memory into the IOMMU core code.
> 
> But I see an issue with the design around task_struct, please see
> below.
> 
> On Fri, Oct 06, 2017 at 02:31:32PM +0100, Jean-Philippe Brucker wrote:
>> +int iommu_process_bind_device(struct device *dev, struct task_struct *task,
>> +			      int *pasid, int flags)
> 
> I just took this patch as an example, it is in the other patches as
> well. The code is designed around 'struct task_struct' while it should
> really be designed around 'struct mm_struct'. You are not attaching a
> specific process to a device, but the address-space of one or more
> processes. And that should be reflected in the design.

Agreed. The task struct is only really needed for obtaining the mm in my
code. It also keeps hold of a pid, but that's wrong and easy to remove.

> There are several bad implications of building it around task_struct,
> one is the life-time of the binding. If the address space is detached
> from the device when the process exits, only the thread that did the
> setup can safely make use of the device, because if the device is
> accessed from another thread it will crash when the setup-thread exits.
> 
> There are other benefits of using mm_struct, for example that you can
> use mmu_notifiers for exit-handling.
> 
> Here is how I think the base API should look:
> 
> 	* iommu_iovm_device_init(struct device *dev);
> 	  iommu_iovm_device_shutdown(struct device *dev);
> 
> 	  These two functions do the device specific setup/shutdown. For
> 	  PCI this would include enabling the PRI, PASID, and ATS
> 	  capabilities and setting up a PASID table for the device.

Ok. On SMMU the PASID table also hosts the non-PASID page table pointer,
so the table and capability cannot be set up later than attach_device (and
we'll probably have to enable PASID in add_device). But I suppose it's an
implementation detail.

Some device drivers will want to use ATS alone, for accelerating IOVA
traffic. Should we enable it automatically, or provide device drivers with
a way to enable it manually? According to the PCI spec, PASID has to be
enabled before ATS, so device_init would have to first disable ATS in that
case.

> 	* iommu_iovm_bind_mm(struct device *dev, struct mm_struct *mm,
> 			     iovm_shutdown_cb *cb);
> 	  iommu_iovm_unbind_mm(struct device *dev, struct mm_struct *mm);
> 
> 	  These functions add and delete the entries in the PASID table
> 	  for the device and setup mmu_notifiers for the mm_struct to
> 	  keep IOMMU TLBs in sync with the CPU TLBs.
> 
> 	  The shutdown_cb is invoked by the IOMMU core code when the
> 	  mm_struct goes away, for example because the process
> 	  segfaults.
> 
> 	The PASID handling is best done in these functions as well, unless
> 	there is a strong reason to allow device drivers to do the
> 	handling themselves.
> 
> The context data can be stored directly in mm_struct, including the
> PASID for that mm.

Changing mm_struct probably isn't required at the moment, since the mm
subsystem won't use the context data or the PASID. Outside of
drivers/iommu/, only the caller of bind_mm needs the PASID in order to
program it into the device. The only advantage I see would be slightly
faster bind(), when finding out if a mm is already bound to devices. But
we don't really need fast bind(), so I don't think we'd have enough
material to argue for a change in mm_struct.

We do need to allocate a separate "iommu_mm_struct" wrapper to store the
mmu_notifier. Maybe bind() could return this structure (that contains the
PASID), and unbind() would take this iommu_mm_struct as argument?

Thanks
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3
  2017-10-09 11:36     ` Jean-Philippe Brucker
  (?)
@ 2017-10-12 12:05         ` Yisheng Xie
  -1 siblings, 0 replies; 268+ messages in thread
From: Yisheng Xie @ 2017-10-12 12:05 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, gabriele.paoloni, nwatters

Hi Jean,

Thanks for replying.
On 2017/10/9 19:36, Jean-Philippe Brucker wrote:
> Hi,
> 
> On 09/10/17 10:49, Yisheng Xie wrote:
>> Hi Jean,
>>
>> On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
>>> Following discussions at plumbers and elsewhere, it seems like we need to
>>> unify some of the Shared Virtual Memory (SVM) code, in order to define
>>> clear semantics for the SVM API.
>>>
>>> My previous RFC [1] was centered on the SMMUv3, but some of this code will
>>> need to be reused by the SMMUv2 and virtio-iommu drivers. This second
>>> proposal focuses on abstracting a little more into the core IOMMU API, and
>>> also trying to find common ground for all SVM-capable IOMMUs.
>>>
>>> SVM is, in the context of the IOMMU, sharing page tables between a process
>>> and a device. Traditionally it requires IO Page Fault and Process Address
>>> Space ID capabilities in device and IOMMU.
>>>
>>> * A device driver can bind a process to a device, with iommu_process_bind.
>>>   Internally we hold on to the mm and get notified of its activity with an
>>>   mmu_notifier. The bond is removed by exit_mm, by a call to
>>>   iommu_process_unbind or iommu_detach_device.
>>>
>>> * iommu_process_bind returns a 20-bit PASID (PCI terminology) to the
>>>   device driver, which programs it into the device to access the process
>>>   address space.
>>>
>>> * The device and the IOMMU support recoverable page faults. This can be
>>>   either ATS+PRI for PCI, or platform-specific mechanisms such as Stall
>>>   for SMMU.
>>>
>>> Ideally systems wanting to use SVM have to support these three features,
>>> but in practice we'll see implementations supporting just a subset of
>>> them, especially in validation environments. So even if this particular
>>> patchset assumes all three capabilities, it should also be possible to
>>> support PASID without IOPF (by pinning everything, see non-system SVM in
>>> OpenCL)
>> How do we pin everything? If the user mallocs anything we should pin it.
>> Should that be done by the user or the driver?
> 
> For userspace drivers, I guess it would be via a VFIO ioctl, that does the
> same preparatory work as VFIO_IOMMU_MAP_DMA, but doesn't call iommu_map.
> For things like OpenCL SVM buffers, it's the kernel driver that does the
> pinning, just like VFIO does it, before launching the work on a SVM buffer.
> 
>>> , or IOPF without PASID (sharing the single device address space
>>> with a process, could be useful for DPDK+VFIO).
>>>
>>> Implementing both these cases would enable PT sharing alone. Some people
>>> would also like IOPF alone without SVM (covered by this series) or process
>>> management without shared PT (not covered). Using these features
>>> individually is also important for testing, as SVM is in its infancy and
>>> providing easy ways to test is essential to reduce the number of quirks
>>> down the line.
>>>
>>>   Process management
>>>   ==================
>>>
>>> The first part of this series introduces boilerplate code for managing
>>> PASIDs and processes bound to devices. It's something any IOMMU driver
>>> that wants to support bind/unbind will have to do, and it is difficult to
>>> get right.
>>>
>>> Patches
>>> 1: iommu_process and PASID allocation, attach and release
>>> 2: process_exit callback for device drivers
>>> 3: iommu_process search by PASID
>>> 4: track process changes with an MMU notifiers
>>> 5: bind and unbind operations
>>>
>>> My proposal uses the following model:
>>>
>>> * The PASID space is system-wide. This means that a Linux process will
>>>   have a single PASID. I introduce the iommu_process structure and a
>>>   global IDR to manage this.
>>>
>>> * An iommu_process can be bound to multiple domains, and a domain can have
>>>   multiple iommu_process.
>> When binding a task to a device, can we create a single domain for it? I am
>> thinking about process management without shared PT (for devices that only
>> support PASID without PRI ability); it seems hard to extend if a domain has
>> multiple iommu_process. Do you have any idea about this?
> 
> A device always has to be in a domain, as far as I know. Not supporting
> PRI forces you to pin down all user mappings (or just the ones you use for
> DMA) but you should still be able to share PT. Now if you don't support
> shared PT either, but only PASID, then you'll have to use io-pgtable and a
> new map/unmap API on an iommu_process. I don't understand your concern
> though, how would the link between process and domains prevent this use-case?
> 
So you mean that if an iommu_process binds to multiple devices, should it
create multiple io-pgtables, or just share the same io-pgtable?

>>> * IOMMU groups share same PASID table. IOMMU groups are a convenient way
>>>   to cover various hardware weaknesses that prevent a group of device to
>>>   be isolated by the IOMMU (untrusted bridge, for instance). It's foolish
>>>   to assume that all PASID implementations will perfectly isolate devices
>>>   within a bus and functions within a device, so let's assume all devices
>>>   within an IOMMU group have to share PASID traffic as well. In general
>>>   there will be a single device per group.
>>>
>>> * It's up to the driver implementation to decide where to implement the
>>>   PASID tables. For SMMU it's more convenient to have a single PASID table
>>>   per domain. And I think the model fits better with the existing IOMMU
>>>   API: IOVA traffic is shared by all devices in a domain, so should PASID
>>>   traffic.
>> What's the meaning of "share PASID traffic"? PASID space is system-wide,
>> and a domain can have multiple iommu_process, so a domain can have multiple
>> PASIDs, one PASID per iommu_process, right?
I see what you mean now, thanks for the explanation.

> 
> Yes, I meant that if a device can access mappings for a specific PASID,
> then other devices in that same domain are also able to access them.
> 
> A few reasons for this choice in the SMMU:
> * As all devices in an IOMMU group will be in the same domain and share
> the same PASID traffic, it encompasses that case. Groups are the smallest
> isolation granularity, then users are free to choose to put different
> IOMMU groups in different domains.
> * For architectures that can have both non-PASID and PASID traffic
> simultaneously, like the SMMU, it is simpler to reason about PASID tables
> being a domain, rather than sharing PASID0 within the domain and handling
> all others per device.
> * It's the same principle as non-PASID mappings (iommu_map/unmap is on a
> domain).
> * It implements the classic example of IOMMU architectures where multiple
> device descriptors point to the same PASID tables.
> * It may be desirable for drivers to share PASIDs within a domain, if they
> are actually using domains for conveniently sharing address spaces between
> devices. I'm not sure how much this is used as a feature. It does model a
> shared bus where each device can snoop DMA, so it may be useful.
> 

I have another question about this design, considering the following case:

A platform device with PASID capability, e.g. an accelerator, that has
multiple accelerator process units (APUs) may create multiple virtual
devices, one virtual device representing each APU, all with the same SID.

They can be in different groups, but must be in the same domain in this
design, since the domain holds the PASID table, right? So how could they be
used by different guest OSes?

Thanks
Yisheng Xie

> bind/unbind operations are done on devices and not domains, though,
> because it allows users to know which device supports PASID, PRI, etc.
> 
> Thanks,
> Jean
> 
> .
> 


^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3
@ 2017-10-12 12:05         ` Yisheng Xie
  0 siblings, 0 replies; 268+ messages in thread
From: Yisheng Xie @ 2017-10-12 12:05 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: Mark Rutland, gabriele.paoloni, Catalin Marinas, Will Deacon,
	okaya, yi.l.liu, Lorenzo Pieralisi, ashok.raj, tn, joro, rfranz,
	lenb, jacob.jun.pan, alex.williamson, robh+dt, thunder.leizhen,
	bhelgaas, dwmw2, liubo95, rjw, robdclark, hanjun.guo,
	Sudeep Holla, Robin Murphy, nwatters

Hi Jean,

Thanks for replying.
On 2017/10/9 19:36, Jean-Philippe Brucker wrote:
> Hi,
> 
> On 09/10/17 10:49, Yisheng Xie wrote:
>> Hi Jean,
>>
>> On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
>>> Following discussions at plumbers and elsewhere, it seems like we need to
>>> unify some of the Shared Virtual Memory (SVM) code, in order to define
>>> clear semantics for the SVM API.
>>>
>>> My previous RFC [1] was centered on the SMMUv3, but some of this code will
>>> need to be reused by the SMMUv2 and virtio-iommu drivers. This second
>>> proposal focuses on abstracting a little more into the core IOMMU API, and
>>> also trying to find common ground for all SVM-capable IOMMUs.
>>>
>>> SVM is, in the context of the IOMMU, sharing page tables between a process
>>> and a device. Traditionally it requires IO Page Fault and Process Address
>>> Space ID capabilities in device and IOMMU.
>>>
>>> * A device driver can bind a process to a device, with iommu_process_bind.
>>>   Internally we hold on to the mm and get notified of its activity with an
>>>   mmu_notifier. The bond is removed by exit_mm, by a call to
>>>   iommu_process_unbind or iommu_detach_device.
>>>
>>> * iommu_process_bind returns a 20-bit PASID (PCI terminology) to the
>>>   device driver, which programs it into the device to access the process
>>>   address space.
>>>
>>> * The device and the IOMMU support recoverable page faults. This can be
>>>   either ATS+PRI for PCI, or platform-specific mechanisms such as Stall
>>>   for SMMU.
>>>
>>> Ideally systems wanting to use SVM have to support these three features,
>>> but in practice we'll see implementations supporting just a subset of
>>> them, especially in validation environments. So even if this particular
>>> patchset assumes all three capabilities, it should also be possible to
>>> support PASID without IOPF (by pinning everything, see non-system SVM in
>>> OpenCL)
>> How to pin everything? If user malloc anything we should pin it. It should
>> from user or driver?
> 
> For userspace drivers, I guess it would be via a VFIO ioctl, that does the
> same preparatory work as VFIO_IOMMU_MAP_DMA, but doesn't call iommu_map.
> For things like OpenCL SVM buffers, it's the kernel driver that does the
> pinning, just like VFIO does it, before launching the work on a SVM buffer.
> 
>>> , or IOPF without PASID (sharing the single device address space
>>> with a process, could be useful for DPDK+VFIO).
>>>
>>> Implementing both these cases would enable PT sharing alone. Some people
>>> would also like IOPF alone without SVM (covered by this series) or process
>>> management without shared PT (not covered). Using these features
>>> individually is also important for testing, as SVM is in its infancy and
>>> providing easy ways to test is essential to reduce the number of quirks
>>> down the line.
>>>
>>>   Process management
>>>   ==================
>>>
>>> The first part of this series introduces boilerplate code for managing
>>> PASIDs and processes bound to devices. It's something any IOMMU driver
>>> that wants to support bind/unbind will have to do, and it is difficult to
>>> get right.
>>>
>>> Patches
>>> 1: iommu_process and PASID allocation, attach and release
>>> 2: process_exit callback for device drivers
>>> 3: iommu_process search by PASID
>>> 4: track process changes with an MMU notifiers
>>> 5: bind and unbind operations
>>>
>>> My proposal uses the following model:
>>>
>>> * The PASID space is system-wide. This means that a Linux process will
>>>   have a single PASID. I introduce the iommu_process structure and a
>>>   global IDR to manage this.
>>>
>>> * An iommu_process can be bound to multiple domains, and a domain can have
>>>   multiple iommu_process.
>> when bind a task to device, can we create a single domain for it? I am thinking
>> about process management without shared PT(for some device only support PASID
>> without pri ability), it seems hard to expand if a domain have multiple iommu_process?
>> Do you have any idea about this?
> 
> A device always has to be in a domain, as far as I know. Not supporting
> PRI forces you to pin down all user mappings (or just the ones you use for
> DMA) but you should sill be able to share PT. Now if you don't support
> shared PT either, but only PASID, then you'll have to use io-pgtable and a
> new map/unmap API on an iommu_process. I don't understand your concern
> though, how would the link between process and domains prevent this use-case?
> 
So you mean that if an iommu_process bind to multiple devices it should create
multiple io-pgtables? or just share the same io-pgtable?

>>> * IOMMU groups share same PASID table. IOMMU groups are a convenient way
>>>   to cover various hardware weaknesses that prevent a group of device to
>>>   be isolated by the IOMMU (untrusted bridge, for instance). It's foolish
>>>   to assume that all PASID implementations will perfectly isolate devices
>>>   within a bus and functions within a device, so let's assume all devices
>>>   within an IOMMU group have to share PASID traffic as well. In general
>>>   there will be a single device per group.
>>>
>>> * It's up to the driver implementation to decide where to implement the
>>>   PASID tables. For SMMU it's more convenient to have a single PASID table
>>>   per domain. And I think the model fits better with the existing IOMMU
>>>   API: IOVA traffic is shared by all devices in a domain, so should PASID
>>>   traffic.
>> What's the meaning of "share PASID traffic"? PASID space is system-wide,
>> and a domain can have multiple iommu_process , so a domain can have multiple
>> PASIDs , one PASID for a iommu_process, right?
I get what your mean now, thanks for your explain.

> 
> Yes, I meant that if a device can access mappings for a specific PASID,
> then other devices in that same domain are also able to access them.
> 
> A few reasons for this choice in the SMMU:
> * As all devices in an IOMMU group will be in the same domain and share
> the same PASID traffic, it encompasses that case. Groups are the smallest
> isolation granularity, then users are free to choose to put different
> IOMMU groups in different domains.
> * For architectures that can have both non-PASID and PASID traffic
> simultaneously, like the SMMU, it is simpler to reason about PASID tables
> being a domain, rather than sharing PASID0 within the domain and handling
> all others per device.
> * It's the same principle as non-PASID mappings (iommu_map/unmap is on a
> domain).
> * It implement the classic example of IOMMU architectures where multiple
> device descriptors point to the same PASID tables.
> * It may be desirable for drivers to share PASIDs within a domain, if they
> are actually using domains for conveniently sharing address spaces between
> devices. I'm not sure how much this is used as a feature. It does model a
> shared bus where each device can snoop DMA, so it may be useful.
> 

I get another question about this design, thinking about the following case:

If a platform device with PASID ability, e.g. accelerator, which have multiple
accelerator process units(APUs), it may create multiple virtual devices, one
virtual device represent for an APU, with the same sid.

They can be in different groups, however must be in the same domain as this
design, for domain held the PASID table, right? So how could they be used by
different guest OS?

Thanks
Yisheng Xie

> bind/unbind operations are done on devices and not domains, though,
> because it allows users to know which device supports PASID, PRI, etc.
> 
> Thanks,
> Jean
> 
> .
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3
@ 2017-10-12 12:05         ` Yisheng Xie
  0 siblings, 0 replies; 268+ messages in thread
From: Yisheng Xie @ 2017-10-12 12:05 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Jean,

Thanks for replying.
On 2017/10/9 19:36, Jean-Philippe Brucker wrote:
> Hi,
> 
> On 09/10/17 10:49, Yisheng Xie wrote:
>> Hi Jean,
>>
>> On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
>>> Following discussions at plumbers and elsewhere, it seems like we need to
>>> unify some of the Shared Virtual Memory (SVM) code, in order to define
>>> clear semantics for the SVM API.
>>>
>>> My previous RFC [1] was centered on the SMMUv3, but some of this code will
>>> need to be reused by the SMMUv2 and virtio-iommu drivers. This second
>>> proposal focuses on abstracting a little more into the core IOMMU API, and
>>> also trying to find common ground for all SVM-capable IOMMUs.
>>>
>>> SVM is, in the context of the IOMMU, sharing page tables between a process
>>> and a device. Traditionally it requires IO Page Fault and Process Address
>>> Space ID capabilities in device and IOMMU.
>>>
>>> * A device driver can bind a process to a device, with iommu_process_bind.
>>>   Internally we hold on to the mm and get notified of its activity with an
>>>   mmu_notifier. The bond is removed by exit_mm, by a call to
>>>   iommu_process_unbind or iommu_detach_device.
>>>
>>> * iommu_process_bind returns a 20-bit PASID (PCI terminology) to the
>>>   device driver, which programs it into the device to access the process
>>>   address space.
>>>
>>> * The device and the IOMMU support recoverable page faults. This can be
>>>   either ATS+PRI for PCI, or platform-specific mechanisms such as Stall
>>>   for SMMU.
>>>
>>> Ideally systems wanting to use SVM have to support these three features,
>>> but in practice we'll see implementations supporting just a subset of
>>> them, especially in validation environments. So even if this particular
>>> patchset assumes all three capabilities, it should also be possible to
>>> support PASID without IOPF (by pinning everything, see non-system SVM in
>>> OpenCL)
>> How do we pin everything? If the user mallocs anything we should pin it.
>> Should that be done by the user or by the driver?
> 
> For userspace drivers, I guess it would be via a VFIO ioctl, that does the
> same preparatory work as VFIO_IOMMU_MAP_DMA, but doesn't call iommu_map.
> For things like OpenCL SVM buffers, it's the kernel driver that does the
> pinning, just like VFIO does it, before launching the work on a SVM buffer.
> 
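A minimal userspace sketch of the pinning idea above (names hypothetical; in the kernel the pinning would be done with get_user_pages(), and mlock() is only the closest userspace analogue — it faults the pages in and keeps them resident so a device that cannot handle IO page faults can safely DMA to them):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical sketch: prepare a buffer for DMA by a device that cannot
 * recover from IO page faults.  Real code would pin the pages in the
 * kernel with get_user_pages(); mlock() is the userspace-visible half,
 * faulting the pages in and keeping them resident.
 */
static int pin_dma_buffer(void **buf, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);

	/* DMA buffers are typically page-aligned. */
	if (posix_memalign(buf, page, len))
		return -1;
	memset(*buf, 0, len);         /* fault the pages in */
	if (mlock(*buf, len)) {       /* keep them resident */
		free(*buf);
		return -1;
	}
	return 0;
}

static void unpin_dma_buffer(void *buf, size_t len)
{
	munlock(buf, len);
	free(buf);
}
```

This only models the lifetime rule that matters for non-IOPF SVM: every range handed to the device must be pinned before work is launched and unpinned when the bond goes away.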
>>> , or IOPF without PASID (sharing the single device address space
>>> with a process, could be useful for DPDK+VFIO).
>>>
>>> Implementing both these cases would enable PT sharing alone. Some people
>>> would also like IOPF alone without SVM (covered by this series) or process
>>> management without shared PT (not covered). Using these features
>>> individually is also important for testing, as SVM is in its infancy and
>>> providing easy ways to test is essential to reduce the number of quirks
>>> down the line.
>>>
>>>   Process management
>>>   ==================
>>>
>>> The first part of this series introduces boilerplate code for managing
>>> PASIDs and processes bound to devices. It's something any IOMMU driver
>>> that wants to support bind/unbind will have to do, and it is difficult to
>>> get right.
>>>
>>> Patches
>>> 1: iommu_process and PASID allocation, attach and release
>>> 2: process_exit callback for device drivers
>>> 3: iommu_process search by PASID
>>> 4: track process changes with an MMU notifiers
>>> 5: bind and unbind operations
>>>
>>> My proposal uses the following model:
>>>
>>> * The PASID space is system-wide. This means that a Linux process will
>>>   have a single PASID. I introduce the iommu_process structure and a
>>>   global IDR to manage this.
>>>
>>> * An iommu_process can be bound to multiple domains, and a domain can have
>>>   multiple iommu_process.
>> When binding a task to a device, can we create a single domain for it? I am
>> thinking about process management without shared PT (some devices only support
>> PASID without PRI ability); it seems hard to extend if a domain has multiple
>> iommu_process. Do you have any idea about this?
> 
> A device always has to be in a domain, as far as I know. Not supporting
> PRI forces you to pin down all user mappings (or just the ones you use for
> DMA) but you should still be able to share PT. Now if you don't support
> shared PT either, but only PASID, then you'll have to use io-pgtable and a
> new map/unmap API on an iommu_process. I don't understand your concern
> though, how would the link between process and domains prevent this use-case?
> 
So you mean that if an iommu_process binds to multiple devices it should create
multiple io-pgtables? Or just share the same io-pgtable?

>>> * IOMMU groups share same PASID table. IOMMU groups are a convenient way
>>>   to cover various hardware weaknesses that prevent a group of devices from
>>>   being isolated by the IOMMU (untrusted bridge, for instance). It's foolish
>>>   to assume that all PASID implementations will perfectly isolate devices
>>>   within a bus and functions within a device, so let's assume all devices
>>>   within an IOMMU group have to share PASID traffic as well. In general
>>>   there will be a single device per group.
>>>
>>> * It's up to the driver implementation to decide where to implement the
>>>   PASID tables. For SMMU it's more convenient to have a single PASID table
>>>   per domain. And I think the model fits better with the existing IOMMU
>>>   API: IOVA traffic is shared by all devices in a domain, so should PASID
>>>   traffic.
>> What's the meaning of "share PASID traffic"? PASID space is system-wide,
>> and a domain can have multiple iommu_process, so a domain can have multiple
>> PASIDs, one PASID per iommu_process, right?
I get what you mean now, thanks for your explanation.

> 
> Yes, I meant that if a device can access mappings for a specific PASID,
> then other devices in that same domain are also able to access them.
> 
> A few reasons for this choice in the SMMU:
> * As all devices in an IOMMU group will be in the same domain and share
> the same PASID traffic, it encompasses that case. Groups are the smallest
> isolation granularity, then users are free to choose to put different
> IOMMU groups in different domains.
> * For architectures that can have both non-PASID and PASID traffic
> simultaneously, like the SMMU, it is simpler to reason about PASID tables
> being a domain, rather than sharing PASID0 within the domain and handling
> all others per device.
> * It's the same principle as non-PASID mappings (iommu_map/unmap is on a
> domain).
> * It implements the classic example of IOMMU architectures where multiple
> device descriptors point to the same PASID tables.
> * It may be desirable for drivers to share PASIDs within a domain, if they
> are actually using domains for conveniently sharing address spaces between
> devices. I'm not sure how much this is used as a feature. It does model a
> shared bus where each device can snoop DMA, so it may be useful.
> 

I get another question about this design, thinking about the following case:

Consider a platform device with PASID capability, e.g. an accelerator that has
multiple accelerator processing units (APUs). It may create multiple virtual
devices, one virtual device representing each APU, all with the same SID.

They can be in different groups, but under this design they must be in the same
domain, since the domain holds the PASID table, right? So how could they be
assigned to different guest OSes?

Thanks
Yisheng Xie

> bind/unbind operations are done on devices and not domains, though,
> because it allows users to know which device supports PASID, PRI, etc.
> 
> Thanks,
> Jean
> 
> .
> 

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 05/36] iommu/process: Bind and unbind process to and from devices
  2017-10-12 11:13         ` Jean-Philippe Brucker
  (?)
@ 2017-10-12 12:47             ` Joerg Roedel
  -1 siblings, 0 replies; 268+ messages in thread
From: Joerg Roedel @ 2017-10-12 12:47 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu,
	robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas

Hi Jean-Philippe,

On Thu, Oct 12, 2017 at 12:13:20PM +0100, Jean-Philippe Brucker wrote:
> On 11/10/17 12:33, Joerg Roedel wrote:
> > Here is how I think the base API should look:
> > 
> > 	* iommu_iovm_device_init(struct device *dev);
> > 	  iommu_iovm_device_shutdown(struct device *dev);
> > 
> > 	  These two functions do the device specific setup/shutdown. For
> > 	  PCI this would include enabling the PRI, PASID, and ATS
> > 	  capabilities and setting up a PASID table for the device.
> 
> Ok. On SMMU the PASID table also hosts the non-PASID page table pointer,
> so the table and capability cannot be setup later than attach_device (and
> we'll probably have to enable PASID in add_device). But I suppose it's an
> implementation detail.

Right, when the capabilities are enabled is an implementation detail of
the iommu-drivers. 

> Some device drivers will want to use ATS alone, for accelerating IOVA
> traffic. Should we enable it automatically, or provide device drivers with
> a way to enable it manually? According to the PCI spec, PASID has to be
> enabled before ATS, so device_init would have to first disable ATS in that
> case.

Yes, a driver can enable ATS for normal use of a device, and
disable/re-enable it when the driver requests PASID/PRI functionality.
That is also an implementation detail. You should probably also document
that the init/shutdown functions may interrupt device operation, so that
driver writers are aware of that.

> 
> > 	* iommu_iovm_bind_mm(struct device *dev, struct mm_struct *mm,
> > 			     iovm_shutdown_cb *cb);
> > 	  iommu_iovm_unbind_mm(struct device *dev, struct mm_struct *mm);
> > 
> > 	  These functions add and delete the entries in the PASID table
> > 	  for the device and setup mmu_notifiers for the mm_struct to
> > 	  keep IOMMU TLBs in sync with the CPU TLBs.
> > 
> > 	  The shutdown_cb is invoked by the IOMMU core code when the
> > 	  mm_struct goes away, for example because the process
> > 	  segfaults.
> > 
> > 	The PASID handling is best done in these functions as well, unless
> > 	there is a strong reason to allow device drivers to do the
> > 	handling themselves.
> > 
> > The context data can be stored directly in mm_struct, including the
> > PASID for that mm.
> 
> Changing mm_struct probably isn't required at the moment, since the mm
> subsystem won't use the context data or the PASID. Outside of
> drivers/iommu/, only the caller of bind_mm needs the PASID in order to
> program it into the device. The only advantage I see would be slightly
> faster bind(), when finding out if a mm is already bound to devices. But
> we don't really need fast bind(), so I don't think we'd have enough
> material to argue for a change in mm_struct.

The idea behind storing the PASID in mm_struct is that we have a
system-wide PASID-allocator and only one PASID per address space, even
when accessed from multiple devices.

There will be hardware implementations where this is required, afaik. It
doesn't mean that it _needs_ to be part of mm_struct, but it is
certainly easier than tracking this 1-1 relation separately.

> We do need to allocate a separate "iommu_mm_struct" wrapper to store the
> mmu_notifier. Maybe bind() could return this structure (that contains the
> PASID), and unbind() would take this iommu_mm_struct as argument?

I'd like to track this iommu_mm_struct only in iommu-code, otherwise
drivers need to track the pointer somewhere. And we need to track the
mm_struct->iommu_mm_struct relation anyway in core-code to handle events
like segfaults, when the whole mm_struct goes away under us. So when we
track this in core code, there is no need to track this again in the
device drivers.
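The core-side tracking described above can be sketched in userspace like this (simplified, hypothetical names; real kernel code would use an IDR for allocation and mmu_notifiers to catch the mm going away, rather than a plain list):

```c
#include <assert.h>
#include <stdlib.h>

struct mm_struct { int dummy; };      /* stand-in for the kernel's mm_struct */

/* Core-side per-mm context: one PASID per address space, system-wide. */
struct iommu_mm {
	struct mm_struct *mm;
	int pasid;
	int refcount;                 /* one reference per bound device */
	struct iommu_mm *next;
};

static struct iommu_mm *iommu_mm_list;
static int next_pasid = 1;            /* PASID 0 reserved for non-PASID DMA */

/* Find or create the iommu_mm for an mm: a second bind reuses the PASID. */
static struct iommu_mm *iommu_mm_get(struct mm_struct *mm)
{
	struct iommu_mm *imm;

	for (imm = iommu_mm_list; imm; imm = imm->next) {
		if (imm->mm == mm) {
			imm->refcount++;
			return imm;
		}
	}
	imm = calloc(1, sizeof(*imm));
	if (!imm)
		return NULL;
	imm->mm = mm;
	imm->pasid = next_pasid++;    /* system-wide allocator */
	imm->refcount = 1;
	imm->next = iommu_mm_list;
	iommu_mm_list = imm;
	return imm;
}
```

The point of keeping this lookup in core code is exactly what is argued above: binding the same mm from two devices lands on the same iommu_mm and the same PASID, and drivers never have to store the pointer themselves.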


Regards,

	Joerg

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3
  2017-10-12 12:05         ` Yisheng Xie
  (?)
@ 2017-10-12 12:55           ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-12 12:55 UTC (permalink / raw)
  To: Yisheng Xie, linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, gabriele.paoloni, nwatters

On 12/10/17 13:05, Yisheng Xie wrote:
[...]
>>>> * An iommu_process can be bound to multiple domains, and a domain can have
>>>>   multiple iommu_process.
>>> When binding a task to a device, can we create a single domain for it? I am
>>> thinking about process management without shared PT (some devices only support
>>> PASID without PRI ability); it seems hard to extend if a domain has multiple
>>> iommu_process. Do you have any idea about this?
>>
>> A device always has to be in a domain, as far as I know. Not supporting
>> PRI forces you to pin down all user mappings (or just the ones you use for
>> DMA) but you should still be able to share PT. Now if you don't support
>> shared PT either, but only PASID, then you'll have to use io-pgtable and a
>> new map/unmap API on an iommu_process. I don't understand your concern
>> though, how would the link between process and domains prevent this use-case?
>>
> So you mean that if an iommu_process binds to multiple devices it should create
> multiple io-pgtables? Or just share the same io-pgtable?

I don't know to be honest, I haven't thought much about the io-pgtable
case, I'm all about sharing the mm :)

It really depends on what the user (GPU driver I assume) wants. I think
that if you're not sharing an mm with the device, then you're trying to
hide parts of the process to the device, so you'd also want the
flexibility of having different io-pgtables between devices. Different
devices accessing isolated parts of the process requires separate io-pgtables.

>>>> * IOMMU groups share same PASID table. IOMMU groups are a convenient way
>>>>   to cover various hardware weaknesses that prevent a group of device to
>>>>   be isolated by the IOMMU (untrusted bridge, for instance). It's foolish
>>>>   to assume that all PASID implementations will perfectly isolate devices
>>>>   within a bus and functions within a device, so let's assume all devices
>>>>   within an IOMMU group have to share PASID traffic as well. In general
>>>>   there will be a single device per group.
>>>>
>>>> * It's up to the driver implementation to decide where to implement the
>>>>   PASID tables. For SMMU it's more convenient to have a single PASID table
>>>>   per domain. And I think the model fits better with the existing IOMMU
>>>>   API: IOVA traffic is shared by all devices in a domain, so should PASID
>>>>   traffic.
>>> What's the meaning of "share PASID traffic"? PASID space is system-wide,
>>> and a domain can have multiple iommu_process, so a domain can have multiple
>>> PASIDs, one PASID per iommu_process, right?
> I get what you mean now, thanks for your explanation.
> 
>>
>> Yes, I meant that if a device can access mappings for a specific PASID,
>> then other devices in that same domain are also able to access them.
>>
>> A few reasons for this choice in the SMMU:
>> * As all devices in an IOMMU group will be in the same domain and share
>> the same PASID traffic, it encompasses that case. Groups are the smallest
>> isolation granularity, then users are free to choose to put different
>> IOMMU groups in different domains.
>> * For architectures that can have both non-PASID and PASID traffic
>> simultaneously, like the SMMU, it is simpler to reason about PASID tables
>> being a domain, rather than sharing PASID0 within the domain and handling
>> all others per device.
>> * It's the same principle as non-PASID mappings (iommu_map/unmap is on a
>> domain).
>> * It implements the classic example of IOMMU architectures where multiple
>> device descriptors point to the same PASID tables.
>> * It may be desirable for drivers to share PASIDs within a domain, if they
>> are actually using domains for conveniently sharing address spaces between
>> devices. I'm not sure how much this is used as a feature. It does model a
>> shared bus where each device can snoop DMA, so it may be useful.
>>
> 
> I get another question about this design, thinking about the following case:
> 
> Consider a platform device with PASID capability, e.g. an accelerator that
> has multiple accelerator processing units (APUs). It may create multiple
> virtual devices, one virtual device representing each APU, all with the same SID.
> 
> They can be in different groups, but under this design they must be in the
> same domain, since the domain holds the PASID table, right? So how could they
> be assigned to different guest OSes?

If they have the same SID, they will be in the same group as there will be
a single entry in the SMMU stream table. Otherwise, if the virtual devices
can be properly isolated from each other (in the same way as PCI SR-IOV),
they will each have their own SID, can each be in a different IOMMU group
and can be assigned to different guests.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3
@ 2017-10-12 12:55           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-12 12:55 UTC (permalink / raw)
  To: Yisheng Xie, linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, gabriele.paoloni, nwatters, okaya, rfranz,
	dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

On 12/10/17 13:05, Yisheng Xie wrote:
[...]
>>>> * An iommu_process can be bound to multiple domains, and a domain can have
>>>>   multiple iommu_process.
>>> when bind a task to device, can we create a single domain for it? I am thinking
>>> about process management without shared PT(for some device only support PASID
>>> without pri ability), it seems hard to expand if a domain have multiple iommu_process?
>>> Do you have any idea about this?
>>
>> A device always has to be in a domain, as far as I know. Not supporting
>> PRI forces you to pin down all user mappings (or just the ones you use for
>> DMA) but you should sill be able to share PT. Now if you don't support
>> shared PT either, but only PASID, then you'll have to use io-pgtable and a
>> new map/unmap API on an iommu_process. I don't understand your concern
>> though, how would the link between process and domains prevent this use-case?
>>
> So you mean that if an iommu_process bind to multiple devices it should create
> multiple io-pgtables? or just share the same io-pgtable?

I don't know to be honest, I haven't thought much about the io-pgtable
case, I'm all about sharing the mm :)

It really depends on what the user (GPU driver I assume) wants. I think
that if you're not sharing an mm with the device, then you're trying to
hide parts of the process to the device, so you'd also want the
flexibility of having different io-pgtables between devices. Different
devices accessing isolated parts of the process requires separate io-pgtables.

>>>> * IOMMU groups share same PASID table. IOMMU groups are a convenient way
>>>>   to cover various hardware weaknesses that prevent a group of device to
>>>>   be isolated by the IOMMU (untrusted bridge, for instance). It's foolish
>>>>   to assume that all PASID implementations will perfectly isolate devices
>>>>   within a bus and functions within a device, so let's assume all devices
>>>>   within an IOMMU group have to share PASID traffic as well. In general
>>>>   there will be a single device per group.
>>>>
>>>> * It's up to the driver implementation to decide where to implement the
>>>>   PASID tables. For SMMU it's more convenient to have a single PASID table
>>>>   per domain. And I think the model fits better with the existing IOMMU
>>>>   API: IOVA traffic is shared by all devices in a domain, so should PASID
>>>>   traffic.
>>> What's the meaning of "share PASID traffic"? PASID space is system-wide,
>>> and a domain can have multiple iommu_process , so a domain can have multiple
>>> PASIDs , one PASID for a iommu_process, right?
> I get what your mean now, thanks for your explain.
> 
>>
>> Yes, I meant that if a device can access mappings for a specific PASID,
>> then other devices in that same domain are also able to access them.
>>
>> A few reasons for this choice in the SMMU:
>> * As all devices in an IOMMU group will be in the same domain and share
>> the same PASID traffic, it encompasses that case. Groups are the smallest
>> isolation granularity, then users are free to choose to put different
>> IOMMU groups in different domains.
>> * For architectures that can have both non-PASID and PASID traffic
>> simultaneously, like the SMMU, it is simpler to reason about PASID tables
>> being a domain, rather than sharing PASID0 within the domain and handling
>> all others per device.
>> * It's the same principle as non-PASID mappings (iommu_map/unmap is on a
>> domain).
>> * It implement the classic example of IOMMU architectures where multiple
>> device descriptors point to the same PASID tables.
>> * It may be desirable for drivers to share PASIDs within a domain, if they
>> are actually using domains for conveniently sharing address spaces between
>> devices. I'm not sure how much this is used as a feature. It does model a
>> shared bus where each device can snoop DMA, so it may be useful.
>>
> 
> I get another question about this design, thinking about the following case:
> 
> If a platform device with PASID ability, e.g. accelerator, which have multiple
> accelerator process units(APUs), it may create multiple virtual devices, one
> virtual device represent for an APU, with the same sid.
> 
> They can be in different groups, however must be in the same domain as this
> design, for domain held the PASID table, right? So how could they be used by
> different guest OS?

If they have the same SID, they will be in the same group as there will be
a single entry in the SMMU stream table. Otherwise, if the virtual devices
can be properly isolated from each other (in the same way as PCI SR-IOV),
they will each have their own SID, can each be in a different IOMMU group
and can be assigned to different guests.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3
@ 2017-10-12 12:55           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-12 12:55 UTC (permalink / raw)
  To: linux-arm-kernel

On 12/10/17 13:05, Yisheng Xie wrote:
[...]
>>>> * An iommu_process can be bound to multiple domains, and a domain can have
>>>>   multiple iommu_process.
>>> when bind a task to device, can we create a single domain for it? I am thinking
>>> about process management without shared PT(for some device only support PASID
>>> without pri ability), it seems hard to expand if a domain have multiple iommu_process?
>>> Do you have any idea about this?
>>
>> A device always has to be in a domain, as far as I know. Not supporting
>> PRI forces you to pin down all user mappings (or just the ones you use for
>> DMA) but you should sill be able to share PT. Now if you don't support
>> shared PT either, but only PASID, then you'll have to use io-pgtable and a
>> new map/unmap API on an iommu_process. I don't understand your concern
>> though, how would the link between process and domains prevent this use-case?
>>
> So you mean that if an iommu_process bind to multiple devices it should create
> multiple io-pgtables? or just share the same io-pgtable?

I don't know to be honest, I haven't thought much about the io-pgtable
case, I'm all about sharing the mm :)

It really depends on what the user (GPU driver, I assume) wants. I think
that if you're not sharing an mm with the device, then you're trying to
hide parts of the process from the device, so you'd also want the
flexibility of having different io-pgtables between devices: different
devices accessing isolated parts of the process require separate io-pgtables.

>>>> * IOMMU groups share the same PASID table. IOMMU groups are a convenient way
>>>>   to cover various hardware weaknesses that prevent a group of devices from
>>>>   being isolated by the IOMMU (an untrusted bridge, for instance). It's foolish
>>>>   to assume that all PASID implementations will perfectly isolate devices
>>>>   within a bus and functions within a device, so let's assume all devices
>>>>   within an IOMMU group have to share PASID traffic as well. In general
>>>>   there will be a single device per group.
>>>>
>>>> * It's up to the driver implementation to decide where to implement the
>>>>   PASID tables. For SMMU it's more convenient to have a single PASID table
>>>>   per domain. And I think the model fits better with the existing IOMMU
>>>>   API: IOVA traffic is shared by all devices in a domain, so should PASID
>>>>   traffic.
>>> What's the meaning of "share PASID traffic"? PASID space is system-wide,
>>> and a domain can have multiple iommu_process, so a domain can have multiple
>>> PASIDs, one PASID for an iommu_process, right?
> I get what you mean now, thanks for the explanation.
> 
>>
>> Yes, I meant that if a device can access mappings for a specific PASID,
>> then other devices in that same domain are also able to access them.
>>
>> A few reasons for this choice in the SMMU:
>> * As all devices in an IOMMU group will be in the same domain and share
>> the same PASID traffic, it encompasses that case. Groups are the smallest
>> isolation granularity, then users are free to choose to put different
>> IOMMU groups in different domains.
>> * For architectures that can have both non-PASID and PASID traffic
>> simultaneously, like the SMMU, it is simpler to reason about PASID tables
>> being per domain, rather than sharing PASID0 within the domain and handling
>> all others per device.
>> * It's the same principle as non-PASID mappings (iommu_map/unmap is on a
>> domain).
>> * It implements the classic model of IOMMU architectures where multiple
>> device descriptors point to the same PASID tables.
>> * It may be desirable for drivers to share PASIDs within a domain, if they
>> are actually using domains for conveniently sharing address spaces between
>> devices. I'm not sure how much this is used as a feature. It does model a
>> shared bus where each device can snoop DMA, so it may be useful.
>>
> 
> I get another question about this design, thinking about the following case:
> 
> If a platform device has PASID capability, e.g. an accelerator with multiple
> accelerator processing units (APUs), it may create multiple virtual devices,
> one virtual device representing an APU, all with the same SID.
> 
> They can be in different groups, but under this design must be in the same
> domain, since the domain holds the PASID table, right? So how could they be
> assigned to different guest OSes?

If they have the same SID, they will be in the same group as there will be
a single entry in the SMMU stream table. Otherwise, if the virtual devices
can be properly isolated from each other (in the same way as PCI SR-IOV),
they will each have their own SID, can each be in a different IOMMU group
and can be assigned to different guests.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3
  2017-10-12 12:55           ` Jean-Philippe Brucker
  (?)
@ 2017-10-12 15:28               ` Jordan Crouse
  -1 siblings, 0 replies; 268+ messages in thread
From: Jordan Crouse @ 2017-10-12 15:28 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Yisheng Xie, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Mark Rutland,
	gabriele.paoloni-hv44wF8Li93QT0dZR+AlfA, Catalin Marinas,
	Will Deacon, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	dwmw2

On Thu, Oct 12, 2017 at 01:55:32PM +0100, Jean-Philippe Brucker wrote:
> On 12/10/17 13:05, Yisheng Xie wrote:
> [...]
> >>>> * An iommu_process can be bound to multiple domains, and a domain can have
> >>>>   multiple iommu_process.
> >>> When binding a task to a device, can we create a single domain for it? I am
> >>> thinking about process management without shared PT (some devices only support
> >>> PASID without PRI ability); it seems hard to expand if a domain has multiple
> >>> iommu_process. Do you have any idea about this?
> >>
> >> A device always has to be in a domain, as far as I know. Not supporting
> >> PRI forces you to pin down all user mappings (or just the ones you use for
> >> DMA) but you should still be able to share PT. Now if you don't support
> >> shared PT either, but only PASID, then you'll have to use io-pgtable and a
> >> new map/unmap API on an iommu_process. I don't understand your concern
> >> though, how would the link between process and domains prevent this use-case?
> >>
> > So you mean that if an iommu_process binds to multiple devices it should create
> > multiple io-pgtables, or just share the same io-pgtable?
> 
> I don't know to be honest, I haven't thought much about the io-pgtable
> case, I'm all about sharing the mm :)
> 
> It really depends on what the user (GPU driver, I assume) wants. I think
> that if you're not sharing an mm with the device, then you're trying to
> hide parts of the process from the device, so you'd also want the
> flexibility of having different io-pgtables between devices: different
> devices accessing isolated parts of the process require separate io-pgtables.

In our specific Snapdragon use case the GPU is the only entity that cares about
process-specific io-pgtables.  Everything else (display, video, camera) is happy
using a global io-pgtable.  The reasoning is that the GPU is programmable from
user space and can easily be used to copy data, whereas the other use cases have
mostly fixed functions.

Even if different devices did want to have a process-specific io-pgtable I doubt
we would share them.  Every device uses the IOMMU differently and the magic
needed to share an io-pgtable between (for example) a GPU and a DSP would be
prohibitively complicated.

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 12/36] dt-bindings: document stall and PASID properties for IOMMU masters
  2017-10-06 13:31   ` Jean-Philippe Brucker
  (?)
@ 2017-10-13 19:10       ` Rob Herring
  -1 siblings, 0 replies; 268+ messages in thread
From: Rob Herring @ 2017-10-13 19:10 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: mark.rutland-5wv7dgnIgG8,
	gabriele.paoloni-hv44wF8Li93QT0dZR+AlfA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	catalin.marinas-5wv7dgnIgG8, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, devicetree-u79uwXL29TY76Z2rM5mHXA,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, rjw-LthD3rsA81gm4RdzfppkhA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	sudeep.holla-5wv7dgnIgG8

On Fri, Oct 06, 2017 at 02:31:39PM +0100, Jean-Philippe Brucker wrote:
> On ARM systems, some platform devices behind an IOMMU may support stall
> and PASID features. Stall is the ability to recover from page faults and
> PASID offers multiple process address spaces to the device. Together they
> allow a device to do paging. Let the firmware tell us when a device
> supports stall and PASID.

Can't these be implied by the compatible string of the devices?

> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
> ---
>  Documentation/devicetree/bindings/iommu/iommu.txt | 24 +++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/iommu/iommu.txt b/Documentation/devicetree/bindings/iommu/iommu.txt
> index 5a8b4624defc..c589b75f7277 100644
> --- a/Documentation/devicetree/bindings/iommu/iommu.txt
> +++ b/Documentation/devicetree/bindings/iommu/iommu.txt
> @@ -86,6 +86,30 @@ have a means to turn off translation. But it is invalid in such cases to
>  disable the IOMMU's device tree node in the first place because it would
>  prevent any driver from properly setting up the translations.
>  
> +Optional properties:
> +--------------------
> +- dma-can-stall: When present, the master can wait for a transaction to
> +  complete for an indefinite amount of time. Upon translation fault some
> +  IOMMUs, instead of aborting the translation immediately, may first
> +  notify the driver and keep the transaction in flight. This allows the OS
> +  to inspect the fault and, for example, make physical pages resident
> +  before updating the mappings and completing the transaction. Such IOMMU
> +  accepts a limited number of simultaneous stalled transactions before
> +  having to either put back-pressure on the master, or abort new faulting
> +  transactions.
> +
> +  Firmware has to opt in to stalling, because most buses and masters don't
> +  support it. In particular it isn't compatible with PCI, where
> +  transactions have to complete before a time limit. More generally it
> +  won't work in systems and masters that haven't been designed for
> +  stalling. For example the OS, in order to handle a stalled transaction,
> +  may attempt to retrieve pages from secondary storage in a stalled
> +  domain, leading to a deadlock.
> +
> +- pasid-bits: Some masters support multiple address spaces for DMA, by
> +  tagging DMA transactions with an address space identifier. By default
> +  this is 0, which means that the device only has one address space.
> +
>  
>  Notes:
>  ======
> -- 
> 2.13.3
> 

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 12/36] dt-bindings: document stall and PASID properties for IOMMU masters
  2017-10-13 19:10       ` Rob Herring
  (?)
@ 2017-10-16 10:23         ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-16 10:23 UTC (permalink / raw)
  To: Rob Herring
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, joro,
	Mark Rutland, Catalin Marinas, Will Deacon, Lorenzo Pieralisi,
	hanjun.guo, Sudeep Holla, rjw, lenb, Robin Murphy, bhelgaas

On 13/10/17 20:10, Rob Herring wrote:
> On Fri, Oct 06, 2017 at 02:31:39PM +0100, Jean-Philippe Brucker wrote:
>> On ARM systems, some platform devices behind an IOMMU may support stall
>> and PASID features. Stall is the ability to recover from page faults and
>> PASID offers multiple process address spaces to the device. Together they
>> allow to do paging with a device. Let the firmware tell us when a device
>> supports stall and PASID.
> 
> Can't these be implied by the compatible string of the devices?

I think that PASID capacity can be deduced from the compatible string but
don't know how these devices will be implemented (the only known example
being a software model for testing). In any case implementing PASID based
on the compatible string is tricky, because the IOMMU driver needs to know
PASID capacity before the device driver has had time to probe the device.
Maybe we could get away with a static table associating compatible string
to PASID capacity, but it's not very nice.

For stall it's a property of the integration between device and IOMMU,
much like the "iommus" property, so it can't be deduced only from the
compatible string. It's crucial that the firmware validates that stalling
is safe, because we don't have any other way to discover it.
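
As an illustration, a master node opting in to both properties might look
like the following sketch (the compatible string, addresses, SMMU specifier
and PASID width are all invented for the example):

```dts
/* Hypothetical SVM-capable platform device behind an SMMU */
accel@80000000 {
	compatible = "vendor,example-accelerator";
	reg = <0x80000000 0x10000>;
	iommus = <&smmu 0x42>;
	/* The device-to-IOMMU integration supports stalled transactions */
	dma-can-stall;
	/* The device tags DMA with 8-bit PASIDs: 256 address spaces */
	pasid-bits = <8>;
};
```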

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 12/36] dt-bindings: document stall and PASID properties for IOMMU masters
  2017-10-16 10:23         ` Jean-Philippe Brucker
  (?)
@ 2017-10-18  2:06             ` Rob Herring
  -1 siblings, 0 replies; 268+ messages in thread
From: Rob Herring @ 2017-10-18  2:06 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	joro-zLv9SwRftAIdnm+yROfE0A, Mark Rutland, Catalin Marinas,
	Will Deacon, Lorenzo Pieralisi,
	hanjun.guo-QSEj5FYQhm4dnm+yROfE0A, Sudeep Holla,
	rjw-LthD3rsA81gm4RdzfppkhA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	Robin Murphy, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA

On Mon, Oct 16, 2017 at 5:23 AM, Jean-Philippe Brucker
<jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:
> On 13/10/17 20:10, Rob Herring wrote:
>> On Fri, Oct 06, 2017 at 02:31:39PM +0100, Jean-Philippe Brucker wrote:
>>> On ARM systems, some platform devices behind an IOMMU may support stall
>>> and PASID features. Stall is the ability to recover from page faults and
>>> PASID offers multiple process address spaces to the device. Together they
>>> allow to do paging with a device. Let the firmware tell us when a device
>>> supports stall and PASID.
>>
>> Can't these be implied by the compatible string of the devices?
>
> I think that PASID capacity can be deduced from the compatible string but
> don't know how these devices will be implemented (the only known example
> being a software model for testing). In any case implementing PASID based
> on the compatible string is tricky, because the IOMMU driver needs to know
> PASID capacity before the device driver has had time to probe the device.
> Maybe we could get away with a static table associating compatible string
> to PASID capacity, but it's not very nice.
>
> For stall it's a property of the integration between device and IOMMU,
> much like the "iommus" property, so it can't be deduced only from the
> compatible string. It's crucial that the firmware validates that stalling
> is safe, because we don't have any other way to discover it.

Okay, fair enough. I'll let others comment.

Rob

^ permalink raw reply	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 12/36] dt-bindings: document stall and PASID properties for IOMMU masters
@ 2017-10-18  2:06             ` Rob Herring
  0 siblings, 0 replies; 268+ messages in thread
From: Rob Herring @ 2017-10-18  2:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Oct 16, 2017 at 5:23 AM, Jean-Philippe Brucker
<jean-philippe.brucker@arm.com> wrote:
> On 13/10/17 20:10, Rob Herring wrote:
>> On Fri, Oct 06, 2017 at 02:31:39PM +0100, Jean-Philippe Brucker wrote:
>>> On ARM systems, some platform devices behind an IOMMU may support stall
>>> and PASID features. Stall is the ability to recover from page faults and
>>> PASID offers multiple process address spaces to the device. Together they
>>> allow to do paging with a device. Let the firmware tell us when a device
>>> supports stall and PASID.
>>
>> Can't these be implied by the compatible string of the devices?
>
> I think that PASID capacity can be deduced from the compatible string but
> don't know how these devices will be implemented (the only known example
> being a software model for testing). In any case implementing PASID based
> on the compatible string is tricky, because the IOMMU driver needs to know
> PASID capacity before the device driver has had time to probe the device.
> Maybe we could get away with a static table associating compatible string
> to PASID capacity, but it's not very nice.
>
> For stall it's a property of the integration between device and IOMMU,
> much like the "iommus" property, so it can't be deduced only from the
> compatible string. It's crucial that the firmware validates that stalling
> is safe, because we don't have any other way to discover it.

Okay, fair enough. I'll let others comment.

Rob

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
  2017-10-06 13:31     ` Jean-Philippe Brucker
  (?)
@ 2017-10-20 23:32         ` Sinan Kaya
  -1 siblings, 0 replies; 268+ messages in thread
From: Sinan Kaya @ 2017-10-20 23:32 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: mark.rutland-5wv7dgnIgG8,
	gabriele.paoloni-hv44wF8Li93QT0dZR+AlfA,
	catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, rjw-LthD3rsA81gm4RdzfppkhA,
	sudeep.holla-5wv7dgnIgG8

A few nits below.

> +/*
> + * Allocate an iommu_process structure for the given task.
> + *
> + * Ideally we shouldn't need the domain parameter, since iommu_process is
> + * system-wide, but we use it to retrieve the driver's allocation ops and a
> + * PASID range.
> + */
> +static struct iommu_process *
> +iommu_process_alloc(struct iommu_domain *domain, struct task_struct *task)
> +{
> +	int err;
> +	int pasid;
> +	struct iommu_process *process;
> +
> +	if (WARN_ON(!domain->ops->process_alloc || !domain->ops->process_free))
> +		return ERR_PTR(-ENODEV);
> +
> +	process = domain->ops->process_alloc(task);
> +	if (IS_ERR(process))
> +		return process;
> +	if (!process)
> +		return ERR_PTR(-ENOMEM);
> +
> +	process->pid		= get_task_pid(task, PIDTYPE_PID);
> +	process->release	= domain->ops->process_free;
> +	INIT_LIST_HEAD(&process->domains);
> +	kref_init(&process->kref);
> +
nit, I think you should place this check right after the pid assignment.

> +	if (!process->pid) {
> +		err = -EINVAL;
> +		goto err_free_process;
> +	}
> +
> +	idr_preload(GFP_KERNEL);
> +	spin_lock(&iommu_process_lock);
> +	pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid,
> +				 domain->max_pasid + 1, GFP_ATOMIC);
> +	process->pasid = pasid;
> +	spin_unlock(&iommu_process_lock);
> +	idr_preload_end();
> +

nit, you can return here early if pasid is not negative.

> +	if (pasid < 0) {
> +		err = pasid;
> +		goto err_put_pid;
> +	}
> +
> +	return process;
> +
> +err_put_pid:
> +	put_pid(process->pid);
> +
> +err_free_process:
> +	domain->ops->process_free(process);
> +
> +	return ERR_PTR(err);
> +}
> +
> +static void iommu_process_release(struct kref *kref)
> +{
> +	struct iommu_process *process;
> +	void (*release)(struct iommu_process *);
> +
> +	assert_spin_locked(&iommu_process_lock);
> +
> +	process = container_of(kref, struct iommu_process, kref);

If we are concerned about things going wrong (the assert above), we should
also add pointer checks (WARN) for the process and release pointers here as well.

> +	release = process->release;
> +
> +	WARN_ON(!list_empty(&process->domains));
> +
> +	idr_remove(&iommu_process_idr, process->pasid);
> +	put_pid(process->pid);
> +	release(process);
> +}
> +
> +/*
> + * Returns non-zero if a reference to the process was successfully taken.
> + * Returns zero if the process is being freed and should not be used.
> + */
> +static int iommu_process_get_locked(struct iommu_process *process)
> +{
> +	assert_spin_locked(&iommu_process_lock);
> +
> +	if (process)
> +		return kref_get_unless_zero(&process->kref);
> +
> +	return 0;
> +}
> +
> +static void iommu_process_put_locked(struct iommu_process *process)
> +{
> +	assert_spin_locked(&iommu_process_lock);
> +
> +	kref_put(&process->kref, iommu_process_release);
> +}
> +
> +static int iommu_process_attach(struct iommu_domain *domain, struct device *dev,
> +				struct iommu_process *process)
> +{
> +	int err;
> +	int pasid = process->pasid;
> +	struct iommu_context *context;
> +
> +	if (WARN_ON(!domain->ops->process_attach || !domain->ops->process_detach))
> +		return -ENODEV;
> +
> +	if (pasid > domain->max_pasid || pasid < domain->min_pasid)
> +		return -ENOSPC;
> +
> +	context = kzalloc(sizeof(*context), GFP_KERNEL);
> +	if (!context)
> +		return -ENOMEM;
> +

devm_kzalloc maybe?

> +	context->process	= process;
> +	context->domain		= domain;
> +	refcount_set(&context->ref, 1);
> +
> +	spin_lock(&iommu_process_lock);
> +	err = domain->ops->process_attach(domain, dev, process, true);
> +	if (err) {
> +		kfree(context);
> +		spin_unlock(&iommu_process_lock);
> +		return err;
> +	}
> +
> +	list_add(&context->process_head, &process->domains);
> +	list_add(&context->domain_head, &domain->processes);
> +	spin_unlock(&iommu_process_lock);
> +
> +	return 0;
> +}

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 05/36] iommu/process: Bind and unbind process to and from devices
  2017-10-06 13:31     ` Jean-Philippe Brucker
  (?)
@ 2017-10-21 15:47         ` Sinan Kaya
  -1 siblings, 0 replies; 268+ messages in thread
From: Sinan Kaya @ 2017-10-21 15:47 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: mark.rutland-5wv7dgnIgG8,
	gabriele.paoloni-hv44wF8Li93QT0dZR+AlfA,
	catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, rjw-LthD3rsA81gm4RdzfppkhA,
	sudeep.holla-5wv7dgnIgG8

Just some improvement suggestions.

On 10/6/2017 9:31 AM, Jean-Philippe Brucker wrote:
> +	spin_lock(&iommu_process_lock);
> +	idr_for_each_entry(&iommu_process_idr, process, i) {
> +		if (process->pid != pid)
> +			continue;
If you see this construct a lot, it could become a for_each_iommu_process() helper.

> +
> +		if (!iommu_process_get_locked(process)) {
> +			/* Process is defunct, create a new one */
> +			process = NULL;
> +			break;
> +		}
> +
> +		/* Great, is it also bound to this domain? */
> +		list_for_each_entry(cur_context, &process->domains,
> +				    process_head) {
> +			if (cur_context->domain != domain)
> +				continue;
If you see this construct a lot, it could become a for_each_iommu_process_domain() helper.

> +
> +			context = cur_context;
> +			*pasid = process->pasid;
> +
> +			/* Splendid, tell the driver and increase the ref */
> +			err = iommu_process_attach_locked(context, dev);
> +			if (err)
> +				iommu_process_put_locked(process);
> +
> +			break;
> +		}
> +		break;
> +	}
> +	spin_unlock(&iommu_process_lock);
> +	put_pid(pid);
> +
> +	if (context)
> +		return err;

I think you should make the section above an independent function and return here when
the context is found.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 268+ messages in thread

* RE: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
  2017-10-06 13:31     ` Jean-Philippe Brucker
  (?)
@ 2017-10-23 11:04       ` Liu, Yi L
  -1 siblings, 0 replies; 268+ messages in thread
From: Liu, Yi L @ 2017-10-23 11:04 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, Raj, Ashok, robdclark

Hi Jean,

> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Friday, October 6, 2017 9:31 PM
> To: linux-arm-kernel@lists.infradead.org; linux-pci@vger.kernel.org; linux-
> acpi@vger.kernel.org; devicetree@vger.kernel.org; iommu@lists.linux-
> foundation.org
> Cc: joro@8bytes.org; robh+dt@kernel.org; mark.rutland@arm.com;
> catalin.marinas@arm.com; will.deacon@arm.com; lorenzo.pieralisi@arm.com;
> hanjun.guo@linaro.org; sudeep.holla@arm.com; rjw@rjwysocki.net;
> lenb@kernel.org; robin.murphy@arm.com; bhelgaas@google.com;
> alex.williamson@redhat.com; tn@semihalf.com; liubo95@huawei.com;
> thunder.leizhen@huawei.com; xieyisheng1@huawei.com;
> gabriele.paoloni@huawei.com; nwatters@codeaurora.org; okaya@codeaurora.org;
> rfranz@cavium.com; dwmw2@infradead.org; jacob.jun.pan@linux.intel.com; Liu, Yi
> L <yi.l.liu@intel.com>; Raj, Ashok <ashok.raj@intel.com>; robdclark@gmail.com
> Subject: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
> 
> IOMMU drivers need a way to bind Linux processes to devices. This is used for
> Shared Virtual Memory (SVM), where devices support paging. In that mode, DMA can
> directly target virtual addresses of a process.
> 
> Introduce boilerplate code for allocating process structures and binding them to
> devices. Four operations are added to IOMMU drivers:
> 
> * process_alloc, process_free: to create an iommu_process structure and
>   perform architecture-specific operations required to grab the process
>   (for instance on ARM SMMU, pin down the CPU ASID). There is a single
>   iommu_process structure per Linux process.
> 
> * process_attach: attach a process to a device. The IOMMU driver checks
>   that the device is capable of sharing an address space with this
>   process, and writes the PASID table entry to install the process page
>   directory.
> 
>   Some IOMMU drivers (e.g. ARM SMMU and virtio-iommu) will have a single
>   PASID table per domain, for convenience. Others can implement it
>   differently, but to help these drivers, process_attach and process_detach
>   take a 'first' or 'last' parameter telling whether they need to
>   install/remove the PASID entry or only send the required TLB
>   invalidations.
> 
> * process_detach: detach a process from a device. The IOMMU driver removes
>   the PASID table entry and invalidates the IOTLBs.
> 
> process_attach and process_detach operations are serialized with a spinlock. At the
> moment it is global, but if we try to optimize it, the core should at least prevent
> concurrent attach/detach on the same domain.
> (so multi-level PASID table code can allocate tables lazily without having to go
> through the io-pgtable concurrency nightmare). process_alloc can sleep, but
> process_free must not (because we'll have to call it from
> call_srcu.)
> 
> At the moment we use an IDR for allocating PASIDs and retrieving contexts.
> We also use a single spinlock. These can be refined and optimized later (a custom
> allocator will be needed for top-down PASID allocation).
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  drivers/iommu/Kconfig         |  10 ++
>  drivers/iommu/Makefile        |   1 +
>  drivers/iommu/iommu-process.c | 225 ++++++++++++++++++++++++++++++++++++++++++
>  drivers/iommu/iommu.c         |   1 +
>  include/linux/iommu.h         |  24 +++++
>  5 files changed, 261 insertions(+)
>  create mode 100644 drivers/iommu/iommu-process.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index f3a21343e636..1ea5c90e37be 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -74,6 +74,16 @@ config IOMMU_DMA
>  	select IOMMU_IOVA
>  	select NEED_SG_DMA_LENGTH
> 
> +config IOMMU_PROCESS
> +	bool "Process management API for the IOMMU"
> +	select IOMMU_API
> +	help
> +	  Enable process management for the IOMMU API. In systems that support
> +	  it, device drivers can bind processes to devices and share their page
> +	  tables using this API.
> +
> +	  If unsure, say N here.
> +
>  config FSL_PAMU
>  	bool "Freescale IOMMU support"
>  	depends on PCI
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index
> b910aea813a1..a2832edbfaa2 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -1,6 +1,7 @@
>  obj-$(CONFIG_IOMMU_API) += iommu.o
>  obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
> +obj-$(CONFIG_IOMMU_PROCESS) += iommu-process.o
>  obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
> diff --git a/drivers/iommu/iommu-process.c b/drivers/iommu/iommu-process.c
> new file mode 100644
> index 000000000000..a7e5a1c94305
> --- /dev/null
> +++ b/drivers/iommu/iommu-process.c
> @@ -0,0 +1,225 @@
> +/*
> + * Track processes bound to devices
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published
> + * by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
> + *
> + * Copyright (C) 2017 ARM Ltd.
> + *
> + * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> + */
> +
> +#include <linux/idr.h>
> +#include <linux/iommu.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +
> +/* Link between a domain and a process */
> +struct iommu_context {
> +	struct iommu_process	*process;
> +	struct iommu_domain	*domain;
> +
> +	struct list_head	process_head;
> +	struct list_head	domain_head;
> +
> +	/* Number of devices that use this context */
> +	refcount_t		ref;
> +};
> +
> +/*
> + * Because we're using an IDR, PASIDs are limited to 31 bits (the sign bit is
> + * used for returning errors). In practice implementations will use at most 20
> + * bits, which is the PCI limit.
> + */
> +static DEFINE_IDR(iommu_process_idr);
> +
> +/*
> + * For the moment this is an all-purpose lock. It serializes
> + * access/modifications to contexts (process-domain links), access/modifications
> + * to the PASID IDR, and changes to process refcount as well.
> + */
> +static DEFINE_SPINLOCK(iommu_process_lock);
> +
> +/*
> + * Allocate an iommu_process structure for the given task.
> + *
> + * Ideally we shouldn't need the domain parameter, since iommu_process is
> + * system-wide, but we use it to retrieve the driver's allocation ops and a
> + * PASID range.
> + */
> +static struct iommu_process *
> +iommu_process_alloc(struct iommu_domain *domain, struct task_struct *task)
> +{
> +	int err;
> +	int pasid;
> +	struct iommu_process *process;
> +
> +	if (WARN_ON(!domain->ops->process_alloc || !domain->ops->process_free))
> +		return ERR_PTR(-ENODEV);
> +
> +	process = domain->ops->process_alloc(task);
> +	if (IS_ERR(process))
> +		return process;
> +	if (!process)
> +		return ERR_PTR(-ENOMEM);
> +
> +	process->pid		= get_task_pid(task, PIDTYPE_PID);
> +	process->release	= domain->ops->process_free;
> +	INIT_LIST_HEAD(&process->domains);
> +	kref_init(&process->kref);
> +
> +	if (!process->pid) {
> +		err = -EINVAL;
> +		goto err_free_process;
> +	}
> +
> +	idr_preload(GFP_KERNEL);
> +	spin_lock(&iommu_process_lock);
> +	pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid,
> +				 domain->max_pasid + 1, GFP_ATOMIC);
> +	process->pasid = pasid;

[Liu, Yi L] If I'm understanding this correctly, the PASID allocation is managed here in
the core IOMMU layer instead of in the vendor IOMMU driver? Is there a strong reason
for that? I think PASID management may fit better within the vendor IOMMU driver, as
it could differ from vendor to vendor.

Regards,
Yi L


^ permalink raw reply	[flat|nested] 268+ messages in thread

* RE: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
@ 2017-10-23 11:04       ` Liu, Yi L
  0 siblings, 0 replies; 268+ messages in thread
From: Liu, Yi L @ 2017-10-23 11:04 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, Raj, Ashok, robdclark

Hi Jean,

> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Friday, October 6, 2017 9:31 PM
> To: linux-arm-kernel@lists.infradead.org; linux-pci@vger.kernel.org; linux-
> acpi@vger.kernel.org; devicetree@vger.kernel.org; iommu@lists.linux-
> foundation.org
> Cc: joro@8bytes.org; robh+dt@kernel.org; mark.rutland@arm.com;
> catalin.marinas@arm.com; will.deacon@arm.com; lorenzo.pieralisi@arm.com;
> hanjun.guo@linaro.org; sudeep.holla@arm.com; rjw@rjwysocki.net;
> lenb@kernel.org; robin.murphy@arm.com; bhelgaas@google.com;
> alex.williamson@redhat.com; tn@semihalf.com; liubo95@huawei.com;
> thunder.leizhen@huawei.com; xieyisheng1@huawei.com;
> gabriele.paoloni@huawei.com; nwatters@codeaurora.org; okaya@codeaurora.org;
> rfranz@cavium.com; dwmw2@infradead.org; jacob.jun.pan@linux.intel.com; Liu, Yi
> L <yi.l.liu@intel.com>; Raj, Ashok <ashok.raj@intel.com>; robdclark@gmail.com
> Subject: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
> 
> IOMMU drivers need a way to bind Linux processes to devices. This is used for
> Shared Virtual Memory (SVM), where devices support paging. In that mode, DMA can
> directly target virtual addresses of a process.
> 
> Introduce boilerplate code for allocating process structures and binding them to
> devices. Four operations are added to IOMMU drivers:
> 
> * process_alloc, process_free: to create an iommu_process structure and
>   perform architecture-specific operations required to grab the process
>   (for instance on ARM SMMU, pin down the CPU ASID). There is a single
>   iommu_process structure per Linux process.
> 
> * process_attach: attach a process to a device. The IOMMU driver checks
>   that the device is capable of sharing an address space with this
>   process, and writes the PASID table entry to install the process page
>   directory.
> 
>   Some IOMMU drivers (e.g. ARM SMMU and virtio-iommu) will have a single
>   PASID table per domain, for convenience. Others can implement it
>   differently, but to help these drivers, process_attach and process_detach
>   take a 'first' or 'last' parameter telling whether they need to
>   install/remove the PASID entry or only send the required TLB
>   invalidations.
> 
> * process_detach: detach a process from a device. The IOMMU driver removes
>   the PASID table entry and invalidates the IOTLBs.
> 
> process_attach and process_detach operations are serialized with a spinlock. At the
> moment it is global, but if we try to optimize it, the core should at least prevent
> concurrent attach/detach on the same domain.
> (so multi-level PASID table code can allocate tables lazily without having to go
> through the io-pgtable concurrency nightmare). process_alloc can sleep, but
> process_free must not (because we'll have to call it from
> call_srcu.)
> 
> At the moment we use an IDR for allocating PASIDs and retrieving contexts.
> We also use a single spinlock. These can be refined and optimized later (a custom
> allocator will be needed for top-down PASID allocation).
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  drivers/iommu/Kconfig         |  10 ++
>  drivers/iommu/Makefile        |   1 +
>  drivers/iommu/iommu-process.c | 225 ++++++++++++++++++++++++++++++++++++++++++
>  drivers/iommu/iommu.c         |   1 +
>  include/linux/iommu.h         |  24 +++++
>  5 files changed, 261 insertions(+)
>  create mode 100644 drivers/iommu/iommu-process.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index f3a21343e636..1ea5c90e37be 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -74,6 +74,16 @@ config IOMMU_DMA
>  	select IOMMU_IOVA
>  	select NEED_SG_DMA_LENGTH
> 
> +config IOMMU_PROCESS
> +	bool "Process management API for the IOMMU"
> +	select IOMMU_API
> +	help
> +	  Enable process management for the IOMMU API. In systems that support
> +	  it, device drivers can bind processes to devices and share their page
> +	  tables using this API.
> +
> +	  If unsure, say N here.
> +
>  config FSL_PAMU
>  	bool "Freescale IOMMU support"
>  	depends on PCI
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index b910aea813a1..a2832edbfaa2 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -1,6 +1,7 @@
>  obj-$(CONFIG_IOMMU_API) += iommu.o
>  obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
> +obj-$(CONFIG_IOMMU_PROCESS) += iommu-process.o
>  obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
> diff --git a/drivers/iommu/iommu-process.c b/drivers/iommu/iommu-process.c
> new file mode 100644
> index 000000000000..a7e5a1c94305
> --- /dev/null
> +++ b/drivers/iommu/iommu-process.c
> @@ -0,0 +1,225 @@
> +/*
> + * Track processes bound to devices
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published
> + * by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
> + *
> + * Copyright (C) 2017 ARM Ltd.
> + *
> + * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> + */
> +
> +#include <linux/idr.h>
> +#include <linux/iommu.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +
> +/* Link between a domain and a process */
> +struct iommu_context {
> +	struct iommu_process	*process;
> +	struct iommu_domain	*domain;
> +
> +	struct list_head	process_head;
> +	struct list_head	domain_head;
> +
> +	/* Number of devices that use this context */
> +	refcount_t		ref;
> +};
> +
> +/*
> + * Because we're using an IDR, PASIDs are limited to 31 bits (the sign bit is
> + * used for returning errors). In practice implementations will use at most 20
> + * bits, which is the PCI limit.
> + */
> +static DEFINE_IDR(iommu_process_idr);
> +
> +/*
> + * For the moment this is an all-purpose lock. It serializes
> + * access/modifications to contexts (process-domain links), access/modifications
> + * to the PASID IDR, and changes to process refcount as well.
> + */
> +static DEFINE_SPINLOCK(iommu_process_lock);
> +
> +/*
> + * Allocate an iommu_process structure for the given task.
> + *
> + * Ideally we shouldn't need the domain parameter, since iommu_process is
> + * system-wide, but we use it to retrieve the driver's allocation ops and a
> + * PASID range.
> + */
> +static struct iommu_process *
> +iommu_process_alloc(struct iommu_domain *domain, struct task_struct *task)
> +{
> +	int err;
> +	int pasid;
> +	struct iommu_process *process;
> +
> +	if (WARN_ON(!domain->ops->process_alloc || !domain->ops->process_free))
> +		return ERR_PTR(-ENODEV);
> +
> +	process = domain->ops->process_alloc(task);
> +	if (IS_ERR(process))
> +		return process;
> +	if (!process)
> +		return ERR_PTR(-ENOMEM);
> +
> +	process->pid		= get_task_pid(task, PIDTYPE_PID);
> +	process->release	= domain->ops->process_free;
> +	INIT_LIST_HEAD(&process->domains);
> +	kref_init(&process->kref);
> +
> +	if (!process->pid) {
> +		err = -EINVAL;
> +		goto err_free_process;
> +	}
> +
> +	idr_preload(GFP_KERNEL);
> +	spin_lock(&iommu_process_lock);
> +	pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid,
> +				 domain->max_pasid + 1, GFP_ATOMIC);
> +	process->pasid = pasid;

[Liu, Yi L] If I'm understanding this correctly, the PASID allocation is managed in the
IOMMU layer instead of in the vendor IOMMU driver? Is there a strong reason for
this? I think PASID management may be better placed within the vendor IOMMU
driver, as it could differ from vendor to vendor.

Regards,
Yi L


^ permalink raw reply	[flat|nested] 268+ messages in thread


* Re: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
  2017-10-23 11:04       ` Liu, Yi L
@ 2017-10-23 12:17         ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-23 12:17 UTC (permalink / raw)
  To: Liu, Yi L, linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni

On 23/10/17 12:04, Liu, Yi L wrote:
>> +	idr_preload(GFP_KERNEL);
>> +	spin_lock(&iommu_process_lock);
>> +	pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid,
>> +				 domain->max_pasid + 1, GFP_ATOMIC);
>> +	process->pasid = pasid;
> 
> [Liu, Yi L] If I'm understanding well, here is managing the pasid allocation in iommu
> layer instead of vendor iommu driver? Is there strong reason here? I think pasid
> management may be better within vendor iommu driver as pasid management
> could differ from vendor to vendor.

But that's the thing, we're trying to abstract PASID and process
management to have it in the core, because there shouldn't be many
differences from vendor to vendor. This way we have the allocation code in
one place and vendor drivers don't have to copy paste it from other drivers.

It's just a global number within a range, so I don't think vendors will
have many different ways of designing it.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread


* Re: [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3
  2017-10-12 15:28               ` Jordan Crouse
@ 2017-10-23 13:00                   ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-23 13:00 UTC (permalink / raw)
  To: jcrouse
  Cc: gabriele.paoloni, dwmw2, Will Deacon, okaya, iommu,
	rfranz, linux-arm-kernel

Hi Jordan,

[Lots of IOMMU people have been dropped from Cc, I've tried to add them back]

On 12/10/17 16:28, Jordan Crouse wrote:
> On Thu, Oct 12, 2017 at 01:55:32PM +0100, Jean-Philippe Brucker wrote:
>> On 12/10/17 13:05, Yisheng Xie wrote:
>> [...]
>>>>>> * An iommu_process can be bound to multiple domains, and a domain can have
>>>>>>   multiple iommu_process.
>>>>> when bind a task to device, can we create a single domain for it? I am thinking
>>>>> about process management without shared PT(for some device only support PASID
>>>>> without pri ability), it seems hard to expand if a domain have multiple iommu_process?
>>>>> Do you have any idea about this?
>>>>
>>>> A device always has to be in a domain, as far as I know. Not supporting
>>>> PRI forces you to pin down all user mappings (or just the ones you use for
>>>> DMA) but you should still be able to share PT. Now if you don't support
>>>> shared PT either, but only PASID, then you'll have to use io-pgtable and a
>>>> new map/unmap API on an iommu_process. I don't understand your concern
>>>> though, how would the link between process and domains prevent this use-case?
>>>>
>>> So you mean that if an iommu_process bind to multiple devices it should create
>>> multiple io-pgtables? or just share the same io-pgtable?
>>
>> I don't know to be honest, I haven't thought much about the io-pgtable
>> case, I'm all about sharing the mm :)
>>
>> It really depends on what the user (GPU driver I assume) wants. I think
>> that if you're not sharing an mm with the device, then you're trying to
>> hide parts of the process to the device, so you'd also want the
>> flexibility of having different io-pgtables between devices. Different
>> devices accessing isolated parts of the process requires separate io-pgtables.
> 
> In our specific Snapdragon use case the GPU is the only entity that cares about
> process specific io-pgtables.  Everything else (display, video, camera) is happy
> using a global io-pgtable.  The reasoning is that the GPU is programmable from
> user space and can be easily used to copy data whereas the other use cases have
> mostly fixed functions.
> 
> Even if different devices did want to have a process specific io-pgtable I doubt
> we would share them.  Every device uses the IOMMU differently and the magic
> needed to share a io-pgtable between (for example) a GPU and a DSP would be
> prohibitively complicated.
> 
> Jordan



More context here:
https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg20368.html

So to summarize the Snapdragon case, if I understand correctly you need
two additional features:

(1) A way to create process address spaces that are not bound to an mm
but to a separate io-pgtable. And a way to map/unmap these contexts.

(2) A way to obtain the PGD in order to program it into the GPU. And also
the ASID I suppose? What about TCR and MAIR?

For (1), I can see some value in isolating process contexts with
io-pgtable without going all the way and sharing the mm. The IOVA=VA
use-case feels a bit weak. But it does provide better isolation than
dma_map/unmap: if the GPU is in charge of PASIDs, then two processes that
execute code on the GPU cannot access each others' DMA buffers. Maybe
other users will want that feature (but they really should be using bind_mm!).

In the next version I'm going to replace iommu_process_bind with something like
iommu_sva_bind_mm, which reduces the scope of the API I'm introducing and
doesn't fit your case anymore. What you need is a shortcut into the PASID
allocator, a way to allocate a private PASID with io-pgtables instead of
one backed by an mm. Something like:

iommu_sva_alloc_pasid(domain, dev) -> pasid
iommu_sva_map(pasid, iova, size, flags)
iommu_sva_unmap(pasid, iova, size)
iommu_sva_free_pasid(domain, pasid)

Then for (2) the GPU is tightly integrated into the SMMU and can switch
contexts. I might be wrong but I don't see this case becoming standard as
new implementations move to PASIDs, we shouldn't spend too much time
making it generic. But to make it fit into the PASID API, how about the
following.

We provide a backdoor to the GPU driver, allowing it to register PASID ops
into the SMMUv2 driver:

struct smmuv2_pasid_ops {
	int (*install_pasid)(struct iommu_domain, int pasid, ttbr, asid
			     and whatnot);
	void (*remove_pasid)(struct iommu_domain, int pasid);
};

On PASID-capable IOMMUs, iommu_sva_alloc_pasid would install a context
descriptor into the PASID tables (owned by the IOMMU), pointing to the
io-pgtable. As SMMUv2 doesn't support PASID, iommu_sva_alloc_pasid
wouldn't actually install a context descriptor but instead call back into
the GPU driver with install_pasid. The GPU can then do its thing, call
sva_map/unmap, and switch contexts.

The good thing is that (1) and (2) are separate, so you get the same
callbacks if you're using iommu_sva_bind_mm instead of the private pasid
thing.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread


* Re: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
  2017-10-23 12:17         ` Jean-Philippe Brucker
  (?)
@ 2017-10-25 18:05             ` Raj, Ashok
  -1 siblings, 0 replies; 268+ messages in thread
From: Raj, Ashok @ 2017-10-25 18:05 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Mark Rutland, gabriele.paoloni@huawei.com,
	linux-pci@vger.kernel.org, Will Deacon,
	okaya@codeaurora.org, linux-acpi@vger.kernel.org,
	Catalin Marinas, rfranz@cavium.com,
	lenb@kernel.org, devicetree@vger.kernel.org,
	robh+dt@kernel.org, bhelgaas@google.com,
	linux-arm-kernel@lists.infradead.org,
	dwmw2@infradead.org, rjw@rjwysocki.net,
	iommu@lists.linux-foundation.org, Sudeep Holla

Hi Jean

On Mon, Oct 23, 2017 at 01:17:07PM +0100, Jean-Philippe Brucker wrote:
> On 23/10/17 12:04, Liu, Yi L wrote:
> >> +	idr_preload(GFP_KERNEL);
> >> +	spin_lock(&iommu_process_lock);
> >> +	pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid,
> >> +				 domain->max_pasid + 1, GFP_ATOMIC);
> >> +	process->pasid = pasid;
> > 
> > [Liu, Yi L] If I'm understanding well, here is managing the pasid allocation in iommu
> > layer instead of vendor iommu driver? Is there strong reason here? I think pasid
> > management may be better within vendor iommu driver as pasid management
> > could differ from vendor to vendor.
> 
> But that's the thing, we're trying to abstract PASID and process
> management to have it in the core, because there shouldn't be many
> differences from vendor to vendor. This way we have the allocation code in
> one place and vendor drivers don't have to copy paste it from other drivers.

I think this makes sense for the native case and also in the vIOMMU
if the PASID tables and allocation are completely managed by the guest.

If the vIOMMU requires any coordination in how the PASIDs are allocated
for guest devices, there might need to be some control over how these are
allocated, ultimately managed by the VMM/physical IOMMU; for instance,
if the PASID space is sparse.

If we make PASID allocation one of the ops, the IOMMU implementation
will choose the default function, or, if it chooses a different mechanism,
it will have that flexibility.

Does this make sense?

Cheers,
Ashok

> 
> It's just a global number within a range, so I don't think vendors will
> have many different ways of designing it.
> 
> Thanks,
> Jean
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3
  2017-10-23 13:00                   ` Jean-Philippe Brucker
@ 2017-10-25 20:20                       ` Jordan Crouse
  -1 siblings, 0 replies; 268+ messages in thread
From: Jordan Crouse @ 2017-10-25 20:20 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: gabriele.paoloni@huawei.com, dwmw2@infradead.org, Will Deacon,
	okaya@codeaurora.org, iommu@lists.linux-foundation.org,
	rfranz@cavium.com, linux-arm-kernel@lists.infradead.org

On Mon, Oct 23, 2017 at 02:00:07PM +0100, Jean-Philippe Brucker wrote:
> Hi Jordan,
> 
> [Lots of IOMMU people have been dropped from Cc, I've tried to add them back]
> 
> On 12/10/17 16:28, Jordan Crouse wrote:
> > On Thu, Oct 12, 2017 at 01:55:32PM +0100, Jean-Philippe Brucker wrote:
> >> On 12/10/17 13:05, Yisheng Xie wrote:
> >> [...]
> >>>>>> * An iommu_process can be bound to multiple domains, and a domain can have
> >>>>>>   multiple iommu_process.
> >>>>> when bind a task to device, can we create a single domain for it? I am thinking
> >>>>> about process management without shared PT(for some device only support PASID
> >>>>> without pri ability), it seems hard to expand if a domain have multiple iommu_process?
> >>>>> Do you have any idea about this?
> >>>>
> >>>> A device always has to be in a domain, as far as I know. Not supporting
> >>>> PRI forces you to pin down all user mappings (or just the ones you use for
> >>>> DMA) but you should still be able to share PT. Now if you don't support
> >>>> shared PT either, but only PASID, then you'll have to use io-pgtable and a
> >>>> new map/unmap API on an iommu_process. I don't understand your concern
> >>>> though, how would the link between process and domains prevent this use-case?
> >>>>
> >>> So you mean that if an iommu_process bind to multiple devices it should create
> >>> multiple io-pgtables? or just share the same io-pgtable?
> >>
> >> I don't know to be honest, I haven't thought much about the io-pgtable
> >> case, I'm all about sharing the mm :)
> >>
> >> It really depends on what the user (GPU driver I assume) wants. I think
> >> that if you're not sharing an mm with the device, then you're trying to
> >> hide parts of the process to the device, so you'd also want the
> >> flexibility of having different io-pgtables between devices. Different
> >> devices accessing isolated parts of the process requires separate io-pgtables.
> > 
> > In our specific Snapdragon use case the GPU is the only entity that cares about
> > process specific io-pgtables.  Everything else (display, video, camera) is happy
> > using a global io-ptgable.  The reasoning is that the GPU is programmable from
> > user space and can be easily used to copy data whereas the other use cases have
> > mostly fixed functions.
> > 
> > Even if different devices did want to have a process specific io-pgtable I doubt
> > we would share them.  Every device uses the IOMMU differently and the magic
> > needed to share a io-pgtable between (for example) a GPU and a DSP would be
> > prohibitively complicated.
> > 
> > Jordan
> 
> 
> 
> More context here:
> https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg20368.html
> 
> So to summarize the Snapdragon case, if I understand correctly you need
> two additional features:
> 
> (1) A way to create process address spaces, that are not bound to an mm
> but to a separate io-pgtable. And a way to map/unmap these contexts.

Correct.

> (2) A way to obtain the PGD in order to program it into the GPU. And also
> the ASID I suppose? What about TCR and MAIR?
>
PGD and ASID.  Not the TCR and MAIR, at least not in the current iteration.

> For (1), I can see some value in isolating process contexts with
> io-pgtable without going all the way and sharing the mm. The IOVA=VA
> use-case feels a bit weak. But it does provide better isolation than
> dma_map/unmap, if the GPU is in charge of PASIDs then two processes that
> execute code on the GPU cannot access each others' DMA buffers. Maybe
> other users will want that feature (but they really should be using bind_mm!).

That is exactly the use case.  A real-world attack vector in the mobile GPU
world is a malicious app that knows you have a banking app active and
copies its surfaces, or at the very least scribbles over everything and is
very rude.

> In next version I'm going to replace iommu_process_bind by something like
> iommu_sva_bind_mm, which reduces the scope of the API I'm introducing and
> doesn't fit your case anymore. What you need is a shortcut into the PASID
> allocator, a way to allocate a private PASID with io-pgtables instead of
> one backed by an mm. Something like:
> 
> iommu_sva_alloc_pasid(domain, dev) -> pasid
> iommu_sva_map(pasid, iova, size, flags)
> iommu_sva_unmap(pasid, iova, size)
> iommu_sva_free_pasid(domain, pasid)

Yep, that matches up with my thinking.

> Then for (2) the GPU is tightly integrated into the SMMU and can switch
> contexts. I might be wrong but I don't see this case becoming standard as
> new implementations move to PASIDs, we shouldn't spend too much time
> making it generic.

Agreed. This is a rather specific use case.

> But to make it fit into the PASID API, how about the following.


> We provide a backdoor to the GPU driver, allowing it to register PASID ops
> into SMMUv2 driver:
> 
> struct smmuv2_pasid_ops {
> 	int (*install_pasid)(struct iommu_domain, int pasid, ttbr, asid
> 			     and whatnot);
> 	void (*remove_pasid)(struct iommu_domain, int pasid);
> }
> 
> On PASID-capable IOMMUs, iommu_sva_alloc_pasid would install a context
> descriptor into the PASID tables (owned by the IOMMU), pointing to the
> io-pgtable. As SMMUv2 doesn't support PASID, iommu_sva_alloc_pasid
> wouldn't actually install a context descriptor but instead call back into
> the GPU driver with install_pasid. The GPU can then do its thing, call
> sva_map/unmap, and switch contexts.
> 
> The good thing is that (1) and (2) are separate, so you get the same
> callbacks if you're using iommu_sva_bind_mm instead of the private pasid
> thing.

This sounds ideal. It seems to scratch all the right itches that we have. 

Thanks for thinking about this use case. I appreciate your time.

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
  2017-10-25 18:05             ` Raj, Ashok
  (?)
@ 2017-10-30 10:28               ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-10-30 10:28 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: Mark Rutland, gabriele.paoloni@huawei.com,
	linux-pci@vger.kernel.org, Will Deacon,
	okaya@codeaurora.org, linux-acpi@vger.kernel.org,
	Catalin Marinas, rfranz@cavium.com,
	lenb@kernel.org, devicetree@vger.kernel.org,
	robh+dt@kernel.org, bhelgaas@google.com,
	linux-arm-kernel@lists.infradead.org,
	dwmw2@infradead.org, rjw@rjwysocki.net,
	iommu@lists.linux-foundation.org, Sudeep Holla

On 25/10/17 19:05, Raj, Ashok wrote:
> Hi Jean
> 
> On Mon, Oct 23, 2017 at 01:17:07PM +0100, Jean-Philippe Brucker wrote:
>> On 23/10/17 12:04, Liu, Yi L wrote:
>>>> +	idr_preload(GFP_KERNEL);
>>>> +	spin_lock(&iommu_process_lock);
>>>> +	pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid,
>>>> +				 domain->max_pasid + 1, GFP_ATOMIC);
>>>> +	process->pasid = pasid;
>>>
>>> [Liu, Yi L] If I'm understanding well, here is managing the pasid allocation in iommu
>>> layer instead of vendor iommu driver? Is there strong reason here? I think pasid
>>> management may be better within vendor iommu driver as pasid management
>>> could differ from vendor to vendor.
>>
>> But that's the thing, we're trying to abstract PASID and process
>> management to have it in the core, because there shouldn't be many
>> differences from vendor to vendor. This way we have the allocation code in
>> one place and vendor drivers don't have to copy paste it from other drivers.
> 
> I think this makes sense for the native case and also in the vIOMMU
> if the PASID tables and allocation are completely managed by the guest.
> 
> If the vIOMMU requires any coordination in how the PASIDs are allocated
> for guest devices, there might need to be some control over how these are
> allocated, ultimately managed by the VMM/physical IOMMU; for instance,
> if the PASID space is sparse.

(I don't have your example)

> If we make PASID allocation one of the ops, the IOMMU implementation
> will choose the default function, or, if it chooses a different mechanism,
> it will have that flexibility.
> 
> Does this make sense?

If the PASID space is sparse, maybe we can add a firmware or probe
mechanism to declare reserved PASIDs, like we have for reserved IOVAs,
that feeds into the core IOMMU driver. But I agree that we can always let
vendor drivers implement their own allocator if they need one in the
future. For the moment it can stay generic.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* RE: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for Substream IDs
  2017-10-06 13:31   ` Jean-Philippe Brucker
  (?)
@ 2017-11-02 12:49     ` Shameerali Kolothum Thodi
  -1 siblings, 0 replies; 268+ messages in thread
From: Shameerali Kolothum Thodi @ 2017-11-02 12:49 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: mark.rutland, xieyisheng (A),
	Gabriele Paoloni, catalin.marinas, will.deacon, okaya, yi.l.liu,
	lorenzo.pieralisi, ashok.raj, tn, joro, rfranz, lenb,
	jacob.jun.pan, alex.williamson, robh+dt,

Hi Jean,

> -----Original Message-----
> From: linux-arm-kernel [mailto:linux-arm-kernel-bounces@lists.infradead.org]
> On Behalf Of Jean-Philippe Brucker
> Sent: Friday, October 06, 2017 2:32 PM
> To: linux-arm-kernel@lists.infradead.org; linux-pci@vger.kernel.org; linux-
> acpi@vger.kernel.org; devicetree@vger.kernel.org; iommu@lists.linux-
> foundation.org
> Cc: mark.rutland@arm.com; xieyisheng (A) <xieyisheng1@huawei.com>;
> Gabriele Paoloni <gabriele.paoloni@huawei.com>; catalin.marinas@arm.com;
> will.deacon@arm.com; okaya@codeaurora.org; yi.l.liu@intel.com;
> lorenzo.pieralisi@arm.com; ashok.raj@intel.com; tn@semihalf.com;
> joro@8bytes.org; rfranz@cavium.com; lenb@kernel.org;
> jacob.jun.pan@linux.intel.com; alex.williamson@redhat.com;
> robh+dt@kernel.org; Leizhen (ThunderTown) <thunder.leizhen@huawei.com>;
> bhelgaas@google.com; dwmw2@infradead.org; liubo (CU)
> <liubo95@huawei.com>; rjw@rjwysocki.net; robdclark@gmail.com;
> hanjun.guo@linaro.org; sudeep.holla@arm.com; robin.murphy@arm.com;
> nwatters@codeaurora.org
> Subject: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for
> Substream IDs
> 
> At the moment, the SMMUv3 driver offers only one stage-1 or stage-2
> address space to each device. SMMUv3 allows to associate multiple address
> spaces per device. In addition to the Stream ID (SID), that identifies a
> device, we can now have Substream IDs (SSID) identifying an address space.
> In PCIe lingo, SID is called Requester ID (RID) and SSID is called Process
> Address-Space ID (PASID).

We had a go with this series on the HiSilicon D05 platform, which doesn't
have support for SSIDs/ATS/PRI, to make sure it generally works.

But we observed the below crash on boot:

[   16.009084] WARNING: CPU: 59 PID: 391 at mm/page_alloc.c:3883 __alloc_pages_nodemask+0x19c/0xc48
[   16.026797] Modules linked in:
[   16.032944] CPU: 59 PID: 391 Comm: kworker/59:1 Not tainted 4.14.0-rc1-159539-ge42aca3 #236
[...]
[   16.068206] Workqueue: events deferred_probe_work_func
[   16.078557] task: ffff8017d38a0000 task.stack: ffff00000b198000
[   16.090486] PC is at __alloc_pages_nodemask+0x19c/0xc48
[   16.101013] LR is at __alloc_pages_nodemask+0xe0/0xc48
[   16.469220] [<ffff000008186b94>] __alloc_pages_nodemask+0x19c/0xc48
[   16.481854] [<ffff0000081d65b0>] alloc_pages_current+0x80/0xcc
[   16.493607] [<ffff000008182be8>] __get_free_pages+0xc/0x38
[   16.504661] [<ffff0000083c4d58>] swiotlb_alloc_coherent+0x64/0x190
[   16.517117] [<ffff00000809824c>] __dma_alloc+0x110/0x204
[   16.527820] [<ffff00000858e850>] dmam_alloc_coherent+0x88/0xf0
[   16.539575] [<ffff000008568884>] arm_smmu_domain_finalise_s1+0x60/0x248
[   16.552909] [<ffff00000856c104>] arm_smmu_attach_dev+0x264/0x300
[   16.565013] [<ffff00000855d40c>] __iommu_attach_device+0x48/0x5c
[   16.577117] [<ffff00000855e698>] iommu_group_add_device+0x144/0x3a4
[   16.589746] [<ffff00000855ed18>] iommu_group_get_for_dev+0x70/0xf8
[   16.602201] [<ffff00000856a314>] arm_smmu_add_device+0x1a4/0x418
[   16.614308] [<ffff00000849dfcc>] iort_iommu_configure+0xf0/0x16c
[   16.626416] [<ffff000008468c50>] acpi_dma_configure+0x30/0x70
[   16.637994] [<ffff00000858f00c>] dma_configure+0xa8/0xd4
[   16.648695] [<ffff00000857706c>] driver_probe_device+0x1a4/0x2dc
[   16.673081] [<ffff0000085752c8>] bus_for_each_drv+0x54/0x94
[   16.684307] [<ffff000008576db0>] __device_attach+0xc4/0x12c
[   16.695533] [<ffff000008577350>] device_initial_probe+0x10/0x18
[   16.707462] [<ffff0000085762b4>] bus_probe_device+0x90/0x98

After a bit of debug it looks like on platforms where ssid is not supported,
s1_cfg.num_contexts is set to zero and it eventually results in this crash 
in,
arm_smmu_domain_finalise_s1() -->arm_smmu_alloc_cd_tables()-->
arm_smmu_alloc_cd_leaf_table() as num_leaf_entries is zero.

With the below fix, it works on D05 now,

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 8ad90e2..51f5821 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2433,7 +2433,10 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
                        domain->min_pasid = 1;
                        domain->max_pasid = master->num_ssids - 1;
                        smmu_domain->s1_cfg.num_contexts = master->num_ssids;
+               } else {
+                       smmu_domain->s1_cfg.num_contexts = 1;
                }
+
                smmu_domain->s1_cfg.can_stall = master->ste.can_stall;
                break;
        case ARM_SMMU_DOMAIN_NESTED:


I am not sure this is right place do this. Please take a look.

Thanks,
Shameer


^ permalink raw reply related	[flat|nested] 268+ messages in thread

* RE: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for Substream IDs
@ 2017-11-02 12:49     ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 268+ messages in thread
From: Shameerali Kolothum Thodi @ 2017-11-02 12:49 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: mark.rutland, xieyisheng (A),
	Gabriele Paoloni, catalin.marinas, will.deacon, Linuxarm, okaya,
	lorenzo.pieralisi, yi.l.liu, ashok.raj, tn, joro, rfranz, lenb,
	jacob.jun.pan, alex.williamson, robh+dt, Leizhen (ThunderTown),
	bhelgaas, robin.murphy, liubo (CU),
	rjw, robdclark, hanjun.guo, sudeep.holla, dwmw2, nwatters

Hi Jean,

> -----Original Message-----
> From: linux-arm-kernel [mailto:linux-arm-kernel-bounces@lists.infradead.org]
> On Behalf Of Jean-Philippe Brucker
> Sent: Friday, October 06, 2017 2:32 PM
> To: linux-arm-kernel@lists.infradead.org; linux-pci@vger.kernel.org; linux-
> acpi@vger.kernel.org; devicetree@vger.kernel.org; iommu@lists.linux-
> foundation.org
> Cc: mark.rutland@arm.com; xieyisheng (A) <xieyisheng1@huawei.com>;
> Gabriele Paoloni <gabriele.paoloni@huawei.com>; catalin.marinas@arm.com;
> will.deacon@arm.com; okaya@codeaurora.org; yi.l.liu@intel.com;
> lorenzo.pieralisi@arm.com; ashok.raj@intel.com; tn@semihalf.com;
> joro@8bytes.org; rfranz@cavium.com; lenb@kernel.org;
> jacob.jun.pan@linux.intel.com; alex.williamson@redhat.com;
> robh+dt@kernel.org; Leizhen (ThunderTown) <thunder.leizhen@huawei.com>;
> bhelgaas@google.com; dwmw2@infradead.org; liubo (CU)
> <liubo95@huawei.com>; rjw@rjwysocki.net; robdclark@gmail.com;
> hanjun.guo@linaro.org; sudeep.holla@arm.com; robin.murphy@arm.com;
> nwatters@codeaurora.org
> Subject: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for
> Substream IDs
> 
> At the moment, the SMMUv3 driver offers only one stage-1 or stage-2
> address space to each device. SMMUv3 allows multiple address spaces to be
> associated with a device. In addition to the Stream ID (SID) that
> identifies a device, we can now have Substream IDs (SSIDs) identifying an
> address space. In PCIe lingo, the SID is called Requester ID (RID) and the
> SSID is called Process Address-Space ID (PASID).

We had a go with this series on the HiSilicon D05 platform, which doesn't
support SSIDs/ATS/PRI, to make sure the series generally works.

But we observed the crash below on boot:

[   16.009084] WARNING: CPU: 59 PID: 391 at mm/page_alloc.c:3883 __alloc_pages_nodemask+0x19c/0xc48
[   16.026797] Modules linked in:
[   16.032944] CPU: 59 PID: 391 Comm: kworker/59:1 Not tainted 4.14.0-rc1-159539-ge42aca3 #236
[...]
[   16.068206] Workqueue: events deferred_probe_work_func
[   16.078557] task: ffff8017d38a0000 task.stack: ffff00000b198000
[   16.090486] PC is at __alloc_pages_nodemask+0x19c/0xc48
[   16.101013] LR is at __alloc_pages_nodemask+0xe0/0xc48
[   16.469220] [<ffff000008186b94>] __alloc_pages_nodemask+0x19c/0xc48
[   16.481854] [<ffff0000081d65b0>] alloc_pages_current+0x80/0xcc
[   16.493607] [<ffff000008182be8>] __get_free_pages+0xc/0x38
[   16.504661] [<ffff0000083c4d58>] swiotlb_alloc_coherent+0x64/0x190
[   16.517117] [<ffff00000809824c>] __dma_alloc+0x110/0x204
[   16.527820] [<ffff00000858e850>] dmam_alloc_coherent+0x88/0xf0
[   16.539575] [<ffff000008568884>] arm_smmu_domain_finalise_s1+0x60/0x248
[   16.552909] [<ffff00000856c104>] arm_smmu_attach_dev+0x264/0x300
[   16.565013] [<ffff00000855d40c>] __iommu_attach_device+0x48/0x5c
[   16.577117] [<ffff00000855e698>] iommu_group_add_device+0x144/0x3a4
[   16.589746] [<ffff00000855ed18>] iommu_group_get_for_dev+0x70/0xf8
[   16.602201] [<ffff00000856a314>] arm_smmu_add_device+0x1a4/0x418
[   16.614308] [<ffff00000849dfcc>] iort_iommu_configure+0xf0/0x16c
[   16.626416] [<ffff000008468c50>] acpi_dma_configure+0x30/0x70
[   16.637994] [<ffff00000858f00c>] dma_configure+0xa8/0xd4
[   16.648695] [<ffff00000857706c>] driver_probe_device+0x1a4/0x2dc
[   16.673081] [<ffff0000085752c8>] bus_for_each_drv+0x54/0x94
[   16.684307] [<ffff000008576db0>] __device_attach+0xc4/0x12c
[   16.695533] [<ffff000008577350>] device_initial_probe+0x10/0x18
[   16.707462] [<ffff0000085762b4>] bus_probe_device+0x90/0x98

After a bit of debugging, it looks like on platforms where SSID is not
supported, s1_cfg.num_contexts is set to zero, which eventually results in
this crash in arm_smmu_domain_finalise_s1() -> arm_smmu_alloc_cd_tables() ->
arm_smmu_alloc_cd_leaf_table(), since num_leaf_entries is zero.

With the fix below, it works on D05 now:

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 8ad90e2..51f5821 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2433,7 +2433,10 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
                        domain->min_pasid = 1;
                        domain->max_pasid = master->num_ssids - 1;
                        smmu_domain->s1_cfg.num_contexts = master->num_ssids;
+               } else {
+                       smmu_domain->s1_cfg.num_contexts = 1;
                }
+
                smmu_domain->s1_cfg.can_stall = master->ste.can_stall;
                break;
        case ARM_SMMU_DOMAIN_NESTED:


I am not sure this is the right place to do this. Please take a look.

Thanks,
Shameer


* Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for Substream IDs
  2017-11-02 12:49     ` Shameerali Kolothum Thodi
  (?)
@ 2017-11-02 15:51       ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-11-02 15:51 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu,
	Mark Rutland, xieyisheng (A),
	Gabriele Paoloni, Catalin Marinas, Will Deacon, okaya, yi.l.liu,
	Lorenzo Pieralisi, ashok.raj, tn, joro

Hi Shameer,

On Thu, Nov 02, 2017 at 12:49:32PM +0000, Shameerali Kolothum Thodi wrote:
> We had a go with this series on the HiSilicon D05 platform, which doesn't
> support SSIDs/ATS/PRI, to make sure the series generally works.
> 
> But we observed the crash below on boot:
> 
> [   16.009084] WARNING: CPU: 59 PID: 391 at mm/page_alloc.c:3883 __alloc_pages_nodemask+0x19c/0xc48
> [   16.026797] Modules linked in:
> [   16.032944] CPU: 59 PID: 391 Comm: kworker/59:1 Not tainted 4.14.0-rc1-159539-ge42aca3 #236
> [...]
> [   16.068206] Workqueue: events deferred_probe_work_func
> [   16.078557] task: ffff8017d38a0000 task.stack: ffff00000b198000
> [   16.090486] PC is at __alloc_pages_nodemask+0x19c/0xc48
> [   16.101013] LR is at __alloc_pages_nodemask+0xe0/0xc48
> [   16.469220] [<ffff000008186b94>] __alloc_pages_nodemask+0x19c/0xc48
> [   16.481854] [<ffff0000081d65b0>] alloc_pages_current+0x80/0xcc
> [   16.493607] [<ffff000008182be8>] __get_free_pages+0xc/0x38
> [   16.504661] [<ffff0000083c4d58>] swiotlb_alloc_coherent+0x64/0x190
> [   16.517117] [<ffff00000809824c>] __dma_alloc+0x110/0x204
> [   16.527820] [<ffff00000858e850>] dmam_alloc_coherent+0x88/0xf0
> [   16.539575] [<ffff000008568884>] arm_smmu_domain_finalise_s1+0x60/0x248
> [   16.552909] [<ffff00000856c104>] arm_smmu_attach_dev+0x264/0x300
> [   16.565013] [<ffff00000855d40c>] __iommu_attach_device+0x48/0x5c
> [   16.577117] [<ffff00000855e698>] iommu_group_add_device+0x144/0x3a4
> [   16.589746] [<ffff00000855ed18>] iommu_group_get_for_dev+0x70/0xf8
> [   16.602201] [<ffff00000856a314>] arm_smmu_add_device+0x1a4/0x418
> [   16.614308] [<ffff00000849dfcc>] iort_iommu_configure+0xf0/0x16c
> [   16.626416] [<ffff000008468c50>] acpi_dma_configure+0x30/0x70
> [   16.637994] [<ffff00000858f00c>] dma_configure+0xa8/0xd4
> [   16.648695] [<ffff00000857706c>] driver_probe_device+0x1a4/0x2dc
> [   16.673081] [<ffff0000085752c8>] bus_for_each_drv+0x54/0x94
> [   16.684307] [<ffff000008576db0>] __device_attach+0xc4/0x12c
> [   16.695533] [<ffff000008577350>] device_initial_probe+0x10/0x18
> [   16.707462] [<ffff0000085762b4>] bus_probe_device+0x90/0x98
> 
> After a bit of debugging, it looks like on platforms where SSID is not
> supported, s1_cfg.num_contexts is set to zero, which eventually results in
> this crash in arm_smmu_domain_finalise_s1() -> arm_smmu_alloc_cd_tables() ->
> arm_smmu_alloc_cd_leaf_table(), since num_leaf_entries is zero.
> 
> With the fix below, it works on D05 now:
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 8ad90e2..51f5821 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2433,7 +2433,10 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
>                         domain->min_pasid = 1;
>                         domain->max_pasid = master->num_ssids - 1;
>                         smmu_domain->s1_cfg.num_contexts = master->num_ssids;
> +               } else {
> +                       smmu_domain->s1_cfg.num_contexts = 1;
>                 }
> +
>                 smmu_domain->s1_cfg.can_stall = master->ste.can_stall;
>                 break;
>         case ARM_SMMU_DOMAIN_NESTED:
> 
> 
> I am not sure this is the right place to do this. Please take a look.

Thanks for testing the series and reporting the bug. I added the
following patch to the svm/current branch; does it work for you?

Thanks,
Jean

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 42c8378624ed..edda466adc81 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -3169,9 +3169,7 @@ static int arm_smmu_add_device(struct device *dev)
                }
        }

-       if (smmu->ssid_bits)
-               master->num_ssids = 1 << min(smmu->ssid_bits,
-                                            fwspec->num_pasid_bits);
+       master->num_ssids = 1 << min(smmu->ssid_bits, fwspec->num_pasid_bits);

        if (fwspec->can_stall && smmu->features & ARM_SMMU_FEAT_STALLS) {
                master->can_fault = true;


* Re: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
  2017-10-20 23:32         ` Sinan Kaya
  (?)
@ 2017-11-02 16:20           ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-11-02 16:20 UTC (permalink / raw)
  To: Sinan Kaya, linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni

Hi Sinan,

Sorry for the delay, thanks for reviewing this!

On 21/10/17 00:32, Sinan Kaya wrote:
> A few nits below.
> 
>> +/*
>> + * Allocate an iommu_process structure for the given task.
>> + *
>> + * Ideally we shouldn't need the domain parameter, since iommu_process is
>> + * system-wide, but we use it to retrieve the driver's allocation ops and a
>> + * PASID range.
>> + */
>> +static struct iommu_process *
>> +iommu_process_alloc(struct iommu_domain *domain, struct task_struct *task)
>> +{
>> +	int err;
>> +	int pasid;
>> +	struct iommu_process *process;
>> +
>> +	if (WARN_ON(!domain->ops->process_alloc || !domain->ops->process_free))
>> +		return ERR_PTR(-ENODEV);
>> +
>> +	process = domain->ops->process_alloc(task);
>> +	if (IS_ERR(process))
>> +		return process;
>> +	if (!process)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	process->pid		= get_task_pid(task, PIDTYPE_PID);
>> +	process->release	= domain->ops->process_free;
>> +	INIT_LIST_HEAD(&process->domains);
>> +	kref_init(&process->kref);
>> +
> nit, I think you should place this check right after the pid assignment.

Sure

>> +	if (!process->pid) {
>> +		err = -EINVAL;
>> +		goto err_free_process;
>> +	}
>> +
>> +	idr_preload(GFP_KERNEL);
>> +	spin_lock(&iommu_process_lock);
>> +	pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid,
>> +				 domain->max_pasid + 1, GFP_ATOMIC);
>> +	process->pasid = pasid;
>> +	spin_unlock(&iommu_process_lock);
>> +	idr_preload_end();
>> +
> 
> nit, You can maybe return here if pasid is not negative.

Ok

>> +	if (pasid < 0) {
>> +		err = pasid;
>> +		goto err_put_pid;
>> +	}
>> +
>> +	return process;
>> +
>> +err_put_pid:
>> +	put_pid(process->pid);
>> +
>> +err_free_process:
>> +	domain->ops->process_free(process);
>> +
>> +	return ERR_PTR(err);
>> +}
>> +
>> +static void iommu_process_release(struct kref *kref)
>> +{
>> +	struct iommu_process *process;
>> +	void (*release)(struct iommu_process *);
>> +
>> +	assert_spin_locked(&iommu_process_lock);
>> +
>> +	process = container_of(kref, struct iommu_process, kref);
> 
> if we are concerned about things going wrong (assert above), we should
> also add some pointer check here (WARN) for process and release pointers as well.

We can probably get rid of this assert, as any external users releasing
the process will have to go through iommu_process_put(), which takes the
lock. process_alloc() ensures that release isn't NULL, and process should
always be valid here since we're being called from kref_put(), but I should
check the value in process_put().

>> +	release = process->release;
>> +
>> +	WARN_ON(!list_empty(&process->domains));
>> +
>> +	idr_remove(&iommu_process_idr, process->pasid);
>> +	put_pid(process->pid);
>> +	release(process);
>> +}
>> +
>> +/*
>> + * Returns non-zero if a reference to the process was successfully taken.
>> + * Returns zero if the process is being freed and should not be used.
>> + */
>> +static int iommu_process_get_locked(struct iommu_process *process)
>> +{
>> +	assert_spin_locked(&iommu_process_lock);
>> +
>> +	if (process)
>> +		return kref_get_unless_zero(&process->kref);
>> +
>> +	return 0;
>> +}
>> +
>> +static void iommu_process_put_locked(struct iommu_process *process)
>> +{
>> +	assert_spin_locked(&iommu_process_lock);
>> +
>> +	kref_put(&process->kref, iommu_process_release);
>> +}
>> +
>> +static int iommu_process_attach(struct iommu_domain *domain, struct device *dev,
>> +				struct iommu_process *process)
>> +{
>> +	int err;
>> +	int pasid = process->pasid;
>> +	struct iommu_context *context;
>> +
>> +	if (WARN_ON(!domain->ops->process_attach || !domain->ops->process_detach))
>> +		return -ENODEV;
>> +
>> +	if (pasid > domain->max_pasid || pasid < domain->min_pasid)
>> +		return -ENOSPC;
>> +
>> +	context = kzalloc(sizeof(*context), GFP_KERNEL);
>> +	if (!context)
>> +		return -ENOMEM;
>> +
> 
> devm_kzalloc maybe?

I don't think we can ever leak contexts. Before the device is released, it
has to detach() from the domain, which will unbind it from any process (by
calling unbind_dev_all()).

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
@ 2017-11-02 16:20           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-11-02 16:20 UTC (permalink / raw)
  To: Sinan Kaya, linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, rfranz,
	dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

Hi Sinan,

Sorry for the delay, thanks for reviewing this!

On 21/10/17 00:32, Sinan Kaya wrote:
> few nits below.
> 
>> +/*
>> + * Allocate a iommu_process structure for the given task.
>> + *
>> + * Ideally we shouldn't need the domain parameter, since iommu_process is
>> + * system-wide, but we use it to retrieve the driver's allocation ops and a
>> + * PASID range.
>> + */
>> +static struct iommu_process *
>> +iommu_process_alloc(struct iommu_domain *domain, struct task_struct *task)
>> +{
>> +	int err;
>> +	int pasid;
>> +	struct iommu_process *process;
>> +
>> +	if (WARN_ON(!domain->ops->process_alloc || !domain->ops->process_free))
>> +		return ERR_PTR(-ENODEV);
>> +
>> +	process = domain->ops->process_alloc(task);
>> +	if (IS_ERR(process))
>> +		return process;
>> +	if (!process)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	process->pid		= get_task_pid(task, PIDTYPE_PID);
>> +	process->release	= domain->ops->process_free;
>> +	INIT_LIST_HEAD(&process->domains);
>> +	kref_init(&process->kref);
>> +
> nit, I think you should place this check right after the pid assignment.

Sure

>> +	if (!process->pid) {
>> +		err = -EINVAL;
>> +		goto err_free_process;
>> +	}
>> +
>> +	idr_preload(GFP_KERNEL);
>> +	spin_lock(&iommu_process_lock);
>> +	pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid,
>> +				 domain->max_pasid + 1, GFP_ATOMIC);
>> +	process->pasid = pasid;
>> +	spin_unlock(&iommu_process_lock);
>> +	idr_preload_end();
>> +
> 
> nit, You can maybe return here if pasid is not negative.

Ok
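As background for the allocation above: idr_alloc_cyclic() starts each search after the most recently allocated ID and wraps around at the top of the range, so a freed PASID is not handed out again until the cursor comes back around. A toy userspace model of that behaviour (the fixed-size table and function names are invented for illustration; the kernel IDR is a radix tree, not an array):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IDS 8
void *slots[MAX_IDS];	/* id -> pointer, NULL means free */
int next_hint;		/* offset from min where the next search starts */

/* Allocate the next free ID in [min, max], cycling past the last one. */
int cyclic_alloc(void *ptr, int min, int max)
{
	int span = max - min + 1;

	for (int i = 0; i < span; i++) {
		int id = min + (next_hint + i) % span;

		if (!slots[id]) {
			slots[id] = ptr;
			next_hint = (id - min + 1) % span;
			return id;
		}
	}
	return -1;	/* idr_alloc_cyclic() returns -ENOSPC here */
}

void cyclic_free(int id)
{
	slots[id] = NULL;
}
```

With min_pasid = 1 and max_pasid = 3, freeing PASID 1 and allocating again yields 3 before 1 — the property that makes cyclic allocation attractive for PASIDs, since a stale PASID cached in a device is less likely to alias a freshly allocated one.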

>> +	if (pasid < 0) {
>> +		err = pasid;
>> +		goto err_put_pid;
>> +	}
>> +
>> +	return process;
>> +
>> +err_put_pid:
>> +	put_pid(process->pid);
>> +
>> +err_free_process:
>> +	domain->ops->process_free(process);
>> +
>> +	return ERR_PTR(err);
>> +}
>> +
>> +static void iommu_process_release(struct kref *kref)
>> +{
>> +	struct iommu_process *process;
>> +	void (*release)(struct iommu_process *);
>> +
>> +	assert_spin_locked(&iommu_process_lock);
>> +
>> +	process = container_of(kref, struct iommu_process, kref);
> 
> if we are concerned about things going wrong (assert above), we should
> also add some pointer check here (WARN) for process and release pointers as well.

We can probably get rid of this assert, as any external users releasing
the process will have to go through iommu_process_put() which takes the
lock. process_alloc() ensures that release isn't NULL, and process should
always be valid here since we're being called from kref_put, but I should
check the value in process_put.
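To make the lifetime rules in the paragraph above concrete, here is a small userspace model of the pattern: get fails once the count has already hit zero (as kref_get_unless_zero does), release runs exactly once when the last reference drops, and put tolerates NULL as suggested for iommu_process_put(). The names and fields are invented for the sketch — this is not the kernel kref API, and it ignores the locking:

```c
#include <assert.h>
#include <stddef.h>

struct mock_process {
	int refcount;	/* stands in for struct kref */
	int released;	/* set once the release callback has run */
};

void mock_release(struct mock_process *p)
{
	p->released = 1;
}

/* Like kref_get_unless_zero(): non-zero if a reference was taken. */
int mock_get(struct mock_process *p)
{
	if (p && p->refcount > 0) {
		p->refcount++;
		return 1;
	}
	return 0;	/* defunct, caller must allocate a new one */
}

/* NULL-tolerant put, as proposed above for iommu_process_put(). */
void mock_put(struct mock_process *p)
{
	if (!p)
		return;
	if (--p->refcount == 0)
		mock_release(p);
}
```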

>> +	release = process->release;
>> +
>> +	WARN_ON(!list_empty(&process->domains));
>> +
>> +	idr_remove(&iommu_process_idr, process->pasid);
>> +	put_pid(process->pid);
>> +	release(process);
>> +}
>> +
>> +/*
>> + * Returns non-zero if a reference to the process was successfully taken.
>> + * Returns zero if the process is being freed and should not be used.
>> + */
>> +static int iommu_process_get_locked(struct iommu_process *process)
>> +{
>> +	assert_spin_locked(&iommu_process_lock);
>> +
>> +	if (process)
>> +		return kref_get_unless_zero(&process->kref);
>> +
>> +	return 0;
>> +}
>> +
>> +static void iommu_process_put_locked(struct iommu_process *process)
>> +{
>> +	assert_spin_locked(&iommu_process_lock);
>> +
>> +	kref_put(&process->kref, iommu_process_release);
>> +}
>> +
>> +static int iommu_process_attach(struct iommu_domain *domain, struct device *dev,
>> +				struct iommu_process *process)
>> +{
>> +	int err;
>> +	int pasid = process->pasid;
>> +	struct iommu_context *context;
>> +
>> +	if (WARN_ON(!domain->ops->process_attach || !domain->ops->process_detach))
>> +		return -ENODEV;
>> +
>> +	if (pasid > domain->max_pasid || pasid < domain->min_pasid)
>> +		return -ENOSPC;
>> +
>> +	context = kzalloc(sizeof(*context), GFP_KERNEL);
>> +	if (!context)
>> +		return -ENOMEM;
>> +
> 
> devm_kzalloc maybe?

I don't think we can ever leak contexts. Before the device is released, it
has to detach() from the domain, which will unbind from any process (call
unbind_dev_all()).

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 05/36] iommu/process: Bind and unbind process to and from devices
  2017-10-21 15:47         ` Sinan Kaya
  (?)
@ 2017-11-02 16:21             ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-11-02 16:21 UTC (permalink / raw)
  To: Sinan Kaya, linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: Mark Rutland, gabriele.paoloni, Catalin Marinas, Will Deacon,
	rfranz, lenb, robh+dt, bhelgaas, dwmw2, rjw, Sudeep Holla

On 21/10/17 16:47, Sinan Kaya wrote:
> Just some improvement suggestions.
> 
> On 10/6/2017 9:31 AM, Jean-Philippe Brucker wrote:
>> +	spin_lock(&iommu_process_lock);
>> +	idr_for_each_entry(&iommu_process_idr, process, i) {
>> +		if (process->pid != pid)
>> +			continue;
> if you see this construct a lot, this could be a for_each_iommu_process.
> 
>> +
>> +		if (!iommu_process_get_locked(process)) {
>> +			/* Process is defunct, create a new one */
>> +			process = NULL;
>> +			break;
>> +		}
>> +
>> +		/* Great, is it also bound to this domain? */
>> +		list_for_each_entry(cur_context, &process->domains,
>> +				    process_head) {
>> +			if (cur_context->domain != domain)
>> +				continue;
> if you see this construct a lot, this could be a for_each_iommu_process_domain.
> 
>> +
>> +			context = cur_context;
>> +			*pasid = process->pasid;
>> +
>> +			/* Splendid, tell the driver and increase the ref */
>> +			err = iommu_process_attach_locked(context, dev);
>> +			if (err)
>> +				iommu_process_put_locked(process);
>> +
>> +			break;
>> +		}
>> +		break;
>> +	}
>> +	spin_unlock(&iommu_process_lock);
>> +	put_pid(pid);
>> +
>> +	if (context)
>> +		return err;
> 
> I think you should make the section above an independent function and return here when the
> context is found.

Hopefully this code will only be needed for bind(), but moving it to a
separate function should look better.
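A sketch of what that split might look like: the search becomes a helper that returns the existing context for a (pid, domain) pair or NULL, and bind() falls through to the allocation path on NULL. The structures below are simplified stand-ins invented for the example, not the RFC's actual types:

```c
#include <assert.h>
#include <stddef.h>

struct mock_context {
	int domain_id;
	int pasid;
};

struct mock_proc {
	int pid;
	int nr_contexts;
	struct mock_context *contexts;
};

/*
 * Return the context binding @pid to @domain_id, or NULL when the
 * process is unknown or not yet attached to that domain -- the two
 * cases where bind() has to create something new.
 */
struct mock_context *
find_context(struct mock_proc *procs, int nr, int pid, int domain_id)
{
	for (int i = 0; i < nr; i++) {
		if (procs[i].pid != pid)
			continue;
		for (int j = 0; j < procs[i].nr_contexts; j++)
			if (procs[i].contexts[j].domain_id == domain_id)
				return &procs[i].contexts[j];
		break;	/* pid found but not bound to this domain */
	}
	return NULL;
}
```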

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* RE: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for Substream IDs
  2017-11-02 15:51       ` Jean-Philippe Brucker
  (?)
@ 2017-11-02 17:02         ` Shameerali Kolothum Thodi
  -1 siblings, 0 replies; 268+ messages in thread
From: Shameerali Kolothum Thodi @ 2017-11-02 17:02 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu,
	Mark Rutland, xieyisheng (A),
	Gabriele Paoloni, Catalin Marinas, Will Deacon, okaya, yi.l.liu,
	Lorenzo Pieralisi, ashok.raj, tn, joro, rfranz, lenb,
	jacob.jun.pan, alex.williamson, robh+dt, Leizhen (ThunderTown),
	bhelgaas, dwmw2, liubo (CU),
	rjw, robdclark, hanjun.guo, Sudeep Holla, Robin Murphy, nwatters,
	Linuxarm



> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:Jean-Philippe.Brucker@arm.com]
> Sent: Thursday, November 02, 2017 3:52 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: linux-arm-kernel@lists.infradead.org; linux-pci@vger.kernel.org; linux-
> acpi@vger.kernel.org; devicetree@vger.kernel.org; iommu@lists.linux-
> foundation.org; Mark Rutland <Mark.Rutland@arm.com>; xieyisheng (A)
> <xieyisheng1@huawei.com>; Gabriele Paoloni
> <gabriele.paoloni@huawei.com>; Catalin Marinas
> <Catalin.Marinas@arm.com>; Will Deacon <Will.Deacon@arm.com>;
> okaya@codeaurora.org; yi.l.liu@intel.com; Lorenzo Pieralisi
> <Lorenzo.Pieralisi@arm.com>; ashok.raj@intel.com; tn@semihalf.com;
> joro@8bytes.org; rfranz@cavium.com; lenb@kernel.org;
> jacob.jun.pan@linux.intel.com; alex.williamson@redhat.com;
> robh+dt@kernel.org; Leizhen (ThunderTown) <thunder.leizhen@huawei.com>;
> bhelgaas@google.com; dwmw2@infradead.org; liubo (CU)
> <liubo95@huawei.com>; rjw@rjwysocki.net; robdclark@gmail.com;
> hanjun.guo@linaro.org; Sudeep Holla <Sudeep.Holla@arm.com>; Robin
> Murphy <Robin.Murphy@arm.com>; nwatters@codeaurora.org; Linuxarm
> <linuxarm@huawei.com>
> Subject: Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for
> Substream IDs
> 
> Hi Shameer,
> 
> On Thu, Nov 02, 2017 at 12:49:32PM +0000, Shameerali Kolothum Thodi wrote:
> > We had a go with this series on the HiSilicon D05 platform, which doesn't have
> > support for ssids/ATS/PRI, to make sure it generally works.
> >
> > But observed the below crash on boot,
> >
> > [   16.009084] WARNING: CPU: 59 PID: 391 at mm/page_alloc.c:3883
> __alloc_pages_nodemask+0x19c/0xc48
> > [   16.026797] Modules linked in:
> > [   16.032944] CPU: 59 PID: 391 Comm: kworker/59:1 Not tainted 4.14.0-rc1-
> 159539-ge42aca3 #236
> > [...]
> > [   16.068206] Workqueue: events deferred_probe_work_func
> > [   16.078557] task: ffff8017d38a0000 task.stack: ffff00000b198000
> > [   16.090486] PC is at __alloc_pages_nodemask+0x19c/0xc48
> > [   16.101013] LR is at __alloc_pages_nodemask+0xe0/0xc48
> > [   16.469220] [<ffff000008186b94>] __alloc_pages_nodemask+0x19c/0xc48
> > [   16.481854] [<ffff0000081d65b0>] alloc_pages_current+0x80/0xcc
> > [   16.493607] [<ffff000008182be8>] __get_free_pages+0xc/0x38
> > [   16.504661] [<ffff0000083c4d58>] swiotlb_alloc_coherent+0x64/0x190
> > [   16.517117] [<ffff00000809824c>] __dma_alloc+0x110/0x204
> > [   16.527820] [<ffff00000858e850>] dmam_alloc_coherent+0x88/0xf0
> > [   16.539575] [<ffff000008568884>]
> arm_smmu_domain_finalise_s1+0x60/0x248
> > [   16.552909] [<ffff00000856c104>] arm_smmu_attach_dev+0x264/0x300
> > [   16.565013] [<ffff00000855d40c>] __iommu_attach_device+0x48/0x5c
> > [   16.577117] [<ffff00000855e698>] iommu_group_add_device+0x144/0x3a4
> > [   16.589746] [<ffff00000855ed18>] iommu_group_get_for_dev+0x70/0xf8
> > [   16.602201] [<ffff00000856a314>] arm_smmu_add_device+0x1a4/0x418
> > [   16.614308] [<ffff00000849dfcc>] iort_iommu_configure+0xf0/0x16c
> > [   16.626416] [<ffff000008468c50>] acpi_dma_configure+0x30/0x70
> > [   16.637994] [<ffff00000858f00c>] dma_configure+0xa8/0xd4
> > [   16.648695] [<ffff00000857706c>] driver_probe_device+0x1a4/0x2dc
> > [   16.673081] [<ffff0000085752c8>] bus_for_each_drv+0x54/0x94
> > [   16.684307] [<ffff000008576db0>] __device_attach+0xc4/0x12c
> > [   16.695533] [<ffff000008577350>] device_initial_probe+0x10/0x18
> > [   16.707462] [<ffff0000085762b4>] bus_probe_device+0x90/0x98
> >
> > After a bit of debug it looks like on platforms where ssid is not supported,
> > s1_cfg.num_contexts is set to zero and it eventually results in this crash
> > in,
> > arm_smmu_domain_finalise_s1() -->arm_smmu_alloc_cd_tables()-->
> > arm_smmu_alloc_cd_leaf_table() as num_leaf_entries is zero.
> >
> > With the below fix, it works on D05 now,
> >
> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > index 8ad90e2..51f5821 100644
> > --- a/drivers/iommu/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm-smmu-v3.c
> > @@ -2433,7 +2433,10 @@ static int arm_smmu_domain_finalise(struct
> iommu_domain *domain,
> >                         domain->min_pasid = 1;
> >                         domain->max_pasid = master->num_ssids - 1;
> >                         smmu_domain->s1_cfg.num_contexts = master->num_ssids;
> > +               } else {
> > +                       smmu_domain->s1_cfg.num_contexts = 1;
> >                 }
> > +
> >                 smmu_domain->s1_cfg.can_stall = master->ste.can_stall;
> >                 break;
> >         case ARM_SMMU_DOMAIN_NESTED:
> >
> >
> > I am not sure this is the right place to do this. Please take a look.
> 
> Thanks for testing the series and reporting the bug. I added the
> following patch to branch svm/current, does it work for you?

Yes, it does.

Thanks,
Shameer
 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 42c8378624ed..edda466adc81 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -3169,9 +3169,7 @@ static int arm_smmu_add_device(struct device *dev)
>                 }
>         }
> 
> -       if (smmu->ssid_bits)
> -               master->num_ssids = 1 << min(smmu->ssid_bits,
> -                                            fwspec->num_pasid_bits);
> +       master->num_ssids = 1 << min(smmu->ssid_bits, fwspec-
> >num_pasid_bits);
> 
>         if (fwspec->can_stall && smmu->features & ARM_SMMU_FEAT_STALLS) {
>                 master->can_fault = true;
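The arithmetic behind the fix is worth spelling out: once the conditional is gone, an SMMU without substream-ID support has ssid_bits == 0, so num_ssids becomes 1 << 0 == 1, and the context-descriptor table is sized for one entry instead of zero — the zero-entry allocation is what tripped the page allocator on D05. A minimal model of that computation (helper names invented for the sketch):

```c
#include <assert.h>

unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* master->num_ssids = 1 << min(smmu->ssid_bits, fwspec->num_pasid_bits) */
unsigned int num_ssids(unsigned int ssid_bits, unsigned int num_pasid_bits)
{
	return 1u << min_u(ssid_bits, num_pasid_bits);
}
```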


^ permalink raw reply	[flat|nested] 268+ messages in thread

* [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for Substream IDs
@ 2017-11-02 17:02         ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 268+ messages in thread
From: Shameerali Kolothum Thodi @ 2017-11-02 17:02 UTC (permalink / raw)
  To: linux-arm-kernel



> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:Jean-Philippe.Brucker at arm.com]
> Sent: Thursday, November 02, 2017 3:52 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: linux-arm-kernel at lists.infradead.org; linux-pci at vger.kernel.org; linux-
> acpi at vger.kernel.org; devicetree at vger.kernel.org; iommu at lists.linux-
> foundation.org; Mark Rutland <Mark.Rutland@arm.com>; xieyisheng (A)
> <xieyisheng1@huawei.com>; Gabriele Paoloni
> <gabriele.paoloni@huawei.com>; Catalin Marinas
> <Catalin.Marinas@arm.com>; Will Deacon <Will.Deacon@arm.com>;
> okaya at codeaurora.org; yi.l.liu at intel.com; Lorenzo Pieralisi
> <Lorenzo.Pieralisi@arm.com>; ashok.raj at intel.com; tn at semihalf.com;
> joro at 8bytes.org; rfranz at cavium.com; lenb at kernel.org;
> jacob.jun.pan at linux.intel.com; alex.williamson at redhat.com;
> robh+dt at kernel.org; Leizhen (ThunderTown) <thunder.leizhen@huawei.com>;
> bhelgaas at google.com; dwmw2 at infradead.org; liubo (CU)
> <liubo95@huawei.com>; rjw at rjwysocki.net; robdclark at gmail.com;
> hanjun.guo at linaro.org; Sudeep Holla <Sudeep.Holla@arm.com>; Robin
> Murphy <Robin.Murphy@arm.com>; nwatters at codeaurora.org; Linuxarm
> <linuxarm@huawei.com>
> Subject: Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for
> Substream IDs
> 
> Hi Shameer,
> 
> On Thu, Nov 02, 2017 at 12:49:32PM +0000, Shameerali Kolothum Thodi wrote:
> > We had a go with this series on HiSIlicon D05 platform which doesn't have
> > support for ssids/ATS/PRI, to make sure it generally works.
> >
> > But observed the below crash on boot,
> >
> > [   16.009084] WARNING: CPU: 59 PID: 391 at mm/page_alloc.c:3883
> __alloc_pages_nodemask+0x19c/0xc48
> > [   16.026797] Modules linked in:
> > [   16.032944] CPU: 59 PID: 391 Comm: kworker/59:1 Not tainted 4.14.0-rc1-
> 159539-ge42aca3 #236
> > [...]
> > [   16.068206] Workqueue: events deferred_probe_work_func
> > [   16.078557] task: ffff8017d38a0000 task.stack: ffff00000b198000
> > [   16.090486] PC is at __alloc_pages_nodemask+0x19c/0xc48
> > [   16.101013] LR is at __alloc_pages_nodemask+0xe0/0xc48
> > [   16.469220] [<ffff000008186b94>] __alloc_pages_nodemask+0x19c/0xc48
> > [   16.481854] [<ffff0000081d65b0>] alloc_pages_current+0x80/0xcc
> > [   16.493607] [<ffff000008182be8>] __get_free_pages+0xc/0x38
> > [   16.504661] [<ffff0000083c4d58>] swiotlb_alloc_coherent+0x64/0x190
> > [   16.517117] [<ffff00000809824c>] __dma_alloc+0x110/0x204
> > [   16.527820] [<ffff00000858e850>] dmam_alloc_coherent+0x88/0xf0
> > [   16.539575] [<ffff000008568884>]
> arm_smmu_domain_finalise_s1+0x60/0x248
> > [   16.552909] [<ffff00000856c104>] arm_smmu_attach_dev+0x264/0x300
> > [   16.565013] [<ffff00000855d40c>] __iommu_attach_device+0x48/0x5c
> > [   16.577117] [<ffff00000855e698>] iommu_group_add_device+0x144/0x3a4
> > [   16.589746] [<ffff00000855ed18>] iommu_group_get_for_dev+0x70/0xf8
> > [   16.602201] [<ffff00000856a314>] arm_smmu_add_device+0x1a4/0x418
> > [   16.614308] [<ffff00000849dfcc>] iort_iommu_configure+0xf0/0x16c
> > [   16.626416] [<ffff000008468c50>] acpi_dma_configure+0x30/0x70
> > [   16.637994] [<ffff00000858f00c>] dma_configure+0xa8/0xd4
> > [   16.648695] [<ffff00000857706c>] driver_probe_device+0x1a4/0x2dc
> > [   16.673081] [<ffff0000085752c8>] bus_for_each_drv+0x54/0x94
> > [   16.684307] [<ffff000008576db0>] __device_attach+0xc4/0x12c
> > [   16.695533] [<ffff000008577350>] device_initial_probe+0x10/0x18
> > [   16.707462] [<ffff0000085762b4>] bus_probe_device+0x90/0x98
> >
> > After a bit of debugging, it looks like on platforms where SSID is not
> > supported, s1_cfg.num_contexts is set to zero, which eventually results in
> > this crash in
> > arm_smmu_domain_finalise_s1() --> arm_smmu_alloc_cd_tables() -->
> > arm_smmu_alloc_cd_leaf_table(), as num_leaf_entries is zero.
> >
> > With the below fix, it works on D05 now,
> >
> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > index 8ad90e2..51f5821 100644
> > --- a/drivers/iommu/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm-smmu-v3.c
> > @@ -2433,7 +2433,10 @@ static int arm_smmu_domain_finalise(struct
> iommu_domain *domain,
> >                         domain->min_pasid = 1;
> >                         domain->max_pasid = master->num_ssids - 1;
> >                         smmu_domain->s1_cfg.num_contexts = master->num_ssids;
> > +               } else {
> > +                       smmu_domain->s1_cfg.num_contexts = 1;
> >                 }
> > +
> >                 smmu_domain->s1_cfg.can_stall = master->ste.can_stall;
> >                 break;
> >         case ARM_SMMU_DOMAIN_NESTED:
> >
> >
> > I am not sure this is the right place to do this. Please take a look.
> 
> Thanks for testing the series and reporting the bug. I added the
> following patch to branch svm/current, does it work for you?

Yes, it does.

Thanks,
Shameer
 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 42c8378624ed..edda466adc81 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -3169,9 +3169,7 @@ static int arm_smmu_add_device(struct device *dev)
>                 }
>         }
> 
> -       if (smmu->ssid_bits)
> -               master->num_ssids = 1 << min(smmu->ssid_bits,
> -                                            fwspec->num_pasid_bits);
> > +       master->num_ssids = 1 << min(smmu->ssid_bits, fwspec->num_pasid_bits);
> 
>         if (fwspec->can_stall && smmu->features & ARM_SMMU_FEAT_STALLS) {
>                 master->can_fault = true;

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for Substream IDs
  2017-11-02 17:02         ` Shameerali Kolothum Thodi
@ 2017-11-03  5:45           ` Yisheng Xie
  -1 siblings, 0 replies; 268+ messages in thread
From: Yisheng Xie @ 2017-11-03  5:45 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, Jean-Philippe Brucker
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu,
	Mark Rutland, Gabriele Paoloni, Catalin Marinas, Will Deacon,
	okaya, yi.l.liu, Lorenzo Pieralisi, ashok.raj, tn, joro,
	rfranz@cavium.com

Hi Jean,

On 2017/11/3 1:02, Shameerali Kolothum Thodi wrote:
> 
> 
>> -----Original Message-----
>> From: Jean-Philippe Brucker [mailto:Jean-Philippe.Brucker@arm.com]
>> Sent: Thursday, November 02, 2017 3:52 PM
>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>> Cc: linux-arm-kernel@lists.infradead.org; linux-pci@vger.kernel.org; linux-
>> acpi@vger.kernel.org; devicetree@vger.kernel.org; iommu@lists.linux-
>> foundation.org; Mark Rutland <Mark.Rutland@arm.com>; xieyisheng (A)
>> <xieyisheng1@huawei.com>; Gabriele Paoloni
>> <gabriele.paoloni@huawei.com>; Catalin Marinas
>> <Catalin.Marinas@arm.com>; Will Deacon <Will.Deacon@arm.com>;
>> okaya@codeaurora.org; yi.l.liu@intel.com; Lorenzo Pieralisi
>> <Lorenzo.Pieralisi@arm.com>; ashok.raj@intel.com; tn@semihalf.com;
>> joro@8bytes.org; rfranz@cavium.com; lenb@kernel.org;
>> jacob.jun.pan@linux.intel.com; alex.williamson@redhat.com;
>> robh+dt@kernel.org; Leizhen (ThunderTown) <thunder.leizhen@huawei.com>;
>> bhelgaas@google.com; dwmw2@infradead.org; liubo (CU)
>> <liubo95@huawei.com>; rjw@rjwysocki.net; robdclark@gmail.com;
>> hanjun.guo@linaro.org; Sudeep Holla <Sudeep.Holla@arm.com>; Robin
>> Murphy <Robin.Murphy@arm.com>; nwatters@codeaurora.org; Linuxarm
>> <linuxarm@huawei.com>
>> Subject: Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for
>> Substream IDs
>>
>> Hi Shameer,
>>
>> On Thu, Nov 02, 2017 at 12:49:32PM +0000, Shameerali Kolothum Thodi wrote:
>>> We had a go with this series on the HiSilicon D05 platform, which doesn't
>>> have support for SSIDs/ATS/PRI, to make sure it generally works.
>>>
>>> But observed the below crash on boot,
>>>
>>> [   16.009084] WARNING: CPU: 59 PID: 391 at mm/page_alloc.c:3883
>> __alloc_pages_nodemask+0x19c/0xc48
>>> [   16.026797] Modules linked in:
>>> [   16.032944] CPU: 59 PID: 391 Comm: kworker/59:1 Not tainted 4.14.0-rc1-
>> 159539-ge42aca3 #236
>>> [...]
>>> [   16.068206] Workqueue: events deferred_probe_work_func
>>> [   16.078557] task: ffff8017d38a0000 task.stack: ffff00000b198000
>>> [   16.090486] PC is at __alloc_pages_nodemask+0x19c/0xc48
>>> [   16.101013] LR is at __alloc_pages_nodemask+0xe0/0xc48
>>> [   16.469220] [<ffff000008186b94>] __alloc_pages_nodemask+0x19c/0xc48
>>> [   16.481854] [<ffff0000081d65b0>] alloc_pages_current+0x80/0xcc
>>> [   16.493607] [<ffff000008182be8>] __get_free_pages+0xc/0x38
>>> [   16.504661] [<ffff0000083c4d58>] swiotlb_alloc_coherent+0x64/0x190
>>> [   16.517117] [<ffff00000809824c>] __dma_alloc+0x110/0x204
>>> [   16.527820] [<ffff00000858e850>] dmam_alloc_coherent+0x88/0xf0
>>> [   16.539575] [<ffff000008568884>]
>> arm_smmu_domain_finalise_s1+0x60/0x248
>>> [   16.552909] [<ffff00000856c104>] arm_smmu_attach_dev+0x264/0x300
>>> [   16.565013] [<ffff00000855d40c>] __iommu_attach_device+0x48/0x5c
>>> [   16.577117] [<ffff00000855e698>] iommu_group_add_device+0x144/0x3a4
>>> [   16.589746] [<ffff00000855ed18>] iommu_group_get_for_dev+0x70/0xf8
>>> [   16.602201] [<ffff00000856a314>] arm_smmu_add_device+0x1a4/0x418
>>> [   16.614308] [<ffff00000849dfcc>] iort_iommu_configure+0xf0/0x16c
>>> [   16.626416] [<ffff000008468c50>] acpi_dma_configure+0x30/0x70
>>> [   16.637994] [<ffff00000858f00c>] dma_configure+0xa8/0xd4
>>> [   16.648695] [<ffff00000857706c>] driver_probe_device+0x1a4/0x2dc
>>> [   16.673081] [<ffff0000085752c8>] bus_for_each_drv+0x54/0x94
>>> [   16.684307] [<ffff000008576db0>] __device_attach+0xc4/0x12c
>>> [   16.695533] [<ffff000008577350>] device_initial_probe+0x10/0x18
>>> [   16.707462] [<ffff0000085762b4>] bus_probe_device+0x90/0x98
>>>
>>> After a bit of debugging, it looks like on platforms where SSID is not
>>> supported, s1_cfg.num_contexts is set to zero, which eventually results in
>>> this crash in
>>> arm_smmu_domain_finalise_s1() --> arm_smmu_alloc_cd_tables() -->
>>> arm_smmu_alloc_cd_leaf_table(), as num_leaf_entries is zero.
>>>
>>> With the below fix, it works on D05 now,
>>>
>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>>> index 8ad90e2..51f5821 100644
>>> --- a/drivers/iommu/arm-smmu-v3.c
>>> +++ b/drivers/iommu/arm-smmu-v3.c
>>> @@ -2433,7 +2433,10 @@ static int arm_smmu_domain_finalise(struct
>> iommu_domain *domain,
>>>                         domain->min_pasid = 1;
>>>                         domain->max_pasid = master->num_ssids - 1;
>>>                         smmu_domain->s1_cfg.num_contexts = master->num_ssids;
>>> +               } else {
>>> +                       smmu_domain->s1_cfg.num_contexts = 1;
>>>                 }
>>> +
>>>                 smmu_domain->s1_cfg.can_stall = master->ste.can_stall;
>>>                 break;
>>>         case ARM_SMMU_DOMAIN_NESTED:
>>>
>>>
>>> I am not sure this is the right place to do this. Please take a look.
>>
>> Thanks for testing the series and reporting the bug. I added the
>> following patch to branch svm/current, does it work for you?
> 
> Yes, it does.
> 
> Thanks,
> Shameer
>  
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 42c8378624ed..edda466adc81 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -3169,9 +3169,7 @@ static int arm_smmu_add_device(struct device *dev)
>>                 }
>>         }
>>
>> -       if (smmu->ssid_bits)
>> -               master->num_ssids = 1 << min(smmu->ssid_bits,
>> -                                            fwspec->num_pasid_bits);
>> +       master->num_ssids = 1 << min(smmu->ssid_bits, fwspec->num_pasid_bits);

If fwspec->num_pasid_bits == 0, then the master will have num_ssids == _one_?

It seems Shameerali's fix is better?

>>
>>         if (fwspec->can_stall && smmu->features & ARM_SMMU_FEAT_STALLS) {
>>                 master->can_fault = true;
> 


^ permalink raw reply	[flat|nested] 268+ messages in thread


* Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for Substream IDs
  2017-11-03  5:45           ` Yisheng Xie
@ 2017-11-03  9:37             ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-11-03  9:37 UTC (permalink / raw)
  To: Yisheng Xie, Shameerali Kolothum Thodi
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu,
	Mark Rutland, Gabriele Paoloni, Catalin Marinas, Will Deacon,
	okaya, yi.l.liu, Lorenzo Pieralisi, ashok.raj, tn, joro,
	rfranz@cavium.com

On 03/11/17 05:45, Yisheng Xie wrote:
> Hi Jean,
> 
> On 2017/11/3 1:02, Shameerali Kolothum Thodi wrote:
>>
>>
>>> -----Original Message-----
>>> From: Jean-Philippe Brucker [mailto:Jean-Philippe.Brucker@arm.com]
>>> Sent: Thursday, November 02, 2017 3:52 PM
>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>>> Cc: linux-arm-kernel@lists.infradead.org; linux-pci@vger.kernel.org; linux-
>>> acpi@vger.kernel.org; devicetree@vger.kernel.org; iommu@lists.linux-
>>> foundation.org; Mark Rutland <Mark.Rutland@arm.com>; xieyisheng (A)
>>> <xieyisheng1@huawei.com>; Gabriele Paoloni
>>> <gabriele.paoloni@huawei.com>; Catalin Marinas
>>> <Catalin.Marinas@arm.com>; Will Deacon <Will.Deacon@arm.com>;
>>> okaya@codeaurora.org; yi.l.liu@intel.com; Lorenzo Pieralisi
>>> <Lorenzo.Pieralisi@arm.com>; ashok.raj@intel.com; tn@semihalf.com;
>>> joro@8bytes.org; rfranz@cavium.com; lenb@kernel.org;
>>> jacob.jun.pan@linux.intel.com; alex.williamson@redhat.com;
>>> robh+dt@kernel.org; Leizhen (ThunderTown) <thunder.leizhen@huawei.com>;
>>> bhelgaas@google.com; dwmw2@infradead.org; liubo (CU)
>>> <liubo95@huawei.com>; rjw@rjwysocki.net; robdclark@gmail.com;
>>> hanjun.guo@linaro.org; Sudeep Holla <Sudeep.Holla@arm.com>; Robin
>>> Murphy <Robin.Murphy@arm.com>; nwatters@codeaurora.org; Linuxarm
>>> <linuxarm@huawei.com>
>>> Subject: Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for
>>> Substream IDs
>>>
>>> Hi Shameer,
>>>
>>> On Thu, Nov 02, 2017 at 12:49:32PM +0000, Shameerali Kolothum Thodi wrote:
>>>> We had a go with this series on the HiSilicon D05 platform, which doesn't
>>>> have support for SSIDs/ATS/PRI, to make sure it generally works.
>>>>
>>>> But observed the below crash on boot,
>>>>
>>>> [   16.009084] WARNING: CPU: 59 PID: 391 at mm/page_alloc.c:3883
>>> __alloc_pages_nodemask+0x19c/0xc48
>>>> [   16.026797] Modules linked in:
>>>> [   16.032944] CPU: 59 PID: 391 Comm: kworker/59:1 Not tainted 4.14.0-rc1-
>>> 159539-ge42aca3 #236
>>>> [...]
>>>> [   16.068206] Workqueue: events deferred_probe_work_func
>>>> [   16.078557] task: ffff8017d38a0000 task.stack: ffff00000b198000
>>>> [   16.090486] PC is at __alloc_pages_nodemask+0x19c/0xc48
>>>> [   16.101013] LR is at __alloc_pages_nodemask+0xe0/0xc48
>>>> [   16.469220] [<ffff000008186b94>] __alloc_pages_nodemask+0x19c/0xc48
>>>> [   16.481854] [<ffff0000081d65b0>] alloc_pages_current+0x80/0xcc
>>>> [   16.493607] [<ffff000008182be8>] __get_free_pages+0xc/0x38
>>>> [   16.504661] [<ffff0000083c4d58>] swiotlb_alloc_coherent+0x64/0x190
>>>> [   16.517117] [<ffff00000809824c>] __dma_alloc+0x110/0x204
>>>> [   16.527820] [<ffff00000858e850>] dmam_alloc_coherent+0x88/0xf0
>>>> [   16.539575] [<ffff000008568884>]
>>> arm_smmu_domain_finalise_s1+0x60/0x248
>>>> [   16.552909] [<ffff00000856c104>] arm_smmu_attach_dev+0x264/0x300
>>>> [   16.565013] [<ffff00000855d40c>] __iommu_attach_device+0x48/0x5c
>>>> [   16.577117] [<ffff00000855e698>] iommu_group_add_device+0x144/0x3a4
>>>> [   16.589746] [<ffff00000855ed18>] iommu_group_get_for_dev+0x70/0xf8
>>>> [   16.602201] [<ffff00000856a314>] arm_smmu_add_device+0x1a4/0x418
>>>> [   16.614308] [<ffff00000849dfcc>] iort_iommu_configure+0xf0/0x16c
>>>> [   16.626416] [<ffff000008468c50>] acpi_dma_configure+0x30/0x70
>>>> [   16.637994] [<ffff00000858f00c>] dma_configure+0xa8/0xd4
>>>> [   16.648695] [<ffff00000857706c>] driver_probe_device+0x1a4/0x2dc
>>>> [   16.673081] [<ffff0000085752c8>] bus_for_each_drv+0x54/0x94
>>>> [   16.684307] [<ffff000008576db0>] __device_attach+0xc4/0x12c
>>>> [   16.695533] [<ffff000008577350>] device_initial_probe+0x10/0x18
>>>> [   16.707462] [<ffff0000085762b4>] bus_probe_device+0x90/0x98
>>>>
>>>> After a bit of debugging, it looks like on platforms where SSID is not
>>>> supported, s1_cfg.num_contexts is set to zero, which eventually results in
>>>> this crash in
>>>> arm_smmu_domain_finalise_s1() --> arm_smmu_alloc_cd_tables() -->
>>>> arm_smmu_alloc_cd_leaf_table(), as num_leaf_entries is zero.
>>>>
>>>> With the below fix, it works on D05 now,
>>>>
>>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>>>> index 8ad90e2..51f5821 100644
>>>> --- a/drivers/iommu/arm-smmu-v3.c
>>>> +++ b/drivers/iommu/arm-smmu-v3.c
>>>> @@ -2433,7 +2433,10 @@ static int arm_smmu_domain_finalise(struct
>>> iommu_domain *domain,
>>>>                         domain->min_pasid = 1;
>>>>                         domain->max_pasid = master->num_ssids - 1;
>>>>                         smmu_domain->s1_cfg.num_contexts = master->num_ssids;
>>>> +               } else {
>>>> +                       smmu_domain->s1_cfg.num_contexts = 1;
>>>>                 }
>>>> +
>>>>                 smmu_domain->s1_cfg.can_stall = master->ste.can_stall;
>>>>                 break;
>>>>         case ARM_SMMU_DOMAIN_NESTED:
>>>>
>>>>
>>>> I am not sure this is the right place to do this. Please take a look.
>>>
>>> Thanks for testing the series and reporting the bug. I added the
>>> following patch to branch svm/current, does it work for you?
>>
>> Yes, it does.
>>
>> Thanks,
>> Shameer
>>  
>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>>> index 42c8378624ed..edda466adc81 100644
>>> --- a/drivers/iommu/arm-smmu-v3.c
>>> +++ b/drivers/iommu/arm-smmu-v3.c
>>> @@ -3169,9 +3169,7 @@ static int arm_smmu_add_device(struct device *dev)
>>>                 }
>>>         }
>>>
>>> -       if (smmu->ssid_bits)
>>> -               master->num_ssids = 1 << min(smmu->ssid_bits,
>>> -                                            fwspec->num_pasid_bits);
>>> +       master->num_ssids = 1 << min(smmu->ssid_bits, fwspec->num_pasid_bits);
> 
> If fwspec->num_pasid_bits = 0, then master have _one_ num_ssids ?

Yes, the context table allocator always needs to allocate at least one
entry, even if the master or SMMU doesn't support SSID. I think an earlier
version called this field "num_contexts"; maybe we should go back to that
name for clarity?

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for Substream IDs
@ 2017-11-03  9:37             ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-11-03  9:37 UTC (permalink / raw)
  To: Yisheng Xie, Shameerali Kolothum Thodi
  Cc: Mark Rutland, Gabriele Paoloni, linux-pci, Will Deacon, Linuxarm,
	okaya, Lorenzo Pieralisi, yi.l.liu, ashok.raj, tn, joro,
	robdclark, linux-acpi, Catalin Marinas, rfranz, lenb, devicetree,
	jacob.jun.pan, alex.williamson, robh+dt, Leizhen (ThunderTown),
	bhelgaas, linux-arm-kernel, Robin Murphy, liubo (CU),
	rjw, iommu, hanjun.guo, Sudeep Holla, dwmw2, nwatters

On 03/11/17 05:45, Yisheng Xie wrote:
> Hi Jean,
> 
> On 2017/11/3 1:02, Shameerali Kolothum Thodi wrote:
>>
>>
>>> -----Original Message-----
>>> From: Jean-Philippe Brucker [mailto:Jean-Philippe.Brucker@arm.com]
>>> Sent: Thursday, November 02, 2017 3:52 PM
>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>>> Cc: linux-arm-kernel@lists.infradead.org; linux-pci@vger.kernel.org; linux-
>>> acpi@vger.kernel.org; devicetree@vger.kernel.org; iommu@lists.linux-
>>> foundation.org; Mark Rutland <Mark.Rutland@arm.com>; xieyisheng (A)
>>> <xieyisheng1@huawei.com>; Gabriele Paoloni
>>> <gabriele.paoloni@huawei.com>; Catalin Marinas
>>> <Catalin.Marinas@arm.com>; Will Deacon <Will.Deacon@arm.com>;
>>> okaya@codeaurora.org; yi.l.liu@intel.com; Lorenzo Pieralisi
>>> <Lorenzo.Pieralisi@arm.com>; ashok.raj@intel.com; tn@semihalf.com;
>>> joro@8bytes.org; rfranz@cavium.com; lenb@kernel.org;
>>> jacob.jun.pan@linux.intel.com; alex.williamson@redhat.com;
>>> robh+dt@kernel.org; Leizhen (ThunderTown) <thunder.leizhen@huawei.com>;
>>> bhelgaas@google.com; dwmw2@infradead.org; liubo (CU)
>>> <liubo95@huawei.com>; rjw@rjwysocki.net; robdclark@gmail.com;
>>> hanjun.guo@linaro.org; Sudeep Holla <Sudeep.Holla@arm.com>; Robin
>>> Murphy <Robin.Murphy@arm.com>; nwatters@codeaurora.org; Linuxarm
>>> <linuxarm@huawei.com>
>>> Subject: Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for
>>> Substream IDs
>>>
>>> Hi Shameer,
>>>
>>> On Thu, Nov 02, 2017 at 12:49:32PM +0000, Shameerali Kolothum Thodi wrote:
>>>> We had a go with this series on HiSIlicon D05 platform which doesn't have
>>>> support for ssids/ATS/PRI, to make sure it generally works.
>>>>
>>>> But observed the below crash on boot,
>>>>
>>>> [   16.009084] WARNING: CPU: 59 PID: 391 at mm/page_alloc.c:3883
>>> __alloc_pages_nodemask+0x19c/0xc48
>>>> [   16.026797] Modules linked in:
>>>> [   16.032944] CPU: 59 PID: 391 Comm: kworker/59:1 Not tainted 4.14.0-rc1-
>>> 159539-ge42aca3 #236
>>>> [...]
>>>> [   16.068206] Workqueue: events deferred_probe_work_func
>>>> [   16.078557] task: ffff8017d38a0000 task.stack: ffff00000b198000
>>>> [   16.090486] PC is at __alloc_pages_nodemask+0x19c/0xc48
>>>> [   16.101013] LR is at __alloc_pages_nodemask+0xe0/0xc48
>>>> [   16.469220] [<ffff000008186b94>] __alloc_pages_nodemask+0x19c/0xc48
>>>> [   16.481854] [<ffff0000081d65b0>] alloc_pages_current+0x80/0xcc
>>>> [   16.493607] [<ffff000008182be8>] __get_free_pages+0xc/0x38
>>>> [   16.504661] [<ffff0000083c4d58>] swiotlb_alloc_coherent+0x64/0x190
>>>> [   16.517117] [<ffff00000809824c>] __dma_alloc+0x110/0x204
>>>> [   16.527820] [<ffff00000858e850>] dmam_alloc_coherent+0x88/0xf0
>>>> [   16.539575] [<ffff000008568884>]
>>> arm_smmu_domain_finalise_s1+0x60/0x248
>>>> [   16.552909] [<ffff00000856c104>] arm_smmu_attach_dev+0x264/0x300
>>>> [   16.565013] [<ffff00000855d40c>] __iommu_attach_device+0x48/0x5c
>>>> [   16.577117] [<ffff00000855e698>] iommu_group_add_device+0x144/0x3a4
>>>> [   16.589746] [<ffff00000855ed18>] iommu_group_get_for_dev+0x70/0xf8
>>>> [   16.602201] [<ffff00000856a314>] arm_smmu_add_device+0x1a4/0x418
>>>> [   16.614308] [<ffff00000849dfcc>] iort_iommu_configure+0xf0/0x16c
>>>> [   16.626416] [<ffff000008468c50>] acpi_dma_configure+0x30/0x70
>>>> [   16.637994] [<ffff00000858f00c>] dma_configure+0xa8/0xd4
>>>> [   16.648695] [<ffff00000857706c>] driver_probe_device+0x1a4/0x2dc
>>>> [   16.673081] [<ffff0000085752c8>] bus_for_each_drv+0x54/0x94
>>>> [   16.684307] [<ffff000008576db0>] __device_attach+0xc4/0x12c
>>>> [   16.695533] [<ffff000008577350>] device_initial_probe+0x10/0x18
>>>> [   16.707462] [<ffff0000085762b4>] bus_probe_device+0x90/0x98
>>>>
>>>> After a bit of debugging, it looks like on platforms where SSID is not
>>>> supported, s1_cfg.num_contexts is set to zero, which eventually results in
>>>> this crash in
>>>> arm_smmu_domain_finalise_s1() --> arm_smmu_alloc_cd_tables() -->
>>>> arm_smmu_alloc_cd_leaf_table(), as num_leaf_entries is zero.
>>>>
>>>> With the below fix, it works on D05 now,
>>>>
>>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>>>> index 8ad90e2..51f5821 100644
>>>> --- a/drivers/iommu/arm-smmu-v3.c
>>>> +++ b/drivers/iommu/arm-smmu-v3.c
>>>> @@ -2433,7 +2433,10 @@ static int arm_smmu_domain_finalise(struct
>>> iommu_domain *domain,
>>>>                         domain->min_pasid = 1;
>>>>                         domain->max_pasid = master->num_ssids - 1;
>>>>                         smmu_domain->s1_cfg.num_contexts = master->num_ssids;
>>>> +               } else {
>>>> +                       smmu_domain->s1_cfg.num_contexts = 1;
>>>>                 }
>>>> +
>>>>                 smmu_domain->s1_cfg.can_stall = master->ste.can_stall;
>>>>                 break;
>>>>         case ARM_SMMU_DOMAIN_NESTED:
>>>>
>>>>
>>>> I am not sure this is the right place to do this. Please take a look.
>>>
>>> Thanks for testing the series and reporting the bug. I added the
>>> following patch to branch svm/current, does it work for you?
>>
>> Yes, it does.
>>
>> Thanks,
>> Shameer
>>  
>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>>> index 42c8378624ed..edda466adc81 100644
>>> --- a/drivers/iommu/arm-smmu-v3.c
>>> +++ b/drivers/iommu/arm-smmu-v3.c
>>> @@ -3169,9 +3169,7 @@ static int arm_smmu_add_device(struct device *dev)
>>>                 }
>>>         }
>>>
>>> -       if (smmu->ssid_bits)
>>> -               master->num_ssids = 1 << min(smmu->ssid_bits,
>>> -                                            fwspec->num_pasid_bits);
>>> +       master->num_ssids = 1 << min(smmu->ssid_bits, fwspec->num_pasid_bits);
> 
> If fwspec->num_pasid_bits == 0, then the master has _one_ num_ssids?

Yes, the context table allocator always needs to allocate at least one
entry, even if the master or SMMU doesn't support SSID. I think an earlier
version called this field "num_contexts"; maybe we should go back to that
name for clarity?

Thanks,
Jean

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 268+ messages in thread


* RE: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for Substream IDs
  2017-11-03  9:37             ` Jean-Philippe Brucker
  (?)
@ 2017-11-03  9:39               ` Shameerali Kolothum Thodi
  -1 siblings, 0 replies; 268+ messages in thread
From: Shameerali Kolothum Thodi @ 2017-11-03  9:39 UTC (permalink / raw)
  To: Jean-Philippe Brucker, xieyisheng (A)
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu,
	Mark Rutland, Gabriele Paoloni, Catalin Marinas, Will Deacon,
	okaya, yi.l.liu, Lorenzo Pieralisi, ashok.raj, tn, joro,
	rfranz@cavium.com



> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Friday, November 03, 2017 9:37 AM
> To: xieyisheng (A) <xieyisheng1@huawei.com>; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>
> Cc: linux-arm-kernel@lists.infradead.org; linux-pci@vger.kernel.org; linux-
> acpi@vger.kernel.org; devicetree@vger.kernel.org; iommu@lists.linux-
> foundation.org; Mark Rutland <Mark.Rutland@arm.com>; Gabriele Paoloni
> <gabriele.paoloni@huawei.com>; Catalin Marinas
> <Catalin.Marinas@arm.com>; Will Deacon <Will.Deacon@arm.com>;
> okaya@codeaurora.org; yi.l.liu@intel.com; Lorenzo Pieralisi
> <Lorenzo.Pieralisi@arm.com>; ashok.raj@intel.com; tn@semihalf.com;
> joro@8bytes.org; rfranz@cavium.com; lenb@kernel.org;
> jacob.jun.pan@linux.intel.com; alex.williamson@redhat.com;
> robh+dt@kernel.org; Leizhen (ThunderTown) <thunder.leizhen@huawei.com>;
> bhelgaas@google.com; dwmw2@infradead.org; liubo (CU)
> <liubo95@huawei.com>; rjw@rjwysocki.net; robdclark@gmail.com;
> hanjun.guo@linaro.org; Sudeep Holla <Sudeep.Holla@arm.com>; Robin
> Murphy <Robin.Murphy@arm.com>; nwatters@codeaurora.org; Linuxarm
> <linuxarm@huawei.com>
> Subject: Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for
> Substream IDs
> 
> On 03/11/17 05:45, Yisheng Xie wrote:
> > Hi Jean,
> >
> > On 2017/11/3 1:02, Shameerali Kolothum Thodi wrote:
> >>
> >>
> >>> -----Original Message-----
> >>> From: Jean-Philippe Brucker [mailto:Jean-Philippe.Brucker@arm.com]
> >>> Sent: Thursday, November 02, 2017 3:52 PM
> >>> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>
> >>> Cc: linux-arm-kernel@lists.infradead.org; linux-pci@vger.kernel.org; linux-
> >>> acpi@vger.kernel.org; devicetree@vger.kernel.org; iommu@lists.linux-
> >>> foundation.org; Mark Rutland <Mark.Rutland@arm.com>; xieyisheng (A)
> >>> <xieyisheng1@huawei.com>; Gabriele Paoloni
> >>> <gabriele.paoloni@huawei.com>; Catalin Marinas
> >>> <Catalin.Marinas@arm.com>; Will Deacon <Will.Deacon@arm.com>;
> >>> okaya@codeaurora.org; yi.l.liu@intel.com; Lorenzo Pieralisi
> >>> <Lorenzo.Pieralisi@arm.com>; ashok.raj@intel.com; tn@semihalf.com;
> >>> joro@8bytes.org; rfranz@cavium.com; lenb@kernel.org;
> >>> jacob.jun.pan@linux.intel.com; alex.williamson@redhat.com;
> >>> robh+dt@kernel.org; Leizhen (ThunderTown)
> <thunder.leizhen@huawei.com>;
> >>> bhelgaas@google.com; dwmw2@infradead.org; liubo (CU)
> >>> <liubo95@huawei.com>; rjw@rjwysocki.net; robdclark@gmail.com;
> >>> hanjun.guo@linaro.org; Sudeep Holla <Sudeep.Holla@arm.com>; Robin
> >>> Murphy <Robin.Murphy@arm.com>; nwatters@codeaurora.org; Linuxarm
> >>> <linuxarm@huawei.com>
> >>> Subject: Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for
> >>> Substream IDs
> >>>
> >>> Hi Shameer,
> >>>
> >>> On Thu, Nov 02, 2017 at 12:49:32PM +0000, Shameerali Kolothum Thodi
> wrote:
> >>>> We had a go with this series on HiSIlicon D05 platform which doesn't have
> >>>> support for ssids/ATS/PRI, to make sure it generally works.
> >>>>
> >>>> But observed the below crash on boot,
> >>>>
> >>>> [   16.009084] WARNING: CPU: 59 PID: 391 at mm/page_alloc.c:3883
> >>> __alloc_pages_nodemask+0x19c/0xc48
> >>>> [   16.026797] Modules linked in:
> >>>> [   16.032944] CPU: 59 PID: 391 Comm: kworker/59:1 Not tainted 4.14.0-
> rc1-
> >>> 159539-ge42aca3 #236
> >>>> [...]
> >>>> [   16.068206] Workqueue: events deferred_probe_work_func
> >>>> [   16.078557] task: ffff8017d38a0000 task.stack: ffff00000b198000
> >>>> [   16.090486] PC is at __alloc_pages_nodemask+0x19c/0xc48
> >>>> [   16.101013] LR is at __alloc_pages_nodemask+0xe0/0xc48
> >>>> [   16.469220] [<ffff000008186b94>]
> __alloc_pages_nodemask+0x19c/0xc48
> >>>> [   16.481854] [<ffff0000081d65b0>] alloc_pages_current+0x80/0xcc
> >>>> [   16.493607] [<ffff000008182be8>] __get_free_pages+0xc/0x38
> >>>> [   16.504661] [<ffff0000083c4d58>] swiotlb_alloc_coherent+0x64/0x190
> >>>> [   16.517117] [<ffff00000809824c>] __dma_alloc+0x110/0x204
> >>>> [   16.527820] [<ffff00000858e850>] dmam_alloc_coherent+0x88/0xf0
> >>>> [   16.539575] [<ffff000008568884>]
> >>> arm_smmu_domain_finalise_s1+0x60/0x248
> >>>> [   16.552909] [<ffff00000856c104>]
> arm_smmu_attach_dev+0x264/0x300
> >>>> [   16.565013] [<ffff00000855d40c>] __iommu_attach_device+0x48/0x5c
> >>>> [   16.577117] [<ffff00000855e698>]
> iommu_group_add_device+0x144/0x3a4
> >>>> [   16.589746] [<ffff00000855ed18>]
> iommu_group_get_for_dev+0x70/0xf8
> >>>> [   16.602201] [<ffff00000856a314>]
> arm_smmu_add_device+0x1a4/0x418
> >>>> [   16.614308] [<ffff00000849dfcc>] iort_iommu_configure+0xf0/0x16c
> >>>> [   16.626416] [<ffff000008468c50>] acpi_dma_configure+0x30/0x70
> >>>> [   16.637994] [<ffff00000858f00c>] dma_configure+0xa8/0xd4
> >>>> [   16.648695] [<ffff00000857706c>] driver_probe_device+0x1a4/0x2dc
> >>>> [   16.673081] [<ffff0000085752c8>] bus_for_each_drv+0x54/0x94
> >>>> [   16.684307] [<ffff000008576db0>] __device_attach+0xc4/0x12c
> >>>> [   16.695533] [<ffff000008577350>] device_initial_probe+0x10/0x18
> >>>> [   16.707462] [<ffff0000085762b4>] bus_probe_device+0x90/0x98
> >>>>
> >>>> After a bit of debug it looks like on platforms where ssid is not supported,
> >>>> s1_cfg.num_contexts is set to zero and it eventually results in this crash
> >>>> in,
> >>>> arm_smmu_domain_finalise_s1() -->arm_smmu_alloc_cd_tables()-->
> >>>> arm_smmu_alloc_cd_leaf_table() as num_leaf_entries is zero.
> >>>>
> >>>> With the below fix, it works on D05 now,
> >>>>
> >>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-
> v3.c
> >>>> index 8ad90e2..51f5821 100644
> >>>> --- a/drivers/iommu/arm-smmu-v3.c
> >>>> +++ b/drivers/iommu/arm-smmu-v3.c
> >>>> @@ -2433,7 +2433,10 @@ static int arm_smmu_domain_finalise(struct
> >>> iommu_domain *domain,
> >>>>                         domain->min_pasid = 1;
> >>>>                         domain->max_pasid = master->num_ssids - 1;
> >>>>                         smmu_domain->s1_cfg.num_contexts = master-
> >num_ssids;
> >>>> +               } else {
> >>>> +                       smmu_domain->s1_cfg.num_contexts = 1;
> >>>>                 }
> >>>> +
> >>>>                 smmu_domain->s1_cfg.can_stall = master->ste.can_stall;
> >>>>                 break;
> >>>>         case ARM_SMMU_DOMAIN_NESTED:
> >>>>
> >>>>
> >>>> I am not sure this is right place do this. Please take a look.
> >>>
> >>> Thanks for testing the series and reporting the bug. I added the
> >>> following patch to branch svm/current, does it work for you?
> >>
> >> Yes, it does.
> >>
> >> Thanks,
> >> Shameer
> >>
> >>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-
> v3.c
> >>> index 42c8378624ed..edda466adc81 100644
> >>> --- a/drivers/iommu/arm-smmu-v3.c
> >>> +++ b/drivers/iommu/arm-smmu-v3.c
> >>> @@ -3169,9 +3169,7 @@ static int arm_smmu_add_device(struct device
> *dev)
> >>>                 }
> >>>         }
> >>>
> >>> -       if (smmu->ssid_bits)
> >>> -               master->num_ssids = 1 << min(smmu->ssid_bits,
> >>> -                                            fwspec->num_pasid_bits);
> >>> +       master->num_ssids = 1 << min(smmu->ssid_bits, fwspec-
> >>>> num_pasid_bits);
> >
> > If fwspec->num_pasid_bits = 0, then master have _one_ num_ssids ?
> 
> Yes, the context table allocator always needs to allocate at least one
> entry, even if the master or SMMU doesn't support SSID. I think an earlier
> version called this field "num_contexts"; maybe we should go back to that
> name for clarity?

+1 for that. As an SSID can be zero per the spec, num_ssids = 1 would be slightly misleading.

Thanks,
Shameer




* [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for Substream IDs
@ 2017-11-03  9:39               ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 268+ messages in thread
From: Shameerali Kolothum Thodi @ 2017-11-03  9:39 UTC (permalink / raw)
  To: linux-arm-kernel



> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker at arm.com]
> Sent: Friday, November 03, 2017 9:37 AM
> To: xieyisheng (A) <xieyisheng1@huawei.com>; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>
> Cc: linux-arm-kernel at lists.infradead.org; linux-pci at vger.kernel.org; linux-
> acpi at vger.kernel.org; devicetree at vger.kernel.org; iommu at lists.linux-
> foundation.org; Mark Rutland <Mark.Rutland@arm.com>; Gabriele Paoloni
> <gabriele.paoloni@huawei.com>; Catalin Marinas
> <Catalin.Marinas@arm.com>; Will Deacon <Will.Deacon@arm.com>;
> okaya at codeaurora.org; yi.l.liu at intel.com; Lorenzo Pieralisi
> <Lorenzo.Pieralisi@arm.com>; ashok.raj at intel.com; tn at semihalf.com;
> joro at 8bytes.org; rfranz at cavium.com; lenb at kernel.org;
> jacob.jun.pan at linux.intel.com; alex.williamson at redhat.com;
> robh+dt at kernel.org; Leizhen (ThunderTown) <thunder.leizhen@huawei.com>;
> bhelgaas at google.com; dwmw2 at infradead.org; liubo (CU)
> <liubo95@huawei.com>; rjw at rjwysocki.net; robdclark at gmail.com;
> hanjun.guo at linaro.org; Sudeep Holla <Sudeep.Holla@arm.com>; Robin
> Murphy <Robin.Murphy@arm.com>; nwatters at codeaurora.org; Linuxarm
> <linuxarm@huawei.com>
> Subject: Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for
> Substream IDs
> 
> On 03/11/17 05:45, Yisheng Xie wrote:
> > Hi Jean,
> >
> > On 2017/11/3 1:02, Shameerali Kolothum Thodi wrote:
> >>
> >>
> >>> -----Original Message-----
> >>> From: Jean-Philippe Brucker [mailto:Jean-Philippe.Brucker at arm.com]
> >>> Sent: Thursday, November 02, 2017 3:52 PM
> >>> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>
> >>> Cc: linux-arm-kernel at lists.infradead.org; linux-pci at vger.kernel.org; linux-
> >>> acpi at vger.kernel.org; devicetree at vger.kernel.org; iommu at lists.linux-
> >>> foundation.org; Mark Rutland <Mark.Rutland@arm.com>; xieyisheng (A)
> >>> <xieyisheng1@huawei.com>; Gabriele Paoloni
> >>> <gabriele.paoloni@huawei.com>; Catalin Marinas
> >>> <Catalin.Marinas@arm.com>; Will Deacon <Will.Deacon@arm.com>;
> >>> okaya at codeaurora.org; yi.l.liu at intel.com; Lorenzo Pieralisi
> >>> <Lorenzo.Pieralisi@arm.com>; ashok.raj at intel.com; tn at semihalf.com;
> >>> joro at 8bytes.org; rfranz at cavium.com; lenb at kernel.org;
> >>> jacob.jun.pan at linux.intel.com; alex.williamson at redhat.com;
> >>> robh+dt at kernel.org; Leizhen (ThunderTown)
> <thunder.leizhen@huawei.com>;
> >>> bhelgaas at google.com; dwmw2 at infradead.org; liubo (CU)
> >>> <liubo95@huawei.com>; rjw at rjwysocki.net; robdclark at gmail.com;
> >>> hanjun.guo at linaro.org; Sudeep Holla <Sudeep.Holla@arm.com>; Robin
> >>> Murphy <Robin.Murphy@arm.com>; nwatters at codeaurora.org; Linuxarm
> >>> <linuxarm@huawei.com>
> >>> Subject: Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for
> >>> Substream IDs
> >>>
> >>> Hi Shameer,
> >>>
> >>> On Thu, Nov 02, 2017 at 12:49:32PM +0000, Shameerali Kolothum Thodi
> wrote:
> >>>> We had a go with this series on HiSIlicon D05 platform which doesn't have
> >>>> support for ssids/ATS/PRI, to make sure it generally works.
> >>>>
> >>>> But observed the below crash on boot,
> >>>>
> >>>> [   16.009084] WARNING: CPU: 59 PID: 391 at mm/page_alloc.c:3883
> >>> __alloc_pages_nodemask+0x19c/0xc48
> >>>> [   16.026797] Modules linked in:
> >>>> [   16.032944] CPU: 59 PID: 391 Comm: kworker/59:1 Not tainted 4.14.0-
> rc1-
> >>> 159539-ge42aca3 #236
> >>>> [...]
> >>>> [   16.068206] Workqueue: events deferred_probe_work_func
> >>>> [   16.078557] task: ffff8017d38a0000 task.stack: ffff00000b198000
> >>>> [   16.090486] PC is at __alloc_pages_nodemask+0x19c/0xc48
> >>>> [   16.101013] LR is at __alloc_pages_nodemask+0xe0/0xc48
> >>>> [   16.469220] [<ffff000008186b94>]
> __alloc_pages_nodemask+0x19c/0xc48
> >>>> [   16.481854] [<ffff0000081d65b0>] alloc_pages_current+0x80/0xcc
> >>>> [   16.493607] [<ffff000008182be8>] __get_free_pages+0xc/0x38
> >>>> [   16.504661] [<ffff0000083c4d58>] swiotlb_alloc_coherent+0x64/0x190
> >>>> [   16.517117] [<ffff00000809824c>] __dma_alloc+0x110/0x204
> >>>> [   16.527820] [<ffff00000858e850>] dmam_alloc_coherent+0x88/0xf0
> >>>> [   16.539575] [<ffff000008568884>]
> >>> arm_smmu_domain_finalise_s1+0x60/0x248
> >>>> [   16.552909] [<ffff00000856c104>]
> arm_smmu_attach_dev+0x264/0x300
> >>>> [   16.565013] [<ffff00000855d40c>] __iommu_attach_device+0x48/0x5c
> >>>> [   16.577117] [<ffff00000855e698>]
> iommu_group_add_device+0x144/0x3a4
> >>>> [   16.589746] [<ffff00000855ed18>]
> iommu_group_get_for_dev+0x70/0xf8
> >>>> [   16.602201] [<ffff00000856a314>]
> arm_smmu_add_device+0x1a4/0x418
> >>>> [   16.614308] [<ffff00000849dfcc>] iort_iommu_configure+0xf0/0x16c
> >>>> [   16.626416] [<ffff000008468c50>] acpi_dma_configure+0x30/0x70
> >>>> [   16.637994] [<ffff00000858f00c>] dma_configure+0xa8/0xd4
> >>>> [   16.648695] [<ffff00000857706c>] driver_probe_device+0x1a4/0x2dc
> >>>> [   16.673081] [<ffff0000085752c8>] bus_for_each_drv+0x54/0x94
> >>>> [   16.684307] [<ffff000008576db0>] __device_attach+0xc4/0x12c
> >>>> [   16.695533] [<ffff000008577350>] device_initial_probe+0x10/0x18
> >>>> [   16.707462] [<ffff0000085762b4>] bus_probe_device+0x90/0x98
> >>>>
> >>>> After a bit of debug it looks like on platforms where ssid is not supported,
> >>>> s1_cfg.num_contexts is set to zero and it eventually results in this crash
> >>>> in,
> >>>> arm_smmu_domain_finalise_s1() -->arm_smmu_alloc_cd_tables()-->
> >>>> arm_smmu_alloc_cd_leaf_table() as num_leaf_entries is zero.
> >>>>
> >>>> With the below fix, it works on D05 now,
> >>>>
> >>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-
> v3.c
> >>>> index 8ad90e2..51f5821 100644
> >>>> --- a/drivers/iommu/arm-smmu-v3.c
> >>>> +++ b/drivers/iommu/arm-smmu-v3.c
> >>>> @@ -2433,7 +2433,10 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
> >>>>                         domain->min_pasid = 1;
> >>>>                         domain->max_pasid = master->num_ssids - 1;
> >>>>                         smmu_domain->s1_cfg.num_contexts = master->num_ssids;
> >>>> +               } else {
> >>>> +                       smmu_domain->s1_cfg.num_contexts = 1;
> >>>>                 }
> >>>> +
> >>>>                 smmu_domain->s1_cfg.can_stall = master->ste.can_stall;
> >>>>                 break;
> >>>>         case ARM_SMMU_DOMAIN_NESTED:
> >>>>
> >>>>
> >>>> I am not sure this is the right place to do this. Please take a look.
> >>>
> >>> Thanks for testing the series and reporting the bug. I added the
> >>> following patch to branch svm/current, does it work for you?
> >>
> >> Yes, it does.
> >>
> >> Thanks,
> >> Shameer
> >>
> >>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> >>> index 42c8378624ed..edda466adc81 100644
> >>> --- a/drivers/iommu/arm-smmu-v3.c
> >>> +++ b/drivers/iommu/arm-smmu-v3.c
> >>> @@ -3169,9 +3169,7 @@ static int arm_smmu_add_device(struct device *dev)
> >>>                 }
> >>>         }
> >>>
> >>> -       if (smmu->ssid_bits)
> >>> -               master->num_ssids = 1 << min(smmu->ssid_bits,
> >>> -                                            fwspec->num_pasid_bits);
> >>> +       master->num_ssids = 1 << min(smmu->ssid_bits, fwspec->num_pasid_bits);
> >
> > If fwspec->num_pasid_bits = 0, then master have _one_ num_ssids ?
> 
> Yes, the context table allocator always needs to allocate at least one
> entry, even if the master or SMMU doesn't support SSID. I think an earlier
> version called this field "num_contexts", maybe we should go back to that
> name for clarity?

+1 for that. As ssid can be zero as per the spec, num_ssids=1 will be slightly misleading.

Thanks,
Shameer

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for Substream IDs
  2017-11-03  9:37             ` Jean-Philippe Brucker
  (?)
@ 2017-11-06  0:50               ` Yisheng Xie
  -1 siblings, 0 replies; 268+ messages in thread
From: Yisheng Xie @ 2017-11-06  0:50 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Shameerali Kolothum Thodi
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu,
	Mark Rutland, Gabriele Paoloni, Catalin Marinas, Will Deacon,
	okaya, yi.l.liu, Lorenzo Pieralisi, ashok.raj, tn, joro,
	rfranz@cavium.com



On 2017/11/3 17:37, Jean-Philippe Brucker wrote:
> On 03/11/17 05:45, Yisheng Xie wrote:
>> Hi Jean,
>>
>> On 2017/11/3 1:02, Shameerali Kolothum Thodi wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Jean-Philippe Brucker [mailto:Jean-Philippe.Brucker@arm.com]
>>>> Sent: Thursday, November 02, 2017 3:52 PM
>>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>>>> Cc: linux-arm-kernel@lists.infradead.org; linux-pci@vger.kernel.org; linux-
>>>> acpi@vger.kernel.org; devicetree@vger.kernel.org; iommu@lists.linux-
>>>> foundation.org; Mark Rutland <Mark.Rutland@arm.com>; xieyisheng (A)
>>>> <xieyisheng1@huawei.com>; Gabriele Paoloni
>>>> <gabriele.paoloni@huawei.com>; Catalin Marinas
>>>> <Catalin.Marinas@arm.com>; Will Deacon <Will.Deacon@arm.com>;
>>>> okaya@codeaurora.org; yi.l.liu@intel.com; Lorenzo Pieralisi
>>>> <Lorenzo.Pieralisi@arm.com>; ashok.raj@intel.com; tn@semihalf.com;
>>>> joro@8bytes.org; rfranz@cavium.com; lenb@kernel.org;
>>>> jacob.jun.pan@linux.intel.com; alex.williamson@redhat.com;
>>>> robh+dt@kernel.org; Leizhen (ThunderTown) <thunder.leizhen@huawei.com>;
>>>> bhelgaas@google.com; dwmw2@infradead.org; liubo (CU)
>>>> <liubo95@huawei.com>; rjw@rjwysocki.net; robdclark@gmail.com;
>>>> hanjun.guo@linaro.org; Sudeep Holla <Sudeep.Holla@arm.com>; Robin
>>>> Murphy <Robin.Murphy@arm.com>; nwatters@codeaurora.org; Linuxarm
>>>> <linuxarm@huawei.com>
>>>> Subject: Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for
>>>> Substream IDs
>>>>
>>>> Hi Shameer,
>>>>
>>>> On Thu, Nov 02, 2017 at 12:49:32PM +0000, Shameerali Kolothum Thodi wrote:
>>>>> We had a go with this series on the HiSilicon D05 platform, which doesn't have
>>>>> support for ssids/ATS/PRI, to make sure it generally works.
>>>>>
>>>>> But observed the below crash on boot,
>>>>>
>>>>> [   16.009084] WARNING: CPU: 59 PID: 391 at mm/page_alloc.c:3883
>>>> __alloc_pages_nodemask+0x19c/0xc48
>>>>> [   16.026797] Modules linked in:
>>>>> [   16.032944] CPU: 59 PID: 391 Comm: kworker/59:1 Not tainted 4.14.0-rc1-
>>>> 159539-ge42aca3 #236
>>>>> [...]
>>>>> [   16.068206] Workqueue: events deferred_probe_work_func
>>>>> [   16.078557] task: ffff8017d38a0000 task.stack: ffff00000b198000
>>>>> [   16.090486] PC is at __alloc_pages_nodemask+0x19c/0xc48
>>>>> [   16.101013] LR is at __alloc_pages_nodemask+0xe0/0xc48
>>>>> [   16.469220] [<ffff000008186b94>] __alloc_pages_nodemask+0x19c/0xc48
>>>>> [   16.481854] [<ffff0000081d65b0>] alloc_pages_current+0x80/0xcc
>>>>> [   16.493607] [<ffff000008182be8>] __get_free_pages+0xc/0x38
>>>>> [   16.504661] [<ffff0000083c4d58>] swiotlb_alloc_coherent+0x64/0x190
>>>>> [   16.517117] [<ffff00000809824c>] __dma_alloc+0x110/0x204
>>>>> [   16.527820] [<ffff00000858e850>] dmam_alloc_coherent+0x88/0xf0
>>>>> [   16.539575] [<ffff000008568884>] arm_smmu_domain_finalise_s1+0x60/0x248
>>>>> [   16.552909] [<ffff00000856c104>] arm_smmu_attach_dev+0x264/0x300
>>>>> [   16.565013] [<ffff00000855d40c>] __iommu_attach_device+0x48/0x5c
>>>>> [   16.577117] [<ffff00000855e698>] iommu_group_add_device+0x144/0x3a4
>>>>> [   16.589746] [<ffff00000855ed18>] iommu_group_get_for_dev+0x70/0xf8
>>>>> [   16.602201] [<ffff00000856a314>] arm_smmu_add_device+0x1a4/0x418
>>>>> [   16.614308] [<ffff00000849dfcc>] iort_iommu_configure+0xf0/0x16c
>>>>> [   16.626416] [<ffff000008468c50>] acpi_dma_configure+0x30/0x70
>>>>> [   16.637994] [<ffff00000858f00c>] dma_configure+0xa8/0xd4
>>>>> [   16.648695] [<ffff00000857706c>] driver_probe_device+0x1a4/0x2dc
>>>>> [   16.673081] [<ffff0000085752c8>] bus_for_each_drv+0x54/0x94
>>>>> [   16.684307] [<ffff000008576db0>] __device_attach+0xc4/0x12c
>>>>> [   16.695533] [<ffff000008577350>] device_initial_probe+0x10/0x18
>>>>> [   16.707462] [<ffff0000085762b4>] bus_probe_device+0x90/0x98
>>>>>
>>>>> After a bit of debugging, it looks like on platforms where SSID is not supported,
>>>>> s1_cfg.num_contexts is set to zero and it eventually results in this crash
>>>>> in,
>>>>> arm_smmu_domain_finalise_s1() -->arm_smmu_alloc_cd_tables()-->
>>>>> arm_smmu_alloc_cd_leaf_table() as num_leaf_entries is zero.
>>>>>
>>>>> With the below fix, it works on D05 now,
>>>>>
>>>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>>>>> index 8ad90e2..51f5821 100644
>>>>> --- a/drivers/iommu/arm-smmu-v3.c
>>>>> +++ b/drivers/iommu/arm-smmu-v3.c
>>>>> @@ -2433,7 +2433,10 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
>>>>>                         domain->min_pasid = 1;
>>>>>                         domain->max_pasid = master->num_ssids - 1;
>>>>>                         smmu_domain->s1_cfg.num_contexts = master->num_ssids;
>>>>> +               } else {
>>>>> +                       smmu_domain->s1_cfg.num_contexts = 1;
>>>>>                 }
>>>>> +
>>>>>                 smmu_domain->s1_cfg.can_stall = master->ste.can_stall;
>>>>>                 break;
>>>>>         case ARM_SMMU_DOMAIN_NESTED:
>>>>>
>>>>>
>>>>> I am not sure this is the right place to do this. Please take a look.
>>>>
>>>> Thanks for testing the series and reporting the bug. I added the
>>>> following patch to branch svm/current, does it work for you?
>>>
>>> Yes, it does.
>>>
>>> Thanks,
>>> Shameer
>>>  
>>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>>>> index 42c8378624ed..edda466adc81 100644
>>>> --- a/drivers/iommu/arm-smmu-v3.c
>>>> +++ b/drivers/iommu/arm-smmu-v3.c
>>>> @@ -3169,9 +3169,7 @@ static int arm_smmu_add_device(struct device *dev)
>>>>                 }
>>>>         }
>>>>
>>>> -       if (smmu->ssid_bits)
>>>> -               master->num_ssids = 1 << min(smmu->ssid_bits,
>>>> -                                            fwspec->num_pasid_bits);
>>>> +       master->num_ssids = 1 << min(smmu->ssid_bits, fwspec->num_pasid_bits);
>>
>> If fwspec->num_pasid_bits = 0, then master have _one_ num_ssids ?
> 
> Yes, the context table allocator always needs to allocate at least one
> entry, even if the master or SMMU doesn't support SSID. I think an earlier
> version called this field "num_contexts", maybe we should go back to that
> name for clarity?
> 
Yes, that would be clearer.

Thanks
Yisheng

> Thanks,
> Jean
> 
> .
> 


^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3
  2017-10-12 12:55           ` Jean-Philippe Brucker
  (?)
@ 2017-11-08  1:21             ` Bob Liu
  -1 siblings, 0 replies; 268+ messages in thread
From: Bob Liu @ 2017-11-08  1:21 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Yisheng Xie, linux-arm-kernel, linux-pci,
	linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, thunder.leizhen,
	gabriele.paoloni, nwatters

Hi Jean,

On 2017/10/12 20:55, Jean-Philippe Brucker wrote:
> On 12/10/17 13:05, Yisheng Xie wrote:
> [...]
>>>>> * An iommu_process can be bound to multiple domains, and a domain can have
>>>>>   multiple iommu_process.
>>>> when bind a task to device, can we create a single domain for it? I am thinking
>>>> about process management without shared PT(for some device only support PASID
>>>> without pri ability), it seems hard to expand if a domain have multiple iommu_process?
>>>> Do you have any idea about this?
>>>
>>> A device always has to be in a domain, as far as I know. Not supporting
>>> PRI forces you to pin down all user mappings (or just the ones you use for
>>> DMA) but you should sill be able to share PT. Now if you don't support
>>> shared PT either, but only PASID, then you'll have to use io-pgtable and a
>>> new map/unmap API on an iommu_process. I don't understand your concern
>>> though, how would the link between process and domains prevent this use-case?
>>>
>> So you mean that if an iommu_process bind to multiple devices it should create
>> multiple io-pgtables? or just share the same io-pgtable?
> 
> I don't know to be honest, I haven't thought much about the io-pgtable
> case, I'm all about sharing the mm :)
> 

Sorry to get back to this thread, but the traditional DMA_MAP use case may also want to
enable SubstreamID/PASID.
As a general framework, you may also want to consider SubstreamID/PASID support for DMA map/io-pgtable.

We're considering making io-pgtables per SubstreamID/PASID, but haven't decided whether to put all
io-pgtables into a single domain or into the iommu_process.

Thanks,
Liubo

> It really depends on what the user (GPU driver I assume) wants. I think
> that if you're not sharing an mm with the device, then you're trying to
> hide parts of the process from the device, so you'd also want the
> flexibility of having different io-pgtables between devices. Different
> devices accessing isolated parts of the process require separate io-pgtables.
> 



^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3
  2017-11-08  1:21             ` Bob Liu
  (?)
@ 2017-11-08 10:50               ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-11-08 10:50 UTC (permalink / raw)
  To: Bob Liu, Yisheng Xie, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, thunder.leizhen,
	gabriele.paoloni, nwatters, ok

Hi Liubo,

On 08/11/17 01:21, Bob Liu wrote:
> Hi Jean,
> 
> On 2017/10/12 20:55, Jean-Philippe Brucker wrote:
>> On 12/10/17 13:05, Yisheng Xie wrote:
>> [...]
>>>>>> * An iommu_process can be bound to multiple domains, and a domain can have
>>>>>>   multiple iommu_process.
>>>>> when bind a task to device, can we create a single domain for it? I am thinking
>>>>> about process management without shared PT(for some device only support PASID
>>>>> without pri ability), it seems hard to expand if a domain have multiple iommu_process?
>>>>> Do you have any idea about this?
>>>>
>>>> A device always has to be in a domain, as far as I know. Not supporting
>>>> PRI forces you to pin down all user mappings (or just the ones you use for
>>>> DMA) but you should still be able to share PT. Now if you don't support
>>>> shared PT either, but only PASID, then you'll have to use io-pgtable and a
>>>> new map/unmap API on an iommu_process. I don't understand your concern
>>>> though, how would the link between process and domains prevent this use-case?
>>>>
>>> So you mean that if an iommu_process binds to multiple devices, it should create
>>> multiple io-pgtables, or just share the same io-pgtable?
>>
>> I don't know to be honest, I haven't thought much about the io-pgtable
>> case, I'm all about sharing the mm :)
>>
> 
> Sorry to get back to this thread, but the traditional DMA_MAP use case may also
> want to enable SubstreamID/PASID.
> As a general framework, you may also want to consider SubstreamID/PASID support
> for dma map/io-pgtable.
> 
> We're considering making io-pgtables per SubstreamID/PASID, but haven't decided
> whether to put all io-pgtables into a single domain or into the iommu_process.

Yes they should be in a single domain, see also my other reply here:
http://www.spinics.net/lists/arm-kernel/msg613586.html

I've only been thinking about the IOMMU API for the moment, but I guess
the VFIO API would use this extension? I suppose it would be a new PASID
field in DMA_MAP along with a flag. The PASID would probably be allocated
by BIND plus some special flag.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* RE: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
  2017-10-06 13:31     ` Jean-Philippe Brucker
  (?)
@ 2017-11-08 17:50         ` Bharat Kumar Gogada
  -1 siblings, 0 replies; 268+ messages in thread
From: Bharat Kumar Gogada @ 2017-11-08 17:50 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: mark.rutland-5wv7dgnIgG8,
	gabriele.paoloni-hv44wF8Li93QT0dZR+AlfA,
	catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	rjw-LthD3rsA81gm4RdzfppkhA, sudeep.holla-5wv7dgnIgG8

Hi Jean,

+static struct iommu_process *
+iommu_process_alloc(struct iommu_domain *domain, struct task_struct *task)
+{
+	int err;
+	int pasid;
+	struct iommu_process *process;
+
+	if (WARN_ON(!domain->ops->process_alloc || !domain->ops->process_free))
+		return ERR_PTR(-ENODEV);
+
+	process = domain->ops->process_alloc(task);
+	if (IS_ERR(process))
+		return process;
+	if (!process)
+		return ERR_PTR(-ENOMEM);
+
+	process->pid		= get_task_pid(task, PIDTYPE_PID);
+	process->release	= domain->ops->process_free;
+	INIT_LIST_HEAD(&process->domains);
+	kref_init(&process->kref);
+
+	if (!process->pid) {
+		err = -EINVAL;
+		goto err_free_process;
+	}
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&iommu_process_lock);
+	pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid,
+				 domain->max_pasid + 1, GFP_ATOMIC);
If the EP supports only one PASID, domain->min_pasid = 1 and domain->max_pasid = 0.
When idr_alloc_cyclic is called it invokes idr_get_free_cmn, which contains the
following condition (based on kernel 4.14-rc6):

	if (!radix_tree_tagged(root, IDR_FREE))
		start = max(start, maxindex + 1);
	if (start > max)
		return ERR_PTR(-ENOSPC);

Here max is zero by the time this function is invoked, since it is derived from
domain->max_pasid. The condition fails and -ENOSPC is returned.

In this case, even though the hardware supports PASID, the BIND flow fails.
Any reason why PASID allocation moved to IDR allocation rather than the bitmap
allocation used in the v1 patches?

+	process->pasid = pasid;

Regards,
Bharat

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 21/36] iommu/arm-smmu-v3: Implement process operations
  2017-10-06 13:31     ` Jean-Philippe Brucker
  (?)
@ 2017-11-09  3:32       ` Yisheng Xie
  -1 siblings, 0 replies; 268+ messages in thread
From: Yisheng Xie @ 2017-11-09  3:32 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, gabriele.paoloni, nwatters, okaya, rfranz,
	dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

Hi Jean,

On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
> Hook process operations to support PASID and page table sharing with the
> SMMUv3:
> 
> +
> +static void arm_smmu_process_exit(struct iommu_domain *domain,
> +				  struct iommu_process *process)
> +{
> +	struct arm_smmu_master_data *master;
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +
> +	if (!domain->process_exit)
> +		return;

If the domain does not set a process_exit handler, should we just return? Doesn't
the SMMU still need to invalidate the ATC, clear the CD entry, etc.? Maybe the
check should be done when calling domain->process_exit instead?

> +
> +	spin_lock(&smmu_domain->devices_lock);
> +	list_for_each_entry(master, &smmu_domain->devices, list) {
> +		if (!master->processes)
> +			continue;
> +
> +		master->processes--;
Add
		if (domain->process_exit)
here?
> +		domain->process_exit(domain, master->dev, process->pasid,
> +				     domain->process_exit_token);
> +
> +		/* TODO: inval ATC */
> +	}
> +	spin_unlock(&smmu_domain->devices_lock);
> +
> +	arm_smmu_write_ctx_desc(smmu_domain, process->pasid, NULL);
> +
> +	/* TODO: Invalidate all mappings if not DVM */
> +}
> +
Thanks
Yisheng Xie


^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 21/36] iommu/arm-smmu-v3: Implement process operations
  2017-11-09  3:32       ` Yisheng Xie
  (?)
@ 2017-11-09 12:08         ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-11-09 12:08 UTC (permalink / raw)
  To: Yisheng Xie, linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, gabriele.paoloni, nwatters

Hi,

On 09/11/17 03:32, Yisheng Xie wrote:
> Hi Jean,
> 
> On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
>> Hook process operations to support PASID and page table sharing with the
>> SMMUv3:
>>
>> +
>> +static void arm_smmu_process_exit(struct iommu_domain *domain,
>> +				  struct iommu_process *process)
>> +{
>> +	struct arm_smmu_master_data *master;
>> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +
>> +	if (!domain->process_exit)
>> +		return;
> 
> If domain do not set process_exit handler, just return? smmu do not
> need invalid ATC, clear cd entry, etc.? Maybe you should check when
> call domain->process_exit?

Indeed, that doesn't make sense. I'll move the check below.

Thanks,
Jean

>> +
>> +	spin_lock(&smmu_domain->devices_lock);
>> +	list_for_each_entry(master, &smmu_domain->devices, list) {
>> +		if (!master->processes)
>> +			continue;
>> +
>> +		master->processes--;
> Add
> 		if (domain->process_exit)
> here?
>> +		domain->process_exit(domain, master->dev, process->pasid,
>> +				     domain->process_exit_token);
>> +
>> +		/* TODO: inval ATC */
>> +	}
>> +	spin_unlock(&smmu_domain->devices_lock);
>> +
>> +	arm_smmu_write_ctx_desc(smmu_domain, process->pasid, NULL);
>> +
>> +	/* TODO: Invalidate all mappings if not DVM */
>> +}
>> +
> Thanks
> Yisheng Xie
> 
> 


^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
  2017-11-08 17:50         ` Bharat Kumar Gogada
  (?)
@ 2017-11-09 12:13           ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-11-09 12:13 UTC (permalink / raw)
  To: Bharat Kumar Gogada, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni

Hi Bharat,

On 11/08/2017 05:50 PM, Bharat Kumar Gogada wrote:
> Hi Jean,
> 
> +static struct iommu_process *
> +iommu_process_alloc(struct iommu_domain *domain, struct task_struct
> +*task) {
> +       int err;
> +       int pasid;
> +       struct iommu_process *process;
> +
> +       if (WARN_ON(!domain->ops->process_alloc ||
> !domain->ops->process_free))
> +               return ERR_PTR(-ENODEV);
> +
> +       process = domain->ops->process_alloc(task);
> +       if (IS_ERR(process))
> +               return process;
> +       if (!process)
> +               return ERR_PTR(-ENOMEM);
> +
> +       process->pid            = get_task_pid(task, PIDTYPE_PID);
> +       process->release        = domain->ops->process_free;
> +       INIT_LIST_HEAD(&process->domains);
> +       kref_init(&process->kref);
> +
> +       if (!process->pid) {
> +               err = -EINVAL;
> +               goto err_free_process;
> +       }
> +
> +       idr_preload(GFP_KERNEL);
> +       spin_lock(&iommu_process_lock);
> +       pasid = idr_alloc_cyclic(&iommu_process_idr, process,
> domain->min_pasid,
> +                                domain->max_pasid + 1, GFP_ATOMIC);
> If EP supports only one pasid; domain->min_pasid=1 and domain->max_pasid=0.
> When idr_alloc_cyclic is called it invokes idr_get_free_cmn function
> where we have following condition. (Based on kernel 4.14-rc6)
>         if (!radix_tree_tagged(root, IDR_FREE))
>                 start = max(start, maxindex + 1);
>               if (start > max)
>                 return ERR_PTR(-ENOSPC);
> Here max is being assigned zero by the time this function is invoked,
> this value is based on domain->max_pasid.
> This condition fails and ENOSPC is returned.
>        
> In this case even though hardware supports PASID, BIND flow fails.

It should fail, since we're reserving PASID 0 for non-PASID transactions
with S1DSS=0b10. In addition, the SMMUv3 specification does not allow
using PASID with a single entry. See the description of S1CDMax in 5.2
Stream Table Entry:

"when this field is 0, the substreams of the STE are disabled and one CD
is available. (The minimum useful number of substreams is 2.) Any
transaction with a SubstreamID will be terminated with an abort and a
C_BAD_SUBSTREAMID event recorded."

> Any reason why pasid allocation moved to idr allocations rather than
> bitmap allocations as in v1 patches ?

Yes, an IDR provides a convenient way to quickly retrieve the context
associated with a PASID when handling a fault. v1 allocated from a bitmap
and stored contexts in an rb-tree; using an IDR combines both and relies
on well-tested infrastructure.

Note that in the future we might need to go back to handcrafting the PASID
allocation, but it will probably still be based on idr.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
@ 2017-11-09 12:13           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-11-09 12:13 UTC (permalink / raw)
  To: Bharat Kumar Gogada, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: Mark Rutland, xieyisheng1, gabriele.paoloni, Catalin Marinas,
	Will Deacon, okaya, yi.l.liu, Lorenzo Pieralisi, ashok.raj, tn,
	joro, rfranz, lenb, jacob.jun.pan, alex.williamson, robh+dt,
	thunder.leizhen, bhelgaas, dwmw2, liubo95, rjw, robdclark,
	hanjun.guo, Sudeep Holla, Robin Murphy, nwatters

Hi Bharat,

On 11/08/2017 05:50 PM, Bharat Kumar Gogada wrote:
> Hi Jean,
> 
> +static struct iommu_process *
> +iommu_process_alloc(struct iommu_domain *domain, struct task_struct
> +*task) {
> +       int err;
> +       int pasid;
> +       struct iommu_process *process;
> +
> +       if (WARN_ON(!domain->ops->process_alloc ||
> !domain->ops->process_free))
> +               return ERR_PTR(-ENODEV);
> +
> +       process = domain->ops->process_alloc(task);
> +       if (IS_ERR(process))
> +               return process;
> +       if (!process)
> +               return ERR_PTR(-ENOMEM);
> +
> +       process->pid            = get_task_pid(task, PIDTYPE_PID);
> +       process->release        = domain->ops->process_free;
> +       INIT_LIST_HEAD(&process->domains);
> +       kref_init(&process->kref);
> +
> +       if (!process->pid) {
> +               err = -EINVAL;
> +               goto err_free_process;
> +       }
> +
> +       idr_preload(GFP_KERNEL);
> +       spin_lock(&iommu_process_lock);
> +       pasid = idr_alloc_cyclic(&iommu_process_idr, process,
> domain->min_pasid,
> +                                domain->max_pasid + 1, GFP_ATOMIC);
> If EP supports only one pasid; domain->min_pasid=1 and domain->max_pasid=0.
> When idr_alloc_cyclic is called it invokes idr_get_free_cmn function
> where we have following condition. (Based on kernel 4.14-rc6)
>         if (!radix_tree_tagged(root, IDR_FREE))
>                 start = max(start, maxindex + 1);
>               if (start > max)
>                 return ERR_PTR(-ENOSPC);
> Here max is being assigned zero by the time this function is invoked,
> this value is based on domain->max_pasid.
> This condition fails and ENOSPC is returned.
>        
> In this case even though hardware supports PASID, BIND flow fails.

It should fail, since we're reserving PASID 0 for non-PASID transactions
with S1DSS=0b10. In addition, the SMMUv3 specification does not allow
using PASID with a single entry. See the description of S1CDMax in 5.2
Stream Table Entry:

"when this field is 0, the substreams of the STE are disabled and one CD
is available. (The minimum useful number of substreams is 2.) Any
transaction with a SubstreamID will be terminated with an abort and a
C_BAD_SUBSTREAMID event recorded."

> Any reason why pasid allocation moved to idr allocations rather than
> bitmap allocations as in v1 patches ?

Yes, idr provides a convenient way to quickly retrieve the context
associated with a PASID, when handling a fault. v1 had the allocation in a
bitmap and storing in a rb-tree. By using an idr we combine both and rely
on a well-tested infrastructure.

Note that in the future we might need to go back to handcrafting the PASID
allocation, but it will probably still be based on idr.

Thanks,
Jean

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 268+ messages in thread

* RE: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
  2017-11-09 12:16             ` Jean-Philippe Brucker
@ 2017-11-13 11:06                 ` Bharat Kumar Gogada
  -1 siblings, 0 replies; 268+ messages in thread
From: Bharat Kumar Gogada @ 2017-11-13 11:06 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	lorenzo.pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, okaya,
	rfranz, dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

>> In this case even though hardware supports PASID, BIND flow fails.
>
> It should fail, since we're reserving PASID 0 for non-PASID transactions with S1DSS=0b10. In addition, the SMMUv3 specification does not allow using PASID with a single entry. See the description of S1CDMax in 5.2 Stream Table Entry:
>
> "when this field is 0, the substreams of the STE are disabled and one CD is available. (The minimum useful number of substreams is 2.) Any transaction with a SubstreamID will be terminated with an abort and a C_BAD_SUBSTREAMID event recorded."
>
>> Any reason why pasid allocation moved to idr allocations rather than
>> bitmap allocations as in v1 patches ?
>
> Yes, idr provides a convenient way to quickly retrieve the context associated with a PASID, when handling a fault. v1 had the allocation in a bitmap and storing in a rb-tree. By using an idr we combine both and rely on a well-tested infrastructure.
>
> Note that in the future we might need to go back to handcrafting the PASID allocation, but it will probably still be based on idr.

Thanks for the clarification, Jean. 
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 268+ messages in thread

* RE: [RFCv2 PATCH 31/36] iommu/arm-smmu-v3: Add support for PCI ATS
  2017-10-06 13:31   ` Jean-Philippe Brucker
@ 2017-11-16 14:19     ` Bharat Kumar Gogada
  -1 siblings, 0 replies; 268+ messages in thread
From: Bharat Kumar Gogada @ 2017-11-16 14:19 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1

Hi Jean,

+static size_t
+arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
+			unsigned long iova, size_t size)
+{
+	unsigned long flags;
+	struct arm_smmu_cmdq_ent cmd;
+	struct arm_smmu_master_data *master;
+
+	arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd);
+
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_for_each_entry(master, &smmu_domain->devices, list)
+		arm_smmu_atc_inv_master(master, &cmd);
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+
+	return size;
+}
+
 /* IOMMU API */
 static bool arm_smmu_capable(enum iommu_cap cap)
 {
@@ -2361,6 +2506,8 @@ static void arm_smmu_detach_dev(struct device *dev)
 		__iommu_process_unbind_dev_all(&smmu_domain->domain, dev);
 
 	if (smmu_domain) {
+		arm_smmu_atc_inv_master_all(master, 0);
+
In the BIND flow, when VFIO_IOMMU_UNBIND is invoked, an invalidation is sent on the PASID allocated for this application.
When the VFIO group fd is closed after UNBIND, arm_smmu_detach_dev is invoked, and the invalidation is now sent on SSID zero.
Why does an invalidation need to be sent on SSID zero?

Regards,
Bharat


^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 31/36] iommu/arm-smmu-v3: Add support for PCI ATS
  2017-11-16 14:19     ` Bharat Kumar Gogada
@ 2017-11-16 15:03         ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-11-16 15:03 UTC (permalink / raw)
  To: Bharat Kumar Gogada, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: Mark Rutland, xieyisheng1, gabriele.paoloni, Catalin Marinas,
	Will Deacon, okaya, yi.l.liu, Lorenzo Pieralisi, ashok.raj, tn,
	joro, rfranz, lenb, jacob.jun.pan, alex.williamson, robh+dt,
	thunder.leizhen, bhelgaas, dwmw2, liubo95, rjw, robdclark,
	hanjun.guo, Sudeep Holla, Robin Murphy, nwatters

Hi Bharat,

On 16/11/17 14:19, Bharat Kumar Gogada wrote:
> Hi Jean,
> 
> +static size_t
> +arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
> +			unsigned long iova, size_t size)
> +{
> +	unsigned long flags;
> +	struct arm_smmu_cmdq_ent cmd;
> +	struct arm_smmu_master_data *master;
> +
> +	arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd);
> +
> +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> +	list_for_each_entry(master, &smmu_domain->devices, list)
> +		arm_smmu_atc_inv_master(master, &cmd);
> +	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> +
> +	return size;
> +}
> +
>  /* IOMMU API */
>  static bool arm_smmu_capable(enum iommu_cap cap)
>  {
> @@ -2361,6 +2506,8 @@ static void arm_smmu_detach_dev(struct device *dev)
>  		__iommu_process_unbind_dev_all(&smmu_domain->domain, dev);
>  
>  	if (smmu_domain) {
> +		arm_smmu_atc_inv_master_all(master, 0);
> +
> In the BIND flow, when VFIO_IOMMU_UNBIND is invoked, an invalidation is sent on the PASID allocated for this application.
> When the VFIO group fd is closed after UNBIND, arm_smmu_detach_dev is invoked, and the invalidation is now sent on SSID zero.
> Why does an invalidation need to be sent on SSID zero?

It's possible to use the bind/unbind and map/unmap APIs at the same time
on a domain. map/unmap modifies non-SSID mappings as usual, via context
descriptor 0. Note that SSID 0 is converted to "no SSID" by
arm_smmu_atc_inv_to_cmd.

That said, I think VFIO cleans up all DMA mappings when the VFIO group fd
is closed, which sends an individual ATC invalidation for each mapping.
But VFIO may not be the only user of this API, and future users may be
less careful. It's safer to send this global invalidation whenever we
detach the domain, to ensure we're not leaving stale ATC entries for the
next user.
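
The convention described above can be sketched as follows (a hypothetical
userspace illustration, not the actual arm_smmu_atc_inv_to_cmd; the field
names and the size-0 "global" rule here are assumptions for the sketch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch: SSID 0 denotes the non-substream context, so the
 * ATC invalidation command is built with the SSID marked invalid, and a
 * zero size is treated as a whole-ATC ("global") invalidation, as used
 * on the detach path.
 */
struct toy_atc_cmd {
	unsigned int ssid;
	bool ssid_valid;	/* false: targets non-SSID translations */
	unsigned long addr;
	bool global;		/* true: invalidate the entire ATC */
};

static void toy_atc_inv_to_cmd(unsigned int ssid, unsigned long iova,
			       size_t size, struct toy_atc_cmd *cmd)
{
	cmd->ssid = ssid;
	cmd->ssid_valid = ssid != 0;	/* SSID 0 -> "no SSID" */
	cmd->addr = iova;
	cmd->global = size == 0;	/* detach path passes size 0 */
}
```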

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* RE: [RFCv2 PATCH 31/36] iommu/arm-smmu-v3: Add support for PCI ATS
  2017-11-16 15:03         ` Jean-Philippe Brucker
@ 2017-11-17  6:11             ` Bharat Kumar Gogada
  -1 siblings, 0 replies; 268+ messages in thread
From: Bharat Kumar Gogada @ 2017-11-17  6:11 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: Mark Rutland, xieyisheng1, gabriele.paoloni, Catalin Marinas,
	Will Deacon, okaya, yi.l.liu, lorenzo.pieralisi, ashok.raj, tn,
	joro, rfranz, lenb, jacob.jun.pan, alex.williamson, robh+dt,
	thunder.leizhen, bhelgaas, dwmw2, liubo95, rjw, robdclark,
	hanjun.guo, Sudeep Holla, Robin Murphy, nwatters

> +
> +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> +	list_for_each_entry(master, &smmu_domain->devices, list)
> +		arm_smmu_atc_inv_master(master, &cmd);
> +	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> +
> +	return size;
> +}
> +
>  /* IOMMU API */
>  static bool arm_smmu_capable(enum iommu_cap cap)
>  {
> @@ -2361,6 +2506,8 @@ static void arm_smmu_detach_dev(struct device *dev)
>  		__iommu_process_unbind_dev_all(&smmu_domain->domain, dev);
>  
>  	if (smmu_domain) {
> +		arm_smmu_atc_inv_master_all(master, 0);
> +
> In BIND flow, when VFIO_IOMMU_UNBIND is invoked invalidation is sent on allocated PASID for this application.
> When vfio group fd is closed after UNBIND,  arm_smmu_detach_dev is invoked now invalidation is being sent on ssid zero.
> Why  invalidation needs to be sent on ssid zero ?

It's possible to use bind/unbind and map/unmap APIs at the same time on a domain. map/unmap modifies non-ssid mappings as usual, for context descriptor 0. Note that SSID 0 is converted to "no SSID" by arm_smmu_atc_inv_to_cmd.

That said, I think VFIO cleans all DMA mappings when the VFIO group fd is closed, which will send individual ATC invalidation for each mapping. But VFIO may not be the only user of this API, and future users may be less careful. It's safer to send this global invalidation whenever we detach the domain, to ensure we're not leaving stale ATC entries for the next user.

Thanks Jean. I see that currently vfio_group_fops_open does not allow multiple instances.
If a device supports multiple PASIDs, there might be different applications running in parallel.
So why are multiple instances restricted?

Regards,
Bharat

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 31/36] iommu/arm-smmu-v3: Add support for PCI ATS
  2017-11-17  6:11             ` Bharat Kumar Gogada
  (?)
@ 2017-11-17 11:39                 ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-11-17 11:39 UTC (permalink / raw)
  To: Bharat Kumar Gogada, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: Mark Rutland, xieyisheng1, gabriele.paoloni, Catalin Marinas,
	Will Deacon, okaya, yi.l.liu, Lorenzo Pieralisi, ashok.raj, tn,
	joro, rfranz, lenb, jacob.jun.pan, alex.williamson, robh+dt,
	thunder.leizhen, bhelgaas, dwmw2, liubo95, rjw, robdclark,
	hanjun.guo, Sudeep Holla, Robin Murphy, nwatters

On 17/11/17 06:11, Bharat Kumar Gogada wrote:
[...]
> Thanks Jean, I see that currently vfio_group_fops_open does not allow multiple instances. 
> If a device supports multiple PASID there might be different applications running parallel. 
> So why is multiple instances restricted ?

You can't have multiple processes owning the same PCI device; it's
unmanageable.

For using multiple PASIDs, my idea was that the userspace driver ("the
server"), which owns the device, would have a way to partition it into
smaller frames. It forks to create "clients" and assigns a PASID to each
of them (by issuing VFIO_BIND(client_pid) -> pasid, then writing the PASID
into a privileged MMIO frame that defines the partition properties). Each
client accesses an unprivileged MMIO frame to use a device partition (or
sends commands to the server via IPC), and can perform DMA on its own
virtual memory.

This is complete speculation, of course; we have very little information
on how PASID-capable devices will be designed, so I'm trying to imagine
likely scenarios.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
  2017-10-06 13:31     ` Jean-Philippe Brucker
  (?)
@ 2017-11-22  3:15       ` Bob Liu
  -1 siblings, 0 replies; 268+ messages in thread
From: Bob Liu @ 2017-11-22  3:15 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, thunder.leizhen,
	xieyisheng1, gabriele.paoloni, nwatters, okaya, rfranz, dwmw2,
	jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

Hey Jean,

On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
> IOMMU drivers need a way to bind Linux processes to devices. This is used
> for Shared Virtual Memory (SVM), where devices support paging. In that
> mode, DMA can directly target virtual addresses of a process.
> 
> Introduce boilerplate code for allocating process structures and binding
> them to devices. Four operations are added to IOMMU drivers:
> 
> * process_alloc, process_free: to create an iommu_process structure and
>   perform architecture-specific operations required to grab the process
>   (for instance on ARM SMMU, pin down the CPU ASID). There is a single
>   iommu_process structure per Linux process.
> 

I'm a bit confused here.
The original meaning of iommu_domain is a virtual address space defined by a set of IO page tables
(fix me if I misunderstood).
Then what's the meaning of iommu_domain and iommu_process after introducing iommu_process?
Could you consider documenting these concepts?

Thanks,
Liubo

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
  2017-11-22  3:15       ` Bob Liu
  (?)
@ 2017-11-22 13:04         ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-11-22 13:04 UTC (permalink / raw)
  To: Bob Liu, linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, thunder.leizhen,
	xieyisheng1, gabriele.paoloni, nwat

On 22/11/17 03:15, Bob Liu wrote:
> Hey Jean,
> 
> On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
>> IOMMU drivers need a way to bind Linux processes to devices. This is used
>> for Shared Virtual Memory (SVM), where devices support paging. In that
>> mode, DMA can directly target virtual addresses of a process.
>>
>> Introduce boilerplate code for allocating process structures and binding
>> them to devices. Four operations are added to IOMMU drivers:
>>
>> * process_alloc, process_free: to create an iommu_process structure and
>>   perform architecture-specific operations required to grab the process
>>   (for instance on ARM SMMU, pin down the CPU ASID). There is a single
>>   iommu_process structure per Linux process.
>>
> 
> I'm a bit confused here.
> The original meaning of iommu_domain is a virtual addrspace defined by a set of io page table.
> (fix me if I misunderstood).

iommu_domain can also be seen as a logical partition of devices that share
the same address spaces (the concept comes from AMD and Intel IOMMU
domains, I believe). Without PASIDs it was a single address space; with
PASIDs it can have multiple address spaces.

> Then what's the meaning of iommu_domain and iommu_process after introducing iommu_process?
> Could you consider document these concepts? 

iommu_process is used to keep track of Linux process address spaces. I'll
rename it to io_mm in the next version, to make it clear that it doesn't
represent a Linux task but an mm_struct instead. However the
implementation stays pretty much identical. A domain can be associated
with multiple io_mm structures, and an io_mm can be associated with
multiple domains.

In the IOMMU architectures I know, PASID is implemented like this. You
have the device tables (stream tables on SMMU), pointing to PASID tables
(context descriptor tables on SMMU). In the following diagram,

                    .->+--------+
                   / 0 |        |------ io_pgtable
                  /    +--------+
                 /   1 |        |------ io_mm->mm X
    +--------+  /      +--------+
  0 |      A |-'     2 |        |-.
    +--------+         +--------+  \
  1 |        |       3 |        |   \
    +--------+         +--------+    -- io_mm->mm Y
  2 |      B |--.     PASID tables  /
    +--------+   \                 |
  3 |      B |----+--->+--------+  |
    +--------+   /   0 |        |- | -- io_pgtable
  4 |      B |--'      +--------+  |
    +--------+       1 |        |  |
  Device tables        +--------+  |
                     2 |        |--'
                       +--------+
                     3 |        |------ io_mm->priv io_pgtable
                       +--------+
                      PASID tables

* Device 0 (e.g. PCI 0000:00:00.0) is in domain A.
* Devices 2, 3 and 4 are in domain B.
* Domain A has the top set of PASID tables.
* Domain B has the bottom set of PASID tables.

* Domain A is bound to process address space X.
  -> Device 0 can access X with PASID 1.
* Both domains A and B are bound to process address space Y.
  -> Devices 0, 2, 3 and 4 can access Y with PASID 2

* PASID 0 is special on Arm SMMU (with S1DSS=0b10). It will always be
  reserved for classic DMA map/unmap. Even for hypothetical devices that
  don't support non-pasid transactions, I'd like to keep this convention.
  It should be quite useful for device drivers to have PASID 0 available
  with DMA map/unmap.

* When introducing "private" PASID address spaces (that many are asking
  for), which are backed by a set of io-pgtable and map/unmap ops, I
  suppose they would reuse the io_mm structure. In this example PASID 3 is
  associated with a private address space and not backed by an mm. Since the
  PASID space is global, PASID 3 won't be available for any other domain.

Does this clarify the current design, or is it just more confusing?

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
  2017-11-22 13:04         ` Jean-Philippe Brucker
  (?)
@ 2017-11-23 10:33             ` Bob Liu
  -1 siblings, 0 replies; 268+ messages in thread
From: Bob Liu @ 2017-11-23 10:33 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: Mark Rutland, gabriele.paoloni, Catalin Marinas, Will Deacon,
	okaya, rfranz, lenb, robh+dt, bhelgaas, dwmw2, rjw, Sudeep Holla

On 2017/11/22 21:04, Jean-Philippe Brucker wrote:
> On 22/11/17 03:15, Bob Liu wrote:
>> Hey Jean,
>>
>> On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
>>> IOMMU drivers need a way to bind Linux processes to devices. This is used
>>> for Shared Virtual Memory (SVM), where devices support paging. In that
>>> mode, DMA can directly target virtual addresses of a process.
>>>
>>> Introduce boilerplate code for allocating process structures and binding
>>> them to devices. Four operations are added to IOMMU drivers:
>>>
>>> * process_alloc, process_free: to create an iommu_process structure and
>>>   perform architecture-specific operations required to grab the process
>>>   (for instance on ARM SMMU, pin down the CPU ASID). There is a single
>>>   iommu_process structure per Linux process.
>>>
>>
>> I'm a bit confused here.
>> The original meaning of iommu_domain is a virtual addrspace defined by a set of io page table.
>> (fix me if I misunderstood).
> 
> iommu_domain can also be seen as a logical partition of devices that share
> the same address spaces (the concept comes from AMD and Intel IOMMU
> domains, I believe). Without PASIDs it was a single address space, with
> PASIDs it can have multiple address spaces.
> 
>> Then what's the meaning of iommu_domain and iommu_process after introducing iommu_process?
>> Could you consider document these concepts? 
> 
> iommu_process is used to keep track of Linux process address spaces. I'll
> rename it to io_mm in next version, to make it clear that it doesn't
> represent a Linux task but an mm_struct instead. However the
> implementation stays pretty much identical. A domain can be associated to
> multiple io_mm, and an io_mm can be associated to multiple domains.
> 
> In the IOMMU architectures I know, PASID is implemented like this. You
> have the device tables (stream tables on SMMU), pointing to PASID tables
> (context descriptor tables on SMMU). In the following diagram,
> 
>                     .->+--------+
>                    / 0 |        |------ io_pgtable
>                   /    +--------+
>                  /   1 |        |------ io_mm->mm X
>     +--------+  /      +--------+
>   0 |      A |-'     2 |        |-.
>     +--------+         +--------+  \
>   1 |        |       3 |        |   \
>     +--------+         +--------+    -- io_mm->mm Y
>   2 |      B |--.     PASID tables  /
>     +--------+   \                 |
>   3 |      B |----+--->+--------+  |
>     +--------+   /   0 |        |- | -- io_pgtable
>   4 |      B |--'      +--------+  |
>     +--------+       1 |        |  |
>   Device tables        +--------+  |
>                      2 |        |--'
>                        +--------+
>                      3 |        |------ io_mm->priv io_pgtable
>                        +--------+
>                       PASID tables
> 
> * Device 0 (e.g. PCI 0000:00:00.0) is in domain A.
> * Devices 2, 3 and 4 are in domain B.
> * Domain A has the top set of PASID tables.
> * Domain B has the bottom set of PASID tables.
> 
> * Domain A is bound to process address space X.
>   -> Device 0 can access X with PASID 1.
> * Both domains A and B are bound to process address space Y.
>   -> Devices 0, 2, 3 and 4 can access Y with PASID 2
> 
> * PASID 0 is special on Arm SMMU (with S1DSS=0b10). It will always be
>   reserved for classic DMA map/unmap. Even for hypothetical devices that
>   don't support non-pasid transactions, I'd like to keep this convention.
>   It should be quite useful for device drivers to have PASID 0 available
>   with DMA map/unmap.
> 
> * When introducing "private" PASID address spaces (that many are asking
>   for), which are backed by a set of io-pgtable and map/unmap ops, I
>   suppose they would reuse the io_mm structure. In this example PASID 3 is
>   associated to a private address space and not backed by an mm. Since the
>   PASID space is global, PASID 3 won't be available for any other domain.
> 
> Does this clarify the current design, or is it just more confusing?
> 

It's very helpful, thank you very much!

Regards,
Liubo

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs
@ 2017-11-23 10:33             ` Bob Liu
  0 siblings, 0 replies; 268+ messages in thread
From: Bob Liu @ 2017-11-23 10:33 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: Mark Rutland, xieyisheng1, gabriele.paoloni, Catalin Marinas,
	Will Deacon, okaya, yi.l.liu, Lorenzo Pieralisi, ashok.raj, tn,
	joro, rfranz, lenb, jacob.jun.pan, alex.williamson, robh+dt,
	thunder.leizhen, bhelgaas, dwmw2, rjw, robdclark, hanjun.guo,
	Sudeep Holla, Robin Murphy, nwatters

On 2017/11/22 21:04, Jean-Philippe Brucker wrote:
> On 22/11/17 03:15, Bob Liu wrote:
>> Hey Jean,
>>
>> On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
>>> IOMMU drivers need a way to bind Linux processes to devices. This is used
>>> for Shared Virtual Memory (SVM), where devices support paging. In that
>>> mode, DMA can directly target virtual addresses of a process.
>>>
>>> Introduce boilerplate code for allocating process structures and binding
>>> them to devices. Four operations are added to IOMMU drivers:
>>>
>>> * process_alloc, process_free: to create an iommu_process structure and
>>>   perform architecture-specific operations required to grab the process
>>>   (for instance on ARM SMMU, pin down the CPU ASID). There is a single
>>>   iommu_process structure per Linux process.
>>>
>>
>> I'm a bit confused here.
>> The original meaning of iommu_domain is a virtual address space defined by a set of IO page tables.
>> (fix me if I misunderstood).
> 
> iommu_domain can also be seen as a logical partition of devices that share
> the same address spaces (the concept comes from AMD and Intel IOMMU
> domains, I believe). Without PASIDs it was a single address space; with
> PASIDs it can have multiple address spaces.
> 
>> Then what's the meaning of iommu_domain and iommu_process after introducing iommu_process?
>> Could you consider documenting these concepts? 
> 
> iommu_process is used to keep track of Linux process address spaces. I'll
> rename it to io_mm in next version, to make it clear that it doesn't
> represent a Linux task but an mm_struct instead. However the
> implementation stays pretty much identical. A domain can be associated with
> multiple io_mm, and an io_mm can be associated with multiple domains.
> 
> In the IOMMU architectures I know, PASID is implemented like this. You
> have the device tables (stream tables on SMMU), pointing to PASID tables
> (context descriptor tables on SMMU). In the following diagram,
> 
>                     .->+--------+
>                    / 0 |        |------ io_pgtable
>                   /    +--------+
>                  /   1 |        |------ io_mm->mm X
>     +--------+  /      +--------+
>   0 |      A |-'     2 |        |-.
>     +--------+         +--------+  \
>   1 |        |       3 |        |   \
>     +--------+         +--------+    -- io_mm->mm Y
>   2 |      B |--.     PASID tables  /
>     +--------+   \                 |
>   3 |      B |----+--->+--------+  |
>     +--------+   /   0 |        |- | -- io_pgtable
>   4 |      B |--'      +--------+  |
>     +--------+       1 |        |  |
>   Device tables        +--------+  |
>                      2 |        |--'
>                        +--------+
>                      3 |        |------ io_mm->priv io_pgtable
>                        +--------+
>                       PASID tables
> 
> * Device 0 (e.g. PCI 0000:00:00.0) is in domain A.
> * Devices 2, 3 and 4 are in domain B.
> * Domain A has the top set of PASID tables.
> * Domain B has the bottom set of PASID tables.
> 
> * Domain A is bound to process address space X.
>   -> Device 0 can access X with PASID 1.
> * Both domains A and B are bound to process address space Y.
>   -> Devices 0, 2, 3 and 4 can access Y with PASID 2.
> 
> * PASID 0 is special on Arm SMMU (with S1DSS=0b10). It will always be
>   reserved for classic DMA map/unmap. Even for hypothetical devices that
>   don't support non-pasid transactions, I'd like to keep this convention.
>   It should be quite useful for device drivers to have PASID 0 available
>   with DMA map/unmap.
> 
> * When introducing "private" PASID address spaces (that many are asking
>   for), which are backed by a set of io-pgtable and map/unmap ops, I
>   suppose they would reuse the io_mm structure. In this example PASID 3 is
>   associated to a private address space and not backed by an mm. Since the
>   PASID space is global, PASID 3 won't be available for any other domain.
> 
> Does this clarify the current design, or is it just more confusing?
> 

It's very helpful, thank you very much!

Regards,
Liubo




^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 10/36] vfio: Add support for Shared Virtual Memory
  2017-10-06 13:31   ` Jean-Philippe Brucker
@ 2017-11-24  8:23     ` Bob Liu
  -1 siblings, 0 replies; 268+ messages in thread
From: Bob Liu @ 2017-11-24  8:23 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: mark.rutland, xieyisheng1, gabriele.paoloni, catalin.marinas,
	will.deacon, okaya, yi.l.liu, lorenzo.pieralisi, ashok.raj, tn,
	joro, rfranz, lenb, jacob.jun.pan, alex.williamson, robh+dt,
	thunder.leizhen, bhelgaas, dwmw2, rjw, robdclark, hanjun.guo,
	sudeep.holla, robin.murphy, nwatters

On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
> Add two new ioctls for VFIO containers. VFIO_DEVICE_BIND_PROCESS creates a
> bond between a container and a process address space, identified by a
> device-specific ID named PASID. This allows the device to target DMA
> transactions at the process virtual addresses without a need for mapping
> and unmapping buffers explicitly in the IOMMU. The process page tables are
> shared with the IOMMU, and mechanisms such as PCI ATS/PRI may be used to
> handle faults. VFIO_DEVICE_UNBIND_PROCESS removes a bond identified by a
> PASID.
> 

How about hiding bind/unbind inside ioctl(VFIO_SET_IOMMU)?
e.g. always bind to the current process in SET_IOMMU.

I'm not sure about the real use case.

> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 243 +++++++++++++++++++++++++++++++++++++++-
>  include/uapi/linux/vfio.h       |  69 ++++++++++++
>  2 files changed, 311 insertions(+), 1 deletion(-)
> 

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 10/36] vfio: Add support for Shared Virtual Memory
@ 2017-11-24 10:58       ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-11-24 10:58 UTC (permalink / raw)
  To: Bob Liu, linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: Mark Rutland, xieyisheng1, gabriele.paoloni, Catalin Marinas,
	Will Deacon, okaya, yi.l.liu, Lorenzo Pieralisi, ashok.raj, tn,
	joro, rfranz, lenb, jacob.jun.pan, alex.williamson, robh+dt,
	thunder.leizhen, bhelgaas, dwmw2, rjw, robdclark, hanjun.guo,
	Sudeep Holla, Robin Murphy, nwatters

On 24/11/17 08:23, Bob Liu wrote:
> On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
>> Add two new ioctls for VFIO containers. VFIO_DEVICE_BIND_PROCESS creates a
>> bond between a container and a process address space, identified by a
>> device-specific ID named PASID. This allows the device to target DMA
>> transactions at the process virtual addresses without a need for mapping
>> and unmapping buffers explicitly in the IOMMU. The process page tables are
>> shared with the IOMMU, and mechanisms such as PCI ATS/PRI may be used to
>> handle faults. VFIO_DEVICE_UNBIND_PROCESS removes a bond identified by a
>> PASID.
>> 
> 
> How about hiding bind/unbind inside ioctl(VFIO_SET_IOMMU)?
> e.g. always bind to the current process in SET_IOMMU.
> 
> I'm not sure about the real use case.

I guess you could introduce a new VFIO IOMMU type for this. I think this
would be useful for SVA without PASID: if the device supports I/O page
faults, use SET_IOMMU with a VFIO_SVA_IOMMU type (for example) and the
process is bound automatically to the default translation context of the
device. This requires a new IOMMU type because the MAP/UNMAP ioctl won't
work anymore.

I'm not keen on introducing loads of new features into the APIs at the
moment, because I only have the IOMMU point of view, not that of endpoint users.

Thanks,
Jean


^ permalink raw reply	[flat|nested] 268+ messages in thread


* Re: [RFCv2 PATCH 05/36] iommu/process: Bind and unbind process to and from devices
  2017-10-06 13:31     ` Jean-Philippe Brucker
@ 2017-11-29  6:08       ` Yisheng Xie
  -1 siblings, 0 replies; 268+ messages in thread
From: Yisheng Xie @ 2017-11-29  6:08 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: mark.rutland, gabriele.paoloni, catalin.marinas, will.deacon,
	okaya, yi.l.liu, lorenzo.pieralisi, ashok.raj, tn, joro, rfranz,
	lenb, jacob.jun.pan, alex.williamson, robh+dt, thunder.leizhen,
	bhelgaas, dwmw2, liubo95, rjw, robdclark, hanjun.guo,
	sudeep.holla, robin.murphy, nwatters



On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
> +int iommu_process_bind_device(struct device *dev, struct task_struct *task,
> +			      int *pasid, int flags)
> +{
[..]
> +			err = iommu_process_attach_locked(context, dev);
> +			if (err)
> +				iommu_process_put_locked(process);
One reference per context is enough, right? So iommu_process_put_locked() should also
be called when attach succeeds; otherwise the reference will leak if a user calls bind
twice for the same device and task.

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 09/36] iommu/fault: Allow blocking fault handlers
@ 2017-11-29  6:15       ` Yisheng Xie
  0 siblings, 0 replies; 268+ messages in thread
From: Yisheng Xie @ 2017-11-29  6:15 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, gabriele.paoloni, nwatters, okaya, rfranz,
	dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

Hi Jean,

On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
> -	if (domain->ext_handler) {
> +	if (domain->handler_flags & IOMMU_FAULT_HANDLER_ATOMIC) {
> +		fault->flags |= IOMMU_FAULT_ATOMIC;

Why remove the check of domain->ext_handler? Wouldn't it be better written as:
  if ((domain->handler_flags & IOMMU_FAULT_HANDLER_ATOMIC) && domain->ext_handler)

If domain->ext_handler is NULL while (domain->handler_flags & IOMMU_FAULT_HANDLER_ATOMIC)
is true, it will oops, right?

>  		ret = domain->ext_handler(domain, dev, fault,
>  					  domain->handler_token);

Thanks
Yisheng Xie


^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 05/36] iommu/process: Bind and unbind process to and from devices
  2017-11-29  6:08       ` Yisheng Xie
  (?)
@ 2017-11-29 15:01         ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-11-29 15:01 UTC (permalink / raw)
  To: Yisheng Xie, linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, gabriele.paoloni, nwatters

On 29/11/17 06:08, Yisheng Xie wrote:
> 
> 
> On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
>> +int iommu_process_bind_device(struct device *dev, struct task_struct *task,
>> +			      int *pasid, int flags)
>> +{
> [..]
>> +			err = iommu_process_attach_locked(context, dev);
>> +			if (err)
>> +				iommu_process_put_locked(process);
> One ref for a context is enough, right? So iommu_process_put_locked() should also be
> called when attach succeeds, or there will be a leak if the user calls bind twice for
> the same device and task.

I wasn't sure; I think I prefer taking one ref for each bind. If the user calls bind
twice, it should call unbind twice as well (in case of a leak we free the context on
process exit).

Also with this implementation, user can call bind for two devices in the
same domain, which will share the same context structure. So we have to
take as many refs as bind() calls.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 05/36] iommu/process: Bind and unbind process to and from devices
  2017-11-29 15:01         ` Jean-Philippe Brucker
  (?)
@ 2017-11-30  1:11           ` Yisheng Xie
  -1 siblings, 0 replies; 268+ messages in thread
From: Yisheng Xie @ 2017-11-30  1:11 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, gabriele.paoloni, nwatters

Hi, Jean,

On 2017/11/29 23:01, Jean-Philippe Brucker wrote:
> On 29/11/17 06:08, Yisheng Xie wrote:
>>
>>
>> On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
>>> +int iommu_process_bind_device(struct device *dev, struct task_struct *task,
>>> +			      int *pasid, int flags)
>>> +{
>> [..]
>>> +			err = iommu_process_attach_locked(context, dev);
>>> +			if (err)
>>> +				iommu_process_put_locked(process);
>> one ref for a context is enough right? so also need call iommu_process_put_locked()
>> if attach ok, or will be leak if user call bind twice for the same device and task.
> 
> I wasn't sure, I think I prefer taking one ref for each bind. If user
> calls bind twice, it should call unbind twice as well (in case of leak we
> free the context on process exit).
> 
> Also with this implementation, user can call bind for two devices in the
> same domain, which will share the same context structure. So we have to
> take as many refs as bind() calls.


Hmm, there are two refs: one for the _context_ and the other for the *process* (or maybe
the mm in your next version), right? For each bind it takes a ref on the context in the
present design, but why does a process ref also need to be taken for each bind? It does
not seem to break _user can call bind for two devices in the same domain_.

And if you really want to take a ref on the *process* for each bind, you should put it
on unbind, right? I just can't find where the process ref is put on unbind; it is only
put when the context is freed.

Maybe I just miss something.

Thanks
Yisheng Xie
> 
> Thanks,
> Jean
> 
> .
> 


^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 09/36] iommu/fault: Allow blocking fault handlers
  2017-11-29 15:01           ` Jean-Philippe Brucker
  (?)
@ 2017-11-30  2:45               ` Yisheng Xie
  -1 siblings, 0 replies; 268+ messages in thread
From: Yisheng Xie @ 2017-11-30  2:45 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: Mark Rutland, gabriele.paoloni-hv44wF8Li93QT0dZR+AlfA,
	Catalin Marinas, Will Deacon, okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, rjw-LthD3rsA81gm4RdzfppkhA,
	Sudeep Holla

Hi Jean,

On 2017/11/29 23:01, Jean-Philippe Brucker wrote:
> Hello,
> 
> On 29/11/17 06:15, Yisheng Xie wrote:
>> Hi Jean,
>>
>> On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
>>> -	if (domain->ext_handler) {
>>> +	if (domain->handler_flags & IOMMU_FAULT_HANDLER_ATOMIC) {
>>> +		fault->flags |= IOMMU_FAULT_ATOMIC;
>>
>> Why remove the condition of domain->ext_handler? should it be much better like:
>>   if ((domain->handler_flags & IOMMU_FAULT_HANDLER_ATOMIC) && domain->ext_handler)
>>
>> If domain->ext_handler is NULL, and (domain->handler_flags & IOMMU_FAULT_HANDLER_ATOMIC)
>> is true. It will oops, right?
> 
> I removed the check because ext_handler shouldn't be NULL if handler_flags
> has a bit set (as per iommu_set_ext_fault_handler). But you're right that
> this is fragile, and I overlooked the case where users could call
> set_ext_fault_handler to clear the fault handler.
> 
> (Note that this ext_handler will most likely be replaced by the fault
> infrastructure that Jacob is working on:
> https://patchwork.kernel.org/patch/10063385/ to which we should add the
> atomic/blocking flags)
> 

Get it, thanks for your explanation.

Thanks
Yisheng Xie

> Thanks,
> Jean
> 
> .
> 

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 05/36] iommu/process: Bind and unbind process to and from devices
  2017-11-30  1:11           ` Yisheng Xie
  (?)
@ 2017-11-30 13:39             ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-11-30 13:39 UTC (permalink / raw)
  To: Yisheng Xie, linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, gabriele.paoloni, nwatters

On 30/11/17 01:11, Yisheng Xie wrote:
> Hi, Jean,
> 
> On 2017/11/29 23:01, Jean-Philippe Brucker wrote:
>> On 29/11/17 06:08, Yisheng Xie wrote:
>>>
>>>
>>> On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
>>>> +int iommu_process_bind_device(struct device *dev, struct task_struct *task,
>>>> +			      int *pasid, int flags)
>>>> +{
>>> [..]
>>>> +			err = iommu_process_attach_locked(context, dev);
>>>> +			if (err)
>>>> +				iommu_process_put_locked(process);
>>> one ref for a context is enough right? so also need call iommu_process_put_locked()
>>> if attach ok, or will be leak if user call bind twice for the same device and task.
>>
>> I wasn't sure, I think I prefer taking one ref for each bind. If user
>> calls bind twice, it should call unbind twice as well (in case of leak we
>> free the context on process exit).
>>
>> Also with this implementation, user can call bind for two devices in the
>> same domain, which will share the same context structure. So we have to
>> take as many refs as bind() calls.
> 
> 
> hmm, it has two ref, one for _context_ and the other for *process* (or maybe mm for
> your next version), right? For each bind it will take a ref of context as present
> design. but why also process ref need be taken for each bind? I mean it seems does
> not break _user can call bind for two devices in the same domain_.
> 
> And if you really want to take a ref of *process* for echo bind, you should put it when
> unbind, right? I just not find where you put the ref of process when unbind. But just put
> the process ref when free context.
> 
> Maybe I just miss something.

No you're right I misunderstood, sorry about that. Each context has a
single ref to a process, so we do need to drop the process ref here as you
pointed out.

I thought I exercised this path though, I'll update my test suite. Also
attach_locked shouldn't take a context ref if attach fails...

Thanks a lot,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 26/36] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update
  2017-10-06 13:31   ` Jean-Philippe Brucker
  (?)
@ 2017-12-06  6:51     ` Yisheng Xie
  -1 siblings, 0 replies; 268+ messages in thread
From: Yisheng Xie @ 2017-12-06  6:51 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, gabriele.paoloni, nwatters, okaya, rfranz,
	dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

Hi Jean,

On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
> If the SMMU supports it and the kernel was built with HTTU support, enable
> +	if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && (reg & (IDR0_HA | IDR0_HD))) {
> +		smmu->features |= ARM_SMMU_FEAT_HA;
> +		if (reg & IDR0_HD)
> +			smmu->features |= ARM_SMMU_FEAT_HD;
> +	}

What is the relationship between ARMv8.1 HW_AFDBM and SMMUv3 HTTU? I mean, why do we
need IS_ENABLED(CONFIG_ARM64_HW_AFDBM)?

If CONFIG_ARM64_HW_AFDBM=y but the processor does not support ARMv8.1, should the
related feature still be enabled for the SMMUv3?

Thanks
Yisheng Xie
> +
>  	/*
>  	 * If the CPU is using VHE, but the SMMU doesn't support it, the SMMU
>  	 * will create TLB entries for NH-EL1 world and will miss the
> 

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 26/36] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update
  2017-12-06  6:51     ` Yisheng Xie
  (?)
@ 2017-12-06 11:06         ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2017-12-06 11:06 UTC (permalink / raw)
  To: Yisheng Xie, linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: joro, robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, gabriele.paoloni, nwatters, okaya, rfranz,
	dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

On 06/12/17 06:51, Yisheng Xie wrote:
> Hi Jean,
> 
> On 2017/10/6 21:31, Jean-Philippe Brucker wrote:
>> If the SMMU supports it and the kernel was built with HTTU support, enable
>> +	if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && (reg & (IDR0_HA | IDR0_HD))) {
>> +		smmu->features |= ARM_SMMU_FEAT_HA;
>> +		if (reg & IDR0_HD)
>> +			smmu->features |= ARM_SMMU_FEAT_HD;
>> +	}
> 
> What is relationship of armv8.1 HW_AFDBM and SMMUv3 HTTU? I mean why we need
> IS_ENABLED(CONFIG_ARM64_HW_AFDBM) ?

I think the reason we needed this was that, without CONFIG_ARM64_HW_AFDBM,
the CPU wouldn't update the pte atomically and pte_dirty() wouldn't check
the DBM bit 51 (only the SW dirty bit 55).

Since af29678fe785 ("arm64: Remove the !CONFIG_ARM64_HW_AFDBM alternative
code paths") removed lots of #ifdefs, I'll see if we can remove the above
IS_ENABLED as well.
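To illustrate the point about pte_dirty() looking at both bits, here is a
rough user-space sketch with simplified bit definitions (these are not the
kernel's actual macros, just stand-ins following the bit positions above):

```c
#include <stdint.h>

/* Illustrative arm64 PTE bits (simplified, hypothetical definitions) */
#define PTE_DBM      (1UL << 51)  /* hardware Dirty Bit Modifier */
#define PTE_RDONLY   (1UL << 7)   /* AP[2]; cleared on write when DBM is set */
#define PTE_SW_DIRTY (1UL << 55)  /* software dirty bit */

/* Hardware-dirty: DBM is set and a write has cleared the read-only bit */
static int pte_hw_dirty(uint64_t pte)
{
	return (pte & PTE_DBM) && !(pte & PTE_RDONLY);
}

/* mm code accepts either bit, so a dirty state recorded by an SMMU doing
 * HTTU is still observed even if the CPU itself lacks ARMv8.1 DBM */
static int pte_dirty(uint64_t pte)
{
	return (pte & PTE_SW_DIRTY) || pte_hw_dirty(pte);
}
```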

> If CONFIG_ARM64_HW_AFDBM=y but the process do not support ARMv8.1, should it also
> enable related feature for SMMUv3?

Yes, I think we can enable HTTU in the SMMU even if the CPU doesn't
support it (though HTTU is only useful when sharing process address
spaces). The mm code checks both the HW and SW dirty bits even when the
CPU doesn't support ARMv8.1.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 05/36] iommu/process: Bind and unbind process to and from devices
  2017-10-06 13:31     ` Jean-Philippe Brucker
@ 2018-01-19  4:52       ` Sinan Kaya
  0 siblings, 0 replies; 268+ messages in thread
From: Sinan Kaya @ 2018-01-19  4:52 UTC (permalink / raw)
  To: Jean-Philippe Brucker, linux-arm-kernel, linux-pci, linux-acpi,
	devicetree, iommu
  Cc: joro, robh+dt, mark.rutland, catalin.marinas, will.deacon,
	lorenzo.pieralisi, hanjun.guo, sudeep.holla, rjw, lenb,
	robin.murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, rfranz,
	dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

Hi Jean-Philippe,

On 10/6/2017 9:31 AM, Jean-Philippe Brucker wrote:
>  /**
> + * iommu_process_bind_device - Bind a process address space to a device
> + * @dev: the device
> + * @task: the process to bind
> + * @pasid: valid address where the PASID will be stored
> + * @flags: bond properties (IOMMU_PROCESS_BIND_*)
> + *
> + * Create a bond between device and task, allowing the device to access the
> + * process address space using the returned PASID.
> + *
> + * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
> + * is returned.
> + */
> +int iommu_process_bind_device(struct device *dev, struct task_struct *task,
> +			      int *pasid, int flags)

This API doesn't play nice with endpoint device drivers that have PASID limitations.

The AMD driver seems to have per-product PASID limitations that are not
advertised in the PCI capability.

device_iommu_pasid_init()
{
        pasid_limit = min_t(unsigned int,
                        (unsigned int)(1 << kfd->device_info->max_pasid_bits),
                        iommu_info.max_pasids);
        /*
         * last pasid is used for kernel queues doorbells
         * in the future the last pasid might be used for a kernel thread.
         */
        pasid_limit = min_t(unsigned int,
                                pasid_limit,
                                kfd->doorbell_process_limit - 1);
}

kfd->device_info->max_pasid_bits seems to contain per device limitations.

Would you be willing to extend the API so that the requester can impose
some limit on the PASID value that gets allocated?

Sinan

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 05/36] iommu/process: Bind and unbind process to and from devices
  2018-01-19  4:52       ` Sinan Kaya
@ 2018-01-19 10:27           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2018-01-19 10:27 UTC (permalink / raw)
  To: Sinan Kaya, linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu
  Cc: Mark Rutland, xieyisheng1, gabriele.paoloni, Catalin Marinas,
	Will Deacon, yi.l.liu, Lorenzo Pieralisi, ashok.raj, tn, joro,
	rfranz, lenb, jacob.jun.pan, alex.williamson, robh+dt,
	thunder.leizhen, bhelgaas, dwmw2, liubo95, rjw, robdclark,
	hanjun.guo, Sudeep Holla, Robin Murphy, nwatters

Hi Sinan,

On 19/01/18 04:52, Sinan Kaya wrote:
> Hi Jean-Philippe,
> 
> On 10/6/2017 9:31 AM, Jean-Philippe Brucker wrote:
>>  /**
>> + * iommu_process_bind_device - Bind a process address space to a device
>> + * @dev: the device
>> + * @task: the process to bind
>> + * @pasid: valid address where the PASID will be stored
>> + * @flags: bond properties (IOMMU_PROCESS_BIND_*)
>> + *
>> + * Create a bond between device and task, allowing the device to access the
>> + * process address space using the returned PASID.
>> + *
>> + * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
>> + * is returned.
>> + */
>> +int iommu_process_bind_device(struct device *dev, struct task_struct *task,
>> +			      int *pasid, int flags)
> 
> This API doesn't play nice with endpoint device drivers that have PASID limitations.
> 
> The AMD driver seems to have PASID limitations per product that are not being
> advertised in the PCI capability.
> 
> device_iommu_pasid_init()
> {
>         pasid_limit = min_t(unsigned int,
>                         (unsigned int)(1 << kfd->device_info->max_pasid_bits),
>                         iommu_info.max_pasids);
>         /*
>          * last pasid is used for kernel queues doorbells
>          * in the future the last pasid might be used for a kernel thread.
>          */
>         pasid_limit = min_t(unsigned int,
>                                 pasid_limit,
>                                 kfd->doorbell_process_limit - 1);
> }
> 
> kfd->device_info->max_pasid_bits seems to contain per device limitations.
> 
> Would you be willing to extend the API so that the requester can impose some limit
> on the PASID value that is getting allocated.

Good point. Following the feedback on this series, the next version adds
another public function:

int iommu_sva_device_init(struct device *dev, int features);

which has to be called by the device driver before any bind(). The intent
is to let some IOMMU drivers initialize PASID tables and other features
lazily, only if the device driver actually intends to use them. Maybe I
could change this function to:

int iommu_sva_device_init(struct device *dev, int features,
			  unsigned int max_pasid);

@features is a bitmask telling what the device driver needs (PASID and/or
page faults). If @features has IOMMU_SVA_FEAT_PASID set, then the device
driver can set a max_pasid limit, which we'd store in our private
device-iommu data. If max_pasid is 0, we'd use the PCI limit.
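As a rough sketch of how such a limit could combine with the PCI
capability (a hypothetical helper, not part of the series):

```c
/* Hypothetical: compute the effective PASID limit for a device.
 * pci_max_pasids comes from the PCI PASID capability; max_pasid is the
 * driver-supplied cap, with 0 meaning "no driver limit". */
static unsigned int effective_pasid_limit(unsigned int pci_max_pasids,
					  unsigned int max_pasid)
{
	/* Fall back to the PCI limit when no (or a larger) cap is given */
	if (max_pasid == 0 || max_pasid > pci_max_pasids)
		return pci_max_pasids;
	return max_pasid;
}
```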

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 05/36] iommu/process: Bind and unbind process to and from devices
  2018-01-19 10:27           ` Jean-Philippe Brucker
@ 2018-01-19 13:07             ` okaya
  0 siblings, 0 replies; 268+ messages in thread
From: okaya @ 2018-01-19 13:07 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-kernel, linux-pci, linux-acpi, devicetree, iommu, joro,
	robh+dt, Mark Rutland, Catalin Marinas, Will Deacon,
	Lorenzo Pieralisi, hanjun.guo, Sudeep Holla, rjw, lenb,
	Robin Murphy, bhelgaas, alex.williamson, tn, liubo95,
	thunder.leizhen, xieyisheng1, gabriele.paoloni, nwatters, rfranz,
	dwmw2, jacob.jun.pan, yi.l.liu, ashok.raj, robdclark

On 2018-01-19 05:27, Jean-Philippe Brucker wrote:
> Hi Sinan,
> 
> On 19/01/18 04:52, Sinan Kaya wrote:
>> Hi Jean-Philippe,
>> 
>> On 10/6/2017 9:31 AM, Jean-Philippe Brucker wrote:
>>>  /**
>>> + * iommu_process_bind_device - Bind a process address space to a 
>>> device
>>> + * @dev: the device
>>> + * @task: the process to bind
>>> + * @pasid: valid address where the PASID will be stored
>>> + * @flags: bond properties (IOMMU_PROCESS_BIND_*)
>>> + *
>>> + * Create a bond between device and task, allowing the device to 
>>> access the
>>> + * process address space using the returned PASID.
>>> + *
>>> + * On success, 0 is returned and @pasid contains a valid ID. 
>>> Otherwise, an error
>>> + * is returned.
>>> + */
>>> +int iommu_process_bind_device(struct device *dev, struct task_struct 
>>> *task,
>>> +			      int *pasid, int flags)
>> 
>> This API doesn't play nice with endpoint device drivers that have 
>> PASID limitations.
>> 
>> The AMD driver seems to have PASID limitations per product that are 
>> not being
>> advertised in the PCI capability.
>> 
>> device_iommu_pasid_init()
>> {
>>         pasid_limit = min_t(unsigned int,
>>                         (unsigned int)(1 << 
>> kfd->device_info->max_pasid_bits),
>>                         iommu_info.max_pasids);
>>         /*
>>          * last pasid is used for kernel queues doorbells
>>          * in the future the last pasid might be used for a kernel 
>> thread.
>>          */
>>         pasid_limit = min_t(unsigned int,
>>                                 pasid_limit,
>>                                 kfd->doorbell_process_limit - 1);
>> }
>> 
>> kfd->device_info->max_pasid_bits seems to contain per device 
>> limitations.
>> 
>> Would you be willing to extend the API so that the requester can 
>> impose some limit
>> on the PASID value that is getting allocated.
> 
> Good point. Following the feedback for this series, next version adds
> another public function:
> 
> int iommu_sva_device_init(struct device *dev, int features);
> 
> that has to be called by the device driver before any bind(). The 
> intent
> is to let some IOMMU drivers initialize PASID tables and other features
> lazily, only if the device driver actually intends to use them. Maybe I
> could change this function to:
> 
> int iommu_sva_device_init(struct device *dev, int features, unsigned 
> int
> max_pasid);
> 
> @features is a bitmask telling what the device driver needs (PASID 
> and/or
> page faults). If features has IOMMU_SVA_FEAT_PASID set, then device 
> driver
> can set a max_pasid limit, that we'd store in our private device-iommu
> data. If max_pasid is 0, then we'd use the PCI limit.

Yes, this should work.
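For concreteness, a rough sketch of the resulting driver-side flow (every
type and function here is a hypothetical stand-in for the proposed API,
not real kernel code):

```c
/* Hypothetical stand-ins for the kernel objects under discussion */
struct device { unsigned int max_pasid; };
struct task_struct { int pid; };

#define IOMMU_SVA_FEAT_PASID (1 << 0)

/* Driver declares its features and an optional PASID cap
 * (0 = fall back to the PCI capability limit, assumed 2^20 here) */
static int iommu_sva_device_init(struct device *dev, int features,
				 unsigned int max_pasid)
{
	if (features & IOMMU_SVA_FEAT_PASID)
		dev->max_pasid = max_pasid ? max_pasid : (1u << 20);
	return 0;
}

static unsigned int next_pasid = 1;

/* Allocation then respects the per-device cap set at init time */
static int iommu_process_bind_device(struct device *dev,
				     struct task_struct *task,
				     int *pasid, int flags)
{
	(void)task; (void)flags;
	if (next_pasid >= dev->max_pasid)
		return -1;	/* no PASID available below the device cap */
	*pasid = (int)next_pasid++;
	return 0;
}
```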

> 
> Thanks,
> Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread

* Re: [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3
  2017-10-25 20:20                       ` Jordan Crouse
@ 2018-02-05 18:15                           ` Jordan Crouse
  -1 siblings, 0 replies; 268+ messages in thread
From: Jordan Crouse @ 2018-02-05 18:15 UTC (permalink / raw)
  To: Jean-Philippe Brucker,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Bob Liu,
	gabriele.paoloni-hv44wF8Li93QT0dZR+AlfA, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Yisheng Xie,
	Robin Murphy, Joerg Roedel, dwmw2-wEGCiKHe2LqWVfeAwA7xHQ,
	Alex Williamson

On Wed, Oct 25, 2017 at 02:20:15PM -0600, Jordan Crouse wrote:
> On Mon, Oct 23, 2017 at 02:00:07PM +0100, Jean-Philippe Brucker wrote:
> > Hi Jordan,
> > 
> > [Lots of IOMMU people have been dropped from Cc, I've tried to add them back]
> > 
> > On 12/10/17 16:28, Jordan Crouse wrote:
> > > On Thu, Oct 12, 2017 at 01:55:32PM +0100, Jean-Philippe Brucker wrote:
> > >> On 12/10/17 13:05, Yisheng Xie wrote:
> > >> [...]
> > >>>>>> * An iommu_process can be bound to multiple domains, and a domain can have
> > >>>>>>   multiple iommu_process.
> > >>>>> When binding a task to a device, can we create a single domain for it? I am
> > >>>>> thinking about process management without shared page tables (some devices
> > >>>>> only support PASID without PRI capability); it seems hard to extend if a
> > >>>>> domain has multiple iommu_processes. Do you have any idea about this?
> > >>>>
> > >>>> A device always has to be in a domain, as far as I know. Not supporting
> > >>>> PRI forces you to pin down all user mappings (or just the ones you use for
> > >> DMA) but you should still be able to share PT. Now if you don't support
> > >>>> shared PT either, but only PASID, then you'll have to use io-pgtable and a
> > >>>> new map/unmap API on an iommu_process. I don't understand your concern
> > >>>> though, how would the link between process and domains prevent this use-case?
> > >>>>
> > >>> So you mean that if an iommu_process bind to multiple devices it should create
> > >>> multiple io-pgtables? or just share the same io-pgtable?
> > >>
> > >> I don't know to be honest, I haven't thought much about the io-pgtable
> > >> case, I'm all about sharing the mm :)
> > >>
> > >> It really depends on what the user (GPU driver I assume) wants. I think
> > >> that if you're not sharing an mm with the device, then you're trying to
> > >> hide parts of the process from the device, so you'd also want the
> > >> flexibility of having different io-pgtables between devices. Different
> > >> devices accessing isolated parts of the process requires separate io-pgtables.
> > > 
> > > In our specific Snapdragon use case the GPU is the only entity that cares about
> > > process specific io-pgtables.  Everything else (display, video, camera) is happy
> > > using a global io-pgtable.  The reasoning is that the GPU is programmable from
> > > user space and can be easily used to copy data whereas the other use cases have
> > > mostly fixed functions.
> > > 
> > > Even if different devices did want to have a process specific io-pgtable I doubt
> > > we would share them.  Every device uses the IOMMU differently and the magic
> > > needed to share an io-pgtable between (for example) a GPU and a DSP would be
> > > prohibitively complicated.
> > > 
> > > Jordan
> > 
> > 
> > 
> > More context here:
> > https://www.mail-archive.com/iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org/msg20368.html
> > 
> > So to summarize the Snapdragon case, if I understand correctly you need
> > two additional features:
> > 
> > (1) A way to create process address spaces, that are not bound to an mm
> > but to a separate io-pgtable. And a way to map/unmap these contexts.
> 
> Correct.
> 
> > (2) A way to obtain the PGD in order to program it into the GPU. And also
> > the ASID I suppose? What about TCR and MAIR?
> >
> PGD and ASID.  Not the TCR and MAIR, at least not in the current iteration.
> 
> > For (1), I can see some value in isolating process contexts with
> > io-pgtable without going all the way and sharing the mm. The IOVA=VA
> > use-case feels a bit weak. But it does provide better isolation than
> > dma_map/unmap: if the GPU is in charge of PASIDs, then two processes that
> > execute code on the GPU cannot access each other's DMA buffers. Maybe
> > other users will want that feature (but they really should be using bind_mm!).
> 
> That is exactly the use case.  A real-world attack vector in the mobile GPU
> world is a malicious app that knows you have a banking app active and copies
> its surfaces, or at the very least scribbles over everything and is very rude.
> 
> > In the next version I'm going to replace iommu_process_bind by something like
> > iommu_sva_bind_mm, which reduces the scope of the API I'm introducing and
> > doesn't fit your case anymore. What you need is a shortcut into the PASID
> > allocator, a way to allocate a private PASID with io-pgtables instead of
> > one backed by an mm. Something like:
> > 
> > iommu_sva_alloc_pasid(domain, dev) -> pasid
> > iommu_sva_map(pasid, iova, size, flags)
> > iommu_sva_unmap(pasid, iova, size)
> > iommu_sva_free_pasid(domain, pasid)
> 
> Yep, that matches up with my thinking.
> 
> > Then for (2) the GPU is tightly integrated into the SMMU and can switch
> > contexts. I might be wrong but I don't see this case becoming standard as
> > new implementations move to PASIDs, so we shouldn't spend too much time
> > making it generic.
> 
> Agreed. This is a rather specific use case.
> 
> > But to make it fit into the PASID API, how about the following.
> 
> 
> > We provide a backdoor to the GPU driver, allowing it to register PASID ops
> > into SMMUv2 driver:
> > 
> > struct smmuv2_pasid_ops {
> > 	int (*install_pasid)(struct iommu_domain, int pasid, ttbr, asid
> > 			     and whatnot);
> > 	void (*remove_pasid)(struct iommu_domain, int pasid);
> > }
> > 
> > On PASID-capable IOMMUs, iommu_sva_alloc_pasid would install a context
> > descriptor into the PASID tables (owned by the IOMMU), pointing to the
> > io-pgtable. As SMMUv2 doesn't support PASID, iommu_sva_alloc_pasid
> > wouldn't actually install a context descriptor but instead call back into
> > the GPU driver with install_pasid. The GPU can then do its thing, call
> > sva_map/unmap, and switch contexts.
> > 
> > The good thing is that (1) and (2) are separate, so you get the same
> > callbacks if you're using iommu_sva_bind_mm instead of the private pasid
> > thing.
> 
> This sounds ideal. It seems to scratch all the right itches that we have. 
> 
> Thanks for thinking about this use case. I appreciate your time.
 
Hi Jean-Philippe -

Just a gentle nudge to see if there is any progress on this front. I know the
last six months have been busy with other, far more serious panics, but I
wanted to offer any help I can provide, including testing on various qcom
targets.

Regards,
Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 268+ messages in thread


* Re: [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3
  2018-02-05 18:15                           ` Jordan Crouse
@ 2018-02-05 18:43                               ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 268+ messages in thread
From: Jean-Philippe Brucker @ 2018-02-05 18:43 UTC (permalink / raw)
  To: jcrouse-sgV2jX0FEOL9JmXXK+q4OQ
  Cc: gabriele.paoloni-hv44wF8Li93QT0dZR+AlfA,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ, Will Deacon,
	okaya-sgV2jX0FEOL9JmXXK+q4OQ,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hi Jordan,

On 05/02/18 18:15, Jordan Crouse wrote:
[...]
> Just a gentle nudge to see if there is any progress on this front. I know the
> last 6 months have been busy with other far more serious panics but I wanted to
> offer any help I could provide including testing on various qcom targets.

If everything goes well I should be able to send v1 of this series next
week, and I believe the patches implementing sva_map()/sva_unmap() will
soon follow.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 268+ messages in thread


end of thread, other threads:[~2018-02-05 18:43 UTC | newest]

Thread overview: 268+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-06 13:31 [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3 Jean-Philippe Brucker
2017-10-06 13:31 ` Jean-Philippe Brucker
2017-10-06 13:31 ` Jean-Philippe Brucker
2017-10-06 13:31 ` [RFCv2 PATCH 02/36] iommu: Add a process_exit callback for device drivers Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
2017-10-06 13:31 ` [RFCv2 PATCH 03/36] iommu/process: Add public function to search for a process Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
     [not found] ` <20171006133203.22803-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2017-10-06 13:31   ` [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-23 11:04     ` Liu, Yi L
2017-10-23 11:04       ` Liu, Yi L
2017-10-23 11:04       ` Liu, Yi L
2017-10-23 12:17       ` Jean-Philippe Brucker
2017-10-23 12:17         ` Jean-Philippe Brucker
2017-10-23 12:17         ` Jean-Philippe Brucker
     [not found]         ` <7aaf9851-9546-f34d-1496-cbeea404abbd-5wv7dgnIgG8@public.gmane.org>
2017-10-25 18:05           ` Raj, Ashok
2017-10-25 18:05             ` Raj, Ashok
2017-10-25 18:05             ` Raj, Ashok
2017-10-30 10:28             ` Jean-Philippe Brucker
2017-10-30 10:28               ` Jean-Philippe Brucker
2017-10-30 10:28               ` Jean-Philippe Brucker
     [not found]     ` <20171006133203.22803-2-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2017-10-20 23:32       ` Sinan Kaya
2017-10-20 23:32         ` Sinan Kaya
2017-10-20 23:32         ` Sinan Kaya
2017-11-02 16:20         ` Jean-Philippe Brucker
2017-11-02 16:20           ` Jean-Philippe Brucker
2017-11-02 16:20           ` Jean-Philippe Brucker
2017-11-08 17:50       ` Bharat Kumar Gogada
2017-11-08 17:50         ` Bharat Kumar Gogada
2017-11-08 17:50         ` Bharat Kumar Gogada
2017-11-09 12:13         ` Jean-Philippe Brucker
2017-11-09 12:13           ` Jean-Philippe Brucker
2017-11-09 12:13           ` Jean-Philippe Brucker
     [not found]         ` <BLUPR0201MB150538FDD455F6042803B54FA5560-hRBPhS1iNj/g9tdZWAsUFxrHTHEw16jenBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-11-09 12:16           ` Jean-Philippe Brucker
2017-11-09 12:16             ` Jean-Philippe Brucker
2017-11-09 12:16             ` Jean-Philippe Brucker
     [not found]             ` <16b6ba80-b15b-b278-0d06-350ae0201e82-5wv7dgnIgG8@public.gmane.org>
2017-11-13 11:06               ` Bharat Kumar Gogada
2017-11-13 11:06                 ` Bharat Kumar Gogada
2017-11-13 11:06                 ` Bharat Kumar Gogada
2017-11-22  3:15     ` Bob Liu
2017-11-22  3:15       ` Bob Liu
2017-11-22  3:15       ` Bob Liu
2017-11-22 13:04       ` Jean-Philippe Brucker
2017-11-22 13:04         ` Jean-Philippe Brucker
2017-11-22 13:04         ` Jean-Philippe Brucker
     [not found]         ` <42f815ee-2a9a-ac49-2392-5c03c1d4c809-5wv7dgnIgG8@public.gmane.org>
2017-11-23 10:33           ` Bob Liu
2017-11-23 10:33             ` Bob Liu
2017-11-23 10:33             ` Bob Liu
2017-10-06 13:31   ` [RFCv2 PATCH 04/36] iommu/process: Track process changes with an mmu_notifier Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31   ` [RFCv2 PATCH 05/36] iommu/process: Bind and unbind process to and from devices Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-11 11:33     ` Joerg Roedel
2017-10-11 11:33       ` Joerg Roedel
2017-10-12 11:13       ` Jean-Philippe Brucker
2017-10-12 11:13         ` Jean-Philippe Brucker
2017-10-12 11:13         ` Jean-Philippe Brucker
     [not found]         ` <ee7f80e3-ca30-0ee7-53f3-3e57b2b58df6-5wv7dgnIgG8@public.gmane.org>
2017-10-12 12:47           ` Joerg Roedel
2017-10-12 12:47             ` Joerg Roedel
2017-10-12 12:47             ` Joerg Roedel
     [not found]     ` <20171006133203.22803-6-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2017-10-21 15:47       ` Sinan Kaya
2017-10-21 15:47         ` Sinan Kaya
2017-10-21 15:47         ` Sinan Kaya
     [not found]         ` <683a518d-0e22-c855-2416-2e097ec3291d-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-11-02 16:21           ` Jean-Philippe Brucker
2017-11-02 16:21             ` Jean-Philippe Brucker
2017-11-02 16:21             ` Jean-Philippe Brucker
2017-11-29  6:08     ` Yisheng Xie
2017-11-29  6:08       ` Yisheng Xie
2017-11-29  6:08       ` Yisheng Xie
2017-11-29 15:01       ` Jean-Philippe Brucker
2017-11-29 15:01         ` Jean-Philippe Brucker
2017-11-29 15:01         ` Jean-Philippe Brucker
2017-11-30  1:11         ` Yisheng Xie
2017-11-30  1:11           ` Yisheng Xie
2017-11-30  1:11           ` Yisheng Xie
2017-11-30 13:39           ` Jean-Philippe Brucker
2017-11-30 13:39             ` Jean-Philippe Brucker
2017-11-30 13:39             ` Jean-Philippe Brucker
2018-01-19  4:52     ` Sinan Kaya
2018-01-19  4:52       ` Sinan Kaya
2018-01-19  4:52       ` Sinan Kaya
     [not found]       ` <0772e71e-4861-1e7b-f248-88aaba8bf2fc-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2018-01-19 10:27         ` Jean-Philippe Brucker
2018-01-19 10:27           ` Jean-Philippe Brucker
2018-01-19 10:27           ` Jean-Philippe Brucker
2018-01-19 13:07           ` okaya
2018-01-19 13:07             ` okaya at codeaurora.org
2018-01-19 13:07             ` okaya
2017-10-06 13:31   ` [RFCv2 PATCH 06/36] iommu: Extend fault reporting Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31   ` [RFCv2 PATCH 07/36] iommu: Add a fault handler Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31   ` [RFCv2 PATCH 08/36] iommu/fault: Handle mm faults Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31   ` [RFCv2 PATCH 13/36] iommu/of: Add stall and pasid properties to iommu_fwspec Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31   ` [RFCv2 PATCH 19/36] arm64: mm: Pin down ASIDs for sharing contexts with devices Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31   ` [RFCv2 PATCH 20/36] iommu/arm-smmu-v3: Track ASID state Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31   ` [RFCv2 PATCH 21/36] iommu/arm-smmu-v3: Implement process operations Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-11-09  3:32     ` Yisheng Xie
2017-11-09  3:32       ` Yisheng Xie
2017-11-09  3:32       ` Yisheng Xie
2017-11-09 12:08       ` Jean-Philippe Brucker
2017-11-09 12:08         ` Jean-Philippe Brucker
2017-11-09 12:08         ` Jean-Philippe Brucker
2017-10-06 13:31   ` [RFCv2 PATCH 23/36] iommu/arm-smmu-v3: Share process page tables Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31   ` [RFCv2 PATCH 28/36] iommu/arm-smmu-v3: Maintain a SID->device structure Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31   ` [RFCv2 PATCH 29/36] iommu/arm-smmu-v3: Add stall support for platform devices Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31   ` [RFCv2 PATCH 30/36] ACPI/IORT: Check ATS capability in root complex nodes Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:31     ` Jean-Philippe Brucker
2017-10-06 13:32   ` [RFCv2 PATCH 34/36] PCI: Make "PRG Response PASID Required" handling common Jean-Philippe Brucker
2017-10-06 13:32     ` Jean-Philippe Brucker
2017-10-06 13:32     ` Jean-Philippe Brucker
     [not found]     ` <20171006133203.22803-35-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2017-10-06 18:11       ` Bjorn Helgaas
2017-10-06 18:11         ` Bjorn Helgaas
2017-10-06 18:11         ` Bjorn Helgaas
2017-10-06 13:32   ` [RFCv2 PATCH 35/36] iommu/arm-smmu-v3: Add support for PRI Jean-Philippe Brucker
2017-10-06 13:32     ` Jean-Philippe Brucker
2017-10-06 13:32     ` Jean-Philippe Brucker
2017-10-06 13:31 ` [RFCv2 PATCH 09/36] iommu/fault: Allow blocking fault handlers Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
     [not found]   ` <20171006133203.22803-10-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2017-11-29  6:15     ` Yisheng Xie
2017-11-29  6:15       ` Yisheng Xie
2017-11-29  6:15       ` Yisheng Xie
     [not found]       ` <7e1c8ea4-e568-1000-17de-62f8562c7169-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2017-11-29 15:01         ` Jean-Philippe Brucker
2017-11-29 15:01           ` Jean-Philippe Brucker
2017-11-29 15:01           ` Jean-Philippe Brucker
     [not found]           ` <74891e35-17d8-5831-1ebd-18e00ce00d74-5wv7dgnIgG8@public.gmane.org>
2017-11-30  2:45             ` Yisheng Xie
2017-11-30  2:45               ` Yisheng Xie
2017-11-30  2:45               ` Yisheng Xie
2017-10-06 13:31 ` [RFCv2 PATCH 10/36] vfio: Add support for Shared Virtual Memory Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
2017-11-24  8:23   ` Bob Liu
2017-11-24  8:23     ` Bob Liu
2017-11-24  8:23     ` Bob Liu
2017-11-24 10:58     ` Jean-Philippe Brucker
2017-11-24 10:58       ` Jean-Philippe Brucker
2017-11-24 10:58       ` Jean-Philippe Brucker
2017-10-06 13:31 ` [RFCv2 PATCH 11/36] iommu/arm-smmu-v3: Link domains and devices Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
2017-10-06 13:31 ` [RFCv2 PATCH 12/36] dt-bindings: document stall and PASID properties for IOMMU masters Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
     [not found]   ` <20171006133203.22803-13-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2017-10-13 19:10     ` Rob Herring
2017-10-13 19:10       ` Rob Herring
2017-10-13 19:10       ` Rob Herring
2017-10-16 10:23       ` Jean-Philippe Brucker
2017-10-16 10:23         ` Jean-Philippe Brucker
2017-10-16 10:23         ` Jean-Philippe Brucker
     [not found]         ` <e7288f51-1cfa-44ce-e3ce-e9f3daf91579-5wv7dgnIgG8@public.gmane.org>
2017-10-18  2:06           ` Rob Herring
2017-10-18  2:06             ` Rob Herring
2017-10-18  2:06             ` Rob Herring
2017-10-06 13:31 ` [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for Substream IDs Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
2017-11-02 12:49   ` Shameerali Kolothum Thodi
2017-11-02 12:49     ` Shameerali Kolothum Thodi
2017-11-02 12:49     ` Shameerali Kolothum Thodi
2017-11-02 15:51     ` Jean-Philippe Brucker
2017-11-02 15:51       ` Jean-Philippe Brucker
2017-11-02 15:51       ` Jean-Philippe Brucker
2017-11-02 17:02       ` Shameerali Kolothum Thodi
2017-11-02 17:02         ` Shameerali Kolothum Thodi
2017-11-02 17:02         ` Shameerali Kolothum Thodi
2017-11-03  5:45         ` Yisheng Xie
2017-11-03  5:45           ` Yisheng Xie
2017-11-03  5:45           ` Yisheng Xie
2017-11-03  9:37           ` Jean-Philippe Brucker
2017-11-03  9:37             ` Jean-Philippe Brucker
2017-11-03  9:37             ` Jean-Philippe Brucker
2017-11-03  9:39             ` Shameerali Kolothum Thodi
2017-11-03  9:39               ` Shameerali Kolothum Thodi
2017-11-03  9:39               ` Shameerali Kolothum Thodi
2017-11-06  0:50             ` Yisheng Xie
2017-11-06  0:50               ` Yisheng Xie
2017-11-06  0:50               ` Yisheng Xie
2017-10-06 13:31 ` [RFCv2 PATCH 15/36] iommu/arm-smmu-v3: Add second level of context descriptor table Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
2017-10-06 13:31 ` [RFCv2 PATCH 16/36] iommu/arm-smmu-v3: Add support for VHE Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
2017-10-06 13:31 ` [RFCv2 PATCH 17/36] iommu/arm-smmu-v3: Support broadcast TLB maintenance Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
2017-10-06 13:31 ` [RFCv2 PATCH 18/36] iommu/arm-smmu-v3: Add SVM feature checking Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
2017-10-06 13:31 ` [RFCv2 PATCH 22/36] iommu/io-pgtable-arm: Factor out ARM LPAE register defines Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
2017-10-06 13:31 ` [RFCv2 PATCH 24/36] iommu/arm-smmu-v3: Steal private ASID from a domain Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
2017-10-06 13:31 ` [RFCv2 PATCH 25/36] iommu/arm-smmu-v3: Use shared ASID set Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
2017-10-06 13:31 ` [RFCv2 PATCH 26/36] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
2017-12-06  6:51   ` Yisheng Xie
2017-12-06  6:51     ` Yisheng Xie
2017-12-06  6:51     ` Yisheng Xie
     [not found]     ` <d2ec2e61-f758-0394-41d2-555ae65feb0d-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2017-12-06 11:06       ` Jean-Philippe Brucker
2017-12-06 11:06         ` Jean-Philippe Brucker
2017-12-06 11:06         ` Jean-Philippe Brucker
2017-10-06 13:31 ` [RFCv2 PATCH 27/36] iommu/arm-smmu-v3: Register fault workqueue Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
2017-10-06 13:31 ` [RFCv2 PATCH 31/36] iommu/arm-smmu-v3: Add support for PCI ATS Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
2017-11-16 14:19   ` Bharat Kumar Gogada
2017-11-16 14:19     ` Bharat Kumar Gogada
2017-11-16 14:19     ` Bharat Kumar Gogada
     [not found]     ` <BLUPR0201MB150565029F9260A528739ACBA52E0-hRBPhS1iNj/g9tdZWAsUFxrHTHEw16jenBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-11-16 15:03       ` Jean-Philippe Brucker
2017-11-16 15:03         ` Jean-Philippe Brucker
2017-11-16 15:03         ` Jean-Philippe Brucker
     [not found]         ` <673fda01-2ae0-87e4-637e-fe27096b6be0-5wv7dgnIgG8@public.gmane.org>
2017-11-17  6:11           ` Bharat Kumar Gogada
2017-11-17  6:11             ` Bharat Kumar Gogada
2017-11-17  6:11             ` Bharat Kumar Gogada
     [not found]             ` <BLUPR0201MB1505BC86D3838D13F38665E7A52F0-hRBPhS1iNj/g9tdZWAsUFxrHTHEw16jenBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-11-17 11:39               ` Jean-Philippe Brucker
2017-11-17 11:39                 ` Jean-Philippe Brucker
2017-11-17 11:39                 ` Jean-Philippe Brucker
2017-10-06 13:31 ` [RFCv2 PATCH 32/36] iommu/arm-smmu-v3: Hook ATC invalidation to process ops Jean-Philippe Brucker
2017-10-06 13:31   ` Jean-Philippe Brucker
2017-10-06 13:32 ` [RFCv2 PATCH 33/36] iommu/arm-smmu-v3: Disable tagged pointers Jean-Philippe Brucker
2017-10-06 13:32   ` Jean-Philippe Brucker
2017-10-06 13:32 ` [RFCv2 PATCH 36/36] iommu/arm-smmu-v3: Add support for PCI PASID Jean-Philippe Brucker
2017-10-06 13:32   ` Jean-Philippe Brucker
2017-10-09  9:49 ` [RFCv2 PATCH 00/36] Process management for IOMMU + SVM for SMMUv3 Yisheng Xie
2017-10-09  9:49   ` Yisheng Xie
2017-10-09  9:49   ` Yisheng Xie
2017-10-09 11:36   ` Jean-Philippe Brucker
2017-10-09 11:36     ` Jean-Philippe Brucker
2017-10-09 11:36     ` Jean-Philippe Brucker
     [not found]     ` <0fecd29e-eaf7-9503-b087-7bfbc251da88-5wv7dgnIgG8@public.gmane.org>
2017-10-12 12:05       ` Yisheng Xie
2017-10-12 12:05         ` Yisheng Xie
2017-10-12 12:05         ` Yisheng Xie
2017-10-12 12:55         ` Jean-Philippe Brucker
2017-10-12 12:55           ` Jean-Philippe Brucker
2017-10-12 12:55           ` Jean-Philippe Brucker
     [not found]           ` <8a1e090d-22e8-0295-a53f-bc3b5b7d7971-5wv7dgnIgG8@public.gmane.org>
2017-10-12 15:28             ` Jordan Crouse
2017-10-12 15:28               ` Jordan Crouse
2017-10-12 15:28               ` Jordan Crouse
     [not found]               ` <20171012152803.GA3027-9PYrDHPZ2Orvke4nUoYGnHL1okKdlPRT@public.gmane.org>
2017-10-23 13:00                 ` Jean-Philippe Brucker
2017-10-23 13:00                   ` Jean-Philippe Brucker
     [not found]                   ` <8539601d-ef7a-8dd0-2fc7-51240c292678-5wv7dgnIgG8@public.gmane.org>
2017-10-25 20:20                     ` Jordan Crouse
2017-10-25 20:20                       ` Jordan Crouse
     [not found]                       ` <20171025202015.GA6159-9PYrDHPZ2Orvke4nUoYGnHL1okKdlPRT@public.gmane.org>
2018-02-05 18:15                         ` Jordan Crouse
2018-02-05 18:15                           ` Jordan Crouse
     [not found]                           ` <20180205181513.GB878-9PYrDHPZ2Orvke4nUoYGnHL1okKdlPRT@public.gmane.org>
2018-02-05 18:43                             ` Jean-Philippe Brucker
2018-02-05 18:43                               ` Jean-Philippe Brucker
2017-11-08  1:21           ` Bob Liu
2017-11-08  1:21             ` Bob Liu
2017-11-08  1:21             ` Bob Liu
2017-11-08 10:50             ` Jean-Philippe Brucker
2017-11-08 10:50               ` Jean-Philippe Brucker
2017-11-08 10:50               ` Jean-Philippe Brucker
